Re: AS OF queries
On 25.12.2017 06:26, Craig Ringer wrote:

On 24 December 2017 at 04:53, konstantin knizhnik <k.knizh...@postgrespro.ru> wrote:

But what if I just forbid changing recent_global_xmin? If it is stalled at FirstNormalTransactionId and never changed, will it protect all versions from being deleted?

That's totally impractical, you'd have unbounded bloat and a nonfunctional system in no time. You'd need a mechanism - akin to what we have with replication slots - to set a threshold for age.

Well, there are systems with "never delete" and "append only" semantics. For example, I have participated in the SciDB project: a database for scientific applications. One of the key requirements for scientific research is reproducibility. From the database point of view this means that we need to store all raw data and never delete it. If you performed some measurements and drew some conclusions from the results, then everybody should be able to repeat that, even if you later find errors in the input data and make corrections or just add more data. So one of the SciDB requirements was to store all versions. A delete operation should just mark data as deleted (although later we had to add true delete :)

But I agree with you: in most cases a more flexible policy for managing versions is needed. I am not sure that it should be similar to a logical replication slot. There the semantics are quite clear: we preserve segments of WAL until they are replicated to the subscribers. With time travel the situation is less clear-cut: we may specify some threshold for age - keep data, for example, for one year. But what if somebody later wants to access older data? At that moment it is already lost...

It seems to me that the version pinning policy mostly depends on the source of the data. If it has "append only" semantics (like raw scientific data, trading data, measurements from IoT sensors, ...), then it is desirable to keep all versions forever. If we speak about OLTP tables (like accounts in pgbench), then maybe time travel is not the proper mechanism for such data at all.

I think that in addition to logged/unlogged tables it would be useful to support historical/non-historical tables. A historical table should support time travel, while a non-historical (default) table acts like a normal table. It is already possible in Postgres to disable autovacuum for particular tables. But unfortunately the trick with snapshots (no matter how we set up the oldest xmin horizon) affects all tables. There is a similar (but not the same) problem with logical replication: assume that we need to replicate only one small table, but we have to pin in WAL all updates of another huge table which is not involved in logical replication at all.

> Then there's another issue that logical replication has had to deal
> with -- catalog changes. You can't start looking at tuples that have a
> different structure than the current catalog unless you can figure out
> how to use the logical replication infrastructure to use the old
> catalogs. That's a huge problem to bite off and probably can just be
> left for another day if you can find a way to reliably detect the
> problem and raise an error if the schema is inconsistent.

Yes, catalog changes are another problem for time travel. I do not know any suitable way to handle several different catalog snapshots in one query.

I doubt it's practical unless you can extract it to subplans that can be materialized separately. Even then, UDTs, rowtype results, etc...

Well, I am really not sure about users' demands for time travel.
This is one of the reasons for initiating this discussion on hackers... Maybe it is not the best place for such a discussion, because here there are mostly Postgres developers and not users... At least, from the experience of a few SciDB customers, I can tell that we didn't have problems with schema evolution: mostly the schema is simple, static and well defined. There were problems with incorrect import of data (this is why we had to add real delete), with splitting data into chunks (partitioning), ...

The question is how we should handle such catalog changes if they happen. Ideally we should not allow moving back beyond this point. Unfortunately it is not so easy to implement.

I think you can learn a lot from studying logical decoding here.

Working with multimaster and shardman I have had to learn a lot about logical replication. It is a really powerful and flexible mechanism ... with a lot of limitations and problems: lack of catalog replication, inefficient bulk insert, various race conditions, ... But I think that time travel and logical replication serve quite different goals, so they require different approaches.

-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services

-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
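To make the prerequisites discussed in this thread concrete, a minimal sketch of how such a feature might be used could look like the following. The AS OF syntax and the table name are illustrative only and may not match the prototype; the settings simply restate what was said above about autovacuum and commit timestamps, and they must be in place before the data of interest is written:

-- postgresql.conf (prerequisites discussed above):
--   track_commit_timestamp = on   -- needed to map timestamps back to transactions
--   autovacuum = off              -- or a per-table equivalent for "historical"
--                                 -- tables, so that old versions survive

-- Hypothetical usage only; the exact syntax is still under discussion:
SELECT * FROM accounts AS OF TIMESTAMP now() - interval '1 hour';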
Re: Huge backend memory footprint
On 22.12.2017 16:21, Andres Freund wrote:

Hi,

On 2017-12-22 16:07:23 +0300, Konstantin Knizhnik wrote:

During my experiments with the pthreads version of Postgres I found out that I cannot create more than 100k backends, even on a system with 4Tb of RAM.

I don't think this is a problem we need to address at this point. Would you care to argue otherwise?

I am not sure that being able to support 100k sessions is really the most topical goal. Some modern systems have hundreds of cores, and to be able to utilize them we need thousands of backends - but not yet 100k backends. If Postgres could efficiently support an arbitrary number of backends (actually, managing more than 100k network connections is problematic in any case), then we would not need connection pooling. And that definitely has some clear advantages: use of session variables, prepared statements, ... Also, any intermediate proxy can only decrease speed.

But certainly there are a lot of problems besides these few arrays which cause a quadratic increase of the backends' memory footprint with increasing max_connections. Postgres backends are heavyweight: each of them maintains its own private relation and catalog caches, prepared statement cache, ... I expect that for really large databases these will consume most of the backend's memory. So maybe the right approach is to replace these private caches with shared caches.

For now we've so much more relevant limitations, e.g. the O(N) complexity of computing a snapshot, that I think addressing these concerns is a waste of time at this point.

Well, we have the CSN patch, which is now stable enough and shows results comparable with mainstream Postgres. So maybe snapshot calculation is not the most critical problem...

Greetings,

Andres Freund

-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
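As a back-of-the-envelope illustration of the quadratic effect mentioned above (the 64 bytes per array element is an assumption for illustration, not a measured value): if each of N backends keeps a private array with one entry per possible backend, the aggregate footprint of just those arrays grows as N squared.

#include <stdio.h>

/* Illustrative arithmetic only: per-backend arrays sized by max_connections
 * make the aggregate footprint grow quadratically with max_connections. */
int main(void)
{
    const double bytes_per_slot = 64.0;     /* assumed size of one array element */

    for (long n = 1000; n <= 100000; n *= 10)
        printf("max_connections = %6ld -> ~%.2f GB just for these arrays\n",
               n, (double) n * (double) n * bytes_per_slot / 1e9);
    return 0;
}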
Re: [HACKERS] Proposal: Local indexes for partitioned table
On 2017/12/25 13:52, Amit Langote wrote:
> Hi Alvaro,
>
> On 2017/12/23 0:10, Alvaro Herrera wrote:
>> I believe these are all fixed by the attached delta patch.
>
> @@ -1676,7 +1694,12 @@ psql_completion(const char *text, int start, int end)
>   "UNION SELECT 'ALL IN TABLESPACE'");
>   /* ALTER INDEX */
>   else if (Matches3("ALTER", "INDEX", MatchAny))
> -COMPLETE_WITH_LIST5("ALTER COLUMN", "OWNER TO", "RENAME TO", "SET", "RESET");
> +COMPLETE_WITH_LIST7("ALTER COLUMN", "OWNER TO", "RENAME TO", "SET",
> +"RESET", "ATTACH PARTITION");
>
> This should be COMPLETE_WITH_LIST6().

I noticed a few things I thought I'd report.  I was using the latest patch (v8 + the delta patch).

1. The following crashes because of an Assert in ATExecCmd():

ALTER TABLE an_index DETACH PARTITION ...

It seems ATPrepCmd() should not pass ATT_INDEX to ATSimplePermissions() for the ATDetachPartition sub-command type.

2. Something strange about how dependencies are managed:

create table p (a char) partition by list (a);
create table pa partition of p for values in ('a');;
create table pb partition of p for values in ('b');
create index on p (a);

\d pa
         Table "public.pa"
 Column |     Type     | Collation | Nullable | Default
--------+--------------+-----------+----------+---------
 a      | character(1) |           |          |
Partition of: p FOR VALUES IN ('a')
Indexes:
    "pa_a_idx" btree (a)

drop table pa;
ERROR:  cannot drop table pa because other objects depend on it
DETAIL:  index p_a_idx depends on table pa
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

set client_min_messages to debug2;
drop table pa;
DEBUG:  drop auto-cascades to index pa_a_idx
DEBUG:  drop auto-cascades to index pb_a_idx
DEBUG:  drop auto-cascades to type pa
DEBUG:  drop auto-cascades to type pa[]
ERROR:  cannot drop table pa because other objects depend on it
DETAIL:  index p_a_idx depends on table pa
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

Thanks,
Amit
Re: Unique indexes & constraints on partitioned tables
Hi Alvaro, On 2017/12/23 6:29, Alvaro Herrera wrote: > Hello, > > I'm giving this patch its own thread for mental sanity, but this is > essentially what already posted in [1], plus some doc fixes. This patch > depends on the main "local partitioned indexes" in that thread, last > version of which is at [2]. Thanks for working on this. Have you considered what happens when ON CONFLICT code tries to depend on such an index (a partitioned unique index)? It seems we'll need some new code in the executor for the same. I tried the following after applying your patch: create table p (a char) partition by list (a); create table pa partition of p for values in ('a');; create table pb partition of p for values in ('b'); create unique index on p (a); insert into p values ('a', 1); INSERT 0 1 insert into p values ('a', 1); ERROR: duplicate key value violates unique constraint "pa_a_idx" DETAIL: Key (a)=(a) already exists. insert into p values ('a', 1) on conflict do nothing; INSERT 0 0 Fine so far... but insert into p values ('a', 1) on conflict (a) do nothing; ERROR: unexpected failure to find arbiter index or insert into p values ('a', 1) on conflict (a) do update set b = excluded.b; ERROR: unexpected failure to find arbiter index I mentioned this case at [1] and had a WIP patch to address that. Please find it attached here. It is to be applied on top of both of your patches. Thanks, Amit [1] https://www.postgresql.org/message-id/a26f7823-6c7d-3f41-c5fb-7d50dd2f4848%40lab.ntt.co.jp >From 490a8302b71cbe78e7028879b0dffa2c43d03f33 Mon Sep 17 00:00:00 2001 From: amit Date: Mon, 25 Dec 2017 18:45:33 +0900 Subject: [PATCH 3/3] Teach executor to handle ON CONFLICT (key) on partitioned tables --- src/backend/executor/execIndexing.c | 14 +--- src/backend/executor/nodeModifyTable.c| 32 +++ src/test/regress/expected/insert_conflict.out | 27 +- src/test/regress/sql/insert_conflict.sql | 18 ++- 4 files changed, 77 insertions(+), 14 deletions(-) diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c index 89e189fa71..f76a2ede76 100644 --- a/src/backend/executor/execIndexing.c +++ b/src/backend/executor/execIndexing.c @@ -531,10 +531,18 @@ ExecCheckIndexConstraints(TupleTableSlot *slot, if (!indexInfo->ii_ReadyForInserts) continue; - /* When specific arbiter indexes requested, only examine them */ + /* +* When specific arbiter indexes requested, only examine them. If +* this is a partition (after a tuple is routed to it from the +* parent into which the original tuple has been inserted), we must +* check the parent index id, instead of our own id, because that's +* the one that appears in the arbiterIndexes list. 
+*/ if (arbiterIndexes != NIL && - !list_member_oid(arbiterIndexes, - indexRelation->rd_index->indexrelid)) + !(list_member_oid(arbiterIndexes, + indexRelation->rd_index->indexrelid) || + list_member_oid(arbiterIndexes, + indexRelation->rd_index->indparentidx))) continue; if (!indexRelation->rd_index->indimmediate) diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index afb83ed3ae..94f819006a 100644 --- a/src/backend/executor/nodeModifyTable.c +++ b/src/backend/executor/nodeModifyTable.c @@ -2186,6 +2186,38 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags) resultRelInfo->ri_onConflictSetWhere = qualexpr; } + + /* Build the above information for each leaf partition rel */ + for (i = 0; i < mtstate->mt_num_partitions; i++) + { + Relationpartrel; + List *leaf_oc_set; + + resultRelInfo = mtstate->mt_partitions[i]; + partrel = resultRelInfo->ri_RelationDesc; + + /* varno = node->nominalRelation */ + leaf_oc_set = map_partition_varattnos(node->onConflictSet, + node->nominalRelation, + partrel, rel, NULL); + resultRelInfo->ri_onConflictSetProj = + ExecBuildProjectionInfo(leaf_oc_set, econtext, + mtstate->mt_conflproj, &mtstate->ps,
Re: General purpose hashing func in pgbench
Hello Fabien,

24/12/2017 11:12, Fabien COELHO wrote:
>
> Yep. The ugliness is significantly linked to the choice of name. With
> MM2_MUL and MM2_ROT ISTM that it is more readable:
>
>> k *= MM2_MUL;
>> k ^= k >> MM2_ROT;
>> k *= MM2_MUL;
>> result ^= k;
>> result *= MM2_MUL;

Ok, will do.

>> [...] So I'd better leave it the way it is. Actually I was thinking
>> to do the same to fnv1a too : )
>
> I think that the implementation style should be homogeneous, so I'd
> suggest at least to stick to one style.
>
> I noticed from the source of all human knowledge (aka Wikipedia:-)
> that there seems to be a murmur3 successor. Have you considered it?
> One good reason to skip it would be that the implementation is long
> and complex. I'm not sure about an 8-byte input simplified version.

Murmur2 naturally supports 8-byte data. Murmur3 has 32- and 128-bit versions. So to handle int64 I could either 1) split the input value into two halves and somehow combine the results of the 32-bit version, or 2) use the 128-bit version and discard the higher bytes. Btw, postgres core already has a 32-bit murmur3 implementation, but it only uses the finalization part of the algorithm (see murmurhash32). As my colleague Yura Sokolov told me in a recent conversation, it alone provides pretty good randomization. I haven't tried it yet, though.

> Just a question: Have you looked at SipHash24?
>
> https://en.wikipedia.org/wiki/SipHash
>
> The interesting point is that it can use a key and seems somehow
> cryptographically secure, for a similar cost. However the how to
> decide for/control the key is unclear.

Not yet. As far as I understand from the wiki, its main feature is to prevent attacks with crafted input data. How can it be useful in benchmarking? Unless it shows superior performance and randomization.

-- Ildar Musin Postgres Professional: http://www.postgrespro.com Russian Postgres Company
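For reference, a self-contained sketch of the kind of 64-bit murmur2-style mixing discussed above, specialized to a single 8-byte input with a caller-supplied seed. The constants are the published MurmurHash64A multiplier and shift, and the MM2_MUL / MM2_ROT names follow the naming suggested in the quoted text; this is an illustration, not the code from the patch:

#include <stdint.h>
#include <stdio.h>

#define MM2_MUL 0xc6a4a7935bd1e995ULL   /* MurmurHash64A multiplier */
#define MM2_ROT 47                      /* MurmurHash64A shift */

/* Hash a single 64-bit value; "seed" decorrelates independent streams. */
static uint64_t
hash_murmur2_64(uint64_t k, uint64_t seed)
{
    uint64_t result = seed ^ (sizeof(k) * MM2_MUL);

    k *= MM2_MUL;
    k ^= k >> MM2_ROT;
    k *= MM2_MUL;
    result ^= k;
    result *= MM2_MUL;

    /* finalization */
    result ^= result >> MM2_ROT;
    result *= MM2_MUL;
    result ^= result >> MM2_ROT;
    return result;
}

int main(void)
{
    printf("%llu\n", (unsigned long long) hash_murmur2_64(12345, 0));
    return 0;
}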
Re: General purpose hashing func in pgbench
Hello,

I noticed from the source of all human knowledge (aka Wikipedia:-) that there seems to be a murmur3 successor. Have you considered it? One good reason to skip it would be that the implementation is long and complex. I'm not sure about an 8-byte input simplified version.

Murmur2 naturally supports 8-byte data. Murmur3 has 32- and 128-bit versions.

Ok. So it would make sense to keep the 64-bit murmur2 version. Pgbench ints are 64 bits, and it seems logical to keep them that way, so the 32-bit versions do not look too interesting.

Just a question: Have you looked at SipHash24?

Not yet. As far as I understand from the wiki, its main feature is to prevent attacks with crafted input data. How can it be useful in benchmarking?

No and yes:-) The attack prevention is pretty useless in this context. However, the key can be used, if controlled, so that different values do not have the same randomization in different parts of the script, so as to avoid using the same patterns for different things when that is not desirable.

For the pgbench pseudo-random permutation I'm looking at, the key can be specified to control whether the same pattern or not should be used.

-- Fabien.
Re: [HACKERS] Flexible configuration for full-text search
Hi Arthur, Thank you for the review. On Thu, 21 Dec 2017 17:46:42 +0300 Arthur Zakirov wrote: > I noticed that there are typos in the documentation. And I think it > is necessary to keep information about previous sintax. The syntax > will be supported anyway. For example, information about TSL_FILTER > was removed. And it will be good to add more examples of the new > syntax. In the current version of the patch, configurations written in old syntax are rewritten into the same configuration in the new syntax. Since new syntax doesn't support a TSL_FILTER, it was removed from the documentation. It is possible to store configurations written in old syntax in a special way and simulate a TSL_FILTER behavior for them. But it will lead to maintenance of two different behavior of the FTS depends on a version of the syntax used during configuration. Do you think we should keep both behaviors? > The result of ts_debug() function was changed. Is it possible to keep > the old ts_debug() result? To be specific, 'dictionaries' column is > text now, not array, as I understood. It will be good to keep the > result for the sake of backward compatibility. Columns' 'dictionaries' and 'dictionary' type were changed to text because after the patch the configuration may be not a plain array of dictionaries but a complex expression tree. In the column 'dictionaries' the result is textual representation of configuration and it is the same as a result of \dF+ description of the configuration. I decide to rename newly added column to 'configuration' and keep column 'dictionaries' with an array of all dictionaries used in configuration (no matter how). Also, I fixed a bug in 'command' output of the ts_debug in some cases. Additionally, I added some examples to documentation regarding multilingual search and combination of exact and linguistic-aware search and fixed typos.diff --git a/doc/src/sgml/ref/alter_tsconfig.sgml b/doc/src/sgml/ref/alter_tsconfig.sgml index ebe0b94b27..a1f483e10b 100644 --- a/doc/src/sgml/ref/alter_tsconfig.sgml +++ b/doc/src/sgml/ref/alter_tsconfig.sgml @@ -21,8 +21,12 @@ PostgreSQL documentation +ALTER TEXT SEARCH CONFIGURATION name +ADD MAPPING FOR token_type [, ... ] WITH config ALTER TEXT SEARCH CONFIGURATION name ADD MAPPING FOR token_type [, ... ] WITH dictionary_name [, ... ] +ALTER TEXT SEARCH CONFIGURATION name +ALTER MAPPING FOR token_type [, ... ] WITH config ALTER TEXT SEARCH CONFIGURATION name ALTER MAPPING FOR token_type [, ... ] WITH dictionary_name [, ... ] ALTER TEXT SEARCH CONFIGURATION name @@ -88,6 +92,17 @@ ALTER TEXT SEARCH CONFIGURATION name SET SCHEMA + +config + + + The dictionaries tree expression. The dictionary expression + is a triple of condition/command/else that define way to process + the text. The ELSE part is optional. + + + + old_dictionary @@ -133,7 +148,7 @@ ALTER TEXT SEARCH CONFIGURATION name SET SCHEMA - + The ADD MAPPING FOR form installs a list of dictionaries to be @@ -154,6 +169,53 @@ ALTER TEXT SEARCH CONFIGURATION name SET SCHEMA + + Dictionaries Map Config + + + Format + +Formally config is one of: + + +* dictionary_name + +* config { UNION | INTERSECT | EXCEPT | MAP } config + +* CASE config +WHEN [ NO ] MATCH THEN { KEEP | config } +[ ELSE config ] + END + + + + + Description + +config can be used +in three different formats. The most simple format is name of dictionary to +use for tokens processing. + + +In order to use more than one dictionary +simultaneously user should interconnect dictionaries by operators. 
Operators +UNION, EXCEPT and +INTERSECT have same meaning as in operations on sets. +Special operator MAP gets output of left subexpression +and uses it as an input to right subexpression. + + +The third format of config is similar to +CASE/WHEN/THEN/ELSE structure. It's consists of three +replaceable parts. First one is configuration which is used to construct lexemes set +for matching condition. If the condition is triggered, the command is executed. +Use command KEEP to avoid repeating of the same +configuration in condition and command part. However, command may differ from +the condition. The ELSE branch is executed otherwise. + + + + Examples @@ -167,6 +229,34 @@ ALTER TEXT SEARCH CONFIGURATION name SET SCHEMA + + + Next example shows how to analyse documents in both English and German languages. + english_hunspell and german_hunspell + return result only if a word is recognized. Otherwise, stemmer dictionaries + are used to process a token. + + + +ALTER TEXT SEARCH CONFIGURATION my_config + ALTER MAPPING FOR asciiword, word WITH + CASE english_hunspell WHEN MATCH THEN KEEP ELSE english_stem END +UNION + CASE german_hunspell WHEN MATCH THEN
Re: genomic locus
I think I can wrangle this type into GiST just by tweaking consistent(), union(), and picksplit(), if I manage to express my needs in C without breaking too many things. My first attempt segfaulted.

Actually, consistent() can determine the actual query data type from the strategy number. See examples in ltree, intarray.

If all goes to plan, I will end up with an index tree partitioned by contig at the top level and geometrically down from there. That will be as close as I can get to an array of contig-specific indices, without having to store data in separate tables. What do you think of that?

I have some doubt that you can distinguish the root page, but it's possible to distinguish leaf pages; intarray and tsearch do that.

Reading your plan, I found an idea for GIN: the key for GIN is a pair of (contig, one genome position). So, any search for an intersection operation will actually be a range search from (contig, start) to (contig, end).

I have a low-level technical question. Because I can't anticipate the maximum length of contig names (and do not want to waste space), I have made the new locus type a varlena, like this:

#include "utils/varlena.h"

typedef struct LOCUS
{
    int32   l_len_;     /* varlena header (do not touch directly!) */
    int32   start;
    int32   end;
    char    contig[FLEXIBLE_ARRAY_MEMBER];
} LOCUS;

#define LOCUS_SIZE(str) (offsetof(LOCUS, contig) + sizeof(str))

sizeof? or strlen?

That flexible array member messes with me every time I need to copy it while deriving a new locus object from an existing one (or from a pair). What I ended up doing is this:

LOCUS  *l = PG_GETARG_LOCUS_P(0);
LOCUS  *new_locus;
char   *contig;
int     size;

new_locus = (LOCUS *) palloc0(sizeof(*new_locus));
contig = pstrdup(l->contig); // need this to determine the length of contig name at runtime

l->contig should be null-terminated for pstrdup, but if so, you don't need to pstrdup() it - you could use l->contig directly below. BTW, LOCUS_SIZE should add 1 byte for the '\0' character in this case.

size = LOCUS_SIZE(contig);
SET_VARSIZE(new_locus, size);
strcpy(new_locus->contig, contig);

Is there a more direct way to clone a varlena structure (possibly assigning a differently-sized contig to it)? One that is also memory-safe?

Store the length of contig in the LOCUS struct.

-- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/
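A minimal sketch of a memcpy-based clone, under the assumptions above (contig is stored NUL-terminated and the copy keeps the same contig name); locus_copy is a hypothetical helper written for illustration in the PostgreSQL backend context, not code from either mail:

/* Hypothetical helper (assumes #include "postgres.h"): clone a LOCUS
 * without pstrdup()/strcpy().  Copying the whole varlena in one memcpy()
 * avoids measuring the contig name at run time; only the fixed-size
 * fields are overwritten afterwards. */
static LOCUS *
locus_copy(const LOCUS *src, int32 new_start, int32 new_end)
{
    Size    size = VARSIZE(src);        /* total size, varlena header included */
    LOCUS  *dst = (LOCUS *) palloc(size);

    memcpy(dst, src, size);             /* header, start/end, contig name */
    dst->start = new_start;
    dst->end = new_end;
    return dst;
}

If the copy needs a differently-sized contig, the size must instead be recomputed from strlen() of the new name plus the terminating '\0' (as noted above) before calling SET_VARSIZE().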
Re: General purpose hashing func in pgbench
25/12/2017 17:12, Fabien COELHO wrote:
>
> However, the key can be used if controlled so that different values do
> not have the same randomization in different part of the script, so as
> to avoid using the same patterns for different things if not desirable.

The original murmur algorithm accepts a seed as a parameter, which can be used for this purpose. I used the value itself as the seed in the patch, but I can turn it into a function argument.

>
> For the pgbench pseudo-random permutation I'm looking at, the key can
> be specified to control whether the same pattern or not should be used.
>

Just curious, which algorithm do you intend to choose?

-- Ildar Musin Postgres Professional: http://www.postgrespro.com Russian Postgres Company
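To illustrate the seed-as-argument idea in a pgbench script: with different seeds, hashing the same id yields unrelated access patterns in different parts of the script. The two-argument hash(value, seed) form is the hypothetical interface being discussed (not necessarily the final one), and the seeds 5432/6543 and the bucket arithmetic are made up for the example:

\set id random(1, 1000000)
\set bucket_a abs(hash(:id, 5432)) % 1000
\set bucket_b abs(hash(:id, 6543)) % 1000
SELECT :bucket_a, :bucket_b;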
Re: BUGFIX: standby disconnect can corrupt serialized reorder buffers
Hi, thanks for writing the patch. On 05/12/17 06:58, Craig Ringer wrote: > Hi all > > [...] >> The cause appears to be that walsender.c's ProcessRepliesIfAny writes a > LOG for unexpected EOF then calls proc_exit(0). But serialized txn > cleanup is done by > ReorderBufferRestoreCleanup, as called by ReorderBufferCleanupTXN, which > is invoked from the PG_CATCH() in ReorderBufferCommit() and on various > normal exits. It won't get called if we proc_exit() without an ERROR, so > we leave stale data lying around. > > It's not a problem on crash restart because StartupReorderBuffer already > does the required delete. > > ReorderBufferSerializeTXN, which spills the txns to disk, doesn't appear > to have any guard to ensure that the segment files don't already exist > when it goes to serialize a snapshot. Adding one there would probably be > expensive; we don't know the last lsn of the txn yet, so to be really > safe we'd have to do a directory listing and scan for any txn-$OURXID-* > entries. > > So to fix, I suggest that we should do a > slot-specific StartupReorderBuffer-style deletion when we start a new > decoding session on a slot, per attached patch. > > It might be nice to also add a hook on proc exit, so we don't have stale > buffers lying around until next decoding session, but I didn't want to > add new global state to reorderbuffer.c so I've left that for now. Hmm, can't we simply call the new cleanup function in ReplicationSlotRelease()? That's called during process exit and we know there about slot so no extra global variables are needed. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: compress method for spgist - 2
Now, this patch is ready for committer from my point of view. Thank you, pushed -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/
Re: [HACKERS] Replication status in logical replication
On 21/11/17 22:06, Masahiko Sawada wrote: > > After investigation, I found out that my previous patch was wrong > direction. I should have changed XLogSendLogical() so that we can > check the read LSN and set WalSndCaughtUp = true even after read a > record without wait. Attached updated patch passed 'make check-world'. > Please review it. > Hi, This version looks good to me and seems to be in line with what we do in physical replication. Marking as ready for committer. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: General purpose hashing func in pgbench
Hello, However, the key can be used if controlled so that different values do not have the same randomization in different part of the script, so as to avoid using the same patterns for different things if not desirable. Original murmur algorithm accepts seed as a parameter, which can be used for this purpose. I used value itself as a seed in the patch, but I can turn it into a function argument. Hmmm. Possibly. However it needs a default, something like x = hash(v[, key]) with key some pseudo-random value? For the pgbench pseudo-random permutation I'm looking at, the key can be specified to control whether the same pattern or not should be used. Just curious, which algorithm are you intended to choose? I have found nothing convincing, so I'm inventing. Most "permutation" functions are really cryptographic cyphers which are quite expensive, and require powers of two, which is not what is needed. ISTM that there are some constructs to deal with arbitrary sizes based on cryptographic functions, but that would make it too expensive for the purpose. Currently I use the key as a seed for a cheap a prng, two rounds of overlapping (to deal with non power of two) xor (from the prng output) and prime-multiply-modulo to scatter the value. See attached work in progress. If you have a pointer for a good and cheap generic (any size) keyed permutation, feel free to share it. -- Fabien.// $Id: test_hash_2.c 1049 2017-12-24 17:52:23Z fabien $ /* * Pseudo-Random Permutation (PRP): the usual approach is a 4 stage * (Luby-Rackoff) Feistel network based on some "good" hash function. * However this construction only works for even powers of 2. * Unbalance is possible as well, but it must still be a power of 2 size. * TOO BAD. */ #include #include #include #include #include //#include #include // PRI?64 #include // pow // #define DEBUG #define SEED 999484257552941047LL typedef enum { false, true } bool; /* overall parameters */ #define N_MASKS 6 #define PRP_ROUNDS 8 // NOT GOOD WITH POWER OF 2 #define ROUNDS 2 static int64_t primes[PRP_ROUNDS] = { // a few 25-27 bits primes 33303727LL, 49979693LL, 67867979LL, 86028157LL, 104395303LL, 122949829LL, 141650963LL, 160481219LL }; /* compute largest hierarchical mask to hold 0 .. size-2 * YES, size-2: If size is a power of 2, we want the half mask */ static int64_t compute_mask(const int64_t size) { // 8-bit shortcut up to 2**32 int64_t m = size > 0xLL ? 0xLL: size > 0xffLL ? 0xffLL: size > 0xLL ? 0xLL: size > 0xffLL ? 0xffLL: 0LL; // remaining bits do { m = (m << 1) | 1LL; } while (size-1 > m); m >>= 1; return m; } /* Donald Knuth linear congruential generator */ #define DK_MUL 6364136223846793005LL #define DK_INC 1442695040888963407LL /* do not use all small bits */ #define PRP_SHIFT 13 /* * Apply pseudo-random permutation with a seed. * * The key idea behind "pseudo-random"... is that it is not random at all, * kind of an oxymoron. So it really means "steered-up permutation", whatever * good it may bring. * * Result in [0, size) is a permutation for inputs in the same set, * at least if size is under 112.9E9 (otherwise it is unclear because * of int64_t overflow in the multiply-modulo phase). * * The result are not fully convincing on very small sizes, but this * is not the typical use case. * * Function performance is reasonable. * * THIS FUNCTION IS ** NOT ** CRYPTOGRAPHICALLY SECURE AT ALL. * PLEASE DO NOT USE FOR SUCH PURPOSE. 
*/ static int64_t pseudorandom_perm(const int64_t data, const int64_t size, const int64_t iseed) { assert(size > 0); uint64_t seed = iseed; uint64_t v = (uint64_t) data % size; assert(0 <= v && v < size); int64_t mask = compute_mask(size); // for small seeds: // seed = seed * DK_MUL + DK_INC; // fprintf(stderr, "mask=%ld (0x%lx) size=%ld\n", // mask, (uint64_t) mask, size); assert(mask < size && size <= 2 * mask + 2); for (int i = 0, p = 0; i < ROUNDS; i++, p++) { // first "half" whitening if (v <= mask) v ^= (seed >> PRP_SHIFT) & mask; seed = seed * DK_MUL + DK_INC; // assert(0 <= v && v < size); // and second overlapping "half" whitening and reversing if (size-v-1 <= mask) v = size - mask + ((size - v - 1) ^ ((seed >> PRP_SHIFT) & mask)) - 1; seed = seed * DK_MUL + DK_INC; // assert(0 <= v && v < size); // given the prime sizes, at most 2 primes are skipped for a given size while (size % primes[p] == 0) p++; // scatter v = (primes[p] * v + (seed >> PRP_SHIFT)) % size; seed = seed * DK_MUL + DK_INC; // assert(0 <= v && v < size); // bit rotate? bit permute? others? } /* // first "half" whitening if (v <= mask) v ^= (seed >> PRP_SHIFT) & mask; seed = seed * DK_MUL + DK_INC; // assert(0 <= v && v < size); // and second overlapping "half" whitening and reve
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On 12/24/2017 10:00 AM, Erik Rijkers wrote: > > logical replication of 2 instances is OK but 3 and up fail with: > > TRAP: FailedAssertion("!(last_lsn < change->lsn)", File: > "reorderbuffer.c", Line: 1773) > > I can cobble up a script but I hope you have enough from the assertion > to see what's going wrong... The assertion says that the iterator produces changes in order that does not correlate with LSN. But I have a hard time understanding how that could happen, particularly because according to the line number this happens in ReorderBufferCommit(), i.e. the current (non-streaming) case. So instructions to reproduce the issue would be very helpful. >>> >>> Using: >>> >>> 0001-Introduce-logical_work_mem-to-limit-ReorderBuffer-v2.patch >>> 0002-Issue-XLOG_XACT_ASSIGNMENT-with-wal_level-logical-v2.patch >>> 0003-Issue-individual-invalidations-with-wal_level-log-v2.patch >>> 0004-Extend-the-output-plugin-API-with-stream-methods-v2.patch >>> 0005-Implement-streaming-mode-in-ReorderBuffer-v2.patch >>> 0006-Add-support-for-streaming-to-built-in-replication-v2.patch >>> >>> As you expected the problem is the same with these new patches. >>> >>> I have now tested more, and seen that it not always fails. I guess that >>> it here fails 3 times out of 4. But the laptop I'm using at the moment >>> is old and slow -- it may well be a factor as we've seen before [1]. >>> >>> Attached is the bash that I put together. I tested with >>> NUM_INSTANCES=2, which yields success, and NUM_INSTANCES=3, which fails >>> often. This same program run with HEAD never seems to fail (I tried a >>> few dozen times). >>> >> >> Thanks. Unfortunately I still can't reproduce the issue. I even tried >> running it in valgrind, to see if there are some memory access issues >> (which should also slow it down significantly). > > One wonders again if 2ndquadrant shouldn't invest in some old hardware ;) > Well, I've done tests on various machines, including some really slow ones, and I still haven't managed to reproduce the failures using your script. So I don't think that would really help. But I have reproduced it by using a custom stress test script. Turns out the asserts are overly strict - instead of Assert(prev_lsn < current_lsn); it should have been Assert(prev_lsn <= current_lsn); because some XLOG records may contain multiple rows (e.g. MULTI_INSERT). The attached v3 fixes this issue, and also a couple of other thinkos: 1) The AssertChangeLsnOrder assert check was somewhat broken. 2) We've been sending aborts for all subtransactions, even those not yet streamed. So downstream got confused and fell over because of an assert. 3) The streamed transactions were written to /tmp, using filenames using subscription OID and XID of the toplevel transaction. That's fine, as long as there's just a single replica running - if there are more, the filenames will clash, causing really strange failures. So move the files to base/pgsql_tmp where regular temporary files are written. I'm not claiming this is perfect, perhaps we need to invent another location. FWIW I believe the relation sync cache is somewhat broken by the streaming. I thought resetting it would be good enough, but it's more complicated (and trickier) than that. I'm aware of it, and I'll look into that next - but probably not before 2018. 
regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 0001-Introduce-logical_work_mem-to-limit-ReorderBuffer-v3.patch.gz Description: application/gzip 0002-Issue-XLOG_XACT_ASSIGNMENT-with-wal_level-logical-v3.patch.gz Description: application/gzip 0003-Issue-individual-invalidations-with-wal_level-log-v3.patch.gz Description: application/gzip 0004-Extend-the-output-plugin-API-with-stream-methods-v3.patch.gz Description: application/gzip 0005-Implement-streaming-mode-in-ReorderBuffer-v3.patch.gz Description: application/gzip 0006-Add-support-for-streaming-to-built-in-replication-v3.patch.gz Description: application/gzip
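For readers following along, a stripped-down illustration of the invariant behind the relaxed assertion described above (this is not the actual reorderbuffer code): changes coming out of the iterator must have non-decreasing LSNs, and "<=" rather than "<" is required because a single WAL record, e.g. a MULTI_INSERT, can carry several changes that share one LSN.

#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* stand-in for the PostgreSQL typedef */

/* Check that a sequence of change LSNs is sorted, allowing duplicates. */
static void
assert_change_lsn_order(const XLogRecPtr *lsns, int nchanges)
{
    for (int i = 1; i < nchanges; i++)
        assert(lsns[i - 1] <= lsns[i]);     /* "<=", not "<" */
}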
Re: Unique indexes & constraints on partitioned tables
Amit Langote wrote: > Have you considered what happens when ON CONFLICT code tries to depend on > such an index (a partitioned unique index)? Not yet, but it was on my list of things to fix. Thanks for working on it -- I'll be reviewing this soon. > +create table parted_conflict_test_2 partition of parted_conflict_test for > values in (2); > +create unique index on parted_conflict_test (a); > +insert into parted_conflict_test values (1, 'a') on conflict (a) do nothing; > +insert into parted_conflict_test values (1, 'b') on conflict (a) do update > set b = excluded.b; > +insert into parted_conflict_test values (1, 'b') on conflict (a) do update > set b = excluded.b where parted_conflict_test.b = 'a'; > +select * from parted_conflict_test; > + a | b > +---+--- > + 1 | b > +(1 row) sweet. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
That indeed fixed the problem: running that same pgbench test, I see no crashes anymore (on any of 3 different machines, and with several pgbench parameters). Thank you, Erik Rijkers
How to Works with Centos
Hello,

The company I'm working for develops a product which uses CentOS 6/7 (depending on the product version) and also uses Vertica and PostgreSQL. During the development of the latest version of our product, we ran into problems that led us to contact Vertica's R&D team, which in turn suggested we upgrade our database to the latest version, as the current one is not supported.

When the product was first developed we used PostgreSQL 9.2, and after upgrading to CentOS 7 we considered upgrading our PostgreSQL version as well. Trying to learn from the aforementioned incident, we floated the idea of upgrading our PostgreSQL version to the latest one in order to get the performance improvements, latest features, bug fixes and *support*. In response, our DevOps team insisted (and still insists) that we keep using version 9.2, as it is part of the CentOS 7 distribution, and they believe that version to be "best practice", even though PostgreSQL 9.2 is *no longer supported*.

My question is: Is working with the default distribution's version (9.2) really "best practice", even though it is no longer supported?

I have looked for documentation regarding the matter but could not find any answers. I would greatly appreciate your assistance with the matter.

Thanks in advance,
Benjamin
Re: How to Works with Centos
On 25 December 2017 at 09:39, Benyamin Guedj wrote:
>
> Upon doing so, our DevOps team in response insisted (and still insists) that
> we keep using version 9.2 as it is part of the Centos 7 distribution, and
> they believe that version to be "best practice", even though PostgreSQL 9.2
> is no longer supported.
>
> My question is:
>
> Is working with the default distribution's version (9.2) really the "best
> practice", even though it is no longer supported?
>

Clearly no. Our versioning page (https://www.postgresql.org/support/versioning/) says:

"""
The PostgreSQL project aims to fully support a major release for five years. After its end-of-life (EOL) month ends, a major version receives one final minor release. After that final minor release, bug fixing ceases for that major version.
"""

So, if bug fixing ceases for a non-supported version, it's clearly not "best practice" to continue using it.

You have two options:

1) use the packages from yum.postgresql.org for a supported version
2) get commercial support for your out-of-community-support version

But even if you do 2, that should be a preparatory step, looking forward to upgrading to a newer version.

-- Jaime Casanova www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
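For illustration, option 1 on CentOS 7 might look roughly like the sketch below. The choice of PostgreSQL 10 and the exact repository RPM name/URL are assumptions (check yum.postgresql.org for the current packages):

# Add the PGDG repository (URL shown is an example and may differ)
sudo yum install -y https://download.postgresql.org/pub/repos/yum/10/redhat/rhel-7-x86_64/pgdg-centos10-10-2.noarch.rpm

# Install a community-supported major version alongside/instead of the distro 9.2
sudo yum install -y postgresql10-server postgresql10-contrib

# Initialize and start the new cluster
sudo /usr/pgsql-10/bin/postgresql-10-setup initdb
sudo systemctl enable postgresql-10
sudo systemctl start postgresql-10

The data from the old 9.2 cluster can then be migrated with pg_dump/pg_restore or pg_upgrade.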
Re: How to Works with Centos
On Mon, Dec 25, 2017 at 06:48:09PM -0500, Jaime Casanova wrote: > so you have two options: > > 1) use the packages from yum.postgresql.org for a supported version > 2) get commercial support for your out-of-community-support verssion > > but even if you do 2, that would be a preparatory step looking > forward to upgrade to a newer version You need to think long-term here. The product that you are developing and/or maintaining will need to stay around for a couple of years as well, those are years where you should keep up with the community support window of 5 years for a major version of PostgreSQL. That's what I do on the stuff I work with, and the outcome is much better at the end as there is no need to finish with a private fork of an out-of-support version, normally at least. -- Michael signature.asc Description: PGP signature
Re: [HACKERS] Replication status in logical replication
On Tue, Dec 26, 2017 at 1:10 AM, Petr Jelinek wrote: > On 21/11/17 22:06, Masahiko Sawada wrote: >> >> After investigation, I found out that my previous patch was wrong >> direction. I should have changed XLogSendLogical() so that we can >> check the read LSN and set WalSndCaughtUp = true even after read a >> record without wait. Attached updated patch passed 'make check-world'. >> Please review it. >> > > Hi, > > This version looks good to me and seems to be in line with what we do in > physical replication. > > Marking as ready for committer. > Thank you for reviewing this patch! I think this patch can be back-patched to 9.4 as Simon mentioned. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: AS OF queries
On Thu, Dec 21, 2017 at 3:57 AM, Alvaro Hernandez wrote:
>
> On 20/12/17 14:48, Konstantin Knizhnik wrote:
>
> On 20.12.2017 16:12, Laurenz Albe wrote:
>
> Konstantin Knizhnik wrote:
>
> I wonder if Postgres community is interested in supporting time travel
> queries in PostgreSQL (something like AS OF queries in Oracle:
> https://docs.oracle.com/cd/B14117_01/appdev.101/b10795/adfns_fl.htm).
> As far as I know something similar is now developed for MariaDB.
>
> I think that would be a good thing to have that could make
> the DBA's work easier - all the requests to restore a table
> to the state from an hour ago.
>
> Please notice that it is necessary to configure postgres in proper way in
> order to be able to perform time travels.
>
> This makes sense. BTW, I believe this feature would be an amazing
> addition to PostgreSQL.
>
> If you do not disable autovacuum, then old versions will be just cleaned-up.
> If transaction commit timestamps are not tracked, then it is not possible to
> locate required timeline.
>
> So DBA should make a decision in advance whether this feature is needed or
> not.
> It is not a proper instrument for restoring/auditing existed database which
> was not configured to keep all versions.
>
> May be it is better to add special configuration parameter for this feature
> which should implicitly toggle
> autovacuum and track_commit_timestamp parameters).
>
> Downthread a "moving xid horizon" is proposed. I believe this is not too
> user friendly. I'd rather use a timestamp horizon (e.g. "up to 2 days ago").
> Given that the commit timestamp is tracked, I don't think this is an issue.
> This is the same as the undo_retention in Oracle, which is expressed in
> seconds.

I agree, but since we cannot keep the same xids beyond xid wraparound, we would have to remove old tuples even if we're still within the time interval.

Regards,

-- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Getting rid of "tuple concurrently updated" elog()s with concurrent DDLs (at least ALTER TABLE)
Hi all, Triggering "tuple concurrently updated" from heapam.c, which is an elog(), is not that difficult for some DDL commands. For example with ALTER ROLE, just create a role: psql --no-psqlrc -c 'create role popo' And then run that in two sessions and the error will show up, triggered from CatalogTupleUpdate(): while true; do psql --no-psqlrc -c "alter role popo password 'foo'"; done In this case, upgrading the lock taken on pg_authid from RowExclusiveLock to ShareRowExclusiveLock prevents concurrent sessions to update each other, which is what the patch attached does. I think that we should get rid of all those errors, for many applications doing concurrent DDLs getting this error is surprising, and could be thought as a corrupted instance. I am just touching the tip of the iceberg here, as I have the gut feeling that there are other objects where it is possible to trigger the failure, and an analysis effort is going to be costly. So I'd like to get first the temperature about such analysis. So, opinions? -- Michael diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml index 24613e3c75..a93b31ae6d 100644 --- a/doc/src/sgml/mvcc.sgml +++ b/doc/src/sgml/mvcc.sgml @@ -971,6 +971,7 @@ ERROR: could not serialize access due to read/write dependencies among transact Acquired by CREATE COLLATION, + ALTER ROLE, CREATE TRIGGER, and many forms of ALTER TABLE (see ). diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c index f2941352d7..d2e300d949 100644 --- a/src/backend/commands/user.c +++ b/src/backend/commands/user.c @@ -671,9 +671,11 @@ AlterRole(AlterRoleStmt *stmt) bypassrls = intVal(dbypassRLS->arg); /* -* Scan the pg_authid relation to be certain the user exists. +* Scan the pg_authid relation to be certain the user exists. Take +* ShareRowExclusiveLock to prevent other sessions to update this +* tuple in parallel with another ALTER ROLE command. */ - pg_authid_rel = heap_open(AuthIdRelationId, RowExclusiveLock); + pg_authid_rel = heap_open(AuthIdRelationId, ShareRowExclusiveLock); pg_authid_dsc = RelationGetDescr(pg_authid_rel); tuple = get_rolespec_tuple(stmt->role); signature.asc Description: PGP signature
Deadlock in multiple CIC.
c3d09b3bd23f5f6 fixed it so concurrent CIC would not deadlock (or at least not as reliably as before) by dropping its own snapshot before waiting for all the other ones to go away. With commit 8aa3e47510b969354ea02a, concurrent CREATE INDEX CONCURRENTLY on different tables in the same database started deadlocking against each other again quite reliably. I think the solution is simply to drop the catalog snapshot as well, as in the attached. reported by Jeremy Finzel here: https://www.postgresql.org/message-id/flat/CAMa1XUhHjCv8Qkx0WOr1Mpm_R4qxN26EibwCrj0Oor2YBUFUTg%40mail.gmail.com#cama1xuhhjcv8qkx0wor1mpm_r4qxn26eibwcrj0oor2ybuf...@mail.gmail.com Cheers, Jeff diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index 97091dd..d4ac2e1 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -861,6 +861,7 @@ DefineIndex(Oid relationId, PopActiveSnapshot(); UnregisterSnapshot(snapshot); + InvalidateCatalogSnapshotConditionally(); /* * The index is now valid in the sense that it contains all currently
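For anyone trying to verify the fix, a rough reproduction sketch along the lines of Jeremy's report might look like this. The database and table names are made up, and the deadlock may take a number of iterations to appear (or not appear at all on a given machine):

createdb cic_test
psql -d cic_test -c 'create table cic_a (x int); create table cic_b (x int);'

for i in $(seq 1 50); do
    # Two concurrent CICs on *different* tables; with the bug present the
    # deadlock detector occasionally aborts one of them.
    psql -d cic_test -c 'create index concurrently on cic_a (x);' &
    psql -d cic_test -c 'create index concurrently on cic_b (x);' &
    wait
    psql -d cic_test -c 'drop index if exists cic_a_x_idx, cic_b_x_idx;'
done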
Ethiopian calendar year(DATE TYPE) are different from the Gregorian calendar year
In Ethiopia the year has 13 months, but in the Gregorian calendar the year has 12 months. Ethiopian society wants to use its own Ethiopian calendar year. I have the algorithm, but how should I add it to the PostgreSQL source code: as an EXTENSION, or in the backend (the kernel)? The 13th month has 5 days, and every fourth year it has 6 days; that means one year has 365.25 days. My algorithm converts a Gregorian calendar year to an Ethiopian calendar year.
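This kind of conversion is usually packaged as an extension (or even a plain SQL function) rather than a backend change. As a starting point, here is a simplified sketch of the year conversion only, assuming the common rule that the Ethiopian new year falls on 11 September in the Gregorian calendar (it falls on 12 September in years preceding a Gregorian leap year, which this sketch ignores); full month/day conversion would of course need the complete algorithm:

-- Simplified illustration only: Ethiopian year for a Gregorian date.
CREATE FUNCTION ethiopian_year(d date) RETURNS integer
LANGUAGE sql IMMUTABLE AS $$
    SELECT CASE
               WHEN d >= make_date(extract(year FROM d)::int, 9, 11)
               THEN extract(year FROM d)::int - 7
               ELSE extract(year FROM d)::int - 8
           END;
$$;

-- Example: SELECT ethiopian_year(date '2017-12-26');  -- returns 2010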
Re: Getting rid of "tuple concurrently updated" elog()s with concurrent DDLs (at least ALTER TABLE)
On December 26, 2017 5:55:03 AM GMT+01:00, Michael Paquier wrote: >Hi all, > >Triggering "tuple concurrently updated" from heapam.c, which is an >elog(), >is not that difficult for some DDL commands. For example with ALTER >ROLE, >just create a role: >psql --no-psqlrc -c 'create role popo' > >And then run that in two sessions and the error will show up, triggered >from CatalogTupleUpdate(): >while true; do psql --no-psqlrc -c "alter role popo password 'foo'"; >done > >In this case, upgrading the lock taken on pg_authid from >RowExclusiveLock >to ShareRowExclusiveLock prevents concurrent sessions to update each >other, >which is what the patch attached does. > >I think that we should get rid of all those errors, for many >applications >doing concurrent DDLs getting this error is surprising, and could be >thought >as a corrupted instance. I am just touching the tip of the iceberg >here, as >I have the gut feeling that there are other objects where it is >possible to >trigger the failure, and an analysis effort is going to be costly. So >I'd >like to get first the temperature about such analysis. > >So, opinions? You're proposing to lock the entire relation against many forms of concurrent DDL, just to get rid of that error? That seems unacceptable. Isn't the canonical way to solve this to take object locks? Andres Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
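A sketch of the object-lock idea mentioned here, for illustration only (this is not a proposed patch, and whether a self-exclusive object lock is the right answer is exactly what is under discussion): instead of escalating the lock on pg_authid as a whole, each ALTER ROLE could lock the specific role being altered, so that two concurrent ALTER ROLEs on the same role serialize while other catalog activity proceeds.

/* Hypothetical sketch, PostgreSQL backend context (not a real patch):
 * serialize concurrent ALTER ROLE on one role by locking the role object
 * itself rather than strengthening the lock on the pg_authid relation. */
#include "postgres.h"
#include "catalog/pg_authid.h"
#include "storage/lmgr.h"

static void
lock_role_for_alter(Oid roleid)
{
    /* Self-conflicting lock on the pg_authid object with OID roleid;
     * released automatically at transaction end. */
    LockSharedObject(AuthIdRelationId, roleid, 0, ShareUpdateExclusiveLock);
}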