Re: [PATCH] use separate PartitionedRelOptions structure to store partitioned table options
Hello, On Mon, Oct 7, 2019 at 6:43 PM Nikolay Shaplov wrote: > In a message of Monday, 7 October 2019 at 14:57:14 MSK, Michael > Paquier wrote: > > On Sun, Oct 06, 2019 at 03:47:46PM +0300, Nikolay Shaplov wrote: > > > The idea of this patch is the following: if you read the code, partitioned > > > tables do not have any options (you will not find RELOPT_KIND_PARTITIONED > > > in boolRelOpts, intRelOpts, realRelOpts, stringRelOpts or enumRelOpts in > > > reloption.c), but it uses StdRdOptions to store them (these non-options). > > I am not even sure that we actually need that. What kind of reloption > > do you think would suit this subset? > > Actually I do not know. But the author of the partitioning patch added a stub for > partitioned tables to have some reloptions in the future. But this stub is > designed to use StdRdOptions, which is not correct, as I presume. So here I am > correcting the stub. I wrote the patch that introduced RELOPT_KIND_PARTITIONED. Yes, it was added as a stub relopt_kind to eventually apply to reloptions that could be sensibly applied to partitioned tables. We never got around to working on making the existing reloptions relevant to partitioned tables, nor did we add any new partitioned-table-specific reloptions, so it remained an unused relopt_kind. IIUC, this patch invents PartitionedRelOptions as the binary representation for future RELOPT_KIND_PARTITIONED parameters. As long as others are on board with using different *Options structs for different object kinds, I see no problem with this idea. Thanks, Amit
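For concreteness, here is a minimal sketch of what such a dedicated struct and its options routine could look like -- hypothetical code, since no RELOPT_KIND_PARTITIONED options actually exist yet:

    /* Hypothetical sketch, not the patch itself. */
    typedef struct PartitionedRelOptions
    {
        int32       vl_len_;    /* varlena header (do not touch directly!) */
        /* future partitioned-table-specific options would be added here */
    } PartitionedRelOptions;

    bytea *
    partitioned_table_reloptions(Datum reloptions, bool validate)
    {
        /*
         * No options are defined for partitioned tables yet, so any option
         * supplied by the user is an error; returning NULL means "nothing
         * to store in pg_class.reloptions".
         */
        if (validate && reloptions)
            ereport(ERROR,
                    (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                     errmsg("cannot specify storage parameters for a partitioned table")));
        return NULL;
    }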
Re: Regarding extension
On Fri, 4 Oct 2019 at 13:18, Tom Lane wrote: > Natarajan R writes: > > Me: Thanks Tomas, but this is for that particular database only, I want > > to get the *list of database Id's* on which my extension is installed > > during *PG_INIT* itself... > > You can't. In the first place, that information simply isn't obtainable, > because a session running within one database doesn't have access to the > catalogs of other databases in the cluster. (You could perhaps imagine > firing up connections to other DBs a la dblink/postgres_fdw, but that will > fail because you won't necessarily have permissions to connect to every > database.) In the second place, it's a pretty terrible design to be > attempting any sort of database access within _PG_init, because that > precludes loading that module outside a transaction; for example you > will not be able to preload it via shared_preload_libraries or allied > features. > Absolutely agreed. Having done this myself, it's much, much harder than you'd expect and not something I suggest anyone try unless it's absolutely necessary. It'd be an absolute dream if extensions could create their own shared catalogs; that'd make life *so* much easier. But I seem to recall looking at that and nope-ing right out. That was a while ago so I should probably revisit it. Anyhow: BDR and pglogical are extensions that do need to concern themselves with what's in various databases, so this is an issue I've worked with day to day for some time. BDR1 used a custom security label and the pg_shseclabel catalog to mark databases that were BDR-enabled. It launched a worker that connected to database InvalidOid, so it could read only the global shared catalogs, then it scanned them to find out which DBs to launch individual workers for. This interacted poorly with pg_dump/pg_restore and proved fragile, so I don't recommend it. pglogical instead launches a static bgworker with no DB connections. On startup, or when it gets a suitable message over its extension shmem segment + a latch set, it launches new workers for each DB. Each worker inspects the DB to check for the presence of the pglogical extension and exits if it isn't found. All in all, it's pretty clumsy, though it works very well. We have to do our own process management and registration. Workarounds must be put in place for processes failing to launch and then a new process taking their shmem slot, among various other things. pglogical ends up having to duplicate quite a bit of the bgw and postmaster management infrastructure because it's not extensible and has some serious deficiencies in error/crash handling. (We also have our own dependency management, lock management, shared cache invalidations, a syscache/catcache-like mechanism, and other areas where we'd rather extend Pg's infrastructure but can't. Being able to create our own dependency types, custom lock types/methods, custom syscaches we could get invalidations for, etc., would just be amazing. But each would likely be a major effort to get into core, if we could get it accepted at all given the "in core users" argument, and we'd have to keep the old method around anyway...) -- Craig Ringer http://www.2ndQuadrant.com/ 2ndQuadrant - PostgreSQL Solutions for the Enterprise
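For anyone wanting to copy the pglogical shape, a rough sketch of the static-launcher registration (the extension and function names here are placeholders; the bgworker API calls are the standard ones):

    #include "postgres.h"
    #include "miscadmin.h"
    #include "postmaster/bgworker.h"

    void _PG_init(void);

    void
    _PG_init(void)
    {
        BackgroundWorker worker;

        /* a static worker can only be registered from shared_preload_libraries */
        if (!process_shared_preload_libraries_in_progress)
            return;

        memset(&worker, 0, sizeof(worker));
        worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
            BGWORKER_BACKEND_DATABASE_CONNECTION;
        worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
        worker.bgw_restart_time = 5;    /* seconds until restart after a crash */
        snprintf(worker.bgw_library_name, BGW_MAXLEN, "my_extension");
        snprintf(worker.bgw_function_name, BGW_MAXLEN, "my_launcher_main");
        snprintf(worker.bgw_name, BGW_MAXLEN, "my_extension launcher");
        RegisterBackgroundWorker(&worker);
    }

Inside the (hypothetical) my_launcher_main(), calling BackgroundWorkerInitializeConnection(NULL, NULL, 0) attaches the worker to no particular database -- the "database InvalidOid" trick mentioned above -- so it can read the shared catalogs before deciding which per-database workers to spawn.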
RE: Global shared meta cache
Hi, Konstantin I'm very sorry for the late response, and thank you for your feedback. (I re-sent this email because my email address changed and the original couldn't be delivered to -hackers.) >From: Konstantin Knizhnik [mailto:k.knizh...@postgrespro.ru] > >Takeshi-san, > >I am sorry for the late response - I was just waiting for a new version of the patch >from you to review. Though I haven't incorporated your idea, I made a PoC patch, which supports regular CREATE TABLE, SELECT, and DROP TABLE. TBH, the current patch is not very sophisticated. It fails some installcheck items with the global catalog cache on and is around 2k LOC. >I read your last proposal and it seems to be very reasonable. > From my point of view we cannot reach an acceptable level of performance >if we do not have a local cache at all. >So, as you proposed, we should maintain a local cache for uncommitted data. Yeah, I did this in my patch. >I think that the size of the global cache should be limited (you have introduced a GUC >for it). >In principle it is possible to use dynamic shared memory and have an >unlimited global cache. >But I do not see much sense in it. Yes. I limit the size of the global cache. Right now it doesn't support an eviction policy like LRU. >I do not completely understand from your description when you are going >to evict entries from the local cache. >Just once the transaction is committed? I think it will be more efficient >to also specify a memory threshold for the local cache size and use LRU or >some other eviction policy to remove data from the local cache. >So if the working set (accessed relations) fits in the local cache limit, there >will be no performance penalty compared with the current implementation. >There should be no difference at all on pgbench or other benchmarks >with a relatively small number of relations. > >If an entry is not found in the local cache, then we should look for it in the >global cache and in case of a double cache miss - read it from disk. >I do not completely understand why we need to store references to >global cache entries in the local cache and use reference counters for global >cache entries. >Why can't we maintain just two independent caches? > >While there really are databases with hundreds and even thousands of >tables, an application still usually works with only some small subset of them. >So I think that the "working set" can still fit in memory. This is why I >think that in case of a local cache miss and a global cache hit, we should >copy data from the global cache to the local cache to make it possible to access it in >the future without any synchronization. > >As far as we need to keep all uncommitted data in the local cache, there is >still a chance of local memory overflow (if some transaction creates or >alters too large a number of tables). >But I think that is a very exotic and rare use case. The problem with >memory overflow usually takes place if we have a large number of >backends, each maintaining its own catalog cache. >So I think that we should have a "soft" limit for the local cache and a "hard" >limit for the global cache. Oh, this idea hadn't occurred to me at all. So the local cache is a sort of first-level cache and the global cache is the second level. That sounds great. It would be good for performance, and setting two GUC parameters limiting the local and global caches also gives the DBA complete memory control. Yeah, uncommitted data should be local, but that's the only exception. Not keeping track of references to global cache entries from the local cache header also seems less complex to implement. I'll look into the design. >I didn't think much about cache invalidation. I read your proposal, but >frankly speaking do not understand why it should be so complicated. >Why can't we immediately invalidate the entry in the global cache and lazily >(as is done now using invalidation signals) invalidate the local caches? > I was overthinking when the local/global cache is evicted. The process simply reads the sinval messages and then invalidates the entry. If the refcount is not zero, the process marks it dead to prevent other processes from finding the obsolete entry in the global hash table. The refcount of a global cache entry is raised between SearchSysCache() and ReleaseSysCache(); invalidating a global cache entry while its refcount is up would cause invalid memory access. Regards, Takeshi Ideriha Attachments: 0002-POC-global-catalog-cache.patch, 0001-MemoryContext-for-shared-memory-based-on-DSA.patch
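To make the protocol concrete, here is a pseudocode-level C sketch of the lookup and invalidation paths as described above (every helper name is hypothetical -- this is the idea, not code from the PoC patch):

    /* 1st level: backend-local; 2nd level: shared memory. */
    CatCacheEntry *
    lookup_entry(CatCacheKey key)
    {
        CatCacheEntry *e = local_cache_lookup(key);

        if (e != NULL)
            return e;                   /* local hit: no locking at all */

        e = global_cache_lookup(key);
        if (e != NULL && !e->dead)
        {
            e->refcount++;              /* pinned until ReleaseSysCache() */
            return local_cache_copy(e); /* copy in for lock-free future hits */
        }

        e = load_from_catalog(key);     /* double miss: read from disk */
        global_cache_insert(e);
        return local_cache_copy(e);
    }

    /* On receipt of a sinval message: */
    void
    invalidate_entry(CatCacheEntry *e)
    {
        if (e->refcount == 0)
            global_cache_remove(e);     /* safe to free immediately */
        else
            e->dead = true;             /* hide it; free at last release */
    }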
Re: dropping column prevented due to inherited index
Hi Ashutosh, On Mon, Oct 7, 2019 at 1:39 PM Ashutosh Sharma wrote: > I think we could have first deleted all the dependencies of the child object > on the parent and then deleted the child itself using performDeletion(). So, there are two objects to consider in this case -- a column and an index that depends on it. For columns, we don't store any dependency records for the dependency between a child column and its corresponding parent column. That dependency is explicitly managed by the code and the attinhcount flag, which, if set, prevents the column from being dropped on its own. Indexes do rely on dependency records for the dependency between a child index and its corresponding parent index. This dependency prevents a child index from being dropped unless the corresponding parent index is being dropped too. In this case, recursively dropping a child's column will in turn try to drop the dependent child index. findDependentObjects() complains because it can't allow a child index to be dropped when it can't establish that the corresponding parent index is also being dropped. That's because the parent index will be dropped when the parent's column is dropped, which, due to the current coding of ATExecDropColumn(), is *after* all the child columns and indexes are dropped. If we drop the parent column and the dependent index first and then recurse to children, as my proposed patch does, then the parent index will already have been dropped when dropping the child column and its dependent index. > As an example let's consider the case of a toast table, where we first > delete the dependency of the toast relation on the main relation and then > delete the toast table itself; otherwise the toast table deletion would > fail. But, the problem I see here is that currently we do not have any > entry in the pg_attribute table that would tell us about the dependency of > a child column on its parent. I couldn't imagine how that trick could be implemented for this case. :( Thanks, Amit
Re: Remove some code for old unsupported versions of MSVC
On 2019-10-07 08:52, Michael Paquier wrote: > On Fri, Oct 04, 2019 at 04:35:59PM +0200, Peter Eisentraut wrote: >> As of d9dd406fe281d22d5238d3c26a7182543c711e74, we require MSVC 2013, >> which means _MSC_VER >= 1800. This means that conditionals about >> older versions of _MSC_VER can be removed or simplified. >> >> Previous code was also in some cases handling MinGW, where _MSC_VER is >> not defined at all, incorrectly, such as in pg_ctl.c and win32_port.h, >> leading to some compiler warnings. This should now be handled better. > > Thanks Peter for cleaning up this code. I have looked at it, did some > testing and it looks good to me. No spots are visibly missing. pushed -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
RE: Global shared meta cache
Hi, Alvaro > >The last patch we got here (a prototype) was almost a year ago. There was >substantial discussion about it, but no new version of the patch has been >posted. Are >we getting a proper patch soon, or did we give up on the approach entirely? I'm sorry for the late response. I have started working on it again for the upcoming commitfest. Regards, Takeshi Ideriha
RE: Wrong value in metapage of GIN INDEX.
Moon-san, kuroda.keisuke-san On Thu, Aug 29, 2019 at 8:20 AM, Moon, Insung wrote: > =# CREATE INDEX foo_idx ON foo USING gin (i jsonb_ops); > =# SELECT * FROM gin_metapage_info(get_raw_page('foo_idx', 0)) WITH > (fastupdate=off); This is not an important thing, but there are some small mistakes here; it should be: =# CREATE INDEX foo_idx ON foo USING gin (i jsonb_ops) WITH (fastupdate=off); =# SELECT * FROM gin_metapage_info(get_raw_page('foo_idx', 0)); > In this example, the nentries value should be 10001 because the gin index > stores duplicate values in one leaf (posting > tree or posting list). ... > The patch is very simple. > Fix to increase the value of nEntries only when a non-duplicate GIN index > leaf is added. Does nentries show the number of entries in the leaf pages? If so, the fix seems to be correct. -- Yoshikazu Imai
Re: dropping column prevented due to inherited index
Apologies for not helping much here; I'm on vacation for a couple of weeks. On 2019-Oct-08, Amit Langote wrote: > I couldn't imagine how that trick could be implemented for this case. :( Can't we pass around an ObjectAddresses pointer to which each recursion level adds the object(s) it wants to delete, and in the outermost level we drop everything in it? I vaguely recall we implemented something like that somewhere within the past year, but all I can find is 20bef2c3110a. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
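That suggestion maps directly onto existing dependency.c machinery; a hedged sketch of the accumulate-then-drop pattern (the surrounding recursion and list-building are elided and hypothetical):

    #include "postgres.h"
    #include "catalog/dependency.h"
    #include "catalog/objectaddress.h"
    #include "nodes/pg_list.h"

    /*
     * Hypothetical outermost level of the recursive drop: each recursion
     * level has appended the objects it wants gone; only here do we delete.
     */
    static void
    drop_collected_objects(List *objects_to_drop, DropBehavior behavior)
    {
        ObjectAddresses *addrs = new_object_addresses();
        ListCell   *lc;

        foreach(lc, objects_to_drop)
            add_exact_object_address((ObjectAddress *) lfirst(lc), addrs);

        /* one deletion pass, so intra-set dependencies all get satisfied */
        performMultipleDeletions(addrs, behavior, 0);
        free_object_addresses(addrs);
    }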
Re: dropping column prevented due to inherited index
On Tue, Oct 8, 2019 at 6:15 PM Alvaro Herrera wrote: > Apologies for not helping much here; I'm on vacation for a couple of > weeks. No worries, please take your time. > On 2019-Oct-08, Amit Langote wrote: > > > I couldn't imagine how that trick could be implemented for this case. :( > > Can't we pass around an ObjectAddresses pointer to which each recursion > level adds the object(s) it wants to delete, and in the outermost level > we drop everything in it? I vaguely recall we implemented something > like that somewhere within the past year, but all I can find is > 20bef2c3110a. I thought about doing something like that, but wasn't sure if introducing that much complexity is warranted. Thanks, Amit
Re: Transparent Data Encryption (TDE) and encrypted files
On Mon, 7 Oct 2019 at 18:02, Bruce Momjian wrote: > Well, to do encryption properly, there is the requirement of the nonce. > If you ever rewrite a bit, you technically have to have a new nonce. > For WAL, since it is append-only, you can use the WAL file name. For > heap/index files, we change the LSN on every rewrite (with > wal_log_hints=on), and we never use the same LSN for writing multiple > relations, so LSN+page-offset is a sufficient nonce. > > For clog, it is not append-only, and bytes are rewritten (from zero to > non-zero), so there would have to be a new nonce for every clog file > write to the file system. We can store the nonce in a separate file, > but the clog contents and nonce would have to be always synchronized or > the file could not be properly read. Basically, every file we want to > encrypt needs this kind of study. Yes. That is the reason why our current version doesn't encrypt SLRUs. There is some security in encrypting without a nonce when considering an attack vector that only sees one version of the encrypted page. But I think to make headway on this we need to figure out whether the TDE feature is useful without SLRU encryption (I think yes), how hard it would be to properly encrypt SLRUs, and whether the solution would be acceptable for inclusion. I can think of 3 options: a) Separate nonce storage. Seems pretty bad complexity-wise: new data structures would need to be created, and SLRU writes would need to be WAL-logged with a full page image. b) Inline nonces: the number of items per SLRU page becomes variable depending on whether encryption is enabled. c) Inline nonces: we reserve a header structure on all SLRU pages; pg_upgrade needs to rewrite persistent SLRUs. None of the options seem great, but c) has the benefit of also carving out the space for SLRU checksums. > > As I also said to Stephen, the people who are discussing this here > > should *really really really* be looking at the Cybertec patch instead > > of trying to invent everything from scratch - unless that patch has, > Someone from Cybertec is on the voice calls we have, and is actively > involved. As far as I can tell, no one from us is on the call. I personally missed the invitation when it was sent out. I would gladly share our learnings; a lot of what I see here is retreading what we already went through with our patch. However, I think that at the very least the conclusions, problems to work on, and WIP patch should be shared on-list. It's hard for anybody outside to have any input if there are no concrete design proposals or code to review. Moreover, I think e-mail is a much better medium for having a reasoned discussion about technical design decisions. > > In other words: maybe I'm wrong here, but it looks to me like we're > > laboriously reinventing the wheel when we could be working on > > improving the working prototype. > The work being done is building on that prototype. We would like to help on that front. Regards, Ants Aasma Web: https://www.cybertec-postgresql.com
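To illustrate option (c), a hypothetical page-header layout -- the struct name and fields are invented here for discussion, not taken from any patch:

    #include "postgres.h"

    /*
     * Hypothetical per-page header for option (c): every SLRU page gives up
     * a few bytes for a write-counter nonce, with space for a checksum
     * carved out at the same time.
     */
    typedef struct SlruPageHeaderData
    {
        uint64      nonce;          /* bumped on every write of this page */
        uint16      checksum;       /* of the encrypted page contents */
        uint16      padding;        /* alignment */
    } SlruPageHeaderData;

    /* the usable payload shrinks; pg_upgrade must rewrite old pages */
    #define SLRU_USABLE_BYTES   (BLCKSZ - sizeof(SlruPageHeaderData))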
Re: Change atoi to strtol in same place
On Tue, 8 Oct 2019 at 19:46, Joe Nelson wrote: > > David Rowley wrote: > > The translation string here must be consistent over all platforms. I > > think this will cause issues if the translation string uses %ld and > > the platform requires %lld? > > A very good and subtle point. I'll change it to %lld so that a single > format string will work everywhere. The way to do this is to make a temp buffer, snprintf into that buffer, then use %s. See basebackup.c where it does: char buf[64]; snprintf(buf, sizeof(buf), INT64_FORMAT, total_checksum_failures); ereport(WARNING, (errmsg("%s total checksum verification failures", buf))); as an example. > > I think you should maybe aim for 2 patches here. > > > > Patch 1: ... > > > > Patch 2: Add additional validation where we don't currently do > > anything. e.g pg_dump -j > > > > We can then see if there's any merit in patch 2 or if it's adding more > > complexity than is really needed. > > Are you saying that my current patch adds extra constraints for > pg_dump's -j argument, or that in the future we could do that? Because I > don't believe the current patch adds any appreciable complexity for that > particular argument, other than ensuring the value is positive, which > doesn't seem too contentious. > Maybe we can see whether we can reach consensus on the current > parse-and-check combo patch, and if discussion drags on much longer then > try to split it up? I just think you're more likely to get a committer onside if you make it so they don't have to consider whether throwing errors where we previously didn't would be a bad thing. It's quite common to get core infrastructure in first and then follow up with code that uses it. This would be core infrastructure plus some less controversial usages of it, then follow up with more. This was really just a suggestion. I didn't dig into the patch in enough detail to decide how many places could raise an error that would previously have just silently done something unexpected. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
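For reference, the general shape of a strtol()-based replacement for bare atoi() -- a minimal sketch with a hypothetical helper name, not the patch's actual API:

    #include <errno.h>
    #include <stdbool.h>
    #include <stdlib.h>

    /*
     * Parse a command-line integer option with full error checking,
     * rejecting empty strings, trailing junk, overflow, and out-of-range
     * values -- everything atoi() silently accepts or mangles.
     */
    static bool
    parse_int_arg(const char *arg, long min, long max, long *result)
    {
        char       *endptr;
        long        val;

        errno = 0;
        val = strtol(arg, &endptr, 10);

        if (errno == ERANGE)
            return false;           /* overflow or underflow */
        if (endptr == arg || *endptr != '\0')
            return false;           /* empty string or trailing garbage */
        if (val < min || val > max)
            return false;           /* e.g. pg_dump -j must be positive */

        *result = val;
        return true;
    }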
Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays
On Sat, Oct 5, 2019 at 3:10 AM Andres Freund wrote: > > On 2019-10-04 17:08:29 -0400, Tom Lane wrote: > > Andres Freund writes: > > > On 2019-10-04 16:31:29 -0400, Bruce Momjian wrote: > > >> Yeah, it is certainly weird that you have to assign the first array > > >> element to get the rest to be zeros. By using a macro, we can document > > >> this behavior in one place. > > > > > IDK, to me this seems like something one just has to learn about C, with > > > the macro just obfuscating that already required knowledge. It's not > > > like this only applies to stack variables initialized with {0}. It's > > > also true of global variables, or function-local static ones, for > > > example. > > > > Huh? For both those cases, the *default* behavior, going back to K&R C, > > is that the variable initializes to all-bits-zero. There's no need to > > write anything extra. > > What I mean is that if there's any initialization, it's to all zeroes, > except for the parts that are explicitly initialized. And all that the > {0} does is initialize the rest of the fields the way other > such initialization happens. > You have a point, and I think over time everyone will know this. However, so many people advocating for having a macro with a comment to be more explicit about this behavior shows that this is not equally obvious to everyone, or at least they think that it will help future patch authors. Now, as the usage ({0}) already exists in the code, if we decide to use a macro, then ideally those places should also be changed. I am not saying that it must be done in the same patch; we can even do it as a separate patch. I am personally still in the camp of people advocating the use of a macro for this purpose. It is quite possible that after reading your points some people will change their opinion, or that others will also weigh in against using a macro, in which case we can drop the idea of using a macro. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
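For anyone following along, a tiny standalone demonstration of the C rule under discussion (the macro name is made up for illustration):

    #include <stdio.h>

    /* the kind of macro being advocated, name invented here: */
    #define INIT_ALL_ELEMS_ZERO  {0}

    static int g[4];        /* static storage is zeroed with no initializer */

    int
    main(void)
    {
        int a[4] = INIT_ALL_ELEMS_ZERO;     /* all four elements are 0 */
        int b[4] = {[2] = 7};               /* C99 designated: rest are 0 */

        printf("%d %d %d %d\n", a[3], b[1], b[2], g[0]);    /* prints 0 0 7 0 */
        return 0;
    }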
Re: Transparent Data Encryption (TDE) and encrypted files
Ants Aasma wrote: > On Mon, 7 Oct 2019 at 18:02, Bruce Momjian wrote: > >> Well, to do encryption properly, there is the requirement of the nonce. >> If you ever rewrite a bit, you technically have to have a new nonce. >> For WAL, since it is append-only, you can use the WAL file name. For >> heap/index files, we change the LSN on every rewrite (with >> wal_log_hints=on), and we never use the same LSN for writing multiple >> relations, so LSN+page-offset is a sufficient nonce. >> >> For clog, it is not append-only, and bytes are rewritten (from zero to >> non-zero), so there would have to be a new nonce for every clog file >> write to the file system. We can store the nonce in a separate file, >> but the clog contents and nonce would have to be always synchronized or >> the file could not be properly read. Basically, every file we want to >> encrypt needs this kind of study. > Yes. That is the reason why our current version doesn't encrypt > SLRUs. Actually, there was one more problem: the AES-CBC cipher (or AES-XTS in the earlier patch version) processes encryption blocks of 16 bytes at a time. Thus, if only a part of a block gets written (a torn page write), decryption of the block results in garbage. Unlike for relations, there's nothing like full-page writes for SLRU pages, so there's no way to recover from this problem. However, if the current plan is to use CTR mode, this problem should not happen. -- Antonin Houska Web: https://www.cybertec-postgresql.com
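The reason CTR mode sidesteps the torn-write problem is that it is a pure stream cipher: each byte depends only on the corresponding keystream byte. A conceptual sketch of the property (keystream generation omitted; this is not a real encryption routine):

    #include <stddef.h>

    /*
     * In CTR mode, ciphertext[i] = plaintext[i] ^ keystream[i], so a torn
     * write corrupts only the bytes that were actually torn.  In CBC, by
     * contrast, a partially written 16-byte block decrypts entirely to
     * garbage.
     */
    static void
    ctr_xor(unsigned char *out, const unsigned char *in,
            const unsigned char *keystream, size_t len)
    {
        for (size_t i = 0; i < len; i++)
            out[i] = in[i] ^ keystream[i];
    }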
Re: [PATCH] Do not use StdRdOptions in Access Methods
In a message of Monday, 7 October 2019 at 18:55:20 MSK, Dent John wrote: > I like the new approach. And I agree with the ambition — to split out the > representation from StdRdOptions. Thanks. > However, with that change, in the AM’s *options() function, it looks as if > you could simply add new fields to the relopt_parse_elt list. That’s still > not true, because parseRelOptions() will fail to find a matching entry, > causing numoptions to be left zero, and an early exit made. (At least, > that’s if I correctly understand how things work.) > > I think that is fine as an interim limitation, because your change has not > yet fully broken the connection to the boolRelOpts, intRelOpts, realRelOpts > and stringRelOpts structures. But perhaps a comment would help to make it > clear. Perhaps add something like this above the tab[]: "When adding or > changing a relopt in the relopt_parse_elt tab[], ensure the corresponding > change is made to boolRelOpts, intRelOpts, realRelOpts or stringRelOpts." Whoa-whoa! I am not inventing something new here. The same code is already used by the brin (brin.c:820), gin (ginutils.c:602) and gist (gistutils.c:909) indexes. I've just copied the idea, to make all the index code uniform. This does not mean that this code can't be improved, but as far as I can guess it is better to do it in small steps: first make the options code uniform, and then improve all of it one way or another... So here I would suggest discussing whether I copied this code correctly, without going too deeply into how we could improve the code I used as a source for cloning. And then I have ideas on how to do it better. But that will come later... -- Software Developer: https://www.upwork.com/freelancers/~014a87e140ff02c0da Body-oriented Therapist: https://vk.com/nataraj_rebalancing (Russian)
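For context, the pattern being cloned -- roughly what ginoptions() looks like at the time of writing (quoted from memory, so treat the details as approximate):

    bytea *
    ginoptions(Datum reloptions, bool validate)
    {
        relopt_value *options;
        GinOptions *rdopts;
        int         numoptions;
        static const relopt_parse_elt tab[] = {
            {"fastupdate", RELOPT_TYPE_BOOL,
             offsetof(GinOptions, useFastUpdate)},
            {"gin_pending_list_limit", RELOPT_TYPE_INT,
             offsetof(GinOptions, pendingListCleanupSize)}
        };

        options = parseRelOptions(reloptions, validate, RELOPT_KIND_GIN,
                                  &numoptions);

        /* if none set, we're done -- this is the "early exit" Dent mentions */
        if (numoptions == 0)
            return NULL;

        rdopts = allocateReloptStruct(sizeof(GinOptions), options, numoptions);

        fillRelOptions((void *) rdopts, sizeof(GinOptions), options, numoptions,
                       validate, tab, lengthof(tab));

        pfree(options);
        return (bytea *) rdopts;
    }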
Re: [PATCH] Add some useful asserts into View Options macroses
In a message of Monday, 7 October 2019 at 12:59:27 MSK, Robert Haas wrote: > > This thread is a follow-up to the thread > > https://www.postgresql.org/message-id/2620882.s52SJui4ql@x200m where I've > > been trying to remove the StdRdOptions structure and replace it with a unique > > structure for each relation kind. > > > > I've decided to split that patch into smaller parts. > > > > This part adds some asserts to the ViewOptions macros. > > Since the options pointer there is converted into (ViewOptions *), it would > > be > > really good to make sure that these macros are called in the proper context and > > that we do the conversion properly, at least when running tests with asserts > > turned on. > Seems like a good idea. Should we try to do something similar for the > macros in that header file that cast to StdRdOptions? That would not be as easy as for ViewOptions. For example, in the current master code, fillfactor from StdRdOptions is used by toast, heap, hash index, nbtree index, and spgist index code. This would make the RelationGetFillFactor macro a bit complicated, for example. Now I have patches that limit the usage of StdRdOptions to heap and toast. Once StdRdOptions is not that widely used, we would be able to add asserts for it without making the code too complex. So I would suggest doing the ViewOptions asserts now and leaving StdRdOptions for later. When we are finished with my current patches, I will take care of it. -- Software Developer: https://www.upwork.com/freelancers/~014a87e140ff02c0da Body-oriented Therapist: https://vk.com/nataraj_rebalancing (Russian)
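A hedged sketch of what one of those assert-guarded macros could look like, modeled on the existing RelationIsSecurityView definition (the AssertMacro line is the proposed addition):

    /*
     * Assert that the relation really is a view before casting rd_options
     * to (ViewOptions *).  With asserts disabled this compiles down to the
     * original behavior.
     */
    #define RelationIsSecurityView(relation)                                    \
        (AssertMacro((relation)->rd_rel->relkind == RELKIND_VIEW),              \
         (relation)->rd_options ?                                               \
            ((ViewOptions *) (relation)->rd_options)->security_barrier : false)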
Re: [PATCH] use separate PartitionedRelOptions structure to store partitioned table options
In a message of Tuesday, 8 October 2019 at 16:00:49 MSK, Amit Langote wrote: > > > > The idea of this patch is the following: if you read the code, partitioned > > > > tables do not have any options (you will not find > > > > RELOPT_KIND_PARTITIONED > > > > in boolRelOpts, intRelOpts, realRelOpts, stringRelOpts or enumRelOpts > > > > in > > > > reloption.c), but it uses StdRdOptions to store them (these > > > > non-options). > > > > > > I am not even sure that we actually need that. What kind of reloption > > > do you think would suit this subset? > > > > Actually I do not know. But the author of the partitioning patch added a stub > > for partitioned tables to have some reloptions in the future. But this stub > > is designed to use StdRdOptions, which is not correct, as I presume. So > > here I am correcting the stub. > > I wrote the patch that introduced RELOPT_KIND_PARTITIONED. Yes, it > was added as a stub relopt_kind to eventually apply to reloptions that > could be sensibly applied to partitioned tables. We never got around > to working on making the existing reloptions relevant to partitioned > tables, nor did we add any new partitioned-table-specific reloptions, > so it remained an unused relopt_kind. Thank you for clarifying things. > IIUC, this patch invents PartitionedRelOptions as the binary > representation for future RELOPT_KIND_PARTITIONED parameters. As long > as others are on board with using different *Options structs for > different object kinds, I see no problem with this idea. Yes, this is correct. Some access methods already use their own options structures. As far as I can guess, StdRdOptions still exists only for historical reasons, and it became quite a mess because all kinds of stuff were added there. Better to separate it. BTW, since you are familiar with this part of the code, maybe you would join the review of this particular patch? -- Software Developer: https://www.upwork.com/freelancers/~014a87e140ff02c0da Body-oriented Therapist: https://vk.com/nataraj_rebalancing (Russian)
Re: ICU for global collation
Hi everyone, like the others before me, we (the University of Münster) are happy to see this feature as well. Thank you for this. When I applied the patch two weeks ago, I ran into the issue that initdb did not recognize the new parameters (collation-provider and icu-locale), but I guess it was caused by my own stupidity. > When trying databases defined with ICU locales, I see that backends > that serve such databases seem to have their LC_CTYPE inherited from > the environment (as opposed to a per-database fixed value). I am able to recreate the issue described by Daniel on my machine. Now it works as expected. I just had to update the patch since commit 3f6b3be3 had modified two lines, which resulted in conflicts. You find the updated patch attached to this mail. Best regards, Marius Timmer -- Westfälische Wilhelms-Universität Münster (WWU) Zentrum für Informationsverarbeitung (ZIV) Röntgenstraße 7-13 Besucheradresse: Einsteinstraße 60 - Raum 107 48149 Münster +49 251 83 31158 marius.tim...@uni-muenster.de https://www.uni-muenster.de/ZIV

[Attachment: 0b520194 "Add option to use ICU as global collation provider" (patch v1.2, originally by Peter Eisentraut) -- adds options to initdb, createdb, and CREATE DATABASE to select ICU as the default collation provider for the whole cluster or a database; 23 files changed, 417 insertions(+), 105 deletions(-); truncated diff omitted]
Re: Transparent Data Encryption (TDE) and encrypted files
Robert Haas wrote: > On Mon, Oct 7, 2019 at 3:01 PM Antonin Houska wrote: > > However the design doesn't seem to be stable enough at the > > moment for coding to make sense. > > Well, I think the question is whether working further on your patch > could produce some things that everyone would agree are a step > forward. It would have made a lot of sense several months ago (Masahiko Sawada actually used parts of our patch in the previous version of his patch (see [1])), but the requirement to use a different IV for each execution of the encryption changes things quite a bit. Besides the relation pages and SLRU (CLOG), which are already being discussed elsewhere in the thread, let's consider the other two file types: * Temporary files (buffile.c): we derive the IV from the PID of the process that created the file + segment number + block within the segment. This information does not change if you need to write the same block again. If a new IV is needed for each encryption run, we can simply introduce an in-memory counter that generates the IV for each block. However, it becomes trickier if the temporary file is shared by multiple backends. I think it might still be easier to expose the IV values to other backends via shared memory than to store them on disk ... * "Buffered transient file". This is to be used instead of OpenTransientFile() if the user needs the option to encrypt the file. (Our patch adds this API to buffile.c. Currently we use it in reorderbuffer.c to encrypt the data changes produced by logical decoding, but there should be more use cases.) In this case we cannot keep the IVs in memory because the user can close the file at any time and open it much later. So we derive the IV by hashing the file path. However, if we have to generate a new IV again and again, we need to store the IVs on disk another way, probably one IV value per block (PGAlignedBlock). Since our implementation of both these file types shares some code, it might yet be easier if the shared temporary file also stored the IV on disk instead of exposing it via shared memory ... Perhaps this is what I can work on, but I definitely need some feedback. [1] https://www.postgresql.org/message-id/CAD21AoBjrbxvaMpTApX1cEsO=8N=nc2xVZPB0d9e-VjJ=ya...@mail.gmail.com -- Antonin Houska Web: https://www.cybertec-postgresql.com
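To illustrate the temporary-file scheme described above, a hypothetical sketch (the struct and field names are invented; the point is that a per-block write counter guarantees IV freshness on rewrite):

    #include "postgres.h"

    /*
     * Hypothetical IV layout for one block of an encrypted temporary file:
     * PID + segment + block identify the location, and the counter is
     * bumped before each write, so rewriting a block never reuses an IV.
     */
    typedef struct BufFileBlockIV
    {
        uint32      pid;            /* PID of the creating process */
        uint32      segment;        /* segment number within the file */
        uint32      block;          /* block number within the segment */
        uint32      write_count;    /* incremented before every encryption */
    } BufFileBlockIV;

    static inline void
    buffile_bump_iv(BufFileBlockIV *iv)
    {
        iv->write_count++;          /* fresh IV for this write of the block */
    }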
Re: dropping column prevented due to inherited index
On Tue, Oct 8, 2019 at 2:12 PM Amit Langote wrote: > > Hi Ashutosh, > > On Mon, Oct 7, 2019 at 1:39 PM Ashutosh Sharma wrote: > > I think we could have first deleted all the dependencies of the child object > > on the parent and then deleted the child itself using performDeletion(). > > So, there are two objects to consider in this case -- a column and an > index that depends on it. > > For columns, we don't store any dependency records for the dependency > between a child column and its corresponding parent column. That > dependency is explicitly managed by the code and the attinhcount flag, > which, if set, prevents the column from being dropped on its own. > > Indexes do rely on dependency records for the dependency between a > child index and its corresponding parent index. This dependency > prevents a child index from being dropped unless the corresponding parent > index is being dropped too. > > In this case, recursively dropping a child's column will in turn try > to drop the dependent child index. findDependentObjects() complains > because it can't allow a child index to be dropped when it can't > establish that the corresponding parent index is also being dropped. > That's because the parent index will be dropped when the parent's > column is dropped, which, due to the current coding of > ATExecDropColumn(), is *after* all the child columns and indexes are > dropped. If we drop the parent column and the dependent index first and > then recurse to children, as my proposed patch does, then the parent > index will already have been dropped when dropping the child column > and its dependent index. > > > As an example let's consider the case of a toast table, where we first > > delete the dependency of the toast relation on the main relation and then > > delete the toast table itself; otherwise the toast table deletion would > > fail. But, the problem I see here is that currently we do not have any > > entry in the pg_attribute table that would tell us about the dependency of > > a child column on its parent. > > I couldn't imagine how that trick could be implemented for this case. :( > I don't think that is possible, because presently the pg_attribute table does not have any column indicating that an index exists on a given attribute. If we knew that, we could find out whether a given index on a child table had been inherited from the parent and, if so, delete all the dependencies on the child table's index first and then delete the column itself in the child table, but that doesn't seem to be doable here. So, although I feel the standard way to perform an object deletion is to first remove all its dependencies from the pg_depend table and then delete the object itself, given the information available in the relevant catalog tables that doesn't seem to be possible here. -- With Regards, Ashutosh Sharma EnterpriseDB: http://www.enterprisedb.com
Created feature for to_date() conversion using patterns 'YYYY-WW', 'YYYY-WW-D', 'YYYY-MM-W' and 'YYYY-MM-W-D'
Hi, some days ago I ran into a problem with the to_date() function. I originally described it on StackExchange: https://dba.stackexchange.com/questions/250111/unexpected-behaviour-for-to-date-with-week-number-and-week-day

The problem: if you want to parse a date string with year, week and day of week, you can do this using the ISO week pattern 'IYYY-IW-ID'. This works as expected:

date string | to_date()
------------+-----------
'2019-1-1'  | 2018-12-31  -> Monday of the first week of the year (defined as the week that includes the 4th of January)
'2019-1-2'  | 2019-01-01
'2019-1-3'  | 2019-01-02
'2019-1-4'  | 2019-01-03
'2019-1-5'  | 2019-01-04
'2019-1-6'  | 2019-01-05
'2019-1-7'  | 2019-01-06
'2019-2-1'  | 2019-01-07
'2019-2-2'  | 2019-01-08

But if you try this with the non-ISO pattern 'YYYY-WW-D', the result is not what you'd expect:

date string | to_date()
------------+-----------
'2019-1-1'  | 2019-01-01
'2019-1-2'  | 2019-01-01
'2019-1-3'  | 2019-01-01
'2019-1-4'  | 2019-01-01
'2019-1-5'  | 2019-01-01
'2019-1-6'  | 2019-01-01
'2019-1-7'  | 2019-01-01
'2019-2-1'  | 2019-01-08
'2019-2-2'  | 2019-01-08

As you can see, the 'D' part of the pattern doesn't influence the resulting date. The answer from Laurenz Albe pointed to a part of the documentation I had missed so far: "In to_timestamp and to_date, weekday names or numbers (DAY, D, and related field types) are accepted but are ignored for purposes of computing the result. The same is true for quarter (Q) fields." (https://www.postgresql.org/docs/12/functions-formatting.html)

So I had a look at the relevant part of the code and decided to attempt a patch myself. Now it works as I would expect:

date string | to_date()
------------+-----------
'2019-1-1'  | 2018-12-30  -> Sunday (!) of the first week of the year (the first week is the one containing the first day of the year)
'2019-1-2'  | 2018-12-31
'2019-1-3'  | 2019-01-01
'2019-1-4'  | 2019-01-02
'2019-1-5'  | 2019-01-03
'2019-1-6'  | 2019-01-04
'2019-1-7'  | 2019-01-05
'2019-2-1'  | 2019-01-06
'2019-2-2'  | 2019-01-07

Furthermore, if you leave out the 'D' part, the date is always set to the first day of the corresponding week (in this case Sunday, in contrast to ISO weeks, which start on Mondays). To be consistent, I added similar code for the week-of-month pattern ('W'). So, using the pattern 'YYYY-MM-W-D' yields:

date string   | to_date()
--------------+-----------
'2018-12-5-1' | 2018-12-23
'2018-12-6-1' | 2018-12-30
'2019-1-1-1'  | 2018-12-30  -> First day (Su) of the first week of the first month of the year
'2019-2-2-1'  | 2019-02-03  -> First day (Su) of the second week of February
'2019-10-3-5' | 2019-10-17  -> Fifth day (Th) of the third week of October

If you leave out the 'D', it is set to 1 as well. The code can be seen here: https://github.com/S-Man42/postgres/commit/534e6bd70e23864f385d60ecf187496c7f4387c9 I hope keeping the style of the surrounding code (especially the ISO code) is OK for you. Now the questions: 1. Although ignoring the 'D' pattern is well documented, might the new behaviour be interesting for you? 2. Does it work as you'd expect? 3. Because this could be my very first contribution to the PostgreSQL code base, I really want you to be as critical as possible. I am not quite sure whether I missed something important. 4. Currently something like '2019-1-8' does not throw an exception but yields the same result as '2019-2-1' (the 8th day is the same as the 1st of the next week). On the other hand, the ISO week conversion currently gives the result of '2019-1-7' for every 'D' >= 7. I am not sure which is better. I think consistent exception handling should be discussed separately (date roll-over vs. out-of-range exception vs. ISO week behaviour). So far, I am very curious about your opinions! Kind regards, Mark/S-Man42
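To make the described semantics independently checkable, here is a small standalone C program (my own construction, not code from the linked commit) that models the proposed 'YYYY-WW-D' rule: week 1 begins on the Sunday on or before January 1, and D counts from that Sunday:

    #include <stdio.h>
    #include <time.h>

    static void
    to_date_ww_d(int year, int ww, int d)
    {
        struct tm tm = {0};

        tm.tm_year = year - 1900;
        tm.tm_mon = 0;
        tm.tm_mday = 1;
        tm.tm_hour = 12;        /* midday avoids DST edge effects */
        tm.tm_isdst = -1;
        mktime(&tm);            /* fills in tm_wday for January 1 */

        /* back up to the Sunday on or before Jan 1, then add weeks and days */
        tm.tm_mday += (ww - 1) * 7 + (d - 1) - tm.tm_wday;
        mktime(&tm);            /* normalizes the (possibly negative) day */

        printf("'%d-%d-%d' -> %04d-%02d-%02d\n", year, ww, d,
               tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday);
    }

    int
    main(void)
    {
        to_date_ww_d(2019, 1, 1);   /* 2018-12-30, as in the table above */
        to_date_ww_d(2019, 1, 3);   /* 2019-01-01 */
        to_date_ww_d(2019, 2, 1);   /* 2019-01-06 */
        return 0;
    }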
Re: Standby accepts recovery_target_timeline setting?
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Mon, Oct 7, 2019 at 9:14 AM Fujii Masao wrote: > > Agreed, too. Do you have any idea how to implement that? I've not found a > > "smart" way to do that yet. > > > > One idea is, as Michael suggested, to use SetConfigOption() for all the > > archive recovery parameters at the beginning of the startup process as > > follows, > > to forcibly set the default values if crash recovery is running. But this > > does not seem smart to me. > > > > SetConfigOption("restore_command", ...); > > SetConfigOption("archive_cleanup_command", ...); > > SetConfigOption("recovery_end_command", ...); > > ... > > > > Maybe we should make the GUC mechanism notice signal files and ignore > > archive recovery-related parameters when none of those files exist? > > This change seems overkill, at least in v12, though. > > I think this approach is going in the wrong direction. In every other > part of the system, it's the job of the code around the GUC system to > use parameters when they're relevant and ignore them when they should > be ignored. Deciding that the parameters that were formerly part of > recovery.conf are an exception to that rule and that the GUC system is > responsible for making sure they're set only when we pay attention to > them seems like it's bringing back or exacerbating a code-level split > between recovery.conf parameters and postgresql.conf parameters when, > meanwhile, we've been wanting to eradicate that split so that the > things we allow for postgresql.conf parameters -- e.g. changing them > while they are running -- can be applied to these parameters also. I don't think we necessarily need to be thinking about trying to eliminate all differences between certain former recovery.conf settings and things like work_mem, even as we make it such that those former settings can be changed while we're running. > I don't particularly like the use of SetConfigOption() either, > although it does have some precedent in autovacuum, for example. It's pretty explicitly the job of SetConfigOption to manage the fact that only certain options can be set at certain times, as noted at the top of guc.h where we're talking about GUC contexts (and which SetConfigOption references as being what its job is to manage -- guc.c:6776 currently). > Generally, it's right and proper that the GUC system sets the > variables to which the parameters it controls are tied -- and then the > rest of the code has to do the right thing around that. It sounds like > the patch that got rid of recovery.conf wasn't considered carefully > enough, and missed the fact that it was introducing some inadvertent > behavior changes. That's too bad, but let's not overreact. It seems > totally fine to me to just add ad-hoc checks that rule out > inappropriately relying on these parameters while performing crash > recovery - and be done with it. The patch that got rid of recovery.conf also removed the inherent understanding and knowledge that there are certain options that can only be set (and make sense...) at certain times -- namely, when we're doing recovery. Having these options set at other times is entirely wrong and will be confusing to both users and, as seen, code. From a user perspective, what happens when you've started up PG as a primary, since you don't have a signal file in place to indicate that you're doing recovery, and you have a recovery_target set, so some user does "show recovery_target_name" and sees a value? How is that sensible? Those options should only be set when we're actually doing recovery, which is governed by the signal file. Recovery is absolutely a specific kind of state that the system is in; not unlike the postmaster state, we've even got a specific pg_is_in_recovery() function for it. Having these options end up set but then hacking all of the other code that looks at them to check whether we're actually in recovery or not would end up being both confusing to users as well as an ongoing source of bugs (which has already been made clear by the fact that we're having this discussion...). Wouldn't that also mean we would need to hack the 'show' code, to blank out the recovery_target_name variable if we aren't in recovery? Ugh. Thanks, Stephen
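For readers unfamiliar with the API under discussion: mediating context and source is exactly what SetConfigOption() does. Fujii's sketch upthread, spelled out roughly (hypothetical call sites; passing a NULL value means "reset to default"):

    /* at startup-process start, when only crash recovery is being done */
    SetConfigOption("restore_command", NULL,
                    PGC_POSTMASTER, PGC_S_OVERRIDE);
    SetConfigOption("archive_cleanup_command", NULL,
                    PGC_POSTMASTER, PGC_S_OVERRIDE);
    SetConfigOption("recovery_end_command", NULL,
                    PGC_POSTMASTER, PGC_S_OVERRIDE);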
Re: pg_upgrade fails with non-standard ACL
Greetings, * Anastasia Lubennikova (a.lubennik...@postgrespro.ru) wrote: > Cool. It seems that everyone agrees on this patch. Thanks for working on this. I took a quick look over it and I do have some concerns. > I attached the updated version. Now it prints a better error message > and generates an SQL script instead of multiple warnings. The attached test > script shows that. Have you tested this with extensions where the user has changed the privileges on the extension's objects? I'm concerned that we'll throw out warnings and tell users to go REVOKE privileges in any case where the privileges on an extension's object were changed, but that shouldn't be necessary and we should be excluding those. Changes to privileges on extensions should be handled just fine using the existing code, at least for upgrades of PG. Otherwise, at least imv, the code could use more comments (inside the functions, not just function-level...) and there are a few wording improvements that could be made. Again, not a full endorsement, as I just took a quick look, but it generally seems like a reasonable approach, and the issue with extensions was the only thing that came to mind as a potential problem. Thanks, Stephen
Re: Ordering of header file inclusion
Amit Kapila writes: > On Wed, Oct 2, 2019 at 2:57 PM vignesh C wrote: >> I noticed that some of the header files inclusion is not ordered as >> per the usual standard that is followed. >> The attached patch contains the fix for the order in which the header >> files are included. >> Let me know your thoughts on the same. > +1. FWIW, I'm not on board with reordering system-header inclusions. Some platforms have (had?) ordering dependencies for those, and where that's true, it's seldom alphabetical. It's only our own headers where we can safely expect that any arbitrary order will work. > I think we shouldn't remove the extra line as part of the above change. I would take out the blank lines between our own #includes. Those are totally arbitrary and unnecessary. The whole point of style rules like this one is to reduce the differences between the way one person might write something and the way that someone else might. Deciding to throw in a blank line is surely in the realm of unnecessary differences. regards, tom lane
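As a concrete illustration of the convention: postgres.h always comes first, then system headers (left in whatever order the platform needs), then our own headers alphabetically, with no stray blank lines among them:

    #include "postgres.h"

    #include <sys/stat.h>
    #include <unistd.h>

    #include "access/xlog.h"
    #include "storage/fd.h"
    #include "utils/guc.h"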
Re: Standby accepts recovery_target_timeline setting?
On Tue, Oct 8, 2019 at 10:58 PM Stephen Frost wrote: > > Greetings, > > * Robert Haas (robertmh...@gmail.com) wrote: > > On Mon, Oct 7, 2019 at 9:14 AM Fujii Masao wrote: > > > Agreed, too. Do you have any idea how to implement that? I've not found a > > > "smart" way to do that yet. > > > > > > One idea is, as Michael suggested, to use SetConfigOption() for all the > > > archive recovery parameters at the beginning of the startup process as > > > follows, > > > to forcibly set the default values if crash recovery is running. But this > > > does not seem smart to me. > > > > > > SetConfigOption("restore_command", ...); > > > SetConfigOption("archive_cleanup_command", ...); > > > SetConfigOption("recovery_end_command", ...); > > > ... > > > > > > Maybe we should make the GUC mechanism notice signal files and ignore > > > archive recovery-related parameters when none of those files exist? > > > This change seems overkill, at least in v12, though. > > > > I think this approach is going in the wrong direction. In every other > > part of the system, it's the job of the code around the GUC system to > > use parameters when they're relevant and ignore them when they should > > be ignored. Deciding that the parameters that were formerly part of > > recovery.conf are an exception to that rule and that the GUC system is > > responsible for making sure they're set only when we pay attention to > > them seems like it's bringing back or exacerbating a code-level split > > between recovery.conf parameters and postgresql.conf parameters when, > > meanwhile, we've been wanting to eradicate that split so that the > > things we allow for postgresql.conf parameters -- e.g. changing them > > while they are running -- can be applied to these parameters also. > > I don't think we necessarily need to be thinking about trying to > eliminate all differences between certain former recovery.conf settings > and things like work_mem, even as we make it such that those former > settings can be changed while we're running. > > > I don't particularly like the use of SetConfigOption() either, > > although it does have some precedent in autovacuum, for example. > > It's pretty explicitly the job of SetConfigOption to manage the fact > that only certain options can be set at certain times, as noted at the > top of guc.h where we're talking about GUC contexts (and which > SetConfigOption references as being what its job is to manage -- > guc.c:6776 currently). > > > Generally, it's right and proper that the GUC system sets the > > variables to which the parameters it controls are tied -- and then the > > rest of the code has to do the right thing around that. It sounds like > > the patch that got rid of recovery.conf wasn't considered carefully > > enough, and missed the fact that it was introducing some inadvertent > > behavior changes. That's too bad, but let's not overreact. It seems > > totally fine to me to just add ad-hoc checks that rule out > > inappropriately relying on these parameters while performing crash > > recovery - and be done with it. Yeah, I agree. > The patch that got rid of recovery.conf also removed the inherent > understanding and knowledge that there are certain options that can only > be set (and make sense...) at certain times -- namely, when we're doing > recovery. Having these options set at other times is entirely wrong and > will be confusing to both users and, as seen, code. > From a user perspective, what happens when you've started up PG as a primary, since > you don't have a signal file in place to indicate that you're doing > recovery, and you have a recovery_target set, so some user does > "show recovery_target_name" and sees a value? How is that sensible? > > Those options should only be set when we're actually doing recovery, > which is governed by the signal file. Recovery is absolutely a specific > kind of state that the system is in; not unlike the postmaster state, we've even > got a specific pg_is_in_recovery() function for it. > > Having these options end up set but then hacking all of the other code > that looks at them to check whether we're actually in recovery or not would > end up being both confusing to users as well as an ongoing source of > bugs (which has already been made clear by the fact that we're having > this discussion...). Wouldn't that also mean we would need to hack the > 'show' code, to blank out the recovery_target_name variable if we aren't > in recovery? Ugh. Isn't this overkill? This doesn't seem to be a problem only for recovery-related settings. We already have a similar issue with other settings. For example, the log_directory parameter is ignored when logging_collector is not enabled, but SHOW log_directory reports the setting value even when logging_collector is disabled. This seems like a similar issue and might be confusing, but we could live with that. Regards, -- Fujii Masao
Re: Created feature for to_date() conversion using patterns 'YYYY-WW', 'YYYY-WW-D', 'YYYY-MM-W' and 'YYYY-MM-W-D'
Hi, I apologize for the mistake. For the mailing list correspondence I created this mail account. But I forgot to change the sender name. So, the "postgres" name appeared as sender name in the mailing list. I changed it. Kind regards, Mark/S-Man42 Hi, some days ago I ran into a problem with the to_date() function. I originally described it on StackExchange: https://dba.stackexchange.com/questions/250111/unexpected-behaviour-for-to-date-with-week-number-and-week-day The problem: If you want to parse a date string with year, week and day of week, you can do this using the ISO week pattern: 'IYYY-IW-ID'. This works as expected: date string | to_date() + '2019-1-1' | 2018-12-31 -> Monday of the first week of the year (defined as the week that includes the 4th of January) '2019-1-2' | 2019-01-01 '2019-1-3' | 2019-01-02 '2019-1-4' | 2019-01-03 '2019-1-5' | 2019-01-04 '2019-1-6' | 2019-01-05 '2019-1-7' | 2019-01-06 '2019-2-1' | 2019-01-07 '2019-2-2' | 2019-01-08 But if you are trying this with the non-ISO pattern '-WW-D', the result was not expected: date string | to_date() - '2019-1-1' | 2019-01-01 '2019-1-2' | 2019-01-01 '2019-1-3' | 2019-01-01 '2019-1-4' | 2019-01-01 '2019-1-5' | 2019-01-01 '2019-1-6' | 2019-01-01 '2019-1-7' | 2019-01-01 '2019-2-1' | 2019-01-08 '2019-2-2' | 2019-01-08 As you can see, the 'D' part of the pattern doesn't influence the resulting date. The answer of Laurenz Albe pointed to a part of the documentation, I missed so far: "In to_timestamp and to_date, weekday names or numbers (DAY, D, and related field types) are accepted but are ignored for purposes of computing the result. The same is true for quarter (Q) fields." (https://www.postgresql.org/docs/12/functions-formatting.html) So, I had a look at the relevant code part. I decided to try a patch by myself. Now it works as I would expect it: date string | to_date() - '2019-1-1' | 2018-12-30 -> Sunday (!) of the first week of the year (the first week is at the first day of year) '2019-1-2' | 2018-12-31 '2019-1-3' | 2019-01-01 '2019-1-4' | 2019-01-02 '2019-1-5' | 2019-01-03 '2019-1-6' | 2019-01-04 '2019-1-7' | 2019-01-05 '2019-2-1' | 2019-01-06 '2019-2-2' | 2019-01-07 Furthermore, if you left the 'D' part, the date would be always set to the first day of the corresponding week (in that case it is Sunday, in contrast to the ISO week, which starts mondays). To be consistent, I added similar code for the week of month pattern ('W'). So, using the pattern '-MM-W-D' yields in: date string | to_date() --- '2018-12-5-1' | 2018-12-23 '2018-12-6-1' | 2018-12-30 '2019-1-1-1' | 2018-12-30 -> First day (Su) of the first week of the first month of the year '2019-2-2-1' | 2019-02-03 -> First day (Su) of the second week of February '2019-10-3-5' | 2019-10-17 -> Fifth day (Th) of the third week of October If you left the 'D', it would be set to 1 as well. The code can be seen here: https://github.com/S-Man42/postgres/commit/534e6bd70e23864f385d60ecf187496c7f4387c9 I hope, keeping the code style of the surrounding code (especially the ISO code) is ok for you. Now the questions: 1. Although the ignorance of the 'D' pattern is well documented, does the new behaviour might be interesting for you? 2. Does it work as you'd expect it? 3. Because this could be my very first contribution to the PostgreSQL code base, I really want you to be as critical as possible. I am not quite sure if I didn't miss something important. 4. 
Currently something like '2019-1-8' does not throw an exception but yields the same result as '2019-2-1' (the 8th day is the same as the 1st of the next week). On the other hand, the ISO week conversion currently gives the result of '2019-1-7' for every 'D' >= 7. I am not sure whether this is better. I think consistent exception handling should be discussed separately (date roll-over vs. out-of-range exception vs. ISO week behaviour). So far, I am very curious about your opinions! Kind regards, Mark/S-Man42
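To make the before/after concrete, here are the examples from the tables above written as queries (results as produced by the patched build; an unpatched build returns 2019-01-01 for the second query):

    -- ISO week pattern, behaviour unchanged:
    SELECT to_date('2019-1-1', 'IYYY-IW-ID');      -- 2018-12-31 (Monday)

    -- non-ISO patterns, where the patch makes 'D' and 'W' significant:
    SELECT to_date('2019-1-1', 'YYYY-WW-D');       -- 2018-12-30 (Sunday)
    SELECT to_date('2019-10-3-5', 'YYYY-MM-W-D');  -- 2019-10-17 (Thursday)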
Re: let's make the list of reportable GUCs configurable (was Re: Add %r substitution for psql prompts to show recovery status)
On Fri, 12 Jul 2019 at 01:27, Robert Haas wrote: > > We have steadfastly refused to provide protocol-level tools for things > like "please change my user ID, and don't let anyone change it again > via SQL," and that's a huge problem for things like connection poolers > which can't parse all the SQL flowing through the connection (because > figuring out what it does requires solving the Halting Problem) and > wouldn't want to if they could for performance reasons. I think that's > a huge mistake. I very strongly agree. The inability to limit SET and RESET of SESSION AUTHORIZATION and ROLE is a huge pain point and it's far from the only one. -- Craig Ringer http://www.2ndQuadrant.com/ 2ndQuadrant - PostgreSQL Solutions for the Enterprise
Re: let's make the list of reportable GUCs configurable (was Re: Add %r substitution for psql prompts to show recovery status)
On Thu, 11 Jul 2019 at 04:34, Tom Lane wrote: > Robert Haas writes: > > On Wed, Jul 10, 2019 at 9:59 AM Dave Cramer wrote: > >> I'm still a bit conflicted about what to do with search_path as I do > believe this is potentially a security issue. > >> It may be that we always want to report that and possibly back patch it. > > > I don't see that as a feasible option unless we make the logic that > > does the reporting smarter. If it changes transiently inside of a > > security-definer function, and then changes back, my recollection is > > that right now we would report both changes. I think that could cause > > a serious efficiency problem if you are calling such a function in a > > loop. > > And, even more to the point, what's the client side going to do with > the information? If there was a security problem inside the > security-definer function, it's too late. And the client can't do > much about it anyway. > Yep. For search_path I definitely agree. In other places I've (ab)used GUC_REPORT to push information back to the client as a workaround for the lack of protocol extensibility / custom messages. It's a little less ugly than abusing NOTICE messages. I'd prefer a real way to add optional/extension messages, but in the absence of that extension-defined GUCs may have good reasons to want to report multiple changes within a single statement/function/etc. With that said, most of the time I think we could argue that it's reasonable to fire a GUC_REPORT if the *top-level* GUC nestlevel values change. That'd solve the search_path spam issue while still giving Dave what he wants, amongst other things. BTW, a good argument for the client wanting to be notified of search_path changes might be for clients that want to cache name=>oid mappings for types, relations, etc. The JDBC driver would be able to benefit from that if it could trust that the same name would map to the same type (for a top-level statement) in future, but currently it can't. -- Craig Ringer http://www.2ndQuadrant.com/ 2ndQuadrant - PostgreSQL Solutions for the Enterprise
Re: Standby accepts recovery_target_timeline setting?
Greetings, * Fujii Masao (masao.fu...@gmail.com) wrote: > On Tue, Oct 8, 2019 at 10:58 PM Stephen Frost wrote: > > Having these options end up set but then hacking all of the other code > > that looks at them to check if we're actually in recovery or not would > > end up being both confusing to users as well as an ongoing source of > > bugs (which has already been made clear by the fact that we're having > > this discussion...). Wouldn't that also mean we would need to hack the > > 'show' code, to blank out the recovery_target_name variable if we aren't > > in recovery? Ugh. > > Isn't this overkill? This doesn't seem to be a problem only for recovery-related > settings. We already have a similar issue with other settings. > For example, the log_directory parameter is ignored when logging_collector is > not enabled. But SHOW log_directory reports the setting value even when > logging_collector is disabled. This seems like a similar issue and might be > confusing, but we could live with that. I agree it's a similar issue. I disagree that it's actually sensible for us to do so and would rather contend that it's confusing and not good. We certainly do a lot of smart things in PG, but showing the value of variables that aren't accurate, and we *know* they aren't, hardly seems like something we should be saying "this is good and ok, so let's do more of this." I'd rather argue that this just shows that we need to come up with a solution in this area. I don't think it's *as* big of a deal when it comes to logging_collector/log_directory because, at least there, you don't even start to get into the same code paths where it matters, like you end up doing with the recovery targets and crash recovery, so the chances of bugs creeping in are less in the log_directory case. I still don't think it's great though and, yes, would prefer that we avoid having log_directory set when logging_collector isn't in use. Thanks, Stephen signature.asc Description: PGP signature
pg_init
I want to read pg_database from pg_init... Is using heap_open() possible? Or is there any other way?
Re: pg_init
On Tue, Oct 08, 2019 at 10:03:03PM +0530, Natarajan R wrote: >I want to read pg_database from pg_init... > >Is using heap_open() possible? Or is there any other way? This question is way too vague - I have no idea what you mean by pg_init, for example. And it's probably a good idea to explain what you're trying to achieve. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pg_init
Tomas Vondra wrote: > On Tue, Oct 08, 2019 at 10:03:03PM +0530, Natarajan R wrote: > >I want to read pg_database from pg_init... > > > >Is using heap_open() possible? Or is there any other way? > > This question is way too vague - I have no idea what you mean by > pg_init, for example. And it's probably a good idea to explain what > you're trying to achieve. This question was familiar to me so I searched the archives. It seems related to https://www.postgresql.org/message-id/17058.1570166272%40sss.pgh.pa.us -- Antonin Houska Web: https://www.cybertec-postgresql.com
Re: PATCH: Add uri percent-encoding for binary data
On Mon, Oct 7, 2019 at 9:52 PM Bruce Momjian wrote: > > On Mon, Oct 7, 2019 at 09:14:38AM +0200, Anders Åstrand wrote: > > Hello > > > > Attached is a patch for adding uri as an encoding option for > > encode/decode. It uses what's called "percent-encoding" in rfc3986 > > (https://tools.ietf.org/html/rfc3986#section-2.1). > > Oh, that's a cool idea. Can you add it to the commit-fest? > > https://commitfest.postgresql.org/25/ > > Thanks for your reply! I added it but was unsure of what topic was appropriate and couldn't find a description of them anywhere. I went with Miscellaneous for now.
Re: Transparent Data Encryption (TDE) and encrypted files
Greetings, * Magnus Hagander (mag...@hagander.net) wrote: > Unless we are *absolutely* certain, I bet someone will be able to find a > side-channel that somehow leaks some data or data-about-data, if we don't > encrypt everything. If nothing else, you can get use patterns out of it, > and you can make a lot from that. (E.g. by whether transactions are using > multixacts or not you can potentially determine which transaction they are, > if you know what type of transactions are being issued by the application. > In the simplest case, there might be a single pattern where multixacts end > up actually being used, and in that case being able to see the multixact > data tells you a lot about the system). Thanks for bringing up the concern but this still doesn't strike me, at least, as being a huge gaping hole that people will have large issues with. In other words, I don't agree that this is a high bandwidth side channel and I don't think that it, alone, brings up a strong need to encrypt clog and multixact. > As for other things -- by default, we store the log files in text format in > the data directory. That contains *loads* of sensitive data in a lot of > cases. Will those also be encrypted? imv, this is a largely independent thing, as I said elsewhere, and has its own set of challenges and considerations to deal with. Thanks, Stephen signature.asc Description: PGP signature
Re: Non-null values of recovery functions after promote or crash of primary
Greetings, * Martín Marqués (mar...@2ndquadrant.com) wrote: > pg_last_wal_receive_lsn() > pg_last_wal_replay_lsn() > pg_last_xact_replay_timestamp() > > Under normal circumstances we would expect to receive NULLs from all > three functions on a primary node, and code comments back up my thoughts. Agreed. > The problem is, what if the node is a standby which was promoted without > restarting, or that had to perform crash recovery? > > So during the time it's recovering the values in ` XLogCtl` are updated > with recovery information, and once the recovery finishes, due to crash > recovery reaching a consistent state, or a promotion of a standby > happening, those values are not reset to startup defaults. > > That's when you start seeing non-null values returned by > `pg_last_wal_replay_lsn()`and `pg_last_xact_replay_timestamp()`. > > Now, I don't know if we should call this a bug, or an undocumented > anomaly. We could fix the bug by resetting the values from ` XLogCtl` > after finishing recovery, or document that we might see non-NULL values > in certain cases. IMV, and not unlike other similar cases I've talked about on another thread, these should be cleared when the system is promoted as they're otherwise confusing and nonsensical. Thanks, Stephen signature.asc Description: PGP signature
Re: PATCH: Add uri percent-encoding for binary data
On Mon, Oct 7, 2019 at 11:38 PM Isaac Morland wrote: > > On Mon, 7 Oct 2019 at 03:15, Anders Åstrand wrote: >> >> Hello >> >> Attached is a patch for adding uri as an encoding option for >> encode/decode. It uses what's called "percent-encoding" in rfc3986 >> (https://tools.ietf.org/html/rfc3986#section-2.1). >> >> The background for this patch is that I could easily build urls in >> plpgsql, but doing the actual encoding of the url parts is painfully >> slow. The list of available encodings for encode/decode looks quite >> arbitrary to me, so I can't see any reason this one couldn't be in >> there. >> >> In modern web scenarios one would probably most likely want to encode >> the utf8 representation of a text string for inclusion in a url, in >> which case correct invocation would be ENCODE(CONVERT_TO('some text in >> database encoding goes here', 'UTF8'), 'uri'), but uri >> percent-encoding can of course also be used for other text encodings >> and arbitrary binary data. > > This seems like a useful idea to me. I've used the equivalent in Python and > it provides more options: > > https://docs.python.org/3/library/urllib.parse.html#url-quoting > > I suggest reviewing that documentation there, because there are a few details > that need to be checked carefully. Whether or not space should be encoded as > plus and whether certain byte values should be exempt from %-encoding is > something that depends on the application. Unfortunately, as far as I can > tell there isn't a single version of URL encoding that satisfies all > situations (thus explaining the complexity of the Python implementation). It > might be feasible to suppress some of the Python options (I'm wondering about > the safe= parameter) but I'm pretty sure you at least need the equivalent of > quote and quote_plus. Thanks a lot for your reply! I agree that some (but not all) of the options available to that python lib could be helpful for developers wanting to build urls without having to encode the separate parts and stitch them together, but they are not necessary for this patch to be useful. For generic uri encoding the slash (/) must be percent encoded, because it has special meaning in the standard. Some other extra characters may appear unencoded though depending on context, but it's generally safer to just encode them all and not hope that the encoder will know about the context and skip over certain characters. This does bring up an interesting point, however. Maybe decode should validate that only characters that are allowed unencoded appear in the input? Luckily, the plus-encoding of spaces is not part of the uri standard at all but instead part of the format referred to as application/x-www-form-urlencoded data. Fortunately that format is close to dying now that forms more often post json. Regards, Anders
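For reference, the core of RFC 3986 percent-encoding is small; a minimal sketch in C (not the patch's actual code - the function name and buffer contract are illustrative):

    /*
     * Percent-encode len bytes from src into dst, which must have room for
     * 3 * len + 1 bytes in the worst case.  RFC 3986 "unreserved" characters
     * pass through unchanged; everything else becomes %XX.
     */
    static void
    uri_encode(const unsigned char *src, size_t len, char *dst)
    {
        static const char hex[] = "0123456789ABCDEF";
        size_t      i;

        for (i = 0; i < len; i++)
        {
            unsigned char c = src[i];

            if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                (c >= '0' && c <= '9') ||
                c == '-' || c == '.' || c == '_' || c == '~')
                *dst++ = (char) c;
            else
            {
                *dst++ = '%';
                *dst++ = hex[c >> 4];
                *dst++ = hex[c & 0x0F];
            }
        }
        *dst = '\0';
    }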
Re: v12 and pg_restore -f-
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Andrew Gierth writes: > > "Tom" == Tom Lane writes: > > Tom> Perhaps we could change the back branches so that they interpret > > Tom> "-f -" as "write to stdout", but without enforcing that you use > > Tom> that syntax. > > > We should definitely do that. I agree that this would be a reasonable course of action. Really, it should have always meant that... > > Tom> Alternatively, we could revert the v12 behavior change. On the > > Tom> whole that might be the wiser course. I do not think the costs and > > Tom> benefits of this change were all that carefully thought through. > > > Failing to specify -d is a _really fricking common_ mistake for > > inexperienced users, who may not realize that the fact that they're > > seeing a ton of SQL on their terminal is not the normal result. > > Seriously, this comes up on a regular basis on IRC (which is why I > > suggested initially that we should do something about it). > > No doubt, but that seems like a really poor excuse for breaking > maintenance scripts in a way that basically can't be fixed. Even > with the change suggested above, scripts couldn't rely on "-f -" > working anytime soon, because you couldn't be sure whether a > back-rev pg_restore had the update or not. Maintenance scripts break across major versions. We completely demolished everything around how recovery works, and some idea that you could craft up something easy that would work in a backwards-compatible way is outright ridiculous, so I don't see why we're so concerned about a change to how pg_restore works here. > The idea I'm leaning to after more thought is that we should change > *all* the branches to accept "-f -", but not throw an error if you > don't use it. Several years from now, we could put the error back in; > but not until there's a plausible argument that nobody is running > old versions of pg_restore anymore. No, I don't agree with this, at all. Thanks, Stephen signature.asc Description: PGP signature
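For concreteness, under the proposal above a back-branch pg_restore would accept both of these spellings (backup.dump is a hypothetical archive file):

    pg_restore backup.dump          # no -d and no -f: SQL script goes to stdout
    pg_restore -f - backup.dump     # explicit "write to stdout", the v12 syntax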
Re: abort-time portal cleanup
Hi, On 2019-10-07 12:14:52 -0400, Robert Haas wrote: > > - if (portal->status == PORTAL_READY) > > - MarkPortalFailed(portal); > > > > Why it is safe to remove this check? It has been explained in commit > > 7981c342 why we need that check. I don't see any explanation in email > > or patch which justifies this code removal. Is it because you removed > > PortalCleanup? If so, that is still called from PortalDrop? > > All MarkPortalFailed() does is change the status to PORTAL_FAILED and > call the cleanup hook. PortalDrop() calls the cleanup hook, and we > don't need to change the status if we're removing it completely. Note that currently PortalCleanup() behaves differently depending on whether the portal is set to failed or not... - Andres
Re: Transparent Data Encryption (TDE) and encrypted files
On Tue, Oct 8, 2019 at 7:52 AM Antonin Houska wrote: > * Temporary files (buffile.c): we derive the IV from PID of the process that > created the file + segment number + block within the segment. This > information does not change if you need to write the same block again. If > new IV should be used for each encryption run, we can simply introduce an > in-memory counter that generates the IV for each block. However it becomes > trickier if the temporary file is shared by multiple backends. I think it > might still be easier to expose the IV values to other backends via shared > memory than to store them on disk ... > > * "Buffered transient file". This is to be used instead of OpenTransientFile() > if user needs the option to encrypt the file. (Our patch adds this API to > buffile.c. Currently we use it in reorderbuffer.c to encrypt the data > changes produced by logical decoding, but there should be more use cases.) > > In this case we cannot keep the IVs in memory because user can close the > file anytime and open it much later. So we derive the IV by hashing the file > path. However if we should generate the IV again and again, we need to store > it on disk in another way, probably one IV value per block (PGAlignedBlock). > > However since our implementation of both these file types shares some code, > it might yet be easier if the shared temporary file also stored the IV on > disk instead of exposing it via shared memory ... > > Perhaps this is what I can work on, but I definitely need some feedback. I think this would be a valuable thing upon which to work. I'm not sure exactly what the right solution is, but it seems to me that it would be a good thing if we tried to reuse the same solution in as many places as possible. I don't know if it's realistic to use the same method for storing IVs for temporary/transient files as we do for SLRUs, but it would be nice if it were. I think that one problem with trying to store the data in memory is that these files get big enough that N bytes/block could still be pretty big. For instance, if you're sorting 100GB of data with 8GB of work_mem, you'll need to write 13 tapes and then merge them. Supposing an IV of 12 bytes/block, the IV vector for each 8GB tape will be 12MB, so once you've written all 13 tapes and are ready to merge them, you're going to have 156MB of IV data floating around. If you keep it in memory, it ought to count against your work_mem budget, and while it's not a big fraction of your available memory, it's also not negligible. Worse (but less realistic) cases can also be constructed. To avoid this kind of problem, you could write the IV data to disk. But notice that tuplesort.c goes to a lot of work to make I/O sequential, and that helps performance. If you have to intersperse reads of separate IV files with the reads of the main data files, you're going to degrade the I/O pattern. It would really be best if the IVs were in line with the data itself, I think. (The same probably applies, and for not unrelated reasons, to SLRU data, if we're going to try to encrypt that.) Now, if you could store some kind of an IV "seed" where we only need one per buffile rather than one per block, then that'd probably be fine to store in memory. But I don't see how that would work given that we can overwrite already-written blocks and need a new IV if we do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
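Spelling out the arithmetic behind those numbers:

    8 GB tape / 8 kB block      = 1,048,576 blocks per tape
    1,048,576 blocks * 12 bytes = 12 MB of IVs per tape
    13 tapes * 12 MB            = 156 MB of IV data at merge time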
Re: Non-null values of recovery functions after promote or crash of primary
Hi, > IMV, and not unlike other similar cases I've talked about on another > thread, these should be cleared when the system is promoted as they're > otherwise confusing and nonsensical. Keep in mind that this also happens when the server crashes and has to perform crash recovery. In that case the server was always a primary. -- Martín Marqués   http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: pgsql: Remove pqsignal() from libpq's official exports list.
Christoph Berg writes: > Re: Tom Lane 2018-09-28 >> Remove pqsignal() from libpq's official exports list. > This is starting to hurt in several places: > 04 11:41 mha@xindi:~$ psql > 04 11:41 /usr/lib/postgresql/9.2/bin/psql: symbol lookup error: >/usr/lib/postgresql/9.2/bin/psql: undefined symbol: > pqsignal I poked into this a little. Reviewing the commit history, pqsignal() was a part of libpq (so far as frontend code is concerned) up until 9.3, when commit da5aeccf6 moved it into libpgport. Since then we've expected client programs to get it from libpgport not libpq, and indeed they do so AFAICT --- I can reproduce the described failure with 9.2 and below psql linking to current libpq.so, but not with 9.3 and up. libpq itself indeed has no need for pqsignal at all, if --enable-thread-safety is set, which it usually is these days. I also notice that we've made at least one nontrivial semantics change to pqsignal: commit 873ab9721 made it use SA_RESTART for SIGALRM handlers, which it did not do before 9.3. So really, none of the post-9.2 versions of libpq have faithfully duplicated what an older client would expect from pqsignal. This isn't at all academic, because I see that pgbench uses pqsignal(SIGALRM,...), and so does pg_test_fsync. Now, I don't see any indication that we've adjusted either of those programs for the different behavior, so maybe it's fine. But we've been treating this function as strictly internal, and so I'm not pleased with the idea of putting it back into the exported symbol list. I'm especially not pleased with doing so to support pre-9.3 client programs. Those have been below our support horizon for some time; notably, they (presumably) haven't been patched for CVE-2018-1058. Why are you still shipping them in current OS releases? > pg_repack linked against libpq5 11 breaks with libpq5 12: This probably means it needs to be linked with libpgport not only libpq. Having said all that, if we conclude we can't break compatibility with this legacy code quite yet, I'd be inclined to put a separate, clearly-marked-as-legacy-code version of pqsignal() back into libpq, using the pre-9.3 SA_RESTART semantics. But I'd like some pretty well-defined sunset time for that, because it'd just be trouble waiting to happen. When are you going to remove 9.2 psql? regards, tom lane
Re: Transparent Data Encryption (TDE) and encrypted files
Hello. On Tue, Oct 8, 2019 at 8:52 PM Antonin Houska wrote: > > Robert Haas wrote: > > > On Mon, Oct 7, 2019 at 3:01 PM Antonin Houska wrote: > > > However the design doesn't seem to be stable enough at the > > > moment for coding to make sense. > > > > Well, I think the question is whether working further on your patch > > could produce some things that everyone would agree are a step > > forward. > > It would have made a lot of sense several months ago (Masahiko Sawada actually > used parts of our patch in the previous version of his patch (see [1]), but > the requirement to use a different IV for each execution of the encryption > changes things quite a bit. > > Besides the relation pages and SLRU (CLOG), which are already being discussed > elsewhere in the thread, let's consider other two file types: > > * Temporary files (buffile.c): we derive the IV from PID of the process that > created the file + segment number + block within the segment. This > information does not change if you need to write the same block again. If > new IV should be used for each encryption run, we can simply introduce an > in-memory counter that generates the IV for each block. However it becomes > trickier if the temporary file is shared by multiple backends. I think it > might still be easier to expose the IV values to other backends via shared > memory than to store them on disk ... I think encrypt a temporary file in a slightly different way. Previously, I had a lot of trouble with IV uniqueness, but I have proposed a unique encryption key for each file. First, in the case of the CTR mode to be used, 32 bits are used for the counter in the 128-bit nonce value. Here, the counter increases every time 16 bytes are encrypted, and theoretically, if nonce 96 bits are the same, a total of 64 GiB can be encrypted. Therefore, in the case of buffile.c that creates a temporary file due to lack of work_mem, it is possible to use up to 1GiB per file, so it is possible to encrypt to a simple IV value sufficiently safely. The problem is that a vulnerability occurs when 96-bit nonce values excluding Counter are the same values. I also tried to generate IV using PID (32bit) + tempCounter (64bit) at first, but in the worst-case PID and tempCounter are used in the same values. Therefore, the uniqueness of the encryption key was considered without considering the uniqueness of the IV value. The encryption key uses a separate key for each file, as described earlier. First, it generates a hash value randomly for the file, and uses the hash value and KEK (or MDEK) to derive and use the key with HMAC-SHA256. In this case, there is no need to store the encryption key separately if it is not necessary to keep it in a separate IV file or memory. (IV is a hash value of 64 bits and a counter of 32 bits.) Also, currently, the temporary file name is specified by the current PID.tempFileCounter, but if this is set to PID.tempFileCounter.hashvalue, we can encrypt and decrypt in any process, I think. Reference URL https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption > > * "Buffered transient file". This is to be used instead of OpenTransientFile() > if user needs the option to encrypt the file. (Our patch adds this API to > buffile.c. Currently we use it in reorderbuffer.c to encrypt the data > changes produced by logical decoding, but there should be more use cases.) Agreed. Best regards. Moon.
> > In this case we cannot keep the IVs in memory because user can close the > file anytime and open it much later. So we derive the IV by hashing the file > path. However if we should generate the IV again and again, we need to store > it on disk in another way, probably one IV value per block (PGAlignedBlock). > > However since our implementation of both these file types shares some code, > it might yet be easier if the shared temporary file also stored the IV on > disk instead of exposing it via shared memory ... > > Perhaps this is what I can work on, but I definitely need some feedback. > > [1] > https://www.postgresql.org/message-id/CAD21AoBjrbxvaMpTApX1cEsO=8N=nc2xVZPB0d9e-VjJ=ya...@mail.gmail.com > > -- > Antonin Houska > Web: https://www.cybertec-postgresql.com > >
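A minimal sketch of the per-file key derivation described above, assuming OpenSSL; the function name, the 8-byte file hash, and the 256-bit key length are illustrative, not the actual patch:

    #include <stdbool.h>
    #include <openssl/evp.h>
    #include <openssl/hmac.h>
    #include <openssl/rand.h>

    #define TEMP_FILE_KEY_LEN 32    /* 256-bit per-file key */

    /*
     * Pick a random per-file value, then derive the file key as
     * HMAC-SHA256(MDEK, file_hash).  The random value can be embedded in the
     * file name, so nothing secret has to be stored alongside the file.
     */
    static bool
    derive_temp_file_key(const unsigned char *mdek, int mdek_len,
                         unsigned char file_hash[8],
                         unsigned char file_key[TEMP_FILE_KEY_LEN])
    {
        unsigned int key_len = 0;

        if (RAND_bytes(file_hash, 8) != 1)
            return false;
        if (HMAC(EVP_sha256(), mdek, mdek_len,
                 file_hash, 8, file_key, &key_len) == NULL)
            return false;
        return key_len == TEMP_FILE_KEY_LEN;
    }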
Re: [HACKERS] Block level parallel vacuum
On Sat, Oct 5, 2019 at 4:36 PM Dilip Kumar wrote: > > Few more comments > > > 1. > +static int > +compute_parallel_workers(Relation onerel, int nrequested, int nindexes) > +{ > + int parallel_workers; > + bool leaderparticipates = true; > > Seems like this function is not using onerel parameter so we can remove this. > Fixed. > > 2. > + > + /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */ > + maxtuples = compute_max_dead_tuples(nblocks, true); > + est_deadtuples = MAXALIGN(add_size(SizeOfLVDeadTuples, > +mul_size(sizeof(ItemPointerData), maxtuples))); > + shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples); > + shm_toc_estimate_keys(&pcxt->estimator, 1); > + > + /* Finally, estimate VACUUM_KEY_QUERY_TEXT space */ > + querylen = strlen(debug_query_string); > > for consistency with other comments change > VACUUM_KEY_QUERY_TEXT to PARALLEL_VACUUM_KEY_QUERY_TEXT > Fixed. > > 3. > @@ -2888,6 +2888,8 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map, > (!wraparound ? VACOPT_SKIP_LOCKED : 0); > tab->at_params.index_cleanup = VACOPT_TERNARY_DEFAULT; > tab->at_params.truncate = VACOPT_TERNARY_DEFAULT; > + /* parallel lazy vacuum is not supported for autovacuum */ > + tab->at_params.nworkers = -1; > > What is the reason for the same? Can we explain in the comments? I think it's just that we don't want to support parallel autovacuum because it can consume more CPU resources despite being a background job, which might be unexpected behavior for autovacuum. I've changed the comment. Regards, -- Masahiko Sawada
Re: [HACKERS] Block level parallel vacuum
On Fri, Oct 4, 2019 at 8:55 PM vignesh C wrote: > > On Fri, Oct 4, 2019 at 4:18 PM Amit Kapila wrote: > > > > On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada > > wrote: > >> > >> On Sat, Sep 21, 2019 at 9:31 PM Amit Kapila > >> wrote: > >> > > One comment: Thank you for reviewing this patch. > We can check if parallel_workers is within range, something within > MAX_PARALLEL_WORKER_LIMIT. > + int parallel_workers = 0; > + > + if (optarg != NULL) > + { > + parallel_workers = atoi(optarg); > + if (parallel_workers <= 0) > + { > + pg_log_error("number of parallel workers must be at least 1"); > + exit(1); > + } > + } > Fixed. Regards, -- Masahiko Sawada
Re: format of pg_upgrade loadable_libraries warning
On Mon, Oct 7, 2019 at 01:47:57PM -0400, Bruce Momjian wrote: > On Fri, Oct 4, 2019 at 11:55:21PM -0400, Tom Lane wrote: > > Bruce Momjian writes: > > > On Fri, Oct 4, 2019 at 05:40:08PM -0500, Justin Pryzby wrote: > > >> I would argue to include in 12.1, since 12 is what most everyone will > > >> use for > > >> upgrades, and patch for .1 will help people upgrading for 11 of the next > > >> 12 > > >> months. (But, your patch is more general than mine). > > > > > No, there might be tools that depend on the existing format, and this is > > > the first report of confusion I have read. > > > > Translations will also lag behind any such change. Speaking of which, > > it might be a good idea to include some translator: annotations to > > help translators understand what context these fragmentary phrases > > are used in. I'd actually say that my biggest concern with these > > messages is whether they can translate into something sane. > > Uh, I looked at the pg_upgrade code and the error message is: > > pg_fatal("Your installation contains \"contrib/isn\" functions which > rely on the\n" > "bigint data type. Your old and new clusters pass bigint > values\n" > "differently so this cluster cannot currently be upgraded. > You can\n" > "manually upgrade databases that use \"contrib/isn\" > facilities and remove\n" > "\"contrib/isn\" from the old cluster and restart the > upgrade. A list of\n" > "the problem functions is in the file:\n" > "%s\n\n", output_path); > > and the "in database" (which I have changed to capitalized "In database" > in the attached patch), looks like: > >fprintf(script, "In database: %s\n", active_db->db_name); > > meaning it _isn't_ an output error message, but rather something that > appears in an error file. I don't think either of these are translated. > Is that wrong? Patch applied to head. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
WIP: raise error when submitting invalid ALTER SYSTEM command
Hi all, ALTER SYSTEM currently does not raise an error upon invalid entry. Take for example:

alter system set superuser_reserved_connections = 10;
> ALTER SYSTEM
alter system set max_connections = 5;
> ALTER SYSTEM

The database will now fail to restart without manual intervention by way of editing the autoconf file (which says "# Do not edit this file manually!" on its first line). The attached WIP patch is intended to raise an error on invalid ALTER SYSTEM commands before the new settings are written out to the filesystem. Hopefully this behavior is more intuitive. Thanks -- Jordan Deitch https://id.rsa.pub/ 001-alter-system-raise-error-on-invalid-combination.patch Description: Binary data
Re: maintenance_work_mem used by Vacuum
On Tue, Oct 8, 2019 at 2:45 PM Amit Kapila wrote: > > On Tue, Oct 8, 2019 at 1:48 AM Peter Geoghegan wrote: > > > > On Mon, Oct 7, 2019 at 12:28 PM Robert Haas wrote: > > > I would say that sucks, because it makes it harder to set > > > maintenance_work_mem correctly. Not sure how hard it would be to fix, > > > though. > > > > ginInsertCleanup() may now be the worst piece of code in the entire > > tree, so no surprise that it gets this wrong too. > > > > 2016's commit e2c79e14d99 ripped out the following comment about the > > use of maintenance_work_mem by ginInsertCleanup(): > > > > @@ -821,13 +847,10 @@ ginInsertCleanup(GinState *ginstate, > > * Is it time to flush memory to disk? Flush if we are at the end of > > * the pending list, or if we have a full row and memory is getting > > * full. > > - * > > - * XXX using up maintenance_work_mem here is probably unreasonably > > - * much, since vacuum might already be using that much. > > */ > > > > ISTM that the use of maintenance_work_mem wasn't given that much > > thought originally. > > > > One idea to do something better could be to check, if there is a GIN > index on a table, then use 1/4 (25% or whatever) of > maintenance_work_mem for GIN indexes and 3/4 (75%) of > maintenance_work_mem for collecting dead tuples. > I felt that it would not be easy for users to tune maintenance_work_mem when it controls more than one thing. If this is an index AM (GIN) specific issue we might rather want to control the memory limit of pending list cleanup by a separate GUC parameter like gin_pending_list_limit, say gin_pending_list_work_mem. And we can set (the memory for GIN pending list cleanup / # of GIN indexes) for the parallel workers. Regards, -- Masahiko Sawada
Re: WIP: raise error when submitting invalid ALTER SYSTEM command
"Jordan Deitch" writes: > ALTER SYSTEM currently does not raise error upon invalid entry. You mean on invalid combinations of entries. Take for example: > alter system set superuser_reserved_connections = 10; > ALTER SYSTEM > alter system set max_connections = 5; > ALTER SYSTEM > The database will now fail to restart without manual intervention by way of > editing the autoconf file (which says "# Do not edit this file manually!" on > its first line). Yeah. That's unfortunate, but ... > The attached WIP patch is intended to raise an error on invalid ALTER SYSTEM > commands before being written out to the filesystem. Hopefully this behavior > is more intuitive. There's no chance that you can make this work. We've had unpleasant experiences with previous attempts to implement cross-checks between GUC variables; in general, they created more problems than they fixed. A specific issue with what you've got here is that it only checks values that are proposed to be put into postgresql.auto.conf, without regard to other value sources such as postgresql.conf or built-in defaults. You also managed to break the system's defenses against invalid combinations that arise from such other sources --- taking out those error checks in PostmasterMain is completely unsafe. Also, even if you believe that O(N^2) behavior isn't a problem, this programming approach doesn't scale to cases where more than two variables contribute to an issue. Somewhere around O(N^3) or O(N^4) there is definitely going to be a threshold of pain. This aspect doesn't seem that hard to fix ... but it's just an efficiency issue, and doesn't speak at all to the fundamental problem that you don't have enough visibility into what the next postmaster run will be seeing. Also, from a code maintenance standpoint, having code in AlterSystemSetConfigFile that tries to know all about not only specific GUCs, but every possible combination of specific GUCs, is just not going to be maintainable. (The real underlying problem there is that those checks in PostmasterMain are merely the tip of the iceberg of error conditions that might cause a postmaster startup failure.) regards, tom lane
Re: Transparent Data Encryption (TDE) and encrypted files
Dear hackers. First, I don't know which email thread I should write this reply to, therefore I am using the first email thread. Sorry about the inconvenience... Sawada-san and I have previously researched which PostgreSQL database cluster files contain user data. The result has been updated on the WIKI page[1], so I share it here. This result is simply a list of files that contain user data, so we can think of it as the first step in classifying which files should be encrypted. About the SLRU files that we have talked about so far, I think that discussions on the necessity of encryption are in progress, and I hope that this list will be useful for that discussion. # In proceeding with the current development, we specified the files to encrypt using the list above. If the survey results are different, it would help this project if you correct the WIKI page. [1] https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#List_of_the_contains_of_user_data_for_PostgreSQL_files Best regards. Moon. On Tue, Oct 1, 2019 at 6:26 AM Bruce Momjian wrote: > > For full-cluster Transparent Data Encryption (TDE), the current plan is > to encrypt all heap and index files, WAL, and all pgsql_tmp (work_mem > overflow). The plan is: > > > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption > > We don't see much value to encrypting vm, fsm, pg_xact, pg_multixact, or > other files. Is that correct? Do any other PGDATA files contain user > data? > > -- > Bruce Momjian http://momjian.us > EnterpriseDB http://enterprisedb.com > > + As you are, so once was I. As I am, so you will be. + > + Ancient Roman grave inscription + > >
Re: Transparent Data Encryption (TDE) and encrypted files
Moon, Insung wrote: > Hello. > > On Tue, Oct 8, 2019 at 8:52 PM Antonin Houska wrote: > > > > Robert Haas wrote: > > > > > On Mon, Oct 7, 2019 at 3:01 PM Antonin Houska wrote: > > > > However the design doesn't seem to be stable enough at the > > > > moment for coding to make sense. > > > > > > Well, I think the question is whether working further on your patch > > > could produce some things that everyone would agree are a step > > > forward. > > > > It would have made a lot of sense several months ago (Masahiko Sawada > > actually > > used parts of our patch in the previous version of his patch (see [1]), but > > the requirement to use a different IV for each execution of the encryption > > changes things quite a bit. > > > > Besides the relation pages and SLRU (CLOG), which are already being > > discussed > > elsewhere in the thread, let's consider other two file types: > > > > * Temporary files (buffile.c): we derive the IV from PID of the process that > > created the file + segment number + block within the segment. This > > information does not change if you need to write the same block again. If > > new IV should be used for each encryption run, we can simply introduce an > > in-memory counter that generates the IV for each block. However it becomes > > trickier if the temporary file is shared by multiple backends. I think it > > might still be easier to expose the IV values to other backends via shared > > memory than to store them on disk ... > > I think encrypt a temporary file in a slightly different way. > Previously, I had a lot of trouble with IV uniqueness, but I have > proposed a unique encryption key for each file. > > First, in the case of the CTR mode to be used, 32 bits are used for > the counter in the 128-bit nonce value. > Here, the counter increases every time 16 bytes are encrypted, and > theoretically, if nonce 96 bits are the same, a total of 64 GiB can be > encrypted. > Therefore, in the case of buffile.c that creates a temporary file due > to lack of work_mem, it is possible to use up to 1GiB per file, so it > is possible to encrypt to a simple IV value sufficiently safely. > The problem is that a vulnerability occurs when 96-bit nonce values > excluding Counter are the same values. I don't think the lower 32 bits impose any limitation, see CRYPTO_ctr128_encrypt_ctr32() in OpenSSL: if this lower part overflows, the upper part is simply incremented. So it's up to the user to decide what portion of the IV he wants to control and what portion should be controlled by OpenSSL internally. Of course the application design should be such that no overflows into the upper (user specific) part occur because those would result in duplicate IVs. > I also tried to generate IV using PID (32bit) + tempCounter (64bit) at > first, but in the worst-case PID and tempCounter are used in the same > values. > Therefore, the uniqueness of the encryption key was considered without > considering the uniqueness of the IV value. If you consider 64bit counter insufficient (here it seems that tempCounter counts the 1GB segments), then we can't even use LSN as the IV for relation pages. > The encryption key uses a separate key for each file, as described earlier. Do you mean a separate key for the whole temporary file, or for a single (1GB) segment? > First, it generates a hash value randomly for the file, and uses the > hash value and KEK (or MDEK) to derive and use the key with > HMAC-SHA256. 
> In this case, there is no need to store the encryption key separately > if it is not necessary to keep it in a separate IV file or memory. > (IV is a hash value of 64 bits and a counter of 32 bits.) You seem to miss the fact that a user of buffile.c can seek in the file and rewrite an arbitrary part. Thus you'd have to generate a new key for the part being changed. I think it's easier to use the same key for the whole 1GB segment if not for the whole temporary file, and generate a unique IV each time we write a chunk (BLCKSZ bytes). -- Antonin Houska Web: https://www.cybertec-postgresql.com
RE: Global shared meta cache
Hi, Konstantin >>From: Konstantin Knizhnik [mailto:k.knizh...@postgrespro.ru] >>I do not completely understand from your description when we are going >>to evict entry from local cache? >>Just once transaction is committed? I think it will be more efficient >>to also specify memory threshold for local cache size and use LRU or >>some other eviction policy to remove data from local cache. >>So if working set (accessed relations) fits in local cache limit, there >>will be no performance penalty comparing with current implementation. >>There should be completely no difference on pgbench or other benchmarks >>with relatively small number of relations. >> >>If entry is not found in local cache, then we should look for it in >>global cache and in case of double cache miss - read it from the disk. >>I do not completely understand why we need to store references to >>global cache entries in local cache and use reference counters for global cache >entries. >>Why we can not maintain just two independent caches? >> >>While there are really databases with hundreds and even thousands of >>tables, application is still used to work with only some small subset of them. >>So I think that "working set" can still fit in memory. This is why I >>think that in case of local cache miss and global cache hit, we should >>copy data from global cache to local cache to make it possible to access it in future >without any synchronization. >> >>As far as we need to keep all uncommitted data in local cache, there is >>still a chance of local memory overflow (if some transaction creates or >>alters too much number of tables). >>But I think that it is very exotic and rare use case. The problem with >>memory overflow usually takes place if we have large number of >>backends, each maintaining its own catalog cache. >>So I think that we should have "soft" limit for local cache and "hard" >>limit for global cache. > >Oh, I hadn't come up with this idea at all. So the local cache is sort of a 1st-level cache and the global cache >is a 2nd-level cache. That sounds great. >It would be good for performance, and setting two GUC parameters for limiting the local >cache and global cache gives the DBA complete memory control. >Yeah, uncommitted data should be in the local cache, but it's the only exception. >Not keeping track of references to the global cache from the local cache header seems less >complex to implement. I'll look into the design. (After sleeping on it) What happens if there is a cache miss in local memory but the entry is found in the global cache? One possible way is to copy the found global entry into local memory; if so, I'm just anxious about the cost of the memcpy. Another way is, for example, to leave the entry in the global cache and not copy it into local memory. In this case, searching the global cache every time seems expensive because we need to take a lock on at least a partition of the hash table. The architecture where the local cache holds a reference to the global cache (strictly speaking, a pointer to a pointer to the global cache entry) is more complex, but once a process has searched the global cache, after that it can get the global entry by checking that the reference is still valid and traversing some pointers. Regards, Takeshi Ideriha
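To picture the layout being discussed - a rough sketch, assuming PostgreSQL's port/atomics.h and lib/ilist.h; all names are hypothetical:

    /*
     * Global entries live in shared memory; local entries pin them and
     * revalidate the reference on each use.
     */
    typedef struct GlobalCatCacheEntry
    {
        pg_atomic_uint32 refcount;  /* pins held by local caches */
        bool        dead;           /* set by invalidation instead of freeing */
        /* ... shared catalog tuple data follows ... */
    } GlobalCatCacheEntry;

    typedef struct LocalCatCacheEntry
    {
        GlobalCatCacheEntry **global_ref;   /* pointer to pointer; re-checked
                                             * before each access */
        dlist_node  lru_node;       /* for the "soft" local memory limit */
        /* ... uncommitted, local-only data, if any ... */
    } LocalCatCacheEntry;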
Re: Ordering of header file inclusion
On Tue, Oct 8, 2019 at 8:19 PM Tom Lane wrote: > > Amit Kapila writes: > > On Wed, Oct 2, 2019 at 2:57 PM vignesh C wrote: > >> I noticed that some of the header files inclusion is not ordered as > >> per the usual standard that is followed. > >> The attached patch contains the fix for the order in which the header > >> files are included. > >> Let me know your thoughts on the same. > > > +1. > > FWIW, I'm not on board with reordering system-header inclusions. > Some platforms have (had?) ordering dependencies for those, and where > that's true, it's seldom alphabetical. It's only our own headers > where we can safely expect that any arbitrary order will work. > Okay, that makes sense. However, I noticed that the ordering for system-header inclusions is somewhat random. For ex. nodeSubPlan.c, datetime.c, etc. include limits.h first and then math.h whereas knapsack.c and float.c include them in reverse order. There could be more such inconsistencies and the probable reason is that we don't have any specific rule, so different people decide to do it differently. > > I think we shouldn't remove the extra line as part of the above change. > > I would take out the blank lines between our own #includes. > Okay, that would be better, but doing it half-heartedly as done in the patch might make it worse. So, it is better to remove blank lines between our own #includes in all cases. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
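For reference, the layout under discussion looks like this in a typical .c file (system headers in whatever order the platform needs, our own headers alphabetical, no blank lines between them):

    #include "postgres.h"

    #include <limits.h>
    #include <math.h>

    #include "access/xact.h"
    #include "nodes/nodeFuncs.h"
    #include "utils/builtins.h"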
Re: Standby accepts recovery_target_timeline setting?
On Wed, Oct 9, 2019 at 1:02 AM Stephen Frost wrote: > > Greetings, > > * Fujii Masao (masao.fu...@gmail.com) wrote: > > On Tue, Oct 8, 2019 at 10:58 PM Stephen Frost wrote: > > > Having these options end up set but then hacking all of the other code > > > that looks at them to check if we're actually in recovery or not would > > > end up being both confusing to users as well as an ongoing source of > > > bugs (which has already been made clear by the fact that we're having > > > this discussion...). Wouldn't that also mean we would need to hack the > > > 'show' code, to blank out the recovery_target_name variable if we aren't > > > in recovery? Ugh. > > > > Isn't this overkill? This doesn't seem to be a problem only for recovery-related > > settings. We already have a similar issue with other settings. > > For example, the log_directory parameter is ignored when logging_collector is > > not enabled. But SHOW log_directory reports the setting value even when > > logging_collector is disabled. This seems like a similar issue and might be > > confusing, but we could live with that. > > I agree it's a similar issue. I disagree that it's actually sensible > for us to do so and would rather contend that it's confusing and not > good. > > We certainly do a lot of smart things in PG, but showing the value of > variables that aren't accurate, and we *know* they aren't, hardly seems > like something we should be saying "this is good and ok, so let's do > more of this." > > I'd rather argue that this just shows that we need to come up with a > solution in this area. I don't think it's *as* big of a deal when it > comes to logging_collector/log_directory because, at least there, you > don't even start to get into the same code paths where it matters, like > you end up doing with the recovery targets and crash recovery, so the > chances of bugs creeping in are less in the log_directory case. > > I still don't think it's great though and, yes, would prefer that we > avoid having log_directory set when logging_collector isn't in use. There are other parameters having a similar issue, for example:

- parameters for SSL connections when ssl is disabled
- parameters for autovacuum activity when autovacuum is disabled
- parameters for Hot Standby when hot_standby is disabled

etc. Yeah, it's better to make the SHOW command handle these parameters "less confusingly". But I cannot wait for the solution for them before fixing the original issue in v12 (i.e., the issue where restore_command can be executed even in crash recovery). So, barring any objection, I'd like to commit the patch that I attached upthread, soon. The patch prevents restore_command and recovery_end_command from being executed in crash recovery. Thoughts? Regards, -- Fujii Masao
Re: Transparent Data Encryption (TDE) and encrypted files
Dear Antonin Houska. Thank you for your attention to this matter. On Wed, Oct 9, 2019 at 2:42 PM Antonin Houska wrote: > > Moon, Insung wrote: > > > Hello. > > > > On Tue, Oct 8, 2019 at 8:52 PM Antonin Houska wrote: > > > > > > Robert Haas wrote: > > > > > > > On Mon, Oct 7, 2019 at 3:01 PM Antonin Houska wrote: > > > > > However the design doesn't seem to be stable enough at the > > > > > moment for coding to make sense. > > > > > > > > Well, I think the question is whether working further on your patch > > > > could produce some things that everyone would agree are a step > > > > forward. > > > > > > It would have made a lot of sense several months ago (Masahiko Sawada actually > > > used parts of our patch in the previous version of his patch (see [1]), but > > > the requirement to use a different IV for each execution of the encryption > > > changes things quite a bit. > > > > > > Besides the relation pages and SLRU (CLOG), which are already being discussed > > > elsewhere in the thread, let's consider other two file types: > > > > > > * Temporary files (buffile.c): we derive the IV from PID of the process that > > > created the file + segment number + block within the segment. This > > > information does not change if you need to write the same block again. If > > > new IV should be used for each encryption run, we can simply introduce an > > > in-memory counter that generates the IV for each block. However it becomes > > > trickier if the temporary file is shared by multiple backends. I think it > > > might still be easier to expose the IV values to other backends via shared > > > memory than to store them on disk ... > > > > I think encrypt a temporary file in a slightly different way. > > Previously, I had a lot of trouble with IV uniqueness, but I have > > proposed a unique encryption key for each file. > > > > First, in the case of the CTR mode to be used, 32 bits are used for > > the counter in the 128-bit nonce value. > > Here, the counter increases every time 16 bytes are encrypted, and > > theoretically, if nonce 96 bits are the same, a total of 64 GiB can be > > encrypted. > > > Therefore, in the case of buffile.c that creates a temporary file due > > to lack of work_mem, it is possible to use up to 1GiB per file, so it > > is possible to encrypt to a simple IV value sufficiently safely. > > The problem is that a vulnerability occurs when 96-bit nonce values > > excluding Counter are the same values. > > I don't think the lower 32 bits impose any limitation, see > CRYPTO_ctr128_encrypt_ctr32() in OpenSSL: if this lower part overflows, the > upper part is simply incremented. So it's up to the user to decide what > portion of the IV he wants to control and what portion should be controlled by > OpenSSL internally. Of course the application design should be such that no > overflows into the upper (user specific) part occur because those would result > in duplicate IVs. I'm sorry. I seem to have misunderstood. When I rechecked the source code of OpenSSL, as you said, the upper 96-bit value is incremented using the ctr96_inc() function. Sorry..
> > If you consider 64bit counter insufficient (here it seems that tempCounter > counts the 1GB segments), then we can't even use LSN as the IV for relation > pages. The worst-case here is not a lack of tempCounter, but a problem that occurs when PID is reused after a certain period. Of course, it is very unlikely to be a problem because it is a temporary file, but since the file name can know the PID and tempFileCounter, if you accumulate some data, the same key and the same IV will be used to encrypt other data. So I thought there could be a problem. > > The encryption key uses a separate key for each file, as described earlier. > > Do you mean a separate key for the whole temporary file, or for a single (1GB) > segment? Yes, that's right. Use a separate key per file. > > First, it generates a hash value randomly for the file, and uses the > > hash value and KEK (or MDEK) to derive and use the key with > > HMAC-SHA256. > > You seem to miss the fact that a user of buffile.c can seek in the file and > rewrite an arbitrary part. Thus you'd have to generate a new key for the part > being changed. That's right. I wanted to ask this too. Is it possible to overwrite the data already written in the actual buffile.c? Such a problem seems to become a problem when BufFileWrite function is called, and BufFileSeek function is called, and BufFileRead is called. In other words, the file is not written in units of 8kb, but the file is changed in the pos, and some data is read in another pos. I also thought that this would be a problem with re-creating the encrypted file, i.e., IV and key change would be necessary, So far, my research has found no case of overwriting data in the previous pos after it has already been created in File data (where FilWrite is called). Can you tell me the case overwriting buffer file?
Safeguards against incorrect fd flags for fsync()
Hi all, After the set of issues discussed here, it seems to me that it would be a good thing to have some safeguards against incorrect flags when opening a fd which would be used for fsync(): https://www.postgresql.org/message-id/16039-196fc97cc05e1...@postgresql.org Attached is a patch aimed at doing that. Historically O_RDONLY is 0, so when looking at a directory we just need to make sure that no write flags are used. For files, it's the contrary: a write flag has to be used. Thoughts or better ideas? Thanks, -- Michael

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 94be62fa6e..791afcae4a 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -330,6 +330,28 @@ static int fsync_parent_path(const char *fname, int elevel);
 int
 pg_fsync(int fd)
 {
+#ifdef USE_ASSERT_CHECKING
+	struct stat st;
+
+	/*
+	 * On some operating systems fsyncing a file requires O_RDWR, and
+	 * a directory requires O_RDONLY.  Ignore any errors.
+	 */
+	if (fstat(fd, &st) == 0)
+	{
+		int		desc_flags = fcntl(fd, F_GETFL);
+
+		/*
+		 * O_RDONLY is historically 0, so just make sure that for
+		 * directories no write flags are used.
+		 */
+		if (!S_ISDIR(st.st_mode))
+			Assert((desc_flags & (O_RDWR | O_WRONLY)) != 0);
+		else
+			Assert((desc_flags & (O_RDWR | O_WRONLY)) == 0);
+	}
+#endif
+
 	/* #if is to skip the sync_method test if there's no need for it */
 #if defined(HAVE_FSYNC_WRITETHROUGH) && !defined(FSYNC_WRITETHROUGH_IS_FSYNC)
 	if (sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH)

signature.asc Description: PGP signature
Re: [HACKERS] Deadlock in XLogInsert at AIX
On Mon, Oct 07, 2019 at 03:06:35PM -0400, Tom Lane wrote:
> Noah Misch writes:
> > [ fetch-add-gcc-xlc-unify-v2.patch ]
>
> This still fails on Apple's compilers.  The first failure I get is
>
> ccache gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
> -Wformat-security -fno-strict-aliasing -fwrapv -g -O2 -I../../../src/include
> -I/usr/local/include -isysroot /Developer/SDKs/MacOSX10.5.sdk -c -o
> nodeHashjoin.o nodeHashjoin.c
> /var/tmp//ccXUM8ep.s:449:Parameter error: r0 not allowed for parameter 2
> (code as 0 not r0)
>
> Line 449 of the assembly file is the addi in
>
> LM87:
> 	sync
> 	lwarx	r0,0,r2
> 	addi	r11,r0,1
> 	stwcx.	r11,0,r2
> 	bne	$-12
> 	isync
>
> which I suppose comes out of PG_PPC_FETCH_ADD.  I find this idea of
> constructing assembly code by string-pasting pretty unreadable and am not
> tempted to try to debug it, but I don't immediately see why this doesn't
> work when the existing s_lock.h code does.  I think that the assembler
> error message is probably misleading: while it seems to be saying to
> s/r0/0/ in the addi, gcc consistently uses "rN" syntax for the second
> parameter elsewhere.  I do note that gcc never generates r0 as addi's
> second parameter in several files I checked through, so maybe what it
> means is "you need to use some other register"?  (Which would imply that
> the constraint for this asm argument is too loose.)

Thanks for testing.  That error boils down to "need to use some other
register".  The second operand of addi is one of the ppc instruction
operands that can hold a constant zero or a register number[1], so the
proper constraint is "b".  I've made it so and added a comment.

I should probably update s_lock.h, too, in a later patch.  I don't know
how it has mostly avoided this failure mode, but its choice of constraint
could explain
https://postgr.es/m/flat/36E70B06-2C5C-11D8-A096-0005024EF27F%40ifrance.com

> I'm also wondering why this isn't following s_lock.h's lead as to
> USE_PPC_LWARX_MUTEX_HINT and USE_PPC_LWSYNC.

Most or all of today's pg_atomic_compare_exchange_u32() usage does not
have the property that the mutex hint would signal.

pg_atomic_compare_exchange_u32() specifies "Full barrier semantics",
which lwsync does not provide.  We might want to relax the specification
to make lwsync acceptable, but that would be a separate,
architecture-independent project.
(generic-gcc.h:pg_atomic_compare_exchange_u32_impl() speculates along
those lines, writing "FIXME: we can probably use a lower consistency
model".)

[1] "Notice that addi and addis use the value 0, not the contents of
GPR 0, if RA=0." --
https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0

diff --git a/configure b/configure
index 7a6bfc2..d6ecb33 100755
--- a/configure
+++ b/configure
@@ -14650,6 +14650,46 @@ $as_echo "$pgac_cv_have_ppc_mutex_hint" >&6; }
 $as_echo "#define HAVE_PPC_LWARX_MUTEX_HINT 1" >>confdefs.h
 
 fi
+# Check if compiler accepts "i"(x) when __builtin_constant_p(x).
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether __builtin_constant_p(x) implies \"i\"(x) acceptance" >&5
+$as_echo_n "checking whether __builtin_constant_p(x) implies \"i\"(x) acceptance... " >&6; }
+if ${pgac_cv_have_i_constraint__builtin_constant_p+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+static inline int
+  addi(int ra, int si)
+  {
+    int res = 0;
+    if (__builtin_constant_p(si))
+      __asm__ __volatile__(
+        " addi %0,%1,%2\n" : "=r"(res) : "r"(ra), "i"(si));
+    return res;
+  }
+  int test_adds(int x) { return addi(3, x) + addi(x, 5); }
+int
+main ()
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  pgac_cv_have_i_constraint__builtin_constant_p=yes
+else
+  pgac_cv_have_i_constraint__builtin_constant_p=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_have_i_constraint__builtin_constant_p" >&5
+$as_echo "$pgac_cv_have_i_constraint__builtin_constant_p" >&6; }
+if test x"$pgac_cv_have_i_constraint__builtin_constant_p" = xyes ; then
+
+$as_echo "#define HAVE_I_CONSTRAINT__BUILTIN_CONSTANT_P 1" >>confdefs.h
+
+fi
 ;;
esac
diff --git a/configure.in b/configure.in
index dde3eec..510bd76 100644
--- a/configure.in
+++ b/configure.in
@@ -1541,6 +1541,26 @@ case $host_cpu in
   if test x"$pgac_cv_have_ppc_mutex_hint" = xyes ; then
     AC_DEFINE(HAVE_PPC_LWARX_MUTEX_HINT, 1, [Define to 1 if the assembler supports PPC's LWARX mutex hint bit.])
   fi
+# Check if compiler accepts "i"(x) when __builtin_constant_p(x).
+AC_CACHE_CHECK([whether __builtin_constant_p(x) implies "i"(x) acceptance],
+  [pgac_cv_have_i_con
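To make the constraint point concrete, here is a sketch of my own (not the
actual patch) of a gcc-style PPC fetch-add in which the lwarx result, which
becomes addi's second operand, is constrained with "b" so the compiler can
never pick r0 for it.  It assumes the addend folds to a compile-time
constant within addi's 16-bit signed immediate range, which is exactly
what the configure probe above tests:

/* Hypothetical illustration, PPC only. */
static inline unsigned int
ppc_fetch_add(volatile unsigned int *ptr, int add)
{
    unsigned int prev;          /* loaded by lwarx; feeds addi's RA */
    unsigned int sum;           /* stored by stwcx. */

    __asm__ __volatile__(
        "	sync			\n"
        "1:	lwarx	%1,0,%3	\n"
        "	addi	%0,%1,%4	\n"
        "	stwcx.	%0,0,%3	\n"
        "	bne-	1b		\n"
        "	isync			\n"
        /* "b" keeps prev out of r0: addi reads RA=0 as the literal 0 */
        : "=&r"(sum), "=&b"(prev), "+m"(*ptr)
        : "r"(ptr), "i"(add)
        : "memory", "cc");
    return prev;
}

With the plain "r" constraint the compiler is free to allocate r0 for the
lwarx result, producing exactly the "addi r11,r0,1" sequence that Apple's
assembler rejected above.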
Re: Transparent Data Encryption (TDE) and encrypted files
Moon, Insung wrote:

> On Wed, Oct 9, 2019 at 2:42 PM Antonin Houska wrote:
> >
> > Moon, Insung wrote:
> >
> > > I also tried to generate the IV using PID (32bit) + tempCounter
> > > (64bit) at first, but in the worst case the same PID and tempCounter
> > > values can be used.
> > > Therefore, the uniqueness of the encryption key was considered
> > > without considering the uniqueness of the IV value.
> >
> > If you consider a 64bit counter insufficient (here it seems that
> > tempCounter counts the 1GB segments), then we can't even use LSN as
> > the IV for relation pages.
>
> The worst case here is not a lack of tempCounter, but a problem that
> occurs when a PID is reused after a certain period.
> Of course, it is very unlikely to be a problem because it is a
> temporary file, but since the PID and tempFileCounter can be derived
> from the file name, if you accumulate some data, the same key and the
> same IV will be used to encrypt other data. So I thought there could
> be a problem.

ok

> > > First, it generates a random hash value for the file, and uses that
> > > hash value and the KEK (or MDEK) to derive the key with HMAC-SHA256.
> > > In this case, there is no need to store the encryption key separately
> > > if it is not necessary to keep it in a separate IV file or memory.
> > > (The IV is a hash value of 64 bits and a counter of 32 bits.)
> >
> > You seem to miss the fact that a user of buffile.c can seek in the
> > file and rewrite an arbitrary part. Thus you'd have to generate a new
> > key for the part being changed.
>
> That's right. I wanted to ask this too.
> Is it possible to overwrite data already written via buffile.c?
> Such a problem would arise when BufFileWrite is called, then
> BufFileSeek, and then BufFileRead.
> In other words, the file is not written in units of 8kB; instead the
> position is changed and some data is read at another position.

v04-0011-Make-buffile.c-aware-of-encryption.patch in [1] changes buffile.c
so that data is read and written in 8kB blocks if encryption is enabled.
In order to record the IV per block, the computation of the buffer
position within the file would have to be adjusted somehow. I can check
it, but not in the next few days.

> I also thought that this would be a problem with re-creating the
> encrypted file, i.e., an IV and key change would be necessary.
> So far, my research has found no case of overwriting data at a previous
> position after it has already been written to the file (where FileWrite
> is called).
> Can you tell me a case that overwrites the buffer file?

(I suppose you mean BufFileWrite(), not FileWrite()). I don't remember if
I ever checked a particular use case in the PG core, but as long as the
buffile.c API allows such a thing to happen, the encryption code needs to
handle it anyway.

v04-0012-Add-tests-for-buffile.c.patch in [1] contains regression tests
that do involve temp file overwriting.

[1] https://www.postgresql.org/message-id/7082.1562337694@localhost

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
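As a concrete reading of the derivation scheme Moon describes (per-file
key = HMAC-SHA256 over a random per-file value under the KEK/MDEK), here
is a minimal sketch using OpenSSL's HMAC(); the function name and the
32-byte output size are my assumptions, not anything from the patch set:

#include <openssl/evp.h>
#include <openssl/hmac.h>

/*
 * Sketch only: derive a per-file encryption key as
 * HMAC-SHA256(KEK, per-file random nonce).  The key itself never needs
 * to be stored, as long as the nonce is kept alongside the file.
 */
static void
derive_file_key(const unsigned char *kek, size_t kek_len,
                const unsigned char *file_nonce, size_t nonce_len,
                unsigned char key_out[32])
{
    unsigned int out_len = 32;  /* SHA-256 digest size */

    HMAC(EVP_sha256(), kek, (int) kek_len,
         file_nonce, nonce_len, key_out, &out_len);
}

Note that this does nothing about the seek-and-rewrite problem discussed
above: if a block is overwritten in place, the same key and IV would be
reused for different plaintext unless the key or IV is changed for the
rewritten part.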