Re: Key management with tests
Dear All. Thank you for all opinions and discussions regarding the KMS/TDE function. First of all, to get to the point of this email, I want to participate in anything I can do (review or development) when TDE related development is in progress. If there is a meeting related to it, I can't communicate because of my poor English skills, but I would like to attend if it is only possible to listen. I didn't understand KMS and didn't participate in the direct development, so I didn't comment on anything so far. Still, when TDE development starts, I wanted to join in the discussion and meeting if there was anything I could do. However, since I have a complicated and insufficient English ability to communicate in English, maybe I will rarely say anything in meetings (voice and video meetings). But I would like to attend the discussion if it is only possible to listen. Also, if the wiki page and other mail threads related to TDE start, I'll join in discussions if there is anything I can do. Best regards. Moon. On Sat, Jan 30, 2021 at 10:23 PM Tom Kincaid wrote: > > > > > > Thanks Stephen, Bruce and Masahiko, > >> >> > discussions so far and the point behind the design so that everyone >> > can understand why this feature is designed in that way. To do that, >> > it might be a good start to sort the wiki page since it has data >> > encryption part, KMS, and ToDo mixed. >> >> I hope it's pretty clear that I'm also very much in support of both this >> effort with the KMS and of TDE in general- TDE is specifically, >> repeatedly, called out as a capability whose lack is blocking PG from >> being able to be used for certain use-cases that it would otherwise be >> well suited for, and that's really unfortunate. > > > It is clear you are supportive. > > As you know, I share your point of view that PG adoption is suffering for > certain use cases because it does not have TDE. > >> I appreciate the recent discussion and reviews of the KMS in particular, >> and of the patches which have been sent enabling TDE based on the KMS >> patches. Having them be relatively independent seems to be an ongoing >> concern and perhaps we should figure out a way to more clearly put them >> together. That is- the KMS patches have been posted on one thread, and >> TDE PoC patches which use the KMS patches have been on another thread, >> leading some to not realize that there's already been TDE PoC work done >> based on the KMS patches. Seems like it might make sense to get one >> patch set which goes all the way from the KMS and includes the TDE PoC, >> even if they don't all go in at once. > > > Sounds good, thanks Masahiko, let's see if we can get consensus on the > approach for moving this forward see below. > >> >> >> together, as a few on this thread have voiced, but there's no doubt that >> this is a large project and it's hard to see how we could possibly >> commit all of it at once. > > > I propose that we meet to discuss what approach we want to use to move TDE > forward. We then start a new thread with a proposal on the approach and > finalize it via community consensus. I will invite Bruce, Stephen and > Masahiko to this meeting. If anybody else would like to participate in this > discussion and subsequently in the effort to get TDE in PG1x, please let me > know. Assuming Bruce, Stephen and Masahiko are down for this, I (or a > volunteer from this meeting) will post the proposal for how we move this > patch forward in another thread. Hopefully, we can get consensus on that and > subsequently restart the execution of delivering this feature. > > > > >> >> Thanks! >> >> Stephen > > > > -- > Thomas John Kincaid >
Re: Transparent Data Encryption (TDE) and encrypted files
Dear Tels. On Tue, Oct 1, 2019 at 4:33 PM Tels wrote: > > Moin, > > On 2019-09-30 23:26, Bruce Momjian wrote: > > For full-cluster Transparent Data Encryption (TDE), the current plan is > > to encrypt all heap and index files, WAL, and all pgsql_tmp (work_mem > > overflow). The plan is: > > > > > > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption > > > > We don't see much value to encrypting vm, fsm, pg_xact, pg_multixact, > > or > > other files. Is that correct? Do any other PGDATA files contain user > > data? > > IMHO the general rule in crypto is: encrypt everything, or don't bother. > > If you don't encrypt some things, somebody is going to find loopholes > and sidechannels > and partial-plaintext attacks. Just a silly example: If you trick the DB > into putting only one row per page, > any "bit-per-page" map suddenly reveals information about a single > encrypted row that it shouldn't reveal. > > Many people with a lot of free time on their hands will sit around, > drink a nice cup of tea and come up > with all sorts of attacks on these things that you didn't (and couldn't) > anticipate now. This is my thinks, but to minimize overhead, we try not to encrypt data that does not store confidential data. And I'm not a security expert, so my thoughts may be wrong. But isn't it more dangerous to encrypt predictable data? For example, when encrypting data other than the data entered by the user, it is possible(maybe..) to predict the plain text data. And if these data are encrypted, I think that there will be a security problem. Of course, the encryption key will use separately. But I thought it would be a problem if there were confidential data encrypted using the same key as the attacked data. Best regards. Moon. > > So IMHO it would be much better to err on the side of caution and > encrypt everything possible. > > Best regards, > > Tels > >
Re: Transparent Data Encryption (TDE) and encrypted files
Dear Magnus Hagander. On Tue, Oct 1, 2019 at 5:37 PM Magnus Hagander wrote: > > > > On Tue, Oct 1, 2019 at 9:33 AM Tels wrote: >> >> Moin, >> >> On 2019-09-30 23:26, Bruce Momjian wrote: >> > For full-cluster Transparent Data Encryption (TDE), the current plan is >> > to encrypt all heap and index files, WAL, and all pgsql_tmp (work_mem >> > overflow). The plan is: >> > >> > >> > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption >> > >> > We don't see much value to encrypting vm, fsm, pg_xact, pg_multixact, >> > or >> > other files. Is that correct? Do any other PGDATA files contain user >> > data? >> >> IMHO the general rule in crypto is: encrypt everything, or don't bother. >> >> If you don't encrypt some things, somebody is going to find loopholes >> and sidechannels >> and partial-plaintext attacks. Just a silly example: If you trick the DB >> into putting only one row per page, >> any "bit-per-page" map suddenly reveals information about a single >> encrypted row that it shouldn't reveal. >> >> Many people with a lot of free time on their hands will sit around, >> drink a nice cup of tea and come up >> with all sorts of attacks on these things that you didn't (and couldn't) >> anticipate now. >> >> So IMHO it would be much better to err on the side of caution and >> encrypt everything possible. > > > +1. > > Unless we are *absolutely* certain, I bet someone will be able to find a > side-channel that somehow leaks some data or data-about-data, if we don't > encrypt everything. If nothing else, you can get use patterns out of it, and > you can make a lot from that. (E.g. by whether transactions are using > multixacts or not you can potentially determine which transaction they are, > if you know what type of transactions are being issued by the application. In > the simplest case, there might be a single pattern where multixacts end up > actually being used, and in that case being able to see the multixact data > tells you a lot about the system). > > As for other things -- by default, we store the log files in text format in > the data directory. That contains *loads* of sensitive data in a lot of > cases. Will those also be encrypted? Maybe...as a result of the discussion so far, we are not encrypted of the server log. https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#What_to_encrypt.2Fdecrypt I think Encrypting server logs can be a very difficult challenge, and will probably need to develop another application to see the encrypted server logs. Best regards. Moon. > > -- > Magnus Hagander > Me: https://www.hagander.net/ > Work: https://www.redpill-linpro.com/
Re: Transparent Data Encryption (TDE) and encrypted files
Hello. On Tue, Oct 8, 2019 at 8:52 PM Antonin Houska wrote: > > Robert Haas wrote: > > > On Mon, Oct 7, 2019 at 3:01 PM Antonin Houska wrote: > > > However the design doesn't seem to be stable enough at the > > > moment for coding to make sense. > > > > Well, I think the question is whether working further on your patch > > could produce some things that everyone would agree are a step > > forward. > > It would have made a lot of sense several months ago (Masahiko Sawada actually > used parts of our patch in the previous version of his patch (see [1]), but > the requirement to use a different IV for each execution of the encryption > changes things quite a bit. > > Besides the relation pages and SLRU (CLOG), which are already being discussed > elsewhere in the thread, let's consider other two file types: > > * Temporary files (buffile.c): we derive the IV from PID of the process that > created the file + segment number + block within the segment. This > information does not change if you need to write the same block again. If > new IV should be used for each encryption run, we can simply introduce an > in-memory counter that generates the IV for each block. However it becomes > trickier if the temporary file is shared by multiple backends. I think it > might still be easier to expose the IV values to other backends via shared > memory than to store them on disk ... I think encrypt a temporary file in a slightly different way. Previously, I had a lot of trouble with IV uniqueness, but I have proposed a unique encryption key for each file. First, in the case of the CTR mode to be used, 32 bits are used for the counter in the 128-bit nonce value. Here, the counter increases every time 16 bytes are encrypted, and theoretically, if nonce 96 bits are the same, a total of 64 GiB can be encrypted. Therefore, in the case of buffile.c that creates a temporary file due to lack of work_mem, it is possible to use up to 1GiB per file, so it is possible to encrypt to a simple IV value sufficiently safely. The problem is that a vulnerability occurs when 96-bit nonce values excluding Counter are the same values. I also tried to generate IV using PID (32bit) + tempCounter (64bit) at first, but in the worst-case PID and tempCounter are used in the same values. Therefore, the uniqueness of the encryption key was considered without considering the uniqueness of the IV value. The encryption key uses a separate key for each file, as described earlier. First, it generates a hash value randomly for the file, and uses the hash value and KEK (or MDEK) to derive and use the key with HMAC-SHA256. In this case, there is no need to store the encryption key separately if it is not necessary to keep it in a separate IV file or memory. (IV is a hash value of 64 bits and a counter of 32 bits.) Also, currently, the temporary file name is specified by the current PID.tempFileCounter, but if this is set to PID.tempFileCounter.hashvalue, we can encrypt and decrypt in any process thinking about. Reference URL https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption > > * "Buffered transient file". This is to be used instead of OpenTransientFile() > if user needs the option to encrypt the file. (Our patch adds this API to > buffile.c. Currently we use it in reorderbuffer.c to encrypt the data > changes produced by logical decoding, but there should be more use cases.) Agreed. Best regards. Moon. > > In this case we cannot keep the IVs in memory because user can close the > file anytime and open it much later. So we derive the IV by hashing the file > path. However if we should generate the IV again and again, we need to store > it on disk in another way, probably one IV value per block (PGAlignedBlock). > > However since our implementation of both these file types shares some code, > it might yet be easier if the shared temporary file also stored the IV on > disk instead of exposing it via shared memory ... > > Perhaps this is what I can work on, but I definitely need some feedback. > > [1] > https://www.postgresql.org/message-id/CAD21AoBjrbxvaMpTApX1cEsO=8N=nc2xVZPB0d9e-VjJ=ya...@mail.gmail.com > > -- > Antonin Houska > Web: https://www.cybertec-postgresql.com > >
Re: Transparent Data Encryption (TDE) and encrypted files
Dear hackers. First, I don't know which email thread should written a reply, therefore using the first email thread. Sorry about the inconvenience... Sawada-san and I have previously researched the PostgreSQL database cluster file that contains user data. The result has been updated to the WIKI page[1], so share it here. This result is simply a list of files that contain user data, so we can think of it as the first step in classifying which files are encrypted. About the SLUR file that we have talked about so far, I think that discussions are in progress on the necessity of encryption, and I hope that this discussion will be useful. #In proceeding with the current development, we specified an encrypted file using the list above. If the survey results are different, it would be a help for this project if correct to the WIKI page. [1] https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#List_of_the_contains_of_user_data_for_PostgreSQL_files Best regards. Moon. On Tue, Oct 1, 2019 at 6:26 AM Bruce Momjian wrote: > > For full-cluster Transparent Data Encryption (TDE), the current plan is > to encrypt all heap and index files, WAL, and all pgsql_tmp (work_mem > overflow). The plan is: > > > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption > > We don't see much value to encrypting vm, fsm, pg_xact, pg_multixact, or > other files. Is that correct? Do any other PGDATA files contain user > data? > > -- > Bruce Momjian http://momjian.us > EnterpriseDB http://enterprisedb.com > > + As you are, so once was I. As I am, so you will be. + > + Ancient Roman grave inscription + > >
Re: Transparent Data Encryption (TDE) and encrypted files
Dear Antonin Houska. Thank you for your attention to thie matter. On Wed, Oct 9, 2019 at 2:42 PM Antonin Houska wrote: > > Moon, Insung wrote: > > > Hello. > > > > On Tue, Oct 8, 2019 at 8:52 PM Antonin Houska wrote: > > > > > > Robert Haas wrote: > > > > > > > On Mon, Oct 7, 2019 at 3:01 PM Antonin Houska wrote: > > > > > However the design doesn't seem to be stable enough at the > > > > > moment for coding to make sense. > > > > > > > > Well, I think the question is whether working further on your patch > > > > could produce some things that everyone would agree are a step > > > > forward. > > > > > > It would have made a lot of sense several months ago (Masahiko Sawada > > > actually > > > used parts of our patch in the previous version of his patch (see [1]), > > > but > > > the requirement to use a different IV for each execution of the encryption > > > changes things quite a bit. > > > > > > Besides the relation pages and SLRU (CLOG), which are already being > > > discussed > > > elsewhere in the thread, let's consider other two file types: > > > > > > * Temporary files (buffile.c): we derive the IV from PID of the process > > > that > > > created the file + segment number + block within the segment. This > > > information does not change if you need to write the same block again. > > > If > > > new IV should be used for each encryption run, we can simply introduce > > > an > > > in-memory counter that generates the IV for each block. However it > > > becomes > > > trickier if the temporary file is shared by multiple backends. I think > > > it > > > might still be easier to expose the IV values to other backends via > > > shared > > > memory than to store them on disk ... > > > > I think encrypt a temporary file in a slightly different way. > > Previously, I had a lot of trouble with IV uniqueness, but I have > > proposed a unique encryption key for each file. > > > > First, in the case of the CTR mode to be used, 32 bits are used for > > the counter in the 128-bit nonce value. > > Here, the counter increases every time 16 bytes are encrypted, and > > theoretically, if nonce 96 bits are the same, a total of 64 GiB can be > > encrypted. > > > Therefore, in the case of buffile.c that creates a temporary file due > > to lack of work_mem, it is possible to use up to 1GiB per file, so it > > is possible to encrypt to a simple IV value sufficiently safely. > > The problem is that a vulnerability occurs when 96-bit nonce values > > excluding Counter are the same values. > > I don't think the lower 32 bits impose any limitation, see > CRYPTO_ctr128_encrypt_ctr32() in OpenSSL: if this lower part overflows, the > upper part is simply incremented. So it's up to the user to decide what > portion of the IV he wants to control and what portion should be controlled by > OpenSSL internally. Of course the application design should be such that no > overflows into the upper (user specific) part occur because those would result > in duplicate IVs. I'm sorry. I seem to have misunderstood. If I rechecked the source code of OpenSSL, as you said, it is assumed that the upper 96bit value is changed using the ctr96_inc() function. Sorry.. > > > I also tried to generate IV using PID (32bit) + tempCounter (64bit) at > > first, but in the worst-case PID and tempCounter are used in the same > > values. > > Therefore, the uniqueness of the encryption key was considered without > > considering the uniqueness of the IV value. > > If you consider 64bit counter insufficient (here it seems that tempCounter > counts the 1GB segments), then we can't even use LSN as the IV for relation > pages. The worst-case here is not a lack of tempCounter, but a problem that occurs when PID is reused after a certain period. Of course, it is very unlikely to be a problem because it is a temporary file, but since the file name can know the PID and tempFileCounter, if you accumulate some data, the same key and the same IV will be used to encrypt other data. So I thought there could be a problem. > > > The encryption key uses a separate key for each file, as described earlier. > > Do you mean a separate key for the whole temporary file, or for a single (1GB) > segment? Yes, that's right. Use a separate key per file. > > > First, it generates a hash value randomly for the file, and uses the > > hash value and KEK (or MD
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Dear Hackers. It's been a long time since I sent a mail. On Sat, Aug 24, 2019 at 9:27 AM Bruce Momjian wrote: > On Fri, Aug 23, 2019 at 10:35:17AM -0400, Stephen Frost wrote: > > > Agreed. The features of other databases are a clear source for what we > > > should consider and run through the useful/reasonable filter. > > > > Following on from that- when other databases don't have something that > > we're thinking about implementing, maybe we should be contemplating if > > it really makes sense as a requirement for us. > > Yes, that's a good point. > > > Specifically in this case- I went back and tried to figure out what > > other database systems have an "encrypt EVERYTHING" option. I didn't > > have much luck finding one though. So I think we need to ask ourselves- > > the "check box" that we're trying to check off with TDE, do the other > > database system check that box? If so, then it looks like the "check > > box" isn't actually "encrypt EVERYTHING", it's more along the lines of > > "make sure all regular user data is encrypted automatically" or some > > such, and that's a very different requirement, which seems to be > > answered by the other systems by having a KMS + tablespace/database > > level encryption. We certainly shouldn't be putting a lot of effort > > into building something that is either overkill or won't be interesting > > to users due to limitations like "have to take the entire cluster > > offline to re-key it". > > Well, I think they might do that to reduce encryption overhead. I think > tests have shown that is not an issue, but we will need to test further. > I am not sure of the downside of encrypting everything, since it leaks > the least information and has a minimal user API and code impact. What > is the value of encrypting only the user rows? Better key control? > Maybe my think can be very wrong. Please understand. I think that the value of encrypting with granularity rather than encrypting of all clusters. Maybe it is advantageous to manageability. Of course, encrypting of all clusters is an excellent choice because it has minimal impact on the code, and perhaps simply to implement and management APIs. But what about Database user or DBA? I thought of the example below. Suppose we have a system with multiple tenants (Tenant here means table, tablespace, or database.) in one database cluster. (I think it's similar to a cloud service. I think this is going to be a common case in the future.) We need to encrypt for tenant A and not need encryption for tenant B. In this case, is there a reason to encrypt until tenant B? It is a great advantage that the user, which is a characteristic of TDE, is encrypted without considering encryption. But there is no reason to encrypt even a tenant that does not require encryption. And another example, in terms of key management, I thought to encrypt with granularity was a good choice (especially key rotation). A special situation where A and B tenants need encryption and A tenant needs to rotate the key once every three months. And B tenant needs to rotate the key once a year. ( Of course, maybe it is a very rare case.) If we encrypt all clusters and do not manage of granularity encryption keys by tenants. And when they run to A tenant key rotated, the B tenant is also rotated together, which can cause operational discomfort. Of course, encrypting of all clusters and creating the managing of granularity keys for each tenant will solve the problem. But if we are implementing this part, I think it's the same as the implementation of granular encryption. Best Regards. Moon. > > > Now, that KMS has to be encrypted using a master key, of course, and we > > have to make sure that it is able to survive across a crash, and it'd > > sure be nice if it was indexed. One option for such a KMS would be > > something entirely external (which could potentially just be another PG > > database or something) but it'd be nice if we had something built-in. > > We might also want it to be replicated (or maybe we don't, as was > > discussed on the call, to allow for a replica to use an independent set > > of keys- of course that leads to issues with pg_rewind and such though). > > I think the replica could use a different key for the relations, but the > WAL key would have to be the same. > > > Anything built-in does seem like it'd be a fair bit of work to get it to > > address those requirements, but that does seem to be what the other > > database systems have done. Unfortunately, their documentation doesn't > > seem to really say exactly what they've done to address that. > > I do like they pgcrypto key support to be per-database so pg_dump will > dump the data encrypted, and with its locked keys. > > -- > Bruce Momjian http://momjian.us > EnterpriseDB http://enterprisedb.com > > + As you are, so once was I. As I am, so you will be. + > + Ancient Roman grave inscription + >
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Dear Hackers. > Specifically in this case- I went back and tried to figure out what > other database systems have an "encrypt EVERYTHING" option. I didn't > have much luck finding one though. So I think we need to ask ourselves- > the "check box" that we're trying to check off with TDE, do the other > database system check that box? If so, then it looks like the "check > box" isn't actually "encrypt EVERYTHING", it's more along the lines of > "make sure all regular user data is encrypted automatically" or some > such, and that's a very different requirement, which seems to be > answered by the other systems by having a KMS + tablespace/database > level encryption. We certainly shouldn't be putting a lot of effort > into building something that is either overkill or won't be interesting > to users due to limitations like "have to take the entire cluster > offline to re-key it". > > Now, that KMS has to be encrypted using a master key, of course, and we > have to make sure that it is able to survive across a crash, and it'd > sure be nice if it was indexed. Sorry, Does KMS here mean key Management System(or Service)? I may be mistaken, but I know that KMS is managing cryptographic keys. In other words, I kept the master key(or KEK) in KMS( not kept to PostgreSQL server-side), and PostgreSQL fetched the master key from KMS, and then encrypt or decrypt it on the PostgreSQL server-side. Of course, some KMS supports encryption function, which is the function to encrypt plain text inside KMS. Is this project aiming to use this function? > > A couple random ideas that probably won't work, but I'll put them out > there for others to shoot down- > > Some kind of 2-phase WAL pass, where we do WAL replay for the > non-encrypted bits first (which would include the KMS) and then go back > and WAL replay the encrypted stuff. Seems terrible. Sorry, Can you tell me an example what is the 2-phase WAL pass? I know that WAL read process is decrypted WAL data when reading an encrypted WAL page(per-page encrypt) or WAL record(per-record encrypt) and then replay. Is this a different case? Best Regards. Moon. > > An independent WAL for the KMS only. Ugh, do we need another walwriter > then? and buffers, and lots of other stuff. > > Some kind of flat-file based approach with a temp file and renaming of > files using durable_rename(), like what we used to do with > pg_shadow/authid, and now do with replorigin_checkpoint and such? > > Something else? > > Thoughts? > > Thanks! > > Stephen
Performance improvement of WAL writing?
Dear Hackers. Currently, the XLogWrite function is written in 8k(or 16kb) units regardless of the size of the new record. For example, even if a new record is only 300 bytes, pg_pwrite is called to write data in 8k units (if it cannot be writing on one page is 16kb written). Let's look at the worst case. 1) LogwrtResult.Flush is 8100 pos. 2) And the new record size is only 100 bytes. In this case, pg_pwrite is called which writes 16 kb to update only 100 bytes. It is a rare case, but I think there is overhead for pg_pwrite for some systems. # For systems that often update one record in one transaction. So what about modifying the XLogWrite function only to write the size that should record? Can this idea benefit from WAL writing performance? If it's OK to improve, I want to do modification. How do you think of it? Best Regards. Moon.
Wrong value in metapage of GIN INDEX.
Dear Hackers. Kuroda-san and I are interested in the GIN index and have been testing various things. While testing, we are found a little bug. Some cases, the value of nEntries in the metapage was set to the wrong value. This is a reproduce of bug situation. =# SET maintenance_work_mem TO '1MB'; =# CREATE TABLE foo(i jsonb); =# INSERT INTO foo(i) select jsonb_build_object('foobar001', i) FROM generate_series(1, 1) AS i; # Input the same value again. =# INSERT INTO foo(i) select jsonb_build_object('foobar001', i) FROM generate_series(1, 1) AS i; # Creates GIN Index. =# CREATE INDEX foo_idx ON foo USING gin (i jsonb_ops); =# SELECT * FROM gin_metapage_info(get_raw_page('foo_idx', 0)) WITH (fastupdate=off); -[ RECORD 1 ]+--- pending_head | 4294967295 pending_tail | 4294967295 tail_free_size | 0 n_pending_pages | 0 n_pending_tuples | 0 n_total_pages| 74 n_entry_pages| 69 n_data_pages | 4 n_entries| 20004 <--★ version | 2 In this example, the nentries value should be 10001 because the gin index stores duplicate values in one leaf(posting tree or posting list). But, if look at the nentries value of metapage using pageinspect, it is stored as 20004. So, Let's run the vacuum. =# VACUUM foo; =# SELECT * FROM gin_metapage_info(get_raw_page('foo_idx', 0)); -[ RECORD 1 ]+--- pending_head | 4294967295 pending_tail | 4294967295 tail_free_size | 0 n_pending_pages | 0 n_pending_tuples | 0 n_total_pages| 74 n_entry_pages| 69 n_data_pages | 4 n_entries| 10001 <--★ version | 2 Ah. Run to the vacuum, nEntries is changing the normal value. There is a problem with the ginEntryInsert function. That calls the table scan when creating the gin index, ginBuildCallback function stores the new heap value inside buildstate struct. And next step, If GinBuildState struct is the size of the memory to be using is equal to or larger than the maintenance_work_mem value, run to input value into the GIN index. This process is a function called ginEnctryInsert. The ginEntryInsert function called at this time determines that a new entry is added and increase the value of nEntries. However currently, ginEntryInsert is first to increase in the value of nEntries, and to determine if there are the same entries in the current GIN index. That causes the bug. The patch is very simple. Fix to increase the value of nEntries only when a non-duplicate GIN index leaf added. This bug detection and code fix worked with Kuroda-san. Best Regards. Moon. GIN_Metapage_bugfix.patch Description: Binary data
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Hello. On Thu, Oct 31, 2019 at 11:25 PM Masahiko Sawada wrote: > > On Fri, Sep 6, 2019 at 3:34 PM Smith, Peter > wrote: > > > > -Original Message- > > From: Masahiko Sawada Sent: Thursday, 15 August > > 2019 7:10 PM > > > > > BTW I've created PoC patch for cluster encryption feature. Attached patch > > > set has done some items of TODO list and some of them can be used even > > > for finer granularity encryption. Anyway, the implemented components are > > > followings: > > > > Hello Sawada-san, > > > > I guess your original patch code may be getting a bit out-dated by the > > ongoing TDE discussions, but I have done some code review of it anyway. > > > > Hopefully a few comments below can still be of use going forward: > > > > --- > > > > REVIEW COMMENTS > > > > * src/backend/storage/encryption/enc_cipher.c – For functions > > EncryptionCipherValue/String maybe should log warnings for unexpected > > values instead of silently assigning to default 0/”off”. > > > > * src/backend/storage/encryption/enc_cipher.c – For function > > EncryptionCipherString, purpose of returning ”unknown” if unclear because > > that will map back to “off” again anyway via EncryptionCipherValue. Why not > > just return "off" (with warning logged). > > > > * src/include/storage/enc_common.h – Typo in comment: "Encrypton". > > > > * src/include/storage/encryption.h - The macro DataEncryptionEnabled may be > > better to be using enum TDE_ENCRYPTION_OFF instead of magic number 0 > > > > * src/backend/storage/encryption/kmgr.c - Function BootStrapKmgr will > > report error if USE_OPENSSL is not defined. The check seems premature > > because it would fail even if the user is not using encryption. Shouldn't > > the lack of openssl be OK when user is not using TDE at all (i.e. when > > encryption is "none")? > > > > * src/backend/storage/encryption/kmgr.c - In function BootStrapMgr suggest > > better to check if (bootstrap_data_encryption_cipher == TDE_ENCRYPTION_OFF) > > using enum instead of the magic number 0. > > > > * src/backend/storage/encryption/kmgr.c - The function > > run_cluster_passphrase_command function seems mostly a clone of an existing > > run_ssl_passphrase_command function. Is it possible to refactor to share > > the common code? > > > > * src/backend/storage/encryption/kmgr.c - The function > > derive_encryption_key declares a char key_len. Why char? It seems int > > everywhere else. > > > > * src/backend/bootstrap/bootstrap.c - Suggest better if variable > > declaration bootstrap_data_encryption_cipher = 0 uses enum > > TDE_ENCRYPTION_OFF instead of magic number 0 > > > > * src/backend/utils/misc/guc.c - It looks like the default value for GUC > > variable data_encryption_cipher is AES128. Wouldn't "off" be the more > > appropriate default value? Otherwise it seems inconsistent with the logic > > of initdb (which insists that the -e option is mandatory if you wanted any > > encryption). > > > > * src/backend/utils/misc/guc.c - There is a missing entry in the > > config_group_names[]. The patch changed the config_group[] in guc_tables.h, > > so I think there needs to be a matching item in the config_group_names. > > > > * src/bin/initdb/initdb.c - The function check_encryption_cipher would > > disallow an encryption value of "none". Although maybe it is not very > > useful to say -e none, it does seem inconsistent to reject it, given that > > "none" was a valid value for the GUC variable data_encryption_cipher. > > > > * contrib/bloom/blinsert.c - In function btbuildempty the arguments for > > PageEncryptionInPlace seem in the wrong order (forknum should be 2nd). > > > > * src/backend/access/hash/hashpage.c - In function _hash_alloc_buckets the > > arguments for PageEncryptionInPlace seem in the wrong order (forknum should > > be 2nd). > > > > * src/backend/access/spgist/spginsert.c - In function spgbuildempty the > > arguments for PageEncryptionInPlace seem in the wrong order (forknum should > > be 2nd). This error looks repeated 3X. > > > > * in multiple files - The encryption enums have equivalent strings ("off", > > "aes-128", "aes-256") but they are scattered as string literals in many > > places (e.g. pg_controldata.c, initdb.c, guc.c, enc_cipher.c). Suggest it > > would be better to declare those as string constants in one place only.em > > > > Thank you for reviewing this patch. > > I've updated TDE patches. These patches implements key system, buffer > encryption and WAL encryption. Please refer to ToDo of cluster-wide > encryption for more details of design and components. It lacks > temporary file encryption and front end tools encryption. For > temporary file encryption, we are discussing which files should be > encrypted on other thread and I thought that temporary file encryption > might be related to that. So I'm currently studying the temporary > encryption patch that Antonin already submitted[1] but some changes > might be needed based on that discu
Exposure related to GUC value of ssl_passphrase_command
Deal Hackers. The value of ssl_passphrase_command is set so that an external command is called when the passphrase for decrypting an SSL file such as a private key is obtained. Therefore, easily set to work with echo "passphrase" or call to another get of passphrase application. I think that this GUC value doesn't contain very sensitive data, but just in case, it's dangerous to be visible to all users. I think do not possible these cases, but if a used echo external commands or another external command, know what application used to get the password, maybe we can't be convinced that there's the safety of using abuse by backtracking on applications. So I think to the need only superusers or users with the default role of pg_read_all_settings should see these values. Patch is very simple. How do you think about my thoughts like this? Best regards. Moon. Change-show-authority-of-ssl_passphrase_command.diff Description: Binary data
Re: Wrong value in metapage of GIN INDEX.
Dear Tom Lane. On Tue, Nov 5, 2019 at 3:55 AM Tom Lane wrote: > > "imai.yoshik...@fujitsu.com" writes: > > Moon-san, kuroda.keisuke-san > > On Thu, Aug 29, 2019 at 8:20 AM, Moon, Insung wrote: > >> The patch is very simple. > >> Fix to increase the value of nEntries only when a non-duplicate GIN index > >> leaf added. > > > Does nentries show the number of entries in the leaf pages? > > If so, the fix seems to be correct. > > I looked at this issue. The code in ginEntryInsert is not obviously wrong > by itself; it depends on what you think nEntries is supposed to count. > However, ginvacuum.c updates nEntries to the sum of PageGetMaxOffsetNumber() > across all the index's leaf pages, ie the number of surviving leaf items. > > It's hard to see how ginvacuum could reverse-engineer a value that would > match what ginEntryInsert is doing, so probably we ought to define > nEntries as the number of leaf items, which seems to make the proposed > patch correct. (It could use a bit more commentary though.) > > I'm inclined to apply this to HEAD only; it doesn't seem significant > enough to justify back-patching. Thank you for review and push to patch. Yes. I don't think it's a bug that has a big impact like your opinion. Best regards. Moon. > > regards, tom lane
RE: Typo: llvm*.cpp files identified as llvm*.c
Dear Hackers. > -Original Message- > From: Thomas Munro [mailto:thomas.mu...@enterprisedb.com] > Sent: Wednesday, January 23, 2019 9:38 AM > To: Amit Langote > Cc: Andres Freund; Pg Hackers > Subject: Re: Typo: llvm*.cpp files identified as llvm*.c > > On Wed, Jan 23, 2019 at 1:16 PM Amit Langote > wrote: > > On 2019/01/23 4:51, Andres Freund wrote: > > > On 2019-01-22 13:43:32 +0900, Amit Langote wrote: > > >> Attached find a patch to fix $subject. > > > > > > Thanks, pushed to master and 11. > > > > Thank you. > > It's not only the ending that's wrong. Here are some more source files whose > IDENTIFICATION heading doesn't exactly match > their path: I found the same problem as you while checking the typo patch. So I attached a patch that modified the correct file path in IDENTIFICATION. Regards. Moon. > > $ git grep -A 1 IDENTIFICATION | grep -v IDENTIFICATION | grep -v -- > -- | sed 's/-[^a-z][^a-z]*/ /' | awk '{ if ($1 != $2) print; }' > doc/src/sgml/lobj.sgml src/test/examples/testlo.c > src/backend/catalog/pg_publication.c pg_publication.c > src/backend/commands/publicationcmds.c publicationcmds.c > src/backend/commands/subscriptioncmds.c subscriptioncmds.c > src/backend/jit/llvm/llvmjit_inline.cpp > src/backend/lib/llvmjit/llvmjit_inline.cpp > src/backend/jit/llvm/llvmjit_wrap.cpp src/backend/lib/llvm/llvmjit_wrap.cpp > src/backend/optimizer/util/appendinfo.c > src/backend/optimizer/path/appendinfo.c > src/backend/optimizer/util/inherit.c src/backend/optimizer/path/inherit.c > src/backend/replication/logical/logicalfuncs.c > src/backend/replication/logicalfuncs.c > src/backend/replication/logical/reorderbuffer.c > src/backend/replication/reorderbuffer.c > src/backend/replication/logical/snapbuild.c > src/backend/replication/snapbuild.c > src/backend/replication/pgoutput/Makefile src/backend/replication/pgoutput > src/backend/utils/adt/version.c > src/include/replication/pgoutput.h pgoutput.h > > This could be really confusing for erm, future people reading a dot matrix > print-out of the source code? > > -- > Thomas Munro > http://www.enterprisedb.com typo-identification-filepath.patch Description: Binary data
Typo: pgbench.c
Dear hackers. I found a minor typo in the comment pgbench.c. -* always allocate one more in order to accomodate the NULL terminator +* always allocate one more in order to accommodate the NULL terminator Attached fix it. Regards. Moon. Moon, insung NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center typo-pgbench.patch Description: Binary data
RE: Typo: pgbench.c
> -Original Message- > From: Michael Paquier [mailto:mich...@paquier.xyz] > Sent: Wednesday, January 23, 2019 3:01 PM > To: Moon, Insung > Cc: 'Pg Hackers' > Subject: Re: Typo: pgbench.c > > On Wed, Jan 23, 2019 at 02:18:49PM +0900, Moon, Insung wrote: > > Attached fix it. > > Thanks, pushed. Please note that the indentation was not correct after > applying your patch. Thank you. I'll read your advice and the PostgreSQL manual and be careful when posting the future patch. Regards. Moon. > -- > Michael
RE: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Dear Ibrar Ahmed. From: Ibrar Ahmed [mailto:ibrar.ah...@gmail.com] Sent: Thursday, February 07, 2019 4:09 AM To: Moon, Insung Cc: Tom Lane; PostgreSQL-development Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS) On Tue, Jul 3, 2018 at 5:37 PM Moon, Insung wrote: Dear Tom Lane. > -Original Message- > From: Tom Lane [mailto:t...@sss.pgh.pa.us] > Sent: Monday, June 18, 2018 11:52 PM > To: Robert Haas > Cc: Joe Conway; Masahiko Sawada; Moon, Insung; PostgreSQL-development > Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key > Management Service (KMS) > > Robert Haas writes: > > On Mon, Jun 18, 2018 at 10:12 AM, Joe Conway wrote: > >> Not necessarily. Our pages probably have enough predictable bytes to > >> aid cryptanalysis, compared to user data in a column which might not > >> be very predicable. > > > Really? I would guess that the amount of entropy in a page is WAY > > higher than in an individual column value. > > Depending on the specifics of the encryption scheme, having some amount of > known (or guessable) plaintext may allow breaking > the cipher, even if much of the plaintext is not known. This is cryptology > 101, really. > > At the same time, having to have a bunch of independently-decipherable short > field values is not real secure either, especially > if they're known to all be encrypted with the same key. But what you know or > can guess about the plaintext in such cases > would be target-specific, rather than an attack that could be built once and > used against any PG database. > > Yes. If there is known to guessable data of encrypted data, maybe there is > > a possibility of decrypting the encrypted data. > > > > But would it be safe to use an additional encryption mode such as GCM or > > XFS to solve this problem? > > (Do not use the same IV) > > Thank you and Best regards. > > Moon. > > > > > regards, tom lane > Hi Moon, > > Have you done progress on that patch? I am thinking to work on the project > and found that you are already working on it. The last message is almost six > months old. I want to check with you that are you still working on that, if > yes I can help on that by reviewing the patch etc. If you are not working on > that anymore, can you share your done work (if possible)? > -- > Ibrar Ahmed We are currently developing for TDE and integration KMS. So, We will Also be prepared to start a new discussion with the PoC patch as soon as possible. At currently, we have changed the development direction of a per-Tablespace unit by per-table Also, currently researching how to associate with KMIP protocol related to the encryption key for integration with KMS. We talked about this in the Unconference session of PGConf.ASIA, And a week ago, we talked about the development direction of TDE and integration with KMS at FOSDEM PGDAY[1]. We will soon provide PoC with new discussions. Regards. [1] TRANSPARENT DATA ENCRYPTION IN POSTGRESQL AND INTEGRATION WITH KEY MANAGEMENT SERVICES https://www.postgresql.eu/events/fosdem2019/schedule/session/2307-transparent-data-encryption-in-postgresql-and-integration-with-key-management-services/
Re: Exposure related to GUC value of ssl_passphrase_command
Dear Hackers. Thank you for an response. I registered this entry in commifest of 2020-03. # I registered in the security part, but if it is wrong, sincerely apologize for this. And I'd like to review show authority to ssl_ * later and discuss it in a separate thread. Best regards. Moon. On Thu, Feb 13, 2020 at 6:11 PM Peter Eisentraut wrote: > > On 2020-02-13 04:38, Michael Paquier wrote: > > On Thu, Feb 13, 2020 at 11:28:05AM +0900, Kyotaro Horiguchi wrote: > >> I think it is reasonable. > > > > Indeed, that makes sense to me as well. I am adding Peter Eisentraut > > in CC as the author/committer of 8a3d942 to comment on that. > > I'm OK with changing that. > > >> By the way, I'm not sure the criteria of setting a GUC variable as > >> GUC_SUPERUSER_ONLY, but for example, ssl_max/min_protocol_version, > >> dynamic_library_path, log_directory, krb_server_keyfile, > >> data_directory and config_file are GUC_SUPERUSER_ONLY. So, it seems to > >> me very strange that ssl_*_file are not. Don't we need to mark them > >> maybe and some of the other ssl_* as the same? > > > > This should be a separate discussion IMO. Perhaps there is a point in > > softening or hardening some of them. > > I think some of this makes sense, and we should have a discussion about it. > > -- > Peter Eisentraut http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > >
Flexible pglz_stategy values and delete const.
Dear Hackers. For the current PostgreSQL, the toast data and some WAL data (page data) are compressed using the LZ-based algorithm. For PostgreSQL, two types of strategy(PGLZ_strategy_default, PGLZ_strategy_always) are provided by default, and only used of PGLZ_strategy_default(default values) on PostgreSQL core. For PGLZ_strategy_default values: 1. reduce the size of compressed data by 25% or more 2. original data between 32 bytes and INT_MAX bytes 3. When the first 1 kb data is compressed (addition to the dictionary and reused) PGLZ Compression will only succeed if these conditions are met. However, some users may want to be compressed, requiring a slightly more loose condition. So how about things modify a flexible strategy that is not a fixed setting to compression strategy, to allow the user to set it? Perhaps the configurable value is min_input_size min_comp_rate first_supcess_by I want to add this to GUC so that is can set the minimum compressible file size and compression rate. The compression-related strategy is applied only when compressed. Decompression does not use strategy, so the old compressed data is not affected by the new patch. What do you think of this proposal? If there are any agree to this proposal, I want to write patches.
Re: Flexible pglz_stategy values and delete const.
Hello. On Tue, Feb 18, 2020 at 1:20 PM Michael Paquier wrote: > > On Mon, Feb 17, 2020 at 11:04:47AM +0900, Moon, Insung wrote: > > The compression-related strategy is applied only when compressed. > > Decompression does not use strategy, so the old compressed data is not > > affected by the new patch. > > This may be a difficult question, but do you have examples of > workloads which could benefit of having such reloptions? It is not > the kind of options which are easy to tune, still a POC should be > simple enough to implement to show your point. Thank you for your response. In fact, I have not a very specific workload, but for the sake of simplicity test workload, I'm created TOAST data using my PoC to check if compression was successful. For default pglz_strategy ... =# create table foo (i text); =# insert INTO foo values (random_string_simple (1)); # random_string_simple function is that generates random numbers(text) from 0 to 9. result = Compression failure [1] # This is the log I added, if compression failed. For change pglz_strategy in the made POC ... (Here I set the min_comp_rate strategy more loosely 25 to 10) =# Insert INTO foo values (random_string_simple (1)); result = Compression Success [1] [7631] Compression was successful, and data was reduced from 1byte to 7631byte(about 24%), including the TOAST header. In other words, such a case may occur, and instead of compressing with a fixed strategy, I want to give a choice to users who need a compression strategy. Of course, compression may cause performance degradation, but some users may want to minimize disk usage. And, in the case of FPW, there are more cases where the size of the WAL is smaller, so there may be cases in which the SR environment is advantageous. I will send my POC for testing and discussing it as soon as possible. Of course, test modules and documentation modifications are not included. Best regards. Moon. > -- > Michael
Re: Internal key management system
Dear Sawada-san I don't know if my environment or email system is weird, but the V5 patch file is only include simply a changed list. and previous V4 patch file size was 64kb, but the v5 patch file size was 2kb. Can you check it? Best regards. Moon. On Tue, Mar 3, 2020 at 5:58 PM Masahiko Sawada wrote: > > On Tue, 3 Mar 2020 at 08:49, Cary Huang wrote: > > > > Hi Masahiko > > Please see below my comments regarding kms_v4.patch that you have shared > > earlier. > > Thank you for reviewing this patch! > > > > > (1) > > There is a discrepancy between the documentation and the actual function > > definition. For example in func.sgml, it states pg_wrap_key takes argument > > in text data type but in pg_proc.dat and kmgr.c, the function is defined to > > take argument in bytea data type. > > > > ===> doc/src/sgml/func.sgml > > + > > + > > + pg_wrap_key > > + > > + pg_wrap_key(data > > text) > > + > > + > > + bytea > > + > > > > ===> src/include/catalog/pg_proc.dat > > +{ oid => '8201', descr => 'wrap the given secret', > > + proname => 'pg_wrap_key', > > + provolatile => 'v', prorettype => 'bytea', > > + proargtypes => 'bytea', prosrc => 'pg_wrap_key' }, > > > > ===> src/backend/crypto/kmgr.c > > +Datum > > +pg_wrap_key(PG_FUNCTION_ARGS) > > +{ > > + bytea *data = PG_GETARG_BYTEA_PP(0); > > + bytea *res; > > + int datalen; > > + int reslen; > > + int len; > > Fixed. > > > + > > > > (2) > > I think the documentation needs to make clear the difference between a key > > and a user secret. Some parts of it are trying to mix the 2 terms together > > when they shouldn't. To my understanding, a key is expressed as binary data > > that is actually used in the encryption and decryption operations. A user > > secret, on the other hand, is more like a passphrase, expressed as string, > > that is used to derive a encryption key for subsequent encryption > > operations. > > > > If I just look at the function names "pg_wrap_key" and "pg_unwrap_key", I > > immediately feel that these functions are used to encapsulate and uncover > > cryptographic key materials. The input and output to these 2 functions > > should all be key materials expressed in bytea data type. In previous email > > discussion, there was only one key material in discussion, called the > > master key (generated during initdb and stored in cluster), and this > > somehow automatically makes people (including myself) associate pg_wrap_key > > and pg_unwrap_key to be used on this master key and raise a bunch of > > security concerns around it. > > > > Having read the documentation provided by the patch describing pg_wrap_key > > and pg_unwrap_key, they seem to serve another purpose. It states that > > pg_wrap_key is used to encrypt a user-supplied secret (text) with the > > master key and produce a wrapped secret while pg_unwrap_key does the > > opposite, so we can prevent user from having to enter the secret in > > plaintext when using pgcrypto functions. > > > > This use case is ok but user secret is not really a cryptographic key > > material is it? It is more similar to a secret passphrase expressed in text > > and pg_wrap_key is merely used to turn the passphrase into a wrapped > > passphrase expressed in bytea. > > > > If I give pg_wrap_key with a real key material expressed in bytea, I will > > not be able to unwrap it properly: > > > > Select pg_unwrap_key > > (pg_wrap_key(E'\\x2b073713476f9f0e761e45c64be8175424d2742e7d53df90b6416f1d84168e8a') > > ); > > > > pg_unwrap_key > > -- > > +\x077\x13Go\x0Ev\x1EEK\x17T$t.}SߐAo\x1D\x16 > > (1 row) > > > > Maybe we should rename these SQL functions like this to prevent confusion. > > => pg_wrap_secret (takes a text, returns a bytea) > > => pg_unwrap_secret(takes a bytea, returns a text) > > Agreed to change argument types. User secret will be normally text > password as we do with pgcrypto. So probably these functions can cover > most cases. I changed the function name to pg_wrap and pg_unwrap > because these functions generically wrap and unwrap the given data. > > > > > if there is a use case for users to encapsulate key materials then we can > > define 2 more wrap functions for these, if there is no use case, don't > > bother: > > => pg_wrap_key (takes a bytea, returns a bytea) > > => pg_unwrap_key (takes a bytea, returns a bytea) > > +1. Like pgcrypto has both pgp_sym_encrypt_bytea and pgp_sym_encrypt, > maybe we can have such functions. > > > > > (3) > > I would rephrase "chapter 32: Encryption Key Management Part III. Server > > Administration" documentation like this: > > > > = > > PostgreSQL supports Encryption Key Management System, which is enabled when > > PostgreSQL is built with --with-openssl option a
Re: Exposure related to GUC value of ssl_passphrase_command
Dear Kuroda-san, Fujii-san Thank you for review and commit! #Oops.. Sorry..This mail thread has been spammed in Gmail. I'll go to submit a new discussion after found which case could leak about the GUC parameters related to ssl_*. Please wait a bit. Best regards. Moon. On Mon, Mar 9, 2020 at 11:43 AM Fujii Masao wrote: > > > > On 2020/02/14 10:31, Moon, Insung wrote: > > Dear Hackers. > > > > Thank you for an response. > > I registered this entry in commifest of 2020-03. > > # I registered in the security part, but if it is wrong, sincerely > > apologize for this. > > > > And I'd like to review show authority to ssl_ * later and discuss it > > in a separate thread. > > So, you are planning to start new discussion about this? > > Regards, > > -- > Fujii Masao > NTT DATA CORPORATION > Advanced Platform Technology Group > Research and Development Headquarters
RE: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Dear Antonin Houska. > -Original Message- > From: Antonin Houska [mailto:a...@cybertec.at] > Sent: Tuesday, May 29, 2018 3:23 PM > To: Moon, Insung > Cc: pgsql-hack...@postgresql.org > Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key > Management Service (KMS) > > Moon, Insung wrote: > > This patch seems to implement some of the features you propose, especially > encryption of buffers and WAL. I recommend > you to check so that no effort is > duplicated: Yes. encrypting / decrypting between Buffer <-> Disk is the same architecture. But, this idea is not to encrypt all table, thinks to minimize the performance overhead, only encrypting to necessary tables (including Xlog). Thank you and Best regards. Moon. > > > [4] Recently discussed mail > > > > https://www.postgresql.org/message-id/CA%2BCSw_tb3bk5i7if6inZFc3yyf%2B > > 9HEVNTy51QFBoeUk7UE_V%3Dw%40mail.gmail.com > > > > -- > Antonin Houska > Cybertec Schönig & Schönig GmbH > Gröhrmühlgasse 26, A-2700 Wiener Neustadt > Web: https://www.cybertec-postgresql.com
RE: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Dear Aleksander Alekseev. > -Original Message- > From: Aleksander Alekseev [mailto:a.aleks...@postgrespro.ru] > Sent: Thursday, May 31, 2018 10:33 PM > To: Moon, Insung > Cc: pgsql-hack...@postgresql.org > Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key > Management Service (KMS) > > Hello Moon, > > I promised to email links to the articles I mentioned during your talk on the > PGCon Unconference to this thread. Here > they are: > > * http://cryptowiki.net/index.php?title=Order-preserving_encryption > * https://en.wikipedia.org/wiki/Homomorphic_encryption > > Also I realized that I was wrong regarding encryption of the indexes since > they will be encrypted on the block level the > same way the heap will be. Sorry. I did not explain correctly in PGCon. Yes. this idea is encrypting at the block level as you said, there is probably not a big problem with index encryption. I will testing with PoC later an Index Encryption. Thank you and Best regards. Moon. > > -- > Best regards, > Aleksander Alekseev
RE: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Dear Masahiko Sawada. > -Original Message- > From: Masahiko Sawada [mailto:sawada.m...@gmail.com] > Sent: Monday, June 11, 2018 6:22 PM > To: Moon, Insung > Cc: PostgreSQL-development; Joe Conway > Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key > Management Service (KMS) > > On Fri, May 25, 2018 at 8:41 PM, Moon, Insung > wrote: > > Hello Hackers, > > > > This propose a way to develop "Table-level" Transparent Data > > Encryption (TDE) and Key Management Service (KMS) support in PostgreSQL. > > > > > > Issues on data encryption of PostgreSQL == Currently, in > > PostgreSQL, data encryption can be using pgcrypto Tool. > > However, it is inconvenient to use pgcrypto to encrypts data in some cases. > > > > There are two significant inconveniences. > > > > First, if we use pgcrypto to encrypt/decrypt data, we must call pgcrypto > > functions everywhere we encrypt/decrypt. > > Second, we must modify application program code much if we want to do > > database migration to PostgreSQL from other databases that is using TDE. > > > > To resolved these inconveniences, many users want to support TDE. > > There have also been a few proposals, comments, and questions to support > > TDE in the PostgreSQL community. > > > > However, currently PostgreSQL does not support TDE, so in development > > community, there are discussions whether it's necessary to support TDE or > > not. > > > > In these discussions, there were requirements necessary to support TDE in > > PostgreSQL. > > > > 1) The performance overhead of encryption and decryption database data > > must be minimized > > 2) Need to support WAL encryption. > > 3) Need to support Key Management Service. > > > > Therefore, I'd like to propose the new design of TDE that deals with both > > above requirements. > > Since this feature will become very large, I'd like to hear opinions from > > community before starting making the patch. > > > > First, my proposal is table-level TDE which is that user can specify tables > > begin encrypted. > > Indexes, TOAST table and WAL associated with the table that enables TDE are > > also encrypted. > > > > Moreover, I want to support encryption for large object as well. > > But I haven't found a good way for it so far. So I'd like to remain it as > > future TODO. > > > > My proposal has five characteristics features of "table-level TDE". > > > > 1) Buffer-level data encryption and decryption > > 2) Per-table encryption > > 3) 2-tier encryption key management > > 4) Working with external key management services(KMS) > > 5) WAL encryption > > > > Here are more details for each items. > > > > > > 1. Buffer-level data encryption and decryption == > > Transparent data encryption and decryption accompany by storage > > operation With ordinally way like using pgcrypto, the biggest problem > > with encrypted data is the performance overhead of decrypting the data each > > time the run to queries. > > > > My proposal is to encrypt and decrypt data when performing DISK I/O > > operation to minimize performance overhead. > > Therefore, the data in the shared memory layer is unencrypted so that > > performance overhead can minimize. > > > > With this design, data encryption/decryption implementations can be > > developed by modifying the codes of the storage and buffer manager > > modules, which are responsible for performing DISK I/O operation. > > > > > > 2. Per-table encryption > > == > > User can enable TDE per table as they want. > > I introduce new storage parameter "encryption_enabled" which enables TDE at > > table-level. > > > > // Generate the encryption table > >CREATE TABLE foo WITH ( ENCRYPTION_ENABLED = ON ); > > > > // Change to the non-encryption table > >ALTER TABLE foo SET ( ENCRYPTION_ENABLED = OFF ); > > > > This approach minimizes the overhead for tables that do not require > > encryption options. > > For tables that enable TDE, the corresponding table key will be > > generated with random values, and it's stored into the new system catalog > > after being encrypted by the master key. > > > > BTW, I want to support CBC mode encryption[3]. However, I'm not sure how to > > use the IV in CBC mode for this proposal. > > I'd like to hear opinions b
RE: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Dear Tomas Vondra. > -Original Message- > From: Tomas Vondra [mailto:tomas.von...@2ndquadrant.com] > Sent: Wednesday, June 13, 2018 10:15 PM > To: Moon, Insung; pgsql-hack...@postgresql.org > Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key > Management Service (KMS) > > Hi, > > On 05/25/2018 01:41 PM, Moon, Insung wrote: > > Hello Hackers, > > > > ... > > > > BTW, I want to support CBC mode encryption[3]. However, I'm not sure > > how to use the IV in CBC mode for this proposal. I'd like to hear > > opinions by security engineer. > > > > I'm not a cryptographer either, but this is exactly where you need a prior > discussion about the threat models - there > are a couple of chaining modes, each with different weaknesses. > Thank you for your advice. First, I'm researched to more security problem and found that CBC mode is an not safe encryption mode. Later, when I'll create a PoC, using to GCM or XTS encryption mode. And this time I know for using the same IV is dangerous, and I'm doing some more research on this. Thank you and Best regards. Moon. > FWIW it may also matter if data_checksums are enabled, because that may > prevent malleability attacks affecting of the > modes. Assuming active attacker (with the ability to modify the data files) > is part of the threat model, of course. > > regards > > -- > Tomas Vondra http://www.2ndQuadrant.com > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
RE: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Dear Tomas Vondra. > -Original Message- > From: Tomas Vondra [mailto:tomas.von...@2ndquadrant.com] > Sent: Wednesday, June 13, 2018 10:03 PM > To: Masahiko Sawada; Moon, Insung > Cc: PostgreSQL-development; Joe Conway > Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key > Management Service (KMS) > > On 06/11/2018 11:22 AM, Masahiko Sawada wrote: > > On Fri, May 25, 2018 at 8:41 PM, Moon, Insung > > wrote: > >> Hello Hackers, > >> > >> This propose a way to develop "Table-level" Transparent Data > >> Encryption (TDE) and Key Management Service (KMS) support in > >> PostgreSQL. > >> > >> ... > > > > As per discussion at PGCon unconference, I think that firstly we need > > to discuss what threats we want to defend database data against. > > If user wants to defend against a threat that is malicious user who > > logged in OS or database steals an important data on datbase this > > design TDE would not help. Because such user can steal the data by > > getting a memory dump or by SQL. That is of course differs depending > > on system requirements or security compliance but what threats do you > > want to defend database data against? and why? > > > > I do agree with this - a description of the threat model needs to be part of > the design discussion, otherwise it's not > possible to compare it to alternative solutions (e.g. full-disk encryption > using LUKS or using existing privilege controls > and/or RLS). > > TDE was proposed/discussed repeatedly in the past, and every time it died > exactly because it was not very clear which > issue it was attempting to solve. > > Let me share some of the issues mentioned as possibly addressed by TDE (I'm > not entirely sure TDE actually solves them, > I'm just saying those were mentioned in previous discussions): > > 1) enterprise requirement - Companies want in-database encryption, for > various reasons (because "enterprise solution" > or something). Yes. I do not know clearly about enterprise encryption requirements. Typically, identified the requirements for encryption of PCI-DSS and posted these ideas.(Storage encryptoin) Therefore, according to your opinion, I will more try to research of the enterprise encryption requirements. > > 2) like FDE, but OS/filesystem independent - Same config on any OS and > filesystem, which may make maintenance easier. > > 3) does not require special OS/filesystem setup - Does not require help from > system adminitrators, setup of LUKS devices > or whatever. Yes. We can use disk encryption like LUKS at Linux, but it does not apply to all OS's, so I'm proposed TDE. > > 4) all filesystem access (basebackups/rsync) is encrypted anyway > > 5) solves key management (the main challenge with pgcrypto) In fact, it is the biggest worry about key management. First, I think of 2-tier encryption as I wrote in my idea, and I am thinking of using KMS for management to master key. However, I am also worried about security problems when I managed of table key and master key. Therefore, I want to more discuss of Key Management and develop KMS simultaneously with TDE. Thank you and Best regards. Moon. > > 6) allows encrypting only some of the data (tables, columns) to minimize > performance impact > > IMHO it makes sense to have TDE even if it provides the same "security" > as disk-level encryption, assuming it's more convenient to setup/use from the > database. > > > Also, if I understand correctly, at unconference session there also > > were two suggestions about the design other than the suggestion by > > Alexander: implementing TDE at column level using POLICY, and > > implementing TDE at table-space level. The former was suggested by Joe > > but I'm not sure the detail of that suggestion. I'd love to hear the > > deal of that suggestion. The latter was suggested by Tsunakawa-san. > > Have you considered that? > > > > You mentioned that encryption of temporary data for query processing > > and large objects are still under the consideration. But other than > > them you should consider the temporary data generated by other > > subsystems such as reorderbuffer and transition table as well. > > > > The severity of those limitations is likely related to the threat model. > I don't think encrypting temporary data would be a big problem, assuming you > know which key to use. > > regards > > -- > Tomas Vondra http://www.2ndQuadrant.com > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
RE: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Dear Takayuki Tsunakawa. > -Original Message- > From: Tsunakawa, Takayuki [mailto:tsunakawa.ta...@jp.fujitsu.com] > Sent: Thursday, June 14, 2018 9:58 AM > To: 'Tomas Vondra'; Moon, Insung; pgsql-hack...@postgresql.org > Subject: RE: [Proposal] Table-level Transparent Data Encryption (TDE) and Key > Management Service (KMS) > > > From: Tomas Vondra [mailto:tomas.von...@2ndquadrant.com] > > On 05/25/2018 01:41 PM, Moon, Insung wrote: > > > BTW, I want to support CBC mode encryption[3]. However, I'm not sure > > > how to use the IV in CBC mode for this proposal. I'd like to hear > > > opinions by security engineer. > > > > > > > I'm not a cryptographer either, but this is exactly where you need a > > prior discussion about the threat models - there are a couple of > > chaining modes, each with different weaknesses. > Our products uses XTS, which recent FDE software like BitLocker and TrueCrypt > uses instead of CBC. > > https://en.wikipedia.org/wiki/Disk_encryption_theory#XTS > > "According to SP 800-38E, "In the absence of authentication or access > control, XTS-AES provides more protection than the > other approved confidentiality-only modes against unauthorized manipulation > of the encrypted data."" Thank your for your advice! Yes. I found that CBC is not safe at this time. So let's use XTS mode or GCM mode as you mentioned. Thank you and Best regards. Moon. > > > > > FWIW it may also matter if data_checksums are enabled, because that > > may prevent malleability attacks affecting of the modes. Assuming > > active attacker (with the ability to modify the data files) is part of > > the threat model, of course. > > Encrypt the page after embedding its checksum value. If a malicious attacker > modifies a page on disk, then the decrypted > page would be corrupt anyway, which can be detected by checksum. > > > Regards > Takayuki Tsunakawa >
RE: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Dear Joe. > -Original Message- > From: Joe Conway [mailto:m...@joeconway.com] > Sent: Monday, June 18, 2018 9:30 PM > To: Masahiko Sawada > Cc: Moon, Insung; PostgreSQL-development > Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key > Management Service (KMS) > > On 06/14/2018 12:19 PM, Masahiko Sawada wrote: > > On Wed, Jun 13, 2018 at 10:20 PM, Joe Conway wrote: > >> The idea has not been extensively fleshed out yet, but the thought > >> was that we create column level POLICY, which would transparently > >> apply some kind of transform on input and/or output. The transforms > >> would presumably be expressions, which in turn could use functions > >> (extension or builtin) to do their work. That would allow > >> encryption/decryption, DLP (data loss prevention) schemes (masking, > >> redacting), etc. to be applied based on the policies. > > > > Which does this design encrypt data on, buffer or both buffer and > > disk? > > > The point of the design is simply to provide a mechanism for input and output > transformation, not to provide the transform > function itself. > > How you use that transformation would be entirely up to you, but if you were > providing an encryption transform on input > the data would be encrypted both buffer and disk. > > > And does this design (per-column encryption) aim to satisfy something > > specific security compliance? > > > Again, entirely up to you and dependent on what type of transformation you > provide. If, for example you provided input > encryption and output decryption based on some in memory session variable > key, that would be essentially TDE and would > satisfy several common sets of compliance requirements. > > > >> This, in and of itself, would not address key management. There is > >> probably a separate need for some kind of built in key management -- > >> perhaps a flexible way to integrate with external systems such as > >> Vault for example, or maybe something self contained, or perhaps both. > > > > I agree to have a flexible way in order to address different > > requirements. I thought that having a GUC parameter to which we store > > a shell command to get encryption key is enough but considering > > integration with various key managements seamlessly I think that we > > need to have APIs for key managements. (fetching key, storing key, > > generating key etc) > > > I don't like the idea of yet another path for arbitrary shell code execution. > An API for extension code would be preferable. Thank you for your advice on key management. In fact, it was a big worry how to implement key management. Basically, we will look at the rules of KMIP, and I'll try to create an extension API that can mostly work with KMS. and I have a question. You said do not like the idea of another path for arbitrary shell code execution, is there any special reason? For example, I think usability to specify a path for shell code to use several KMSs, Is there a potential security issue? Thank you and Best regards. Moon. > > > >> Or > >> maybe key management is really tied into the separately discussed > >> effort to create SQL VARIABLEs somehow. > > > > Could you elaborate on how key management is tied into SQL VARIABLEs? > > Well, the key management probably is not, but the SQL VARIABLE might be where > the key is stored for use. > > Joe > > -- > Crunchy Data - http://crunchydata.com > PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source > Development
RE: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Dear Tom Lane. > -Original Message- > From: Tom Lane [mailto:t...@sss.pgh.pa.us] > Sent: Monday, June 18, 2018 11:52 PM > To: Robert Haas > Cc: Joe Conway; Masahiko Sawada; Moon, Insung; PostgreSQL-development > Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key > Management Service (KMS) > > Robert Haas writes: > > On Mon, Jun 18, 2018 at 10:12 AM, Joe Conway wrote: > >> Not necessarily. Our pages probably have enough predictable bytes to > >> aid cryptanalysis, compared to user data in a column which might not > >> be very predicable. > > > Really? I would guess that the amount of entropy in a page is WAY > > higher than in an individual column value. > > Depending on the specifics of the encryption scheme, having some amount of > known (or guessable) plaintext may allow breaking > the cipher, even if much of the plaintext is not known. This is cryptology > 101, really. > > At the same time, having to have a bunch of independently-decipherable short > field values is not real secure either, especially > if they're known to all be encrypted with the same key. But what you know or > can guess about the plaintext in such cases > would be target-specific, rather than an attack that could be built once and > used against any PG database. Yes. If there is known to guessable data of encrypted data, maybe there is a possibility of decrypting the encrypted data. But would it be safe to use an additional encryption mode such as GCM or XFS to solve this problem? (Do not use the same IV) Thank you and Best regards. Moon. > > regards, tom lane
[Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
all aspects of security from the secure generation of keys, secure storing keys, and secure fetching keys up to encryption key handling. Also, various types of KMSs are provided by many companies, and users can choose them. Therefore I would like to manage the master key using KMS. Also, my proposal is to create callback APIs(generate_key, fetch_key, store_key) in the form of a plug-in so that users can use many types of KMS as they want. In KMIP protocol and most KMS manage keys by string IDs. We can get keys by key ID from KMS. So in my proposal, all master keys are distinguished by its ID, called "master key ID". The master key ID is made, for example, using the database oid and a sequence number, like _. And they are managed in PostgreSQL. When database startup, all master key ID is loaded to shared memory, and they are protected by LWLock. When it comes time to rotate the master keys, run this query. ALTER SYSTEM ROTATION MASTER KEY; In this query, the master key is rotated with the following step. 1. Generate new master key, 2. Change master key IDs and emit corresponding WAL 3. Re-encrypt all table keys on its database Also during checkpoint, master key IDs on shared memory become a permanent condition. 5. WAL encryption == If we encrypt all WAL records, performance overhead can be significant. Therefore, this proposes a method to encrypt only WAL record excluding WAL header when writing WAL on the WAL buffer, instead of encrypting a whole WAL record. WAL encryption key is generated separately when the TDE-enabled table is created the first time. We use 2-tier encryption for WAL encryption as well. So, when it comes time to rotate the WAL encryption key, run this query. ALTER SYSTEM ROTATION WAL KEY; Next, I will explain how to encrypt WAL. To do this operation, I add a flag to WAL header which indicates whether the subsequent WAL data is encrypted or not. Then, when we write WAL for encryption table we write "encrypted" WAL on WAL buffer layer. In recovery, we read WAL header and check the flag of encryption, and judges whether WAL must be decrypted. In the case of PITR, we use WAL key ID in the backup file. With this approach, the performance overhead of writing and reading the WAL for unencrypted tables would be almost the same as before. == I'd like to discuss the design before starting making any change of code. After a more discussion I want to make a PoC. Feedback and suggestion are very welcome. Finally, thank you initial design input for Masahiko Sawada. Thank you. [1] What does TDE mean? > https://en.wikipedia.org/wiki/Transparent_Data_Encryption [2] What does KMS mean? > https://en.wikipedia.org/wiki/Key_management#Key_Management_System [3] What does CBC-Mode mean? > https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation [4] Recently discussed mail https://www.postgresql.org/message-id/CA%2BCSw_tb3bk5i7if6inZFc3yyf%2B9HEVNTy51QFBoeUk7UE_V%3Dw%40mail.gmail.com Regards. Moon. Moon, Insung NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
[PATCH]pg_buffercache add a buffer state column, Add fuction to decode buffer state
Dear Hackers. I'm studied PostgreSQL buffers for the development of new patches. In particular, using pg_buffercache, is can easily check the status of actual buffer. Bur there was one inconvenience. Pg_buffercache was also to check only the dirty state of the buffer. State of the buffer currently represents 10 states. Therefore, it seems impossible to check remaining 9 state. So I add a state column to pg_buffercache view so that I could print a value indicating the state of the buffer. This is outpu as an unit32 type, and examples are shown below. - postgres=# select * from pg_buffercache where bufferid = 1; -[ RECORD 1 ]+--- bufferid | 1 relfilenode | 1262 reltablespace| 1664 reldatabase | 0 relforknumber| 0 relblocknumber | 0 isdirty | f usagecount | 5 pinning_backends | 0 buffer_state | 2203320320 <- it's a new column - With the patches, user can check status values and check the status of the buffers. However, if do not know source code, or do not know hex values, It's difficult or impossible to check the actual state even using this patch. Therefore, add a new function to improve readability when checking state. When you input a value for state, this function prints out what the actual state. Examples of actual use are as follows. - postgres=# SELECT bufferid, relfilenode, state_text FROM pg_buffercache, LATERAL pg_buffercache_state_print(buffer_state) M(state_text) WHERE bufferid < 10; bufferid | relfilenode | state_text --+-+--- 1 |1262 | {LOCKED,VALID,TAG_VALID,PERMANENT} 2 |1260 | {LOCKED,VALID,TAG_VALID,PERMANENT} 3 |1259 | {LOCKED,DIRTY,VALID,TAG_VALID,JUST_DIRTIED,PERMANENT} 4 |1259 | {LOCKED,VALID,TAG_VALID,PERMANENT} 5 |1259 | {LOCKED,VALID,TAG_VALID,PERMANENT} 6 |1249 | {LOCKED,VALID,TAG_VALID,PERMANENT} 7 |1249 | {LOCKED,VALID,TAG_VALID,PERMANENT} 8 |1249 | {LOCKED,VALID,TAG_VALID,PERMANENT} 9 |1249 | {LOCKED,VALID,TAG_VALID,PERMANENT} (9 rows) - If you use this patch, I think that you can easily judge the state of the buffer. Regards. Moon. pg_buffercache add a buffer state column and Add function to decode buffer state_V1.diff Description: Binary data
RE: [HACKERS][PATCH]pg_buffercache add a buffer state column, Add fuction to decode buffer state
# I add [hacker] to the mail subject. Dear Andres Freund. Thank you for review! > I'm disinclined to exposing state that way. It's an internal representation > that's not unlikely to change. Sure, > pg_buffercache is more of a debugging / investigatory tool, but I > nevertheless see no reason to expose it that way. Okay! I'll not print(or add) the internal value directly. (and I'll be careful when create another patch). Thank you > One way around that would be to create a buffer_state type that's returned by > pg_buffercache and then only decoded when > outputting. Doing that + having a cast to an array seems like it'd provide > most of the needed functionality? It's it better to output the decode state value from pg_buffercache view? For example to following output - postgres=# select * from pg_buffercache where bufferid = 1; -[ RECORD 1 ]+--- bufferid | 1 relfilenode | 1262 reltablespace| 1664 reldatabase | 0 relforknumber| 0 relblocknumber | 0 isdirty | f usagecount | 5 pinning_backends | 0 buffer_state | {LOCKED,VALID,TAG_VALID,PERMANENT} - It's right? If it is correct, I'll modify patch ASAP. Regards. Moon. > -Original Message- > From: Andres Freund [mailto:and...@anarazel.de] > Sent: Tuesday, November 14, 2017 6:07 PM > To: Moon Insung > Cc: 'PostgreSQL Hackers' > Subject: Re: [PATCH]pg_buffercache add a buffer state column, Add fuction to > decode buffer state > > Hi, > > On 2017-11-14 17:57:00 +0900, Moon Insung wrote: > > So I add a state column to pg_buffercache view so that I could print a > > value indicating the state of the buffer. > > This is outpu as an unit32 type, and examples are shown below. > > > - > > postgres=# select * from pg_buffercache where bufferid = 1; -[ RECORD > > 1 ]+--- > > bufferid | 1 > > relfilenode | 1262 > > reltablespace| 1664 > > reldatabase | 0 > > relforknumber| 0 > > relblocknumber | 0 > > isdirty | f > > usagecount | 5 > > pinning_backends | 0 > > buffer_state | 2203320320 <- it's a new column > > - > > I'm disinclined to exposing state that way. It's an internal representation > that's not unlikely to change. Sure, > pg_buffercache is more of a debugging / investigatory tool, but I > nevertheless see no reason to expose it that way. > > If we shared those flags more in a manner like you did below: > > 1 |1262 | {LOCKED,VALID,TAG_VALID,PERMANENT} > > that'd be more acceptable. However doing that by default would have some > performance downsides, because we'd need to > create these arrays for every row. > > One way around that would be to create a buffer_state type that's returned by > pg_buffercache and then only decoded when > outputting. Doing that + having a cast to an array seems like it'd provide > most of the needed functionality? > > Greetings, > > Andres Freund