FYI: jOOQ blog

2018-03-22 Thread Bear Giles
​If you want to know how PostgreSQL compares to other databases, or are
looking for ideas on areas to improve, the jOOQ blog looks like a good
resource: https://blog.jooq.org/. jOOQ is a java library that provides a
database-agnostic way to access many different types of databases. I know
it does some query optimizations but I don't know how advanced it is. The
other reason for them to track advanced functionality available in some
databases is to ensure that they support users who want to use those
features.

Most of the blog is specific to java or its library but there's also
frequent articles on advanced functionality (row values, window functions,
etc.), query optimization, and comparisons between the databases. The
comparisons of query optimizations in particular might be good low hanging
fruit since it doesn't involve any user-facing changes, just recognizing a
few additional patterns in the query parse tree and knowing a more
efficient way to perform equivalent work.

Bear


Re: [GSoC Idea Discussion] "Thrift datatype support" Project

2018-02-13 Thread Bear Giles
Isn't thrift the communications protocol?

Do we have foreign server support for parquet and ORC files?

On Tue, Feb 13, 2018 at 8:40 AM, Udit Juneja  wrote:

> Hi,
>
> I am Udit Juneja, a Computer Science undergraduate student at Thapar
> Institute of Engineering and Technology, India. I am interested in
> contributing to PostgreSQL.
>
> A brief introduction about me:
> I am familiar with programming languages (C, C++, Python, SQL).
>
> I am interested in "Thrift datatype support"  project. I have already
> started exploring PostgreSQL, and the libraries involved in this project.
>
> I would like to discuss the idea further and maybe submit a proposal in
> GSoC later.
>
> Regards,
> Udit Juneja
>


Re: [GSoC Idea Discussion] "Thrift datatype support" Project

2018-02-13 Thread Bear Giles
On Tue, Feb 13, 2018 at 10:24 AM, Vladimir Sitnikov <
sitnikov.vladi...@gmail.com> wrote:

> Technically speaking, Thrift is "language-independent"
> serialization-deserialization format with clean approach to backward
> compatibility.
>
> I think Thrift (or something like that) can be useful, as it can generate
> serializers/deserializers for lots of languages.
> PostgreSQL's "binary" format is tied to the PostgreSQL and it has to be
> reimplemented for each and every client language.
>
> "text" format is non-trivial as well (e.g. it is hard to get quoting right
> for structs, and text is server-locale-dependent)
>
> Vladimir
>

​​Thanks. I've been helping a coworker with an older parquet writer and
​the URL for the hive metastore uses a thrift:// scheme. That is why I
thought it is a communications protocol.​


Re: Precision loss casting float to numeric

2018-02-26 Thread Bear Giles
On Mon, Feb 26, 2018 at 11:29 AM, Tom Lane  wrote:

> Chapman Flack  writes:
> > The 0002-*.patch is a proof-of-concept patching float4_numeric and
> > float8_numeric in the trivial way (just using FLT_DECIMAL_DIG and
> > DBL_DECIMAL_DIG in place of FLT_DIG and DBL_DIG). It makes the new
> > regression test pass. (It will only work under a compiler that has
> > __FLT_DECIMAL_DIG__ and __DBL_DECIMAL_DIG__ available, and I used
> > those internal versions to avoid mucking with build tooling to change
> > the target C standard, which I assume wouldn't be welcome anyway.
>
> Nope.  TBH, I'd think about just using "DBL_DIG + 3", given our existing
> coding around extra_float_digits in places like pg_dump and postgres_fdw.
> The knowledge that you need 2 or 3 extra digits is already well embedded.
>
> Conceivably you could do it like
>
> #ifndef DBL_DECIMAL_DIG
> #ifdef __DBL_DECIMAL_DIG__
> #define DBL_DECIMAL_DIG __DBL_DECIMAL_DIG__
> #else
> #define DBL_DECIMAL_DIG (DBL_DIG + 3)
> #endif
> #endif
>
> but I'm not exactly seeing how that buys us anything.
>
> The bigger question here is whether people actually want this behavioral
> change.  I think there's probably a bigger chance of complaints that
> "casting 1.1::float8 to numeric now produces some weird,
> incorrectly-rounded result" than that we make anyone happier.
>
> I have a vague idea that at some point in the past we discussed making
> this conversion use extra_float_digits, which'd allow satisfying both
> camps, at the nontrivial price that the conversion would have to be
> considered stable not immutable.  We didn't pull the trigger, if this
> memory is real at all, presumably because of the mutability issue.
>
> Another idea would be to leave the cast alone and introduce a named
> function that does the "exact" conversion.  Possibly that makes nobody
> happy, but at least both the cast and the function could be immutable.
> It'd dodge backwards-compatibility objections, too.
>
> regards, tom lane
>

​Working for a company that ​
has enterprise customers this can't be overemphasized.
Never require the user to do something so they keep getting the same
results.​
​
​
​ It doesn't
matter if it's "wrong".

​I would vote for a property. If you want the best effort to match the IEEE
spec
you need to execute 'set use_ieee_numbers'  and you'll get the extra digits
and
rounding behavior. If not ​you'll get the existing behavior.

Bear


Re: Record last password change

2018-12-12 Thread Bear Giles
Could you add your own UPDATE trigger to the password table? It can write
an entry to a new table, e.g., (userid, current date) whenever a record in
that table is modified.

On an earlier question - the issue isn't whether someone can crack your
password, it's possible disclosure in archive media somewhere. E.g., a
classic example is someone who accidently commits source code that contains
a password and then reverts it. It's not in the current source code but
without a lot of effort (not always possible) it could be retrieved by
anyone with access to the commit history. If you change your password every
few months this will soon be a moot issue even if the person doesn't
mention this to someone who can change the password immediately.

A more subtle point is backups. An attacker might have had access to
encrypted backups (or regular backups containing encrypted records) for a
very long time and held onto them against the chance discovery of the
password. Once they learn it they have access to all of that data. If you
rotate the passwords they might have access to a few months of data but no
more than that. It's bad, but a few months is far better than a few years
if your data contains information that requires notification of everyone
affected and the offer of credit monitoring, etc.

I agree that people may choose bad passwords if forced to change them too
frequently but I'm in the camp that says it's fine to use a password
manager or even to write them down on a card kept in the person's wallet.

BTW another solution is SSO, e.g., Kerberos. I still need to submit a patch
to pgsql to handle it better(*) but with postgresql itself you sign into
the system and then the database server will just know who you are. You
don't have to worry about remembering a new password for postgresql. X.509
(digital certs) are another possibility and I know you can tie them to a
smart card but again I don't know how well we could integrate it into pgsql.

(*) I haven't looked at the code recently but the last time I checked pgsql
used the username/password combo. Enterprise environments usually use
keytab files instead of (u/p). It should also be smart enough to check if
the user already has a kerberos ticket and use it if nothing else is
specified. The latter would usually work with people. (I'm not sure what
happens in a more secure environment where the database expects the service
to be specified as well - users would need the username/postgresql@REALM
identity, not the more generic username@REALM identity.)



On Tue, Dec 11, 2018 at 12:04 PM Chapman Flack 
wrote:

> On 12/11/18 9:56 AM, Tom Lane wrote:
> > I've heard that if you want to implement a password aging policy, PAM
> > authentication can manage that for you; but I don't know the details.
>
> Interesting idea ... could use pam-pgsql[1] and PAM as the
> authentication method. Might result in another connection (from PAM)
> to authenticate every connection, though. I suppose the module could
> use a daemon keeping one connection open for auth queries, but the
> README doesn't *say* it does. Could set up a pooler just for the auth
> module to connect through, I guess.
>
> It allows you to configure arbitrary auth_query, acct_query, pwd_query,
> etc., so you could conceivably join pg_authid with some other table
> where you'd keep expiration info.
>
> Looks like our PAM authentication might not support some PAM
> capabilities like conducting additional message exchanges (for
> example, to prompt for a new password on the spot if the old
> one has expired).
>
> It might be possible to shoehorn that capability into the existing
> fe-be protocol by calling it a custom SASL method, something analogous
> to ssh's "keyboard-interactive"[2].
>
> -Chap
>
>
> [1] https://github.com/pam-pgsql/pam-pgsql
> [2] https://tools.ietf.org/html/rfc4256
>
>


Re: [FEATURE REQUEST] Encrypted indexes over encrypted data

2018-08-09 Thread Bear Giles
There are alternatives. If you know what you want to find, e.g., a search
by username or email address, you can store a strong hash of the value as
an indexed column. By "strong hash" I mean don't just use md5 or sha1, or
even one round with a salt. I can give you more details about how and why
offline.

So you might have a record with:

   id serial primary key,
   email_hash text not null indexed,
   first_name_hash text indexed,
   last_name_hash text indexed,
   phone_number_hash text indexed ,
   'wallet'_containing_all_encrypted_values text

and that allows you to search on email, first name, last name, or phone
number, or some combination on them. But no expressions. The hashing would
be done in your app, not the database. You also probably want to convert
everything to lowercase, maybe remove spaces, etc., before computing the
hash.

You should be prepared to handle multiple matches. It's unlikely that an
email or phone number hash won't be unique but it's safest to always be
prepared for more than one match, decrypt the 'wallet', and then do a final
comparison. That also gives you a bit of protection from an attacker
creating an account and then changing the hash values to match someone
else. You can use that to support very limited expressions, e.g., also keep
a hash on the first three letters of their last name, but that will
compromise your security a bit since it allows an attacker to perform some
statistical analysis on the data.

Finally there's the general advice that hashes (and encrypted values)
should always have a version number of some sort. It could be something as
simple as 3$hash, or it could be a composite column or even a user-defined
type. The # indicates is a lookup into a table, perhaps in your app, that
tells you which hashing algorithm and salt to use. It makes life a lot
easier if the security audit tells you that you need to change your
cipher/salt/key/whatever but you can't do it immediately since you don't
know everything you need in order to do it, e.g., the password that you
need in order to recompute the hash value. With that version number it's
easy to continue to accept the existing password so they can log in, and in
the background you quietly recompute the hash using the new
salt/algorithm/whatever and update their record. I've worked for some
pretty knowledgeable companies that have overlooked this.

On Thu, Aug 9, 2018 at 6:05 AM, Andres Freund  wrote:

>
>
> On August 9, 2018 5:30:26 PM GMT+05:30, Danylo Hlynskyi <
> abcz2.upr...@gmail.com> wrote:
> > ?Is it hard to implement soluition 2?
>
> Yes.
>
> To the point that I'm fairly certain that an implementation would be
> considered to costly to maintain (vs benefit) of proposed.
>
> Andres
>
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>
>


Re: [FEATURE REQUEST] Encrypted indexes over encrypted data

2018-08-09 Thread Bear Giles
Some regulatory standards require all UII, even all PII, information be
encrypted within the database, not just on encrypted media. That's to
reduce exposure even if someone gets access to a live server, e.g., via SQL
Injection. (The perennial #1 risk for software vulnerabilities.)

UII is uniquely identifiable information, e.g., SSN.

PII is personally identifiable information, e.g, email address, phone
number, address. It doesn't have to be enough to uniquely identify the
person, just enough to cut the number of possible individuals down to a
handful. That's a surprising large number of fields, e.g., knowing where
someone was born and their birthdate will get you close to their SSN for
anyone born after the IRS started requiring SSNs for all claimed
dependents. Knowing someone's birth date and city of residence will get you
down to a handful of individuals, often a single individual depending upon
their age and the size of their city. It's remarkably easy to uniquely
identify something like 75% of the population if you have the data from a
couple different sites and some way to correlate the records. (That's why
it's not good enough to just use the sha1 of an email address, etc.)

I know the government required UII encryption in its databases when I last
worked on a government contract, and I think they've required PII
encryption as well for years. I would be verify surprised if HIPAA doesn't
require that as well for PII in addition to the medical info. I definitely
know PCI-DSS requires encryption of all information on the credit card
itself - you can keep the last few digits (I think 6 are allowed but for is
recommended) to facilitate searches. Of course companies could still have
the same information unencrypted in other columns or tables (except for the
CC number itself - and you *never* keep the CVN in any form on threat of
losing your ability to accept credit cards if you're caught) but they were
encouraged to encrypt it as well.

Anyway legal requirements is "#0" on that list. Everything else *might*
happen but depending upon the nature of the data you *will* be audited for
compliance with regulations, either preemptively (e.g., VISA requires
periodic audits of anyone making more than $X in transactions per year) or
after a breach. One of my other past employers did the type of auditing
VISA requires and their promotional material was full of interviews with
former small business owners who lost their business after a breach. It
wasn't due to the loss itself, it's because any breach automatically
requires the strictest auditing for the next (4?) years and that cost far
more than the average independent restaurant, auto repair shop, etc., can
afford. Obviously their business model is (in part) to scare people but
there are plenty of situations where you have to encrypt data within the
database and not just rely on encrypted media.




> Here are some threats you might choose to protect against:
>
> 1) passive attackers on the wire
> 2) active  attackers on the wire
> 3a) theft / compromise of storage devices
> 3b) compromise of decommissioned storage devices
> 3c) theft of running server
> 4) compromised backup storage
> 5) bad / compromised clients
> 6) bad / compromised DBAs or sysadmins
> 7) side channel exploits
> 8) ??
>
> (1) and (2) are taken care of by TLS.
>
> (3a) is taken care of by FDE in controllers, say, or by physical
> security.
>
> (3b) is taken care of by proper decommissioning, but FDE helps.
>
> (3c) you can't protect against if you have keys in memory.  You could
> use client-side crypto, but you'll have more clients to worry about than
> servers.  Physical security is your best option.  (And really, you don't
> get any way to protect against law enforcement taking the devices.)
>
> (4) is taken care of by encrypting backups, which requires no changes to
> PG to get.
>
> (5) is taken care of (to some degree) by server-side logic (triggers,
> ...).
>
> (6)...  You can't protect against sysadmins, really, nor DBAs, but you
> can use crypto on the *client*-side to get some protection.  Since the
> PG client is very thin and dumb, the PG client can't easily do this.
> The idea is to encrypt values and MAC/sign rows to prevent DBAs/
> sysadmins seeing sensitive data or tampering with your data.
>
> (7) one deals with by using crypto implementations built with side
> channel protection, though, really, this is a very difficult subject in
> general, especially since Spectre.
>
> Nico
> --
>
>


Re: What does Time.MAX_VALUE actually represent?

2017-12-31 Thread Bear Giles
​You don't need to store 25:20 in the database though - your app can use a
window that treats a day as "from 5 am today until 5 am tomorrow" and adds
24:00 to the times for tomorrow.​

Bear

On Sat, Dec 30, 2017 at 2:25 PM, Gavin Flower  wrote:

> On 12/31/2017 03:07 AM, Dave Cramer wrote:
>
>> We are having a discussion on the jdbc project about dealing with
>> 24:00:00.
>>
>> https://github.com/pgjdbc/pgjdbc/pull/992#issuecomment-354507612
>>
>> Dave Cramer
>>
>
> In Dublin (I was there 2001 to 2004), Time tables show buses just after
> midnight, such as 1:20am as running at the time 2520 - so there are visible
> close to the end of the day.  If you are looking for buses around midnight
> this is very user friendly - better than looking at the other end of the
> time table for 0120.
>
> I think logically that 24:00:00 is exactly one day later than 00:00:00 -
> but I see from following the URL, that there are other complications...
>
>
> Cheers,
> Gavin
>
>
>


pgcrypto: better memory management?...

2024-11-07 Thread Bear Giles
I was working on a crypto extension many, many years ago but...life. And I
found the book "Encryption in the Database" which made me rethink many
things. (It describes the approach used by Oracle.)

But between openssl 3 and some other tasks I've been revisiting some of
these ideas and focused on the pgcrypto extension first.

Something that immediately struck me is memory management. There's two
items.

The first is that openssl has supported custom memory management for a very
long time - I remember using it in the late 90s. The function is in crypto.h

 typedef void *(*CRYPTO_malloc_fn)(size_t num, const char *file, int line);
 typedef void *(*CRYPTO_realloc_fn)(void *addr, size_t num, const char
*file,
int line);
 typedef void (*CRYPTO_free_fn)(void *addr, const char *file, int line);
 int CRYPTO_set_mem_functions(CRYPTO_malloc_fn malloc_fn,
  CRYPTO_realloc_fn realloc_fn,
  CRYPTO_free_fn free_fn);

The main benefit of this function is that it openssl-internal functions
will use the same memory pool as your application. I know the current
extension uses palloc/pfree but I didn't see a call to this function.

I didn't have any problems with my extension but I never put the server +
extension under a heavy load. There may be problems with this approach -
but I think the benefits of a single memory pool are large enough to
warrant investigating this.

The second item is that openssl also has these wonderful functions:

int CRYPTO_secure_malloc_init(size_t sz, size_t minsize);
int CRYPTO_secure_malloc_done(void);
void *CRYPTO_secure_malloc(size_t num, const char *file, int line);
void *CRYPTO_secure_zalloc(size_t num, const char *file, int line);
void CRYPTO_secure_free(void *ptr, const char *file, int line);
void CRYPTO_secure_clear_free(void *ptr, size_t num,
  const char *file, int line);
int CRYPTO_secure_allocated(const void *ptr);
int CRYPTO_secure_malloc_initialized(void);
size_t CRYPTO_secure_actual_size(void *ptr);
size_t CRYPTO_secure_used(void);

void OPENSSL_cleanse(void *ptr, size_t len);

These functions address (at least) two potential vulnerabilities. First it
tweaks the memory block so it will never be written to the swap file, and
second it (iirc) also tweaks the memory block to reduce the access
permissions further than usual.

These functions should only be used for sensitive values, e.g., unprotected
keys. There's also the usual advice to minimize the amount of time the
sensitive data is kept in memory - do a clear + free immediately after use,
not as the last thing you do before your function returns.

It's important to remember that this memory pool is managed separately from
the rest. It has to be - it has to be able to manipulate the mmap flags.

Of course we can use a conditional #define to define these functions if the
user is running an old version of libssl. I would have to research when
they were introduced.

Broader applicability

I want to point out that the latter functions may be useful in other parts
of the system. E.g., the client session may have sensitive information that
should be as protected as possible. However that may not be possible since
it would require the server, not just the extension, to load and initialize
libssl

Sample code

I'll create my own github fork so it will be easy to create pull requests
for further discussion.


Bear


pgcrypto: PGP keyring FDW ?...

2024-11-07 Thread Bear Giles
I wanted to bounce a few related ideas off people.

Foreign Data Wrappers

One of the biggest vulnerabilities of the current system is the need for
the client to know the PGP key. There's a better alternative for many (not
all) use cases - loading the PGP key from a file located on the server or
securely downloaded from a trusted source on an as-needed basis.

In other words - a Foreign Data Wrapper for PGP keyrings. The initial
milestone could be a readonly table containing:

 - keyid
 - alg
 - not_before
 - not_after
 - public key
 - secret key (if available)

I've included type --list-keys below. I know the FDW should include the
[SCE] flags and '[unknown|lultimate|...] values, and possibly the uid
values. (It would need to be text[]). Secret keys add a 'ssb' element.

UDT and UDF

This still exposes a lot of information so it makes sense to consider a UDT
for PGP_KEY and possibly more. This UDT can be opaque - the secret key can
be write-only. Otherwise the UDT can be a drop-in replacement for the
existing text and bytea UDFs.

A similar idea, one that I'm still exploring, is PGP_CIPHERTEXT. It's a
drop-in replacement for bytea but makes it easy to add type-safe functions
to extract metadata from the ciphertext.

The keyring FDW would return a PGP_KEY for the secret key. This prevents
the exposure of a secret key loaded from a keyring.


Sample output of --list-keys

pub   rsa4096 2011-02-08 [SCE]
  BFC61E9C8BE9942EAE818A95E5D9A0503AD31D0B
uid   [ unknown] Fedora-SPARC (15) 
sub   elg4096 2011-02-08 [E]

pub   rsa4096 2015-07-25 [SCE]
  5048BDBBA5E776E547B09CCC73BDE98381B46521
uid   [ unknown] Fedora (24) 

pub   rsa4096 2015-07-27 [SCE]
  8C6E5A80A399BABE791985BAB8635EEB030D5AED
uid   [ unknown] Fedora Secondary (24) <
fedora-24-second...@fedoraproject.org>
sub   elg2048 2015-07-27 [E]

pub   rsa4096 2016-03-31 [SCE]
  C437DCCD558A66A37D6F43724089D8F2FDB19C98
uid   [ unknown] Fedora 25 Primary (25) <
fedora-25-prim...@fedoraproject.org>

pub   rsa4096 2016-03-31 [SCE]
  838BD48E1B70069F4111BDE91A185CDDE372E838
uid   [ unknown] Fedora 25 Secondary (25) <
fedora-25-second...@fedoraproject.org>
sub   elg2048 2016-03-31 [E]

pub   rsa4096 2016-09-09 [SCE]
  E641850B77DF435378D1D7E2812A6B4B64DAB85D
uid   [ unknown] Fedora 26 Primary (26) <
fedora-26-prim...@fedoraproject.org>

pub   rsa4096 2016-08-08 [SCE]
  19AA0442249191098B3D80354560FD4D3B921D09
uid   [ unknown] Fedora 26 Secondary (26) <
fedora-26-second...@fedoraproject.org>
sub   elg2048 2016-08-08 [E]