Re: Elastic Search much faster at statistics?
Not enough information. It looks far more like he's testing Ruby's support for Elasticsearch vs ActiveRecord rather than ES vs PostgreSQL. Caching could definitely play a role, but so could the choice of indexes. If ES is calculating some aggregate info on the fly, the equivalent in PG would be a stats table updated by trigger or as part of a regularly refreshed materialized view. That said, if ES does some of that aggregation out of the box, the default convenience is compelling for some. There are indeed cases where a caching engine will outperform a general-purpose data management engine. There are many cases where ACID requirements preclude the use of a dedicated search engine. Use the right tool for the job, and at a sufficiently large scale, use multiple tools. On Mon, Jul 8, 2019 at 9:54 AM Guyren Howe wrote: > I find this… surprising. Caching? > > http://blog.nrowegt.com/database-vs-elasticsearch-speed-column-statistics/ >
Sorting composite types
Is there any way to define a natural sorting order for composite types? For example, let's say you have a type like:

CREATE TYPE contrived AS (
  i1 integer,
  i2 integer
);

The semantics of this contrived type are that the natural order is ascending NULLS FIRST for i1 and descending NULLS LAST for i2. I know I can do an ORDER BY per query and specify each part separately, but if my contrived example has a perceived natural sort order, is there any way to set that in a global way so that I can simply and universally ORDER BY my_contrived_column?
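The closest I've come up with on my own is defining a default btree operator class for the type, though it's a lot of boilerplate: a comparison function plus the five comparison operators. An abbreviated, untested sketch of the direction (only < is shown; <=, =, >=, and > would follow the same pattern):

-- Comparison honoring i1 ASC NULLS FIRST, then i2 DESC NULLS LAST
CREATE FUNCTION contrived_cmp(a contrived, b contrived) RETURNS integer
LANGUAGE sql IMMUTABLE AS $$
  SELECT CASE
    WHEN a.i1 IS DISTINCT FROM b.i1 THEN
      CASE WHEN a.i1 IS NULL THEN -1            -- NULLS FIRST
           WHEN b.i1 IS NULL THEN 1
           WHEN a.i1 < b.i1 THEN -1 ELSE 1 END  -- ascending
    WHEN a.i2 IS DISTINCT FROM b.i2 THEN
      CASE WHEN a.i2 IS NULL THEN 1             -- NULLS LAST
           WHEN b.i2 IS NULL THEN -1
           WHEN a.i2 > b.i2 THEN -1 ELSE 1 END  -- descending
    ELSE 0
  END
$$;

CREATE FUNCTION contrived_lt(a contrived, b contrived) RETURNS boolean
LANGUAGE sql IMMUTABLE AS $$ SELECT contrived_cmp(a, b) < 0 $$;

CREATE OPERATOR < (LEFTARG = contrived, RIGHTARG = contrived, PROCEDURE = contrived_lt);

-- After defining <=, =, >= and > the same way, tie them together:
CREATE OPERATOR CLASS contrived_ops DEFAULT FOR TYPE contrived USING btree AS
  OPERATOR 1 <,  -- strategies 2-5 (<=, =, >=, >) omitted for brevity
  FUNCTION 1 contrived_cmp(contrived, contrived);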
REVOKE DROP rights
Is there any way to prevent a user from dropping a table when that user has create rights? I'd like to allow that user to be able to create and delete their own tables but not specific shared tables. Is the only way to put the shared tables into a different schema? Thanks in advance
Re: REVOKE DROP rights
Makes sense. Thanks! On Wed, Sep 11, 2019 at 1:43 PM Tom Lane wrote: > Miles Elam writes: > > Is there any way to prevent a user from dropping a table when that user > has > > create rights? I'd like to allow that user to be able to create and > delete > > their own tables but not specific shared tables. > > I think maybe you didn't read the manual closely. Creation privileges > cover the right to create an object (in a given database or > schema), but only the creator/owner has the right to drop a particular > object once it exists. > > We do grant the owner of a schema or database the right to drop objects > within it, since they could surely achieve that result by dropping the > whole schema or database. But merely having create privilege doesn't > extend to that. > > So basically you want a shared schema that is owned by some trusted > role, and your less-trusted roles have create (and usage!) on that > schema. > > regards, tom lane >
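For anyone finding this thread later, the setup Tom describes comes out roughly like this (role and schema names invented):

-- Trusted role owns the shared objects
CREATE ROLE shared_owner;
CREATE SCHEMA shared AUTHORIZATION shared_owner;

-- Less-trusted users may create (and drop) their own tables in the
-- schema, but cannot drop tables owned by shared_owner
GRANT USAGE, CREATE ON SCHEMA shared TO less_trusted_user;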
Event Triggers and Dropping Objects
The event trigger firing matrix lists tags like DROP TABLE and DROP FUNCTION under the ddl_command_end event, but when I created a basic audit table and event trigger, they don't seem to fire. I know sql_drop exists, but again the matrix lists DROP commands under the ddl_command_end event. For example:

CREATE TABLE IF NOT EXISTS ddl_info (
  classid oid,
  objid oid,
  objsubid integer,
  command_tag text,
  object_type text,
  schema_name text,
  object_identity text,
  in_extension bool,
  transaction_id bigint NOT NULL DEFAULT txid_current(),
  inserted timestamptz NOT NULL DEFAULT clock_timestamp()
);

CREATE OR REPLACE FUNCTION ddl_log() RETURNS EVENT_TRIGGER
LANGUAGE plpgsql AS $$
BEGIN
  INSERT INTO ddl_info (
    classid, objid, objsubid, command_tag, object_type,
    schema_name, object_identity, in_extension
  )
  SELECT classid, objid, objsubid, command_tag, object_type,
         schema_name, object_identity, in_extension
  FROM pg_event_trigger_ddl_commands();
END;
$$;

CREATE EVENT TRIGGER aa_ddl_info ON ddl_command_end
  EXECUTE PROCEDURE ddl_log();

About as simple as I can make it. If I run the following, either in the same transaction or separately, I see the CREATE TABLE tag, the CREATE INDEX tag for the primary key, the CREATE FUNCTION tag, and the COMMENT tag for the function, but no DROP tags.

CREATE TABLE test (
  test_id uuid NOT NULL PRIMARY KEY,
  description text
);

CREATE FUNCTION simple_add(a integer, b integer) RETURNS integer
LANGUAGE sql STRICT IMMUTABLE PARALLEL SAFE AS $$
  SELECT a + b;
$$;

COMMENT ON FUNCTION simple_add(integer, integer) IS 'A basic addition';

DROP TABLE test;
DROP FUNCTION simple_add(integer, integer);

I didn't see anything obvious in the docs stating that DROP statements require explicit tagging, so I assumed that not specifying any tags would include all tags. Is this an oversight in the docs and expected behavior, or is this a bug? It doesn't fire in any version from 9.6 on. I didn't test versions before 9.6. Thanks in advance, Miles Elam
Re: Event Triggers and Dropping Objects
Thanks, it does! On Sat, Oct 5, 2019 at 1:50 AM Luca Ferrari wrote: > On Fri, Oct 4, 2019 at 10:38 PM Miles Elam > wrote: > > > > The event trigger firing matrix lists tags like DROP TABLE and DROP > FUNCTION are listed below the ddl_command_end event, but when I created a > basic audit table and event trigger, they don't seem to fire. I know > sql_drop exists, but again the matrix lists DROP commands in the > ddl_command_end event. > > Yes, I think this is a little misleading: > <https://www.postgresql.org/docs/11/event-trigger-matrix.html>. > The ddl_command_end is issued, and the function is invoked, but > pg_event_trigger_ddl_commands() returns NULL on such invocation > because sql_drop is the event with the attached data. > > Hope this helps, > Luca >
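For completeness, the version that actually captures drops, reusing the same ddl_info table from before, looks something like this (untested sketch; note that pg_event_trigger_dropped_objects() has no command_tag or in_extension columns, so TG_TAG and NULL stand in):

CREATE OR REPLACE FUNCTION ddl_drop_log() RETURNS EVENT_TRIGGER
LANGUAGE plpgsql AS $$
BEGIN
  INSERT INTO ddl_info (
    classid, objid, objsubid, command_tag, object_type,
    schema_name, object_identity, in_extension
  )
  SELECT classid, objid, objsubid, tg_tag, object_type,
         schema_name, object_identity, NULL
  FROM pg_event_trigger_dropped_objects();
END;
$$;

CREATE EVENT TRIGGER zz_ddl_drop_info ON sql_drop
  EXECUTE PROCEDURE ddl_drop_log();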
Event Triggers and GRANT/REVOKE
GRANT and REVOKE fire a ddl_command_end event trigger but don't provide any information beyond whether it was a table, schema, function, etc. that was affected. No object IDs or the like are included. How would you find out which table had its ACLs modified? Also, why do grants and revokes have an object_type of 'TABLE' instead of a lower case name like 'table' as with all other event types? Thanks, Miles Elam
Re: Event Triggers and GRANT/REVOKE
Using my example below from another thread, GRANTs and REVOKEs leave all fields NULL except for command_tag (set to 'GRANT' or 'REVOKE'), object_type (set to an upper case target like 'TABLE'), and in_extension (set to whatever is appropriate, but typically false).

CREATE TABLE IF NOT EXISTS ddl_info (
  classid oid,
  objid oid,
  objsubid integer,
  command_tag text,
  object_type text,
  schema_name text,
  object_identity text,
  in_extension bool,
  transaction_id bigint NOT NULL DEFAULT txid_current(),
  inserted timestamptz NOT NULL DEFAULT clock_timestamp()
);

CREATE OR REPLACE FUNCTION ddl_log() RETURNS EVENT_TRIGGER
LANGUAGE plpgsql AS $$
BEGIN
  INSERT INTO ddl_info (
    classid, objid, objsubid, command_tag, object_type,
    schema_name, object_identity, in_extension
  )
  SELECT classid, objid, objsubid, command_tag, object_type,
         schema_name, object_identity, in_extension
  FROM pg_event_trigger_ddl_commands();
END;
$$;

CREATE EVENT TRIGGER aa_ddl_info ON ddl_command_end
  EXECUTE PROCEDURE ddl_log();

On Wed, Oct 9, 2019 at 2:27 PM Adrian Klaver wrote: > On 10/9/19 1:56 PM, Miles Elam wrote: > > GRANT and REVOKE trigger on a ddl_command_end event trigger but don't > > provide any information beyond whether it was a table, schema, function, > > etc. that was affected. No object IDs or the like are included. How > > would you find out which table had its ACLs modified? > > What is the code for trigger and function? > > > > > Also, why do grants and revokes have an object_type of 'TABLE' instead > > of lower case names like 'table' for all other event types? > > > > > > Thanks, > > > > Miles Elam > > > > > -- > Adrian Klaver > adrian.kla...@aklaver.com >
Re: DDL support for logical replication
Hi Lev, While I don't have an answer to your roadmap question, you've raised a different question for me. What are you expecting to get from logical replication of DDL commands that is not served by binary replication? I ask because typically someone would want to use logical replication if they want triggers to fire on the subscriber, they want only a subset of all tables replicated, etc. Perhaps a better question would be "What problem are you trying to solve?" rather than focusing on how you expected to solve that problem. Cheers, Miles Elam On Thu, Oct 10, 2019 at 11:08 AM Lev Kokotov wrote: > Hello, > > Is DDL support on a roadmap for logical replication? > > Thank you. > - Lev >
Re: ERROR: there is no unique constraint matching given keys for referenced table "audit_p"
I was under the impression that PostgreSQL 12 removed this limitation. Was this incorrect? https://www.2ndquadrant.com/en/blog/postgresql-12-foreign-keys-and-partitioned-tables/ On Mon, Nov 18, 2019 at 9:58 AM Michael Lewis wrote: > On Mon, Nov 18, 2019 at 10:10 AM Shatamjeev Dewan > wrote: > >> I am trying to create a foreign key constraint on a table : audit_param >> in postgres 12 which references partitioned table audit_p. is there anyway >> to get rid of this error. >> >> >> > *ERROR: there is no unique constraint matching given keys for referenced >> table "audit_p"* >> > > As far as I understand, what you want is not possible. You cannot > partition on a timestamp and then foreign key to an object_id natively. You > can get around this with triggers- > https://www.depesz.com/2018/10/31/foreign-key-to-partitioned-table-part-2/ - > but that would not be recommended. > > Admittedly, I have a pretty nasty head cold so it may be that someone > chimes in with much better insight on your design. > > By the way, there are strong recommendations for using timestamp WITH > TIMEZONE in nearly all cases. I just wanted to mention since timestamp > without timezone is rarely the best choice. >
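For the archives: as I later understood it, PG12 does allow the foreign key itself, but the referenced partitioned table still needs a primary key (or unique constraint) that includes the partition key, which is likely what this error is complaining about. A sketch with invented columns:

CREATE TABLE audit_p (
  object_id  bigint,
  created_at timestamptz NOT NULL,
  PRIMARY KEY (object_id, created_at)  -- must include the partition key
) PARTITION BY RANGE (created_at);

-- The referencing table then has to carry both columns:
CREATE TABLE audit_param (
  object_id  bigint,
  created_at timestamptz,
  FOREIGN KEY (object_id, created_at)
    REFERENCES audit_p (object_id, created_at)
);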
Re: Fast, stable, portable hash function producing 4-byte or 8-byte values?
In terms of "wasted computation", MD5, SHA1, and the others always compute the full length before they are cast to a UUID, int, or whatever. It's a sunk cost. It's also a minor cost considering many hash algorithms are performed in CPU hardware now. All that's left is the truncation and cast, which you can't avoid easily. Sure, you could reimplement Java's .hashCode() method by iterating through the characters and processing the character codes:

s[0]*31^(n - 1) + s[1]*31^(n - 2) + ... + s[n - 1]

I don't see how that would beat the CPU-based hashes though unless you wrote a C-based extension. Maybe it's better just to embrace the user-defined function first and then decide if performance is insufficient for your use cases.

CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE OR REPLACE FUNCTION hash8 (p_data text, p_algo text = 'md5') RETURNS int8 AS $$
  SELECT ('x' || encode(substring(digest(p_data, p_algo) FROM 1 FOR 8), 'hex'))::bit(64)::int8
$$ LANGUAGE sql IMMUTABLE PARALLEL SAFE;

CREATE OR REPLACE FUNCTION hash4 (p_data text, p_algo text = 'md5') RETURNS int4 AS $$
  SELECT ('x' || encode(substring(digest(p_data, p_algo) FROM 1 FOR 4), 'hex'))::bit(32)::int4
$$ LANGUAGE sql IMMUTABLE PARALLEL SAFE;

SELECT hash4('something something something'),
       hash4('something something something', 'sha1'),
       hash8('something something something'),
       hash8('something something something', 'sha1');

Cheers, Miles

On Tue, Dec 10, 2019 at 1:12 PM Erwin Brandstetter wrote: > I am looking for stable hash functions producing 8-byte or 4-byte hashes > from long text values in Postgres 10 or later. > > There is md5(), the result of which can be cast to uuid. This reliably > produces practically unique, stable 16-byte values. I have use cases where > an 8-byte or even 4-byte hash would be good enough to make collisions > reasonably unlikely. (I can recheck on the full string) - and expression > indexes substantially smaller. I could truncate md5 and cast back and > forth, but that seems like a lot of wasted computation. Are there > suggestions for text hash functions that are > - fast > - keep collisions to a minimum > - stable across major Postgres versions (so expression indexes don't break) > - cryptographic aspect is not needed (acceptable, but no benefit) > > There is an old post from 2012 by Tom Lane suggesting that hashtext() and > friends are not for users: > > https://www.postgresql.org/message-id/24463.1329854466%40sss.pgh.pa.us > > Postgres 11 added hashtextextended() and friends to generate bigint > hashes. In a more recent post from 3 months ago, Tom suggests to use it in > user-land - if portability is not needed: > > https://www.postgresql.org/message-id/9434.1568839177%40sss.pgh.pa.us > > Is pghashlib by Marko Kreen my best option? > > https://github.com/markokr/pghashlib > > Or the "version-independent hash functions for PostgreSQL" from Peter > Eisentraut: > > https://github.com/petere/pgvihash > > Neither received updates for a couple of years. Unmaintained? Or obsolete? > And neither is available on most hosted services like RDS or Heroku (which > would be required in some cases). > > So what are my best options? > > Regards > Erwin >
Table Interfaces (Inheritance again. I know. I'm sorry.)
Searching through the lists for "inheritance" yields a War and Peace level of content. "inheritance harmful" and "inheritance interface" didn't answer my question. What about cross-cutting concerns like what you might find with interfaces in many OO languages? For example:

CREATE TABLE auditable (
  id uuid NOT NULL,
  actor varchar NOT NULL, -- username
  created timestamptz NOT NULL
);

CREATE TABLE authable (
  id uuid NOT NULL,
  actor varchar NOT NULL, -- username
  access_tags varchar[]
);

CREATE TABLE a ( ... ) INHERITS (auditable);
CREATE TABLE b ( ... ) INHERITS (auditable, authable);
CREATE TABLE c ( ... ) INHERITS (authable);

Most of the discussion I've seen regarding inheritance has centered on cases where a base table with foreign key relationships for specialization would suffice. In the case above, it serves ONLY to enforce consistent style and allow utility functions to share logic due to the aforementioned consistent naming. Uniqueness is not necessary (but UUIDs handle that anyway), the tableoid is present for reverse lookups when needed, and defaults, check constraints, et al. are not a concern. But as an "interface", the hierarchy isn't like one would find in a traditional OOP extends; it's more like Java's interfaces. There are also avenues for using event triggers to detect the creation of tables that conform to certain "interfaces" in order to automatically add certain logic. For example:

CREATE TABLE track_modified (
  last_modified timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE x ( ... ) INHERITS (track_modified);
CREATE TABLE y ( ... ) INHERITS (track_modified);
CREATE TABLE z ( ... ) INHERITS (track_modified);

Here an event trigger would fire, see that each table has a last_modified column, and attach an expectation to update the last modification a la MySQL's ON UPDATE CURRENT_TIMESTAMP. An empty parent table could even be useful for an event trigger to add a regular set of policies without repeating yourself (and keeping them all in sync after the fact).

That said, I've also seen messages dating all the way back to 2005 claiming that INHERITS is a red-headed stepchild that would be phased out shortly after table partitioning landed and matured. Is this still the intention? Is inheritance just considered such an old feature that no one dares remove it for fear of breaking users? Or have folks come to find good use cases for it even though it's been used for far too many use cases? I apologize in advance for stirring a pot, encouraging the beating of an already dead horse, or exhausting you with cliches. I have seen the notices on the wiki about inheritance being useful for temporal logic but not much else since proper table partitioning was introduced. By and large I agree with the reasoning, especially with regard to unique keys and their lack of propagation. It just didn't seem to address the interface model one way or another. - Miles Elam
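To make the track_modified idea concrete, the event trigger would generate something like the following for each conforming table (untested sketch):

CREATE OR REPLACE FUNCTION set_last_modified() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  -- Mimic MySQL's ON UPDATE CURRENT_TIMESTAMP
  NEW.last_modified := CURRENT_TIMESTAMP;
  RETURN NEW;
END;
$$;

-- What the event trigger would emit for table x:
CREATE TRIGGER x_last_modified
  BEFORE UPDATE ON x
  FOR EACH ROW EXECUTE PROCEDURE set_last_modified();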
Re: a proposal for a new functionality: "SELECT * [EXCEPT col1 [,col2]]
How do you see this syntax working in a JOIN query?

SELECT x.* EXCEPT x.col1, x.col2, y.col1
FROM tablex AS x
LEFT JOIN tabley AS y;

The column(s) you want to exclude become ambiguous. Parentheses?

SELECT x.* EXCEPT (x.col1, x.col2), y.col1
FROM tablex AS x
LEFT JOIN tabley AS y;

Could work, but this is encouraging the use of the wildcard selector, which I'm not sure is a productive or maintainable goal. In exchange for flexibility, you've added a non-trivial amount of comprehension complexity. I'm not a big fan of the wildcard selector except in the most trivial cases, and even then only as part of development toward a final query with all columns specified. Then again I try not to have tables with hundreds of columns (or even tens in most cases), so my own use cases may bias me. Personally I just don't like queries where I cannot clearly see what is being returned to me. Anything that makes that ambiguity more popular will be viewed with a skeptical eye.

On Tue, Feb 25, 2020 at 2:18 AM Stanislav Motyčka < stanislav.moty...@gmail.com> wrote: > Hello, > > Sometimes (for tables with many columns) it would be better and easier to > write "SELECT" statement with clause "EXCEPT": > "SELECT * [EXCEPT col1 [,col2]] FROM ..." > > It's easier to write "except" one or two columns from all (*) as to write > names of all columns besides one or two. > What do you think about it? > > Best regards > Stano Motycka > >
Re: a proposal for a new functionality: "SELECT * [EXCEPT col1 [,col2]]
On Tue, Feb 25, 2020 at 6:41 AM Josef Šimánek wrote: > > út 25. 2. 2020 v 15:35 odesílatel Miles Elam > napsal: > >> How do you see this syntax working in a JOIN query? >> >> SELECT x.* EXCEPT x.col1, x.col2, y.col1 >> FROM tablex AS x >> LEFT JOIN tabley AS y; >> >> The column(s) you want to exclude become ambiguous. >> > > Can you explain how are those column(s) ambiguous in your example? I would > expect to select everything from table x (as SELECT x.* should do) except > x.col1 and x.col2. Nothing is selected from table y thus y.col1 is not > relevant here (the question is if this is problem or not - raise, ignore?). > Do you mean "select everything from tablex except for tablex.col1, and also select tablex.col2 and tabley.col1" or "select everything from tablex except for tablex.col1 AND tablex.col2, and also select tabley.col1" ? It's entirely possible to specify a column twice. It's quite common for me to see what fields I need from a table by doing a "SELECT * ... LIMIT 1" and then "SELECT col1, * ... LIMIT 1" as I refine the query, eventually eliminating the wildcard when I'm done. (When I'm using an IDE that doesn't support SQL table/column autocomplete.) EXCEPT would need to be scoped as to which columns it's meant to be excluding without ambiguity. Just reading from the column list until you hit another table's columns or a function strikes me as far too loose.
RETURNING to_jsonb(...)
How can the new record returned from RETURNING be converted to jsonb? For example something like:

INSERT INTO my_table (a, b, c) VALUES ('a', 'b', 'c') RETURNING to_jsonb(*);

or

UPDATE my_table SET a = 'a' RETURNING to_jsonb(*);

or

INSERT INTO my_table (a, b, c) VALUES ('a', 'b', 'c')
ON CONFLICT (a) DO UPDATE SET a = EXCLUDED.a
RETURNING to_jsonb(*);

Thanks in advance
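P.S. The closest I've found so far is referencing the table name as a row value, which appears to work, though I don't know if it's the idiomatic form:

INSERT INTO my_table (a, b, c) VALUES ('a', 'b', 'c')
RETURNING to_jsonb(my_table);  -- the table name doubles as a row reference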
Re: PG Admin 4
I did the same for at least a year, but I must admit that v4 has improved greatly since its initial release. Also, it turns out to be handy for running in a docker-compose environment, so no matter who is starting up on your team, they always have a database and a database admin tool at the ready along with the app.

On Fri, Jul 10, 2020 at 1:31 PM Susan Hurst wrote: > I gave up on PGAdmin4 and went back to PGAdmin3, although 3 is deprecated > now. The History tab will show you what you want after executing a SQL > statement. > > I don't use Windows any more than I have to but when I did try PGAdmin4 on > windows, the feedback was sometimes there and sometimes not. Linux works > better. > > That said, I agree that executing a script from a command line is the best > approach for implementing DDL statements. You can capture the script and > the output for further confirmation of success or failure. > > Sue > > --- > > Susan E Hurst > Principal Consultant > Brookhurst Data LLC > Email: susan.hu...@brookhurstdata.com > > On 2020-07-10 15:20, Tim Clarke wrote: > > Why would you shun the ease of command line batch control? > Tim Clarke MBCS > IT Director > > On 10/07/2020 17:36, rwest wrote: > > Oh sorry, should have specified that. > > We're running on a Windows platform. > > We're trying to avoid running anything command-line when doing DDL releases > and leverage whatever PG Admin 4 can provide us. I'm just wondering why we > don't see the results of each CREATE or ALTER statement as the script runs > in the tool. That seems very strange to me. Is there some sort of setting > or parameter that can force the tool to do this for us? > > -- > Sent from: https://www.postgresql-archive.org/PostgreSQL-general-f1843780.html >
Re: Implement a new data type
Also of note: PostgreSQL already has a money type ( https://www.postgresql.org/docs/current/datatype-money.html) but you shouldn't use it ( https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_money). I only bring it up so that you know to give your money type a slightly different name to avoid a conflict. Money is deceptively hard to implement correctly. I'd recommend reading the second link if you have not already, to avoid previously known issues.

On Tue, Aug 11, 2020 at 5:02 AM Pavel Stehule wrote: > Hi > > út 11. 8. 2020 v 13:31 odesílatel mohand oubelkacem makhoukhene < > mohand-oubelka...@outlook.com> napsal: > >> Hello; >> I would like to implement a new data type next to char, number, >> varchar... A Money type. >> So I'll have to change the source code, but I don't know which functions >> I need to change and which part to work on. >> I need some help to implement this new data type. >> > > If you need to implement just a new data type, then you don't need to > modify Postgres source code. You can write your own extension. > > For your own custom type you need to write minimally "in" and "out" functions, > and then you can run the "CREATE TYPE" statement > > one example of a custom type is the extension > https://github.com/okbob/pgDecimal > > Regards > > Pavel > > > Thank you and best regards >> Mohand >> >> >> >>
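If the goal is correct money handling rather than an exercise in writing C, a plain composite type over numeric may be enough and needs no server code at all. A sketch (type and field names are just illustrative):

CREATE TYPE my_money AS (
  amount   numeric,  -- exact decimal arithmetic, unlike float
  currency char(3)   -- ISO 4217 code, e.g. 'USD'
);

SELECT (ROW(19.99, 'USD')::my_money).amount;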
MERGE RETURNING
Are there any plans to (or specific decisions not to) support a RETURNING clause on MERGE statements in future versions of Postgres? The only reference I could find in the mailing list archives was this comment, which suggested it was desired but simply not technically feasible at the time. https://www.postgresql.org/message-id/202203161918.qz6phlortw2w@alvherre.pgsql

e.g.

MERGE INTO customer_account ca
USING (SELECT customer_id, transaction_value FROM recent_transactions) AS t
  ON t.customer_id = ca.customer_id
WHEN MATCHED THEN
  UPDATE SET balance = balance + transaction_value
WHEN NOT MATCHED THEN
  INSERT (customer_id, balance) VALUES (t.customer_id, t.transaction_value)
RETURNING customer_account.*;
Why is this SELECT evaluated?
Postgres v15. Given this example of a conversion from a byte array to an int8 masquerading as an "unsigned" int4:

SELECT (get_byte(bytes, byte_offset)::int8 << 24) |
       (get_byte(bytes, byte_offset + 1) << 16) |
       (get_byte(bytes, byte_offset + 2) << 8) |
       (get_byte(bytes, byte_offset + 3))
FROM ( VALUES ('\x010203'::bytea, 0) ) b(bytes, byte_offset)
WHERE length(bytes) >= (4 + byte_offset);

Why does this error result?

ERROR: index 3 out of valid range, 0..2
SQL state: 2202E

I was under the impression that if the WHERE clause evaluated to false, the SELECT clause would not be evaluated. Why is get_byte(...) ever run in the first place even though length(bytes) is 3? - Miles Elam
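P.S. The workaround I've settled on in the meantime is forcing the order with CASE, which the docs describe as the supported way to guard evaluation (modulo constant folding):

SELECT CASE WHEN length(bytes) >= (4 + byte_offset) THEN
         (get_byte(bytes, byte_offset)::int8 << 24) |
         (get_byte(bytes, byte_offset + 1) << 16) |
         (get_byte(bytes, byte_offset + 2) << 8) |
         (get_byte(bytes, byte_offset + 3))
       END
FROM ( VALUES ('\x010203'::bytea, 0) ) b(bytes, byte_offset);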
Re: Sequence vs UUID
On Sat, Jan 28, 2023 at 6:02 PM Ron wrote: > > Type 4 UUIDs are sub-optimal for big tables because cache hit rates drop > through the floor. > > This lesson was burned into my psyche way back in the Clinton > administration. It was my task to speed up a five hour batch job which read > input records from a flat file, did some validations and then inserted them. > Simply sorting the input file on the primary key fields -- we used natural > keys, not synthetics -- dropped the run time to two hours. (VMS SORT saved > the day, because you could tell it the sort order you wanted; thus, I could > preserve the header record at the top of the file, and the trailer record at > the end of the file without jumping through a bunch of hoops.)

This can be mitigated with judicious use of a sequence at the front of the uuidv4. https://www.2ndquadrant.com/en/blog/sequential-uuid-generators/

More effort than just calling the built-in gen_random_uuid() or equivalent in app code, but a substantial performance gain for your effort. https://github.com/tvondra/sequential-uuids

And in a managed environment where you can't install custom extensions, a fairly simple function combining a divide on unix epoch seconds with a call to overlay(...) should suffice performance-wise. At 60 seconds, this will loop every 45 days or so, and you can choose how much "sequentialness" works for you, from 1 to 4 bytes, at the expense of pseudo-randomness.

-- Generate time interval UUID
CREATE OR REPLACE FUNCTION gen_interval_uuid(interval_seconds int4 = 60, block_num_bytes int2 = 2)
RETURNS uuid
LANGUAGE sql VOLATILE PARALLEL SAFE AS $$
  SELECT encode(
    overlay(
      -- convert the uuid to a byte array
      uuid_send(gen_random_uuid())
      -- use only the bottom bytes of the epoch-interval counter
      PLACING substring(
        int4send((extract(epoch FROM now()) / interval_seconds)::int4)
        FROM (5 - block_num_bytes)
      )
      -- place at the front of the uuid
      FROM 1
    ),
    -- convert the resulting byte array to hex for conversion to uuid
    'hex')::uuid
  WHERE interval_seconds > 0 AND block_num_bytes BETWEEN 1 AND 4
$$;

Technically this violates the v4 spec. You could always convert to a UUIDv8, which is the intention behind that new version even though the standard hasn't been ratified yet.

Cheers, Miles Elam
Re: Sequence vs UUID
On Sat, Jan 28, 2023 at 8:02 PM Ron wrote: > > Then it's not a Type 4 UUID, which is perfectly fine; just not random. Yep, which is why it really should be re-versioned to UUIDv8 to be pedantic. In everyday use though, almost certainly doesn't matter. > Also, should now() be replaced by clock_timestamp(), so that it can be > called multiple times in the same transaction? Not necessary. Instead of 122 bits of entropy, you get 106 bits of entropy and a new incremented prefix every minute. now() vs clock_timestamp() wouldn't make a substantive difference. Should still be reasonably safe against the birthday paradox for more than a century when creating more than a million UUIDs per second.
Re: Sequence vs UUID
On Wed, Feb 1, 2023 at 10:48 AM Kirk Wolak wrote: > > > On Wed, Feb 1, 2023 at 1:34 PM veem v wrote: > >> >> 1) sequence generation vs UUID generation, execution time increased from >> ~291ms to 5655ms. >> 2) Insert performance of "sequence" vs "UUID" execution time increased >> from ~2031ms to 10599ms. >> 3) Index performance for sequence vs UUID, execution time increased from >> ~.3ms to .5ms. >> >> > Yes, assuming that UUIDs would be efficient as keys when they are randomly > generated, versus sequences (which tend to expand in one direction, and > have been relatively optimized for years). > > This article explains in detail what is going on. If I were doing this, I > would strongly consider a ULID because of getting the best of both worlds. > https://blog.daveallie.com/ulid-primary-keys > > Of course, YMMV... And since ULIDs are not native to PG, there is > overhead, but it is far more performant, IMO... > Biased comparison. ULIDs have a timestamp component. The closest UUID equivalent in Postgres is UUIDv1 from the uuid-ossp extension, not v4. Another difference not mentioned in the blog article is that UUID is versioned, meaning you can figure out what kind of data is in the UUID, whereas ULIDs are a "one size fits all" solution. There is an implementation of sequential UUIDs for Postgres I posted earlier in this thread. In addition, here is an implementation of UUIDv7 for Postgres: https://gist.github.com/kjmph/5bd772b2c2df145aa645b837da7eca74 I would suggest running your tests against v1, v7, and sequential UUID before jumping on ULID, which has no native type/indexing in Postgres. It should also be noted that apps cannot provide a bigint ID due to collisions, but an app can generate UUIDs and ULIDs without fear, essentially shifting the generation time metric in UUID/ULID's favor over a bigserial. - Miles
Re: Sequence vs UUID
On Thu, Feb 2, 2023 at 11:47 AM veem v wrote: > Tested the UUIDv7 generator for postgres as below. > > With regards to performance , It's still way behind the sequence. I was > expecting the insert performance of UUID v7 to be closer to the sequence , > but it doesn't seem so, as it's 500ms vs 3000ms. And the generation takes a > lot longer time as compared to sequence too i.e. 59ms vs 1700ms. Read time > or the index scan looks close i.e. 2.3ms vs 2.6ms. > Thank you for taking the effort in testing and measuring this. Those numbers make some intuitive sense to me. The function is written in plpgsql, not C, and is dependent on generating a UUIDv4 and then modifying it to include the timestamp and version change. While I suspect it will never beat a bigint by virtue of 64-bits will always be half the size of 128-bit, the read time on the index scan after it is generated is encouraging with a strong suggestion there's a lot of low-hanging fruit for improvement. Also, like UUIDv4, v7 can be generated by clients, ameliorating the generation bottleneck. Once again, thank you for following up with good quality analysis.
Re: Sequence vs UUID
On Wed, Feb 8, 2023 at 11:56 AM Kirk Wolak wrote:

> CREATE FUNCTION generate_ulid() RETURNS uuid
>   LANGUAGE sql
>   RETURN ((lpad(to_hex((floor((EXTRACT(epoch FROM clock_timestamp()) * (100)::numeric)))::bigint), 14, '0'::text)
>     || encode(gen_random_bytes(9), 'hex'::text)))::uuid;

You can save yourself some CPU by skipping the extra cast, omitting the lpad, to_hex, and floor, and just grabbing the bytes from the bigint directly along with the random part, since bytea values can be concatenated.

SELECT encode(
  int8send((EXTRACT(epoch FROM clock_timestamp()) * 100)::bigint)
    || gen_random_bytes(8),
  'hex')::uuid;

Note that you refer to it as a ULID, but it is stored as a UUID. Hopefully nothing downstream ever relies on UUID versioning/spec compliance. Now that I think of it, I could probably speed up my tagged interval UUID implementation using some of this, at the expense of configurability.
Re: PostgreSQL vs MariaDB
On Fri, Mar 24, 2023 at 4:07 AM Inzamam Shafiq wrote: > > Can someone please list pros and cons of MariaDB vs PostgreSQL that actually > needs serious consideration while choosing the right database for large OLTP > DBs (Terabytes)?

Think about what you want/need from the database for your project and then see which engine best satisfies those requirements. All modern RDBMS engines can handle terabytes of data. Relatively static data (mostly reads) or heavy insert loads will be fine with either. With large databases, UPDATEs and DELETEs may require more maintenance with Postgres in the form of tweaking autovacuum settings.

https://www.sql-workbench.eu/dbms_comparison.html

For a basic comparison of Postgres vs MariaDB, just uncheck all the other engines at the top. Go down the list of rows, especially the ones where only one of the two supports a feature. If you don't know what a feature does or what use cases it would be for, this is an excellent opportunity to become better acquainted with it. For example, if you end up needing bitemporal table support out of the box, go with MariaDB. On the other hand, if you want the most flexibility in data types and in modeling your data, Postgres is going to win out.

* Transactional DDL
* Clustered indexes
* DDL event triggers
* Alter table used in a view

Go down the list. What do you need? What do you think you'll need? What would be nice to have? Do you want to host it yourself or go to one of the cloud providers for a fully managed option? If self-hosted, use whatever you like. Hack it to your specs. If managed in the cloud, Postgres and MySQL are more commonly available than MariaDB, but it's worth checking ahead of time.

Cheers, Miles
Inserts restricted to a trigger
Is there are way to restrict direct access to a table for inserts but allow a trigger on another table to perform an insert for that user? I'm trying to implement an audit table without allowing user tampering with the audit information. Thanks in advance, Miles Elam
Re: Inserts restricted to a trigger
That seems straightforward. Unfortunately I also want to know the user/role that performed the operation. If I use SECURITY DEFINER, I get the superuser account back from CURRENT_USER, not the actual user. Sorry, should have included that in the original email. How do I restrict access while still retaining info about the current user/role? On Mon, Jun 17, 2019 at 5:47 PM wrote: > Adrian Klaver wrote: > > > On 6/17/19 4:54 PM, Miles Elam wrote: > > > Is there are way to restrict direct access to a table for inserts but > > > allow a trigger on another table to perform an insert for that user? > > > > > > I'm trying to implement an audit table without allowing user tampering > > > with the audit information. > > > > Would the below not work?: > > CREATE the table as superuser or other privileged user > > Have trigger function run as above user(use SECURITY DEFINER) > > and make sure not to give any other users insert/update/delete > permissions on the audit table. > > > > Thanks in advance, > > > > > > Miles Elam > > > > -- > > Adrian Klaver > > adrian.kla...@aklaver.com > > >
Re: Inserts restricted to a trigger
Thanks for the suggestion. Unfortunately we only have a single login role (it's a web app) and then we SET ROLE according to the contents of a JSON Web Token. So we end up with SESSION_USER as the logged in user and the active role as CURRENT_USER. It may be that we're just stuck with a gap and need to just try and keep track of our mutation points, such as limit what is accessible through REST or GraphQL, and there is no way to fundamentally lock this down in Postgres. I was checking the mailing list to see if I'd missed anything. On Tue, Jun 18, 2019 at 9:47 AM Torsten Förtsch wrote: > Have you tried session_user? > > create function xx() returns table (cur text, sess text) > security definer language sql as $$ > select current_user::text, session_user::text; > $$; > > Then log in as different user and: > > => select (xx()).*; >cur| sess > --+--- > postgres | write > > > On Tue, Jun 18, 2019 at 6:30 PM Miles Elam > wrote: > >> That seems straightforward. Unfortunately I also want to know the >> user/role that performed the operation. If I use SECURITY DEFINER, I get >> the superuser account back from CURRENT_USER, not the actual user. >> >> Sorry, should have included that in the original email. How do I restrict >> access while still retaining info about the current user/role? >> >> >> On Mon, Jun 17, 2019 at 5:47 PM wrote: >> >>> Adrian Klaver wrote: >>> >>> > On 6/17/19 4:54 PM, Miles Elam wrote: >>> > > Is there are way to restrict direct access to a table for inserts but >>> > > allow a trigger on another table to perform an insert for that user? >>> > > >>> > > I'm trying to implement an audit table without allowing user >>> tampering >>> > > with the audit information. >>> > >>> > Would the below not work?: >>> > CREATE the table as superuser or other privileged user >>> > Have trigger function run as above user(use SECURITY DEFINER) >>> >>> and make sure not to give any other users insert/update/delete >>> permissions on the audit table. >>> >>> > > Thanks in advance, >>> > > >>> > > Miles Elam >>> > >>> > -- >>> > Adrian Klaver >>> > adrian.kla...@aklaver.com >>> >>> >>>
Re: Inserts restricted to a trigger
Hi Adrian, thanks for responding. How would I restrict access to the SECURITY DEFINER function? If it can be called by the trigger, it can be called by the user as well I would think. Same issue as access to the table itself only now with a superuser intermediary, right? On Tue, Jun 18, 2019 at 6:20 PM Adrian Klaver wrote: > On 6/18/19 10:14 AM, Miles Elam wrote: > > Thanks for the suggestion. Unfortunately we only have a single login > > role (it's a web app) and then we SET ROLE according to the contents of > > a JSON Web Token. So we end up with SESSION_USER as the logged in user > > and the active role as CURRENT_USER. > > Have not tried it but nested function?: > > 1) Outer function runs as normal user and grabs the CURRENT_USER. This > is passed into 2) > > 2) Audit function that runs with SECURITY DEFINER. > > Other option is to record the CURRENT_USER in the table the trigger is > on and just pass that to the audit function. > > > > > It may be that we're just stuck with a gap and need to just try and keep > > track of our mutation points, such as limit what is accessible through > > REST or GraphQL, and there is no way to fundamentally lock this down in > > Postgres. I was checking the mailing list to see if I'd missed anything. > > > > > > On Tue, Jun 18, 2019 at 9:47 AM Torsten Förtsch > <mailto:tfoertsch...@gmail.com>> wrote: > > > > Have you tried session_user? > > > > create function xx() returns table (cur text, sess text) > > security definer language sql as $$ > > select current_user::text, session_user::text; > > $$; > > > > Then log in as different user and: > > > > => select (xx()).*; > > cur| sess > > --+--- > > postgres | write > > > > > > On Tue, Jun 18, 2019 at 6:30 PM Miles Elam > > mailto:miles.e...@productops.com>> > wrote: > > > > That seems straightforward. Unfortunately I also want to know > > the user/role that performed the operation. If I use SECURITY > > DEFINER, I get the superuser account back from CURRENT_USER, not > > the actual user. > > > > Sorry, should have included that in the original email. How do I > > restrict access while still retaining info about the current > > user/role? > > > > > > On Mon, Jun 17, 2019 at 5:47 PM > <mailto:r...@raf.org>> wrote: > > > > Adrian Klaver wrote: > > > > > On 6/17/19 4:54 PM, Miles Elam wrote: > > > > Is there are way to restrict direct access to a table > > for inserts but > > > > allow a trigger on another table to perform an insert > > for that user? > > > > > > > > I'm trying to implement an audit table without allowing > > user tampering > > > > with the audit information. > > > > > > Would the below not work?: > > > CREATE the table as superuser or other privileged user > > > Have trigger function run as above user(use SECURITY > DEFINER) > > > > and make sure not to give any other users > insert/update/delete > > permissions on the audit table. > > > > > > Thanks in advance, > > > > > > > > Miles Elam > > > > > > -- > > > Adrian Klaver > > > adrian.kla...@aklaver.com adrian.kla...@aklaver.com> > > > > > > > -- > Adrian Klaver > adrian.kla...@aklaver.com >
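For anyone following along, the shape I believe Adrian is suggesting looks roughly like this (names invented, untested), though the gap I'm worried about remains: anyone who can call the definer function directly can still forge an entry.

-- Runs as the invoker, so current_user is still the SET ROLE role;
-- it captures the role and hands everything to the definer function.
CREATE OR REPLACE FUNCTION base_audit() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  PERFORM write_audit(current_user::text, to_jsonb(OLD), to_jsonb(NEW));
  RETURN NEW;
END;
$$;

-- SECURITY DEFINER: the only path granted INSERT on the audit table
CREATE OR REPLACE FUNCTION write_audit(p_actor text, p_old jsonb, p_new jsonb)
RETURNS void LANGUAGE sql SECURITY DEFINER AS $$
  INSERT INTO audit_log (actor, old_row, new_row)
  VALUES (p_actor, p_old, p_new);
$$;

REVOKE ALL ON audit_log FROM PUBLIC;

CREATE TRIGGER base_audit_trg
  AFTER UPDATE ON some_table
  FOR EACH ROW EXECUTE PROCEDURE base_audit();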
Re: Inserts restricted to a trigger
Thanks for the reply, Adrian.

1. The audit tables (plural) are recording the historical data for a table, i.e., updates and deletes. All the same data as found in a given table along with the role that performed the operation, the transaction id, and the time range where this data was in active use.
2. Only through a web UI via an API service.
3. Should be limited to the web app, but the data scientists may need direct access in the near future.

PGAudit does not cover our use case. We are making a temporal table system since PostgreSQL does not support one natively. For example: "What would this query have returned yesterday at 4:27pm PT?" Access is as expected for inserts, but updates and deletes are logged to history tables. We cannot use 3rd party extensions because we are on AWS managed databases. We are following the model detailed here ( https://wiki.postgresql.org/wiki/SQL2011Temporal) with some modifications. Given the model listed in the link, it's not clear how we can prevent user tampering with history inserts. (History updates and deletes are already REVOKE restricted.) Since we are going through an API server via REST and/or GraphQL, the possibility is very unlikely, but we would prefer a defense in depth approach in case an oversight somehow allowed arbitrary query access to the database with the web user. For the most part, we're fairly well locked down, but I just can't quite see how to restrict aforementioned query access from inserting to the history in an ad-hoc manner rather than the trigger-based predetermined insert pattern.

On Thu, Jun 20, 2019 at 8:01 AM Adrian Klaver wrote: > On 6/19/19 3:07 PM, Miles Elam wrote: > > Hi Adrian, thanks for responding. > > > > How would I restrict access to the SECURITY DEFINER function? If it can > > be called by the trigger, it can be called by the user as well I would > > think. Same issue as access to the table itself only now with a > > superuser intermediary, right? > > > > Should have also mentioned, if you are not adverse to a third party > solution there is PGAudit: > > https://www.pgaudit.org/ > > > -- > Adrian Klaver > adrian.kla...@aklaver.com >
Re: A Good Beginner's Book
> On 8/12/23 9:02 a.m., Amn Ojee Uw wrote: > > Is there a book to be recommended for PostgreSQL beginners? If you are new to relational databases and SQL in general, I recommend the basics of SQL (not Postgres-specific) to start off. The SQL Murder Mystery is a good first experience. https://mystery.knightlab.com/ If you are not an SQL beginner but looking more for Postgres-specific knowledge, it depends on what area you're looking for: administration, performance, customization, etc. Here is the list provided from the Postgres web site with a range of areas and experience level for study. Some can be downloaded for free while others would be purchased. https://www.postgresql.org/docs/books/ I myself started with "PostgreSQL: Introduction and Concepts" many years ago and loved it. It's pretty long in the tooth now (20+ years old?!), but still good for SQL beginners in general and available free online. (Thanks again, Bruce Momjian!) The first one on the web page looks pretty good for SQL beginners too. "POSTGRES: The First Experience" Best wishes on your journey!
Check constraint failure messages
I've got a domain that validates email addresses. When inserting a bunch of entries I simply get the error message ERROR: value for domain po.email violates check constraint "email_check" SQL state: 23514 When inserting 1000+ entries in a batch, finding the exact entry with the problem is noticeably harder than with other error types. For example when a column should be a uuid but you pass in 'Mary had a little lamb', the error message tells you what the invalid value is as well as the column name you're trying to put it into. Are there any quick hacks floating around out there to solve or at least mitigate this? - Miles
Re: Check constraint failure messages
On Tue, Apr 6, 2021 at 1:03 PM Ron wrote: > On 4/6/21 2:40 PM, Miles Elam wrote: > > I've got a domain that validates email addresses. When inserting a bunch > of entries I simply get the error message > > ERROR: value for domain po.email violates check constraint "email_check" > SQL state: 23514 > > > When inserting 1000+ entries in a batch, finding the exact entry with the > problem is noticeably harder than with other error types. For example when > a column should be a uuid but you pass in 'Mary had a little lamb', the > error message tells you what the invalid value is as well as the column > name you're trying to put it into. > > Are there any quick hacks floating around out there to solve or at least > mitigate this? > > > Is it a deferred constraint? >

Plain ole domain CHECK constraint.

CREATE DOMAIN po.email AS varchar
  CHECK (VALUE IS NULL
         OR (po.length_in(VALUE, 1, 254)
             AND NOT po.email_expanded(VALUE) IS NULL));

where "po" is another schema, po.length_in(...) is an IMMUTABLE range check, and po.email_expanded(...) is a function returning a record. The same behavior happens if I remove the functions and define the check constraint in place. The only info returned in a bulk insert is the name of the violated check constraint, aka email_check. An example table using it is defined as follows:

CREATE TABLE IF NOT EXISTS profile (
  id uuid PRIMARY KEY,
  email po.email NOT NULL,
  manager_email po.email NOT NULL
);

Nothing fancy.

INSERT INTO profile (id, email, manager_email)
VALUES
  (gen_random_uuid(), 'us...@example.com', 'manag...@example.com'),
  (gen_random_uuid(), 'us...@example.com', 'manag...@example.com'),
  (gen_random_uuid(), 'user3&example.com', 'manag...@example.com'),
  (gen_random_uuid(), 'us...@example.com', 'manager4.example.com'),
  (gen_random_uuid(), 'us...@example.com', 'manag...@example.com');

Inserting this batch will tell me that there was an error and that it was "email_check" that failed, but no indication that the 3rd user's email address or the 4th user's manager email was the problem, forcing a bisect operation among 1,000+ entries to find the first error, then bisecting from there to find the second error if any, repeating until no more constraint errors. - Miles
Re: Check constraint failure messages
On Tue, Apr 6, 2021 at 1:59 PM Ron wrote: > > The blunt force answer is to not use bulk inserts. Try COPY; it's good at > saying which record throws an error. >

Sadly, this is a cloud-managed database without direct access to 5432 from outside the VPC, and bastion instances are frowned upon by our security folks. Guess I'm stuck with bisecting. Thanks for the confirmation.
Re: Check constraint failure messages
Following up in case someone else runs into this problem. I changed the function the CHECK constraint calls to raise a warning. Not perfect, but noticeably better. I don't get the column that failed, but I do get what bad input gummed things up.

CREATE OR REPLACE FUNCTION po.confirm(p_val anyelement, p_validated boolean)
RETURNS boolean
LANGUAGE plpgsql STRICT IMMUTABLE PARALLEL SAFE AS $$
BEGIN
  IF NOT p_validated THEN
    RAISE WARNING 'Invalid value: %', p_val;
  END IF;
  RETURN p_validated;
END;
$$;

COMMENT ON FUNCTION po.confirm(anyelement,boolean) IS
  'Raises a warning when a condition is false; useful for outputting CHECK constraint error values.';

CREATE DOMAIN po.email AS varchar
  CHECK (po.confirm(VALUE, VALUE IS NULL OR NOT po.email_expanded(VALUE) IS NULL));

The code is not seamless or DRY, but it's manageable. - Miles

On Tue, Apr 6, 2021 at 2:18 PM Miles Elam wrote: > On Tue, Apr 6, 2021 at 1:59 PM Ron wrote: > >> >> The blunt force answer is to not use bulk inserts. Try COPY; it's good >> at saying which record throws an error. >> > > Sadly, this is a cloud-managed database without direct access to 5432 from > outside the VPC and bastion instances are frowned upon by our security > folks. Guess I'm stuck with bisecting. Thanks for the confirmation. > >
Re: Logical replication from Rds into on-premise
On Tue, Jul 27, 2021 at 4:38 AM Dave Cramer wrote: > > Does RDS allow logical replication > Yes, it does. I believe it was patched for v9.6, but v10 and above support it out of the box, and the RDS version of PostgreSQL shares that support. I have used it with v10 and v11, and it works exactly like the standard PG docs say it should. - Miles
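For completeness, the relevant knob on RDS is the rds.logical_replication parameter in the instance's parameter group (not postgresql.conf); after that it's the standard publication/subscription dance. Roughly, with placeholder names and connection string:

-- On the RDS source, with rds.logical_replication = 1 in the parameter group:
CREATE PUBLICATION my_pub FOR ALL TABLES;

-- On the on-premise subscriber:
CREATE SUBSCRIPTION my_sub
  CONNECTION 'host=... dbname=... user=... password=...'
  PUBLICATION my_pub;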
Re: [E] Regexp_replace bug / does not terminate on long strings
On Fri, Aug 20, 2021 at 12:32 PM Mark Dilger wrote: > > The following queries take radically different time to run: >

Unbounded ranges seem like a problem. It seems worth trying a range from 1 to N where you play around with N to find your optimum performance/functionality tradeoff. {1,20} is like '+' but clamps at 20.

select regexp_replace(
  repeat('someone,one,one,one,one,one,one,', 60),
  '(?<=^|,)([^,]+)(?:,\1){1,20}(?=$|,)',
  '\1', -- replacement
  'g'   -- apply globally (all matches)
);

- Miles Elam
Idempotent DDL Updates
What is the general consensus within the community on idempotent DDL scripts, i.e. consistent usage of IF EXISTS/IF NOT EXISTS/OR REPLACE for SQL init files that get checked into source control? My experience has been that it's much easier to manage than an ever-growing set of migration files, but I'm just a data point of one. Obviously for other engines that don't support transactional DDL, it's a non-starter, which leads me toward the notion that its lack of obvious popularity is due to its limited cross-engine viability. But PG does have transactional DDL, preventing incomplete DDL updates. However, this may just be my personal bias talking. Yet I cannot seem to discount the advantages over ever-increasing volumes of Flyway-style migration files & directories:

* being able to do diffs of changes in source control
* limiting the size of the total SQL over time relative to what's actually in the DB structure
* much more easily determining the most current version of a function/procedure (this bit me hard in the past with dozens of migration files)
* the ability to prune old changes that no longer apply to any deployed version of the database
* treating database structure as code

The drawbacks I've run across are those areas where the EXISTS/REPLACE constructs aren't implemented, like roles management, domains, constraints, etc. However, those cases seem to be handled with only minor increases in complexity with judicious use of inline plpgsql. In others' opinions, has DDL idempotency been viable for maintenance of PG databases for you in production? - Miles Elam
Re: Idempotent DDL Updates
On Fri, Aug 27, 2021 at 7:14 PM Julien Rouhaud wrote: > > Note that the IF EXISTS / IF NOT EXISTS are *not* idempotent. If you > need to write idempotent schema update scripts, you need to query the > catalogs to check if the specific change you want to apply has already > been applied or not. >

Poor choice of words. You're absolutely right. The goal is for the script to be idempotent, not that individual statements like that are idempotent. For example, adding ADD COLUMN IF NOT EXISTS foo and DROP COLUMN … statements to the script in addition to CREATE TABLE IF NOT EXISTS statements, so that the end result is always the same column definitions no matter how often the script is run. Eventually the individual ADD and DROP COLUMN statements can be removed once all databases are up to date. Not sure that querying the catalogs is strictly necessary though… Could you say more?

This removes the ability to have "down" migration scripts, but I'll be honest, I've never actually used a "down" script in production. If the "up" script failed for some reason, the validity of the logic in the "down" script is immediately suspect. It's always a new "up" script to fix the problem. That's leaving aside the issue of "down" scripts not getting anywhere near the same level of scrutiny and testing as "up" migration scripts get. - Miles
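P.S. An example of what I mean by minor inline plpgsql, for objects that lack IF NOT EXISTS support (role and column names invented):

-- Roles have no CREATE ROLE IF NOT EXISTS, so guard via the catalog:
DO $$
BEGIN
  IF NOT EXISTS (SELECT FROM pg_roles WHERE rolname = 'app_readonly') THEN
    CREATE ROLE app_readonly;
  END IF;
END;
$$;

-- Columns, on the other hand, have native support:
ALTER TABLE my_table ADD COLUMN IF NOT EXISTS foo text;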
Logical Replication to Older Version
We recently upgraded from v9.6 to v13 but are seeing some problems. It's on AWS Aurora, so I won't ask to diagnose a heavily altered version. We're hoping for better results with v12, but when we set up logical replication from v13 to v12, while the initial data snapshot copies the data, pg_stat_replication remains empty. There are no v13-specific features or idioms in use in the databases. As a sanity check, should it be possible to set up logical replication from a newer version of PostgreSQL to an older version? Has this scenario been tested? Has the logical replication protocol changed between versions? Thanks in advance, Miles Elam
Re: Logical Replication to Older Version
Follow up to this. Turns out we had a table without a primary key which halted the ongoing replication. Reviewing this document in detail now. https://pgdash.io/blog/postgres-replication-gotchas.html - Miles Elam
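For anyone else hitting the same wall: the fix for us was to give the table a primary key. Failing that, the escape hatch appears to be a replica identity so that UPDATEs and DELETEs can be decoded, at the cost of logging the whole old row:

ALTER TABLE the_offending_table REPLICA IDENTITY FULL;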
Re: The tragedy of SQL
On Fri, Sep 17, 2021 at 1:13 PM Benedict Holland < benedict.m.holl...@gmail.com> wrote: > I don't get why there are so many programming languages out there. C is > virtually perfect. > Oh my. Even its creators didn't believe this, and that was decades ago. Use after free. Dangling pointers. No array bounds detection. The YOLO that is (void *). Decades of CERT advisories tell a different story. Don't get me wrong, I like C. It's truly a marvel that such a simple language could be so powerful and flexible. But virtually perfect? PostgreSQL is lauded as a great piece of C software—not just because it has a wonderful feature set, but because large scale C development is truly a difficult thing to get right even with the best, most dedicated developers. See: OpenSSL. Buffer overflows, all too common and often hard to find, are largely impossible in some languages (Rust) and far less likely in others (Zig), and yet they account for far too many exploits out there leading to massive data leaks and ransomware. We develop new languages precisely because C is not perfect. (This is not to say any of the newer languages are, just that C should not be the end of our search, especially in the domain-specific cases.)
Re: Determining if a table really changed in a trigger
On Tue, Oct 26, 2021 at 6:36 AM Marcos Pegoraro wrote: > >> Maybe converting new and old records to json and text > PERFORM * FROM (select ID, row_to_json(O.*)::text Old_Values, > row_to_json(N.*)::text New_Values from old_table o full outer join > new_table N using(ID) where Old_Values is distinct from New_Values) as > differences LIMIT 1;

I have done this, but casting to jsonb, which already supports the equality operator. That saved an extra cast to text. It also allows for easily excluding a column or two before comparing. I never noticed a performance problem, but I was using this technique to see if a row had substantively changed, and if so, to cancel the write and subsequent trigger invocations by returning NULL in the before-trigger. Trading a couple of jsonb conversions for skipped writes and skipped downstream trigger processing was an obvious win for me, but your mileage may vary depending on your use case.
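The shape I used, roughly (trigger, table, and column names invented):

CREATE OR REPLACE FUNCTION skip_noop_update() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  -- Compare as jsonb, ignoring bookkeeping columns; abort the write
  -- (and any later triggers) when nothing substantive changed
  IF to_jsonb(NEW) - 'updated_at' = to_jsonb(OLD) - 'updated_at' THEN
    RETURN NULL;
  END IF;
  RETURN NEW;
END;
$$;

CREATE TRIGGER aa_skip_noop
  BEFORE UPDATE ON my_table
  FOR EACH ROW EXECUTE PROCEDURE skip_noop_update();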
Re: How can a Postgres SQL script be automatically run when a new table turns up?
On Thu, Jan 13, 2022 at 4:32 PM David G. Johnston < david.g.johns...@gmail.com> wrote: > On Thu, Jan 13, 2022 at 8:55 AM Shaozhong SHI > wrote: > >> >> How to create an event trigger in Postgres? When a user finished loading >> a new table on to it, the trigger can start off an script 10 minutes after >> the event? >> >> > You could have a script execute every minute (say via cron) > Vendor-specific, but on AWS, you can invoke a lambda from RDS or Aurora. 1. An event trigger function runs after CREATE TABLE and invokes a lambda. https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PostgreSQL-Lambda.html 2. The lambda calls StartExecution on a step function. https://docs.aws.amazon.com/step-functions/latest/apireference/API_StartExecution.html 3. In the first step of the step functions, call wait for 10 minutes. https://docs.aws.amazon.com/step-functions/latest/dg/amazon-states-language-wait-state.html 4. In the second step, have the lambda perform whatever operation(s) you need to do. No polling required, but you'd have to be running in a managed service in the Amazon Cloud. Similarly, if you are self-hosting and willing/able to write some C code or run some pl/pythonu, you could create an extension/function that performs this logic. Or again if you are self-managed and go the cron route as suggested by David Johnson, there's the extension pg_cron. – Miles Elam
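A sketch of step 1, per the linked RDS docs (the ARN and function names are placeholders, and the aws_lambda call signature should be verified against the current docs):

-- Requires the aws_lambda extension available on RDS/Aurora PostgreSQL
CREATE EXTENSION IF NOT EXISTS aws_lambda CASCADE;

CREATE OR REPLACE FUNCTION notify_table_created() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
  PERFORM aws_lambda.invoke(
    aws_commons.create_lambda_function_arn(
      'arn:aws:lambda:us-east-1:123456789012:function:on-new-table'),
    json_build_object('tag', tg_tag),
    invocation_type := 'Event'  -- async so the DDL isn't blocked
  );
END;
$$;

CREATE EVENT TRIGGER on_create_table ON ddl_command_end
  WHEN TAG IN ('CREATE TABLE')
  EXECUTE PROCEDURE notify_table_created();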