Re: Would PostgreSQL 16 native transparent data encryption support database level encryption?

2023-05-23 Thread Stephen Frost
Greetings,

There seems to be a bit of confusion here, so I'll try to clear it up.

* Tony Xu (tony...@rubrik.com) wrote:
> Thanks all for the discussions. It sounds like there are different
> questions to clear before we can get to a conclusion on if per-database KEK
> is possible or not.

It's not possible in PG's design today to ensure that all user data is
segregated by database, because there is a shared WAL which includes
data from all databases running under a given postmaster (typically
called a 'cluster' or 'instance').  There are also shared catalogs
which aren't per-database to begin with (whether those count as user
data depends on your definition: are users/roles considered user data,
or not?).
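For reference, the shared catalogs can be listed from any database in
the cluster with a simple query against pg_class (a minimal sketch;
relisshared is a standard catalog column):

```sql
-- List the catalogs that are shared across all databases in a cluster;
-- roles (pg_authid), pg_database, and pg_tablespace all live here.
SELECT relname
FROM pg_class
WHERE relisshared
  AND relkind = 'r'
ORDER BY relname;
```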

> First question - do we, as a community, see the value of the proposal and
> do we believe that value is big enough for us to make any necessary changes
> to PostgreSQL?  Another way to ask is, do we want/need to keep the parity
> with other database solutions that already have such ability?

TDE is clearly valuable to this community; multiple organizations
have implemented it already, and there's an ongoing effort to bring TDE
into core, with patches already posted and many of these design
decisions made.  Given the point made above, it seems very unlikely that
the basics of the design are going to change, at least not unless PG
changes in some fundamental way when it comes to WAL and how shared
catalogs and databases work (note: I'm not aware of anyone currently
working on this..).

> If the answer to the first question is no, then I will stop here.
> However, if the answer is yes or "it depends", then the second question is
on how - It seems that per-cluster KEK can be a good alternative to
> achieve the same as per-db KEK, however, there are still some shared area
> between clusters. Is it possible to further split this shared area? Are
> there more shared memory/storage between clusters?

There isn't anything shared between PG clusters, to be clear.  A
"cluster" in PG is all data and processes running under one
"postmaster", there is nothing shared between two PG clusters.
Splitting up the WAL to be per-database has been contemplated before but
there's no active work trying to achieve that, and there are a lot of
complications associated with doing so (such as: what to do
about the shared catalogs...).
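As a sketch of the workaround this implies: full isolation today means
one cluster per tenant, each with its own WAL and shared catalogs.
Everything below (paths, ports, tenant names) is a placeholder, not
something from this thread:

```shell
# Hypothetical layout: one cluster (and hence one KEK scope) per tenant.
initdb -D /srv/pg/tenant_a --data-checksums
initdb -D /srv/pg/tenant_b --data-checksums

# Each cluster gets its own port, postmaster, WAL, and shared catalogs,
# so nothing at all is shared between the two.
pg_ctl -D /srv/pg/tenant_a -o "-p 5433" start
pg_ctl -D /srv/pg/tenant_b -o "-p 5434" start
```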

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: How to speed up product code and subcode match

2023-05-23 Thread Andrus

Hi!

I ran

analyze toode;

create index vordlusajuhinnak_toode_pattern_idx on 
vordlusajuhinnak(toode bpchar_pattern_ops);


create index vordlusajuhinnak_toode_idx on vordlusajuhinnak(toode);
analyze vordlusajuhinnak;

The select now runs for more than one hour. Output from explain:

explain create table peatoode as
select toode.toode , n2, n3, n4
from toode, vordlusajuhinnak
WHERE toode.toode=vordlusajuhinnak.toode OR  toode.toode LIKE 
vordlusajuhinnak.toode||'/%'


"Gather  (cost=1000.55..443361894.55 rows=143828910 width=78)"
"  Workers Planned: 2"
"  ->  Nested Loop  (cost=0.55..428978003.55 rows=59928712 width=78)"
"        Join Filter: ((toode.toode = (vordlusajuhinnak.toode)::bpchar) OR (toode.toode ~~ ((vordlusajuhinnak.toode)::text || '/%'::text)))"
"        ->  Parallel Index Only Scan using toode_pkey on toode  (cost=0.55..95017.93 rows=303869 width=60)"
"        ->  Seq Scan on vordlusajuhinnak  (cost=0.00..721.33 rows=39433 width=32)"


with

Set enable_nestloop to off;

explain output is:

"Gather  (cost=1001000.55..10443361906.55 rows=143828910 width=78)"
"  Workers Planned: 2"
"  ->  Nested Loop  (cost=100.55..10428978015.55 rows=59928712 width=78)"
"        Join Filter: ((toode.toode = (vordlusajuhinnak.toode)::bpchar) OR (toode.toode ~~ ((vordlusajuhinnak.toode)::text || '/%'::text)))"
"        ->  Parallel Index Only Scan using toode_pkey on toode  (cost=0.55..95029.93 rows=303869 width=60)"
"        ->  Seq Scan on vordlusajuhinnak  (cost=0.00..721.33 rows=39433 width=32)"


How to speed it up?

Andrus.

On 23.05.2023 at 14:32, Bzm@g wrote:

Great,

However, I think it is still way too slow.
The next step is to run analyze for the other table, vordlusajuhinnak, as well.

And make sure you have an index on vordlusajuhinnak.toode similar to 
the index on toode.toode


--
Boris



On 23.05.2023 at 12:56, Andrus wrote:



Hi!

I ran analyze firma2.toode and changed the where clause to use LIKE:

create table peatoode as
select toode.toode , n2, n3, n4
from toode, vordlusajuhinnak
WHERE toode.toode=vordlusajuhinnak.toode OR  toode.toode LIKE 
vordlusajuhinnak.toode||'/%'


In this case it took 37 minutes and returned 277966 rows.

Thank you for help.

Andrus.

On 23.05.2023 at 11:24, Bzm@g wrote:

Also, your row count is way off, I guess. Did you ever run analyze on the big table?

--
Boris



On 23.05.2023 at 10:22, ...@2bz.de wrote:

Hi there,

I guess the main problem is the nested loop.

As a quick recheck, what happens if you run your query without the
nested loop?

This is not a solution, but a quick test.

In a session:

Set enable_nestloop = off;
Explain <your query>;
--
Boris



On 23.05.2023 at 08:53, Andrus wrote:

Hi!

The price list of main products, vordlusajuhinnak, contains 3 prices per
product (column toode) and has 39433 products:

create table vordlusajuhinnak( toode varchar(60), n2 numeric(8,2), n3 
numeric(8,2), n4 numeric(8,2) );

The toode column is unique (it could be the table's primary key) and contains
upper-case letters, digits, and '-' characters.

The product table (toode) contains 733021 products:

CREATE TABLE toode (
grupp character(1),
toode character(60) primary key,
... lot of other columns
  );

Both tables have pattern indexes to speed up queries:

CREATE INDEX toode_toode_pattern_idx
ON toode (toode bpchar_pattern_ops ASC NULLS LAST);
-- This index is probably not used and should be removed:
CREATE INDEX vordlusajuhinnak_toode_pattern_idx ON vordlusajuhinnak(toode 
bpchar_pattern_ops);

The product table has both main products and subproducts with sizes. The size
is always separated from the main product code by a '/' character:


SHOE1-BLACK
SHOE1-BLACK/38
SHOE1-BLACK/41
SHOE1-BLACK/42
SHOE1-BLACK/43
SHOE2/XXL
SHOE2/L

Product codes in this table also contain only upper-case characters.

I am trying to get prices for all products using:

create table peatoode as
select toode.toode , n2, n3, n4
from toode, vordlusajuhinnak
where  toode.toode between vordlusajuhinnak.toode and 
vordlusajuhinnak.toode||'/z'

This takes 4.65 hours. How can I speed this query up?
Output from explain:


"Nested Loop  (cost=0.55..272273178.12 rows=3202240012 width=78)"
"  ->  Seq Scan on vordlusajuhinnak  (cost=0.00..721.33 rows=39433 width=32)"
"  ->  Index Only Scan using toode_pkey on toode  (cost=0.55..6092.62 rows=81207 width=60)"
"        Index Cond: (toode >= (vordlusajuhinnak.toode)::bpchar)"
"        Filter: ((toode)::text <= ((vordlusajuhinnak.toode)::text || '/z'::text))"

Using PostgreSQL 13.2, compiled by Visual C++ build 1900, 64-bit, on
Windows Server with the psqlODBC driver.
Upgrading Postgres is possible, if that would help.

I also tried using LIKE:

WHERE toode.toode=vordlusajuhinnak.toode OR
  toode.toode LIKE vordlusajuhinnak.toode||'/%'

Also posted at:

https://stackoverflow.com/questions/76311957/how-to-match-character-columns-with-separated-subcolumns

Andrus.




Re: How to speed up product code and subcode match

2023-05-23 Thread Tom Lane
Andrus  writes:
> Product table as both main products and subproducts with sizes. Size is 
> separated by main product code always by / character:

>      SHOE1-BLACK
>      SHOE1-BLACK/38
>      SHOE1-BLACK/41
>      SHOE1-BLACK/42
>      SHOE1-BLACK/43
>      SHOE2/XXL
>      SHOE2/L

You could probably have devised a worse data representation if
you really tried, but it would have taken some effort.  Separate
the product code and size into two columns --- if there's somebody
who really wants to see them in the above format, give them a
view or generated column.  Then instead of the impossible-to-optimize
queries you showed, you could do something like

select toode.toode , n2, n3, n4
from toode, vordlusajuhinnak
where toode.toode = vordlusajuhinnak.toode;

Anytime you're trying to join two tables on something that isn't
a plain equality condition (or ANDed conditions), you're in for
a world of hurt.
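A minimal sketch of that refactoring, assuming PostgreSQL 12+ for the
generated column; the column name base_code is invented here, not taken
from the thread:

```sql
-- Store the main product code separately; a generated column keeps it
-- in sync with the combined code automatically (PostgreSQL 12+).
ALTER TABLE toode
    ADD COLUMN base_code varchar(60)
        GENERATED ALWAYS AS (split_part(toode, '/', 1)) STORED;

CREATE INDEX ON toode (base_code);

-- The join then becomes a plain equality, which the planner can
-- satisfy with a hash or merge join instead of a nested loop:
SELECT toode.toode, n2, n3, n4
FROM toode
JOIN vordlusajuhinnak ON toode.base_code = vordlusajuhinnak.toode;
```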

regards, tom lane




Re: How to speed up product code and subcode match

2023-05-23 Thread Andrus

Hi!


Separate
the product code and size into two columns --- if there's somebody
who really wants to see them in the above format, give them a
view or generated column.  Then instead of the impossible-to-optimize
queries you showed, you could do something like

select toode.toode , n2, n3, n4
from toode, vordlusajuhinnak
where toode.toode = vordlusajuhinnak.toode;


Can a functional index

create index on toode ( split_part(toode, '/', 1) );

and the query

select toode.toode, n2, n3, n4
from toode, vordlusajuhinnak
where split_part(toode.toode, '/', 1) = vordlusajuhinnak.toode;

be used while keeping the existing table structure? Should a functional
index produce the same speed improvement as using a separate column?


Andrus.



Re: 15 pg_upgrade with -j

2023-05-23 Thread Jeff Ross



On 5/22/23 5:42 PM, Tom Lane wrote:

Jeff Ross  writes:

On 5/22/23 5:24 PM, Adrian Klaver wrote:

So is the 1400G mostly in one database in the cluster?

Yes, one big database with about 80 schemas and several other smaller
databases so -j should help, right?

AFAICT from a quick look at the code, you won't get any meaningful
parallelism unless you have several large DBs and/or several large
tablespaces.  It looks like the assumption was that issuing link()
requests in parallel wouldn't help much but just swamp your disk
if they're all on the same filesystem.  Maybe that could use
rethinking, not sure.

regards, tom lane


Thanks Tom.  These are all smoking-fast SSDs, so it would be
interesting to see how well they'd hold up under that load.


Jeff

Re: 15 pg_upgrade with -j

2023-05-23 Thread Jeff Ross

On 5/22/23 5:43 PM, Adrian Klaver wrote:




From docs:

https://www.postgresql.org/docs/current/pgupgrade.html

The --jobs option allows multiple CPU cores to be used for 
copying/linking of files and to dump and restore database schemas in 
parallel; a good place to start is the maximum of the number of CPU 
cores and tablespaces. This option can dramatically reduce the time 
to upgrade a multi-database server running on a multiprocessor machine.


So is the 1400G mostly in one database in the cluster?





Yes, one big database with about 80 schemas and several other smaller 
databases so -j should help, right?



As I understand it, no. The parallelism is between databases, not
within a database. Further, 'database schemas' there refers to schema
in the sense of the overall database object definitions, not the
namespaces known as schemas within a database.


Thanks Adrian.  That "restore database schemas in parallel" phrase is
really easy to read the way we did, expecting it to work with one
database and multiple schemas.


Maybe it should be changed to "restore multiple databases in parallel" 
instead?


Jeff

Re: How to speed up product code and subcode match

2023-05-23 Thread Andrus

Hi!

Using the index

create index on toode ( split_part(toode, '/', 1) );

and the query

select toode.toode, n2, n3, n4
from toode, vordlusajuhinnak
where split_part(toode.toode, '/', 1) = vordlusajuhinnak.toode;

reduces the run time to 5 minutes.

Andrus.





Re: Trying to understand a failed upgrade in AWS RDS

2023-05-23 Thread Alan Hodgson
On Sun, 2023-05-21 at 07:56 -0700, Mike Lissner wrote:
> > As far as I know it's impossible to reliably pg_upgrade a node
> > that has subscriptions and eventually resume logical
> > replication. 
> > 
> 
> 
> Should this go in the documentation somewhere? Maybe in the
> pg_upgrade notes? I still don't understand the mechanism. You also
> say that:
>  
> > It's possible to make it work with some efforts in some basic
> > configurations and / or if no changes happen on the publications
> > 
> 
> 
> But that kind of surprises me too, actually, because it seemed like
> pg_upgrade wiped out the LSN locations of the subscriber, making it
> start all over.
> 
> Upgrading a subscriber seems like something that could/should work,
> so it should be documented if pg_upgrade is incompatible with
> maintaining a subscription, shouldn't it? 

The docs are strangely silent on this. AFAIK pg_upgrade on either the
publisher or subscriber breaks logical replication, which does make
sense since pg_upgrade basically makes a new database cluster as it
runs.

There is a way to manually set the LSN position of an enabled=false
replication slot, but I've failed to make that work right in tests so
far.
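For what it's worth, the moving parts of that manual approach look
roughly like this on the subscriber side. Every name and LSN below is a
placeholder, and, as noted above, this has not been shown to work
reliably:

```sql
-- All names, OIDs, and LSNs here are placeholders, not tested values.
ALTER SUBSCRIPTION mysub DISABLE;

-- Detach from the old replication slot so it isn't dropped:
ALTER SUBSCRIPTION mysub SET (slot_name = NONE);

-- ... run pg_upgrade on the subscriber ...

-- Re-attach, then advance the replication origin to the publisher LSN
-- at which the old subscriber stopped (origin names follow 'pg_<suboid>'):
ALTER SUBSCRIPTION mysub SET (slot_name = 'mysub_slot');
SELECT pg_replication_origin_advance('pg_16400', '0/15E7BB0'::pg_lsn);

ALTER SUBSCRIPTION mysub ENABLE;
```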


Questions on PostgreSQL 13.4 Installer for Windows

2023-05-23 Thread Joel Rabinovitch
Hi,

Our developers do their work using a local PostgreSQL database server and local 
schemas.

This is typically done by installing the Windows version of the PostgreSQL 
installer from EnterpriseDB. Currently, version 13.4 (64-bit) of the installer 
is being used.

Recently, our organization has installed BeyondTrust Privilege Management 
software on our developer's PCs. The software forces the developer to 
authenticate when doing administrative tasks such as installing software and 
adjusting environment variables.

When a developer attempts to install the PostgreSQL 13.4 database server, the 
developer is authenticated and the software runs as an administrator.

The issue occurs when the installer initializes the database cluster. When it 
attempts to do this,  it requires that it be done as a non-administrator.

This causes the installation process to fail. We have a manual procedure to 
follow when this occurs, which involves manually initializing the database 
cluster as a non-administrator, installing the Windows service as an 
administrator and reinstalling some of the PostgreSQL components.

The process works, but it is quite cumbersome.

Has anyone had a similar issue when using the PostgreSQL installer for Windows
when BeyondTrust Privilege Management (or similar software) is installed on
their PC? If so, what have you done to resolve this?

We have been in contact with the company that develops the BeyondTrust 
software, but they have not been able to help us correct the problem as of yet.

Thanks,

Joel


Re: 15 pg_upgrade with -j

2023-05-23 Thread Peter J. Holzer
On 2023-05-22 21:10:48 -0500, Ron wrote:
> On 5/22/23 18:42, Tom Lane wrote:
> > It looks like the assumption was that issuing link()
> > requests in parallel wouldn't help much but just swamp your disk
> > if they're all on the same filesystem.
> > Maybe that could use rethinking, not sure.
> 
> It does need rethinking in the era of VMs and SANs. /var/lib/pgsql/15 is
> going to be on a different LUN from /var/lib/pgsql/9.6

You can't hardlink between different file systems.

Even if you could assign single directories to specific LUNs (does any
file system allow this?), this would at best spread the updates across
two LUNs (the inodes would presumably stay on the source LUN and the
target directory would be on the target LUN).

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: 15 pg_upgrade with -j

2023-05-23 Thread Ron

On 5/23/23 12:19, Peter J. Holzer wrote:

On 2023-05-22 21:10:48 -0500, Ron wrote:

On 5/22/23 18:42, Tom Lane wrote:

It looks like the assumption was that issuing link()
requests in parallel wouldn't help much but just swamp your disk
if they're all on the same filesystem.
Maybe that could use rethinking, not sure.

It does need rethinking in the era of VMs and SANs. /var/lib/pgsql/15 is
going to be on a different LUN from /var/lib/pgsql/9.6

You can't hardlink between different file systems.


We'd never hardlink.  It eliminates the ability to return to the old system if 
something goes wrong.


--
Born in Arizona, moved to Babylonia.




Re: 15 pg_upgrade with -j

2023-05-23 Thread Christoph Moench-Tegeder
## Ron (ronljohnso...@gmail.com):

> We'd never hardlink.  Eliminates the ability to return to the old
> system if something goes wrong.

That's why you get yourself a recent XFS and use clone mode (it still
sticks you to the same filesystem, but gets you up and running much
faster).
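A hedged sketch of what that invocation can look like; the paths,
versions, and job count below are assumptions, not taken from this
thread:

```shell
# --clone uses filesystem reflinks (XFS with reflink=1, Btrfs, ...):
# fast like --link, but the old cluster's files stay untouched.
pg_upgrade \
  --old-bindir  /usr/pgsql-9.6/bin \
  --new-bindir  /usr/pgsql-15/bin \
  --old-datadir /var/lib/pgsql/9.6/data \
  --new-datadir /var/lib/pgsql/15/data \
  --clone --jobs 4
```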

Regards,
Christoph

-- 
Spare Space




Re: 15 pg_upgrade with -j

2023-05-23 Thread Peter J. Holzer
On 2023-05-23 13:17:24 -0500, Ron wrote:
> On 5/23/23 12:19, Peter J. Holzer wrote:
> > On 2023-05-22 21:10:48 -0500, Ron wrote:
> > > On 5/22/23 18:42, Tom Lane wrote:
> > > > It looks like the assumption was that issuing link()
  ^^
> > > > requests in parallel wouldn't help much but just swamp your disk
> > > > if they're all on the same filesystem.
> > > > Maybe that could use rethinking, not sure.
> > > It does need rethinking in the era of VMs and SANs. /var/lib/pgsql/15 is
> > > going to be on a different LUN from /var/lib/pgsql/9.6
> > You can't hardlink between different file systems.
> 
> We'd never hardlink.

But that was what Jeff and Tom were talking about. If you are changing
the subject you should at least make it explicit.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: 15 pg_upgrade with -j

2023-05-23 Thread Ron

On 5/23/23 13:58, Christoph Moench-Tegeder wrote:

## Ron (ronljohnso...@gmail.com):


We'd never hardlink.  Eliminates the ability to return to the old
system if something goes wrong.

That's why you get yourself a recent XFS and use clone mode (still
sticks you to the same filesystem, but gets you up running much
faster).


Sadly, our 9.6 systems (the ones we /really/ need to get off of) are RHEL 
6.10 and ext4.


--
Born in Arizona, moved to Babylonia.