Multi-master replication

2021-08-13 Thread Zahir Lalani
Confidential

Hello all

Does anyone on the list have production experience of using PG in a 
multi-master setup - if so using which tools?

Thx

Zahir



Re: Partitioning a table by integer value (preferably in place)

2021-08-13 Thread Pól Ua Laoínecháin
Hi Vijay, and thanks for replying,

>> I have a 400GB joining table (one SMALLINT and the other INTEGER -
>> Primary Keys on other tables) with 1000 fields on one side and 10M on
>> the other, so 10,000M (or 10Bn) records all told.


> My queries:

>>   results with the existing setup? Does it look problematic?


> How would your table grow on either side of the join?

In this case uniformly! 1 -> 10

> Append only, static data or too frequently updated etc, or dropped 
> periodically,

Append only in this case - not updated nor dropped.

> so that delete based bloating can be skipped completely.

It can be skipped!

> How distributed is the data based on smallint keys, equally or unequally.

Totally uniform - see my own answer to my question - if it wasn't
uniform, I might have considered RANGE based partitioning?

> What kind of queries would be run and results returned ? Oltp or olap like ? 
> Quick queries with few rows retuned or heavy queries with lot of rows 
> returned.

Pretty much OLAP like - summary queries. Point queries return in
sub-millisecond range when based on PK!

> Partitioning has been ever improving, so the best option if possible would be 
> to use the latest pg version is possible,.
> Also is there any scope of normalisation of that table, I mean I know 
> theoretically it is possible, but I have not seen any design with that wide 
> table( of 1000 cols), so would be good to know.

> Just asking, maybe partitioning would the best option but wanting to know/see 
> the benefit pre and post partitioning.


Thanks again for your questions - they gave me pause for thought and I
will try to apply them in future partitioning scenarios. (Unfortunatly
:-) ) there is no magic number of partitions for, say, a given size of
table - otherwise it would be the default and would be done
automatically!

Rgs,

Pól...


> Vijay




Re: Partitioning a table by integer value (preferably in place)

2021-08-13 Thread Pól Ua Laoínecháin
Dear Ron,

> > Just a quick follow-up - could I script the creation of 1000
> > partitions using bash

> Sure.  That's what scripting languages are for.

Thank you so much for your helpful and expansive answer. No wonder
everyone talks about how friendly and welcoming the PostgreSQL
community is!

Your friend always,

Pól...




Re: Partitioning a table by integer value (preferably in place)

2021-08-13 Thread Vijaykumar Jain
On Fri, 13 Aug 2021 at 14:13, Pól Ua Laoínecháin  wrote:

> Thanks again for your questions - they gave me pause for thought and I
> will try to apply them in future partitioning scenarios. (Unfortunatly
> :-) ) there is no magic number of partitions for, say, a given size of
> table - otherwise it would be the default and would be done
> automatically!
>

no worries, it seems you have  a fair idea how things would grow. so
partitioning would be of help.
My biggest worry around partitioning is around moving data around
partitions/ rebalancing partitions.
It requires making use of triggers, syncing data and then attach/remove
partitions if it is simple as
list/range and use binary split for large partitions, to avoid large row
movements.

But when it comes to hash, it requires rebuilding the entire setup either
via triggers if there is significant space on the db,
or logically replicating to another server
with a new hash scheme. because i do not have lot of exp around
sharding/partitioning,
nor about how others dbs on how they claim automatic movements of data
between
partitions, I think i wore myself out due to poor design.

-- 
Thanks,
Vijay
Mumbai, India


Re: Multi-master replication

2021-08-13 Thread Laura Smith
‐‐‐ Original Message ‐‐‐

On Friday, August 13th, 2021 at 9:00 AM, Zahir Lalani 
 wrote:

> Confidential
>
> Hello all
>
> Does anyone on the list have production experience of using PG in a 
> multi-master setup – if so using which tools?
>
> Thx
>
> Zahir



I have sometimes considered it, but it always struck me that given there is no 
first-class support for it, multi-master in postgres is just an opportunity for 
so many foot-guns to enable to things to go terribly, terribly wrong.

Maybe you need to look at something like CockroachDB which has a degree of 
Postgres syntax compatability 
(https://www.cockroachlabs.com/blog/why-postgres/).   N.B. Not speaking from 
experience here, CockroachDB is something I've wanted to play with for a long 
time but never had the time !






Re: Getting pg_stat_database data takes significant time

2021-08-13 Thread hubert depesz lubaczewski
On Thu, Aug 12, 2021 at 06:20:23PM +0200, Magnus Hagander wrote:
> On Thu, Aug 12, 2021 at 4:38 PM hubert depesz lubaczewski
>  wrote:
> >
> > On Thu, Aug 12, 2021 at 11:32:15AM +0200, Magnus Hagander wrote:
> > > Which database are you connected to? If you just want to look at the
> > > global stats, it might help to be connected to a database that is
> > > *not* the one with all the tables in -- e.g. connect to "postgres" and
> > > query pg_stat_database looking for values on a different database? In
> > > this case it would open files for "global", for "database postgres"
> > > and "shared relations" only and skip the file for your db with many
> > > objects. I think.
> >
> > I'm connected to the db I need data about, and I need data from virtualy
> > all columns of pg_stat_database.
> Try connecting to a different database, while still querying all columns.

Damn,  this is huge difference:
=$ time psql -d postgres -c "select * from pg_stat_database where datname = 
'appdb'" -qAtX > /dev/null

real0m0.058s
user0m0.026s
sys 0m0.013s

=$ time psql -d appdb -c "select * from pg_stat_database where datname = 
'appdb'" -qAtX > /dev/null

real0m0.466s
user0m0.031s
sys 0m0.008s

Not sure if I can make the monioring software use different connection for this
one particular query, though. Will check with them. Thanks.

Best regards,

depesz





RE: Multi-master replication

2021-08-13 Thread Zahir Lalani
Confidential

Thx Laura

So here is the dilemma - everything in the cloud world tends toward horizontal 
scaling. We do that with PG using single master and multiple slaves. But we are 
write heavy and of course the load on the master is quite high. It does seem 
that multi-master has risks (everyone seems to talk about conflict resolution!) 
so I wonder whether the sensible route is simply to scale vertically on the 
master?

We are looking at pg_pool II to manage the current config - which I suppose 
will help with the connection pooling etc. But even it lends caution to the 
multi-master route for write heavy applications.

Z

> -Original Message-
> From: Laura Smith 
> Sent: 13 August 2021 10:47
> To: Zahir Lalani 
> Cc: pgsql-general@lists.postgresql.org
> Subject: Re: Multi-master replication
>
> ‐‐‐ Original Message ‐‐‐
>
> On Friday, August 13th, 2021 at 9:00 AM, Zahir Lalani
>  wrote:
>
> > Confidential
> >
> > Hello all
> >
> > Does anyone on the list have production experience of using PG in a multi-
> master setup – if so using which tools?
> >
> > Thx
> >
> > Zahir
>
>
>
> I have sometimes considered it, but it always struck me that given there is no
> first-class support for it, multi-master in postgres is just an opportunity 
> for so
> many foot-guns to enable to things to go terribly, terribly wrong.
>
> Maybe you need to look at something like CockroachDB which has a degree of
> Postgres syntax compatability (https://www.cockroachlabs.com/blog/why-
> postgres/).   N.B. Not speaking from experience here, CockroachDB is
> something I've wanted to play with for a long time but never had the time !
>



Re: Multi-master replication

2021-08-13 Thread Bruce Momjian
On Fri, Aug 13, 2021 at 11:35:29AM +, Zahir Lalani wrote:
> Confidential
> 
> Thx Laura
> 
> So here is the dilemma - everything in the cloud world tends toward 
> horizontal scaling. We do that with PG using single master and multiple 
> slaves. But we are write heavy and of course the load on the master is quite 
> high. It does seem that multi-master has risks (everyone seems to talk about 
> conflict resolution!) so I wonder whether the sensible route is simply to 
> scale vertically on the master?
> 
> We are looking at pg_pool II to manage the current config - which I suppose 
> will help with the connection pooling etc. But even it lends caution to the 
> multi-master route for write heavy applications.

You might want to read this:

https://momjian.us/main/blogs/pgblog/2018.html#December_24_2018

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Multi-master replication

2021-08-13 Thread Vijaykumar Jain
On Fri, Aug 13, 2021, 5:05 PM Zahir Lalani 
wrote:

> Confidential
>
> Thx Laura
>
> So here is the dilemma - everything in the cloud world tends toward
> horizontal scaling. We do that with PG using single master and multiple
> slaves. But we are write heavy and of course the load on the master is
> quite high. It does seem that multi-master has risks (everyone seems to
> talk about conflict resolution!) so I wonder whether the sensible route is
> simply to scale vertically on the master?
>
>
>


http://bdr-project.org/docs/stable/conflicts.html

There are claims from other databases that they offer multi master
(cockroachdb, yugabyte, etc) out of the box, but there is little clarity on
what would one has to compromise with.

You can always try sharding, postgres fdw can help you with that setup. In
my implementation,  we agreed on write locally, read globally. That way
writes could scale without compromise, but reads would be scatter gather
(fdw support for parallel scan in pg14 is promising)

Then, there is citusdb that can help you shard seamlessly and rebalance too.

https://www.citusdata.com/


https://about.gitlab.com/handbook/engineering/development/enablement/database/doc/fdw-sharding.html


https://blog.timescale.com/blog/building-a-distributed-time-series-database-on-postgresql/

As you read more, multi master is not so simple, from what I read.
I think there needs to be a good amount of thoughts for long term growth of
apps, is sharding does not scale well.

In out case, apps were broken down from monoliths to small microservi es to
deal with growth issues, and now it works well as single db as growth is
not exponential ( also small dbs, less tuning, defaults work) but then one
has to upgrade all the dbs periodically to catch up on releases.

As this is a postgresql list, I should not be posting mongo, but never the
less a good read.
https://www.mongodb.com/developer/article/active-active-application-architectures/


RE: Multi-master replication

2021-08-13 Thread Laura Smith
‐‐‐ Original Message ‐‐‐

On Friday, August 13th, 2021 at 12:35 PM, Zahir Lalani 
 wrote:

> Confidential
>
> Thx Laura
>
> So here is the dilemma - everything in the cloud world tends toward 
> horizontal scaling. We do that with PG using single master and multiple 
> slaves. But we are write heavy and of course the load on the master is quite 
> high. It does seem that multi-master has risks (everyone seems to talk about 
> conflict resolution!) so I wonder whether the sensible route is simply to 
> scale vertically on the master?
>


Scaling on the master remains a popular choice these days even in the trendy 
world of cloud and horizontal scaling.  See, for example Let's Encrypt - 
admittedly a MySQL/InnoDB infrastructure, but still master/replica - 
https://letsencrypt.org/2021/01/21/next-gen-database-servers.html

I suppose (depending on the nature of your data), you could perhaps use 
sharding, so multiple master/slaves with data (and hence load) spread across.  
Most likely you would need something like Redis at the frontend to push clients 
in to the right shard.

Conflict resolution (whether due to split brain or otherwise) is a big worry of 
mine with multi-master setups.  The Googles and Facebook's of this world seem 
to be solving the problem on the basis of having super precise time sync setup 
on their database servers and use high-resolution timestamps to act as 
tie-break arbitrator.  But of course that comes with significant expense as it 
needs PTP infrastructure.




RE: Multiple Postgres process are running in background

2021-08-13 Thread Ram Charan Kallem
Hi,

We are using Postgres 10 (Single client)and observed that there are multiple 
PostgreSQL Server process are running in background.
Why these additional process are created  or is this an expected behavior.

[cid:image001.png@01D7905A.F629F440]

Regards,
RamCharan



Re: Partitioning a table by integer value (preferably in place)

2021-08-13 Thread Michael Lewis
On Fri, Aug 13, 2021 at 3:02 AM Vijaykumar Jain <
vijaykumarjain.git...@gmail.com> wrote:

> ... use binary split for large partitions, to avoid large row movements.
>

Would you expound on this?


Re: Partitioning a table by integer value (preferably in place)

2021-08-13 Thread Vijaykumar Jain
On Fri, 13 Aug 2021 at 21:07, Michael Lewis  wrote:

> On Fri, Aug 13, 2021 at 3:02 AM Vijaykumar Jain <
> vijaykumarjain.git...@gmail.com> wrote:
>
>> ... use binary split for large partitions, to avoid large row movements.
>>
>
> Would you expound on this?
>

if we have range partitions based on year, 2018 -2019, 2019-2020, etc and
for
any one year we see uneven growth in data, we simply split that partition
by 2 and move only 6 months worth of data
to another new partition.
If we see similar issues not solved by the first split, we split each again
by 2 and only move the remainder of data to all four partitions.
So at any point of time, if the range partition which was planned earlier
does not work well for some periods, we split the partitions.
and if we see the trend growing with increased volume, the new partitions
are created qtry etc.

I have not done hash partitioning rebuild, but i think this video explains
how we could best do it.
Ways of sharding and resharding in a distributed system - YouTube



Thanks,
Vijay
Mumbai, India


Re: Multiple Postgres process are running in background

2021-08-13 Thread Adrian Klaver

On 8/13/21 3:20 AM, Ram Charan Kallem wrote:

Hi,

We are using Postgres 10 (Single client)and observed that there are 
multiple PostgreSQL Server process are running in background.


Why these additional process are created  or is this an expected behavior.


Yes. From my Linux machine:

 ps ax | grep postgres
  818 ?Ss 0:00 /usr/local/pgsql12/bin/postgres -D 
/usr/local/pgsql12_test/data
  906 ?Ss 0:00 /usr/local/pgsql12/bin/postgres -D 
/usr/local/pgsql12/data

  954 ?Ss 0:00 postgres: logger
  982 ?Ss 0:00 postgres: logger
  984 ?Ss 0:00 postgres: checkpointer
  985 ?Ss 0:00 postgres: background writer
  986 ?Ss 0:00 postgres: walwriter
  987 ?Ss 0:00 postgres: autovacuum launcher
  988 ?Ss 0:00 postgres: stats collector
  989 ?Ss 0:00 postgres: logical replication launcher
 1001 ?Ss 0:00 postgres: checkpointer
 1002 ?Ss 0:00 postgres: background writer
 1003 ?Ss 0:00 postgres: walwriter
 1004 ?Ss 0:00 postgres: autovacuum launcher
 1005 ?Ss 0:00 postgres: stats collector
 1006 ?Ss 0:00 postgres: logical replication launcher
 7204 pts/0S+ 0:00 psql -d test -U postgres
 7205 ?Ss 0:00 postgres: postgres test [local] idle
 7250 pts/1S+ 0:00 psql -d production -U postgres -p 5442
 7251 ?Ss 0:00 postgres: postgres production [local] idle
 7296 pts/2S+ 0:00 grep --color=auto postgres

There are background processes that run as well as process for each 
connection.




Regards,

RamCharan




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Multiple Postgres process are running in background

2021-08-13 Thread Ranier Vilela
Em sex., 13 de ago. de 2021 às 11:55, Ram Charan Kallem <
ramcharan.kal...@non.se.com> escreveu:

> Hi,
>
>
>
> We are using Postgres 10 (Single client)and observed that there are
> multiple PostgreSQL Server process are running in background.
>
> Why these additional process are created  or is this an expected behavior.
>
This is a normal and expected behavior.

pgsql-hackers@, is not an appropriate place to such questions.

regards,
Ranier Vilela


Re: php connection failure

2021-08-13 Thread ourdiaspora


‐‐‐ Original Message ‐‐‐

On Wednesday, August 11th, 2021 at 3:13 PM, Adrian Klaver 
 wrote:

>
> Are you using some form of container/VM on the machine for either server?
>

The machine is a chromebook, GNU/Linux via 'crouton' chroot. Is this the 
potential cause of blockage of port 5432?





Re: php connection failure

2021-08-13 Thread Adrian Klaver

On 8/13/21 1:36 PM, ourdiaspora wrote:


‐‐‐ Original Message ‐‐‐

On Wednesday, August 11th, 2021 at 3:13 PM, Adrian Klaver 
 wrote:



Are you using some form of container/VM on the machine for either server?



The machine is a chromebook, GNU/Linux via 'crouton' chroot. Is this the 
potential cause of blockage of port 5432?


I would say it is the most likely culprit.

I would submit an issue here:

https://github.com/dnschneid/crouton/issues

and see what they say.



--
Adrian Klaver
adrian.kla...@aklaver.com




RE: Multi-master replication

2021-08-13 Thread Brent Wood
BDS
Only using the free v1.0...

Sent from Yahoo Mail on Android 
 
  On Sat, 14 Aug 2021 at 2:07, Laura Smith 
wrote:   ‐‐‐ Original Message ‐‐‐

On Friday, August 13th, 2021 at 12:35 PM, Zahir Lalani 
 wrote:

> Confidential
>
> Thx Laura
>
> So here is the dilemma - everything in the cloud world tends toward 
> horizontal scaling. We do that with PG using single master and multiple 
> slaves. But we are write heavy and of course the load on the master is quite 
> high. It does seem that multi-master has risks (everyone seems to talk about 
> conflict resolution!) so I wonder whether the sensible route is simply to 
> scale vertically on the master?
>


Scaling on the master remains a popular choice these days even in the trendy 
world of cloud and horizontal scaling.  See, for example Let's Encrypt - 
admittedly a MySQL/InnoDB infrastructure, but still master/replica - 
https://letsencrypt.org/2021/01/21/next-gen-database-servers.html

I suppose (depending on the nature of your data), you could perhaps use 
sharding, so multiple master/slaves with data (and hence load) spread across.  
Most likely you would need something like Redis at the frontend to push clients 
in to the right shard.

Conflict resolution (whether due to split brain or otherwise) is a big worry of 
mine with multi-master setups.  The Googles and Facebook's of this world seem 
to be solving the problem on the basis of having super precise time sync setup 
on their database servers and use high-resolution timestamps to act as 
tie-break arbitrator.  But of course that comes with significant expense as it 
needs PTP infrastructure.