RE: I'd like to discuss scaleout at PGCon

tsunakawa.ta...@fujitsu.com Wed, 17 Jun 2020 21:39:03 -0700

Hello,

It seems you didn't include pgsql-hackers.



From: Sumanta Mukherjee <sumanta.mukher...@enterprisedb.com>
> I saw the presentation and it is great, except that it seems unclear, for 
> both SD and SN, whether the storage and the compute are explicitly 
> separated. Separation of storage and compute would have some cost advantages 
> as per my understanding. The following two works (refs below) have some 
> information about the usefulness of this technique for scale-out, so it 
> would be an interesting addition to see if the SN architecture that is 
> being proposed could be modified to take this into account and reap the 
> gain.

Thanks.  Separation of compute and storage is certainly worth considering.  
Unlike the old days, when shared storage built on slow HDDs and FC-SAN was 
considered a bottleneck, we can now expect high-speed shared storage thanks to 
flash memory, NVMe-oF, and RDMA.

> 1. Philip A. Bernstein, Colin W. Reid, and Sudipto Das. 2011. Hyder - A
> Transactional Record Manager for Shared Flash. In CIDR 2011.

This is interesting.  I'll look into it.  Do you know of any product based on 
Hyder?  OTOH, Hyder seems to require drastic changes to adopt it in Postgres: 
OCC, a log-structured database, etc.  I'd like to hear how feasible those are.  
However, its scale-out capability without the need for data or application 
partitioning is appealing.


To explore another possibility that fits the current Postgres design more 
closely, let me introduce our proprietary product, Symfoware.  It is not based 
on Postgres.

It has shared-nothing scale-out functionality with full ACID support based on 
2PC, conventional 2PL locking, and distributed deadlock resolution.  Despite 
being shared-nothing, it stores all the database files and transaction logs on 
shared storage.

The database is divided into "log groups."  Each log group has one transaction 
log and multiple tablespaces (Symfoware calls a tablespace a "database 
space.")

Each DB instance in the cluster owns multiple log groups and handles 
reads/writes to the data in the log groups it owns.  When a DB instance fails, 
the surviving DB instances take over the log groups of the failed DB instance, 
recover the data using each log group's transaction log, and accept 
reads/writes to the data in those log groups.  The DBA configures which DB 
instance initially owns which log groups and which DB instances are candidates 
to take over which log groups.
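To make the ownership and takeover flow above concrete, here is a minimal 
Python sketch.  It is a hypothetical model, not Symfoware's actual design or 
API: the names LogGroup, Cluster, route(), fail() and recover() are made up for 
illustration, the in-memory list stands in for the transaction log on shared 
storage, and "recovery" is reduced to replaying key=value records.

```python
from dataclasses import dataclass, field


@dataclass
class LogGroup:
    """Hypothetical log group: one transaction log plus its database spaces."""
    name: str
    owner: str                 # DB instance currently serving this log group
    candidates: list           # DBA-configured takeover order
    log: list = field(default_factory=list)   # stand-in for the log on shared storage
    data: dict = field(default_factory=dict)  # state rebuilt by recovery


class Cluster:
    def __init__(self, groups):
        self.groups = {g.name: g for g in groups}
        # Every instance named as an owner or a candidate starts out alive.
        self.alive = {g.owner for g in groups} | {
            c for g in groups for c in g.candidates
        }

    def route(self, group_name):
        """Return the instance that handles reads/writes for a log group."""
        return self.groups[group_name].owner

    def fail(self, instance):
        """Mark an instance failed; survivors take over the log groups it owned."""
        self.alive.discard(instance)
        for g in self.groups.values():
            if g.owner != instance:
                continue
            survivors = [c for c in g.candidates if c in self.alive]
            if not survivors:
                raise RuntimeError(f"no surviving candidate for {g.name}")
            g.owner = survivors[0]   # first live candidate in DBA order wins
            self.recover(g)

    @staticmethod
    def recover(g):
        """Rebuild the group's data by replaying its transaction log."""
        g.data.clear()
        for rec in g.log:
            key, value = rec.split("=", 1)
            g.data[key] = value


if __name__ == "__main__":
    cluster = Cluster([
        LogGroup("lg1", owner="db1", candidates=["db2", "db3"], log=["a=1", "a=2"]),
        LogGroup("lg2", owner="db2", candidates=["db3", "db1"]),
        LogGroup("lg3", owner="db3", candidates=["db1", "db2"]),
    ])
    cluster.fail("db1")
    print(cluster.route("lg1"))  # db2
```

Because every instance owns at least one log group in normal operation, no 
server sits idle; the candidate lists only matter when an instance fails.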

This way, no server sits idle as a standby.  All DB instances work hard to 
process read-write transactions.  This "no idle server for HA" property is one 
of the things Oracle RAC users want for cost reasons.

However, unlike Hyder, it still requires data and application partitioning.  
Can anyone think of a way to eliminate partitioning?  Data and application 
partitioning is what Oracle RAC users want to avoid or cannot tolerate.

Ref: Introduction of the Symfoware shared nothing scale-out called "load share."
https://pdfs.semanticscholar.org/8b60/163593931cebc58e9f637cfb501500230adc.pdf


Regards
Takayuki Tsunakawa


--- below is Sumanta's original mail ---
From: Sumanta Mukherjee <sumanta.mukher...@enterprisedb.com>
Sent: Wednesday, June 17, 2020 5:34 PM
To: Tsunakawa, Takayuki/綱川 貴之 <tsunakawa.ta...@fujitsu.com>
Cc: Bruce Momjian <br...@momjian.us>; Merlin Moncure <mmonc...@gmail.com>; 
Robert Haas <robertmh...@gmail.com>; maumau...@gmail.com
Subject: Re: I'd like to discuss scaleout at PGCon

Hello,

I saw the presentation and it is great, except that it seems unclear, for both 
SD and SN, whether the storage and the compute are explicitly separated. 
Separation of storage and compute would have some cost advantages as per my 
understanding. The following two works (refs below) have some information about 
the usefulness of this technique for scale-out, so it would be an interesting 
addition to see if the SN architecture that is being proposed could be modified 
to take this into account and reap the gain.

1. Philip A. Bernstein, Colin W. Reid, and Sudipto Das. 2011. Hyder - A
Transactional Record Manager for Shared Flash. In CIDR 2011.

2. Dhruba Borthakur. 2017. The Birth of RocksDB-Cloud. 
http://rocksdb.blogspot.com/2017/05/the-birth-of-rocksdb-cloud.html

With Regards,
Sumanta Mukherjee.
EnterpriseDB: http://www.enterprisedb.com
