On 09/01/2015 02:39 AM, Bruce Momjian wrote: > On Mon, Aug 31, 2015 at 01:16:21PM -0700, Josh Berkus wrote: >> I'm also going to pontificate that, for a future solution, we should not >> focus on write *IO*, but rather on CPU and RAM. The reason for this >> thinking is that, with the latest improvements in hardware and 9.5 >> improvements, it's increasingly rare for machines to be bottlenecked on >> writes to the transaction log (or the heap). This has some implications >> for system design. For example, solutions which require all connections >> to go through a single master node do not scale sufficiently to be worth >> bothering with. > > Well, I highlighted write IO for sharding because sharding is the only > solution that allows write scaling. If we want to scale CPU, we are > better off using server parallelism, and to scale CPU and RAM, a > multi-master/BDR solution seems best. (Multi-master doesn't do write > scaling because you eventually have to write all the data to each node.)
You're assuming that our primary bottleneck for writes is IO. It's not at present for most users, and it certainly won't be in the future. You need to move your thinking on systems resources into the 21st century, instead of solving the resource problems from 15 years ago. Currently, CPU resources and locking are the primary bottlenecks on writing for the vast majority of the hundreds of servers I tune every year. This even includes AWS, with EBS's horrible latency; even in that environment, most users can outstrip PostgreSQL's ability to handle requests by getting 20K PRIOPs. Our real future bottlenecks are: * ability to handle more than a few hundred connections * locking limits on the scalability of writes * ability to manage large RAM and data caches The only place where IO becomes the bottleneck is for the batch-processing, high-throughput DW case ... and I would argue that existing forks already handle that case. Any sharding solution worth bothering with will solve some or all of the above by extending our ability to process requests across multiple nodes. Any solution which does not is merely an academic curiosity. > For these reasons, I think sharding has a limited use, and hence, I > don't think the community will be willing to add a lot of code just to > enable auto-sharding. I think it has to be done in a way that adding > sharding also gives other benefits, like better FDWs and cross-node ACID > control. > > In summary, I don't think adding a ton of code just to do sharding will > be acceptable. A corollary of that, is that if FDWs are unable to > provide useful sharding, I don't see an acceptable way of adding > built-in sharding to Postgres. So, while I am fully in agreement with you that having side benefits to our sharding tools, I think you're missing the big picture entirely. In a few years, clustered/sharded PostgreSQL will be the default installation, or we'll be a legacy database. Single-node and single-master databases are rapidly becoming history. >From my perspective, we don't need an awkward, limited, bolt-on solution for write-scaling. We need something which will become core to how PostgreSQL works. I just don't see us getting there with the described FDW approach, which is why I keep raising issues with it. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers