On Fri, Feb 9, 2018 at 4:14 PM, Shay Rojansky <r...@roji.org> wrote: > Am a bit late to this thread, sorry if I'm slightly rehashing things. I'd > like to go back to the basic on this. > > Unless I'm mistaken, at least in the Java and .NET world, clients are > almost always expected to have their own connection pooling, either > implemented inside the driver (ADO.NET model) or as a separate modular > component (JDBC). This approach has a few performance advantages: > > 1. "Opening" a new pooled connection is virtually free - no TCP connection > needs to be opened, no I/O, no startup packet, nothing (only a tiny bit of > synchronization). > 2. Important client state can be associated to physical connections. For > example, prepared statements can be tracked on the physical connection, and > persisted when the connection is returned to the pool. The next time the > physical connection is returned from the pool, if the user tries to > server-prepare a statement, we can check on the connection if it has > already been prepared in a "previous lifetime", and if so, no need to > prepare again. This is vital for scenarios with short-lived (pooled) > connections, such as web. Npgsql does this. > > Regarding the problem of idle connections being kept open by clients, I'd > argue it's a client-side problem. If the client is using a connection pool, > the pool should be configurable to close idle connections after a certain > time (I think this is relatively standard behavior). If the client isn't > using a pool, it seems to be the application's responsibility to release > connections when they're no longer needed. > > The one drawback is that the pooling is application-specific, so it can't > be shared by multiple applications/hosts. So in some scenarios it may make > sense to use both client pooling and proxy/server pooling. > > To sum it up, I would argue that connection pooling should first and > foremost be considered as a client feature, rather than a proxy feature > (pgpool) or server feature (the PostgreSQL pooling being discussed here). > This isn't to say server-side pooling has no value though. >
Recently, I did a large amount of parallel data processing where the results were stored in PG. I had about 1000 workers each with their own PG connection. As you pointed out, application pooling doesn't make sense in this scenario. I tried pgpool and pgbouncer, and both ended up as the bottleneck. Overall throughput was not great but it was highest without a pooler. That aligns with Konstantin's benchmarks too. As far as I know, server pooling is the only solution to increase throughput, without upgrading hardware, for this use case. I hope this PR gets accepted!