Re: Proposal: Adding json logging

2018-04-14 Thread Ryan Pedela
On Sat, Apr 14, 2018, 4:33 PM Andres Freund  wrote:

> On 2018-04-15 00:31:14 +0200, David Fetter wrote:
> > On Sat, Apr 14, 2018 at 01:20:16PM -0700, Andres Freund wrote:
> > > On 2018-04-14 18:05:18 +0200, David Fetter wrote:
> > > > CSV is very poorly specified, which makes it at best complicated to
> > > > build correct parsing libraries. JSON, whatever gripes I have about
> > > > the format[1] is extremely well specified, and hence has excellent
> > > > parsing libraries.
> > >
> > > Worth noting that useful JSON formats for logging also kinda don't
> > > follow the standard. Either you end up with the entire logfile as one big
> > > array, which most libraries won't parse and which makes logrotate etc.
> > > really complicated, or you end up with some easy-to-parse format where
> > > newlines have a non-standard record-separator meaning.
> >
> > I don't see this as a big problem.  The smallest-lift thing is to put
> > something along the lines of:
> >
> > When you log as JSON, those logs are JSON objects, one per output
> > event.  They are not guaranteed to break on newlines.
> >
> > A slightly larger lift would include escaping newlines and ensuring
> > that JSON output is always single lines, however long.
>
> Still obliterates your "standard standard standard" line of
> argument. There seem to be valid arguments for adding JSON regardless, but
> that line is just bogus.
>
> Greetings,
>
> Andres Freund
>

The format is known as JSON Lines.
http://jsonlines.org/
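A minimal sketch of the format under discussion, assuming hypothetical log
events (these are illustrative, not actual PostgreSQL log fields): each event
is one JSON object serialized onto a single physical line, and `json.dumps`
escapes any embedded newline as `\n`, so a record never spans lines and a
reader or logrotate can split the file at any line boundary.

```python
import json

# Hypothetical log events; field names are made up for illustration.
events = [
    {"ts": "2018-04-14T18:05:18Z", "level": "LOG",
     "msg": "statement: SELECT 1"},
    {"ts": "2018-04-14T18:05:19Z", "level": "ERROR",
     "msg": "syntax error\nat or near \"SELEC\""},  # embedded newline
]

# Writing: one object per line. json.dumps turns the embedded newline
# into the two characters \n, so each record stays on a single line.
log_text = "\n".join(json.dumps(e) for e in events)
assert len(log_text.splitlines()) == len(events)

# Reading: parse one line at a time; no need to hold a whole JSON
# array in memory, and truncation at a line boundary loses at most
# one record.
parsed = [json.loads(line) for line in log_text.splitlines()]
assert parsed == events
```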

Ryan



Re: Built-in connection pooling

2018-02-09 Thread Ryan Pedela
On Fri, Feb 9, 2018 at 4:14 PM, Shay Rojansky  wrote:

> Am a bit late to this thread, sorry if I'm slightly rehashing things. I'd
> like to go back to basics on this.
>
> Unless I'm mistaken, at least in the Java and .NET world, clients are
> almost always expected to have their own connection pooling, either
> implemented inside the driver (ADO.NET model) or as a separate modular
> component (JDBC). This approach has a few performance advantages:
>
> 1. "Opening" a new pooled connection is virtually free - no TCP connection
> needs to be opened, no I/O, no startup packet, nothing (only a tiny bit of
> synchronization).
> 2. Important client state can be associated with physical connections. For
> example, prepared statements can be tracked on the physical connection and
> persisted when the connection is returned to the pool. The next time that
> physical connection is handed out from the pool, if the user tries to
> server-prepare a statement, we can check on the connection whether it has
> already been prepared in a "previous lifetime", and if so, there is no need
> to prepare again. This is vital for scenarios with short-lived (pooled)
> connections, such as the web. Npgsql does this.
>
> Regarding the problem of idle connections being kept open by clients, I'd
> argue it's a client-side problem. If the client is using a connection pool,
> the pool should be configurable to close idle connections after a certain
> time (I think this is relatively standard behavior). If the client isn't
> using a pool, it seems to be the application's responsibility to release
> connections when they're no longer needed.
>
> The one drawback is that the pooling is application-specific, so it can't
> be shared by multiple applications/hosts. So in some scenarios it may make
> sense to use both client pooling and proxy/server pooling.
>
> To sum it up, I would argue that connection pooling should first and
> foremost be considered as a client feature, rather than a proxy feature
> (pgpool) or server feature (the PostgreSQL pooling being discussed here).
> This isn't to say server-side pooling has no value though.
>
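The per-connection prepared-statement tracking described above can be
sketched roughly as follows. This is a hypothetical illustration, not
Npgsql's or any driver's actual implementation: the connection object and
statement names are invented, and real drivers would also handle eviction,
invalidation, and thread safety.

```python
import queue


class PooledConnection:
    """Stands in for a physical PG connection. The prepared-statement
    cache survives being returned to the pool (hypothetical sketch)."""

    def __init__(self, conn_id):
        self.conn_id = conn_id
        self.prepared = {}       # SQL text -> server-side statement name
        self.prepare_count = 0   # how many real server round-trips occurred

    def prepare(self, sql):
        # Reuse the server-side prepared statement if this physical
        # connection already prepared it in a "previous lifetime".
        if sql not in self.prepared:
            self.prepare_count += 1  # would be a real Parse message here
            self.prepared[sql] = "stmt_%d" % len(self.prepared)
        return self.prepared[sql]


class Pool:
    """Trivial client-side pool: "opening" a connection is just a dequeue."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for i in range(size):
            self._idle.put(PooledConnection(i))

    def get(self):
        return self._idle.get()   # no TCP, no startup packet

    def put(self, conn):
        self._idle.put(conn)      # prepared cache stays with the connection


pool = Pool(1)
c = pool.get()
c.prepare("SELECT * FROM users WHERE id = $1")
pool.put(c)

c = pool.get()  # a later logical "open" reuses the physical connection
c.prepare("SELECT * FROM users WHERE id = $1")  # cache hit, no re-prepare
assert c.prepare_count == 1
pool.put(c)
```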

Recently, I did a large amount of parallel data processing where the
results were stored in PG. I had about 1000 workers each with their own PG
connection. As you pointed out, application pooling doesn't make sense in
this scenario. I tried pgpool and pgbouncer, and both ended up as the
bottleneck. Overall throughput was not great but it was highest without a
pooler. That aligns with Konstantin's benchmarks too. As far as I know,
server pooling is the only solution to increase throughput, without
upgrading hardware, for this use case.

I hope this PR gets accepted!