On 20.06.2018 00:04, Robbie Harwood wrote:
Konstantin Knizhnik <k.knizh...@postgrespro.ru> writes:

On 18.06.2018 23:34, Robbie Harwood wrote:

I also don't like that you've injected into the *startup* path -
before authentication takes place.  Fundamentally, authentication (if
it happens) consists of exchanging some combination of short strings
(e.g., usernames) and binary blobs (e.g., keys).  None of this will
compress well, so I don't see it as worth performing this negotiation
there - it can wait.  It's also another message in every startup.
I'd leave it to connection parameters, personally, but up to you.
  From my point of view, compression of libpq traffic is similar to SSL
and should be toggled at the same stage.
But that's not what you're doing.  This isn't where TLS gets toggled.

TLS negotiation happens as the very first packet: after completing the
TCP handshake, the client will send a TLS negotiation request.  If it
doesn't happen there, it doesn't happen at all.

(You *could* configure it where TLS is toggled.  This is, I think, not a
good idea.  TLS encryption is a probe: the server can reject it, at
which point the client tears everything down and connects without TLS.
So if you do the same with compression, that's another point of tearing
down an starting over.  The scaling on it isn't good either: if we add
another encryption method into the mix, you've doubled the number of
teardowns.)
Yes, you are right. There is a special message for enabling the TLS protocol.
But I do not think that the same thing is needed for compression.
This is why I prefer to specify compression in connection options, so that compression can be enabled straight after processing of the startup packet. Frankly speaking, I still do not see a reason to postpone enabling compression until some later moment.


Definitely authentication parameters are not so large as to be efficiently
compressed, but compression (maybe password-protected in the future) can
somehow protect this data.
In any case, I do not think that compression of authentication data can
have any influence on negotiation speed.
So I am not 100% sure that toggling compression just after receiving the
startup packet is the only right solution.
But I am also not convinced that there is some better place where the
compressor should be configured.
Do you have a concrete suggestion for it? In InitPostgres, just after
PerformAuthentication?
Hmm, let me try to explain this differently.

pq_configure() (as you've called it) shouldn't send a packet.  At its
callsite, we *already know* whether we want to use compression - that's
what the port->use_compression option says.  So there's no point in
having a negotiation there - it's already happened.

My idea was the following: the client wants to use compression, but the server may reject this attempt (for any reason: it doesn't support it, has no proper compression library, doesn't want to spend CPU on decompression, ...). Right now the compression algorithm is hardcoded, but in the future client and server may negotiate to choose the proper compression protocol. This is why I prefer to perform negotiation between client and server to enable compression.


The other thing you do in pq_configure() is call zpq_create(), which
does a bunch of initialization for you.  I am pretty sure that all of
this can be deferred until the first time you want to send a compressed
message - i.e., when compress()/decompress() is called for the first
time from *secure_read() or *secure_write().

Also, please notice that compression is useful not only for client-server
communication, but also for replication channels.
Right now it is definitely used in both cases, but if we move
pq_configure somewhere else, we should check that this code is invoked
both for normal backends and for walsender.
"We" meaning you, at the moment, since I don't think any of the rest of
us have set up tests with this code :)

If there's common code to be shared around, that's great.  But it's not
imperative; in a lot of ways, the network stacks are very different from
each other, as I'm sure you've seen.  Let's not have the desire for code
reuse get in the way of good, maintainable design.

Using terminology from https://facebook.github.io/zstd/zstd_manual.html :

Right now you use streaming (ZSTD_{compress,decompress}Stream()) as the
basis for your API.  I think this is probably a mismatch for what we're
doing here - we're doing one-shot compression/decompression of packets,
not sending video or something.

I think our use case is better served by the non-streaming interface, or
what they call the "Simple API" (ZSTD_{decompress,compress}()).
Documentation suggests it may be worth it to keep an explicit context
around and use that interface instead (i.e.,
ZSTD_{compressCCTx,decompressDCtx}()), but that's something you'll have
to figure out.

You may find making this change helpful for addressing the next issue.
Sorry, but here I completely disagree with you.
What we are doing is really streaming compression, not compression of
individual messages/packets.
Yes, it is not video, but COPY data actually has the same nature as
video data.
The main benefit of streaming compression is that we can use the same
dictionary for compressing all messages (and adjust this dictionary
based on new data).
We do not need to write a dictionary and a separate header for each record.
Otherwise compression of libpq messages would be completely useless:
the typical message is too short to be compressed efficiently. The
main drawback of streaming compression is that you cannot decompress
a particular message without decompressing all previous messages.
This is why streaming compression cannot be used to compress database
pages (as is done in CFS, provided in PostgresPro EE). But for libpq
it is not needed.
That makes sense, thanks.  The zstd documentation doesn't articulate
that at all.

I don't like where you've put the entry points to the compression logic:
it's a layering violation.  A couple people have expressed similar
reservations I think, so let me see if I can explain using
`pqsecure_read()` as an example.  In pseudocode, `pqsecure_read()` looks
like this:

      if conn->is_tls:
          n = tls_read(conn, ptr, len)
      else:
          n = pqsecure_raw_read(conn, ptr, len)
      return n

I want to see this extended by your code to something like:

      if conn->is_tls:
          n = tls_read(conn, ptr, len)
      else:
          n = pqsecure_raw_read(conn, ptr, len)

      if conn->is_compressed:
          n = decompress(ptr, n)

      return n

In conjunction with the above change, this should also significantly
reduce the size of the patch (I think).
Yes, it will simplify the patch, but it will make libpq compression completely
useless (see my explanation above).
We need to use streaming compression, and to be able to use streaming
compression I have to pass a function for fetching more data to the
compression library.
I don't think you need that, even with the streaming API.

To make this very concrete, let's talk about ZSTD_decompressStream (I'm
pulling information from
https://facebook.github.io/zstd/zstd_manual.html#Chapter7 ).

Using the pseudocode I'm asking for above, the decompress() function
would look vaguely like this:

     decompress(ptr, n)
         ZSTD_inBuffer in = {0}
         ZSTD_outBuffer out = {0}

         in.src = ptr
         in.size = n

         while in.pos < in.size:
             ret = ZSTD_decompressStream(out, in)
             if ZSTD_isError(ret):
                 give_up()

         memcpy(ptr, out.dst, out.pos)
         return out.pos

(and compress() would follow a similar pattern, if we were to talk about
it).
It will not work that way: we do not know how much input data is needed to be able to decompress a message.
So the loop should be something like this:

 decompress(ptr, n)
        ZSTD_inBuffer in = {0}
        ZSTD_outBuffer out = {0}

        in.src = ptr
        in.size = n

        while true:
            ret = ZSTD_decompressStream(out, in)
            if ZSTD_isError(ret):
                give_up()
            if out.pos != 0:
                // if we decompressed something, return it
                return out.pos
            read_input(in)



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

