On Thu, Feb 11, 2021 at 8:09 AM Daniil Zakhlystov <usernam...@yandex-team.ru> wrote:
> [ benchmark results ]
So, if I read these results correctly, on the "pg_restore of IMDB database" test, we get 88% of the RX bytes reduction and 99.8% of the TX bytes reduction for 90% of the CPU cost. On the "pgbench" test, which probably has much smaller packets, chunked compression gives us no bandwidth reduction and in fact consumes slightly more network bandwidth -- which seems like it has to be an implementation defect, since we should always be able to fall back to sending the uncompressed packet if the compressed one is larger, or will be after adding the wrapper overhead. But with the current code, at least, we pay about a 30% CPU tax, and there's no improvement. The permanent compression imposes a whopping 90% CPU tax, but we save about 33% on TX bytes and about 14% on RX bytes.

If that's an accurate reading of the results, then I would say the "pg_restore of IMDB database" test is pretty much a wash. Some people might prefer to incur the extra CPU cost to get the extra bandwidth savings, and other people might not. But neither group of people really has cause for complaint if the other approach is selected, because the costs and benefits are similar in both cases. But in the pgbench test case, the chunked compression looks horrible. Sure, it's less costly from a CPU perspective than the other approach, but since you don't get any benefit, you'd be far better off disabling compression altogether than using the chunked approach.

However, I feel like some of this has almost got to be an implementation deficiency in the "chunked" version of the patch. Now, I haven't looked at that patch. But, there are certainly a number of things that it might be failing to do that could make a big difference (a rough sketch of the first three follows the list):

1. As I mentioned above, we need to fall back to sending the uncompressed message if compression fails to reduce the size, or if it doesn't reduce the size by enough to compensate for the header we have to add to the packet (I assume this is 5 bytes, perhaps 6 if you allow a byte to mention the compression type).

2. Refining this further, if we notice that we are failing to compress messages regularly, maybe we should adaptively give up. The simplest idea would be something like: keep track of what percentage of the time compression succeeds in reducing the message size. If in the last 100 attempts we got a benefit fewer than 75 times, then conclude the data isn't very compressible and switch to only attempting to compress every twentieth packet or so. If the data changes and becomes more compressible again, the statistics will eventually tilt back in favor of compressing every packet again; if not, we'll only be paying 5% of the overhead.

3. There should be some minimum size before we attempt compression. pglz gives up right away if the input is less than 32 bytes; I don't know if that's the right limit, but presumably it'd be very difficult to save 5 or 6 bytes out of a message smaller than that, and maybe it's not worth trying even for slightly larger messages.

4. It might be important to compress multiple packets at a time. I can even imagine having two different compressed protocol messages, one saying 'here is a compressed message' and the other saying 'here are a bunch of compressed messages rolled up into one packet'.
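Here's that rough sketch of points 1-3. To be clear, this is only my illustration, not code from either patch: the function names, the 6-byte header figure, and the choice of zlib are all stand-ins.

/*
 * Hypothetical sketch (not from either patch) of the send-side decision
 * described in points 1-3: skip tiny messages, adaptively give up when
 * compression keeps failing to pay off, and always fall back to sending
 * the raw message when compression does not cover the added header.
 */
#include <stdbool.h>
#include <string.h>
#include <zlib.h>

#define COMPRESSED_HEADER_BYTES 6   /* assumed: type byte + length + method byte */
#define MIN_COMPRESS_SIZE       32  /* below this, don't even try (pglz uses 32) */

/* rolling stats over the last 100 attempts; per-connection state in a real patch */
static int attempts_in_window = 0;
static int wins_in_window = 0;
static bool compression_paused = false;
static int packets_since_pause = 0;

static bool
should_try_compression(size_t msglen)
{
    if (msglen < MIN_COMPRESS_SIZE)
        return false;
    if (compression_paused)
    {
        /* while "paused", only probe every twentieth packet */
        if (++packets_since_pause % 20 != 0)
            return false;
    }
    return true;
}

static void
record_attempt(bool win)
{
    attempts_in_window++;
    if (win)
        wins_in_window++;
    if (attempts_in_window == 100)
    {
        /* fewer than 75 wins out of 100: data looks incompressible */
        compression_paused = (wins_in_window < 75);
        attempts_in_window = wins_in_window = 0;
        packets_since_pause = 0;
    }
}

/*
 * Prepare one outgoing message.  The caller supplies 'out' with room for
 * at least compressBound(msglen) bytes (and at least msglen, for the
 * fallback copy).  Returns the number of bytes written into 'out' and
 * sets *compressed to say whether that is compressed data or a verbatim
 * copy of 'msg'.
 */
static size_t
prepare_payload(const char *msg, size_t msglen,
                char *out, size_t outcap, bool *compressed)
{
    *compressed = false;

    if (should_try_compression(msglen))
    {
        uLongf clen = (uLongf) outcap;

        if (compress2((Bytef *) out, &clen, (const Bytef *) msg,
                      (uLong) msglen, Z_DEFAULT_COMPRESSION) == Z_OK &&
            clen + COMPRESSED_HEADER_BYTES < msglen)
        {
            record_attempt(true);
            *compressed = true;
            return (size_t) clen;
        }
        record_attempt(false);          /* didn't pay off; fall through */
    }

    memcpy(out, msg, msglen);           /* point 1: just send it uncompressed */
    return msglen;
}

The 75-out-of-100 and every-twentieth-packet numbers are just the arbitrary values from point 2; something smarter could surely be devised.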
But there's a subtler way in which the permanent compression approach could be winning, which is that the compressor can retain state over long time periods. In a single pgbench response, there's doubtless some opportunity for the compressor to find savings, but an individual response doesn't likely include all that much duplication. But just think about how much duplication there is from one response to the next. The entire RowDescription message is going to be exactly the same for every query. If you can represent that in just a couple of bytes, I think that figures to be a pretty big win. If I had to guess, that's likely why the permanent compression approach seems to deliver a significant bandwidth savings even on the pgbench test, while the chunked approach doesn't. Likewise in the other direction: the query doesn't necessarily contain a lot of internal duplication, but it duplicates the previous query to a very large extent. It would be interesting to know whether this theory is correct, and whether anyone can spot a flaw in my reasoning.

If it is, that doesn't necessarily mean we can't use the chunked approach, but it certainly makes it less appealing. I can see two ways to go. One would be to just accept that it won't get much benefit in cases like the pgbench example, and mitigate the downsides as well as we can. A version of this patch that caused a 3% CPU overhead in cases where it can't compress would be far more appealing than one that causes a 30% overhead in such cases (which seems to be where we are now).

Alternatively, we could imagine the compressed-message packets as carrying a single continuous compressed stream of bytes, so that the compressor state is retained from one compressed message to the next. Any number of uncompressed messages could be sent in between, without doing anything to the compression state, but when you send the next compressed message, both the sender and receiver feel like the bytes they're now being given are appended onto whatever bytes they saw last. This would presumably recoup a lot of the compression benefit that the permanent compression approach sees on the pgbench test, but it has some notable downsides. In particular, now you have to wonder what exactly you're gaining by not just compressing everything. Nobody snooping on the stream can decode an individual packet without having seen the whole history of compressed packets from the beginning of time, nor can some kind of middleware like pgbouncer decompress each payload packet just enough to see what the first byte may be. It's either got to decompress all of every packet to keep its compression state current, or just give up on knowing anything about what's going on inside those packets. And you've got to worry about all the same details about flushing the compressor state that we were worrying about with the compress-everything approach. Blech. Or we could compress everything, as Konstantin proposes.
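To make that "continuous stream" idea a bit more concrete, here's roughly the shape I have in mind, sketched against zstd's streaming API. Again, this is only an illustration -- the struct and function names are invented, and a real patch would need proper buffer management, error reporting, and a considered flush policy.

/*
 * Hypothetical sketch of the "continuous stream" variant: one long-lived
 * zstd context per direction, fed only the payloads of compressed-message
 * packets.  Uncompressed messages bypass it entirely, so they don't touch
 * its state.  Each compressed message ends with a flush so the receiver
 * can decode it as soon as it arrives, but the history is kept for the
 * next one.
 */
#include <stddef.h>
#include <zstd.h>

typedef struct CompressedStream
{
    ZSTD_CCtx  *cctx;           /* lives for the whole connection */
} CompressedStream;

static void
stream_init(CompressedStream *cs)
{
    cs->cctx = ZSTD_createCCtx();
}

/*
 * Compress one protocol message into 'dst' (caller supplies at least
 * ZSTD_compressBound(srclen) bytes).  Returns the payload length to put
 * in a compressed-message packet, or 0 on error.  Because the context is
 * reused, a RowDescription identical to the previous one compresses to
 * almost nothing even though it sits in a different packet.
 */
static size_t
stream_compress_message(CompressedStream *cs,
                        const void *src, size_t srclen,
                        void *dst, size_t dstcap)
{
    ZSTD_inBuffer   in = { src, srclen, 0 };
    ZSTD_outBuffer  out = { dst, dstcap, 0 };
    size_t          remaining;

    do
    {
        /* flush, so this message is decodable on its own arrival */
        remaining = ZSTD_compressStream2(cs->cctx, &out, &in, ZSTD_e_flush);
        if (ZSTD_isError(remaining))
            return 0;
    } while ((remaining != 0 || in.pos < in.size) && out.pos < out.size);

    return (remaining == 0 && in.pos == in.size) ? out.pos : 0;
}

The receiving side would hold a matching long-lived ZSTD_DCtx for the whole connection, which is exactly why something like pgbouncer can't decode one message in isolation.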
There's another point here that just occurred to me, though. In the pgbench test, we send and receive roughly equal quantities of data at a rate that works out, over the duration of the test, to about 4.8MB/s. On the data-loading test, one direction is insignificant, but the other direction transfers data at a far higher rate, close to 25MB/s. I'm having a little trouble matching these numbers, which I computed from the text results in the email, with the graphs, which seem to show much higher values, but the basic point is clear enough either way: the data load puts a LOT more strain on the network. The reason why that's relevant is that if you're not saturating the network anyway, then why do you want to use compression? It's bound to cost something in terms of CPU, and all you're saving is network bandwidth that wasn't the problem anyway. A bulk load is a lot more likely to hit the limits of your network. At the same time, it's not clear to me that a pgbench test *couldn't* saturate the network. This one was only getting ~2.4k TPS, which is not much. A read-write pgbench can go more than 10x that fast, and a read-only one can go more than 100x that fast, and suddenly that's a whole lot more network consumption.

So at the end of the day I'm not really quite sure what is best here. I agree with all of Craig's points about the advantages of packet-level compression, so I'd really prefer to make that approach work if we can. However, it also seems to me that there's room to be fairly concerned about what these test results are showing.

--
Robert Haas
EDB: http://www.enterprisedb.com