On 22.06.2018 18:59, Robbie Harwood wrote:
Konstantin Knizhnik <k.knizh...@postgrespro.ru> writes:
On 21.06.2018 20:14, Robbie Harwood wrote:
Konstantin Knizhnik <k.knizh...@postgrespro.ru> writes:
On 21.06.2018 17:56, Robbie Harwood wrote:
Konstantin Knizhnik <k.knizh...@postgrespro.ru> writes:
On 20.06.2018 23:34, Robbie Harwood wrote:
Konstantin Knizhnik <k.knizh...@postgrespro.ru> writes:
Well, that's a design decision you've made. You could put
lengths on chunks that are sent out - then you'd know exactly how
much is needed. (For instance, 4 bytes of network-order length
followed by a complete payload.) Then you'd absolutely know
whether you have enough to decompress or not.
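Concretely, something like this (the function name and buffer handling here are invented for illustration, not from the patch):

    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>      /* htonl */

    /*
     * Prefix one compressed chunk with a 4-byte network-order length.
     * "out" must have room for len + 4 bytes.  The receiver reads the
     * header first, so it knows exactly how many bytes it needs before
     * it ever tries to decompress.
     */
    static size_t
    frame_chunk(const char *payload, uint32_t len, char *out)
    {
        uint32_t netlen = htonl(len);

        memcpy(out, &netlen, sizeof(netlen));        /* length header */
        memcpy(out + sizeof(netlen), payload, len);  /* complete payload */
        return sizeof(netlen) + len;
    }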
Do you really suggest sending an extra header for each chunk of data?
Please notice that a chunk can be as small as one message, a dozen
bytes or so, because libpq is used for client-server communication
with a request-reply pattern.
I want you to think critically about your design. I *really* don't
want to design it for you - I have enough stuff to be doing. But
again, the design I gave you doesn't necessarily need that - you
just need to properly buffer incomplete data.
Right now secure_read may return any number of available bytes. But
when streaming compression is used, it can happen that the number of
available bytes is not enough to perform decompression. This is why we
may need to try to fetch an additional portion of data. This is how
zpq_stream works now.
No, you need to buffer and wait until you're called again. Which is
to say: decompress() shouldn't call secure_read(). secure_read()
should call decompress().
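In sketch form (zpq_read_buffered() and zpq_feed() are invented names,
not your actual zpq_stream API; this only shows the call direction I
mean):

    ssize_t
    secure_read(Port *port, void *ptr, size_t len)
    {
        char    raw[8192];
        ssize_t n;

        if (!port->zpq)                 /* compression not negotiated */
            return secure_raw_read(port, ptr, len);

        for (;;)
        {
            /* Hand out anything already decompressed. */
            n = zpq_read_buffered(port->zpq, ptr, len);
            if (n > 0)
                return n;

            /*
             * Not enough input buffered to decompress a complete chunk:
             * pull more raw bytes and feed them in.  On EOF, error, or
             * would-block, just return; we'll be called again later.
             */
            n = secure_raw_read(port, raw, sizeof(raw));
            if (n <= 0)
                return n;
            zpq_feed(port->zpq, raw, n);
        }
    }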
In this case I will have to implement this code twice: both for
backend and frontend, i.e. for secure_read/secure_write and
pqsecure_read/pqsecure_write.
Likely, yes. You can see that this is how TLS does it (which you should
be using as a model, architecture-wise).
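For comparison, the shape of secure_read() in
src/backend/libpq/be-secure.c is roughly this (simplified from the
real code; the retry/latch handling is omitted):

    ssize_t
    secure_read(Port *port, void *ptr, size_t len)
    {
        ssize_t n;
        int     waitfor;

    #ifdef USE_SSL
        if (port->ssl_in_use)
            n = be_tls_read(port, ptr, len, &waitfor);
        else
    #endif
            n = secure_raw_read(port, ptr, len);

        /* callers such as pq_recvbuf() never see which transport ran */
        return n;
    }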
Frankly speaking, I was very upset by the design of the libpq
communication layer in Postgres: there are two different
implementations of almost the same stuff for backend and frontend.
Changing the codebases so that more could be shared is not necessarily a
bad idea; however, it is a separate change from compression.
I do not understand how it could be implemented differently, or what
is wrong with the current implementation.
The most succinct thing I can say is: absolutely don't modify
pq_recvbuf(). I gave you pseudocode for how to do that. All of your
logic should be *below* the secure_read/secure_write functions.
I cannot approve code that modifies pq_recvbuf() in the manner you
currently do.
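The write side has the same shape (again with invented zpq names, and
glossing over partial writes):

    ssize_t
    secure_write(Port *port, void *ptr, size_t len)
    {
        char    out[8192];
        size_t  outlen;

        if (!port->zpq)
            return secure_raw_write(port, ptr, len);

        /*
         * Compress the caller's bytes and push the result out.
         * Everything above this function -- pq_recvbuf(),
         * internal_flush() -- stays untouched.
         */
        outlen = zpq_compress(port->zpq, ptr, len, out, sizeof(out));
        if (secure_raw_write(port, out, outlen) < 0)
            return -1;
        return len;                 /* logical bytes consumed */
    }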
Well, I understand your arguments. But please also consider my
argument above (about avoiding code duplication).
In any case, the secure_read function is called only from
pq_recvbuf(), just as pqsecure_read is called only from pqReadData. So
I do not see a principal difference between handling compression in
secure_read and in pq_recvbuf, and I do not understand why it is
"destroying the existing model".
Frankly speaking, I would really like to destroy the existing model,
moving all system-dependent stuff in Postgres into a SAL and avoiding
this awful mix of code sharing and duplication between backend and
frontend. But that is another story and I do not want to discuss it
here.
I understand you want to avoid code duplication. I will absolutely
agree that the current setup makes it difficult to share code between
postmaster and libpq clients. But the way I see it, you have two
choices:
1. Modify the code to make code sharing easier. Once this has been
done, *then* build a compression patch on top, with the nice new
architecture.
2. Leave the architecture as-is and add compression support.
(Optionally, you can make code sharing easier at a later point.)
Fundamentally, I think you value code sharing more than I do. So while
I might advocate for (2), you might personally prefer (1).
But right now you're not doing either of those.
If we are speaking about the "right design", then neither your
suggestion nor my implementation is good, and I do not see a principal
difference between them.
The right approach is the "decorator pattern": this is how streams are
designed in .NET/Java. You can construct a pipeline of "secure",
"compressed", and whatever other streams.
I *strongly* disagree, but I don't think you're seriously suggesting
this.
My idea was the following: the client wants to use compression, but
the server may reject this attempt (for any reason: it doesn't support
it, has no proper compression library, does not want to spend CPU on
decompression, ...). Right now the compression algorithm is hardcoded,
but in the future client and server may negotiate to choose the proper
compression protocol. This is why I prefer to perform negotiation
between client and server to enable compression.
Well, for negotiation you could put the name of the algorithm you want
in the startup packet. It doesn't have to be a boolean for
compression, and then you don't need an additional round-trip.
Sorry, I can only repeat the arguments I already mentioned:
- in the future it may be possible to specify the compression algorithm
- even with a boolean compression option, the server may have some
reasons to reject the client's request to use compression
Extra flexibility is always a good thing if it doesn't cost too much,
and an extra round of negotiation in the case of enabling compression
seems to me not too high a price for it.
You already have this flexibility even without negotiation. I don't
want you to lose your flexibility. Protocol looks like this:
- Client sends connection option "compression" with list of
algorithms it wants to use (comma-separated, or something).
- The first packet the server sends back is compressed with one of
those algorithms (or none, if it doesn't want to turn on compression).
No additional round-trips needed.
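Server side, that could be as simple as (names invented for the
sketch):

    #include <stdlib.h>
    #include <string.h>

    static const char *supported[] = {"zstd", "zlib", NULL};

    /*
     * Pick the first algorithm from the client's comma-separated
     * "compression" startup option that the server also supports.
     * NULL means stay uncompressed.  No extra message, no round-trip.
     */
    static const char *
    choose_compression(const char *client_list)
    {
        char       *copy = strdup(client_list);
        const char *result = NULL;

        for (char *tok = strtok(copy, ","); tok && !result;
             tok = strtok(NULL, ","))
        {
            for (int i = 0; supported[i]; i++)
                if (strcmp(tok, supported[i]) == 0)
                    result = supported[i];
        }
        free(copy);
        return result;
    }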
This is exactly how it works now... The client includes the
compression option in the connection string, and the server replies
with a special message ('Z') if it accepts the request to compress
traffic between this client and server.
No, it's not. You don't need this message. If the server receives a
compression request, it should just turn compression on (or not), and
then have the client figure out whether it got compression back.
How will it manage to do that? It receives some reply, and first of
all it needs to know whether that reply has to be decompressed or not.
You can tell whether a message is compressed by looking at it. The way
the protocol works, every message has a type associated with it: a
single byte, like 'R', that says what kind of message it is.
A compressed message can contain any sequence of bytes, including 'R' :)
Thanks,
--Robbie
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company