Re: libpq compression

Konstantin Knizhnik Fri, 08 Feb 2019 09:18:48 -0800



On 08.02.2019 19:26, Robbie Harwood wrote:

Konstantin Knizhnik <k.knizh...@postgrespro.ru> writes:

On 08.02.2019 10:01, Iwata, Aya wrote:

I fixed all issues you have reported except using list of supported
compression algorithms.

Sure. I confirmed that.

It will require extra round of communication between client and
server to make a decision about used compression algorithm.

In beginning of this thread, Robbie Harwood said that no extra
communication needed.  I think so, too.

Well, I think that this problem is more complex and requires more
discussion.
There are three places determining choice of compression algorithm:
1. Specification of compression algorithm by client. Right now it is
just boolean "compression" parameter in connection string,
but it is obviously possible to specify concrete algorithm here.
2. List of compression algorithms supported by client.
3. List of compression algorithms supported by server.

Concerning first option I have very serious doubt that it is good idea
to let client choose compression protocol.
Without extra round-trip it can be only implemented in this way:
if client toggles compression option in connection string, then libpq
includes in startup packet list of supported compression algorithms.
Then server intersects this list with its own set of supported
compression algorithms and if result is not empty, then
somehow choose  one of the commonly supported algorithms and sends it to
the client with 'z' command.

The easiest way, which I laid out earlier in
id:jlgfu1gqjbk....@redhat.com, is to have the server perform selection.
The client sends a list of supported algorithms in startup.  Startup has
a reply, so if the server wishes to use compression, then its startup
reply contains which algorithm to use.  Compression then begins after
startup.

If you really wanted to compress half the startup for some reason, you
can even have the server send a packet which consists of the choice of
compression algorithm and everything else in it subsequently
compressed.  I don't see this being useful.  However, you can use a
similar approach to let the client choose the algorithm if there were
some compelling reason for that (there is none I'm aware of to prefer
one to the other) - startup from client requests compression, reply from
server lists supported algorithms, next packet from client indicates
which one is in use along with compressed payload.

I already replied you that that next package cannot indicate whichalgorithm the client has choosen.

Using magics or some other tricks are not reliable and acceptable solution/
It can be done only by introducing extra message type.

Actually it is not really needed, because if client sends to the serverlist of supported algorithms in startup packet,then server can made a decision and inform client about it (usingspecial message type as it is done now).Then client doesn't need to make a choice. I can do it if everybodythink that choice of compression algorithm is so important feature.Just wonder: Postgres now is living good with hardcoded zlib andbuilt-in LZ compression algorithm implementation.

It may help to keep in mind that you are defining your own message type
here.

Frankly speaking, I do not think that such flexibility in choosing
compression algorithms is really needed.
I do not expect that there will be many situations where old client has
to communicate with new server or visa versa.
In most cases both client and server belongs to the same postgres
distributive and so implements the same compression algorithm.
As far as we are compressing only temporary data (traffic), the problem
of providing backward compatibility seems to be not so important.

Your comments have been heard, but this is the model that numerous folks
from project has told you we have.  Your code will not pass review
without algorithm agility.

src/backend/libpq/pqcomm.c :
In current Postgres source code, pq_recvbuf() calls secure_read()
and pq_getbyte_if_available() also calls secure_read().
It means these functions are on the same level.
However in your change, pq_getbyte_if_available() calls pq_recvbuf(),
and  pq_recvbuf() calls secure_read(). The level of these functions is 
different.

I think the purpose of pq_getbyte_if_available() is to get a
character if it exists and the purpose of pq_recvbuf() is to acquire
data up to the expected length.  In your change,
pq_getbyte_if_available() may have to do unnecessary process waiting
or something.

Sorry, but this change is essential. We can have some available data
in compression buffer and we need to try to fetch it in
pq_getbyte_if_available() instead of just returning EOF.

Aya is correct about the purposes of these functions.  Take a look at
how the buffering in TLS or GSSAPI works for an example of how to do
this correctly.

As with agility, this is what multiple folks from the project have told
you is a hard requirement.  None of us will be okaying your code without
proper transport layering.

Guys, I wondering which layering violation you are talking about?

Right now there are two cut&pasted peace of almost the same code inpqcomm.c:


static int
pq_recvbuf(void)
...
    /* Ensure that we're in blocking mode */
    socket_set_nonblocking(false);

    /* Can fill buffer from PqRecvLength and upwards */
    for (;;)
    {
        int            r;

        r = secure_read(MyProcPort, PqRecvBuffer + PqRecvLength,
                        PQ_RECV_BUFFER_SIZE - PqRecvLength);

        if (r < 0)
        {
            if (errno == EINTR)
                continue;        /* Ok if interrupted */

            /*

* Careful: an ereport() that tries to write to the clientwould

             * cause recursion to here, leading to stack overflow and core
             * dump!  This message must go *only* to the postmaster log.
             */
            ereport(COMMERROR,
                    (errcode_for_socket_access(),
                     errmsg("could not receive data from client: %m")));
            return EOF;
        }
        if (r == 0)
        {
            /*
             * EOF detected.  We used to write a log message here, but it's
             * better to expect the ultimate caller to do that.
             */
            return EOF;
        }
        /* r contains number of bytes read, so just incr length */
        PqRecvLength += r;
        return 0;
    }
}

int
pq_getbyte_if_available(unsigned char *c)
{
...
    /* Put the socket into non-blocking mode */
    socket_set_nonblocking(true);

    r = secure_read(MyProcPort, c, 1);
    if (r < 0)
    {
        /*
         * Ok if no data available without blocking or interrupted (though

* EINTR really shouldn't happen with a non-blocking socket).Report

         * other errors.
         */
        if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
            r = 0;
        else
        {
            /*

* Careful: an ereport() that tries to write to the clientwould

             * cause recursion to here, leading to stack overflow and core
             * dump!  This message must go *only* to the postmaster log.
             */
            ereport(COMMERROR,
                    (errcode_for_socket_access(),
                     errmsg("could not receive data from client: %m")));
            r = EOF;
        }
    }
    else if (r == 0)
    {
        /* EOF detected */
        r = EOF;
    }

    return r;
}

The only difference between them is that in first case we are usingblock mode and in the second case non-blocking mode.

Also there is loop in first case handling ENITR.

I have added nowait parameter to  pq_recvbuf(bool nowait)
and remove second fragment of code so it is changed just to one line:

if (PqRecvPointer < PqRecvLength || (r = pq_recvbuf(true)) > 0)

Both functions (pq_recvbuf and pq_getbyte_if_available) are defined inthe same file.So there is not any layering violation here. It is just elimination ofduplicated code.

If the only objection is that pq_getbyte_if_available is callingpq_recvbuf, then I can add some other function compress_read (as analogof secure read) and call it from both functions. But frankly speaking Ido not see any advantages of such approach.It just introduce extra function call and gives no extraencapsulation.modularity, flexibility or whatever else.



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: libpq compression

Reply via email to