Hi Dmitry,
On 13.02.2019 16:56, Dmitry Dolgov wrote:
On Mon, Feb 11, 2019 at 4:52 PM Andres Freund <and...@anarazel.de> wrote:
On 2019-02-11 12:46:07 -0300, Alvaro Herrera wrote:
On 2019-Feb-11, Andreas Karlsson wrote:
Imagine the following query which uses the session ID from the cookie to
check if the logged in user has access to a file.
SELECT may_download_file(session_id => $1, path => $2);
When the query with its parameters is compressed the compressed size will
depend on the similarity between the session ID and the requested path
(assuming Zstd works similarly to DEFLATE), so by tricking the web browser
into making requests with specifically crafted paths while monitoring the
traffic between the web server and the database, the compressed request size
can be used to home in on the session ID and steal people's login sessions, just
like the CRIME attack[1].
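To make the mechanism concrete, here is a minimal standalone sketch (not
code from the patch; it only assumes a DEFLATE-style compressor such as
zlib): the guess that shares bytes with the secret compresses to fewer
bytes, and that size difference survives encryption.

    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>

    int main(void)
    {
        const char *secret = "session_id=deadbeef1234";
        const char *guesses[] = { "path=/deadbeef", "path=/zzzzzzzz" };

        for (int i = 0; i < 2; i++)
        {
            char    plain[128];
            Bytef   comp[256];
            uLongf  complen = sizeof(comp);

            /* Secret and attacker-controlled data share one
             * compression context, as in the query above. */
            snprintf(plain, sizeof(plain), "%s&%s", secret, guesses[i]);
            compress(comp, &complen, (const Bytef *) plain, strlen(plain));

            /* The matching guess yields the smaller compressed size. */
            printf("%s -> %lu bytes\n", guesses[i], (unsigned long) complen);
        }
        return 0;
    }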
I would have said that you'd never let the attacker eavesdrop on the
traffic between webserver and DB, but then that's precisely the scenario
you'd use SSL for, so I suppose that even though this attack is probably
just a theoretical risk at this point, it should definitely be
considered.
Right.
Now, does this mean that we should forbid libpq compression
completely? I'm not sure -- maybe it's still usable if you encapsulate
that traffic so that the attackers cannot get at it, and there's no
reason to deprive those users of the usefulness of the feature. But
then we need documentation warnings pointing out the risks.
I think it's an extremely useful feature, and it can be used in situations
where this kind of attack doesn't pose a significant
danger. E.g. pg_dump, pg_basebackup seem candidates for that, and even
streaming replication seems much less endangered than sessions with lots
of very small queries. But I think that means it needs to be
controllable from both the server and client, and default to off
(although it doesn't seem crazy to allow it in the aforementioned
cases).
Totally agree with this point.
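As a rough illustration of what the client-side opt-in could look like for
a bulk-transfer client such as pg_dump (the "compression" connection option
name here is an assumption for the sketch, not settled syntax of the patch):

    #include <stdio.h>
    #include <libpq-fe.h>

    int main(void)
    {
        /* "compression=on" is a hypothetical connection option used
         * only for illustration; compression stays off unless the
         * client requests it and the server allows it. */
        PGconn *conn = PQconnectdb("host=backup.internal dbname=prod "
                                   "compression=on");

        if (PQstatus(conn) != CONNECTION_OK)
            fprintf(stderr, "%s", PQerrorMessage(conn));
        PQfinish(conn);
        return 0;
    }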
On Thu, Nov 29, 2018 at 8:13 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
On Mon, Aug 13, 2018 at 8:48 PM Robert Haas <robertmh...@gmail.com> wrote:
I agree with the critiques from Robbie Harwood and Michael Paquier
that the way in which compression is being hooked into the existing
architecture looks like a kludge. I'm not sure I know exactly how it
should be done, but the current approach doesn't look natural; it
looks like it was bolted on.
After spending some time reading this patch and investigating the different
points mentioned in the discussion, I tend to agree with that. As far as I
can see, it's probably the biggest disagreement here, and the one that keeps
things from progressing. To address this point, I've spent some time playing
with the patch to see how it would look if the compression logic did not
incorporate everything else. So far I've come up with some buffer juggling
that you can find in the attachments. It's based on v11 (I'm not able to
keep up with Konstantin), doesn't change all the relevant code (mostly the
stuff around secure read/write), and most likely misses a lot of details,
but at least it works as expected when testing manually with psql (the tests
look broken with the original v11, so I haven't tried to run them with the
attached patch either). I wonder if this approach could be more natural for
the feature?
Also, I have a few questions/comments:
* in this discussion a situation was mentioned a few times where the
compression logic needs to read more data to decompress the current buffer.
Do I understand correctly that it's related to Z_BUF_ERROR, when zlib
couldn't make any progress and needs more data? If yes, where is it handled?
* one of the points was that code would have to be copy-pasted between be/fe
for a proper separation, but judging from the attached experimental patch it
doesn't look like a lot of code would have to be duplicated.
* I've noticed a problem in v11: an attempt to connect to a db without
specifying compression failed with the error "server is not supported
requested compression algorithm", e.g. when I'm just executing createdb.
I haven't looked at the negotiation logic yet, hope to do that soon.
First of all, thank you for attempting to push this patch forward, because
there really seems to be some disagreement here which blocks its progress.
Unfortunately the first reviewer (Robbie Harwood) thinks that my approach
causes a layering violation and should be done in another way. Robbie
suggested several times that I look at "how the buffering in TLS or GSSAPI
works" and do it in a similar way.
Frankly speaking, I do not see any fundamental differences.
Since the precise amount of data required by the decompression algorithm to
produce some output is not known, it is natural (from my point of view) to
pass tx/rx functions to the stream compressor implementation and let it read
or write data itself.
This also allows streaming compression to be used not only with libpq, but
with any other data stream.
Right now we are reading data in two places: in the frontend and in the
backend. By passing tx/rx functions to the compression stream implementation
we avoid code duplication.
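For reference, the callback-based interface looks roughly like this
(signatures paraphrased from the patch; exact details may differ between
versions):

    /* Raw I/O callbacks supplied by the caller: libpq's socket
     * routines on the frontend, the secure_read/secure_write wrappers
     * on the backend, or any other byte stream. */
    typedef ssize_t (*zpq_tx_func)(void *arg, void const *data, size_t size);
    typedef ssize_t (*zpq_rx_func)(void *arg, void *data, size_t size);

    typedef struct ZpqStream ZpqStream;

    /* One constructor shared by both sides */
    ZpqStream *zpq_create(zpq_tx_func tx_func, zpq_rx_func rx_func, void *arg);

    /* zpq_read can invoke rx_func itself whenever the decompressor
     * needs more compressed bytes, so the caller never has to know
     * how much input one decompressed chunk requires. */
    ssize_t zpq_read(ZpqStream *zs, void *buf, size_t size);
    ssize_t zpq_write(ZpqStream *zs, void const *buf, size_t size);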
In your implementation the pair zpq_read+zpq_read_drain is called twice:
in pqsecure_read and in secure_read.
Moreover, please notice that your implementation is still passing tx/rx
functions to the stream constructor, so zpq_read is still able to read data
itself.
So I do not understand which problem you have solved by replacing zpq_read
with the pair zpq_read_drain+zpq_read.
If we are speaking about layering, then from my point of view it is not a
good idea to let the secure_read function perform decompression as well.
At the very least the name will be confusing.
Answering your questions:
1. When the decompressor doesn't have enough data to produce any extra
output, it doesn't return an error. It just doesn't advance the output
position in the buffer.
In my implementation (and actually in yours as well, because you kept this
code), it is done in the zpq_read function itself:
    if (out.pos != 0)
    {
        /* If we have some decompressed data, then we return immediately */
        zs->rx_total_raw += out.pos;
        return out.pos;
    }
    if (zs->rx.pos == zs->rx.size)
    {
        zs->rx.pos = zs->rx.size = 0;    /* Reset rx buffer */
    }
    /* Otherwise we try to fetch more data using rx function */
    rc = zs->rx_func(zs->arg, (char *) zs->rx.src + zs->rx.size,
                     ZSTD_BUFFER_SIZE - zs->rx.size);
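The same behaviour can be reproduced with plain zlib (a standalone sketch,
not code from the patch): inflate() on truncated input doesn't fail, it
simply stops once it runs out of input, until more compressed bytes arrive.

    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>

    int main(void)
    {
        Bytef       comp[128], out[128];
        uLongf      complen = sizeof(comp);
        const char *msg = "hello, compressed world";
        z_stream    zs;

        compress(comp, &complen, (const Bytef *) msg, strlen(msg) + 1);

        memset(&zs, 0, sizeof(zs));
        inflateInit(&zs);

        /* Feed only the first 4 bytes: inflate() returns Z_OK with
         * little or no output instead of reporting a hard error. */
        zs.next_in = comp;
        zs.avail_in = 4;
        zs.next_out = out;
        zs.avail_out = sizeof(out);
        printf("rc=%d, produced=%lu bytes\n", inflate(&zs, Z_NO_FLUSH),
               (unsigned long) (sizeof(out) - zs.avail_out));

        /* Feed the rest: decompression now completes (Z_STREAM_END). */
        zs.next_in = comp + 4;
        zs.avail_in = complen - 4;
        printf("rc=%d, result=%s\n", inflate(&zs, Z_NO_FLUSH), out);

        inflateEnd(&zs);
        return 0;
    }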
2. Code duplication is always bad, no matter how much code is copied.
Frankly speaking, I think that the duplication of I/O code between backend
and frontend is one of the most awful parts of Postgres.
It is always better to avoid duplication when it is possible.
3. Sorry, it was really a bug in version 11 of the patch; it is fixed in version 12.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company