Re: Extensibility of the PostgreSQL wire protocol

2021-02-19 Thread Damir Simunic


> On 11 Feb 2021, at 16:06, Tom Lane  wrote:
> 
> Maybe there is some useful thing that can be accomplished here, but we
> need to consider the bigger picture rather than believing (without proof)
> that a few hook variables will be enough to do anything.
> 
>   regards, tom lane
> 

Pluggable wire protocol is a game-changer on its own. 

The bigger picture is that the right protocol choice enables large-scale 
architectural simplifications for whole classes of production applications.

For browser-based applications (lob, saas, e-commerce), having the database 
server speak the browser protocol enables architectures without backend 
application code. This in turn leads to significant reductions of latency, 
complexity, and application development time. And it’s not just lack of backend 
code: one also profits from all the existing infrastructure like per-query 
compression/format choice, browser connection management, sse, multiple 
streams, prioritization, caching/cdns, etc.

Don’t know if you’d consider it as a proof, yet I am seeing 2x to 4x latency 
reduction in production applications from protocol conversion to http/2. My 
present solution is a simple connection pooler I built on top of Nginx 
transforming the tcp stream as it passes through.

In a recent case, letting the browser talk directly to the database allowed me 
to get rid of a ~100k-sloc .net backend and all the complexity and 
infrastructure that goes with coding/testing/deploying/maintaining it, while 
keeping all the positives: per-query compression/data conversion, querying 
multiple databases over a single connection, session cookies, etc. Deployment 
is trivial compared to what was before. Latency is down 2x-4x across the board.

Having some production experience with this approach, I can see how 
http/2-speaking Postgres would further reduce latency, processing cost, and 
time-to-interaction for applications.

A similar case can be made for IoT where one would want to plug an 
iot-optimized protocol. Again, most of the benefit is possible with a 
protocol-converting proxy, but there are additional non-trivial performance 
gains to be had if the database server speaks the right protocol.

While not the only use cases, I’d venture a guess these represent a sizable 
chunk of what Postgres is used for today, and will be used even more for, so 
the positive impact of a pluggable protocol would be significant.

--
Damir



Re: Extensibility of the PostgreSQL wire protocol

2021-02-19 Thread Damir Simunic


> On 19 Feb 2021, at 14:48, Heikki Linnakangas  wrote:
> 
> For example, there has been discussion elsewhere about integrating connection 
> pooling into the server itself. For that, you want to have a custom process 
> that listens for incoming connections, and launches backends independently of 
> the incoming connections. These hooks would not help with that.
> 

It is not clear how connection pooling in core is linked to the discussion of 
pluggable wire protocols. 

> Similarly, if you want to integrate a web server into the database server, 
> you probably also want some kind of connection pooling. A one-to-one 
> relationship between HTTP connections and backend processes doesn't seem nice.
> 

HTTP/2 is just a protocol, not unlike fe/be that has a one-to-one relationship 
to backend processes as it stands. It shuttles data back and forth in 
query/response exchanges, and happens to be used by web servers and web 
browsers, among other things. My mentioning of it was simply an example I can 
speak of from experience, as opposed to speculating. Could have brought up any 
other wire protocol if I had experience with it, say MQTT.

To make it clear, “a pluggable wire protocol” as discussed here is a set of 
rules that defines how data is transmitted: what the requests and responses 
are, and how is the data laid out on the wire, what to do in case of error, 
etc. Nothing to do with a web server; why would one want to integrate it in the 
database, anyway?

The intended contribution to the discussion of big picture of pluggable wire 
protocols is that there are significant use cases where the protocol choice is 
restricted on the client side, and allowing a pluggable wire protocol on the 
server side brings tangible benefits in performance and architectural 
simplification. That’s all. The rest were supporting facts that hopefully can 
also serve as a counterpoint to “pluggable wire protocol is primarily useful to 
make Postgres pretend to be MySQL.”

Protocol conversion HTTP/2<—>FE/BE on the connection pooler already brings a 
lot of the mentioned benefits, and I’m satisfied with it. Beyond that I’m 
simply supporting the idea of pluggable protocols as experience so far allows 
me to see advantages that might sound theoretical to someone who never tried 
this scenario in production.

Glad to offer a couple of examples where I see potential for performance gains 
for having such a wire protocol pluggable in the core. Let me know if you want 
me to elaborate.

> Querying multiple databases over a single connection is not possible with the 
> approach taken here. 

Indeed, querying multiple databases over a single connection is something you 
need a proxy for and a different client protocol from fe/be. No need to mix 
that with the talk about pluggable wire protocol. 

My mentioning of it was in the sense “a lot of LoB backend code is nothing more 
than a bloated protocol converter that happens to also allow connecting to 
multiple databases from a single client connection => letting the client speak 
to the database [through a proxy in this case] removed the bloated source of 
latency but kept the advantages.”

--
Damir





Re: Extensibility of the PostgreSQL wire protocol

2021-02-19 Thread Damir Simunic


> On 19 Feb 2021, at 19:30, Jan Wieck  wrote:
> 
> An "extended" libpq protocol could allow the pool to give clients a unique 
> ID. The protocol handler would then maintain maps with the SQL of prepared 
> statements and what the client thinks their prepared statement name is. 

Or, the connection pooler could support a different wire protocol that has some 
form of client cookies and could let the client hold on to an opaque token to 
present back with every query and use that to route to the right backend with a 
prepared statement for that client (or match the appropriate prepared statement 
from the cache), even across client disconnections.
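
A minimal sketch of the token idea above, assuming a pooler that keeps an in-memory map. The class `StatementRouter` and its methods are hypothetical names made up for illustration; they are not part of pgbouncer or any existing pooler.

```python
import secrets


class StatementRouter:
    """Hypothetical pooler-side map: opaque client token -> prepared statements."""

    def __init__(self):
        # token -> {statement name as the client knows it -> SQL text}
        self._by_token = {}

    def issue_token(self):
        # Opaque to the client; it only has to present it back with each query.
        token = secrets.token_urlsafe(16)
        self._by_token[token] = {}
        return token

    def remember(self, token, name, sql):
        self._by_token[token][name] = sql

    def lookup(self, token, name):
        # Returns None for an unknown token or an unknown statement name.
        return self._by_token.get(token, {}).get(name)


router = StatementRouter()
token = router.issue_token()
router.remember(token, "get_user", "SELECT * FROM users WHERE id = $1")

# A client that disconnected and came back presents the same token.
sql_after_reconnect = router.lookup(token, "get_user")
```

Because the map is keyed by the token rather than by the socket, the statement survives a client disconnect, which is the property described above.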

> Most of that would of course be possible on the pool side itself. But the 
> internal structure of pgbouncer isn't suitable for that. It is very 
> lightweight and for long SQL queries may never have the complete 'P' message 
> in memory. It would also not have direct access to security related 
> information like the search path, which would require extra round trips 
> between the pool and the backend to retrieve it.

> 
> So while not suitable to create a built in pool by itself, loadable wire 
> protocols can definitely help with connection pooling.

I think loadable wire protocols will have a positive effect on developing more 
sophisticated connection poolers.

> I also am not sure if building a connection pool into a background worker or 
> postmaster is a good idea to begin with. One of the important features of a 
> pool is to be able to suspend traffic and make the server completely idle to 
> for example be able to restart the postmaster without forcibly disconnecting 
> all clients.

Agreed. Going even further, a connection pooler supporting a protocol like quic 
(where the notion of connection is decoupled from the actual socket connection) 
could help a lot with balancing load between servers and data centers, which 
also would not be convenient for the actual Postgres to do with present 
architecture. (And here, too, a pluggable wire protocol would help with keeping 
tabs on individual backends).

--
Damir



Proposal: http2 wire format

2018-03-24 Thread Damir Simunic
Hello hackers,



I’d like to propose the implementation of new wire protocol using http2 
framing. 

It appears to me that http2 solves many of the issues on the TODO list under 
“Wire Protocol Changes / v4 Protocol,“ without any obvious downsides. 

The implementation I have in mind has zero impact on existing clients. No 
changes to the format of existing v3 protocol. The new protocol works through a 
few small additions to postmaster.c to intercept TLS requests, and the rest in 
new source files, linked through PQcommMethods.

I’d like to emphasize that this proposal is emphatically NOT about “let’s 
handle REST in the database” or some such. It’s about upgrading the framing, 
where http2 offers many benefits: content negotiation, concurrent bidirectional 
streams, extensible frame types, metadata/data split into headers/trailers and 
data frames, flow control, etc. It’s at least as efficient as febe v3. A lot of 
research is going into it to make it even more efficient and latency friendly. 
The mechanisms it provides for content negotiation (and, with ALPN, protocol 
negotiation) offer us a future-friendly way to evolve without the burden of 
backward compatibility compromises.

Before writing this proposal, I set out to create a proof of concept. My goal 
for the PoC is to be able to connect to the server using an existing http2 
client and get json back:

curl -k https://localhost:5432/some_func \
--http2-prior-knowledge --tlsv1.2 \
-H 'pg-database: postgres' \
-H 'pg-user: web' \
-H 'authorization: ….' \
-H 'accept: application/json'

{ result: [ … ] }

After spending a week getting up to speed with C, libpq internals, http2 
standard, libnghttp2 interface, etc., I’m fairly convinced that pg/http2 is 
feasible.

Sadly, my experience with C and Postgres internals is non-existent, and I am 
not yet able to finalize a live demo. The above curl request does establish the 
connection, receives the settings frame and queries the database, but I’m still 
struggling with writing code to return the http2 response. At this stage, it’s 
purely an issue of mechanically writing the code; I think I solved how it all 
works in principle.

If anyone finds the idea of Postgres speaking http2 appealing, I’d welcome 
guidance/mentoring/coding help (or just plain taking over). I put up a repo 
with the results so far and a longer writeup: https://github.com/dsimunic/pg_h2 

All changes I made to the codebase are in a single commit, hopefully easy to 
understand what is happening. You’ll need libnghttp2 and openssl 1.0.2 or newer 
to compile.

My hope is that this post leads to a conversation and gets a few people excited 
about the idea the way I am. Maybe even some of the GSoC students would take 
the implementation further?


Damir






Re: Proposal: http2 wire format

2018-03-25 Thread Damir Simunic
> On 25 Mar 2018, at 19:42, David Fetter  wrote:
> 
> On Sat, Mar 24, 2018 at 06:52:47PM +0100, Damir Simunic wrote:
>> Hello hackers,
>> 
>> I’d like to propose the implementation of new wire protocol using http2 
>> framing. 
> 
> Welcome to the PostgreSQL community!  This is a very interesting idea.
> Please send a patch to this mailing list on this thread.
> 

Thanks David, very excited to be part of pgsql-hackers!

> In order to get and keep it on the radar, you should know about how
> development works in PostgreSQL.
> 
> http://wiki.postgresql.org/wiki/Development_information
> 
> In particular, please look at: 
> http://wiki.postgresql.org/wiki/Submitting_a_Patch
> 

To put it out front: my forte is product design, not C coding. (Also, I made a 
grammar error in the opening sentence: I’m not proposing “the implementation”, 
but “implementing h2 as new wire proto”)

I did study all of the resources you mentioned. And am voraciously reading up 
on Postgres internals, scouring its source, practicing C development, etc. 

My email is the result of the first advice under “Brand new features” in “So 
you want to be a developer?”.

> I notice that you patched 10. New features, and this is definitely
> one, go against git master.
>  

Let me figure out how to do that pronto. 10.2 tarball was easier to learn from 
as it was not a moving target. Whatever I did so far is not yet patch-worthy.

>> It appears to me that http2 solves many of the issues on the TODO
>> list under “Wire Protocol Changes / v4 Protocol,“ without any
>> obvious downsides. 
> 
> Here are a few things to consider, at least from my perspective:
> 
> - Docs. Gotta have some: https://wiki.postgresql.org/wiki/Documentation_Tools

No worries about that—I love writing :)

> 
> - Testing. Gotta have some in src/test/regress in the source tree.

Before even getting to the patch stage, there will be a period of discussion 
about latency and other tradeoffs. Mandatory part of any conversation 
mentioning a wire protocol.

So the plan is to come up with a working prototype that we can plug into 
protocol testing tools and measure the heck out of it in context. Yet one more 
thing to figure out. BTW, are there any formal tests of that kind for v3 
protocol?

By that time I do hope to learn how to write code tests to put into 
src/test/regress.

> 
> - Tight coupling to OpenSSL, if that's actually what's happening.
>  We're actively trying to get away from this, so a TLS-neutral
>  implementation or at least one that's not specific to OpenSSL would
>  be good.

Didn’t know that. Will ifdef the openssl-dependent code. It’s not hard to 
implement ALPN nego to cover all viable libraries. Do you know what 
alternatives are being considered?

> 
> - Overhead for all clients. It may be tiny, but it needs to be
>  measured and that cost needs to be weighed against the benefits.
>  Maybe a cache miss in the context of a network connection is
>  negligible, but we do need to know.

Important point. If h2 is to be seriously considered, then it must be an 
improvement in absolutely every aspect. 

The core part of this proposal is that h2 is parallel to v3. Something one can 
opt into by compiling `--with_http2`. 

Even if h2 finds its way into PG12 already, it’s likely that the existing 
installed base would elect not to compile it in as there are no immediate 
benefits to them. The first wave of users will be web-facing apps. They already 
pay the penalty of conversion to/from v3, so in those scenarios the switch will 
be a gain.

Then again, if h2 becomes the new v4, then libpq-fe will support it, so we 
might find that the savings in one or two network round trips amply offset one 
byte socket peek, and everyone will eagerly upgrade. Who knows.

My PoC strategy is to touch existing code as little as possible. Yet if the 
ProcessStartupPacket can somehow return the consumed bytes back to the TLS lib 
for negotiation, then there’s zero cost to protocol detection for v2/v3 clients 
and only h2 clients pay the price of the extra check.

> 
> - Dependency on a new external library. Fortunately, it's MIT
>  licensed, so it's PostgreSQL compatible, but what happens if it
>  becomes unmaintained? This has happened a couple of times, and it
>  causes overhead that needs to be taken into account.

I chose nghttp because it gave me a quick start, it’s well designed, a good fit 
for this kind of work, and fortunately indeed, the license is compatible. 
(Also, curl links to it as well, so am pretty confident it’ll be around). Very 
possible that over time h2 parsing code migrates into pg codebase. There are so 
many similarities to v3 architecture, we might find a way to generalize both 
into a single codebase. Then h2 frame parser/state machine becomes only a 
handful of .c fil

Re: Proposal: http2 wire format

2018-03-26 Thread Damir Simunic
Hi,

> On 26 Mar 2018, at 05:11, Craig Ringer  wrote:
> 
> On 26 March 2018 at 06:00, Damir Simunic  wrote:
>  
> > - Overhead for all clients. It may be tiny, but it needs to be
> >  measured and that cost needs to be weighed against the benefits.
> >  Maybe a cache miss in the context of a network connection is
> >  negligible, but we do need to know.
> 
> Important point. If h2 is to be seriously considered, then it must be an 
> improvement in absolutely every aspect.
> 
> The core part of this proposal is that h2 is parallel to v3. Something one 
> can opt into by compiling `--with_http2`.
> 
> IMO, a new protocol intended to supersede an old one must be a core, 
> non-optional feature. It won't reach critical mass of adoption if people 
> can't reasonably rely on it being there. There'll still be a multi-year lead 
> time as versions that support it become widespread enough to interest 
> non-libpq-based driver authors.

Agreed, it should be in core.

>  
> My PoC strategy is to touch existing code as little as possible. Yet if the 
> ProcessStartupPacket can somehow return the consumed bytes back to the TLS 
> lib for negotiation, then there’s zero cost to protocol detection for v2/v3 
> clients and only h2 clients pay the price of the extra check.
> 
> As others have noted, you'll want to find a way to handle this in the least 
> SSL-implementation-specific manner possible. IMO if it can't work with 
> OpenSSL, Windows's SSL implementation and OS X's SSL framework it's a 
> non-starter.

Understood.

Everyone that matters supports ALPN: 
https://en.wikipedia.org/wiki/Application-Layer_Protocol_Negotiation#Support

From the PoC standpoint, it’s now a straightforward chore to make sure it is 
supported for all possible build choices.

> 
> > - Dependency on a new external library. Fortunately, it's MIT
> >  licensed, so it's PostgreSQL compatible, but what happens if it
> >  becomes unmaintained? This has happened a couple of times, and it
> >  causes overhead that needs to be taken into account.
> 
> I chose nghttp because it gave me a quick start, it’s well designed, a good 
> fit for this kind of work, and fortunately indeed, the license is compatible. 
> (Also, curl links to it as well, so am pretty confident it’ll be around). 
> Very possible that over time h2 parsing code migrates into pg codebase. There 
> are so much similarities to v3 architecture, we might find a way to 
> generalize both into a single codebase. Then h2 frame parser/state machine 
> becomes only a handful of .c files.
> 
> h2 is a standard; however you decide to parse it, your code will eventually 
> converge to a stable state in the same manner that febe v3 code did. Once we 
> master the protocol, I don’t think there’ll be much need to touch the framing 
> code. IOW even if we just import what we need, it won’t be a big issue.
> 
> While I'm a big fan of code reuse and using existing libraries, I understand 
> others' hesitance here. Look at what happened with ossp-uuid; that was 
> painful and it was just a contrib.
> 
> It's a difficult balance between NIH and maintaining a stable core.

Enough important projects depend on libnghttp, I don’t think it will go away 
any time soon. And http2 is big; as more and more tools want to talk that 
protocol they’ll turn to libnghttp, so the signs of any troubles will be 
visible very very quickly.

>   
>  
> 
> * Is there merit in the idea of a completely new v4 protocol—one that freezes 
> the v3 and takes a new path?
> 
> Likely so... but it has to be pretty compelling IMO. And more importantly, 
> offer a smooth backwards- and forwards-compatible path.
>  
> 
> * What are the criteria for getting this into the core?
> 
> Mine would be: 
> 
> - No new/separate port required. Works on existing port.
> 
Check.

> - Doesn't break old clients connecting to new servers
> 
Check.

> - Doesn't break new clients connecting to old servers
> 

Old server sends “Invalid startup packet” and closes the connection; client’s 
TLS layer reports an error. Does that count as not breaking new clients? 

curl -v https://localhost:5432

...
* OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to localhost:5432
* stopped the pause stream!
* Closing connection 0
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 
localhost:5432

This applies to any TLS client (an h2-supporting libpq-fe will behave the same):
 
wget -v https://localhost:5432

Connecting to localhost|::1|:5432... connected.
Unable to establish SSL connection.


> - No extra round trips for new client -> old server . I don't personally care 
> about o

Re: Proposal: http2 wire format

2018-03-26 Thread Damir Simunic
Hi,

> On 26 Mar 2018, at 06:47, Jacob Champion  wrote:
> 
> On Sun, Mar 25, 2018 at 8:11 PM, Craig Ringer  wrote:
>> As others have noted, you'll want to find a way to handle this in the least
>> SSL-implementation-specific manner possible. IMO if it can't work with
>> OpenSSL, Windows's SSL implementation and OS X's SSL framework it's a
>> non-starter.
> 
> +1.
> 
>> While I'm a big fan of code reuse and using existing libraries, I understand
>> others' hesitance here. Look at what happened with ossp-uuid; that was
>> painful and it was just a contrib.
>> 
>> It's a difficult balance between NIH and maintaining a stable core.
> 
> For whatever it's worth, I think libnghttp2 is an excellent choice for
> an HTTP/2 implementation, even when taking into account the risks of
> NIH. It's a well-designed library with mature clients (Curl and Apache
> HTTP Server, among others), and it's authored by an HTTP/2 expert. (If
> you're seriously considering HTTP/2, then you seriously need to avoid
> not-invented-here syndrome. Don't roll your own unless you're
> interested in becoming HTTP/2 protocol-layer security experts in
> addition to SQL security experts.)
> 
Agreed.

> As you move forward with the PoC, consider: even if you decide not to
> become protocol-layer experts, you'll still need to become familiar
> with application-layer security in HTTP.

Good point. Application layer security is indeed a concern. 

h2 has provisions for security by design, and a significant amount of research 
going into this on a large scale. Adopting h2 instead of inventing our own v4 
gets us all this research for free.


> You'll need to decide whether
> the HTTP browser/server security model -- which is notoriously
> unintuitive for many -- works well for Postgres. In particular, you'll
> want to make sure that the new protocol doesn't put your browser-based
> users in danger (I'm thinking primarily about cross-site request
> forgeries here). Always remember that one of a web browser's core use
> cases is the execution of untrusted code…

Mentioning h2 does bring browsers in mind, but this proposal is not concerned 
with that. (quick curl sketches are shown only because curl is an already 
available h2 client). Present web-facing designs already deal with browsers and 
API clients, there will be no change to that. Existing Postgres deployment and 
security practices must remain unchanged whether we use v3 or h2. Don’t think 
anyone would want to expose Postgres to the open web without a connection 
pooler in front of it.

When you say "browser/server model,” presumably you’re having http1 in mind. h2 
does not have much in common with http1 on the wire. In fact, h2 is 
architecturally closer to febe than http1. Both h2 and febe deal with multiple 
request/response pairs over a single connection. Server initiated requests are 
covered through push_promise frames, and logical replication (being more of a 
subscription thing in my mind) is covered through stream multiplexing.

Let's keep the discussion focused on the wire protocol: the sooner we can get 
to stable h2 framing in the core, the sooner we’ll be able to experiment with 
new use cases and possibilities. Only then it will make sense to bring back 
this discussion about browsers, content negotiation, etc.


Thanks,
Damir



> --Jacob




Re: Proposal: http2 wire format

2018-03-26 Thread Damir Simunic

> On 26 Mar 2018, at 11:34, Craig Ringer  wrote:
> 
> On 26 March 2018 at 17:01, Damir Simunic wrote:
>  
> 
> > - Doesn't break new clients connecting to old servers
> >
> 
> Old server sends “Invalid startup packet” and closes the connection; client’s 
> TLS layer reports an error. Does that count as not breaking new clients?
> 
> 
> libpq would have to do something like it does now for ssl connections, 
> falling back to non-ssl, and offering a connection option to make it try the 
> v3 protocol immediately without bothering with v4.
>  
> > - No extra round trips for new client -> old server . I don't personally 
> > care about old client -> new server so much, but should be able to offer a 
> > pg_hba.conf option to ensure v3 proto only or otherwise prevent extra round 
> > trips in this case too.
> 
> Can we talk about this more, please?
> 
> As above. A newer libpq should not perform worse on an existing server than 
> an older libpq.

Wouldn’t newer libpq continue to support v3 as long as supported servers do? 
I’m confused with “no extra round trips” part and the “pg_hba.conf option". If 
I know I’m talking to the old server, I’ll just configure the client to talk 
febe v3 and not worry.

Anyway, I’ll document all the combinations to make it easier to discuss.

>  
>  
> 
> Check.
> 
> Extensibility is the essence of h2, we’re getting this for free.
> 
> 
> Please elaborate somewhat for people not already strongly familiar with HTTP2.
> 
> BTW, please stop saying "h2" when you mean HTTP2. It's really confusing, 
> because I keep thinking you are talking about H2, the database engine 
> (http://www.h2database.com/), which has 
> PostgreSQL protocol and syntax compatibility as well as its own wire protocol.

Haha, I didn’t know that! “h2” is the protocol identifier in the ALPN; in my mind, 
http2 has more of the web and http1 baggage that I’m trying to avoid here. But 
let’s stick to http2 and define it better.

>  
> > - Has a wireshark dissector
> 
> Check.
> 
> ... including understanding of the PostgreSQL bits that are payload within 
> the protocol.
> 
> Look at what the current dissector does - capture some packets.
>  
> 
> >
> > - Is practical to implement in connection pooler proxies like pgbouncer, 
> > pgpool
> 
> Something I’m planning to look into and address.
> 
> New connection poolers might become feasible, too: nginx, nghttpx, etc. (for 
> non-web related scenarios as well). Opting into h2 lets us benefit from a 
> much larger amount of time and resources being spent on improving things that 
> matter. Reverse proxies face the same architectural challenges as pg-only 
> connection poolers do.
> 
> 
> ... which is nice, but doesn't change the fact that a protocol revision that 
> completely and unfixably breaks existing tools much of the community relies 
> on won't go far.
>  
> > - Any libraries used are widespread enough that they're present in at least 
> > RHEL7 and Debian Stable. We *can't* just bundle extras in our sources, and 
> > packagers are unlikely to be at all happy packaging an extra lib or 
> > backport for us. They'll probably just disable the new protocol.
> 
> Check.
> 
> Let me see if I can make a table showing parallel availability of Postgres 
> and libnghttp versions on mainstream platforms. If there are any gaps, I’m 
> sure it is possible to lobby for inclusion of libnghttp where it matters. I 
> see Debian has it for wheezy, jessie, and sid, while pg10 is on sid and 
> buster.
> 
> 
> Good plan. But be clear that this is super experimental.
>  
> >
> > - No regressions for support of SASL / SCRAM, GSSAPI, TLS with X.509 client 
> > certs, various other auth methods.
> >
> 
> Check.
> 
> Adding new auth method keyword (“h2”) in pg_hba will give us a clean code 
> path to work with.
> 
> I think you missed the point there entirely.
> 
> HTTP2 isn't an authentication method. It's a wire protocol. It will be 
> necessary to support authentication methods including, but not limited to, 
> GSSAPI, SSPI (windows), SCRAM, etc *on any new protocol*.
> 
> If you propose a new protocol, to replace the v3 protocol, and it doesn't 
> support SSPI or SCRAM I rate your chances as about zero of getting serious 
> interest. You'll be back in extension-for-webdevs town.
>  

Great points. I need to be more clear on that. My main concern was how to 
bypass the v3 auth negotiation that is closely linked to existing methods. From 
PoC perspective, I didn’t want 

Re: Proposal: http2 wire format

2018-03-26 Thread Damir Simunic
> On 26 Mar 2018, at 12:47, Craig Ringer  wrote:
> 
> On 26 March 2018 at 17:34, Damir Simunic  wrote:
>  
> 
> > As you move forward with the PoC, consider: even if you decide not to
> > become protocol-layer experts, you'll still need to become familiar
> > with application-layer security in HTTP.
> 
> Good point. Application layer security is indeed a concern.
> 
> h2 has provisions for security by design, and a significant amount of 
> research going into this on a large scale. Adopting h2 instead of inventing 
> our own v4 gets us all this research for free.
> 
> HTTP2, please, not "h2".
> 
> It looks HTTP2 does use the term "h2" to mean "http2 over TLS", to 
> differentiate it from "h2c" which is HTTP2-over-cleartext.
> 
> IMO, you'd have to support both. Mandating TLS is going to be a non-starter 
> for sites that use loopback connections or virtual switches on VMs, VLAN 
> isolation, or other features to render traffic largely unsniffable. They 
> won't want to pay the price for crypto on all traffic. So this needs to be 
> "HTTP2 support" not "HTTP2/TLS (h2) support" anyway.

Makes sense; I’ll update all wording and function names, etc. No difference to 
the substance of this proposal. The same code path handles both h2 and h2c. TLS 
is optional, a matter of detecting the first byte of the request and taking the 
appropriate action. 

I think we can reliably and efficiently detect h2, h2c, and FEBE requests. Of 
course, the behavior needs to be configurable: which protocols to enable, and 
how to resolve the negotiation. In my mind this is self-evident.
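
A rough sketch of the first-bytes detection described above. It assumes the server can peek at the first eight bytes of a new connection without consuming them. The function name `classify` and its return labels are invented for illustration. The constants are the TLS handshake record type, the HTTP/2 client connection preface, and the FE/BE SSLRequest and v3 startup codes.

```python
import struct

H2C_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"  # HTTP/2 client connection preface
TLS_HANDSHAKE = 0x16         # TLS record content type "handshake"
SSL_REQUEST_CODE = 80877103  # FE/BE SSLRequest code
PROTOCOL_V3 = 196608         # FE/BE protocol version 3.0 (3 << 16)


def classify(first_bytes):
    """Guess the protocol from the first bytes of a connection (hypothetical helper)."""
    if not first_bytes:
        return "unknown"
    if first_bytes[0] == TLS_HANDSHAKE:
        # Direct TLS; h2 versus something else is then settled by ALPN.
        return "tls"
    if first_bytes[:3] == H2C_PREFACE[:3]:
        # Cleartext HTTP/2 with prior knowledge starts with "PRI".
        return "h2c"
    if len(first_bytes) >= 8:
        # FE/BE startup packets begin with Int32 length, Int32 code.
        length, code = struct.unpack("!II", first_bytes[:8])
        if length == 8 and code == SSL_REQUEST_CODE:
            return "febe-sslrequest"
        if code == PROTOCOL_V3:
            return "febe-v3"
    return "unknown"


tls_hello = b"\x16\x03\x01\x02\x00\x01\x00\x01"
ssl_request = struct.pack("!II", 8, SSL_REQUEST_CODE)
v3_startup = struct.pack("!II", 41, PROTOCOL_V3) + b"user\x00web\x00\x00"
```

Since FE/BE startup packets are short, their length prefix begins with a zero byte, so the first byte alone already separates the three families in this sketch.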

> 
> Re Pg and security: By and large we don't invent our own security protocols. 
> We've adopted standard mechanisms like GSSAPI and SCRAM, and vendor ones like 
> SSPI. Some of the details of how they're implemented in the protocol are of 
> course protocol specific (and thus, opportunities for bugs/design mistakes), 
> of course.
> 
> But you will get _nowhere_ in making this a new default protocol if you just 
> try to treat those as outdated and uninteresting.
> 

Agreed: new default protocol must be covering 100% of existing use cases, _and_ 
add more compelling capabilities on top.

If anything I wrote made it appear contrary to that goal, it is purely because 
of my current focus on getting to a PoC. 

> In fact, part of extensibility considerations should be extensible 
> authentication.
> 
> Authentication and authorization (which any new protocol really should 
> separate) are crucial features, and there's no one-size-fits-all answer.
> 

I think that HTTP2 gets us much closer to that goal. My vision is to enable 
application-developer-defined authentication and/or authorization as well. This 
is something to research once the framing is in place.

> If you just assume, say, that everything happens over TLS with password auth 
> or x.509 client certs, you'll create a giant mess for all the sites that use 
> Kerberos or SSPI.
> 

100% agreed on everything you say, and thanks for taking the time to write this 
up. 

> 
> -- 
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services




Re: Proposal: http2 wire format

2018-03-26 Thread Damir Simunic
> On 26 Mar 2018, at 11:13, Vladimir Sitnikov  
> wrote:
> 
> Damir> * What are the criteria for getting this into the core?
> Craig>Mine would be: 
> 
> +1
> 
> There's a relevant list as well: 
> https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md
>  
> 
>  
> 

This is a great addition to the list, thanks!

Damir



Re: Proposal: http2 wire format

2018-03-26 Thread Damir Simunic
> On 26 Mar 2018, at 11:06, Vladimir Sitnikov  
> wrote:
> 
> Hi,
> 
> >If anyone finds the idea of Postgres speaking http2 appealing
> 
> HTTP/2 sounds interesting.
> What do you think of https://grpc.io/ ?
> 
> Have you evaluated it?
> It does sound like a ready RPC on top of HTTP/2 with support for lots of 
> languages.
> 
> The idea of reimplementing the protocol for multiple languages from scratch 
> does not sound too appealing.

This proposal takes the stance that having HTTP2 wire protocol in place will 
enable wide experimentation with and implementation of many new features and 
content types, but is not concerned with the specifics of those.

---
Let me illustrate with an example how it would look if we already had HTTP2 as 
proposed.

Let’s say you have a building automation device on your network that happens to 
speak grpc, and you decided to use Postgres to store published topics in the 
database. 

Your grpc-speaking device might connect to Postgres and issue a request like 
this:

HEADERS (flags = END_HEADERS)
:method = POST
:scheme = http
:path = /CreateTopic
pg-database = Publisher
content-type = application/grpc+proto
grpc-encoding = gzip
authorization = Bearer y235.wef315yfh138vh31hv93hv8h3v

DATA (flags = END_STREAM)


(This is from grpc.io homepage; uppercase HEADERS and DATA are frame names from 
the HTTP2 specification).

Postgres would take care of TLS negotiation, unpack the frames, decompress the 
headers (:method, :path, etc. are transferred compressed with a lookup table), 
copy the payload into memory, and make it all available to the backend. If 
this was the first request, it would start the backend for you as well.
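
To make the "unpack the frames" step concrete, here is a minimal Python sketch of decoding the fixed 9-octet header that precedes every HTTP/2 frame (24-bit length, 8-bit type, 8-bit flags, one reserved bit, 31-bit stream id). The names `FrameHeader` and `parse_frame_header` are made up for illustration; nothing here comes from Postgres or from an existing patch.

```python
import struct
from typing import NamedTuple

# Frame type codes and flag bits as defined by the HTTP/2 specification.
FRAME_TYPES = {0x0: "DATA", 0x1: "HEADERS"}
FLAG_END_STREAM = 0x1
FLAG_END_HEADERS = 0x4


class FrameHeader(NamedTuple):
    length: int      # payload length in octets (24 bits)
    type_name: str   # symbolic frame type
    flags: int       # raw flag bits
    stream_id: int   # 31-bit stream identifier


def parse_frame_header(buf: bytes) -> FrameHeader:
    """Decode the 9-octet header that precedes every HTTP/2 frame."""
    if len(buf) < 9:
        raise ValueError("need at least 9 octets for a frame header")
    # 1 + 2 octets of length, then type, flags, and a 4-octet stream field.
    len_hi, len_lo, ftype, flags, stream = struct.unpack("!BHBBI", buf[:9])
    length = (len_hi << 16) | len_lo
    stream_id = stream & 0x7FFFFFFF  # the top bit is reserved
    name = FRAME_TYPES.get(ftype, f"UNKNOWN(0x{ftype:x})")
    return FrameHeader(length, name, flags, stream_id)
```

A HEADERS frame with END_HEADERS set on stream 1, as in the request above, would start with the octets `00 00 NN 01 04 00 00 00 01`, where `NN` is the length of the compressed header block.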

Postgres doesn’t know about grpc, so it would just conveniently return “415 
Unsupported Media Type” to your client and close the stream (but not the 
connection). Still connected and authenticated, the device could retry the 
request with `content-type: application/json`, and if you somehow programmed a 
function that accepts json, the request would go through. (Let’s imagine we 
have some kind of mechanism to associate functions to requests and content 
types, maybe through some function attributes in the catalog). 
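
As a rough illustration of that imagined mechanism, the sketch below keeps a registry keyed by path and content type and rejects anything unregistered. The registry, the decorator, and the handler are all hypothetical; the rejection status is 415 Unsupported Media Type, which is what HTTP defines for a request body in a format the server does not handle.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry: (path, content type) -> handler function.
Handler = Callable[[bytes], bytes]
HANDLERS: Dict[Tuple[str, str], Handler] = {}


def register(path: str, content_type: str):
    """Decorator that associates a handler with a path and content type."""
    def wrap(fn: Handler) -> Handler:
        HANDLERS[(path, content_type)] = fn
        return fn
    return wrap


@register("/CreateTopic", "application/json")
def create_topic_json(body: bytes) -> bytes:
    # A real handler would call into the database; this one just echoes.
    return body


def dispatch(path: str, content_type: str, body: bytes) -> Tuple[int, bytes]:
    """Return (status, payload); reject content types nobody registered."""
    handler = HANDLERS.get((path, content_type))
    if handler is None:
        return 415, b""  # only this stream fails, the connection stays usable
    return 200, handler(body)
```

With only the JSON handler registered, a grpc request to `/CreateTopic` gets the rejection status and the JSON retry goes through.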

Say that someone else took the time and programmed a plugin that knows how to 
talk grpc. Then the server would call that plugin for you, validate and insert 
the data in the right table, and return 200 OK or 204 or whatever is 
appropriate to return according to grpc protocol semantics. 

Obviously, someone has to implement a bunch of new code on the server side to 
ungzip, to interpret the content of the protobuf message and take action. But 
that someone doesn’t need to think of getting to all the metadata like 
compression type, payload format etc. Just somehow plug into the server at the 
right level, read the data and metadata from memory, and then call into SPI to 
do its thing. Similar to how application servers work today. (Or Postgres for 
that matter, except that it speaks FEBE and there’s no content type 
negotiation).

The same goes for the ‘authorization’ header. Postgres does not support Bearer 
token authorization today. But maybe you’ll be able to define a function that 
knows how to deal with the token, and somehow signal to Postgres that you want 
it to call this function when it sees such a header. Or maybe someone wrote a 
plugin that does that, and you configure your server to use it. 

Then when connecting to Postgres with the above request, it would start the 
backend and call the function/plugin for you to decide whether to authorize the 
request. (As a side note, subsequent requests within the same connection would 
have this header compressed on the wire; that’s also a HTTP2 feature).
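
The "call this function when you see such a header" idea could look roughly like the following sketch. The hook registry, the `check_bearer` function, and the role name are assumptions for illustration only; Postgres has no such API today, and the token lookup is a stand-in for real verification.

```python
from typing import Callable, Dict, Optional

# Hypothetical registry mapping an auth scheme to a validation function.
# Each validator returns a role name, or None to refuse the request.
Validator = Callable[[str], Optional[str]]
AUTH_HOOKS: Dict[str, Validator] = {}


def check_bearer(token: str) -> Optional[str]:
    # Stand-in for real token verification (signature, expiry, ...).
    known = {"y235.wef315yfh138vh31hv93hv8h3v": "publisher"}
    return known.get(token)


AUTH_HOOKS["bearer"] = check_bearer


def authorize(headers: Dict[str, str]) -> Optional[str]:
    """Resolve the 'authorization' header to a role via the registered hook."""
    value = headers.get("authorization", "")
    scheme, _, credentials = value.partition(" ")
    hook = AUTH_HOOKS.get(scheme.lower())
    if hook is None or not credentials:
        return None  # unknown scheme or empty credentials: refuse
    return hook(credentials.strip())
```

A scheme with no registered hook is simply refused, which is the point at which a site using Kerberos or SSPI would plug in its own handler.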

---

That’s just one possible scenario among many. In this specific 
scenario, the benefit is that Postgres will give you content negotiation built 
in, and will talk to any HTTP2 conforming client. Like you said, you don’t want 
to reimplement the protocol over and over.

But whether that content is grpc or something else, that's for a future 
discussion. 

Current focus is really on getting the framing and extensibility in the core. 
Admittedly, haven’t yet figured out how to code all the details, but I’m more 
and more clear how this will work architecturally. Now it’s about putting lots 
of elbow grease into understanding the source, coding in C, and addressing all 
the issues that make sure the new protocol is 100% supporting all existing v3 
use cases. 

Beyond v3 use cases, top of my mind are improvements like you comment on in the 
topic “Binary transfer” in your “v4 wanted features” doc (and most of the other 
stuff you mention).


Damir


> 
> Vladimir




Re: Proposal: http2 wire format

2018-03-26 Thread Damir Simunic

> On 26 Mar 2018, at 16:56, Tom Lane  wrote:
> 
> Damir Simunic  writes:
>>> On 26 Mar 2018, at 11:06, Vladimir Sitnikov  
>>> wrote:
>>>> If anyone finds the idea of Postgres speaking http2 appealing
> 
> TBH, this sounds like a proposal to expend a whole lot of work (much of it
> outside the core server, and thus not under our control) in order to get
> from a state of affairs where there are things we'd like to do but can't
> because of protocol compatibility worries, to a different state of affairs
> where there are things we'd like to do but can't because of protocol
> compatibility worries.  

What do you mean by compatibility worries? Is it backward compatibility?

If so, I’m not suggesting we get rid of FEBE, but leave it as is and complement 
it with a widely understood and supported protocol, that in fact takes 
compatibility way more seriously than FEBE. Just leave v3 frozen. Seems like 
ultimate backward compatibility, no? Or am I missing something?

You likely know every possible use case for Postgres, which makes you believe 
that the status quo is the right way. Or maybe I didn’t flesh out my proposal 
enough for you to give it a chance. Either way, I just can’t figure out where 
HTTP2 would be the same as the status quo or a step backward compared to FEBE. I 
can see you’re super-busy and dedicated, but if you can find the time to 
enlighten me beyond just waving the “compatibility” and “engineering” banners, 
I’d appreciate you endlessly.

> Why would forcing our data into a protocol
> designed for a completely different purpose, and which we have no control
> over, be a step forward?  

What purpose do you see HTTP2 being designed for that is completely different 
from FEBE? Not being cynical, genuinely want to learn. (Oh, it’s my data, too; 
presently held hostage to the v3 protocol).

You mention twice loss of control--what exactly is the fear? 

> How would that address the fundamental issue of
> inertia in multiple chunks of software (ie, client libraries and
> applications as well as the server)?
> 

Is this inertia as in "our TODO list is years old and nobody’s doing anything 
about it"? If so, I posit here that using HTTP2 as the v4 protocol will lead to 
significant reduction of inertia. And that just because we’re talking HTTP2 and 
not some new obscure thing we invented.

The psychological and social aspects are not to be underestimated. 

>> This proposal takes the stance that having HTTP2 wire protocol in place will 
>> enable wide experimentation  with and implementation of many new features 
>> and content types, but is not concerned with the specifics of those.
> 
> That reads to me as pie in the sky, and uninformed by any engineering
> reality.  As an example, it's not the protocol's fault that database
> server processes are expensive to spin up; changing to a different
> protocol will do nothing to make them more lightweight.  We've thought
> about various ways to amortize that cost, but they tend to fall foul of
> the fact that sessions are associated with TCP connections, which we can't
> transparently remake or reattach to a different endpoint process.  HTTP2
> is not going to fix that, because it's still TCP based.  

That reads to me as uninformed engineering reality. Just because you are 
encumbered with the worries of compatibility and stuck in the world of TCP, 
doesn’t mean it can’t be done. 

You know what? HTTP2 just might fix it. Getting a new protocol into the core 
will force enough adjustments to the code to open the door for the next 
protocol on the horizon: QUIC, which happens to be UDP based, and might just be 
the ticket. At a minimum it will get significantly more people thinking about 
the possibility of reattaching sessions and doing all kinds of other things. 
Allowing multiple protocols is not very different from allowing a multitude of 
pl implementations.

Help me put HTTP2 in place, and I’ll bet you, within a few months someone will 
come up with a patch for QUIC. And then someone else will remember your 
paragraph above and say “hmm, let’s see…"

> I realize that
> webservers manage to have pretty lightweight sessions, but that's not a
> property of the protocol they use, it's a property of their internal
> architectures.  We can't get there without a massive rewrite of the PG
> server --- one that would be largely independent of any particular way of
> representing data on the wire, anyway.
> 

A smart outsider might come along, look at an ultra-fast web server, then look 
at Postgres and think, “Hmm, both speak HTTP2, but one is blazing fast, the 
other slow. Can I learn anything from the former to apply to the latter? Maybe 
I'll add another type of a backend that serves only a very very narrow use 
case, but makes it

Re: Proposal: http2 wire format

2018-03-26 Thread Damir Simunic

> On 26 Mar 2018, at 18:19, Vladimir Sitnikov  
> wrote:
> 
> Tom>But starting from the assumption that HTTP2 solves our problems seems to 
> me to be "Here's a hammer.
> 
> Agree.

Funny you agree with that. For someone with the experience of writing a driver 
and a long list of things that you find wrong and frustrating, one would 
expect you to look at how other protocols work, or at least consider that maybe 
the right way is to change something server side.

> 
> Just a side note: if v4 is ever invented I wish client language support
> is considered.
> It does take resources to implement message framing, and data parsing (e.g. 
> int, timestamp, struct, array, ...) for each language independently.

This is a strange statement about framing. Did you know that Go has HTTP2 
support in the standard library? And so does Java? 
https://github.com/http2/http2-spec/wiki/Implementations 


The part I hinted at in the example but did not get the message across is that 
I’m advocating the best possible client language support. The right way is to 
stop writing drivers and format the data server side. Why parse some obscure 
Postgres-specific binary data types when you can have the database send you the 
data in the serialization format of your client language? Or JSON or protobuf 
or whatever you want. What if your application has data patterns that would 
benefit from being sent over the wire in some specific columnar format? 
Wouldn’t it be cool if you could add that to the server and have all clients 
just work, without being locked in into a language because of its driver?
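
A minimal sketch of what "format the data server side" might mean in practice: the same result rows rendered as JSON or CSV depending on what the client asks for in its `accept` header. The encoder table and the `render` function are hypothetical, and the sketch deliberately ignores q-values and wildcard subtypes.

```python
import csv
import io
import json
from typing import Dict, Sequence, Tuple

Row = Dict[str, object]


def to_json(rows: Sequence[Row]) -> bytes:
    return json.dumps(list(rows)).encode("utf-8")


def to_csv(rows: Sequence[Row]) -> bytes:
    out = io.StringIO()
    if rows:
        writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
    return out.getvalue().encode("utf-8")


# Formats the server can produce; the first entry is the default for */*.
ENCODERS = [("application/json", to_json), ("text/csv", to_csv)]


def render(accept: str, rows: Sequence[Row]) -> Tuple[int, str, bytes]:
    """Pick the first media type in the Accept header that the server supports."""
    wanted = [part.split(";")[0].strip() for part in accept.split(",")]
    for media_type in wanted:
        for supported, encode in ENCODERS:
            if media_type in (supported, "*/*"):
                return 200, supported, encode(rows)
    return 406, "", b""  # nothing the client accepts can be produced
```

Adding a columnar or language-native format would then mean adding one entry to the encoder table, with no change on the client beyond the header it sends.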

My point is that you go in steps. Put the foot in the door first, enable 
experimentation and then you’ll get to where you want to be.


> 
> Vladimir 



Re: Proposal: http2 wire format

2018-03-26 Thread Damir Simunic

> On 26 Mar 2018, at 15:42, Alvaro Hernandez  wrote:
> 
> 
> 
> On 26/03/18 13:11, Damir Simunic wrote:
>>> On 26 Mar 2018, at 11:13, Vladimir Sitnikov <sitnikov.vladi...@gmail.com> wrote:
>>> 
>>> Damir> * What are the criteria for getting this into the core?
>>> Craig>Mine would be: 
>>> 
>>> +1
>>> 
>>> There's a relevant list as well: 
>>> https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md
>>> 
>> 
>> This is a great addition to the list, thanks!
>> 
>> Damir
>> 
> 
> Hi Damir.
> 
> I'm interested in the idea. However, way before writing a PoC, IMVHO I'd 
> rather write a detailed document including:
> 
> - A brief summary of the main features of HTTP2 and why it might be a good 
> fit for PG (of course there's a lot of doc in the wild about HTTP/2, so just 
> a summary of the main relevant features and an analysis of how it may fit 
> Postgres).
> 
> - A more or less thorough description of how every feature in current 
> PostgreSQL protocol would be implemented on HTTP/2.
> 
> - Similar to the above, but applied to the v4 TODO feature list.
> 
> - A section for connection poolers, as well as auth, as these are very important 
> topics.
> 
> 
> Hope this helps,
> 
> Álvaro
> 

Álvaro, it does help, thanks. This discussion is to inform such a document. But 
the topic is such that having a good PoC will move the discussion further much 
faster. 

Can you help me think about how HTTP2 would impact connection poolers? I 
don’t know much about those.
> -- 
> 
> Alvaro Hernandez
> 
> 
> ---
> OnGres



Re: Proposal: http2 wire format

2018-03-26 Thread Damir Simunic

> On 26 Mar 2018, at 18:09, Vladimir Sitnikov  
> wrote:
> 
> Damir>Postgres doesn’t know about grpc, s
> 
> I'm afraid you are missing the point.
> I would say PostgreSQL doesn't know about HTTP/2.
> It is the same as "PostgreSQL doesn't know about grpc".
> 
> Here's a quote from your pg_h2 repo:
> >What we need is to really build a request object and correctly extract
> > the full payload and parameters from the request. For example,
> >maybe we want to implement a QUERY method, similar to POST or PUT,
> > and pass the query text as the body of the request, with parameters
> > in the query string or in the headers
> 
> It basically suggests to implement own framing on top of HTTP/2.

Wouldn’t that be protocol semantics? Framing is already taken care of by the 
wire protocol.

> 
> When I say GRPC, I mean "implement PostgreSQL-specific protocol via GRPC 
> messages".
> 
> Let's take current message formats: 
> https://www.postgresql.org/docs/current/static/protocol-message-formats.html
> If one defines those message formats via GRPC, then GRPC would autogenerate 
> parsers and serializers for lots of languages "for free".
> 
> For instance
> Query (F)
>  Byte1('Q') Identifies the message as a simple query.
>  Int32 Length of message contents in bytes, including self.
>  String The query string itself.
> 
> can be defined via GRPC as
> message Query {
>   string queryText = 1;
> }
> 
> This is trivial to read, trivial to write, trivial to maintain, and it 
> automatically generates parsers/generators for lots of languages.
> 
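
For comparison, the v3 Query message quoted above is small enough to frame by hand. The sketch below follows the documented layout (type byte 'Q', Int32 length that counts itself but not the type byte, null-terminated string). The function names are made up, and UTF-8 is assumed as the client encoding.

```python
import struct


def encode_query(sql: str) -> bytes:
    """Frame a v3 simple Query message: 'Q', Int32 length, C string."""
    body = sql.encode("utf-8") + b"\x00"   # query string is null-terminated
    length = 4 + len(body)                 # length counts itself, not the 'Q'
    return b"Q" + struct.pack("!i", length) + body


def decode_query(msg: bytes) -> str:
    """Inverse of encode_query, with basic sanity checks."""
    if msg[:1] != b"Q":
        raise ValueError("not a Query message")
    (length,) = struct.unpack("!i", msg[1:5])
    body = msg[5:1 + length]
    if len(body) != length - 4 or not body.endswith(b"\x00"):
        raise ValueError("truncated or malformed Query message")
    return body[:-1].decode("utf-8")
```

This is the per-language work both sides of the argument want to avoid repeating; the disagreement is over which layer should take it away.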

I agree with you 100% here. But can you pull off grpc without HTTP2 framing in 
place? Would it be the only protocol supported? What if I wanted JSON or CSV 
returned, or just plain old Postgres v3 binary format, since I already have the 
parser written for it? Wouldn’t you need to first solve the problem of content 
negotiation?

The HTTP2 proposal is pragmatically a much smaller chunk, and it’s already hard to 
explain. Can you imagine the reaction and discussion if I came up with this?

In fact, ask yourself the question: “How can I do something about the 
status quo of the FEBE protocol that would be defensible in front of the Postgres 
community?” What would be your answer? 

> 
> Parsing of the current v3 protocol has to be reimplemented for each and every 
> language, and it would be pain to implement parsing for v4.
> Are you going to create "http/2" clients for Java, C#, Ruby, Swift, Dart, 
> etc, etc?
> 
> I am not saying that a mere redefinition of v3 messages as GRPC would do the 
> trick. I am saying that you'd better consider frameworks that would enable 
> transparent implementation of client libraries.
> 
> Damir>and will talk to any HTTP2 conforming client
> 
> I do not see where are you heading to.

Getting rid of having to write a framing parser in every client language?

> Is "curl as PostgreSQL client" one of the key objectives for you?

No, it’s just something that is available right now—the point is to demonstrate 
increased ability to get the data out, without having to write access code over 
and over, and then lug that whenever you install some data processing piece. 
Kind of the same motivation why you think grpc is it. I’m just proposing a 
layer under it that gets rid of a lot of pain.

> True clients (the ones that are used by the majority of applications) should 
> support things like "prepared statements", "data types", "cursors" (resultset 
> streaming), etc. I can hardly imagine a case when one would use "curl" and 
> operate with prepared statements.

Wouldn’t HTTP2 framing still allow prepared statements and cursors?

> I think psql is pretty good client, so I see no point in implementing HTTP/2 
> for a mere reason of using curl to fetch data from the DB.

> 
> Vladimir




Re: Proposal: http2 wire format

2018-03-26 Thread Damir Simunic
Hi Andres,

> 
> At least I do *NOT* want many protocols in core. We've a hard enough
> time to keep up with integrating patches and maintenance to not just
> willy nilly integrate multiple new features with unclear lifetimes.

Admire your effort in applying all these patches—this commitfest thing looks 
frenetic now that I’m subscribed to the mailing list. Can only guess the effort 
required on the part of a few of you to study and triage everything. Respect.

Actually, I don’t advocate multiple protocols in core. But the exercise of 
considering one will help generalize the architecture enough to make all 
protocols pluggable. 

The most interesting part for me is working out content negotiation—I think 
being able to package data in new ways will be super-interesting.

> 
> *NONE* of the interesting problems are solved by HTTP2. You *still*
> need a full blown protocol ontop of it. So no, this doesn't change that.

If you had to nominate only one of those problems, which one would you consider 
the most interesting?


Thanks for chiming in, really appreciate your time,
Damir





Re: Proposal: http2 wire format

2018-03-26 Thread Damir Simunic

> Currently it is implemented via different v3 messages (parse, bind, execute, 
> row description, data row, etc etc).
> 
> The claim is *any* implementation "on top of HTTP/2" would basically require 
> to implement those "parse, bind, execute, row data, etc" *messages*.

Why? Wouldn’t you be able to package that into a single request with query in 
the data frame and params as headers?
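
To show what such a single request might carry, here is a sketch that builds the header list and body for one query. The `/query` path and the `pg-param-N` header names are invented for illustration; `pg-database` is the header name used in the grpc example earlier in this thread. A real design would also have to decide how to encode parameter values that are not valid header text.

```python
from typing import List, Sequence, Tuple

Header = Tuple[str, str]


def build_query_request(database: str, sql: str,
                        params: Sequence[str]) -> Tuple[List[Header], bytes]:
    """Return (headers, body) for one request carrying query and parameters."""
    headers: List[Header] = [
        (":method", "POST"),
        (":scheme", "http"),
        (":path", "/query"),            # hypothetical endpoint
        ("pg-database", database),      # header name from the earlier example
        ("content-type", "application/sql"),
    ]
    # Hypothetical convention: positional parameters travel as numbered headers.
    for i, value in enumerate(params, start=1):
        headers.append((f"pg-param-{i}", value))
    # The query text goes into the DATA frame payload.
    return headers, sql.encode("utf-8")
```

Note that this only covers the one-shot case; reusing a prepared statement across requests would still need some agreed way to name it, which is the protocol-semantics question raised in the quoted text.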

> Say you pick to use "/parse" url with SQL text in body instead of "parse 
> message". It does not make the whole thing "just HTTP/2". It just means 
> you've created "your own protocol on top of HTTP/2”.

It is new functionality, isn’t it? Of course you have to evolve protocol 
semantics for that. That’s the whole point! HTTP2 is just a nice substrate that 
comes with the way to negotiate capabilities and can separate the metadata from 
payload. Nothing revolutionary, but it lets you move forward without hurting 
existing applications. Isn’t that an upgrade from v3?

> 
> Clients would have to know the sequence of valid messages,
> clients would have to know if SQL should be present in body or in URL or in 
> form post data, etc, etc.
> 
> I believe Andres means exactly the same thing as he says
> 
> By the way: you cannot just "load balance" "parse/bind/exec" to different 
> backends, so the load balancer should be aware of meaning of those 
> "parse/bind/exec" messages. I believe that is one of the requirements Craig 
> meant by "Is practical to implement in connection pooler proxies”.

Why can’t I package this into a single request? Don’t modern web proxies deal 
with session affinity and stuff like that?
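
Session affinity in its simplest form is just a deterministic mapping from a session identifier to a backend, so that follow-up requests land on the same server process. The sketch below hashes a session id to pick a backend; the function name and the backend names are hypothetical, and real proxies typically use cookies or consistent hashing so that the mapping survives changes to the backend list.

```python
import hashlib
from typing import Sequence


def pick_backend(session_id: str, backends: Sequence[str]) -> str:
    """Map a session id to a backend deterministically (simple hash affinity)."""
    if not backends:
        raise ValueError("no backends configured")
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]
```

As long as the backend list is unchanged, every request that carries the same session id is routed to the same backend, without the proxy understanding the payload.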

> 
> Andres>You *still* need a full blown protocol ontop of it. So no, this 
> doesn't change that
> 
> 
> Damir> Did you know that Go has HTTP2 support in the standard library? And so 
> does java, too?
> 
> Java has TCP implementation in the standard library.
> Does it help implementing v3 protocol?

It does. If Java only had IP, without TCP, would you be able to implement your 
driver? Yes, but you’d have to suffer longer.

> In the same way HTTP/2 "as a library" helps implementing v4. The problem is 
> it does not. Developer would have to somehow code the coding rules (e.g. 
> header names, body formats).
> HTTP/2 is just too low level.
> 

It’s just framing. But standard framing.

> 
> Damir>Why parse some obscure Postgres-specific binary data types when you can 
> have the database send you the data in the serialization format of your 
> client language?
> 
> From my own experience, automatic use of server-prepared statements (see 
> https://github.com/pgjdbc/pgjdbc/pull/319) did cut end-user response times 
> of our business application in half.
> That is clients would have to know the way to use prepared statements in 
> order to get decent performance.
> If you agree with that, then "v3 parse message", "v3 bind message", "v3 
> execute message" is not that different from "HTTP/2 POST to /parse", "HTTP/2 
> POST to /bind", "HTTP/2 POST to /execute". It is still "obscure 
> PostgreSQL-specific HTTP/2 calls”.

What about having that in one single request?

> 
> Even if you disagree (really?) you would still have to know 
> PostgreSQL-specific way to encode SQL text and "number of rows returned" and 
> "wire formats for the columns" even for a single "HTTP POST 
> /just/execute/sql" kind of API. Even that is "a full blown protocol ontop of 
> HTTP2" (c) Andres.

What does your business app do with the data?



> 
> Vladimir



Re: Proposal: http2 wire format

2018-03-27 Thread Damir Simunic
> 
> 
> I'm rapidly losing interest. Unless this goes back toward the concrete and 
> practical I think it's going nowhere.


Your message is exactly what I was hoping for. Thanks for your guidance and 
support, really appreciate you. 

Let me now get busy and earn your continued interest and support. 


Damir
> 
> -- 
>  Craig Ringer   http://www.2ndQuadrant.com/ 
> 
>  PostgreSQL Development, 24x7 Support, Training & Services