Re: Feature Proposal: Connection Pool Optimization - Change the Connection User

2022-01-06 Thread Todd Hubers
Hi Everyone,

I have started working on this:

   - Benchmarking - increasingly more comprehensive benchmarking
   - Prototyping - to simulate the change of users (toggling back and forth)
   - Draft Implementation - of OPTION-1 (New Protocol Message)
   - (Then: Working with Odyssey and PgBouncer to add support (when the
   GRANT role privilege is available))

I hope to have a patch ready by the end of March.

Regards,

Todd

On Wed, 24 Nov 2021 at 02:46, Todd Hubers  wrote:

>
> Hi Jacob and Daniel,
>
> Thanks for your feedback.
>
> >@Daniel - I think thats conflating session_user and current_user, SET
> ROLE is not a login event. This is by design and discussed in the
> documentation..
>
> Agreed, I am using those terms loosely. I have updated option 4 in the
> proposal document. I have crossed it out. Option 5 is more suitable "SET
> SESSION AUTHORIZATION" for further consideration.
>
> >@Daniel - but it's important to remember that we need to cover the
> functionality in terms of *tests* first, performance benchmarking is
> another concern.
>
> For implementation absolutely, but not for a basic feasibility prototype.
> A quick non-secure non-reliable prototype is probably an important
> first-step to confirming which options work best for the stated goals.
> Importantly, if the improvement is only 5% (whatever that might mean), then
> the project is probably not work starting. But I do expect that a benchmark
> will prove benefits that justify the resources to build the feature(s).
>
> >@Jacob - A more modern approach might be to attach the authentication to
> the packet itself (e.g. cryptographically, with a MAC), if the goal is to
> enable per-statement authentication anyway. In theory that turns the
> middleware into a message passer instead of a confusable deputy. But it
> requires more complicated setup between the client and server.
>
> I did consider this, but I ruled it out. I have now added it to the
> proposal document, and included two Issues. Please review and let me know
> whether I might be mistaken.
>
> >@Jacob - Having protocol-level tests for bytes on the wire would not only
> help proposals like this but also get coverage for a huge number of edge
> cases. Magnus has added src/test/protocol for the server, written in Perl,
> in his PROXY proposal. And I've added a protocol suite for both the client
> and server, written in Python/pytest, in my OAuth proof of concept. I think
> something is badly needed in this area.
>
> Thanks for highlighting this emerging work. I have noted this in the
> proposal in the Next Steps section.
>
> --Todd
>
> Note: Here is the proposal document link again -
> https://docs.google.com/document/d/1u6mVKEHfKtR80UrMLNYrp5D6cCSW1_arcTaZ9HcAKlw/edit#
>
> On Tue, 23 Nov 2021 at 12:12, Jacob Champion  wrote:
>
>> On Sat, 2021-11-20 at 16:16 -0500, Tom Lane wrote:
>> > One more point is that the proposed business about
>> >
>> > * ImpersonateDatabaseUser will either succeed silently (0-RTT), or
>> >   fail. Upon failure, no further commands will be processed until
>> >   ImpersonateDatabaseUser succeeds.
>> >
>> > seems to require adding a huge amount of complication on the server
>> side,
>> > and complication in the protocol spec itself, to save a rather minimal
>> > amount of complication in the middleware.  Why can't we just say that
>> > a failed "impersonate" command leaves the session in the same state
>> > as before, and it's up to the pooler to do something about it?  We are
>> > in any case trusting the pooler not to send commands from user A to
>> > a session logged in as user B.
>>
>> When combined with the 0-RTT goal, I think a silent ignore would just
>> invite more security problems. Todd is effectively proposing packet
>> pipelining, so the pipeline has to fail shut.
>>
>> A more modern approach might be to attach the authentication to the
>> packet itself (e.g. cryptographically, with a MAC), if the goal is to
>> enable per-statement authentication anyway. In theory that turns the
>> middleware into a message passer instead of a confusable deputy. But it
>> requires more complicated setup between the client and server.
>>
>> > PS: I wonder how we test such a feature meaningfully without
>> > incorporating a pooler right into the Postgres tree.  I don't
>> > want to do that, for sure.
>>
>> Having protocol-level tests for bytes on the wire would not only help
>> proposals like this but also get coverage for a huge number of edge
>> cases. Magnus has added src/test/protocol for the server, written in
>> Perl, in his PROXY proposal. And I've added a protocol suite for both
>> the client and server, written in Python/pytest, in my OAuth proof of
>> concept. I think something is badly needed in this area.
>>
>> --Jacob
>>
>
>
> --
> --
> Todd Hubers
>


-- 
--
Todd Hubers


Feature Proposal: Connection Pool Optimization - Change the Connection User

2021-11-20 Thread Todd Hubers
Hi,

I have just joined to start a community consultation process for a
proposal. I just finished the proposal document, I spent time writing a
Problem and Solution section, and I have done quite a bit of upfront
exploration of the code.

See:

   - Google Document with Commenting turned on
   
https://docs.google.com/document/d/1u6mVKEHfKtR80UrMLNYrp5D6cCSW1_arcTaZ9HcAKlw/edit?usp=sharing.
   Feel free to request Edit access.
   - The current PDF version is attached too for archive purposes

I am very keen to make this happen.

Thanks

-- 
--
Todd Hubers


PostgreSQL Change User feature_20211121.pdf
Description: Adobe PDF document


Re: Feature Proposal: Connection Pool Optimization - Change the Connection User

2021-11-20 Thread Todd Hubers
ameter values that apply to the target user, and getting rid of such
settings that applied to the original user ID?"

I think you are suggesting to modify the already-logged-in user. That's
interesting. However, *many systems audit the logged in user by username* -
who they are. Furthermore, the modification of user privileges would not
help to enable connection pooling as rehashed in my answer to [8].

*10) Tom said:* "How would this interact with the "on login" triggers that
people keep asking for?

That's a good point. I would imagine that SET ROLE (which is currently
unsuitable) would have the same requirement. The answer is *Shared
Functions*. SET ROLE calls a function like "*SetSessionUserId*". Our
implementation should call the same function(s). If OnLogin functionality
is implemented they should trigger from there.

*11) Tom said:* "Also, you'd still have to do DISCARD ALL (at least) when
switching users, or else integrate all that cache-flushing right into the
switching mechanism.  So *I'm not entirely convinced about* how big
the *performance
benefits *are compared to a fresh session."

Agreed, nor am I convinced. We should only be guided by benchmarks and
tests, not subjective assumptions. *See the Benchmarking section* in the
document for details.

*12 Tom said:* "...[Upon failure, no further commands will be processed
until ImpersonateDatabaseUser succeeds.] *seems to require adding a huge
amount of complication on the server side*, and complication in the
protocol spec itself, to save a rather minimal amount of complication in
the middleware.  Why can't we just say that a failed "impersonate" command
leaves the session in the same state as before, and it's up to the pooler
to do something about it?  We are in any case trusting the pooler not to
send commands from user A to a session logged in as user B. We are in any
case trusting the pooler not to send commands from user A to a session
logged in as user B."

*I think you are overstating the complexity*. It only requires a
LastUserSwitchFailed boolean which is cleared to false when a UserSwitch
succeeds. If LastUserSwitchFailed is true, tcop ignores the messages and
sends back errors. This detail has been added to the proposal document.

*It's important that the implementation is objectively faster*. The 0-RTT
design is proposed for efficiency. The Middleware might be able to fit BOTH
the UserSwitch AND the Query within a 1500 MTU. If not, it shouldn't wait
for a confirmation - for efficiency. The middleware might be on localhost,
or it might be 1-5ms away on the LAN. Effectively, the UserSwitch is a sort
of "Header" before a series of commands. Performance is the goal.
Therefore, the connection cannot be left in the same state as before, or
else the pending Query will run in the incorrect context. This is a rare
failure mode, so failure is ideal.

Ultimately these are assumptions and *benchmarking results should drive
decisions* around the implementation of every aspect.

*13. Tom said:* "I wonder how we test such a feature meaningfully *without
incorporating a pooler right into the Postgres tree*."

*We can benchmark without a pooler* - see the Benchmark section for details
*.* (Furthermore, I propose that general benchmark tooling does belong in
Postgres for the benefit of the ecosystem of connection poolers. I have
included such a remark in the Benchmarking section "PostgreSQL is not
planning to incorporate Connection Pooling...".)

Thanks again everyone for the tough questions and the ideas!

Regards,

Todd

On Sun, 21 Nov 2021 at 08:16, Tom Lane  wrote:

> Justin Pryzby  writes:
> > On Sun, Nov 21, 2021 at 03:11:03AM +1100, Todd Hubers wrote:
> >> - Google Document with Commenting turned on
> >>
> https://docs.google.com/document/d/1u6mVKEHfKtR80UrMLNYrp5D6cCSW1_arcTaZ9HcAKlw/edit?usp=sharing
> .
>
> > You proposed a PQ protocol version of SET ROLE/SET SESSION authorization.
> > You'd need to make sure that a client connected to the connection pooler
> cannot
> > itself run the PQ "set role".
>
> It's not really apparent to me how this mechanism wouldn't be a giant
> security hole.  In particular, I see no meaningful difference between
> granting the proposed "impersonate" privilege and granting superuser.
> You could restrict it to not allow impersonating superusers, which'd
> make it a little better, but what of all the predefined roles we keep
> inventing that have privileges we don't want to be accessible to Joe
> User?  I think by the time you got to something where "impersonate"
> wasn't a security hole, it would be pretty much equivalent to SET ROLE,
> ie you could only impersonate users you'd been specifically given the
> privlege for.
>
> It a

Re: [RFC] ASOF Join

2021-11-20 Thread Todd Hubers
t; > hash table indexed by the equi keys. The algorithm is as follows:
> >
> > a. For a given left row, advance the right table until right timestamp
> > > left timestamp.
> >
> > b. While we advance the right table, put each right hand row into the
> > hash table indexed by the equi keys. Overwrite the previous row with
> > the same keys, if there was any.
> >
> > c. We have finished advancing the right table. The hash table now
> > contains the most recent right hand row for every value of equi-keys.
> > Most recent because the right hand table is sorted by (timestamp).
> >
> > d. For the left row, look up a right row that matches it by the equi
> > keys in the hash table. This is the right hand row that matches the
> > ASOF join conditions (equi-keys are equal, left timestamp >= right
> > timestamp, right timestamp is maximal for the given equi-keys). Output
> > the result.
> >
> > e. Go to the next left row. The left table is also sorted on
> > (timestamp), so we won't need to rewind the right table, only to
> > advance it forward.
> >
> > Given the sorted input paths, this algorithm is linear time in size of
> > the tables. A drawback of this algorithm is that it requires memory
> > proportional to the cardinality of the equi-columns. A possible
> > optimization is to split the equi-key hash table into hot and cold
> > parts by LRU, and dump the cold part to disk. This would help if each
> > equi-key only occurs for a small period of time.
> >
> >
> > 4. Nested Loop
> >
> > An efficient nested loop plan has to have a fast right-side subplan,
> > such as an index lookup. Unfortunately, there seems to be no way to
> > efficiently perform a last-point lookup for given equi-keys, if we
> > have separate btree indexes on timestamp and equi-keys. The nested
> > loop plan could work if we have a (timestamp, equi-keys) btree index.
> >
> >
> > Prototype Implementation
> >
> > For a prototype, I'd go with #3 "merge-something with a hash table of
> > most recent rows for equi-keys", because it works for big tables and
> > can reuse the physical data ordering.
> >
> >
> > I'll be glad to hear your thoughts on this.
> >
> >
> > --
> > Alexander Kuzmenkov
> > Timescale
> >
>
>
>

-- 
--
Todd Hubers


Re: Feature Proposal: Connection Pool Optimization - Change the Connection User

2021-11-23 Thread Todd Hubers
Hi Jacob and Daniel,

Thanks for your feedback.

>@Daniel - I think thats conflating session_user and current_user, SET ROLE
is not a login event. This is by design and discussed in the documentation..

Agreed, I am using those terms loosely. I have updated option 4 in the
proposal document. I have crossed it out. Option 5 is more suitable "SET
SESSION AUTHORIZATION" for further consideration.

>@Daniel - but it's important to remember that we need to cover the
functionality in terms of *tests* first, performance benchmarking is
another concern.

For implementation absolutely, but not for a basic feasibility prototype. A
quick non-secure non-reliable prototype is probably an important first-step
to confirming which options work best for the stated goals. Importantly, if
the improvement is only 5% (whatever that might mean), then the project is
probably not work starting. But I do expect that a benchmark will prove
benefits that justify the resources to build the feature(s).

>@Jacob - A more modern approach might be to attach the authentication to
the packet itself (e.g. cryptographically, with a MAC), if the goal is to
enable per-statement authentication anyway. In theory that turns the
middleware into a message passer instead of a confusable deputy. But it
requires more complicated setup between the client and server.

I did consider this, but I ruled it out. I have now added it to the
proposal document, and included two Issues. Please review and let me know
whether I might be mistaken.

>@Jacob - Having protocol-level tests for bytes on the wire would not only
help proposals like this but also get coverage for a huge number of edge
cases. Magnus has added src/test/protocol for the server, written in Perl,
in his PROXY proposal. And I've added a protocol suite for both the client
and server, written in Python/pytest, in my OAuth proof of concept. I think
something is badly needed in this area.

Thanks for highlighting this emerging work. I have noted this in the
proposal in the Next Steps section.

--Todd

Note: Here is the proposal document link again -
https://docs.google.com/document/d/1u6mVKEHfKtR80UrMLNYrp5D6cCSW1_arcTaZ9HcAKlw/edit#

On Tue, 23 Nov 2021 at 12:12, Jacob Champion  wrote:

> On Sat, 2021-11-20 at 16:16 -0500, Tom Lane wrote:
> > One more point is that the proposed business about
> >
> > * ImpersonateDatabaseUser will either succeed silently (0-RTT), or
> >   fail. Upon failure, no further commands will be processed until
> >   ImpersonateDatabaseUser succeeds.
> >
> > seems to require adding a huge amount of complication on the server side,
> > and complication in the protocol spec itself, to save a rather minimal
> > amount of complication in the middleware.  Why can't we just say that
> > a failed "impersonate" command leaves the session in the same state
> > as before, and it's up to the pooler to do something about it?  We are
> > in any case trusting the pooler not to send commands from user A to
> > a session logged in as user B.
>
> When combined with the 0-RTT goal, I think a silent ignore would just
> invite more security problems. Todd is effectively proposing packet
> pipelining, so the pipeline has to fail shut.
>
> A more modern approach might be to attach the authentication to the
> packet itself (e.g. cryptographically, with a MAC), if the goal is to
> enable per-statement authentication anyway. In theory that turns the
> middleware into a message passer instead of a confusable deputy. But it
> requires more complicated setup between the client and server.
>
> > PS: I wonder how we test such a feature meaningfully without
> > incorporating a pooler right into the Postgres tree.  I don't
> > want to do that, for sure.
>
> Having protocol-level tests for bytes on the wire would not only help
> proposals like this but also get coverage for a huge number of edge
> cases. Magnus has added src/test/protocol for the server, written in
> Perl, in his PROXY proposal. And I've added a protocol suite for both
> the client and server, written in Python/pytest, in my OAuth proof of
> concept. I think something is badly needed in this area.
>
> --Jacob
>


-- 
--
Todd Hubers


Re: Feature Proposal: Connection Pool Optimization - Change the Connection User

2022-02-01 Thread Todd Hubers
Hi Everyone,

Benchmarking work has commenced, and is ongoing.

   - *OPTIONS 5/6/7* - `SET SESSION AUTHORIZATION` takes double the time of
   a single separate SimpleQuery. This is to be expected, because double the
   amount of SimpleQuery messages are being sent, and that requires a full
   SimpleQuery/Result/Ready cycle. If there is significant latency between a
   Connection Pooler and the database, this delay is amplified. It would be
   possible to concatenate text into a single SimpleQuery. In the real world,
   the performance impact MAY be negligible.
   - *OPTION 0* - The time to reconnect (start a new connection from
   scratch with a different username/password) was found to be faster than
   using `SET SESSION AUTHORIZATION`.
   - *OPTION 1* - My team is continuing to explore a distinct Impersonate
   message (Option-1). We are completing a prototype-quality implementation,
   and then benchmarking it. Given that Option-1 is asynchronous (Request and
   expect to succeed) and it can even be included within the same TCP packet
   as the SimpleQuery (at times), we expect the performance will be better
   than restarting a connection, and not impacted by links of higher latency.

I will be recording benchmark results in the document:
https://docs.google.com/document/d/1u6mVKEHfKtR80UrMLNYrp5D6cCSW1_arcTaZ9HcAKlw/edit#
after completion of the OPTION-1 prototype and benchmarking of that
prototype.

Note: In order to accommodate something like OPTION-8, an Impersonation
message might have a flag (valid for 1x SimpleQuery only, then
automatically restore back to the last user).

Regards,

Todd


On Fri, 7 Jan 2022 at 10:55, Todd Hubers  wrote:

> Hi Everyone,
>
> I have started working on this:
>
>- Benchmarking - increasingly more comprehensive benchmarking
>- Prototyping - to simulate the change of users (toggling back and
>forth)
>- Draft Implementation - of OPTION-1 (New Protocol Message)
>- (Then: Working with Odyssey and PgBouncer to add support (when the
>GRANT role privilege is available))
>
> I hope to have a patch ready by the end of March.
>
> Regards,
>
> Todd
>
> On Wed, 24 Nov 2021 at 02:46, Todd Hubers  wrote:
>
>>
>> Hi Jacob and Daniel,
>>
>> Thanks for your feedback.
>>
>> >@Daniel - I think thats conflating session_user and current_user, SET
>> ROLE is not a login event. This is by design and discussed in the
>> documentation..
>>
>> Agreed, I am using those terms loosely. I have updated option 4 in the
>> proposal document. I have crossed it out. Option 5 is more suitable "SET
>> SESSION AUTHORIZATION" for further consideration.
>>
>> >@Daniel - but it's important to remember that we need to cover the
>> functionality in terms of *tests* first, performance benchmarking is
>> another concern.
>>
>> For implementation absolutely, but not for a basic feasibility prototype.
>> A quick non-secure non-reliable prototype is probably an important
>> first-step to confirming which options work best for the stated goals.
>> Importantly, if the improvement is only 5% (whatever that might mean), then
>> the project is probably not work starting. But I do expect that a benchmark
>> will prove benefits that justify the resources to build the feature(s).
>>
>> >@Jacob - A more modern approach might be to attach the authentication to
>> the packet itself (e.g. cryptographically, with a MAC), if the goal is to
>> enable per-statement authentication anyway. In theory that turns the
>> middleware into a message passer instead of a confusable deputy. But it
>> requires more complicated setup between the client and server.
>>
>> I did consider this, but I ruled it out. I have now added it to the
>> proposal document, and included two Issues. Please review and let me know
>> whether I might be mistaken.
>>
>> >@Jacob - Having protocol-level tests for bytes on the wire would not
>> only help proposals like this but also get coverage for a huge number of
>> edge cases. Magnus has added src/test/protocol for the server, written in
>> Perl, in his PROXY proposal. And I've added a protocol suite for both the
>> client and server, written in Python/pytest, in my OAuth proof of concept.
>> I think something is badly needed in this area.
>>
>> Thanks for highlighting this emerging work. I have noted this in the
>> proposal in the Next Steps section.
>>
>> --Todd
>>
>> Note: Here is the proposal document link again -
>> https://docs.google.com/document/d/1u6mVKEHfKtR80UrMLNYrp5D6cCSW1_arcTaZ9HcAKlw/edit#
>>
>> On Tue, 23 Nov 2021 at 12:12, Jacob Champion 
>> wrote:
>>
>>> On Sat, 20

Re: Feature Proposal: Connection Pool Optimization - Change the Connection User

2022-06-22 Thread Todd Hubers
Hi Everyone,

Here is a progress update. I have an established team of 2 fulltime systems
programmers who have been working on this area for a couple of months now.

   - Past
   - *Impersonation* - a prototype has been completed for Option-1
  "Impersonation"
  - *Benchmarking* - has been completed on a range of options,
  including Option-0 "Reconnection".
  - Both Impersonation and the Benchmarking is currently on the
  backburner
   - Current:* Notification Concentrator* - This is not PostgreSQL Codebase
   work. This project makes NOTIFY/LISTEN work in Odyssey (and others) while
   in Transaction mode. (Until now, NOTIFY/LISTEN can only work in SESSION
   mode). We intend to also build patches for PgBouncer, and other popular
   Connection Pool systems.
   - It works in Session mode, but my organisation needs it to work in
  Transaction mode. It works by intercepting LISTEN/UNLISTEN in SQL and
  redirecting them to a single shared connection. There will be a Pub/Sub
  system within Odyssey. The LISTEN/UNLISTEN is only sent for the first
  subscriber or last unsubscriber accordingly. The NOTIFICATION
messages are
  then dispatched to the Subscriber list. At most only one SESSION
connection
  is required.
  - Next:
   - *Update Benchmarking:* I then expect to update Benchmarks with a range
  of prototype solutions, with both Impersonation and Notification
  Concentrator for final review.
  - *Publishing Benchmarking*: I will send our results here, and offer
  a patch for such benchmarking code.
  - *Final Implementation:* The team will finalise code for
  production-grade implementations, and tests
  - *Patches:* Then my team will submit a patch for PostgreSQL, Odyssey
  and others; working to polish anything else that might be required of us.

Todd

On Wed, 2 Feb 2022 at 10:56, Todd Hubers  wrote:

> Hi Everyone,
>
> Benchmarking work has commenced, and is ongoing.
>
>- *OPTIONS 5/6/7* - `SET SESSION AUTHORIZATION` takes double the time
>of a single separate SimpleQuery. This is to be expected, because double
>the amount of SimpleQuery messages are being sent, and that requires a full
>SimpleQuery/Result/Ready cycle. If there is significant latency between a
>Connection Pooler and the database, this delay is amplified. It would be
>possible to concatenate text into a single SimpleQuery. In the real world,
>the performance impact MAY be negligible.
>- *OPTION 0* - The time to reconnect (start a new connection from
>scratch with a different username/password) was found to be faster than
>using `SET SESSION AUTHORIZATION`.
>- *OPTION 1* - My team is continuing to explore a distinct Impersonate
>message (Option-1). We are completing a prototype-quality implementation,
>and then benchmarking it. Given that Option-1 is asynchronous (Request and
>expect to succeed) and it can even be included within the same TCP packet
>as the SimpleQuery (at times), we expect the performance will be better
>than restarting a connection, and not impacted by links of higher latency.
>
> I will be recording benchmark results in the document:
> https://docs.google.com/document/d/1u6mVKEHfKtR80UrMLNYrp5D6cCSW1_arcTaZ9HcAKlw/edit#
> after completion of the OPTION-1 prototype and benchmarking of that
> prototype.
>
> Note: In order to accommodate something like OPTION-8, an Impersonation
> message might have a flag (valid for 1x SimpleQuery only, then
> automatically restore back to the last user).
>
> Regards,
>
> Todd
>
>
> On Fri, 7 Jan 2022 at 10:55, Todd Hubers  wrote:
>
>> Hi Everyone,
>>
>> I have started working on this:
>>
>>- Benchmarking - increasingly more comprehensive benchmarking
>>- Prototyping - to simulate the change of users (toggling back and
>>forth)
>>- Draft Implementation - of OPTION-1 (New Protocol Message)
>>- (Then: Working with Odyssey and PgBouncer to add support (when the
>>GRANT role privilege is available))
>>
>> I hope to have a patch ready by the end of March.
>>
>> Regards,
>>
>> Todd
>>
>> On Wed, 24 Nov 2021 at 02:46, Todd Hubers  wrote:
>>
>>>
>>> Hi Jacob and Daniel,
>>>
>>> Thanks for your feedback.
>>>
>>> >@Daniel - I think thats conflating session_user and current_user, SET
>>> ROLE is not a login event. This is by design and discussed in the
>>> documentation..
>>>
>>> Agreed, I am using those terms loosely. I have updated option 4 in the
>>> proposal document. I have crossed it out. Option 5 is more suitable "SET
>>> SESSION AUTHORIZATION" for further consid