> On Fri, Oct 25, 2024 at 3:34 AM Tim Düsterhus <t...@bastelstu.be> wrote:
> Apologies, I wanted to chime in before the vote started, but I was too
> busy.

I appreciate that you took the time to respond at all, so thank you.

> Persistent handles / resources / objects violate PHP’s shared-nothing
> request model, which is one fundamental part of how PHP works from my
> mental model. That's why I generally believe that persisting data across
> requests is generally unsafe, especially when used with stateful
> connections like a database connection. HTTP is generally stateless and
> I generally trust curl to do the right thing, which makes this somewhat
> safer, nevertheless the RFC would still be exposing a configurable (and
> thus stateful) object to an unknown number of future requests.

I am a strong believer of PHP's shared-nothing request model as well!
That said, I think it's great that applications can opt-in to
persistence where it benefits them. For example, Etsy, a monolithic
PHP web application, makes heavy use of persistence with memcached
connections - our application performance would not be the same
without it. Our application also makes heavy use of "fan-out" by using
curl to issue parallel downstream requests to assemble and render a
page, and persistence here with the curl share interface would again
benefit the design.

For what it's worth, one thing I like about persistence in PHP is that
while it's not exactly shared-nothing, it is still only shared by a
single worker. This means that you don't have to think about locking,
etc. like you may have to with other languages, which is great!

> I'm especially concerned, because the documentation for
> `curl_share_init()` uses `CURL_LOCK_DATA_COOKIE` as the example. I would
> also assume that sharing a cookie jar amongst several requests is the
> primary use case for leveraging a curl share handle, given that curl
> already performs in-request connection reuse when reusing a single
> CurlHandle or when performing requests with a CurlMultiHandle.

I don't believe that sharing a cookie jar is the primary use case; it
certainly wouldn't be for us. Our primary use case is DNS and
connection sharing, for the aforementioned fanout.

> Accidentally sharing a cookie jar for unrelated requests due to a badly
> chosen `$persistent_id` sounds like a vulnerability to is bound to
> happen to someone.

I'll admit that I don't have a good response to this, since while I
agree this is possible, I don't think it outweighs the benefit that
persistence gives to the applications that need it (again, we won't be
using the cookie jar). I see you've noted concerns about
documentation, would this be something you'd at least feel more
comfortable with if curl persistence documentation included a strong
warning?

> Your RFC then lists "DNS lookups" as one of the possible things that
> would be sped up. Caching DNS lookups is already a solved problem:
> Install a caching resolver on your server. These commonly come with
> features such as pre-fetching, ensuring that commonly-performed lookups
> will stay cached at all times.

Interestingly, one of the reasons that I began this work was that we
found a measurable performance impact of talking to the caching
resolver on localhost, let alone the connection overhead. Essentially
nearly all of our requests will subsequently turn around and issue
multiple downstream requests, and each one to a separate host will
involve DNS lookups and establishing connections. When I looked into
mitigating this, I found the "share" interface, and here we are.

So I agree that a caching resolver can reduce the performance cost of
doing DNS lookups anew on each request, but as it does not completely
eliminate it, nor does it address the cost of connections, there's
still some performance gains left on the table.

> As for sharing a TLS or HTTP connection,
> this might or might not lead to vulnerabilities similarly to cookie
> sharing, when TLS client certificates are used for authentication.

I believe curl itself handles this - it won't re-use a connection if
the settings aren't the same.

> Lastly there does not appear to be a way to close a persistent share
> handle either implicitly or explicitly, making it hard to reset the
> share handle to a well-defined state / to know what state the share
> handle is in.

I had originally asked this in the "open questions" of the RFC, but as
I received no response, I figured that we could address that at the
pull-request level; apologies if that was presumptuous. Máté Kocsis
asked this very question in the pull request and I expected to
re-raise it there. We do already have a curl_share_close method, so
the RFC would not need to add a new method, and since the method is
called curl_share_close, it's not really changing the meaning of the
method either.

> Also badly chosen `$persistent_id`s might result in a
> large number of handles accumulating, without any kind of garbage
> collection. For the existing persistent handles (e.g. for database
> connections), the ID is chosen by PHP itself, ensuring a somewhat
> predictable behavior.

I agree that this may be a risk, but it's not clear to me that this is
a risk that people would realistically run into, considering you'd
want to have a small number of persistent IDs in order to benefit from
re-using them. Dynamically picking your persistent IDs would lead to
less re-use.

> All in all, this RFC sounds like a foot-gun just waiting to go off and
> the RFC text is little more than a stub which does not inspire
> confidence that all the possible consequences have been fully
> thought-through.

I, of course, disagree that this is a "foot-gun just waiting to go
off". I would maybe agree if persistence was the default behavior, but
that is not what I am proposing - a developer must opt-in. From a
cursory GitHub search, curl_share_init is very rare, and not something
that developers are reaching for. Frankly, I see very little reason to
use it without persistence, considering that curl_multi_init does the
same thing, and is simpler to use.

Regarding the RFC text, I apologize for not filling it out more, but
it seemed straightforward to me: curl supports the share interface as
a primary feature, and there are no glaring warnings in the
documentation (https://curl.se/libcurl/c/libcurl-share.html)
indicating that you might hold the share interface incorrectly. As I
noted, I believe that TLS reuse is safe per the library itself. The
only thing new to PHP here is the persistence aspect, which I noted
would impact the SAPI by increasing memory usage "proportional to the
number of persistent curl share handles."

> After all the pros and cons would also need to be
> documented in PHP’s documentation and the RFC text would be a good
> source for the documentation team.

My plan was to help add the documentation (and tests) after this
passed; it was unclear to me that documentation may need to be sorted
out up-front in the RFC process. That said, I believe I can summarize
the remaining cons from your email:

- A developer could reuse a cookie jar with cookies from user A in a
subsequent request from user B to origin Z if they opted in to
persistence, used CURL_LOCK_DATA_COOKIE, and used the same persistent
handle for both users.
- A developer could balloon memory usage by using dynamic persistent
IDs, especially so if curl_share_close doesn't close them.

If I were to add notes about these both to the documentation, and
curl_share_close closed persistent share handles, would you change
your vote for this or a future RFC?

If not, and you're fundamentally opposed to persistent handles, I can
only urge you to reconsider based on the impact this would have on
applications that benefit from persistence. I'd like to share this one
piece of feedback from Matthieu Napoli
(https://github.com/php/php-src/pull/15603#issuecomment-2348583997):

> [...] I think this could have HUGE benefits to applications out there.
>
> Switching from PHP-FPM to a runtime like FrankenPHP, Laravel Octane 
> (Roadrunner/Swoole), or similar, usually gives the following benefits:
>
> - no longer need to boot the framework on every request (saves a significant 
> amount of time)
> - but also no longer need to open new HTTPS connections on every request
>
> The second part is much less talked about, but very important in terms of 
> performance. Opening an HTTPS request can sometimes take 100ms or more!
>
> I believe such a feature could be a performance game changer for the PHP 
> ecosystem, without having to rewrite apps so that they work in long-lived 
> processes (assuming libraries like Guzzle, etc. take advantage of it ofc).

At Etsy and likely other organizations, it would be infeasible to
switch to these frameworks as the effort required would be herculean.
Persistence in PHP allows applications like ours to benefit from
avoiding duplicative work, while still maintaining the fundamentals of
the shared-nothing runtime we're familiar with.

Thanks again for your response,
Eric

Reply via email to