Re: [PHP-DEV] PHP True Async RFC

Larry Garfield Wed, 05 Mar 2025 10:41:58 -0800

On Wed, Mar 5, 2025, at 3:37 AM, Edmond Dantes wrote:
> Good day, Larry.
>
>> First off, as others have said, thank you for a thorough and detailed 
>> proposal.
> Thanks!
>
>> * A series of free-standing functions.
>> * That only work if the scheduler is active.
>> * The scheduler being active is a run-once global flag.
>> * So code that uses those functions is only useful based on a global state 
>> not present in that function.
>> * And a host of other seemingly low-level objects that have a myriad of 
>> methods on them that do, um, stuff.
>> * Oh, and a lot of static methods, too, instead of free-standing functions.
>
> Suppose these shortcomings don’t exist, and we have implemented the 
> boldest scenario imaginable. We introduce Structured Concurrency, 
> remove low-level elements, and possibly even get rid of `Future`. Of 
> course, there are no functions like `startScheduler` or anything like 
> that.
>
>  1. In this case, how should PHP handle `Fiber` and all the behavior 
> associated with it? Should `Fiber` be declared deprecated and removed 
> from the language? What should the flow be?


I'm not sure yet.  I was quite hesitant about Fibers when they went in because 
they were so low-level, but the authors were confident that it was enough for a 
user-space toolchain to be iterated on quickly that everyone could use.  That 
clearly didn't pan out as intended (Revolt exists, but usage of it is still 
rare), so here we are with a half-finished API.

Thinking aloud, perhaps we could cause `new Fiber` to create an automatic async 
block?  Or we do deprecate it and discourage its use.  Something to think 
through, certainly.

>  2. What should be done with I/O functions? Should they remain 
> blocking, with a separate API provided as an extension?

The fact that IO functions become transparently async when appropriate is the 
best part of the current RFC.  Please keep that. :-)

>  3. Would it be possible to convince the maintainers of XDEBUG and 
> other extensions to rewrite their code to support the new model? ( *If 
> you're reading this question now, please share your opinion.* )

I cannot speak for Derick.

>  4. If transparent concurrency is introduced for I/O in point 2, what 
> should be done with `Revolt` + `AMPHP`? This would break their code. 
> Should an additional function or option be introduced to switch PHP 
> into "legacy mode"?

Also an excellent question, to which I do not yet have an answer.  (See 
previous point about Fibers being half-complete.)  I would want to involve 
Aaron, Christian, and Cees-Jan before trying to make any suggestions here.


> Structured concurrency is a great thing. However, I’d like to avoid 
> changing the language syntax and make something closer to Go’s 
> semantics. I’ll think about it and add this idea to my TODO.

Well, as noted in the article, structured concurrency done right means *not* 
having unstructured concurrency.  Having Go-style async and then building a 
structured nursery system on top of it means you cannot have any of the 
guarantees of the structured approach, because the other one is still poking 
out the side and leaking.  We're already stuck with mutable-by-default, global 
variables, and other things that prevent us from making helpful assumptions.  
Please, let's try to avoid that for async.  We don't need more gotos.

>> async $context {
>> // $context is an object of AsyncContext, and can be passed around as such.
>> // It is the *only* way to spawn anything async, or interact with the async 
>> // controls.
>> // If a function doesn't take an AsyncContext param, it cannot control 
>> // async.  This is good.
>
> This is a very elegant solution. Theoretically.
>
> However, in practice, if you require explicitly passing the context to 
> all functions, it leads to the following consequences:
>
>  1. The semantics of all functions increase by one additional parameter 
> (*Signature bloat*).

No, just those functions/objects that necessarily involve running async control 
commands.  Most wouldn't.  They would just silently context switch when they 
hit an IO operation (which as noted above is transparently supported, which is 
what makes this work) and otherwise behave the same.

But if something does actively need to do async stuff, it should have a context 
to work within.  It's the same discussion as:

A: "Pass/inject a DB connection to a class that needs it, don't just call a 
global db() function."
B: "But then I have to pass it to all these places explicitly!"
A: "That's a sign your SQL is too scattered around the code base. Fix that 
first and your problem goes away."

Explicit flow control is how you avoid bugs.  It's also self-documenting, as 
it's patently obvious what code expects to run in an async context and which 
doesn't care.
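
As a rough illustration of that split (a sketch only: `AsyncContext` here is a hypothetical interface, and `InlineContext`, `loadTitle()`, and `loadTitles()` are made-up names, none of them part of the RFC), only the function that actually drives subtasks has the context in its signature:

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the proposed context object; not a real PHP API.
interface AsyncContext
{
    public function run(callable $task): mixed;
}

// Toy implementation that runs each task inline, just so the sketch executes.
final class InlineContext implements AsyncContext
{
    public function run(callable $task): mixed
    {
        return $task();
    }
}

// No context parameter: this function cannot spawn or control tasks.
// Under the RFC's model, any IO it did could still yield transparently.
function loadTitle(int $id): string
{
    return "Title #$id";
}

// Takes a context: the signature documents that it drives async work.
function loadTitles(AsyncContext $ctx, array $ids): array
{
    $titles = [];
    foreach ($ids as $id) {
        $titles[$id] = $ctx->run(fn(): string => loadTitle($id));
    }
    return $titles;
}

$titles = loadTitles(new InlineContext(), [1, 2]);
```

Only `loadTitles()` changed shape; `loadTitle()` and everything it calls stay exactly as they were.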

>  2. If an asynchronous call needs to be added to a function, and other 
> functions depend on it, then the semantics of all dependent functions 
> must be changed as well. 

This is no different than DI of any other service.  I have restructured code to 
handle temporary contexts before.  (My AttributeUtils and Serde libraries.)  
The result was... much better code than I had before.  I'm glad I made those 
refactors.

> In this example, there is another aspect: the fact that async execution 
> is explicitly limited to a specific scope. This is essentially the same 
> as `startScheduler`, and it is one of the options I was considering.
>
> Of course, `startScheduler` can be replaced with a construction like 
> `async(function() { ... })`.
> This means that async execution is only active within the closure, and 
> coroutines can only be created inside that closure.
>
> This is one of the semantic solutions that allows removing 
> `startScheduler`, but at the implementation level, it is exactly the 
> same.
>
> What do you think about this?

That looks mostly like the async block syntax I proposed, spelled differently.  
The main difference is that the body of the wrapped function would need to 
explicitly `use` any variables from scope that it wanted, rather than getting 
them implicitly.  Whether that's good or bad is probably subjective.

But it would allow for a syntax like this for the context, which is quite 
similar to how database transactions are often done:

$val = async(function(AsyncContext $ctx) use ($stuff, $fn) {
  $result = [];
  foreach ($stuff as $item) {
    $result[] = $ctx->run($fn);
  }

  // We block/wait here until all subtasks are complete, then the async() call
  // returns this value.
  return $result;
});

And of course in both cases you could use a pre-defined callable instead of 
inlining one.  At this point I think it's mostly a stylistic difference, 
function vs block.
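
For what it's worth, the "wait for all subtasks before returning" behavior can be approximated in user space today with Fibers. This is a toy sketch under loud assumptions: the `AsyncContext` class, its `run()` and `drain()` methods, and the `async()` function are all invented here, and `Fiber::suspend()` merely stands in for a transparent IO wait. It is not how the RFC implements anything.

```php
<?php
declare(strict_types=1);

// Hypothetical context; the name follows the example above, the API is invented.
final class AsyncContext
{
    /** @var Fiber[] */
    private array $fibers = [];

    // Start a subtask right away; it runs until it finishes or suspends.
    public function run(callable $task): Fiber
    {
        $fiber = new Fiber($task);
        $fiber->start();
        $this->fibers[] = $fiber;
        return $fiber;
    }

    // Resume suspended subtasks round-robin until every one has terminated.
    public function drain(): void
    {
        do {
            $pending = false;
            foreach ($this->fibers as $fiber) {
                if ($fiber->isSuspended()) {
                    $fiber->resume();
                }
                $pending = $pending || !$fiber->isTerminated();
            }
        } while ($pending);
    }
}

// The scope boundary: nothing spawned inside can outlive this call.
function async(callable $body): mixed
{
    $ctx = new AsyncContext();
    $result = $body($ctx);
    $ctx->drain();
    return $result;
}

$log = [];
$val = async(function (AsyncContext $ctx) use (&$log) {
    $tasks = [];
    foreach (['a', 'b'] as $name) {
        $tasks[] = $ctx->run(function () use ($name, &$log) {
            $log[] = "$name:start";
            Fiber::suspend();        // stands in for a transparent IO wait
            $log[] = "$name:end";
            return strtoupper($name);
        });
    }
    return $tasks;
});

// Safe to read results here: async() only returned after every task finished.
$results = array_map(fn(Fiber $f) => $f->getReturn(), $val);
```

The point of the sketch is the shape, not the scheduler: the only way to start a task is through `$ctx`, and `async()` does not return while any task is still pending.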

>> I'm not convinced that sticking arbitrary key/value pairs into the Context 
>> object is wise;
>
> Why not? 
>
>> that's global state by another name
>
>   Static variables inside a function are also global state. Are you 
> against static variables?

Vocally, in fact. :-)

>> But if we must, the above would handle all the inheritance and override 
>> stuff quite naturally. Possibly with:
>
>  How will a context with open string keys help preserve service data 
> that the service doesn't want to expose to anyone? The `Key()` solution 
> is essentially the same as `Symbol` in JS, which is used for the same 
> purpose. Of course, we could add a `coroutine static $var` construct to 
> the language syntax. But it's all the same, just syntactic sugar that 
> would require more code to support. 

I cannot speak to JS Symbols as I haven't used them.  I am just vehemently 
opposed to globals, no matter how many layers they're wrapped in. :-)  Most 
uses could be replaced by proper DI or partial application.
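
A small sketch of what "partial application instead of a context bag" could look like. The request-ID scenario and the names `logLine()`, `withRequestId()`, and `handle()` are hypothetical, picked only for illustration:

```php
<?php
declare(strict_types=1);

// Hypothetical example: a request ID that might otherwise live in a context bag.
function logLine(string $requestId, string $message): string
{
    return "[$requestId] $message";
}

// Partial application: bind the request ID once, hand out the narrower callable.
function withRequestId(string $requestId): Closure
{
    return fn(string $message): string => logLine($requestId, $message);
}

// The task receives exactly what it needs as an argument; nothing is looked up
// from ambient state.
function handle(callable $log): string
{
    return $log('handled');
}

$line = handle(withRequestId('req-42'));
```

The value still travels with the task, but only code that was explicitly handed the callable can see it.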

>> [$in, $out] = Channel::create($buffer_size);
>
> These semantics require the programmer to remember that two variables 
> actually point to the same object. If a function has multiple channels, 
> this makes the code quite verbose. Additionally, such channels are 
> inconvenient to store in lists because their structure becomes more 
> complex.
>
> I would suggest a slightly different solution:
>
> <code php>
> $in = new Channel()->getProducer();
> async myFunction($in->getConsumer());
> </code>
>
> These semantics do not restrict the programmer in usage patterns while 
> still allowing interaction with the channel through a well-defined 
> contract.

I'd go slightly differently if you wanted to go that route:

$ch = new Channel($buffer_size);
$in = $ch->producer();
$out = $ch->consumer();

// You do most interaction with $in and $out.

I could probably work with that as well.

(Or even just $ch->inPipe and $ch->outPipe, now that we have nice property 
support.)

But the overall point, I think, is avoiding implicit modal logic.  If my code 
doesn't need to care if it's in an async world, it doesn't care.  If it does, 
then I need an explicit async world to work within, rather than relying on one 
implicitly existing, I hope.  And I shouldn't have to think about "who owns 
this end of this channel".  I just have an in and out hose I stick stuff into 
and pull out from, kthxbye.
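
A minimal sketch of the "two hoses" shape, assuming hypothetical `Channel`, `Producer`, and `Consumer` classes (none of these exist in PHP or in the RFC). To stay runnable without a scheduler, `send()` and `receive()` return `false`/`null` where a real channel would suspend the caller:

```php
<?php
declare(strict_types=1);

// Hypothetical sketch; none of these classes exist in PHP or the RFC.
final class Producer
{
    public function __construct(private SplQueue $queue, private int $capacity) {}

    // Returns false when the buffer is full; a real channel would suspend instead.
    public function send(mixed $value): bool
    {
        if (count($this->queue) >= $this->capacity) {
            return false;
        }
        $this->queue->enqueue($value);
        return true;
    }
}

final class Consumer
{
    public function __construct(private SplQueue $queue) {}

    // Returns null when empty; a real channel would suspend until data arrives.
    public function receive(): mixed
    {
        return $this->queue->isEmpty() ? null : $this->queue->dequeue();
    }
}

final class Channel
{
    // Write-only and read-only views over one shared buffer.
    public readonly Producer $inPipe;
    public readonly Consumer $outPipe;

    public function __construct(int $bufferSize)
    {
        $queue = new SplQueue();
        $this->inPipe = new Producer($queue, $bufferSize);
        $this->outPipe = new Consumer($queue);
    }
}

$ch = new Channel(2);
$ch->inPipe->send('a');
$ch->inPipe->send('b');
$accepted = $ch->inPipe->send('c');   // buffer is full, so this is rejected
$first = $ch->outPipe->receive();
```

A function that is handed only `$ch->outPipe` can read but has no way to write, which is the contract being discussed.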

> Thanks for the great examples, and a special thanks for the article.
> I also like the definition of context.
>
> Ed

--Larry Garfield
