On Sat, Mar 1, 2025, at 10:11, Edmond Dantes wrote:
> Good day, everyone. I hope you're doing well.
>
> I’d like to introduce a draft version of the RFC for the True Async component.
>
> https://wiki.php.net/rfc/true_async
>
> I believe this version is not perfect and requires analysis. And I strongly
> believe that things like this shouldn't be developed in isolation. So, if you
> think any important (or even minor) aspects have been overlooked, please
> bring them to attention.
>
> The draft status also highlights the fact that it includes doubts about the
> implementation and criticism. The main global issue I see is the lack of
> "future experience" regarding how this API will be used—another reason to
> bring it up for public discussion.
>
> Wishing you all a great day, and thank you for your feedback!
>
Hey Edmond:
I find this feature quite exciting! I've got some feedback so far, though most
of it is for clarification or potential optimizations:
> A PHP developer *SHOULD NOT* make any assumptions about the order in which
> Fibers will be executed, as this order may change or be too complex to
> predict.
There should be a defined ordering (or at least, some guarantees). Being able
to understand what things run in what order can help with understanding a
complex system. Even if it is just a vague notion (user tasks are processed
before events, or vice versa), it would still give developers more confidence
in the code they write. You actually describe part of the order later
(microtasks happen before fibers/events), so this sentence doesn't quite hold
as written.
Personally, I think an async task should run like an ordinary function call
until it hits its first suspension point. This is mostly an optimization (C#
does this), but it would avoid the overhead of queueing a function that may
never suspend (which you mention as a potential problem much later on):
    Async\run(function() {
        $fiber = Async\async(function() {
            sleep(1); // this gets enqueued now
            return "Fiber completed!";
        });
        // Execution is paused until the fiber completes
        $result = Async\await($fiber); // immediately enter $fiber without queuing
        echo $result . "\n";
        echo "Done!\n";
    });
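To illustrate what I mean by "run as a function call until it suspends": the existing Fiber class (PHP 8.1+) already behaves this way when you call start(). This is only a sketch with plain Fibers, not the RFC's Async\ API, and the $trace variable is just something I made up to record the order:

```php
<?php
$trace = [];

$fiber = new Fiber(function () use (&$trace): string {
    $trace[] = 'fiber: before suspend';
    Fiber::suspend();                    // first suspension point
    $trace[] = 'fiber: after suspend';
    return 'Fiber completed!';
});

$trace[] = 'main: before start';
$fiber->start();                         // body runs immediately, up to the suspend
$trace[] = 'main: after start';
$fiber->resume();                        // continue to the end
$trace[] = 'main: ' . $fiber->getReturn();

echo implode("\n", $trace), "\n";
```

The body starts executing inside start() with no queueing step in between, which is the behavior I would expect from await on a task that has not started yet.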
> Until it is activated, PHP code behaves as before: calls to blocking
> functions will block the execution thread and will not switch the *Fiber*
> context. Thus, code written without the *Scheduler* component will function
> exactly the same way, without side effects. This ensures backward
> compatibility.
I'm not sure I understand this. Won't PHP code behave exactly as it did before
once the scheduler is enabled? Will libraries written before this feature
existed suddenly behave differently? Do we need to worry about the color of
functions because it changes the behavior?
> `True Async` prohibits initializing the `Scheduler` twice.
How will a library take advantage of this feature if it cannot tell whether the
scheduler is running? Do I need to write one version of a library for async and
another for non-async? Or do all the async functions in this feature work
without the scheduler running, or do they throw a catchable error?
> This is crucial because the process may handle an OS signal that imposes a
> time limit on execution (for example, as Windows does).
Will this change the way OS signals are handled then? Will it break
compatibility if a library uses pcntl traps and I'm using true async traps too?
Note there are several different ways (timeout) signals are handled in PHP --
so if, perchance, the scheduler could always be running, maybe we can unify
the way signals are handled in PHP.
> Code that uses *Resume* cannot rely on when exactly the *Fiber* will resume
> execution.
What if it never resumes at all? Will it run a finally block if the code is
wrapped in try/finally, or will execution just be abandoned? Is there some way
to ensure cleanup of resources? The RFC should probably mention this case and
how abandoning execution works.
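For comparison, this is how I understand today's Fibers (PHP 8.1+) to handle the case: a suspended Fiber that is destroyed without being resumed gets unwound, and pending finally blocks run. A small sketch with plain Fibers; whether Resume follows the same rule is exactly what I'd like the RFC to state. The $log variable is only for illustration:

```php
<?php
$log = [];

$fiber = new Fiber(function () use (&$log): void {
    try {
        $log[] = 'started';
        Fiber::suspend();          // suspended here and never resumed
        $log[] = 'resumed';        // not reached in this script
    } finally {
        $log[] = 'cleanup';        // runs when the Fiber object is destroyed
    }
});

$fiber->start();
unset($fiber);                     // drop the last reference without resuming

print_r($log);
```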
> If an exception is thrown inside a fiber and not handled, it will stop the
> Scheduler and be thrown at the point where `Async\launchScheduler()` is
> called.
The RFC doesn't mention the stack trace. Will it throw away any information
about the inner exception?
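As a reference point, with plain Fibers (PHP 8.1+) an uncaught exception comes out of start() or resume() as the same object, so its message and trace are intact. A sketch, where failingTask is a hypothetical function of mine and nothing here uses the RFC's scheduler:

```php
<?php
function failingTask(): void
{
    throw new RuntimeException('boom');
}

$fiber = new Fiber(function (): void {
    failingTask();
});

$caught = null;
try {
    $fiber->start();               // the uncaught exception surfaces here
} catch (RuntimeException $e) {
    $caught = $e;
}

// Function names recorded in the trace, innermost frame first.
$frames = array_column($caught->getTrace(), 'function');

echo $caught->getMessage(), "\n";
echo implode(' <- ', $frames), "\n";
```

It would help if the RFC said whether launchScheduler() rethrows the original exception like this or wraps it in another one (and, if so, whether getPrevious() returns the original).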
> The *Graceful Shutdown* mode can also be triggered using the function:
What will calling `exit` or `die` do?
> A concurrent runtime allows handling requests using Fibers, where each Fiber
> can process its own request. In this case, storing request-associated data in
> global variables is no longer an option.
Why is this the case? Furthermore, if it inherits from the fiber that started
its current fiber, won't using Resume/Notifier potentially cause problems when
used manually? There are examples throughout the RFC that use global variables
in closures; so do these examples not actually work? Will sharing instances of
objects in scope of the functions break things? For example:
    Async\run($obj->method1(...));
    Async\run($obj->method2(...));
This is technically sharing global variables (well, global to that scope --
global is just a scope after all) -- so what happens here? Would it make sense
to delegate this fiber-local storage to user-land libraries instead?
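Here is how I picture the shared-state concern, sketched with plain Fibers (PHP 8.1+) rather than the RFC's API. The $shared object and the requestId property are made up for the example:

```php
<?php
$shared = new stdClass();
$shared->requestId = null;

// Both fibers run the same handler and write to the same object.
$handler = function (string $id) use ($shared): string {
    $shared->requestId = $id;      // "request-associated" data in shared state
    Fiber::suspend();              // another fiber may run here
    return $shared->requestId;     // may no longer be our own id
};

$a = new Fiber($handler);
$b = new Fiber($handler);

$a->start('A');
$b->start('B');                    // overwrites the value written by $a
$a->resume();
$b->resume();

echo $a->getReturn(), ' ', $b->getReturn(), "\n";
```

Fiber $a reads back the value written by $b. If this is the problem the RFC is trying to solve, it would be good to spell it out, because the same thing happens with any object captured by two closures, not only with globals.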
> Objects of the `Future` class are high-level patterns for handling deferred
> results.
By this point we have covered FiberHandle, Resume, and Contexts. Now we have
Futures? Can we simplify this to just Futures? Why do we need all these
different ways to handle execution?
> A channel is a primitive for message exchange between `Fibers`.
Why is there an `isEmpty` and `isNotEmpty` function? Wouldn't
`!$channel->isEmpty()` suffice?
It's also not clear what the value of most of these functions is. For example:
    if ($chan->isFull()) {
        doSomething(); // suspends at some point inside? We may not know when
                       // we write the code.
        // $chan is no longer full, or maybe it is -- who knows, but the
        // original assumption entering this branch is no longer true.
        ...
    }
Whether a channel is full or not is not really important, and if you rely on
that information, this is usually an architectural smell (at least in other
languages). Same thing with empty or writable, or many others of these
functions. You basically just write to a channel and eventually (or not, which
is a bug and causes a deadlock) something will read it. The entire point is to
use channels to decouple async code, but most of the functions here allow for
code to become strongly coupled.
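The stale-check problem can be reproduced with plain Fibers (PHP 8.1+). TinyBuffer below is a hypothetical stand-in of mine, not the RFC's Channel, and resuming the consumer stands in for any call that lets another fiber run:

```php
<?php
// Hypothetical bounded buffer, only here to have something with isFull().
final class TinyBuffer
{
    private array $items = [];

    public function __construct(private int $capacity) {}

    public function isFull(): bool
    {
        return count($this->items) >= $this->capacity;
    }

    public function push(mixed $value): void
    {
        $this->items[] = $value;
    }

    public function shift(): mixed
    {
        return array_shift($this->items);
    }
}

$buf = new TinyBuffer(1);
$buf->push('x');                       // buffer is now full

$consumer = new Fiber(function () use ($buf): void {
    Fiber::suspend();                  // wait until something resumes us
    $buf->shift();                     // drain one item
});
$consumer->start();

$observed = [];
$producer = new Fiber(function () use ($buf, $consumer, &$observed): void {
    if ($buf->isFull()) {
        $observed[] = $buf->isFull();  // still true right after the check
        $consumer->resume();           // stands in for doSomething()
        $observed[] = $buf->isFull();  // the earlier check is now stale
    }
});
$producer->start();

var_dump($observed);
```

Inside the branch the answer flips from true to false without the producer touching the buffer, which is why I don't think these predicates are safe to branch on.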
As for the single producer method, I am not sure why you would use this. I can
see some upside for the built-in constraints (potentially in a dev-mode
environment) but in a production system, single-producer bottlenecks are a real
thing that can cause serious performance issues. This is usually something you
explicitly want to avoid.
> In addition to the `send/receive` methods, which suspend the execution of a
> `Fiber`, the channel also provides non-blocking methods: `trySend`,
> `tryReceive`, and auxiliary explicit blocking methods: `waitUntilWritable`
> and `waitUntilReadable`.
It isn't clear what happens when `trySend` fails. Is this an error, or does it
do nothing?
Thinking through it, there may be cases where `trySend` is valid, but more
often than not, it is probably an antipattern. I cannot think of a valid reason
for `tryReceive`, and its usage is most likely guaranteed to cause a deadlock
in real code. For true multi-threaded applications, it makes more sense, but
not for single-threaded concurrency like this.
In other words, the following code is likely to be more robust, and not depend
on execution order (which we are told at the beginning not to do):
    Async\run(function() {
        $channel = new Async\Channel();
        $reader = Async\async(function() use ($channel) {
            // Parenthesize the assignment: && binds tighter than =.
            while (($data = $channel->receive()) !== null) {
                echo "receive: $data\n";
            }
        });
        for ($i = 0; $i < 4; $i++) {
            echo "send: event data $i\n";
            $channel->send("event data $i");
        }
        $reader->cancel(); // clean up our reader
        // or
        $channel->close(); // will receive NULL I believe?
    });
A `trySend` is still useful when you want to send a message but don't want to
block if the channel is full. However, whether it succeeds is going to depend
largely on how long it has been since the developer last suspended the current
fiber, and nothing else -- thus it is probably an antipattern, since it depends
on the literal structure of the code, not the structure of the program -- if
that makes sense.
> This means that `trapSignal` is not intended for “regular code” and should
> not be used “anywhere”.
Can you expand on what this means in the RFC? Why expose it if it shouldn't be
used?
-----
I didn't go into the low-level API details yet -- this email is already pretty
long. But I would suggest maybe thinking about how to unify
Notifiers/Resume/FiberHandle/Future into a single thing. These things are
pretty similar to one another (from a developer's standpoint) -- a way to
continue execution, and they all offer a slightly different API.
I also noticed that you seem to be relying heavily on the current
implementation to define behavior. Ideally, the RFC should define the behavior
and the implementation should follow it as described. In other words, the RFC
is the reference point for deciding whether something is a bug or an
enhancement in the future. More than once, the list has looked back at an old
RFC to determine the original intent and decide whether something is working as
intended or is a bug. RFCs are also used to write documentation, so the more
detailed the RFC, the better the documentation will be for new users of PHP.
— Rob