On Sat, Mar 8, 2025, at 1:05 AM, Edmond Dantes wrote:
> Hello all.
>
> A few thoughts aloud about the emerging picture.
>
> ### Entry point into the asynchronous context
> Most likely, it should be implemented as a separate function (I haven't 
> come up with a good name yet), with a unique name to ensure its 
> behavior does not overlap with other operators. It has a unique 
> property: it waits for the full completion of the event loop and the 
> Scheduler. 
>
> Inside the asynchronous context, `Fiber` is prohibited, and conversely, 
> inside a `Fiber`, the asynchronous context is prohibited.

Yes.

> ### The `async` operator
> The `async` (or *spawn*?) operator can be used as a shorthand for 
> spawning a coroutine:

This is incorrect.  "Create an async bounded context playpen" (what I called 
"async" in my example) and "start a fiber/thread/task" (what I called "spawn") 
are two *separate* operations, and must remain so.

create space for async stuff {
  start async task a();
  start async task b();
}

However those get spelled, they're necessarily separate things.  If any 
creation of a new async task also creates a new async context, then we no 
longer have the ability to run multiple tasks in parallel in the same context.  
Which is, as I understand it, kinda the point.

I also don't believe that an async bounded context necessarily needs to be a 
function, as doing so introduces a lot of extra complexity for the user when 
they need to manually "use" things.  (Though perhaps sometimes we can have a 
shorthand for that; that comes later.)

I am also still very much against allowing tasks to "detach".  If a thread is 
allowed to escape its bounded context, then I can no longer rely on that 
context being bounded.  It removes the very guarantee that we're trying to 
provide.  There are better ways to handle "throwing off a long-running 
background task."  (See below.)


Edmond, correct me if I'm wrong here, but in practice, the *only* places that 
it makes sense to switch fibers are:

1. At an otherwise-blocking IO call.
2. In a very long running CPU task, where the task is easily broken up into 
logical pieces so that we can interleave it with shorter tasks in the same 
process.  This is only really necessary when running a shared single process 
for multiple requests.

And in this proposal, IO operations auto-switch between blocking and 
thread-sharing as appropriate.

To be more concrete, let's consider specific use cases that should be addressed:

1. Multiplexing IO, within an otherwise sync context like PHP-FPM

I predict that, in the near term, this will be the most common usage pattern.  
(Long term, who knows.)  This one is easily solvable; it's basically par_map() 
and variations therein.

// Creates a context in which async is allowed to happen. IO operations auto 
async $ctx = new AsyncContext() {
  $val1 = spawn task1();
  $val2 = spawn task2();
  // Do stuff with those values.
}
// We are absolutely certain nothing started in that block is still running.

(I'm still unclear if $val1 and $val2 should be values or a Future object.  
Possibly the latter.)

2. Shared-process async server

This is the ReactPHP/Swoole space.  This... honestly gets kind of easy.

Wrap the entire application in an async {} block. Boom.  All IO is now async.

<?php

async {
  while (true) {
    $request = spawn listen_for_request();
    spawn handle_request($request);
  }
}

Importantly, since IO is the primary switch point, and IO automatically deals 
with thread switching, my DB-query-heavy Repository object doesn't care if I'm 
doing this or not.  If each $handler (controller, whatever) is written 100% 
sync, with lots of IO... it still works fine.

3. Set-and-forget background job

This is the logger example, but probably also queue tasks, etc.  This is where 
the request for detaching comes from.  I would argue detaching is both the 
wrong approach, and an unnecessary one.  Because you can send data to fibers 
from OTHER contexts... via channels.

So rather than this:

// Who the hell knows when this will complete, or if it ever does.
spawn detach log('message');

We have this:

async {
  $logger = new AsyncLogger();
  $logChannel = $logger->inputChannel();

  spawn handler($logChannel);
}

function handler($logChannel) {
  async {
    while (true) {
      $request = spawn listen_for_request();
      spawn handle_request($request, $logChannel);
    } // An exception could get us to here.
  }
}

function handle_request($request, $logChannel) {
  $logChannel->send($request->url());
  // Do other complex stuff with the request.
}

This is probably not the ideal way to structure it in practice, but it should 
get the point across.  The background logger fiber already exists in the parent 
async playpen.  That's OK!  We can send messages to it via a channel.  It can 
keep running after the inner async block ends.  The logger fiber doesn't need 
to be detached, because it was already attached to a parent playpen anyway!

This means passing either a channel-enabled logger instance around (probably 
better for BC; this should be easy to do behind PSR-3) or the sending channel 
itself.  I'm sure someone will object that is too much work.  However, it is no 
more, or less, work than passing a PSR-3 logger to services today.  And in 
practice "your DI container handles that, stop worrying" is a common and 
effective answer.

An async-aware DI Container could have an Async-aware PSR-3 logger it passes to 
various services like any other boring PSR-3 instance.  That logger forwards 
the message across a channel to a waiting parent-playpen-bound fiber, where it 
just enters the rotation of other fibers getting run.

Services don't need to be modified at all.  We don't need to have dangling 
fibers.  And for smaller, more contained cases, eh, Go has shown that "just 
pass the channel around and move on with life" can be an effective approach.  
The only caveat is you can't pass a channel-based logger to a scope that will 
be called outside of an async playpen... But that would be the case anyway, so 
it's not really an issue.

There's still the context question, as well as whether spawn is a method on a 
context object or a keyword, but I think this gets us to 80% of what the 
original RFC tries to provide, with 20% of the mental overhead.

--Larry Garfield
