Re: [PHP-DEV] PHP True Async RFC

2025-03-09 Thread Edmond Dantes
Good day, Alex.

>
>  Can you please share a bit more details on how the Scheduler is
> implemented, to make sure that I understand why this contradiction exists?
> Also with some examples, if possible.
>

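```php
$fiber1 = new Fiber(function () {
    echo "Fiber 1 starts\n";

    $fiber2 = new Fiber(function () {
        echo "Fiber 2 starts\n";

        Fiber::suspend(); // Suspend the inner fiber: control returns to Fiber 1
        echo "Fiber 2 resumes\n";
    });

    $fiber2->start();  // runs Fiber 2 until it suspends
    echo "Fiber 1 has control again\n";
    $fiber2->resume(); // Fiber 2 continues after Fiber::suspend()
});

$fiber1->start();
```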

Yes, of course, let's try to look at this in more detail.
Here is the classic code demonstrating how Fiber works. Fiber1 creates
Fiber2. When Fiber2 yields control, execution returns to Fiber1.

Now, let's try to do the same thing with Fiber3. Inside Fiber2, we create
Fiber3. Everything will work perfectly—Fiber3 will return control to Fiber2,
and Fiber2 will return it to Fiber1—this forms a hierarchy.
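The three-level case can be sketched as follows (a minimal illustration, not code from the RFC): each `Fiber::suspend()` hands control back to the fiber that started or last resumed the suspending fiber, so control climbs the hierarchy one level at a time. The `$log` array is added only to make the order visible.

```php
<?php
// Each Fiber::suspend() returns control to the fiber that started or last
// resumed the suspending fiber, never further up the hierarchy.
$log = [];

$fiber1 = new Fiber(function () use (&$log) {
    $fiber2 = new Fiber(function () use (&$log) {
        $fiber3 = new Fiber(function () use (&$log) {
            $log[] = 'fiber3: suspend';
            Fiber::suspend();                 // back to fiber2
            $log[] = 'fiber3: resumed';
        });

        $fiber3->start();
        $log[] = 'fiber2: got control back from fiber3';
        Fiber::suspend();                     // back to fiber1
        $log[] = 'fiber2: resumed';
        $fiber3->resume();                    // let fiber3 finish
    });

    $fiber2->start();
    $log[] = 'fiber1: got control back from fiber2';
    $fiber2->resume();                        // fiber2, then fiber3, run to completion
});

$fiber1->start();
echo implode("\n", $log), "\n";
```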

Now, imagine that we want to turn Fiber1 into a *Scheduler* while following
these rules.
To achieve this, we need to ensure that all Fiber instances are created
from the *Scheduler*, so that control can always be properly returned.

```php
class Scheduler {
    private array $queue = [];

    public function add(callable $task): void {
        $fiber = new Fiber($task);
        $this->queue[] = $fiber;
    }

    public function run(): void {
        while (!empty($this->queue)) {
            $fiber = array_shift($this->queue);

            if (!$fiber->isStarted()) {
                // First run: the task receives the scheduler as its argument
                $fiber->start($this);
            } elseif ($fiber->isSuspended()) {
                // Continue a task that previously called yield()
                $fiber->resume();
            }
        }
    }

    public function yield(): void {
        $fiber = Fiber::getCurrent();
        if ($fiber) {
            // Re-queue the current task, then hand control back to run()
            $this->queue[] = $fiber;
            Fiber::suspend();
        }
    }
}

$scheduler = new Scheduler();

$scheduler->add(function (Scheduler $scheduler) {
    echo "Task 1 - Step 1\n";
    $scheduler->yield();
    echo "Task 1 - Step 2\n";
});

$scheduler->add(function (Scheduler $scheduler) {
    echo "Task 2 - Step 1\n";
    $scheduler->yield();
    echo "Task 2 - Step 2\n";
});

$scheduler->run();
// Task 1 - Step 1
// Task 2 - Step 1
// Task 1 - Step 2
// Task 2 - Step 2
```

So, to successfully switch between Fibers:

   1. A Fiber must return control to the *Scheduler*.
   2. The *Scheduler* selects the next Fiber from the queue and switches to
   it.
   3. That Fiber then returns control back to the *Scheduler* again.

This algorithm has one drawback: *it requires two context switches instead
of one* (*FiberX* to the *Scheduler*, then the *Scheduler* to *FiberY*),
whereas *FiberX* could in principle switch to *FiberY* directly.
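One rough way to see the cost is to instrument a scheduler like the one above. In this sketch the class name `CountingScheduler` and the `$switches` counter are illustrative only: every dispatch costs one switch into the task and one switch back out to the scheduler.

```php
<?php
// Round-robin scheduler like the one above, instrumented to count context switches.
final class CountingScheduler
{
    private array $queue = [];
    public int $switches = 0;

    public function add(callable $task): void
    {
        $this->queue[] = new Fiber($task);
    }

    public function run(): void
    {
        while ($this->queue) {
            $fiber = array_shift($this->queue);
            $this->switches++;                 // switch 1: scheduler -> task
            if ($fiber->isStarted()) {
                $fiber->resume();
            } else {
                $fiber->start($this);
            }
            $this->switches++;                 // switch 2: task -> scheduler (suspend or return)
        }
    }

    public function yield(): void
    {
        $this->queue[] = Fiber::getCurrent(); // must be called from inside a task
        Fiber::suspend();
    }
}

$scheduler = new CountingScheduler();
for ($i = 0; $i < 2; $i++) {
    $scheduler->add(function (CountingScheduler $s) {
        $s->yield();                           // hand over to the next task via the scheduler
    });
}
$scheduler->run();
echo $scheduler->switches, "\n"; // 8
```

Two tasks that each yield once are dispatched four times, costing eight switches; a direct *FiberX*-to-*FiberY* switch would skip the intermediate hop through the scheduler on every hand-over.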

Breaking this contract (for example, by creating a Fiber outside the
*Scheduler*) not only disrupts the code in this RFC but also affects
Revolt's functionality. However, in the case of Revolt, you can say: *"If
you use this library, follow the library's contracts and do not use Fiber
directly."*

But PHP is not just a library; it's a language that must remain consistent
and cohesive.

>
>  Reading the RFC initially, I thought that the Scheduler is using fibers
> for everything that runs.
>

Exactly.

>
>  You mean that when one of the fibers started by the Scheduler is
> starting other fibers they would usually await for them to finish, and that
> is a blocking operation that blocks also the Scheduler?
>

When a *Fiber* from the *Scheduler* decides to create another *Fiber* and
then tries to call blocking functions inside it, control can no longer
return to the *Scheduler* from those functions.
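A minimal sketch of what goes wrong, reusing the round-robin scheduler idea from above with a `$log` array added for illustration: task A creates its own `Fiber`, and the `yield()` inside it suspends only that inner fiber, so control lands back in task A instead of the scheduler.

```php
<?php
// Round-robin scheduler as above, plus a $log array for illustration.
final class Scheduler
{
    private array $queue = [];
    public array $log = [];

    public function add(callable $task): void
    {
        $this->queue[] = new Fiber($task);
    }

    public function run(): void
    {
        while ($this->queue) {
            $fiber = array_shift($this->queue);
            if (!$fiber->isStarted()) {
                $fiber->start($this);
            } elseif ($fiber->isSuspended()) {
                $fiber->resume();
            }
        }
    }

    public function yield(): void
    {
        $fiber = Fiber::getCurrent();
        if ($fiber !== null) {
            $this->queue[] = $fiber;
            Fiber::suspend(); // returns to whoever started/resumed $fiber
        }
    }
}

$scheduler = new Scheduler();

$scheduler->add(function (Scheduler $s) {
    // Task A creates its own Fiber, outside the scheduler's control.
    $inner = new Fiber(function () use ($s) {
        $s->log[] = 'inner: before yield';
        $s->yield();          // meant to hand control to the scheduler...
        $s->log[] = 'inner: after yield';
    });
    $inner->start();
    // ...but Fiber::suspend() came back here, to task A, not to the scheduler.
    $s->log[] = 'task A: still running';
});

$scheduler->add(function (Scheduler $s) {
    $s->log[] = 'task B: runs';
});

$scheduler->run();
echo implode("\n", $scheduler->log), "\n";
```

Task A keeps running right after `$inner->start()`; if it made a blocking call at that point, task B could not run until the call returned.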

Of course, it would be possible to track the state and disable the
concurrency mode flag when the user manually creates a *Fiber*. But… this
wouldn't lead to anything good. Not only would it complicate the code, but
it would also result in a mess with different behavior inside and outside
of *Fiber*.

This is even worse than calling *startScheduler*.

The hierarchical switching rule is a *design flaw* that happened because a
*low-level component* was introduced into the language as part of the
implementation of a *higher-level component*. However, the high-level
component is in *User-land*, while the low-level component is in *PHP core*.

It's the same as implementing $this in OOP but requiring it to be
explicitly passed in every method. This would lead to inconsistent behavior.

So, this situation needs to be resolved one way or another.

--

Ed


Re: [PHP-DEV] PHP True Async RFC

2025-03-09 Thread Edmond Dantes
>
>  Maybe, we could create a different version of fibers ("managed fibers",
> maybe?) distinct from the current implementation, with the idea to
> deprecate them in PHP 10?
> Then, at least, the scheduler could always be running. If you are using
> existing code that uses fibers, you can't use the new fibers but it will
> "just work" if you aren't using the new fibers (since the scheduler will
> never pick up those fibers).
>

Yes, that can be done. It would be good to maintain compatibility with
Xdebug, but that needs to be investigated.

During our discussion, everything seems to be converging on the idea that
the changes introduced by the RFC into Fiber would be better moved to a
separate class. This would reduce confusion between the old and new
solutions. That way, developers wouldn't wonder why Fiber and coroutines
behave differently—they are simply different classes.

The new *Coroutine* class could have a different interface with new logic.
This sounds like an excellent solution.

The interface could look like this:

   - *suspend* (or another clear name) – a method that explicitly hands
   over execution to the *Scheduler*.
   - *defer* – a handler that is called when the coroutine completes.
   - *cancel* – a method to cancel the coroutine.
   - *context* – a property that stores the execution context.
   - *parent* (public property or getParent() method) – returns the parent
   coroutine.

(*Just an example for now.*)
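As a rough illustration only, that interface might be declared like this. Every name is hypothetical; `suspend()` is made static here to mirror `Fiber::suspend()` (an assumption), and the context is exposed through a method because interfaces cannot declare ordinary properties.

```php
<?php
// Hypothetical Coroutine API shape; nothing here exists in PHP today.
interface CoroutineSketch
{
    /** Explicitly hand execution over to the Scheduler (called from inside a coroutine). */
    public static function suspend(): void;

    /** Register a handler that runs when the coroutine completes. */
    public function defer(callable $handler): void;

    /** Request cancellation of the coroutine. */
    public function cancel(): void;

    /** Execution context shared by code running in this coroutine. */
    public function getContext(): ArrayObject;

    /** Parent coroutine, or null for the root coroutine. */
    public function getParent(): ?CoroutineSketch;
}
```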

The *Scheduler* would be activated automatically when a coroutine is
created. If the index.php script reaches the end, the interpreter would
wait for the *Scheduler* to finish its work under the hood.

Do you like this approach?

---

Ed.


Re: [PHP-DEV] PHP True Async RFC

2025-03-09 Thread Alexandru Pătrănescu
On Sun, Mar 9, 2025, 09:05 Edmond Dantes wrote:

> When a *Fiber* from the *Scheduler* decides to create another *Fiber* and
> then tries to call blocking functions inside it, control can no longer
> return to the *Scheduler* from those functions.
>
> Of course, it would be possible to track the state and disable the
> concurrency mode flag when the user manually creates a *Fiber*. But… this
> wouldn't lead to anything good. Not only would it complicate the code, but
> it would also result in a mess with different behavior inside and outside
> of *Fiber*.
>
>
Thank you for explaining the problem space.
Now let's see what solutions we can find.

First of all, I think it would be better for the language to assume the
Scheduler is always running and not have to be manually started.

An idea that I have for now:
Have a different method `Fiber::suspendToScheduler(Resume $resume)` that
would return control to the Scheduler. This one would be used by all
internal functions that do blocking operations, and maybe also by userland
ones if they need to. Of course, the name could be better, like
`Fiber::await`.

Maybe that is what we need: to be able to return control both to the parent
fiber for custom logic that might be needed, and to the Scheduler so that
the language would be concurrent.
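Without such a method, the only userland option is to pass the request up the chain by hand. This sketch uses a hypothetical `ToScheduler` marker object: the inner fiber suspends with it, and every fiber in between must notice it and re-suspend itself until the request reaches the code acting as the scheduler.

```php
<?php
// Userland workaround: ToScheduler is a hypothetical marker object that
// intermediate fibers must forward upward by hand.
final class ToScheduler {}

$log = [];

$task = new Fiber(function () use (&$log) {
    $inner = new Fiber(function () use (&$log) {
        $log[] = 'inner: waiting for I/O';
        Fiber::suspend(new ToScheduler());   // lands in $task, not in the scheduler
        $log[] = 'inner: I/O ready';
    });

    // Every intermediate fiber needs this forwarding loop.
    $signal = $inner->start();
    while ($signal instanceof ToScheduler) {
        $log[] = 'task: forwarding to scheduler';
        Fiber::suspend($signal);             // pass the request one level up
        $signal = $inner->resume();
    }
});

// Stand-in for the scheduler: it only sees the request after it was forwarded.
$signal = $task->start();
while ($signal instanceof ToScheduler) {
    $log[] = 'scheduler: got control';
    $signal = $task->resume();
}

echo implode("\n", $log), "\n";
```

Every intermediate fiber needs the forwarding loop, which is the boilerplate a core-level `suspendToScheduler()`/`await` would remove.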

As for userland event loops, like Revolt, I am not so sure they fit with
the new language-level async model.
But I can see how they could implement a different event loop that would
run only one loop iteration, schedule a deferred callback and pass control
to the Scheduler (which would return control in the next iteration to
perform one more loop iteration, and so on).

-- 
Alex



Re: [PHP-DEV] PHP True Async RFC

2025-03-09 Thread Rob Landers
On Sun, Mar 9, 2025, at 09:05, Edmond Dantes wrote:
> [...]

Hi Ed,

If I remember correctly, the original implementation of Fibers was built in 
such a way that extensions could create their own fiber types that were 
distinct from fibers but reused the context switch code.

From the original RFC:

> An extension may still optionally provide their own custom fiber 
> implementation, but an internal API would allow the extension to use the 
> fiber implementation provided by PHP.

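Maybe, we could create a different version of fibers ("managed fibers", 
maybe?) distinct from the current implementation, with the idea to 
deprecate them in PHP 10?
Then, at least, the scheduler could always be running. If you are using 
existing code that uses fibers, you can't use the new fibers but it will 
"just work" if you aren't using the new fibers (since the scheduler will 
never pick up those fibers).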

Re: [PHP-DEV] PHP True Async RFC

2025-03-09 Thread Edmond Dantes
>
> The wait_all block is EXPLICITLY DESIGNED to meddle with the internals of
> async libraries,
>

How exactly does it interfere with the implementation of asynchronous
libraries?
Especially considering that these libraries operate at the User-land level?
It’s a contract. No more. No less.

>
>  Libraries can full well handle cleanup of fibers in __destruct by
> themselves, without a wait_all block forcing them to reduce concurrency
> whenever the caller pleases.
>
Fiber is a final class, so there can be no destructors here. Even if you
create a "Coroutine" class and allow defining a destructor, the result will
be overly verbose code. Many other developers and I have tested this.
And the creators of AMPHP did not take this approach. Go doesn’t have it
either. This is not a coincidence.

>
>  It is, imo, a MAJOR FOOTGUN, and should not be even considered for
> implementation.
>

Why exactly is this a FOOTGUN?

   - Does this block lead to new violations of language integrity?
   - Does this block increase the likelihood of errors?

A FOOTGUN is something that significantly breaks the language and pushes
developers toward writing bad code. This is a rather serious flaw.


Re: [PHP-DEV] PHP True Async RFC

2025-03-09 Thread Rowan Tommins [IMSoP]

On 08/03/2025 22:28, Daniil Gentili wrote:
> Even if its use is optional, its presence in the language could lead 
> library developers to reduce concurrency in order to allow calls from 
> async blocks, (i.e. don't spawn any background fiber in a method call 
> because it might be called from an async {} block) which is what I 
> meant by crippling async PHP.



I think you've misunderstood what I meant by optional. I meant that 
putting the fiber into the managed context would be optional *at the 
point where the fiber was spawned*.


A library wouldn't need to "avoid spawning background fibers", it would 
simply have the choice between "spawn a fiber that is expected to finish 
within the current managed scope, if any", and "spawn a fiber that I 
promise to manage myself, and please ignore anyone trying to manage it 
for me".
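The two options could look roughly like this. `Scope`, `spawnManaged()`, `spawnDetached()` and `close()` are hypothetical names, and `close()` is a toy stand-in for "wait at the end of the managed scope" that simply resumes managed fibers until they finish.

```php
<?php
// Hypothetical "managed vs detached" spawning; all names are illustrative only.
final class Scope
{
    /** @var list<Fiber> fibers this scope must wait for */
    private array $managed = [];

    /** Spawn a fiber that is expected to finish within this scope. */
    public function spawnManaged(callable $fn): Fiber
    {
        $fiber = new Fiber($fn);
        $this->managed[] = $fiber;
        $fiber->start();
        return $fiber;
    }

    /** Spawn a fiber the caller promises to manage itself; the scope ignores it. */
    public function spawnDetached(callable $fn): Fiber
    {
        $fiber = new Fiber($fn);
        $fiber->start();
        return $fiber;
    }

    /** Toy "end of scope": resume managed fibers until each one has finished. */
    public function close(): void
    {
        foreach ($this->managed as $fiber) {
            while (!$fiber->isTerminated()) {
                $fiber->resume();
            }
        }
    }
}

$log = [];
$scope = new Scope();

$scope->spawnManaged(function () use (&$log) {
    Fiber::suspend();                     // pretend to wait for something
    $log[] = 'managed: done';
});

$background = $scope->spawnDetached(function () use (&$log) {
    while (true) {                        // long-running, e.g. a keep-alive loop
        Fiber::suspend();
        $log[] = 'background: tick';
    }
});

$scope->close();                          // returns: only the managed fiber is awaited
$log[] = 'scope closed';
echo implode("\n", $log), "\n";
```

Had the endless background fiber been spawned as managed, `close()` would never return, which is the concern raised about wait_all blocks; the detached option leaves it to its owner.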


There have been various suggestions of exactly what that could look 
like, e.g. in https://externals.io/message/126537#126625 and 
https://externals.io/message/126537#126630



> The naming of "async {}" is also very misleading, as it does the 
> opposite of making things async, if anything it should be called 
> "wait_all {}"



Yes, "async{}" is a bit of a generic placeholder name; I think Larry was 
the first to use it in an illustration, and we've been discussing 
exactly what it might mean. As we pin down more precise suggestions, we 
can probably come up with clearer names for them.


The tone of your recent e-mails suggests you believe someone is forcing 
this precise keyword into the language, right now, and you urgently need 
to stop it before it's too late. That's not where we are at all, we're 
trying to work out if some such facility would be useful, and what it 
might look like.



It sounds like you think:

1) The language absolutely needs a "spawn detached" operation, i.e. a 
way of starting a new fiber which is queued in the global scheduler, but 
has no automatic relationship to its parent.
2) If the language offered both "spawn managed" and "spawn detached", 
the "detached" mode would be overwhelmingly more common (i.e. users and 
library authors would want to manage the lifecycle of their coroutines 
manually), so the "spawn managed" mode isn't worth implementing.


Would that be a fair summary of your opinion?

--
Rowan Tommins
[IMSoP]


Re: [PHP-DEV] Re: PHP True Async RFC

2025-03-09 Thread Edmond Dantes
>
>  I think the same thing applies to scheduling coroutines: we want the
> Scheduler to take over the "null fiber",
>

Yes, you have quite accurately described a possible implementation.
When a programmer loads the initial index.php, its code is already running
inside a coroutine.
We can call it the main coroutine or the root coroutine.

When the index.php script reaches its last instruction, the coroutine
finishes, execution is handed over to the Scheduler, and then everything
proceeds as usual.

Accordingly, if the Scheduler has more coroutines in the queue, reaching
the last line of index.php does not mean the script terminates. Instead, it
continues executing the queue until... there is nothing left to execute.
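A toy model of this behaviour, assuming hypothetical `Runtime::spawn()` and `run_program()` helpers in place of the core Scheduler: the script body runs as the root coroutine, and the program only ends once the queue is empty.

```php
<?php
// Toy model: the script body runs as the root coroutine, and the program only
// ends when the queue is empty. Runtime and run_program() are hypothetical.
final class Runtime
{
    /** @var list<Fiber> */
    public static array $queue = [];
    public static array $log = [];

    public static function spawn(callable $fn): void
    {
        self::$queue[] = new Fiber($fn);
    }
}

function run_program(callable $script): void
{
    Runtime::$queue[] = new Fiber($script);    // the main (root) coroutine goes first
    while (Runtime::$queue) {
        $fiber = array_shift(Runtime::$queue);
        if ($fiber->isStarted()) {
            $fiber->resume();
        } else {
            $fiber->start();
        }
        if ($fiber->isSuspended()) {
            Runtime::$queue[] = $fiber;        // not finished yet: run it again later
        }
    }
}

run_program(function () {
    Runtime::spawn(function () {
        Runtime::$log[] = 'child: step 1';
        Fiber::suspend();
        Runtime::$log[] = 'child: step 2';
    });
    Runtime::$log[] = 'index.php: last line';
});
Runtime::$log[] = 'program exits';
echo implode("\n", Runtime::$log), "\n";
```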

>
> At that point, the relationship to a block syntax perhaps becomes clearer:
>

Thanks to the extensive discussion, I realized that the implementation with
startScheduler raises too many questions, and it's better to sacrifice a
bit of backward compatibility for the sake of language elegance.

After all, Fiber is unlikely to be used by ordinary programmers.


Re: [PHP-DEV] PHP True Async RFC

2025-03-09 Thread Edmond Dantes
>
>  I can give you several examples where such logic is used in Amphp
> libraries, and it will break if they are invoked within an async block.
>

Got it; it looks like I misread the post because I was focused on a different
aspect. So, essentially, you're talking not so much about wait_all itself, but
rather about the parent-child vs. free model.

This question is what concerns me the most right now.

If you have real examples of how this can cause problems, I would really
appreciate it if you could share them. Code is the best criterion of truth.

>
>  You misunderstand:
>

Yes, I misunderstood. It would be interesting to see the code with the
destructor to analyze this approach better.

*Let me summarize the current state for today:*

   1. I am abandoning startScheduler and the idea of preserving backward
   compatibility with await_all or anything else in that category. The
   scheduler will be initialized implicitly, and this does not concern
   user-land. Consequently, the spawn function() code will work everywhere
   and always.
   2. I will not base the implementation on Fiber (perhaps only on the
   low-level part). Instead of Fiber, there will be a separate class. There
   will be no changes to Fiber at all. This decision follows the principle
   of Win32 COM/DCOM: old interfaces should never be changed. If an old
   interface needs modification, it should be given a new name. This should
   have been done from the start.
   3. I am abandoning low-level objects in PHP-land (FiberHandle,
   SocketHandle, etc.). So far, no one has voted for them, which means they
   are unnecessary. There might be a low-level interface for compatibility
   with Revolt.
   4. It might be worth restricting microtasks in PHP-land and keeping them
   only for C code. This would simplify the interface, but we need to ensure
   that it doesn't cause any issues.

The remaining question on the agenda: deciding which model to choose —
*parent-child* or the *Go-style model*.

Thanks

---

Ed


Re: [PHP-DEV] Re: PHP True Async RFC

2025-03-09 Thread Rowan Tommins [IMSoP]

On 08/03/2025 20:22, Edmond Dantes wrote:


> For coroutines to work, a Scheduler must be started. There can be only 
> one Scheduler per OS thread. That means creating a new async task does 
> not create a new Scheduler.
>
> Apparently, async {} in the examples above is the entry point for the 
> Scheduler.




I've been pondering this, and I think talking about "starting" or 
"initialising" the Scheduler is slightly misleading, because it implies 
that the Scheduler is something that "happens over there".


It sounds like we'd be writing this:

// No scheduler running, this is probably an error
Async\runOnScheduler( something(...) );

Async\startScheduler();
// Great, now it's running...

Async\runOnScheduler( something(...) );

// If we can start it, we can stop it I guess?
Async\stopScheduler();


But that's not what we're talking about. As the RFC says:

> Once the Scheduler is activated, it will take control of the 
> Null-Fiber context, and execution within it will pause until all Fibers, 
> all microtasks, and all event loop events have been processed.


The actual flow in the RFC is like this:

// This is queued somewhere special, ready for a scheduler to pick it up later

Async\enqueueForScheduler( something(...) );

// Only now does anything actually run
Async\runSchedulerUntilQueueEmpty();
// At this point, the scheduler isn't running any more

// If we add to the queue now, it won't run unless we run another scheduler
Async\enqueueForScheduler( something(...) );


Pondering this, I think one of the things we've been missing is what 
Unix[-like] systems call "process 0". I'm not an expert, so may get 
details wrong, but my understanding is that if you had a single-tasking 
OS, and used it to bootstrap a Unix[-like] system, it would look 
something like this:


1. You would replace the currently running single process with the new 
kernel / scheduler process
2. That scheduler would always start with exactly one process in the 
queue, traditionally called "init"
3. The scheduler would hand control to process 0 (because it's the only 
thing in the queue), and that process would be responsible for starting 
all the other processes in the system: TTYs and login prompts, network 
daemons, etc



I think the same thing applies to scheduling coroutines: we want the 
Scheduler to take over the "null fiber", but in order to be useful, it 
needs something in its queue. So I propose we have a similar "coroutine 
zero" [name for illustration only]:


// No scheduler running, this is an error
Async\runOnScheduler( something(...) );

Async\runScheduler(
    coroutine_zero: something(...)
);
// At this point, the scheduler isn't running any more

It's then the responsibility of "coroutine 0", here the function 
"something", to schedule what's actually wanted, like a network 
listener, or a worker pool reading from a queue, etc.



At that point, the relationship to a block syntax perhaps becomes clearer:

async {
   spawn start_network_listener();
}

is roughly (ignoring the difference between a code block and a closure) 
sugar for:


Async\runScheduler(
    coroutine_zero: function() {
       spawn start_network_listener();
   }
);


That leaves the question of whether it would ever make sense to nest 
those blocks (indirectly, e.g. something() itself contains an async{} 
block, or calls something else which does).


I guess in our analogy, nested blocks could be like running Containers 
within the currently running OS: they don't actually start a new 
Scheduler, but they mark a namespace of related coroutines, that can be 
treated specially in some way.


Alternatively, it could simply be an error, like trying to run the 
kernel as a userland program.
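The coroutine-zero idea, including the "simply an error" answer to nesting, can be sketched in userland. `ZeroScheduler`, `spawn()` and `runScheduler()` are illustrative names only; the queue starts with exactly one entry, like init, and spawning with no scheduler running throws.

```php
<?php
// Userland sketch of "coroutine zero"; ZeroScheduler and its methods are hypothetical.
final class ZeroScheduler
{
    /** @var list<Fiber>|null null means "no scheduler running" */
    private static ?array $queue = null;

    public static function spawn(callable $fn): void
    {
        if (self::$queue === null) {
            throw new LogicException('No scheduler running');
        }
        self::$queue[] = new Fiber($fn);
    }

    public static function runScheduler(callable $coroutineZero): void
    {
        if (self::$queue !== null) {
            // One possible answer to nesting: treat it as an error.
            throw new LogicException('Scheduler already running');
        }
        self::$queue = [new Fiber($coroutineZero)]; // exactly one entry, like init
        try {
            while (self::$queue) {
                $fiber = array_shift(self::$queue);
                if ($fiber->isStarted()) {
                    $fiber->resume();
                } else {
                    $fiber->start();
                }
                if ($fiber->isSuspended()) {
                    self::$queue[] = $fiber;
                }
            }
        } finally {
            self::$queue = null; // the scheduler isn't running any more
        }
    }
}

$log = [];

try {
    ZeroScheduler::spawn(fn () => null);
} catch (LogicException $e) {
    $log[] = 'outside: ' . $e->getMessage();
}

ZeroScheduler::runScheduler(function () use (&$log) {
    // Coroutine zero is responsible for scheduling the real work.
    ZeroScheduler::spawn(function () use (&$log) {
        $log[] = 'listener started';
    });
    try {
        ZeroScheduler::runScheduler(fn () => null);
    } catch (LogicException $e) {
        $log[] = 'nested: ' . $e->getMessage();
    }
});

echo implode("\n", $log), "\n";
```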



--
Rowan Tommins
[IMSoP]


Re: [PHP-DEV][RFC][VOTE] Add mb_levenshtein function

2025-03-09 Thread youkidearitai
On Sat, Mar 8, 2025 at 19:06, Niels Dossche wrote:
>
> On 08/03/2025 03:30, youkidearitai wrote:
> > Hi, Internals
> >
> > The vote on adding mb_levenshtein has ended, and the RFC was declined.
> > The result was 1 yes and 5 no.
> >
> > Thank you very much for voting.
> >
> > By the way, does this result mean that grapheme_levenshtein should be
> > added instead of mb_levenshtein?
> > Or that nothing should be done?
> > Feel free to comment.
> >
> > Thank you again.
> > Yuya.
> >
>
> Hi Yuya
>
> I think an RFC for grapheme_levenshtein would be better, it would have my 
> vote at least.
> Levenshtein makes more sense on graphemes than on Unicode code points.
>
> Kind regards
> Niels

Hi, Niels

Thank you very much for your reply.
Okay. I will work on a grapheme_levenshtein RFC.

Kind regards
Yuya

-- 
---
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- https://github.com/youkidearitai
-


Re: [PHP-DEV] PHP True Async RFC

2025-03-09 Thread Daniil Gentili

> I think you've misunderstood what I meant by optional. I meant that putting 
> the fiber into the managed context would be optional *at the point where the 
> fiber was spawned*. 
>
> It sounds like you think:
>
> 1) The language absolutely needs a "spawn detached" operation, i.e. a way of 
> starting a new fiber which is queued in the global scheduler, but has no 
> automatic relationship to its parent.
> 2) If the language offered both "spawn managed" and "spawn detached", the 
> "detached" mode would be overwhelmingly more common (i.e. users and library 
> authors would want to manage the lifecycle of their coroutines manually), so 
> the "spawn managed" mode isn't worth implementing. 
>
> Would that be a fair summary of your opinion?

Indeed, yes! That would be a complete summary of my opinion.

If the user could choose whether to add fibers to the managed context or not, 
that would be more acceptable IMO.

Then again, see point 2. Moreover, even an optional managed fiber context still 
introduces a certain degree of "magic" and non-obvious, implicit behavior at the 
initiative of the caller, which can be avoided by simply returning any spawned 
fibers explicitly and awaiting them.

Regards,
Daniil Gentili.


Re: [PHP-DEV] PHP True Async RFC

2025-03-09 Thread Edmond Dantes
>
> Have a different method `Fiber::suspendToScheduler(Resume $resume)` that
would return the control to the Scheduler.
>

That's exactly how it works. The RFC includes the method Async\wait()
(Fiber::await() is nice), which hands control over to the Scheduler.
At the PHP core level, there is an equivalent method used by all blocking
functions. In other words, Fiber::suspend is not needed; instead, the
Scheduler API is used.
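A sketch of a "blocking" operation written against a scheduler API rather than `Fiber::suspend()` directly. `MiniScheduler`, `Future`, `await()` and `resolve()` are hypothetical names, not the RFC's API; the point is that the scheduler's run loop is the only code that ever resumes a fiber.

```php
<?php
// Hypothetical scheduler-level primitive that "blocking" code would use.
final class Future
{
    public bool $done = false;
    public mixed $value = null;
    /** @var list<Fiber> fibers parked until this future resolves */
    public array $waiters = [];
}

final class MiniScheduler
{
    /** @var list<Fiber> */
    private static array $ready = [];

    public static function spawn(callable $fn): void
    {
        self::$ready[] = new Fiber($fn);
    }

    /** Park the current fiber until $f resolves; must be called inside a spawned fiber. */
    public static function await(Future $f): mixed
    {
        if (!$f->done) {
            $f->waiters[] = Fiber::getCurrent();
            Fiber::suspend();                 // back to run(), the only resumer
        }
        return $f->value;
    }

    public static function resolve(Future $f, mixed $value): void
    {
        $f->done = true;
        $f->value = $value;
        foreach ($f->waiters as $fiber) {
            self::$ready[] = $fiber;          // make waiters runnable again
        }
        $f->waiters = [];
    }

    public static function run(): void
    {
        while (self::$ready) {
            $fiber = array_shift(self::$ready);
            if ($fiber->isStarted()) {
                $fiber->resume();
            } else {
                $fiber->start();
            }
        }
    }
}

$log = [];
$response = new Future();

MiniScheduler::spawn(function () use ($response, &$log) {
    $log[] = 'reader: waiting';
    $data = MiniScheduler::await($response); // "blocks" only this coroutine
    $log[] = "reader: got $data";
});

MiniScheduler::spawn(function () use ($response, &$log) {
    $log[] = 'writer: resolving';
    MiniScheduler::resolve($response, 'data');
});

MiniScheduler::run();
echo implode("\n", $log), "\n";
```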

The only question is backward compatibility. If, for example, it is agreed
that the necessary changes will be made in Revolt when this feature is
released and we do not support the old behavior, then there is no problem.

>
>  Maybe that is what we need: to be able to return control both to the
> parent fiber for custom logic that might be needed, and to the Scheduler so
> that the language would be concurrent.
>

100% yes.

>
>  As for userland event loops, like Revolt, I am not so sure they fit with
> the new language level async model.
>

Revolt can be adapted to this RFC by modifying the Driver module. I
actually reviewed its code again today to assess the complexity of this
change. It looks like it shouldn’t be difficult at all.

The only problem arises with the code that has already been written and is
publicly available. I know that the AMPHP stack is in use, so we need a
*flow* that ensures a smooth transition.

As I understand it, you believe that it’s better to introduce more radical
changes and not be afraid of breaking old code. In that case, there are no
questions at all.



Re: [PHP-DEV] PHP True Async RFC

2025-03-09 Thread Daniil Gentili

>> The wait_all block is EXPLICITLY DESIGNED to meddle with the internals of 
>>async libraries,
>>
>
> How exactly does it interfere with the implementation of asynchronous 
> libraries? 
> Especially considering that these libraries operate at the User-land level? 
> It’s a contract. No more. No less.


You have a construct that forces all code within it to terminate all running 
fibers.

If any library invoked within a wait_all block suddenly decides to spawn a 
long-running fiber that is not stopped when exiting the block, but for example 
later, when the library itself decides to, the wait_all block will not exit, 
essentially forcing the library user or developer to mess with the internals and 
forcefully terminate the background fiber.

The choice should never be up to the caller, and the presence of the wait_all 
block gives any caller the option to break the internal logic of libraries.

I can give you several examples where such logic is used in Amphp libraries, 
and it will break if they are invoked within an async block.

>>  Libraries can full well handle cleanup of fibers in __destruct by 
>>themselves, without a wait_all block forcing them to reduce concurrency 
>>whenever the caller pleases.
>>
> Fiber is a *final* class, so there can be no destructors here. Even if you 
> create a "Coroutine" class and allow defining a destructor, the result will 
> be overly verbose code. I and many other developers have tested this.

You misunderstand: this is about storing the FiberHandles of spawned fibers and 
awaiting them in the __destruct of an object (the same object that spawned them 
in a method), in order to make sure all spawned fibers are awaited and all 
unhandled exceptions are handled somewhere (in absence of an event loop error 
handler).
Also see my discussion about ignoring referenced futures: 
https://externals.io/message/126537#126661

>
>>
>>  It is, imo, a MAJOR FOOTGUN, and should not be even considered for 
>>implementation.
>>   
>
> Why exactly is this a FOOTGUN?
>
> * Does this block lead to new violations of language integrity?
> * Does this block increase the likelihood of errors?

1) Yes, because it gives users tools to mess with the internal behavior of 
userland libraries
2) Yes, because (especially given how it's named) accidental usage will break 
existing and new async libraries by endlessly awaiting upon background fibers 
when exiting an async {} block haphazardly used by a newbie when calling most 
async libraries, or even worse force library developers to reduce concurrency, 
killing async PHP just because users can use async {} blocks.


> A FOOTGUN is something that significantly breaks the language and pushes 
> developers toward writing bad code. This is a rather serious flaw.

Indeed, this is precisely the case.

As the maintainer of Psalm, among others, I fully understand the benefits of 
purity and immutability: however, this keyword is a toy exercise in purity, 
with no real use cases (all real use cases being already covered by awaitAll), 
which cannot work in the real world in current codebases and will break 
real-world applications if used, with consequences on the ecosystem.

I don't know what else to say on the topic, I feel like I've made myself clear 
on the matter: if you still feel like it's a good idea and it should be added 
to the RFC as a separate poll, I can only hope that the majority will see the 
danger of adding such a useless keyword and vote against on that specific 
matter.

Regards,
Daniil Gentili.


Re: [PHP-DEV] Re: PHP True Async RFC

2025-03-09 Thread Iliya Miroslavov Iliev
Edmond,
The language barrier is significant (on my side; I cannot explain it properly),
so I will keep it simple. Having "await" makes it sync, not async. In hardware
we use interrupts, but we have to do it the old-fashioned way: the main loop
checks variables that are set by the interrupt handlers, which is what makes it
async. So you have a main loop that checks a variable, but that variable is set
by another part of the processor cycle that has nothing to do with the main
loop (it is not fire-and-forget style; it happens in real time). Basically, you
can have a standard `int main()` function that is sync, because you can delay
in it (yep, sleep(0)), and while it is blocked, an event interrupts it and runs
a function that works on another register, independent of the main function.
More details would probably not be interesting, so I will stop. If you want to
make async PHP with multiple processes, you have to guard shared variables with
semaphores to make it work.

On Sun, Mar 9, 2025 at 8:16 PM Edmond Dantes wrote:

> [...]


-- 
Iliya Miroslavov Iliev
i.mirosla...@gmail.com


Re: [PHP-DEV] Re: PHP True Async RFC

2025-03-09 Thread Edmond Dantes
> Edmond,
>


> If you want to make async PHP with multiple processes you have to check
> variables semaphored to make it work.
>
>
Hello, Iliya.

Thank you for your feedback. I'm not sure I fully understood the entire
context, but here is how I see it.

At the moment, I have no intention of adding multitasking to PHP in the
same way it works in Go.

Therefore, code will not require synchronization. The current RFC proposes
adding only asynchronous execution. That means each thread will have its
own event loop, its own memory, and its own coroutines.

P.S. I also know Russian and a bit of asm.
Ed.



Re: [PHP-DEV] Re: PHP True Async RFC

2025-03-09 Thread Rob Landers
On Sun, Mar 9, 2025, at 14:17, Rowan Tommins [IMSoP] wrote:
> On 08/03/2025 20:22, Edmond Dantes wrote:
> >
> > For coroutines to work, a Scheduler must be started. There can be only 
> > one Scheduler per OS thread. That means creating a new async task does 
> > not create a new Scheduler.
> >
> > Apparently, async {} in the examples above is the entry point for the 
> > Scheduler.
> >
> 
> I've been pondering this, and I think talking about "starting" or 
> "initialising" the Scheduler is slightly misleading, because it implies 
> that the Scheduler is something that "happens over there".
> 
> It sounds like we'd be writing this:
> 
> // No scheduler running, this is probably an error
> Async\runOnScheduler( something(...) );
> 
> Async\startScheduler();
> // Great, now it's running...
> 
> Async\runOnScheduler( something(...) );
> 
> // If we can start it, we can stop it I guess?
> Async\stopScheduler();
> 
> 
> But that's not what we're talking about. As the RFC says:
> 
> > Once the Scheduler is activated, it will take control of the 
> Null-Fiber context, and execution within it will pause until all Fibers, 
> all microtasks, and all event loop events have been processed.
> 
> The actual flow in the RFC is like this:
> 
> // This is queued somewhere special, ready for a scheduler to pick it up later
> Async\enqueueForScheduler( something(...) );
> 
> // Only now does anything actually run
> Async\runSchedulerUntilQueueEmpty();
> // At this point, the scheduler isn't running any more
> 
> // If we add to the queue now, it won't run unless we run another scheduler
> Async\enqueueForScheduler( something(...) );
> 
> 
> Pondering this, I think one of the things we've been missing is what 
> Unix[-like] systems call "process 0". I'm not an expert, so I may get 
> details wrong, but my understanding is that if you had a single-tasking 
> OS, and used it to bootstrap a Unix[-like] system, it would look 
> something like this:
> 
> 1. You would replace the currently running single process with the new 
> kernel / scheduler process
> 2. That scheduler would always start with exactly one process in the 
> queue, traditionally called "init"
> 3. The scheduler would hand control to process 0 (because it's the only 
> thing in the queue), and that process would be responsible for starting 
> all the other processes in the system: TTYs and login prompts, network 
> daemons, etc
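To make the "start with exactly one task" idea concrete, here is a minimal sketch on top of PHP's existing `Fiber` class (PHP 8.1+). `MiniScheduler`, `spawn()`, `yieldControl()` and `run()` are hypothetical names chosen for illustration, not the RFC's API. The queue starts with a single "init" task, init spawns the others, and `run()` only returns once nothing is left to execute, so init finishing does not end the program.

```php
<?php
// Hypothetical illustration only: MiniScheduler is not part of the RFC or of PHP.
final class MiniScheduler
{
    /** @var Fiber[] FIFO queue of runnable fibers */
    private array $queue = [];
    /** @var string[] trace of what ran, for demonstration */
    public array $log = [];

    public function spawn(callable $task): void
    {
        // Each task receives the scheduler so it can spawn more tasks or yield.
        $this->queue[] = new Fiber(fn () => $task($this));
    }

    public function yieldControl(): void
    {
        // Must be called from inside a task: hands control back to run().
        Fiber::suspend();
    }

    public function run(callable $init): void
    {
        // Like booting a Unix-like system: exactly one task ("init") to begin with.
        $this->spawn($init);
        while ($this->queue) {
            $fiber = array_shift($this->queue);
            if (!$fiber->isStarted()) {
                $fiber->start();
            } else {
                $fiber->resume();
            }
            if (!$fiber->isTerminated()) {
                $this->queue[] = $fiber; // suspended: back to the end of the queue
            }
        }
        // Queue empty: nothing left to execute, control returns to the caller.
    }
}

$scheduler = new MiniScheduler();
$scheduler->run(function (MiniScheduler $s) {
    $s->log[] = 'init starts';
    foreach (['a', 'b'] as $name) {
        $s->spawn(function (MiniScheduler $s) use ($name) {
            $s->log[] = "$name step 1";
            $s->yieldControl();
            $s->log[] = "$name step 2";
        });
    }
    $s->log[] = 'init ends'; // init finishing does not stop the tasks it spawned
});
```

After `run()` returns, `$scheduler->log` holds the interleaving: init runs to completion first, then tasks `a` and `b` alternate at each `yieldControl()`.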

Slightly off-topic, but you may find the following article interesting: 
https://manybutfinite.com/post/kernel-boot-process/

It's a bit old, but probably still relevant for the most part. At least for x86.

— Rob

Re: [PHP-DEV] Re: PHP True Async RFC

2025-03-09 Thread Larry Garfield
On Sun, Mar 9, 2025, at 8:17 AM, Rowan Tommins [IMSoP] wrote:

> That leaves the question of whether it would ever make sense to nest 
> those blocks (indirectly, e.g. something() itself contains an async{} 
> block, or calls something else which does).
>
> I guess in our analogy, nested blocks could be like running Containers 
> within the currently running OS: they don't actually start a new 
> Scheduler, but they mark a namespace of related coroutines, that can be 
> treated specially in some way.
>
> Alternatively, it could simply be an error, like trying to run the 
> kernel as a userland program.

Support for nested blocks is absolutely mandatory, whatever else we do.  If you 
cannot nest one async block (scheduler instance, coroutine, whatever it is) 
inside another, then basically no code can do anything async except the top 
level framework.

This function needs to be possible, and work anywhere, regardless of whether 
there's an "open" async session five calls up the stack.

function par_map(iterable $it, callable $c) {
  $result = [];
  async {
    foreach ($it as $val) {
      $result[] = $c($val);
    }
  }
  return $result;
}

However it gets spelled, the above code needs to be supported.
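The `async {}` block above is proposed syntax, not something PHP has today. As a rough approximation of the requirement, here is a sketch on plain Fibers (PHP 8.1+) in which `par_map` opens its own local "nursery": it starts one fiber per element and does not return until every one of them has terminated, so it behaves the same whether or not it is called from inside another fiber. The `Fiber::suspend()` in the callback stands in for an I/O wait, and the per-call round-robin loop is an assumption made for illustration, not the RFC's design.

```php
<?php
// Sketch: a structured par_map built on PHP 8.1 Fibers. Not the RFC's API.
function par_map(iterable $it, callable $c): array
{
    $fibers = [];
    foreach ($it as $key => $val) {
        $fiber = new Fiber($c);
        $fiber->start($val); // runs until its first suspension (or completion)
        $fibers[$key] = $fiber;
    }
    // Local "nursery": keep resuming children until all of them have finished.
    do {
        $pending = false;
        foreach ($fibers as $fiber) {
            if (!$fiber->isTerminated()) {
                $fiber->resume();
                $pending = $pending || !$fiber->isTerminated();
            }
        }
    } while ($pending);
    // No child outlives this call, so collecting results here is safe.
    return array_map(fn (Fiber $f) => $f->getReturn(), $fibers);
}

// The callback suspends once mid-way, as if waiting on I/O.
$double = function (int $x): int {
    Fiber::suspend();
    return $x * 2;
};

// Called at top level...
$top = par_map([1, 2, 3], $double);

// ...and from inside another fiber: nesting needs no special setup.
$outer = new Fiber(fn () => par_map([10, 20], $double));
$outer->start();
$nested = $outer->getReturn();
```

Because `Fiber::suspend()` always suspends the innermost running fiber, the children hand control back to `par_map` rather than to whatever outer fiber happens to be running, which is what makes the nesting work.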

--Larry Garfield


Re: [PHP-DEV] PHP True Async RFC

2025-03-09 Thread Larry Garfield
On Sun, Mar 9, 2025, at 11:56 AM, Edmond Dantes wrote:

> *Let me summarize the current state for today:*
>
>  1. I am abandoning `startScheduler` and the idea of preserving 
> backward compatibility with `await_all` or anything else in that 
> category. The scheduler will be initialized implicitly, and this does 
> not concern user-land. Consequently, the `spawn function()` code will 
> work everywhere and always.
>
>  2. I will not base the implementation on `Fiber` (perhaps only on the 
> low-level part). Instead of `Fiber`, there will be a separate class. 
> There will be no changes to `Fiber` at all. This decision follows the 
> principle of Win32 COM/DCOM: old interfaces should never be changed. If 
> an old interface needs modification, it should be given a new name. 
> This should have been done from the start.
>
>  3. I am abandoning low-level objects in PHP-land (FiberHandle, 
> SocketHandle etc). Over time, no one has voted for them, which means 
> they are unnecessary. There might be a low-level interface for 
> compatibility with Revolt.
>
>  4. It might be worth restricting microtasks in PHP-land and keeping 
> them only for C code. This would simplify the interface, but we need to 
> ensure that it doesn’t cause any issues.  
>
>
> The remaining question on the agenda: deciding which model to choose — 
> *parent-child* or the *Go-style model*.

As noted, I am in broad agreement with the previously linked article on 
"playpens" (even if I hate that name), that the "go style model" is too 
analogous to goto statements.

Basically, this is asking "so do we use gotos or for loops?"  To which the 
answer is, I hope obviously, for loops.

Offering both, frankly, undermines the whole point of having structured, 
predictable concurrency.  The entire goal of that is to be able to know if 
there's some stray fiber running off in the background somewhere still doing 
who knows what, manipulating shared data, keeping references to objects, and 
other nefarious things.  With a nursery, you don't have that problem... *but 
only if you remove goto*.  A language with both a for loop and an arbitrary 
goto statement gets basically no systemic benefit from having the for loop, 
because neither developers nor compilers get any guarantees of what will or 
won't happen.

Especially when, as demonstrated, the "this can run in the background and I 
don't care about the result" use case can be solved more elegantly with nested 
blocks and channels, and in a way that, in practice, would probably get 
subsumed into DI Containers eventually so most devs don't have to worry about 
it.
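The "background work plus channels" pattern can also be approximated with Fibers today. In this sketch (PHP 8.1+; `Channel` and `run_scope()` are hypothetical names, not an RFC or PHP API), `Channel` is an unbounded queue whose `receive()` suspends the calling fiber while the channel is empty, and a producer and a consumer run as sibling fibers inside one structured scope that is driven to completion before `run_scope()` returns.

```php
<?php
// Hypothetical sketch: a fiber-aware unbounded channel. Not an RFC or PHP API.
final class Channel
{
    private array $buffer = [];
    private bool $closed = false;

    public function send(mixed $value): void
    {
        if ($this->closed) {
            throw new LogicException('send on closed channel');
        }
        $this->buffer[] = $value;
    }

    public function close(): void
    {
        $this->closed = true;
    }

    /** Returns the next value, or null once the channel is closed and drained. */
    public function receive(): mixed
    {
        while (!$this->buffer) {
            if ($this->closed) {
                return null;
            }
            Fiber::suspend(); // let sibling fibers run, then check again
        }
        return array_shift($this->buffer);
    }
}

/** Runs the tasks round-robin and returns only when all of them have finished. */
function run_scope(callable ...$tasks): void
{
    $fibers = array_map(fn (callable $t) => new Fiber($t), $tasks);
    foreach ($fibers as $f) {
        $f->start();
    }
    do {
        $pending = false;
        foreach ($fibers as $f) {
            if (!$f->isTerminated()) {
                $f->resume();
                $pending = $pending || !$f->isTerminated();
            }
        }
    } while ($pending);
}

$ch = new Channel();
$received = [];

run_scope(
    function () use ($ch) { // producer
        foreach ([1, 2, 3] as $n) {
            $ch->send($n);
            Fiber::suspend(); // simulate work between sends
        }
        $ch->close();
    },
    function () use ($ch, &$received) { // consumer
        while (($v = $ch->receive()) !== null) {
            $received[] = $v;
        }
    },
);
```

Two simplifications worth noting: `null` doubles as the "closed" signal, so this channel cannot carry `null` values, and a consumer waiting on a channel that nobody ever closes would keep the scope spinning forever.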

Of interest along similar lines are Rust and... PHP. 

Rust's whole thing is memory safety.  The language simply will not let you 
write memory-unsafe code, even if it means the code is a bit more verbose as a 
result.  In exchange for the borrow checker, you get enough memory guarantees 
to write extremely safe parallel code.  However, the designers acknowledge that 
occasionally you do need to turn off the checker and do something manually... 
in very edge-y cases in very small blocks set off with the keyword "unsafe".  
Viz, "I know what I'm doing is stupid, but trust me."  The discouragement of 
doing so is built into the language, and tooling, and culture.

PHP... has a goto operator.  It was added late, kind of as a joke, but it's 
there.  However, it is not a full goto.  It can only jump within the current 
function, and only out of control structures, never into them.  It's basically a named break.  
While it only rarely has value, it's not all that harmful unless you do 
something really dumb with it.  And then it's only harmful within the scope of 
the function that uses it.  And, very very rarely, there's some 
micro-optimization to be had.  (cf, this classic: 
https://github.com/igorw/retry/issues/3).  But PHP has survived quite well for 
30 years without an arbitrary goto statement.
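For readers who have never used it, here is a small example of PHP's restricted `goto` acting as a named break: jumping *out* of two nested loops to a label later in the same function is allowed, whereas jumping *into* a loop or switch, or out of the function, is rejected at compile time. The function name `find_pair` is purely illustrative.

```php
<?php
// Returns the first pair of values in $xs that sums to $target, or null.
function find_pair(array $xs, int $target): ?array
{
    foreach ($xs as $i => $a) {
        foreach ($xs as $j => $b) {
            if ($i < $j && $a + $b === $target) {
                $pair = [$a, $b];
                goto found; // leaves both loops at once, like a named break
            }
        }
    }
    return null;

found:
    return $pair;
}
```

The jump never leaves the function, which is why any damage stays local to it.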

So if we start from a playpen-like, structured concurrency assumption, which 
(as demonstrated) gives us much more robust code that is easier to follow and 
still covers nearly all use cases, there are two questions to answer:

1. Is there still a need for an "unsafe {}" block or in-function goto 
equivalent?
2. If so, what would that look like?

I am not convinced of 1 yet, honestly.  But if it really is needed, we should 
be targeting the most tightly controlled option possible to allow for those edge 
cases.  A quick-n-easy "I'mma violate the structured concurrency guarantees, 
k?" undermines the entire purpose of structured concurrency.

> During our discussion, everything seems to be converging on the idea 
> that the changes introduced by the RFC into `Fiber` would be better 
> moved to a separate class. This would reduce confusion between the old 
> and new solutions. That way, developers wouldn't wonder why `Fiber` and 
> coroutines behave differently—they are simply different classes.
> The new *Coroutine* class could have a different interface w

Re: [PHP-DEV] RFC: short and inner classes

2025-03-09 Thread Rob Landers
On Thu, Mar 6, 2025, at 09:04, Tim Düsterhus wrote:
> Hi
> 
> Am 2025-03-06 07:23, schrieb Rob Landers:
> > So, technically, they aren’t required to be in the same RFC; but also, 
> > they complement each other very well.
> 
> They really should be separate RFCs then. Your RFC text acknowledges 
> that in the very first sentence: “two significant enhancements to the 
> language”. Each individual proposal likely has sufficient bike-shedding 
> potential on its own and discussion will likely get messy, because one 
> needs to closely follow which of the two proposals an argument relates 
> to.

I put a lot of thought into this issue off and on, all day. I've decided to 
remove short syntax from the RFC and focus on inner classes. If this passes, 
then I will propose it as a separate RFC. Introducing them concurrently makes 
little sense in light of the feedback I have gotten so far, and it is turning 
out that there is much more to discuss than I initially expected.

Thus, I will skip replying about short classes.

> 
> As for the “Inner classes” proposal:
> 
> - “abstract is not allowed as an inner class cannot be parent classes.” 
> - Why?

This is mostly a technical reason, as I was unable to determine a grammar rule 
that didn't result in ambiguity. Another reason is to ensure encapsulation and 
prevent usages outside their intended scope. We can always add it later.

> - “type hint” - PHP does not have type hints, types are enforced. You 
> mean “Type declaration”.

Thank you for pointing this out! I learned something new today! I've updated 
the RFC.

> - “this allows you to redefine an inner class in a subclass, allowing 
> rich hierarchies” - The RFC does not specify if and how this interacts 
> with the LSP checks.

It doesn't affect LSP. I've updated the RFC accordingly.

On Thu, Mar 6, 2025, at 20:08, Niels Dossche wrote:
> Hi Rob
> 
> Without looking too deep (yet) into the details, I'm generally in favor of 
> the idea.
> What I'm less in favor of is the implementation choice to expose the inner 
> class as a property/const and using a fetch mode to grab it.
> That feels quite weird to me honestly. How did you arrive at this choice?
> 
> Kind regards
> Niels

It's a slightly interesting story about how I arrived at this particular 
implementation. If you noticed the branch name, this is the second 
implementation. The first implementation used a dedicated list on the 
class-entry for inner classes. Since I wanted to prevent static properties/constants 
from being declared with the same name, I had just set it to a string of the 
full class name as a placeholder. That implementation also required some pretty 
dramatic OPcache changes, which I didn't like. At one point, I went to add the 
first test that did `new Outer::Inner()` and the test passed... 

You can imagine my surprise to see a test pass that I had expected to fail, and 
it was then that I went into the details of what was going on. Any `new 
ClassName` essentially results in the following AST:

ZEND_AST_NEW
-- ZEND_AST_ZVAL
-- "ClassName"
-- (... args)

The original grammar, at the time, was to reuse the existing static property 
access AST until I could properly understand OPcache/JIT. My change had 
resulted in (approximately) this AST:

ZEND_AST_NEW
-- ZEND_AST_ZVAL
-- ZEND_AST_STATIC_PROP
-- "Outer::Inner"
-- (... args)

This effectively resulted in emitting opcodes that found the prop + string 
value I happened to put there as a placeholder until I figured out a better 
solution, handling autoloading properly and everything. This pretty much 
negated all efforts up to that point, and I was stunned.

So, I branched off from an earlier point and eventually wrote the version you 
see today. It's 1000x simpler and faster than the original implementation 
(literally), since it uses all pre-existing (optimized) infrastructure instead 
of creating entirely new infrastructure. It doesn't have to check another 
hashmap (which is slow) for static props vs. constants vs. inner classes. 

In essence, while the diff can be improved further, it is quite simple; the 
core of it is less than 500 lines of code.

I'd recommend leaving any comments about the PR on the PR itself (or via 
private email if you'd prefer that). I'm by no means an expert on this code 
base, and if it is not what you'd expect, being an expert yourself, I'd love to 
hear any suggestions for improvements or other approaches.

On Thu, Mar 6, 2025, at 20:33, Larry Garfield wrote:
> My biggest concern with this is that it makes methods and short-classes 
> mutually incompatible.  So if you have a class that uses short-syntax, and as 
> it evolves you realize it needs one method, sucks to be you, now you have to 
> rewrite basically the whole class to a long-form constructor.  That sucks 
> even more than rewriting a short-lambda arrow function to a long-form 
> closure, except without the justification of capture semantics.

I literally fell out of my chair laugh