Hi internals,

I've created a new RFC https://wiki.php.net/rfc/deque to add a `final class 
Deque`

This is based on the `Teds\Deque` implementation I've worked on
for the https://github.com/TysonAndre/pecl-teds PECL.

While `SplDoublyLinkedList` and its subclass `SplQueue`/`SplStack` exist in the 
SPL, they have several drawbacks
that are addressed by this RFC to add a `Deque` class (to use instead of those):

1. `SplDoublyLinkedList` is internally represented by a doubly linked list,
   making it use roughly twice as much memory as the proposed `Deque`
2. `push`/`pop`/`unshift`/`shift` from `SplDoublyLinkedList` are slower due to
   needing to allocate or free the linked list nodes.
3. Reading values in the middle of the `SplDoublyLinkedList` is proportional to 
the length of the list,
   due to needing to traverse the linked list nodes.
4. `foreach` Iteration behavior cannot be understood without knowing what 
constructed the
   `SplDoublyLinkedList` instance or set the flags.

It would be useful to have an efficient `Deque` container in the standard 
library
to provide an alternative without those drawbacks,
as well as for the following reasons:

1. To save memory in applications or libraries that may need to store many 
lists of values or run for long periods of time.
   Notably, PHP's `array` type will never release allocated capacity.
   See https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html
2. To provide a better alternative to `SplDoublyLinkedList`, `SplStack`, and 
`SplQueue`
   for use cases that require stacks or queues.
3. As a more efficient option than `array` and `SplDoublyLinkedList`
   as a queue or `Deque`, especially for `unshift`.

A `Deque` is more efficient than an `array` when used as a queue, more 
readable, and easier to use correctly.
While it is possible to efficiently remove elements from the start of an 
`array` (in terms of insertion order),
it is very inefficient to prepend elements to the start of a large `array` due 
to needing to either copy the array
or move all elements in the internal array representation,
and an `array` would use much more memory than a `Deque` when used that way 
(and be slower).

There are also several pitfalls to using an array as a queue for larger queue 
sizes,
some of which are not obvious and discovered while writing the benchmarks.
(Having a better (double-ended) queue datastructure (`Deque`) than the 
`SplDoublyLinkedList`
would save users from needing to write code with these pitfalls):

1. `array_key_first()` takes time proportional to the number of elements 
`unset` from the start of an array,
   causing it to unexpectedly be extremely slow (quadratic time) after 
unsetting many elements at the start of the queue.
   (when the array infrequently runs out of capacity, buckets are moved to the 
front)
2. `reset()` or `end()` will convert a variable to a reference,
   and php is significantly less efficient at reading or writing to reference.
   Opcache is also less efficient at optimizing uses of variables using 
references.
3. More obviously, `array_unshift` and `array_shift` will take time 
proportional to the number of elements in the array
   (to reindex and move existing/remaining elements).

After the discussion period ends, I currently plan to start voting on the 
`Deque` RFC and await the results
to determine next steps for the `Vector` RFC.

----

The thread for my other open proposal `final class Vector` 
(https://externals.io/message/116048) has prior discussion on implementation 
choices,
naming choices, and motivation for adding datastructures to php-src.

- E.g. the question of why we should add general-purpose datastructures
  to php-src itself rather than have users rely on PECLs,
  and why this proposal doesn't and can't use `php-ds`/`ext-ds`.

Thanks,
Tyson

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php

Reply via email to