On Fri, Sep 17, 2021, at 8:49 PM, tyson andre wrote:
>
> > Improving collection/set operations in PHP is something near and dear to my heart,
> > so I'm in favor of adding a Vector class or similar to the stdlib.
> >
> > However, I am not a fan of this particular design.
> >
> > As Levi noted, this being a mutable object that passes by handle is asking for trouble.
> > It should either be some by-value internal type, or an immutable object with evolver methods on it.
> > (E.g., add($val): Vector). Making it a mutable object is creating spooky action at a distance problems.
> > An immutable object seems likely easier to implement than a new type,
> > but both are beyond my capabilities so I defer to those who could do so.
>
> https://wiki.php.net/rfc/vector#adding_a_native_type_instead_is_vec
> discusses why I'm doubtful of `is_vec` getting implemented or passing.
> Especially with `add()` taking linear time to copy all elements of the
> existing value if you mean an array rather than a linked list-like
> structure, and any referenced copies taking a lot more memory than an
> imperative version would.
>
> PHP's end users and internals members come from a wide variety of backgrounds,
> and I assume most beginning or experienced PHP programmers would tend
> towards imperative&mutable programming rather than functional&immutable
> programming.
>
> PHP provides tools such as `clone`, private visibility, etc to deal with that.
>
> The lack of any immutable object datastructures in core and the lack of
> immutable focused extensions in PECL
> https://pecl.php.net/package-search.php?pkg_name=immutable
> https://www.php.net/manual-lookup.php?pattern=immutable&scope=quickref
> (other than DateTimeImmutable)
> heavily discourage me from proposing anything immutable.
>
> (Technically, https://github.com/TysonAndre/pecl-teds has minimal
> implementations of immutable data structures, but the api is still
> being revised and Vector is the primary focus, followed by iterable
> functions. e.g. there's no `ImmutableSequence::add($value):
> ImmutableSequence` method.)
>
> > The methods around size control are seemingly pointless from a user POV.
>
> setSize is useful in allocating exactly the variable amount of memory
> needed while using less memory than a PHP array.
> `setSize($newSize, 0)` would be much more efficient and concise in
> initializing the value.
>
> - Or in quickly reducing the size of the array rather than repeatedly
>   calling pop in a loop.
>
> And while methods around capacity control exist in many other
> programming languages, they aren't used by most users and most users
> are fine with functionality they don't use existing.
> The applications or libraries that do have a good use case to reduce
> memory will take advantage of them and end users of those
> applications/libraries would benefit from the memory usage reduction.
>
> > I understand the memory optimization value they have, but that's not
> > something PHP developers are at all used to dealing with.
> > That makes it less of a convenient drop-in replacement for array and more
> > just another user-space collection object, but in C with internals endorsement.
> > If such logic needs to be included, it should be kept as minimalist as
> > possible for usability,
> > even at the cost of a little memory usage in some cases.
>
> If the functionality was just a drop-in replacement for array, others
> may say "why not just use array and the array libraries?" (or Vector).
> With the strategy of doubling capacity, it can be up to 99% more memory
> than needed in some cases (Even more wastage after shrinking from the
> maximum size).
>
> > There is no reason to preserve keys.
> > A Vector or list type should not have user-defined keys.
> > It should just be a linear list. If you populate it from an existing
> > array/iterable, the keys should be entirely ignored.
> > If you care about keys you want a HashMap or Dictionary or similar (which
> > we also desperately need in the stdlib, but that's a separate thing).
>
> The behavior is similar to
> https://www.php.net/manual/en/splfixedarray.fromarray.php
> It tries to preserve the keys, and fills in gaps with null.
>
> 1. There's the consistency with existing functionality such as
>    SplFixedArray::fromArray, or existing constructors preserving keys.
> 2. And I'd imagined that a last minute objection of "Wait, `new
>    SplFixedArray([1 => 'second', 0 => 'first'])` does what by default?
>    Isn't this using the keys 0 and 1?", and the same for gaps
>
> I was considering only having the no-param constructor, and adding
> the static method fromValues(iterable $it) to make it clearer keys are
> ignored.
>
> > Whether or not contains() needs a comparison callback in my mind depends
> > mainly on whether or not the operator overloading RFC passes.
> > If it does, then contains() can/should use the __compareTo() method on objects.
> > If it doesn't, then there needs to be some other way to compare
> > non-identical objects or else that method becomes mostly useless.
>
> There's a distinction between needs and very nice to have - a contains
> check for some predicate on a Vector can be done with a userland helper
> method and a foreach.
>
> Also, you're requesting functionality that I don't believe is currently
> available for arrays, either.
>
> > To echo Pierre, a Vector needs to be of a single guaranteed type.
> > Yes, this gets us back to the generics conversation again, but I presume
> > (perhaps naively?) there are ways to address this question without getting
> > into full-blown generics.
>
> Yep, as you said, this type is mixed, just like the SplFixedArray,
> ArrayObject, values of SplObjectStorage/WeakMap, etc.
> Generic support is something that's been brought up before,
> investigated, then abandoned.
>
> My concerns with adding StringVector, MixedVector, IntVector,
> FloatVector, BoolVector, ArrayVector (confusing), ObjectVector, etc is
> that
>
> - I doubt many people would agree that there's a wide use case for any
>   specific one of them compared to a vector of any type.
>
>   This would be even harder to argue for than just a single Vector type.
> - Mixes of null and type `T` might make sense in many cases (e.g.
>   optional objects, statistics that failed to get computed, etc) but
>   would be forbidden by that
> - It would be a bad choice if generic support did get added in the
>   future.
>
> I'm not sure if we're thinking of the same thing.
> Could you provide more details on how that would be implemented? Have
> other PECLs done something similar?
>
> > But really, a non-type-guaranteed Vector/List construct is of fairly little
> > use to me in practice, and that's before we even get into the potential
> > performance optimizations for map() and filter() from type guarantees.
>
> See earlier comments on `vec`/Generics not being actively worked on
> right now and probably being a far way away from an implementation that
> would pass a vote.
>
> As for optimizations, opcache currently doesn't optimize individual
> global functions (let alone methods), it optimizes opcodes.
> Even array_map()/array_filter() aren't optimized, they call callbacks
> in an ordinary way.
> E.g. https://github.com/php/php-src/pull/5588 or
> https://externals.io/message/109847 regarding ordinary methods.
>
> Aside: In the long term, I think the opcache core team had a long-term
> plan of changing the intermediate representation to make these types of
> optimizations feasible without workarounds like the one I proposed in
> 5588
>
> > I can write a type-guaranteed user-space class that does what I need in
> > under 10 minutes, and for most low cardinality sets that's adequately
> > performant. A built-in needs to be better than that.
> >
> > I very much appreciate the chicken-and-egg challenge of wanting to get
> > something useful in despite the absence of a larger plan, and also the
> > challenge of getting buy-in on a larger plan.
> > Really. :-) This is an area where PHP's current dev process is very lacking.
> > Still, I also agree with others that we need to be thinking holistically
> > about this problem space, which will inform what the steps are.
> > The approach we took for enums could be a model to consider (multiple RFCs
> > clustered together under an RFC "epic".)
> > That would allow for a long-term design, and the influence that offers,
> > while still having milestones along the way that offer value unto
> > themselves. (I'm happy to help with that, since that's about all I'm good
> > for around here. :-) )
>
> Enums were extensions of existing class types (is_object(Suit::Hearts)
> is true) rather than adding a whole separate type to the type system
> and don't need to support generics or contain anything other than an
> int/string.
> I don't think the choice of "epic" widely influenced the vote.

Rather than go point by point, I'm going to respond globally here.

I am frequently on-record hating on PHP arrays, and stating that I want something better. The problems with PHP arrays include:

1. They're badly performing (because they cannot be optimized)
2. They're not type safe
3. They're mutable
4. They mix sequences (true arrays) with dictionaries/hashmaps, making everything uglier
5. People keep using them as structs, when they're not
6. The API around them is procedural, inconsistent, and overall gross
7. They lack a lot of native shorthand operations found in other languages (e.g., slicing)
8. Their error handling is crap

Any new native/stdlib alternative to arrays needs to address at least half of those issues, preferably most/all.

This proposal addresses the first point and... that's it. Point 5 is sort of covered by virtue of being out of scope, so maybe this covers 1.5 out of 8. That's insufficient to be worth the effort to support and deal with in code. That makes this approach a strong -1 for me.

"Fancy algorithms are slow when n is small, and n is usually small." -- Rob Pike

That some of the design choices here mirror existing poor implementations is not an endorsement of them. I don't think I've seen anyone on this list say anything good about SPL beyond iterators and autoloading, so it's not really a good model to emulate.

Additionally, please don't play into the trope about procedural/mutable code being more beginner friendly. That's not the case, beyond being a self-fulfilling prophecy. (If we teach procedural/mutable code first, then most beginners will be most proficient in procedural/mutable code.) I would argue that, on the whole, immutable values make code easier to reason about and write once you get past trivially small sizes. We do new developers a gross disservice by treating immutability as an "advanced" technique, when it should really be the default, beginner technique taught from day one.

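As a rough illustration of the immutable "evolver" style mentioned in the quoted text (add($val): Vector), a type-checked sequence in user space could be sketched along these lines. The class name TypedSequence and its methods are hypothetical, written only for this example; they are not taken from the RFC, from the slides linked in this message, or from any PECL package, and the sketch assumes PHP 8.0 or later.

```php
<?php
declare(strict_types=1);

// Hypothetical illustration only: not part of the RFC, PHP core, or any PECL package.
final class TypedSequence implements IteratorAggregate, Countable
{
    /** @var list<object> values in insertion order; input keys are discarded */
    private array $values = [];

    public function __construct(private string $type, iterable $values = [])
    {
        foreach ($values as $value) {
            $this->values[] = $this->check($value); // keys of $values are ignored
        }
    }

    // Evolver: returns a new sequence with $value appended; $this is left unchanged.
    public function add(mixed $value): static
    {
        $new = clone $this; // the private array is copied by value
        $new->values[] = $this->check($value);
        return $new;
    }

    public function getIterator(): ArrayIterator
    {
        return new ArrayIterator($this->values);
    }

    public function count(): int
    {
        return count($this->values);
    }

    // Rejects anything that is not an instance of the configured class.
    private function check(mixed $value): object
    {
        if (!$value instanceof $this->type) {
            throw new TypeError(
                "Expected {$this->type}, got " . get_debug_type($value)
            );
        }
        return $value;
    }
}

// Usage: add() leaves the original sequence untouched.
$empty = new TypedSequence(DateTimeImmutable::class);
$one   = $empty->add(new DateTimeImmutable('2021-09-17'));
```

Here count($empty) stays 0 while count($one) is 1, and passing a value of the wrong type to add() throws a TypeError. Each add() copies the backing array on write, which is the linear-time cost raised in the quoted reply, so this shows only the API shape and says nothing about performance.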
I am not aware of any PECL implementations of lists that have type safety, because I don't use many PECL packages. However, in user space it's quite simple to do:

https://presentations.garfieldtech.com/slides-never-use-arrays/phpkonf2021/#/5/2

See that slide and scroll down for additional examples. Every one of those examples took me less than 5 minutes to write. If we want to have a better alternative in core, it needs to be *at least* as capable as what I can throw together in 5 minutes. The proposal as-is is not even as capable as those examples.

--Larry Garfield

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php