On Sat, Dec 5, 2020 at 12:25 AM Larry Garfield <la...@garfieldtech.com>
wrote:

> Greetings, denizens of Internals!
>
> Ilija Tovilo and I have been working for the last few months on adding
> support for enumerations and algebraic data types to PHP.  This is a
> not-small task, so we've broken it up into several stages.  The first
> stage, unit enumerations, are just about ready for public review and
> discussion.
>
> The overarching plan (for context, NOT the thing to comment on right now)
> is here: https://wiki.php.net/rfc/adts
>
> The first step, for unit enumerations, is here:
>
> https://wiki.php.net/rfc/enumerations
>
> There's still a few bits we're sorting out and the implementation is
> mostly done, but not 100% complete.  Still, it's far enough along to start
> a discussion on and get broader feedback on the outstanding nits.
>
> I should note that while the design has been collaborative, credit for the
> implementation goes entirely to Ilija.  Blame for any typos in the RFC
> itself go entirely to me.
>
> *dons flame-retardant suit*
>

Thanks for the proposal Ilija and Larry! Enums are long overdue. Some
initial thoughts, in no particular order:

* I think that serialization support needs to be part of the initial
proposals. Otherwise you wouldn't be able to store an enum in a property
(without poisoning the whole object graph as "non-serializable"). I expect
that a clean solution to this will require a new serialization modifier,
but don't think this should be overly hard to add. This should also not
introduce any serialization format compatibility concerns as long as it is
introduced in the same version that enums are, as any payloads using the
new format would only be meaningful on PHP versions that support enums.

* As "enum" becomes a reserved keyword, you can' have an interface called
"Enum"... If you wanted to, you could probably avoid a reserved keyword by
taking a page out of C++'s book and making the syntax "enum class Foo {}"
rather than "enum Foo {}", where "enum" would be a contextual keyword. I
think this is worth at least considering, because I expect that there's a
lot of existing enum libraries that will break due to this reserved
keyword. While they can now be replaced by native enums, this will cause
issues during migration and with code that is compatible with more than one
PHP version.

* Rather than WeakMap, the possibly more natural choice for using enum keys
is SplObjectStorage. Of course, SplObjectStorage, like anything that is
part of SPL, has some peculiarities... Of course, just allowing them as
array keys would be ideal, but I agree that this should not be covered by
this RFC. This is something I may look into.

* While not mentioned in the RFC, you mentioned in this discussion that
enum cases cannot be stored in constants:

> At present, no.  They're "just" objects, and you can't assign an object
to a constant.  Unfortunately I'm not sure how to enable that without
making them not-objects, which introduces all sorts of other complexity.

This should at the least be clarified in the RFC. Is this a limitation for
constants specifically, or anything using constexpr initializers? For
example, is writing "function foo(MyEnum $e = MyEnum::SomeDefaultCase)"
possibly? It would be a significant limitation if it weren't.

Generally, I think the limitation on objects in constants is mostly
artificial and you should consider lifting it as part of this RFC.
Previously you simply couldn't create an object in a constexpr initializer,
so supporting this wasn't really relevant. With enums, this becomes
important.

* This has already been mentioned by others, but (in conjunction with the
previous point), I think that allowing class constants on enums is pretty
useful to allow case aliases. I agree that cases should be unique to start
with, but aliases should be possible, and class constants provide a very
neat way to provide this.

* While I originally liked the idea, after perusing the examples in the
RFC, I am not convinced that it is a good idea to allow methods (or
anything else) to be defined on a per-case level. Having methods on the
enum itself makes sense, but having them on each case seems like it
unnecessarily complicates the design, gives people multiple ways to write
the same thing and may encourage bad design.

I think there are two primary ways in which methods might be used: First,
defining a method for each case, such as your example in
https://wiki.php.net/rfc/enumerations#advanced_exclusive_values. In this
case you have the choice of either defining it on each case, or to define
it as a method on the whole enum. When would you choose one approach over
the other?

Defining the method on the whole enum seems generally superior to me,
because it guarantees that all cases have the method from an API
perspective (rather than just making it an incidental fact -- though I
guess you could add an abstract method to the enum?) Additionally it
requires a lot less code, especially if match is used. The example in the
RFC is even a bit skewed, because once PSR gets its dirty fingers on this
feature, all those "{" will get broken out on a new line and it will take
even more code.

The other usage is if a method is only defined by *some* of the cases, as
in the https://wiki.php.net/rfc/enumerations#state_machine example. This is
something I find very dubious from a design perspective, and not something
I would like to enable by making it simpler to implement. Do you know if
other languages have precedent for methods on individual enum cases?

* On the implementation side, a general concern I have is that this
requires generating a new class not just for each enum, but for each enum
case. Some usages of enums (say lexer tokens, AST node kinds, etc) may
require hundreds of enum cases, and will generate hundreds of separate
classes. Unlike objects, class entries are not cheap and are not designed
to be cheap.

* As another implementation note, existing switch/match jumptable
optimizations will not work for enums. This is pretty unfortunate, but I
don't have an idea on how we could make them work.

* I find the automatic downcast of enums to their scalar values a bit
problematic when taking the overall direction of the language into account.
We want less implicit casts, not more. While I'm sure this will work nicely
in some cases, it certainly won't in others. I daresay that passing an enum
to the $offset parameter of substr() doesn't make sense regardless of
whether the enum has an int backing it or not. Explicitly requiring a
->value() call doesn't seem like an undue burden to me.

Regards,
Nikita

Reply via email to