On Thu, 30 Mar 2023 at 03:50, Mark Baker <m...@demon-angel.eu> wrote:

> On 28/03/2023 00:36, G. P. B. wrote:
> > Hello internals,
> >
> > While working on analysing the impact of the changes proposed by amending
> > the behaviour of the increment and decrement operators (
> > https://wiki.php.net/rfc/saner-inc-dec-operators) I discovered that the
> > range() function has some rather lax behaviour that is very unintuitive.
> >
> > I therefore propose the "Define proper semantics for range() function" RFC
> > to address the unintuitive behaviour that sees no usage and/or hides bugs:
> > https://wiki.php.net/rfc/proper-range-semantics
> >
> > The changes propose throwing TypeErrors and ValueErrors for cases that I
> > couldn't find in the wild and that hide bugs, and emitting E_WARNINGs
> > for cases that are hard to detect via static analysis.
>
> Unlike your changes to the increment operator, I'd love to see this
> rationalisation put in place, though like many here I don't see problems
> with using a negative step with decreasing ranges, but would consider it
> strange for increasing ranges.


I still find it somewhat odd, but this is not a hill I'm going to die on.
I've changed the behaviour to throw a ValueError if a negative step is
provided with an increasing range, and to accept negative steps for
decreasing ranges.
Furthermore, I've also made passing an empty string an E_WARNING with a
cast to 0, same as the current behaviour.
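
As a rough illustration of the step rule described above, here is a
userland model for integer bounds only. It is a sketch, not the RFC's
implementation: checked_int_range() is a made-up name, the exception
message is invented, and it assumes PHP 8.0+ for the ValueError class.

```php
<?php
// Hypothetical userland model of the rule; not a PHP built-in.
function checked_int_range(int $start, int $end, int $step = 1): array
{
    // A negative step only makes sense when walking downwards.
    if ($step < 0 && $start < $end) {
        throw new ValueError('A negative step cannot be used with an increasing range');
    }

    // range() already walks downwards when $start > $end, so only the
    // magnitude of the step matters once the check above has passed.
    return range($start, $end, abs($step));
}

echo implode(', ', checked_int_range(5, 1, -2)), PHP_EOL; // 5, 3, 1
```

Calling checked_int_range(1, 5, -1) throws in this model, while
checked_int_range(1, 5, 2) returns 1, 3, 5 as usual.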

See new version:
https://wiki.php.net/rfc/proper-range-semantics


> And I do want to see some
> case-consistency when working with string ranges.
>
>
> I'd love to see it taken a stage (or two) further; returning an iterable
> rather than an array (although that would be a bc break); and working
> with strings (ASCII only) in the same way that the increment operator
> does, so that range('A', 'IV') would be valid, and return `Z` then `AA`,
> `AZ` then `BA`, etc.
>

Frankly, I was also surprised that the behaviour with strings is to do an
ASCII code point increment.
I agree that range("Y", "AC") returning ["Y", "Z", "AA", "AB", "AC"] would
have been more intuitive than silently discarding everything past the
first byte.
However, I don't think there is much point in breaking BC to return a
generator or to fix the unfortunate string behaviour.
I would rather PHP gained dedicated syntax for creating ranges (e.g.
$s..$e, which seems to be what most other programming languages settle on,
although it might be confused with concatenation), à la Ruby, which
allows objects that implement certain methods to be used to generate
ranges as well.
This is IMHO way more powerful, as it would allow the creation of Date
ranges or other custom ranges.
Part of such a proposal could be to support the aforementioned
alphabetical string ranges natively, without needing to break BC on
range(), and to let this function just fade away into obscurity.
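
For reference, the alphabetical behaviour discussed above can already be
approximated in userland. The following is only a sketch under stated
assumptions: alpha_range() is a hypothetical helper, it handles
upper-case ASCII letters and increasing ranges only, and it relies on the
string form of the ++ operator (on PHP 8.3+ str_increment() would be the
explicit spelling).

```php
<?php
// Hypothetical helper for illustration; not part of PHP.
function alpha_range(string $start, string $end): array
{
    // Restrict the bounds to non-empty upper-case ASCII letters so that
    // ++ always yields the next "spreadsheet column" style string.
    foreach ([$start, $end] as $bound) {
        if (preg_match('/^[A-Z]+\z/', $bound) !== 1) {
            throw new InvalidArgumentException('Bounds must be upper-case ASCII letters');
        }
    }

    // Such strings are visited by ++ in (length, byte-wise) order, so
    // checking that order up front guarantees the loop terminates.
    $startIsAfterEnd = strlen($start) > strlen($end)
        || (strlen($start) === strlen($end) && strcmp($start, $end) > 0);
    if ($startIsAfterEnd) {
        throw new InvalidArgumentException('Only increasing ranges are sketched here');
    }

    $result = [];
    for ($current = $start; ; $current++) {
        $result[] = $current;
        if ($current === $end) {
            break;
        }
    }

    return $result;
}

echo implode(', ', alpha_range('Y', 'AC')), PHP_EOL; // Y, Z, AA, AB, AC
```

The array is built eagerly to mirror range(); a generator would be the
natural choice if laziness mattered.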

There is also this C++ talk from over a decade ago that argues that ranges
are better than iterators, so this might be an additional motivation as to
why we would want this:
https://accu.org/conf-docs/PDFs_2009/AndreiAlexandrescu_iterators-must-go.pdf


> I am slightly surprised that you make no mention of the odd behaviour of
> mixed alphameric strings, e.g. var_dump(range('A1', 'C5')) which returns
> a purely alpha array 'A' to 'C'; or var_dump(range('3c', '5e')) which
> returns numeric (3, 4, 5); or var_dump(range('1', '1e2')) which treats
> `1e2` as scientific and returns 1..100.
>

Because I didn't think of it: this is just the usual numeric string
behaviour, or the non-numeric string behaviour that truncates the string.
But the fact that range('3c', '5e') is the only way to get an array of
digits as strings makes me want to shout into the abyss.
I'm not sure it is really worth mentioning those cases, but I can add
examples of them to the RFC after crying about the even more insane
behaviour range() currently has.
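
To make the numeric versus non-numeric distinction concrete, the sketch
below runs the bounds from the examples above through standard PHP
functions. It deliberately does not call range(), whose output for these
inputs depends on the PHP version, and it makes no claim about what
range() does internally; describe_bound() is a made-up helper name.

```php
<?php
// Hypothetical helper for illustration; it only reports how PHP's general
// string rules see a value, not how range() is implemented.
function describe_bound(string $value): array
{
    return [
        'numeric'    => is_numeric($value), // fully numeric string, e.g. "1e2"
        'int'        => (int) $value,       // leading-numeric prefix, else 0
        'first_byte' => $value[0],          // what survives a one-byte truncation
    ];
}

foreach (['A1', 'C5', '3c', '5e', '1', '1e2'] as $bound) {
    $info = describe_bound($bound);
    printf(
        "%-4s numeric=%-5s int=%-3d first_byte=%s\n",
        $bound,
        var_export($info['numeric'], true),
        $info['int'],
        $info['first_byte']
    );
}
```

Here '1e2' is the only fully numeric string apart from '1', '3c' and '5e'
merely start with a digit, and 'A1' and 'C5' have no numeric prefix at
all.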

Best regards,

George P. Banyard
