On Fri, Jul 29, 2022 at 7:15 AM mickmackusa <mickmack...@gmail.com> wrote:

>
>
> On Monday, July 25, 2022, Guilliam Xavier <guilliam.xav...@gmail.com>
> wrote:
>
>> On Sat, Jul 9, 2022 at 1:56 AM mickmackusa <mickmack...@gmail.com> wrote:
>>
>>> I've discovered that several native string functions offer a character
>>> mask
>>> as a parameter.
>>>
>>> I've laid out my observations at
>>> https://stackoverflow.com/q/72865138/2943403
>>>
>>
>> Out of curiosity, why do you say that strtr() is "not a good candidate
>> because character order matters" (although you give a reasonable example)?
>> Maybe you have some counter-example?
>>
>> Regards,
>>
>> --
>> Guilliam Xavier
>>
>
> I prefer to keep my scope very tight when posting on Stack Overflow.
>
> My focus was purely on enabling character range syntax for native
> functions with character mask parameters.  My understanding of character
> masks in PHP requires single-byte characters and no meaning to character
> order.
>
> When strtr() is fed two strings, they cannot be considered "character
> masks" because the character orders matter.
>
> If extending character range syntax to parameters which are not character
> masks, I might support the feature for strtr(), but ensuring that the two
> strings are balanced will be made more difficult with ranged syntax.
> strtr() will silently condone imbalanced strings.  https://3v4l.org/PY15F
>

Thanks for the clarifications. You're right that the internal
`php_charmask` converts a character list (possibly containing one or more
ranges) into a 256-char *mask*, thus "losing" any original order; so
strtr() couldn't use the same implementation, even without ranges. A
counter-example where order matters is `strtr('adobe', 'abcde', 'ebcda')`,
which gives "edoba". And if ranges were parsed the `php_charmask` way,
`strtr('adobe', 'a..e', 'e..a')` would trigger a Warning "Invalid
'..'-range, '..'-range needs to be incrementing" for the decrementing
'e..a'.
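
To illustrate the point about order being lost, here is a minimal userland
sketch. `build_mask()` is a hypothetical helper that only models the idea of
a 256-entry mask with simplified incrementing '..' ranges; it is not PHP's
internal code and skips the warnings the real function emits.

```php
<?php
// Hypothetical model of a 256-entry character mask (NOT the php_charmask source).
function build_mask(string $chars): array
{
    $mask = array_fill(0, 256, false);
    $len = strlen($chars);
    for ($i = 0; $i < $len; $i++) {
        $c = ord($chars[$i]);
        // Simplified rule: treat "x..y" as a range only when y >= x.
        if ($i + 3 < $len && $chars[$i + 1] === '.' && $chars[$i + 2] === '.'
            && ord($chars[$i + 3]) >= $c) {
            for ($b = $c; $b <= ord($chars[$i + 3]); $b++) {
                $mask[$b] = true;
            }
            $i += 3;
            continue;
        }
        $mask[$c] = true;
    }
    return $mask;
}

// The same set of bytes in a different order yields an identical mask.
var_dump(build_mask('abcde') === build_mask('edcba')); // bool(true)
var_dump(build_mask('a..e') === build_mask('abcde'));  // bool(true)

// strtr() pairs characters by position, so order changes the result.
echo strtr('adobe', 'abcde', 'ebcda'), "\n"; // edoba
echo strtr('adobe', 'abcde', 'edcba'), "\n"; // eboda
```

Since the two masks compare equal, nothing built on such a mask could tell
'ebcda' from 'edcba', which is exactly what strtr() needs to do.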

I had seen a parallel with the Unix `tr` command, which *does* support
incrementing ranges: both `echo adobe | tr abcde ABCDE` and `echo adobe |
tr a-e A-E` give "ADoBE", while `echo adobe | tr abcde edcba` gives
"eboda" but `echo adobe | tr a-e e-a` errors with "range-endpoints of
'e-a' are in reverse collating sequence order". But indeed, its
implementation doesn't use character masks (
https://github.com/coreutils/coreutils/blob/master/src/tr.c). It also
behaves differently from strtr() when the second set is shorter: `echo
abracadabra | tr a-f x` gives "xxrxxxxxxrx" (the second set is padded with
its last character), not "xbrxcxdxbrx". And it supports more things, like
POSIX character classes...
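
For comparison, a rough PHP sketch of the `tr`-style approach: expand each
range in order, pad the second set with its last character, then translate
by position. `expand_ranges()` and `tr_like()` are hypothetical names, the
'a-e' syntax is borrowed from `tr`, and character classes, escapes and
multibyte input are all ignored here.

```php
<?php
// Hypothetical helper: expand "a-e" style ranges, preserving order.
function expand_ranges(string $set): string
{
    $out = '';
    $len = strlen($set);
    for ($i = 0; $i < $len; $i++) {
        if ($i + 2 < $len && $set[$i + 1] === '-') {
            $from = ord($set[$i]);
            $to = ord($set[$i + 2]);
            if ($from > $to) {
                // Like tr, refuse decrementing ranges.
                throw new InvalidArgumentException("range '{$set[$i]}-{$set[$i + 2]}' is not incrementing");
            }
            $out .= implode('', array_map('chr', range($from, $to)));
            $i += 2;
        } else {
            $out .= $set[$i];
        }
    }
    return $out;
}

// Hypothetical helper: translate by position after expanding both sets.
function tr_like(string $input, string $set1, string $set2): string
{
    $from = expand_ranges($set1);
    $to = expand_ranges($set2);
    if ($to === '') {
        return $input; // simplification: nothing to translate to
    }
    // Pad the second set with its last character, as GNU tr does.
    $to = str_pad($to, strlen($from), $to[strlen($to) - 1]);
    return strtr($input, $from, $to);
}

echo tr_like('adobe', 'a-e', 'A-E'), "\n";           // ADoBE
echo tr_like('abracadabra', 'a-f', 'x'), "\n";       // xxrxxxxxxrx
echo strtr('abracadabra', 'abcdef', 'x'), "\n";      // xbrxcxdxbrx
```

The last line shows plain strtr() truncating to the shorter string instead
of padding, which is where the "xbrxcxdxbrx" result comes from.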

PS: I find the `strtr(string $string, array $replace_pairs)` form generally
superior to the `strtr(string $string, string $from, string $to)` one
anyway ;)
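
A small sketch of why I'd say that (the sample strings are made up for
illustration): in the array form each key carries its own replacement, so
the two sides cannot get out of balance, and keys may be longer than one
character.

```php
<?php
// String form: the longer argument is silently cut to the shorter one's length.
echo strtr('adobe', 'abcde', 'eb'), "\n"; // edobe (only a→e and b→b apply)

// Array form: every key is explicitly paired with its replacement.
echo strtr('adobe', ['a' => 'e', 'e' => 'a']), "\n"; // edoba

// Multi-character keys work, and replaced text is not scanned again.
echo strtr('Hi all, I said Hello', ['Hi' => 'Hello', 'Hello' => 'Hi']), "\n";
// Hello all, I said Hi
```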

Regards,

-- 
Guilliam Xavier
