Re: [PHP-DEV] [RFC] Desire to move RFC add_str_begin_and_end_functions to a vote

will Fri, 28 Jun 2019 13:55:49 -0700

These are good points. Originally my RFC called for less functions butbased on feedback I added the others. My proposal: take the RFC as-is toa vote. If it fails, I will raise another RFC for a vote that will justcontain the two basic functions: str_begins and str_ends.


Thanks,


Will

On 2019-06-22 15:56, Ben Ramsey wrote:

On Jun 22, 2019, at 10:32, Nikita Popov <nikita....@gmail.com> wrote:
On Thu, Jun 20, 2019 at 12:32 AM <w...@wkhudgins.info> wrote:
I sent this earlier this week without [RFC] in the subjectline...since
some people might have filters to check the subject line I wanted to
send this again with the proper substring in the subject line–to makeit
clear I intend to take this to a vote in two weeks. Apologies for the
duplicate email.

-Will
On 2019-06-18 14:45, w...@wkhudgins.info wrote:
Hello all,
I submitted this RFC several years ago. I collected a lot offeedbackand I have updated the RFC and corresponding github patch. Pleasesee
the RFC at https://wiki.php.net/rfc/add_str_begin_and_end_functions
and the github patch at https://github.com/php/php-src/pull/2049. I
have addressed many concerns
(order of arguments, name of functions, multibye support, etc). Iplan
to move this RFC to a vote in the coming weeks.

Thanks,

Will
Unfortunately, this looks like a case where the RFC feedback has madethe
proposal worse, rather than better :(
I think it's easier to start with what I think this proposal shouldbe:There should be just two functions, str_starts_with() andstr_ends_with()
-- and that's it.
The important realization to have here is that these functions are abit of
sugar for an operation that is quite common, but can also be easily
implemented with existing functions (using strcmp, strpos or substr,
depending on what you like). There is no need for us to cover every
conceivable combination, just make the common case more convenient and
easier to read.

With that in mind:
* I believe the "starts with" and "ends with" naming is a lot more
canonical, used by Python, Ruby, Java, JavaScript and probably lotsmore.* In my experience case-insensitive "i" variants of strings functionsare
used much less, by an order of magnitude. With this being sugar in the
first place, I don't think there's a need to cover case-insensitive
variations (and from a quick look, these don't seem to be first class
methods in other languages either). If we do want to have them, I'dsuggestmaking the names str_starts_with_ci() and str_ends_with_ci(), which ismore
obvious and harder to miss than str_istarts_with() etc.
* Having mb_* variants of these functions doesn't really make sense. I
realize that there's this knee-jerk reaction about how if it doesn'thave"mb" in the name it's not Unicode compatible, but in this case it'sevenmore wrong than usual. The normal str_starts_with() function isperfectly
safe to use on UTF-8 strings, the only difference between it and
mb_str_starts_with() is that it's going to be implemented a lot more
efficiently. The only case that *might* make some sense is the
case-insensitive variant here, because that has some genuine relianceonthe character encoding. But then again, this can be handled bycase-foldingthe strings first, something that mbstring is going to do internallyanyway.
I would happily accept a proposal for str_starts_with() +str_ends_with(),
but I'm a lot more apprehensive about adding these 8 new functions.

Regards,
Nikita
I like the idea of simplifying this to the two functions
str_starts_with() and str_ends_with().

When I was looking through this the other day, I had trouble coming up
with an example of a string with the mb_* versions would ever generate
a different result from the non-multibyte versions, since the
implementation only needs to count and analyze bytes for uniqueness.
Perhaps it would only be an issue with the case-insensitive versions,
as Nikita points out? If so, can someone provide some example strings
where an mb_starts_with_ci() would return true, while
str_starts_with_ci() would return false?

I think the case sensitivity versions would be common enough in use
cases (i.e. looking to see if a path ends with .CSV vs. .csv, etc.),
but maybe the signatures could be revised to pass a third parameter?

str_starts_with($haystack, $needle, $case_sensitive = true): bool

-Ben


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Re: [PHP-DEV] [RFC] Desire to move RFC add_str_begin_and_end_functions to a vote

Reply via email to