On Fri, Jun 28, 2019 at 10:54 PM <w...@wkhudgins.info> wrote:

> These are good points. Originally my RFC called for less functions but
> based on feedback I added the others. My proposal: take the RFC as-is to
> a vote. If it fails, I will raise another RFC for a vote that will just
> contain the two basic functions: str_begins and str_ends.
>

To put my comments into more actionable form, here is what I would
recommend for this RFC:

 * Rename str_begins -> str_starts_with, str_ends -> str_ends_with,
str_ibegins -> str_starts_with_ci, str_iends -> str_ends_with_ci. As
mentioned before, this is standard terminology used by many, many
programming languages and it would be great if PHP did not deviate from
convention without strong reason.
 * Have a separate vote (in the same RFC) for the addition of the
corresponding mb_* variants.

I believe doing those two changes will ensure that the core part of the RFC
passes. I personally would be voting yes on the first part and no on the
second, but others may decide as they see fit.

Nikita


> On 2019-06-22 15:56, Ben Ramsey wrote:
> >> On Jun 22, 2019, at 10:32, Nikita Popov <nikita....@gmail.com> wrote:
> >>
> >>> On Thu, Jun 20, 2019 at 12:32 AM <w...@wkhudgins.info> wrote:
> >>>
> >>> I sent this earlier this week without [RFC] in the subject
> >>> line...since
> >>> some people might have filters to check the subject line I wanted to
> >>> send this again with the proper substring in the subject line–to make
> >>> it
> >>> clear I intend to take this to a vote in two weeks. Apologies for the
> >>> duplicate email.
> >>>
> >>> -Will
> >>>
> >>>> On 2019-06-18 14:45, w...@wkhudgins.info wrote:
> >>>> Hello all,
> >>>>
> >>>> I submitted this RFC several years ago. I collected a lot of
> >>>> feedback
> >>>> and I have updated the RFC and corresponding github patch. Please
> >>>> see
> >>>> the RFC at https://wiki.php.net/rfc/add_str_begin_and_end_functions
> >>>> and the github patch at https://github.com/php/php-src/pull/2049. I
> >>>> have addressed many concerns
> >>>> (order of arguments, name of functions, multibye support, etc). I
> >>>> plan
> >>>> to move this RFC to a vote in the coming weeks.
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Will
> >>>
> >>
> >> Unfortunately, this looks like a case where the RFC feedback has made
> >> the
> >> proposal worse, rather than better :(
> >>
> >> I think it's easier to start with what I think this proposal should
> >> be:
> >> There should be just two functions, str_starts_with() and
> >> str_ends_with()
> >> -- and that's it.
> >>
> >> The important realization to have here is that these functions are a
> >> bit of
> >> sugar for an operation that is quite common, but can also be easily
> >> implemented with existing functions (using strcmp, strpos or substr,
> >> depending on what you like). There is no need for us to cover every
> >> conceivable combination, just make the common case more convenient and
> >> easier to read.
> >>
> >> With that in mind:
> >> * I believe the "starts with" and "ends with" naming is a lot more
> >> canonical, used by Python, Ruby, Java, JavaScript and probably lots
> >> more.
> >> * In my experience case-insensitive "i" variants of strings functions
> >> are
> >> used much less, by an order of magnitude. With this being sugar in the
> >> first place, I don't think there's a need to cover case-insensitive
> >> variations (and from a quick look, these don't seem to be first class
> >> methods in other languages either). If we do want to have them, I'd
> >> suggest
> >> making the names str_starts_with_ci() and str_ends_with_ci(), which is
> >> more
> >> obvious and harder to miss than str_istarts_with() etc.
> >> * Having mb_* variants of these functions doesn't really make sense. I
> >> realize that there's this knee-jerk reaction about how if it doesn't
> >> have
> >> "mb" in the name it's not Unicode compatible, but in this case it's
> >> even
> >> more wrong than usual. The normal str_starts_with() function is
> >> perfectly
> >> safe to use on UTF-8 strings, the only difference between it and
> >> mb_str_starts_with() is that it's going to be implemented a lot more
> >> efficiently. The only case that *might* make some sense is the
> >> case-insensitive variant here, because that has some genuine reliance
> >> on
> >> the character encoding. But then again, this can be handled by
> >> case-folding
> >> the strings first, something that mbstring is going to do internally
> >> anyway.
> >>
> >> I would happily accept a proposal for str_starts_with() +
> >> str_ends_with(),
> >> but I'm a lot more apprehensive about adding these 8 new functions.
> >>
> >> Regards,
> >> Nikita
> >
> >
> > I like the idea of simplifying this to the two functions
> > str_starts_with() and str_ends_with().
> >
> > When I was looking through this the other day, I had trouble coming up
> > with an example of a string with the mb_* versions would ever generate
> > a different result from the non-multibyte versions, since the
> > implementation only needs to count and analyze bytes for uniqueness.
> > Perhaps it would only be an issue with the case-insensitive versions,
> > as Nikita points out? If so, can someone provide some example strings
> > where an mb_starts_with_ci() would return true, while
> > str_starts_with_ci() would return false?
> >
> > I think the case sensitivity versions would be common enough in use
> > cases (i.e. looking to see if a path ends with .CSV vs. .csv, etc.),
> > but maybe the signatures could be revised to pass a third parameter?
> >
> > str_starts_with($haystack, $needle, $case_sensitive = true): bool
> >
> > -Ben
>

Reply via email to