On Fri, Jun 28, 2019 at 10:54 PM <w...@wkhudgins.info> wrote: > These are good points. Originally my RFC called for less functions but > based on feedback I added the others. My proposal: take the RFC as-is to > a vote. If it fails, I will raise another RFC for a vote that will just > contain the two basic functions: str_begins and str_ends. >
To put my comments into more actionable form, here is what I would recommend for this RFC: * Rename str_begins -> str_starts_with, str_ends -> str_ends_with, str_ibegins -> str_starts_with_ci, str_iends -> str_ends_with_ci. As mentioned before, this is standard terminology used by many, many programming languages and it would be great if PHP did not deviate from convention without strong reason. * Have a separate vote (in the same RFC) for the addition of the corresponding mb_* variants. I believe doing those two changes will ensure that the core part of the RFC passes. I personally would be voting yes on the first part and no on the second, but others may decide as they see fit. Nikita > On 2019-06-22 15:56, Ben Ramsey wrote: > >> On Jun 22, 2019, at 10:32, Nikita Popov <nikita....@gmail.com> wrote: > >> > >>> On Thu, Jun 20, 2019 at 12:32 AM <w...@wkhudgins.info> wrote: > >>> > >>> I sent this earlier this week without [RFC] in the subject > >>> line...since > >>> some people might have filters to check the subject line I wanted to > >>> send this again with the proper substring in the subject line–to make > >>> it > >>> clear I intend to take this to a vote in two weeks. Apologies for the > >>> duplicate email. > >>> > >>> -Will > >>> > >>>> On 2019-06-18 14:45, w...@wkhudgins.info wrote: > >>>> Hello all, > >>>> > >>>> I submitted this RFC several years ago. I collected a lot of > >>>> feedback > >>>> and I have updated the RFC and corresponding github patch. Please > >>>> see > >>>> the RFC at https://wiki.php.net/rfc/add_str_begin_and_end_functions > >>>> and the github patch at https://github.com/php/php-src/pull/2049. I > >>>> have addressed many concerns > >>>> (order of arguments, name of functions, multibye support, etc). I > >>>> plan > >>>> to move this RFC to a vote in the coming weeks. > >>>> > >>>> Thanks, > >>>> > >>>> Will > >>> > >> > >> Unfortunately, this looks like a case where the RFC feedback has made > >> the > >> proposal worse, rather than better :( > >> > >> I think it's easier to start with what I think this proposal should > >> be: > >> There should be just two functions, str_starts_with() and > >> str_ends_with() > >> -- and that's it. > >> > >> The important realization to have here is that these functions are a > >> bit of > >> sugar for an operation that is quite common, but can also be easily > >> implemented with existing functions (using strcmp, strpos or substr, > >> depending on what you like). There is no need for us to cover every > >> conceivable combination, just make the common case more convenient and > >> easier to read. > >> > >> With that in mind: > >> * I believe the "starts with" and "ends with" naming is a lot more > >> canonical, used by Python, Ruby, Java, JavaScript and probably lots > >> more. > >> * In my experience case-insensitive "i" variants of strings functions > >> are > >> used much less, by an order of magnitude. With this being sugar in the > >> first place, I don't think there's a need to cover case-insensitive > >> variations (and from a quick look, these don't seem to be first class > >> methods in other languages either). If we do want to have them, I'd > >> suggest > >> making the names str_starts_with_ci() and str_ends_with_ci(), which is > >> more > >> obvious and harder to miss than str_istarts_with() etc. > >> * Having mb_* variants of these functions doesn't really make sense. I > >> realize that there's this knee-jerk reaction about how if it doesn't > >> have > >> "mb" in the name it's not Unicode compatible, but in this case it's > >> even > >> more wrong than usual. The normal str_starts_with() function is > >> perfectly > >> safe to use on UTF-8 strings, the only difference between it and > >> mb_str_starts_with() is that it's going to be implemented a lot more > >> efficiently. The only case that *might* make some sense is the > >> case-insensitive variant here, because that has some genuine reliance > >> on > >> the character encoding. But then again, this can be handled by > >> case-folding > >> the strings first, something that mbstring is going to do internally > >> anyway. > >> > >> I would happily accept a proposal for str_starts_with() + > >> str_ends_with(), > >> but I'm a lot more apprehensive about adding these 8 new functions. > >> > >> Regards, > >> Nikita > > > > > > I like the idea of simplifying this to the two functions > > str_starts_with() and str_ends_with(). > > > > When I was looking through this the other day, I had trouble coming up > > with an example of a string with the mb_* versions would ever generate > > a different result from the non-multibyte versions, since the > > implementation only needs to count and analyze bytes for uniqueness. > > Perhaps it would only be an issue with the case-insensitive versions, > > as Nikita points out? If so, can someone provide some example strings > > where an mb_starts_with_ci() would return true, while > > str_starts_with_ci() would return false? > > > > I think the case sensitivity versions would be common enough in use > > cases (i.e. looking to see if a path ends with .CSV vs. .csv, etc.), > > but maybe the signatures could be revised to pass a third parameter? > > > > str_starts_with($haystack, $needle, $case_sensitive = true): bool > > > > -Ben >