> On 27 Jun 2024, at 12:31, Mike Schinkel <m...@newclarity.net> wrote:
> 
>> On Jun 26, 2024, at 8:14 AM, Gina P. Banyard <intern...@gpb.moe> wrote:
>> 
>> 
>> On Wednesday, 26 June 2024 at 06:18, Mike Schinkel <m...@newclarity.net> 
>> wrote:
>>> https://3v4l.org/RDYFs#v8.3.8
>>> 
>>> Note those seven use-cases are found in around the first 25 results when 
>>> searching GitHub for "strtok(".  I could probably find more if I kept 
>>> looking:
>>> 
>>> https://github.com/search?q=strtok%28+language%3APHP+&type=code
>>> 
>>> Regarding explode($delimiter, $str)[0]: unless it is to be special-cased 
>>> during compilation, it is a really inefficient way to find the substring 
>>> up to the first delimiter, especially for large strings and/or in a tight 
>>> loop where the explode is contained in a called function.
>> 
>> Then use a regex: https://3v4l.org/SGWL5
> 
> Using `preg_match()` instead of `strtok()` to process the ~4k file of commas 
> takes, on average, about the same time as using `explode()[0]`, or about 10x 
> as long as using `strtok()` (at times it got as low as 4.4x, but that was 
> rare):
> 
> https://onlinephp.io/c/e1fad
> 
> Size of file:          3972
> Number of commas:      359
> Time taken for strtok: 0.003 seconds
> Time taken for regex:  0.0307 seconds
> Times strtok() faster: 10.25
> 
>> Or a combination of strpos and substr.
> 
> 
> Using `strpos()` + `substr()` instead of `strtok()` to process the ~4k file 
> of commas took, on average, ~3x as long as using `strtok()`. I implemented a 
> class for this and tried to optimize it by using only string positions and 
> not copying the string repeatedly. It also took about 1/2 hour to get the 
> code working vs. about 15 seconds to get the code working with `strtok()`. 
> Which will most programmers prefer?
> 
> https://onlinephp.io/c/2a09f
> 
> Size of file:           3972
> Number of commas:       359
> Time for strtok:        0.0027 seconds
> Time for strpos/substr: 0.0089 seconds
> Times strtok() faster:  3.31
> 
> 
>> There are *plenty* of solutions to the specific problem you pose here, and 
>> thus many different solutions more or less appropriate.
> 
> Yes, and in all cases the existing solutions are significantly slower, except 
> one.
> 
> And that one solution that is not significantly slower is to not deprecate 
> `strtok()`.  Not to mention that not deprecating it would avoid a lot of BC 
> breakage.
> 
> -Mike

Hi All,

I do appreciate that strtok has a rather bizarre signature and usage pattern, 
and that the way subsequent calls work invites confusion. To me, though, a 
better outcome for the uses that need the repeated-call functionality would be 
to introduce a builtin `StringTokenizer` class that wraps the underlying 
strtok_r C call and uses internal state to keep track of the string being 
tokenized.
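
To make that idea concrete, here is a rough userland sketch of what such a 
class might look like. It is only an illustration: the `next()` method, its 
signature, and the constructor are my own assumptions, and it uses 
`strspn()`/`strcspn()` rather than the C call a builtin would wrap. It needs 
PHP 8.1+ because of the readonly promoted property.

```php
<?php
// Hypothetical userland sketch; no such class exists in PHP today.
final class StringTokenizer
{
    // Byte offset at which the next token search starts.
    private int $offset = 0;

    public function __construct(private readonly string $subject) {}

    // Returns the next token, or null once the string is exhausted.
    // $delimiters is a list of single-byte delimiter characters, as with strtok().
    public function next(string $delimiters): ?string
    {
        $end = strlen($this->subject);

        // Skip any run of delimiter characters, as strtok() does.
        $this->offset += strspn($this->subject, $delimiters, $this->offset);
        if ($this->offset >= $end) {
            return null;
        }

        // The token runs up to the next character found in $delimiters.
        $length = strcspn($this->subject, $delimiters, $this->offset);
        $token = substr($this->subject, $this->offset, $length);

        // Step past the token and the delimiter that ended it, never past the end.
        $this->offset = min($this->offset + $length + 1, $end);

        return $token;
    }
}
```

Because the state lives in the object rather than in a hidden global, two 
tokenizers can be interleaved without clobbering each other, which is the main 
thing the current function cannot do.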


As a "works the same" solution for grabbing the first segment of a string up to 
any of the delimiter chars, could the `strpbrk` function be extended with a 
`$before_needle` arg like the one `strstr` has? (`strstr` matches on an exact 
substring, not on any of a list of characters.)
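
For illustration, the behaviour I have in mind can be approximated in userland 
today. The function name below is made up, and returning false when none of the 
characters is found is an assumption that mirrors what `strpbrk` and `strstr` 
already do; this is a sketch, not a proposed implementation.

```php
<?php
// Hypothetical helper; strpbrk() has no $before_needle parameter today.
function strpbrk_with_before(
    string $string,
    string $characters,
    bool $before_needle = false
): string|false {
    // Length of the leading segment that contains none of $characters.
    $pos = strcspn($string, $characters);

    if ($pos === strlen($string)) {
        // None of the listed characters occurs; strpbrk() returns false here too.
        return false;
    }

    return $before_needle
        ? substr($string, 0, $pos)  // everything before the first match
        : substr($string, $pos);    // what strpbrk() returns today
}
```

Note two differences from `strtok`: a string that starts with a delimiter gives 
an empty string rather than skipping ahead, and a string with no delimiter gives 
false rather than the whole string.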




Cheers

Stephen 
