Hi Joe,
At 7/8/2004 07:07 PM -0400, [EMAIL PROTECTED] wrote:
Hello,
My name is Joe Lapp, and I have written high-speed portal-side parsers in Java for XML, HTML, and various other XML-related syntaxes (e.g. XQL).
I am planning a series of new parsing technologies that I'd like to implement in PHP. To allow my parsers to perform with high efficiency in PHP, I need two new string functions. One is identical to strpbrk() but would also take a starting-offset parameter.
Here are the two new functions:
/* strpbrk -- Returns the offset into a string of the first occurrence of any character found in a list of provided characters, optionally scanning the string starting from a provided string offset. */
strpbrk(string haystack, string char_list [, int starting_offset])
/* strnpbrk -- Returns the offset into a string of the first occurrence of a character NOT found in a list of provided characters, optionally scanning the string starting from a provided string offset. */
strnpbrk(string haystack, string char_list [, int starting_offset])
They sound useful for general purpose parsing and string manipulation...
In other words, strpbrk() would function as it does currently, but it would take a starting_offset. strnpbrk() would be almost identical to this new strpbrk(), except that it skips over characters found in the provided character list and returns the position of the first character that is not in the list.
(BTW, I'm not really fond of C-lib style cryptic names. I'd much prefer string functions with readable names that are also good mnemonics. Maybe scan_for_char() and skip_over_chars() would be better names.)
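To make the proposed semantics concrete, here is a minimal C sketch of the two scans. It is only an illustration, not an actual PHP patch: the names scan_for_char() and skip_over_chars() come from the suggestion above, while the explicit length parameter and the SCAN_NOT_FOUND sentinel are assumptions, since the proposal does not say what is returned when nothing matches. The sketch also assumes a NUL-terminated buffer with no embedded NUL bytes. Note that PHP's existing strpbrk() returns the remainder of the string starting at the match rather than an offset, so the offset-returning behaviour shown here is the proposed one.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical "no match" value; the proposal leaves this unspecified. */
#define SCAN_NOT_FOUND ((size_t)-1)

/* Offset of the first character at or after `start` that IS in char_list.
   Assumes `haystack` is NUL-terminated, `len` is its length, and it holds
   no embedded NUL bytes. */
size_t scan_for_char(const char *haystack, size_t len,
                     const char *char_list, size_t start)
{
    if (start > len)
        return SCAN_NOT_FOUND;
    /* strcspn counts the leading characters that are not in char_list. */
    size_t pos = start + strcspn(haystack + start, char_list);
    return pos < len ? pos : SCAN_NOT_FOUND;
}

/* Offset of the first character at or after `start` that is NOT in
   char_list. Same assumptions as scan_for_char(). */
size_t skip_over_chars(const char *haystack, size_t len,
                       const char *char_list, size_t start)
{
    if (start > len)
        return SCAN_NOT_FOUND;
    /* strspn counts the leading characters that are all in char_list. */
    size_t pos = start + strspn(haystack + start, char_list);
    return pos < len ? pos : SCAN_NOT_FOUND;
}
```

With these definitions, scan_for_char("ab<c>", 5, "<>", 3) yields 4 and skip_over_chars("  \tx", 4, " \t", 0) yields 3. Both scans are thin wrappers around the standard strcspn() and strspn(), which suggests the C side of such functions would be small.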
Ideally, these functions would also support a way to specify characters by their Unicode values and a way to specify a range of characters. For example, "#8230;A-Z<>" would name the ellipsis character ("#8230;"), the characters from A to Z, and the angle bracket characters.
The significance of these functions is purely processing speed. They would allow me to create high-speed parsers and distribute them as uncompiled PHP. If the functions are implemented properly, using them should produce much faster code than the equivalent compiled PHP. The starting offset is necessary to avoid creating a proliferation of substrings, which would significantly slow down parsing.
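The point about the starting offset can be sketched in C as follows. This is an illustrative example rather than code from the planned parser: count_tokens() is a hypothetical helper that walks a buffer by advancing a single offset, alternately skipping delimiters and scanning to the next one, so no substring is ever allocated or copied. It assumes a NUL-terminated input.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts runs of non-delimiter characters in `text`.
   Only the offset `pos` moves; the buffer is never sliced or copied. */
size_t count_tokens(const char *text, const char *delims)
{
    size_t pos = 0;
    size_t count = 0;

    for (;;) {
        /* Skip over any delimiter characters at the current offset. */
        pos += strspn(text + pos, delims);
        if (text[pos] == '\0')
            break;
        /* Scan forward to the next delimiter (or the end of the buffer). */
        pos += strcspn(text + pos, delims);
        count++;
    }
    return count;
}
```

For example, count_tokens("<a> <b/> text", "<> /") finds three tokens (a, b, and text). A scan function without a starting offset would force the equivalent loop in a scripting language to slice off the unprocessed remainder on every iteration, which is the cost the offset parameter is meant to avoid.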
How could PHP code using compiled function calls ever be faster than 100% compiled code?
What are the odds that we can get such functions into PHP 5? I am planning a high-speed XML filtering technology for XML-replication servers in PHP. I want to make this engine free, along with a particular application of it that I think could create a whole new mode of using the net. Speed is very important because of the amount of XML being processed. I cannot use existing XML processors for the filtering function I have in mind. In any case, these two new functions would allow people to easily create any sort of high-speed parser.
I fear that without these functions, I'd have to distribute this new server as compiled PHP and perhaps require faster server hardware (more clock cycles available to the user per unit time) than most users currently have. Maybe that's not a problem, except perhaps for my wallet. I don't know what sort of Zend license I'd require to be able to distribute free pre-compiled code.
http://pecl.php.net
I am also an experienced C/C++ programmer and can write these functions myself. Before doing so, though, I'd like to know if I should bother. Would they make it into PHP 5?
Thanks for your help! ~joe
--
Sincerely, Jason Garber
-- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php