Hello,

My name is Joe Lapp, and I have written high-speed portal-side parsers in
Java for XML, HTML, and various other XML-related syntaxes (e.g. XQL).

I am planning a series of new parsing technologies that I'd like to 
implement in PHP.  To allow my parsers to perform with high efficiency in 
PHP, I need two new string functions.  One is identical to strpbrk() but 
would also take a starting-offset parameter.  The other is its complement: 
it skips over the listed characters instead of scanning for them.

Here are the two new functions:

/* strpbrk -- Returns the offset into a string of the first occurrence of 
any character found in a list of provided characters, optionally scanning 
the string starting from a provided string offset. */

strpbrk(string haystack, string char_list [, int starting_offset])

/* strnpbrk -- Returns the offset into a string of the first occurrence of a
character NOT found in a list of provided characters, optionally scanning 
the string starting from a provided string offset. */

strnpbrk(string haystack, string char_list [, int starting_offset])

In other words, strpbrk() would function as it does currently, but it would 
take a starting_offset.  strnpbrk() would be almost identical to this new 
strpbrk(), except that it skips over characters found in the provided 
character list and returns the position of the first character that is not 
in the list.
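
To make the intended behavior concrete, here is a minimal C sketch of the two functions, built on the standard strcspn() and strspn().  It is an illustration under assumptions, not a proposed patch: the names scan_for_char() and skip_over_chars() are the readable alternatives floated in this message, the -1 "not found" return value is an assumption, and the sketch works on NUL-terminated C strings rather than PHP's length-counted strings.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: offset of the first character at or after
 * starting_offset that IS in char_list, or -1 if there is none. */
long scan_for_char(const char *haystack, const char *char_list,
                   size_t starting_offset)
{
    size_t len = strlen(haystack);
    if (starting_offset >= len)
        return -1;
    /* strcspn() counts leading characters that are NOT in char_list. */
    size_t pos = starting_offset
               + strcspn(haystack + starting_offset, char_list);
    return pos < len ? (long)pos : -1;
}

/* Hypothetical helper: offset of the first character at or after
 * starting_offset that is NOT in char_list, or -1 if there is none. */
long skip_over_chars(const char *haystack, const char *char_list,
                     size_t starting_offset)
{
    size_t len = strlen(haystack);
    if (starting_offset >= len)
        return -1;
    /* strspn() counts leading characters that ARE in char_list. */
    size_t pos = starting_offset
               + strspn(haystack + starting_offset, char_list);
    return pos < len ? (long)pos : -1;
}
```

For example, scan_for_char("<a href>", " >", 0) returns 2 (the space), and the same call with a starting offset of 3 returns 7 (the closing bracket).  No substring is ever created.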

(BTW, I'm not real fond of C-lib style cryptic names.  I'd much prefer
string functions with readable names that are also good mnemonics.
Maybe scan_for_char() and skip_over_chars() would be better names.)

Ideally, these functions would also support a way to specify characters by 
their Unicode values and a way to specify a range of characters.  For 
example, "#8230;A-Z<>" would name the ellipsis character ("#8230;"), the 
characters from A to Z, and the angle bracket characters.
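
One possible reading of that character-list syntax is sketched below in C.  The details are assumptions made for illustration only: "#<decimal digits>;" names a single code point, "x-y" names an inclusive range, any other character stands for itself, and literals in the list are single bytes.  The function names spec_contains() and read_item() are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Reads one item from the list and advances *pp past it.  An item is
 * either "#<decimal digits>;" (a code point value) or one literal byte. */
static unsigned long read_item(const char **pp)
{
    const char *p = *pp;
    if (*p == '#' && isdigit((unsigned char)p[1])) {
        char *end;
        unsigned long value = strtoul(p + 1, &end, 10);
        if (*end == ';') {
            *pp = end + 1;
            return value;
        }
    }
    /* Not a well-formed numeric item: treat the byte as a literal. */
    *pp = p + 1;
    return (unsigned char)*p;
}

/* Returns true if code point cp is named by the list, either directly
 * or by falling inside an "x-y" range. */
bool spec_contains(const char *spec, unsigned long cp)
{
    const char *p = spec;
    while (*p != '\0') {
        unsigned long lo = read_item(&p);
        unsigned long hi = lo;
        if (p[0] == '-' && p[1] != '\0') {
            p++;                    /* skip the range operator */
            hi = read_item(&p);
        }
        if (lo <= cp && cp <= hi)
            return true;
    }
    return false;
}
```

With the list "#8230;A-Z<>", this sketch accepts code point 8230, the letters A through Z, and both angle brackets, and rejects everything else, including the hyphen itself.  A real implementation would more likely compile the list once into a lookup table than re-read it for every character.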

The significance of these functions is purely processing speed.  They would 
allow me to create high-speed parsers and distribute them as uncompiled PHP.  
If the functions are implemented properly, using them should produce much 
faster code than the equivalent compiled PHP.  The starting offset is
necessary to avoid creating a proliferation of substrings that would 
significantly slow down parsing speed.
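
The point about the starting offset can be illustrated with a small C sketch of a scanning loop.  It only shows the access pattern, assuming a hypothetical count_tokens() helper and NUL-terminated input: the parser carries a single offset forward through the buffer and never copies the remainder of the string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts delimiter-separated tokens in input by
 * advancing one offset through the buffer.  No substrings are made. */
size_t count_tokens(const char *input, const char *delimiters)
{
    size_t len = strlen(input);
    size_t offset = 0;
    size_t count = 0;

    while (offset < len) {
        /* Skip over any run of delimiter characters. */
        offset += strspn(input + offset, delimiters);
        if (offset >= len)
            break;
        /* Scan for the next delimiter; everything before it is a token. */
        offset += strcspn(input + offset, delimiters);
        count++;
    }
    return count;
}
```

Without an offset parameter, each step of such a loop in PHP would need something like substr() to produce the unscanned remainder, which allocates and copies a new string per token.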

What are the odds that we can get such functions into PHP 5?  I am planning 
a high-speed XML filtering technology for XML-replication servers in PHP.  I 
want to make this engine free, along with a particular application of it
that I think could create a whole new mode of using the net.  Speed
is very important because of the amount of XML being processed.  I cannot
use existing XML processors for the filtering function I have in mind.  In
any case, these two new functions would allow people to easily create any
sort of high-speed parser.

I fear that without these functions, I'd have to distribute this new server 
as compiled PHP and perhaps require faster server hardware (more clock cycles 
available to the user per unit time) than most users currently have.  Maybe
that's not a problem, except perhaps for my wallet.  I don't know what sort
of Zend license I'd require to be able to distribute free pre-compiled code.

I am also an experienced C/C++ programmer and can write these functions 
myself.  Before doing so, though, I'd like to know if I should bother.  
Would they make it into PHP 5?

Thanks for your help!
~joe

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
