Hi Joe,

At 7/8/2004 07:07 PM -0400, [EMAIL PROTECTED] wrote:
Hello,

My name is Joe Lapp, and I have written high-speed portal-side parsers in
Java for XML, HTML, and various other XML-related syntaxes (e.g. XQL).

I am planning a series of new parsing technologies that I'd like to
implement in PHP.  To allow my parsers to perform with high efficiency in
PHP, I need two new string functions.  One would be identical to strpbrk()
but would also take a starting-offset parameter.

Here are the two new functions:

/* strpbrk -- Returns the offset into a string of the first occurrence of
any character found in a list of provided characters, optionally scanning
the string starting from a provided string offset. */

strpbrk(string haystack, string char_list [, int starting_offset])

/* strnpbrk -- Returns the offset into a string of the first occurrence of a
character NOT found in a list of provided characters, optionally scanning
the string starting from a provided string offset. */

strnpbrk(string haystack, string char_list [, int starting_offset])

They sound useful for general-purpose parsing and string manipulation...

In other words, strpbrk() would function as it does currently, but it would
take a starting_offset.  strnpbrk() would be almost identical to this new
strpbrk(), except that it skips over characters found in the provided
character list and returns the position of the first character that is not
in the list.
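Here is a minimal C sketch of the two proposed functions, in the spirit of the C implementation Joe offers to write. The names strpbrk_offset() and strnpbrk_offset(), the explicit length arguments, and the -1 "not found" result (standing in for PHP's FALSE) are assumptions made for illustration. They are not an existing PHP or libc API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C core of the proposed functions. Lengths are explicit so
 * that embedded NUL bytes (legal in PHP strings) are handled. Each function
 * returns the offset of the match, or -1 (assumed stand-in for FALSE). */

/* Mark every byte that appears in the character list. */
static void build_table(const char *chars, size_t nchars, unsigned char table[256])
{
    memset(table, 0, 256);
    for (size_t i = 0; i < nchars; i++)
        table[(unsigned char)chars[i]] = 1;
}

/* strpbrk with an offset: first position >= start whose byte IS in chars. */
long strpbrk_offset(const char *hay, size_t len,
                    const char *chars, size_t nchars, size_t start)
{
    unsigned char table[256];
    build_table(chars, nchars, table);
    for (size_t i = start; i < len; i++)
        if (table[(unsigned char)hay[i]])
            return (long)i;
    return -1;
}

/* strnpbrk: first position >= start whose byte is NOT in chars. */
long strnpbrk_offset(const char *hay, size_t len,
                     const char *chars, size_t nchars, size_t start)
{
    unsigned char table[256];
    build_table(chars, nchars, table);
    for (size_t i = start; i < len; i++)
        if (!table[(unsigned char)hay[i]])
            return (long)i;
    return -1;
}
```

For example, scanning "a=b;c" for "=;" from offset 2 returns 3. Each call rebuilds the 256-entry table, so a parser that scans repeatedly with the same character list would probably want to build the table once and reuse it.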

(BTW, I'm not really fond of cryptic C-library-style names.  I'd much prefer
string functions with readable names that are also good mnemonics.
Maybe scan_for_char() and skip_over_chars() would be better names.)

Ideally, these functions would also support a way to specify characters by
their Unicode values and a way to specify a range of characters.  For
example, "&#8230;A-Z<>" would name the ellipsis character ("&#8230;"), the
characters from A to Z, and the angle bracket characters.
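The following is a possible C sketch of parsing such a character list into a set of code points. It rests on several assumptions. The ellipsis is written as an HTML-style numeric reference, `&#8230;`, with a decimal code point between `&#` and `;`. `X-Y` is an inclusive range, and any other byte is a literal. Only U+0000 to U+FFFF are covered. The charset type and the cs_parse()/cs_has() names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical parser for the character-list syntax (assumed grammar):
 *   &#NNNN;  one character given by its decimal code point
 *   X-Y      every character from X through Y inclusive
 *   other    a literal character (one byte; UTF-8 is NOT decoded)
 * Only the Basic Multilingual Plane (U+0000..U+FFFF) is stored. */

#define CP_LIMIT 0x10000u

typedef struct {
    uint8_t bits[CP_LIMIT / 8];   /* one bit per code point: 8 KB */
} charset;

static void cs_add(charset *cs, uint32_t cp)
{
    if (cp < CP_LIMIT)
        cs->bits[cp >> 3] |= (uint8_t)(1u << (cp & 7));
}

int cs_has(const charset *cs, uint32_t cp)
{
    return cp < CP_LIMIT && ((cs->bits[cp >> 3] >> (cp & 7)) & 1);
}

/* Read one item at spec[*i]: "&#digits;" or a single literal byte. */
static uint32_t read_item(const char *spec, size_t *i)
{
    if (spec[*i] == '&' && spec[*i + 1] == '#') {
        size_t j = *i + 2;
        uint32_t cp = 0;
        while (spec[j] >= '0' && spec[j] <= '9') {
            if (cp < CP_LIMIT)            /* stop growing; avoids overflow */
                cp = cp * 10 + (uint32_t)(spec[j] - '0');
            j++;
        }
        if (spec[j] == ';' && j > *i + 2) {
            *i = j + 1;
            return cp;
        }
    }
    return (unsigned char)spec[(*i)++];   /* fall back to a literal byte */
}

void cs_parse(charset *cs, const char *spec)
{
    memset(cs, 0, sizeof *cs);
    size_t i = 0;
    while (spec[i] != '\0') {
        uint32_t lo = read_item(spec, &i);
        uint32_t hi = lo;
        /* '-' between two items makes a range; a trailing '-' is literal. */
        if (spec[i] == '-' && spec[i + 1] != '\0') {
            i++;
            hi = read_item(spec, &i);
        }
        for (uint32_t cp = lo; cp <= hi; cp++)   /* reversed range adds nothing */
            cs_add(cs, cp);
    }
}
```

After cs_parse() on "&#8230;A-Z<>", cs_has() is true for 8230, 'A' through 'Z', '<' and '>', and false for '-' and '&'. The bitmap keeps each membership test constant-time. Because literals are single bytes, scanning UTF-8 input against this set would still need a decoding step.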

The significance of these functions is purely processing speed.  They would
allow me to create high-speed parsers and distribute them as uncompiled PHP.
If the functions are implemented properly, using them should produce much
faster code than the equivalent compiled PHP.  The starting offset is
necessary to avoid creating a proliferation of substrings, which would
significantly slow down parsing.
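A starting offset lets a scanner advance an integer position through one buffer instead of slicing off a new substring at each step. The C standard library's strspn() and strcspn() already work this way on NUL-terminated strings. strspn() plays the skip-over role of the proposed strnpbrk(), and strcspn() plays the scan-for role of strpbrk(). The small tokenizer below is a sketch of that pattern; count_tokens() and its whitespace delimiter set are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tokenizer: counts whitespace-separated tokens and sums their
 * lengths by advancing an offset into the original buffer. No substring is
 * allocated or copied at any point. */
size_t count_tokens(const char *text, size_t *total_len)
{
    const char *delims = " \t\r\n";
    size_t pos = 0, count = 0;
    *total_len = 0;
    for (;;) {
        pos += strspn(text + pos, delims);          /* skip over delimiters */
        if (text[pos] == '\0')
            break;
        size_t len = strcspn(text + pos, delims);   /* scan to next delimiter */
        count++;
        *total_len += len;
        pos += len;                                 /* jump past the token */
    }
    return count;
}
```

For "  <a> b\tcd\n" this finds three tokens with a combined length of 6. PHP's own strspn() and strcspn() accept optional offset and length arguments, so a similar offset-based loop may be possible in userland PHP as well.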

How could PHP code using compiled function calls ever be faster than 100% compiled code?


What are the odds that we can get such functions into PHP 5?  I am planning
a high-speed XML filtering technology for XML-replication servers in PHP.  I
want to make this engine free, along with a particular application of it
that I think could create a whole new way of using the net.  Speed
is very important because of the amount of XML being processed.  I cannot
use existing XML processors for the filtering function I have in mind.  In
any case, these two new functions would allow people to easily create any
sort of high-speed parser.

I fear that without these functions, I'd have to distribute this new server
as compiled PHP and perhaps require faster server hardware (more clock cycles
available to the user per unit time) than most users currently have.  Maybe
that's not a problem, except perhaps for my wallet.  I don't know what sort
of Zend license I'd require to be able to distribute free pre-compiled code.

http://pecl.php.net

I am also an experienced C/C++ programmer and can write these functions
myself.  Before doing so, though, I'd like to know if I should bother.
Would they make it into PHP 5?

Thanks for your help!
~joe

--

Sincerely, Jason Garber

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php