[PHP-DEV] Need two simple string funcs for parsing
Hello,

My name is Joe Lapp, and I have written high-speed portal-side parsers in Java for XML, HTML, and various other XML-related syntaxes (e.g. XQL). I am planning a series of new parsing technologies that I'd like to implement in PHP. To allow my parsers to perform with high efficiency in PHP, I need two new string functions. One is identical to strpbrk() but would also take a starting-offset parameter. Here are the two new functions:

/* strpbrk -- Returns the offset into a string of the first occurrence of any character found in a list of provided characters, optionally scanning the string starting from a provided string offset. */
strpbrk(string haystack, string char_list [, int starting_offset])

/* strnpbrk -- Returns the offset into a string of the first occurrence of a character NOT found in a list of provided characters, optionally scanning the string starting from a provided string offset. */
strnpbrk(string haystack, string char_list [, int starting_offset])

In other words, strpbrk() would function as it does currently, but it would take a starting_offset. strnpbrk() would be almost identical to this new strpbrk(), except that it skips over characters found in the provided character list and returns the position of the first character that is not in the list.

(BTW, I'm not really fond of C-lib style cryptic names. I'd much prefer string functions with readable names that are also good mnemonics. Maybe scan_for_char() and skip_over_chars() would be better names.)

Ideally, these functions would also support a way to specify characters by their Unicode values and a way to specify a range of characters. For example, "#8230;A-Z<>" would name the ellipsis character ("#8230;"), the characters from A to Z, and the angle bracket characters.

The significance of these functions is purely processing speed. They would allow me to create high-speed parsers and distribute them as uncompiled PHP.
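A minimal C sketch of the two proposed functions, under stated assumptions: scanning is byte-oriented over a binary-safe buffer with an explicit length, "not found" is reported as -1, and "x-y" in the character list names an inclusive byte range. The names scan_for_char() and skip_over_chars() are the readable names suggested above; the helper functions are hypothetical, and the "#NNNN;" Unicode notation is not handled. Each call builds a 256-entry lookup table, so the scan itself costs one table lookup per byte.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mark every byte named in char_list. A sequence "x-y" marks the
 * inclusive byte range x..y (an empty range if x > y); a '-' that
 * cannot form a range, e.g. at the end of the list, is literal. */
static void build_char_table(const char *char_list, size_t list_len,
                             bool table[256])
{
    for (int c = 0; c < 256; c++)
        table[c] = false;
    for (size_t i = 0; i < list_len; i++) {
        unsigned char lo = (unsigned char)char_list[i];
        if (i + 2 < list_len && char_list[i + 1] == '-') {
            unsigned char hi = (unsigned char)char_list[i + 2];
            for (unsigned int c = lo; c <= hi; c++)
                table[c] = true;
            i += 2; /* skip the '-' and the range end */
        } else {
            table[lo] = true;
        }
    }
}

/* Return the offset of the first byte at or after start whose
 * membership in the table equals want_member, or -1 if none. */
static long find_first(const char *haystack, size_t len,
                       const bool table[256], size_t start, bool want_member)
{
    for (size_t i = start; i < len; i++)
        if (table[(unsigned char)haystack[i]] == want_member)
            return (long)i;
    return -1;
}

/* Like the proposed strpbrk() with an offset: first byte IN the list. */
long scan_for_char(const char *haystack, size_t len,
                   const char *char_list, size_t list_len, size_t start)
{
    bool table[256];
    build_char_table(char_list, list_len, table);
    return find_first(haystack, len, table, start, true);
}

/* Like the proposed strnpbrk(): first byte NOT in the list. */
long skip_over_chars(const char *haystack, size_t len,
                     const char *char_list, size_t list_len, size_t start)
{
    bool table[256];
    build_char_table(char_list, list_len, table);
    return find_first(haystack, len, table, start, false);
}
```

A parser would call these in a loop, feeding each returned offset back in as the next start, so no substring is created until a token actually has to be copied out. For a plain NUL-terminated list without ranges, C's own strcspn() and strspn() compute the same distances relative to a starting pointer (haystack + start).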
If the functions are implemented properly, using them should produce much faster code than the equivalent compiled PHP. The starting offset is necessary to avoid creating a proliferation of substrings that would significantly slow down parsing.

What are the odds that we can get such functions into PHP 5?

I am planning a high-speed XML filtering technology for XML-replication servers in PHP. I want to make this engine free, as well as a particular application of this engine that I think could create a whole new mode of using the net. Speed is very important because of the amount of XML being processed. I cannot use existing XML processors for the filtering function I have in mind. In any case, these two new functions would allow people to easily create any sort of high-speed parser.

I fear that without these functions, I'd have to distribute this new server as compiled PHP and perhaps require faster server hardware (more clock cycles available to the user per unit time) than most users currently have. Maybe that's not a problem, except perhaps for my wallet. I don't know what sort of Zend license I'd require to be able to distribute free precompiled code.

I am also an experienced C/C++ programmer and can write these functions myself. Before doing so, though, I'd like to know if I should bother. Would they make it into PHP 5?

Thanks for your help!
~joe

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
[PHP-DEV] Lamenting PHP's streaming support...
Hi everyone,

I'm trying to write some serious parsing applications in PHP. I find myself frequently lamenting the 4GL-like support for buffered streams. I'd rather have a full-fledged streaming API with stream handles (or objects), as in mature 3GL languages such as C and Java.

I'm making do with the single character-stream buffer available to me in the "output buffer." I wrap this stream in classes that emulate distinct character streams by saving the current output buffer, clearing the output buffer for the new virtual stream, and then restoring the original output buffer when the virtual stream is closed. This works, but it costs overhead and requires repeatedly creating string objects to store old buffers and then rewriting those objects back to the output buffer. This is less than ideal from both a performance standpoint and a complexity standpoint (and an increased potential for weird errors).

I'm not too concerned about the performance issues of these virtual buffers because I can architect the application so that it minimizes these switches. However, I find myself (so far) unable to architect around another serious performance issue. I'm having to create a new string for each character sequence that I write to the output buffer. I'd rather just copy the substring of the document being parsed directly to the output buffer. Object creation is an expensive activity when thousands of objects need to be created for a single page hit.

All I need to deal with this problem is a new PHP function:

ob_write($string, $start, $length)

This would write the characters in substr($string, $start, $length) to the output buffer without creating an intermediate string object.

Is there anything on the horizon that would give me the kind of streaming support I'm looking for?

Thanks for your help!
~joe
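The substring write that ob_write() is meant to provide can be sketched in C as appending a slice of the source document straight into a growable in-memory buffer. CharBuf and charbuf_write() are hypothetical names for illustration, not PHP internals. Each CharBuf is an independent stream, so several can coexist instead of saving and restoring one shared output buffer; the only copy is the memcpy() from the source document into the destination.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable in-memory character stream. Each CharBuf is
 * independent, so a program can keep several open at once. */
typedef struct {
    char  *data;
    size_t len;   /* bytes written so far */
    size_t cap;   /* bytes allocated */
} CharBuf;

void charbuf_init(CharBuf *b)
{
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}

void charbuf_free(CharBuf *b)
{
    free(b->data);
    charbuf_init(b);
}

/* Append str[start .. start + length) to the buffer, copying straight
 * from the source string; no temporary substring is allocated.
 * Returns 0 on success, -1 if the range falls outside str or if
 * memory cannot be allocated. */
int charbuf_write(CharBuf *b, const char *str, size_t str_len,
                  size_t start, size_t length)
{
    if (start > str_len || length > str_len - start)
        return -1;
    if (length == 0)
        return 0;
    if (b->len + length > b->cap) {
        /* Grow geometrically so repeated small writes stay cheap. */
        size_t new_cap = b->cap ? b->cap : 64;
        while (new_cap < b->len + length)
            new_cap *= 2;
        char *p = realloc(b->data, new_cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = new_cap;
    }
    memcpy(b->data + b->len, str + start, length);
    b->len += length;
    return 0;
}
```

Doubling the capacity keeps the amortized cost of each append proportional to the bytes copied, which matters when a page hit produces thousands of small writes.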
[PHP-DEV] Substring writes and buffered char streams
Hello PHP gurus,

The php-general list does not believe that PHP allows me to do either of the following:

(1) Write an arbitrary substring of a string directly to a stream without first creating a string object for the substring, i.e. there is no print($string, $start, $length) or fwrite($resource, $string, $length, $start).

(2) Create multiple independent buffered character streams. It appears that stdout is the only instance available.

I need to be able to do the first to prevent a costly proliferation of string objects when parsing an input string and producing a new output string from its substrings. My experience writing Java parsers for business portals clearly demonstrates that object creation, and particularly string creation, is a limiting factor on throughput. Fixing this problem in PHP seems easy: we just need to add an optional start-offset parameter to fwrite().

I don't think I absolutely need the second feature, as I'm emulating multiple buffered character streams by saving and restoring the contents of stdout (via the output buffer) when switching between instances. I just have to keep the application smart about switching so that I can minimize the switch costs. However, it's possible that this could become an issue.

I can't create or use a PHP module since my customers generally only have FTP access to their web sites -- not even telnet, much less the ability to customize their PHP configuration.

So, is it truly impossible to do these things? If so, when could I hope to see these features -- or at least feature (1) -- ship with core PHP?

Thank you for your help!
~joe
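Feature (1) amounts to handing the stream a pointer into the middle of an existing buffer, which is how C's stdio fwrite() already works. The hypothetical write_substring() below takes the same information as the proposed fwrite($resource, $string, $length, $start); clamping an over-long length to the end of the string, the way substr() does, is an assumption of this sketch.

```c
#include <stdio.h>

/* Hypothetical helper: write s[start .. start + length) to fp directly
 * from the source buffer, with no intermediate substring. A length that
 * runs past the end of s is clamped. Returns the number of bytes
 * written, or 0 if start lies beyond the end of s. */
size_t write_substring(FILE *fp, const char *s, size_t s_len,
                       size_t start, size_t length)
{
    if (start > s_len)
        return 0;
    if (length > s_len - start)
        length = s_len - start; /* clamp, as substr() would */
    return fwrite(s + start, 1, length, fp);
}
```

The pointer arithmetic s + start is the whole trick: the stream reads from the original document, so producing output from many substrings allocates nothing per substring.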
Re: [PHP-DEV] Substring writes and buffered char streams
Hi Wez. Here are some clarifications...

On 8/7/2004 Wez. wrote:
>I suppose we could add that. Keep in mind that strings in PHP aren't
>hugely expensive unless you are doing something wrong (tm) like using
>10MB strings.

Strings are cheap in Java too. The issue is object creation and cleanup. When the strings are very large or very numerous, we could be talking about thousands of substrings per page hit. This increases the strain on both the processor and the memory of the host machine. Theory aside, I can get as much as a tenfold improvement in throughput with such techniques in Java.

>$fp = fopen(...) ?
>$fp = tmpfile() ?

Right, I should have mentioned this possibility. The main reason for taking these precautions is throughput. I need in-memory streams. Can I create a memory-only file?

Thanks for your help!
~joe
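One way to apply the "avoid object creation" idea is a tokenizer that records each token as an (offset, length) view into the original document instead of allocating a substring per token. The Span type and tokenize_markup() are hypothetical, and the tag/text split is a deliberately simplified stand-in for a real XML lexer (no comments, CDATA, or '>' inside quoted attributes).

```c
#include <stddef.h>

/* A token is a view into the source document, not a copy. */
typedef struct {
    size_t start;
    size_t length;
} Span;

/* Split doc into spans that are either a tag ("<...>", including the
 * closing '>' when present) or a run of text up to the next '<'.
 * Spans are written into the caller's array; returns how many were
 * written (at most max_spans). No string is allocated per token. */
size_t tokenize_markup(const char *doc, size_t len,
                       Span *spans, size_t max_spans)
{
    size_t n = 0;
    size_t i = 0;
    while (i < len && n < max_spans) {
        size_t start = i;
        if (doc[i] == '<') {
            while (i < len && doc[i] != '>')
                i++;
            if (i < len)
                i++; /* include the closing '>' */
        } else {
            while (i < len && doc[i] != '<')
                i++;
        }
        spans[n].start = start;
        spans[n].length = i - start;
        n++;
    }
    return n;
}
```

Parsing a page then costs one caller-supplied Span array rather than thousands of string objects; bytes are copied only when a span is finally written to an output stream.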