On Fri, May 20, 2011 at 12:15 PM, Larry Garfield <la...@garfieldtech.com> wrote:
> I'm working with a fellow developer on an experimental project. There are
> some PECL modules that we want to try and use in an open source project
> where we cannot guarantee that PECL modules will be available, since it's
> intended for widespread distribution on both shared hosts and custom
> hosting. The thought we had was to do a user-space port of the PECL module
> to include in the project and rely on that. Then if the PECL module is
> installed, we don't include the library (either via an extension_loaded()
> check or just relying on autoload) and the PECL implementation gets used
> instead. Poof, nice speed boost.
>

I do a fair amount of this with Flourish (http://flourishlib.com), since my goal is to have all functionality work with only the extensions that come with a standard PHP install (plus at least one database extension for the desired database). I also backport some functionality, since I aim for PHP 5.1+ compatibility.

When the PHP extension in question is available and known not to contain show-stopper bugs, I'll use it. Otherwise I fall back to a native PHP implementation of the functionality. Granted, the native PHP is much slower, but people who are really concerned about performance can always do the work of installing the extensions. In practice this seems to work fairly well.

So far I have the following implemented with native PHP fallbacks:

mbstring (UTF-8 only): fUTF8
json (for 5.1): fJSON
bcmath: fNumber

I've also implemented imap/pop3 in native PHP to get around performance issues, segfaults and similar problems. I've backported HTTP-only cookies to 5.1. There are also a whole bunch of other compatibility/portability elements.

> The questions I have are:
>
> 1) Is this even a viable approach? It seems like it, but to my knowledge
> no one else has done this to any serious extent which makes me wonder if
> there's a reason the road less traveled is less traveled.
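The fallback idea discussed above (prefer the compiled extension, otherwise load a native implementation) does not depend on PHP. The rough sketch below is written in Python rather than PHP. It has a loader that tries to import a compiled module and, if that fails, returns a slower native encoder instead. The module name `fast_json_ext` and its `dumps()` function are hypothetical stand-ins for a C extension. `native_to_json()` only covers a small JSON subset. In PHP the analogous check would be extension_loaded() or function_exists().

```python
import importlib
from typing import Any, Callable


def _quote(s: str) -> str:
    """Quote a string, escaping quotes, backslashes and control characters."""
    out = []
    for ch in s:
        if ch in ('"', "\\"):
            out.append("\\" + ch)
        elif ord(ch) < 0x20:
            out.append("\\u%04x" % ord(ch))
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'


def native_to_json(value: Any) -> str:
    """Slow pure-Python encoder covering a small JSON subset (illustrative only)."""
    if value is None:
        return "null"
    if value is True:
        return "true"
    if value is False:
        return "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        return _quote(value)
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(native_to_json(v) for v in value) + "]"
    if isinstance(value, dict):
        return "{" + ", ".join(
            _quote(str(k)) + ": " + native_to_json(v) for k, v in value.items()
        ) + "}"
    raise TypeError("unsupported type: %s" % type(value).__name__)


def load_encoder(module_name: str = "fast_json_ext") -> Callable[[Any], str]:
    """Prefer a compiled module's dumps() if it imports, else use the fallback.

    `fast_json_ext` is a hypothetical module name standing in for a C extension.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return native_to_json
    return module.dumps
```

Callers only ever use the function returned by `load_encoder()`. Installing the compiled module therefore speeds things up without any change to calling code, which is the "nice speed boost" property the quoted question is after.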
>

It seems to work fine for Flourish, although I don't have anywhere near the traction/usage of any of the big frameworks or component libraries. I think part of the reason this road is less traveled is that most developers who care about more advanced functionality are more than capable of finding an environment that has the extensions they need.

> 2) Is anyone else doing this? No sense doing it ourselves if someone else
> already is.
>

As I mentioned, I do this in Flourish. However, I wrap the functionality in my own API, since I am kind of particular about API usability. Sometimes wrapping it in my own API lets me skip 30% of the functionality that I don't see used frequently. Other times it lets me fix bugs in specific versions of PHP, such as with stream_select() in some versions of PHP 5.2.

> 3) What would be the cleanest way to do so? We had the thought of
> partially automating the process by having PHP auto-generate at the very
> least the stubs of any classes and functions that the module provides.
> However, when my colleague tried using the same parser as is used for
> generating documentation he says he got several times as many classes as the
> manual says the module has. We were using the PECL HTTP module as our
> sample (http://www.php.net/http). (I don't know the exact details of what
> he did at the moment.) Is that not a viable approach? Would we be better
> off using reflection? Is there some other tool we're not aware of?
>

Like I said, I create my own APIs. I will normally have used the C extensions a bit and formed some ideas about how they could be improved and how I usually see them used. That informs what I think the custom API should be. Sometimes, though, I do a pretty straightforward implementation. For instance, I've implemented all of the PHP string functions to work against UTF-8, even things like ucwords().
> If viable I'd love it if this started a trend, but we'll see where it
> goes. I know it wouldn't work for all PECL modules, obviously, but I
> suspect it could work for several, and provide an easy way for different PHP
> projects to share backend code without needing lots of C developers.
>

I agree that not having to rely on C code is helpful. In my opinion, though, that isn't because developing in C is hard or annoying, or because there aren't many C developers. It's because it is much harder to get special C extensions installed on many shared hosts. Even for people who do have full access to install extensions, dealing with them is often a pain compared to dropping in an updated PHP script.

So, from my experience implementing various semi-CPU-intensive work, the key to making native PHP work is to rely on built-in functions that do as much as possible and to think carefully about the algorithms being used. For instance, I was able to implement the Porter stemming algorithm with much better performance by using the preg_* functions for string manipulation. For the diff/patch library I wrote, I found that by using the patience diff algorithm it was possible to write a diffing library that wouldn't consume 128MB of RAM for 3000-line files, and it was an order of magnitude faster to boot.

Will
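As a small illustration of letting regex built-ins do the string work, here is a Python sketch of step 1a of the Porter stemmer (plural suffixes) only. Python's `re` module stands in for PHP's preg_* functions. This is not the full algorithm and not the Flourish implementation. The function name `step_1a` is hypothetical.

```python
import re

# Porter step 1a rules, tried in order. The first suffix that matches wins,
# so "sses" is checked before "ss", and "ss" before "s" (which keeps "caress").
_STEP_1A = [
    (re.compile(r"sses$"), "ss"),
    (re.compile(r"ies$"), "i"),
    (re.compile(r"ss$"), "ss"),
    (re.compile(r"s$"), ""),
]


def step_1a(word: str) -> str:
    """Apply Porter step 1a to a lowercase word (hypothetical helper name)."""
    for pattern, replacement in _STEP_1A:
        # subn() reports how many substitutions were made, so one regex call
        # both tests for the suffix and rewrites it.
        stemmed, count = pattern.subn(replacement, word, count=1)
        if count:
            return stemmed
    return word
```

For example, `step_1a("caresses")` gives `"caress"` and `step_1a("ponies")` gives `"poni"`. Each rule is a single compiled-regex call, with no character-by-character loop written in the host language.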
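The patience diff algorithm mentioned above proceeds in three steps:

1. Anchor on lines that appear exactly once in both inputs.
2. Keep the longest set of those anchors that appear in the same order in both files, found with patience sorting.
3. Recurse on the gaps between the anchors.

The following is a simplified Python sketch, not the diff/patch library described above. It returns matched line pairs plus a minimal +/- rendering. When a gap has no unique common lines, it simply marks the whole gap as changed. Fuller implementations typically fall back to a classic LCS diff there instead. All function names are hypothetical.

```python
from bisect import bisect_left
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[int, int]  # (line index in a, line index in b)


def _unique_common(a: Sequence[str], b: Sequence[str],
                   alo: int, ahi: int, blo: int, bhi: int) -> List[Pair]:
    """Pairs of lines occurring exactly once in a[alo:ahi] and once in b[blo:bhi]."""
    count_a = Counter(a[alo:ahi])
    count_b = Counter(b[blo:bhi])
    pos_b = {b[j]: j for j in range(blo, bhi) if count_b[b[j]] == 1}
    return [(i, pos_b[a[i]]) for i in range(alo, ahi)
            if count_a[a[i]] == 1 and a[i] in pos_b]


def _longest_increasing(pairs: List[Pair]) -> List[Pair]:
    """Patience sorting: longest subsequence of pairs whose b index increases."""
    tops: List[int] = []     # b index of the card on top of each pile
    top_idx: List[int] = []  # position in `pairs` of each pile's top card
    back: List[int] = []     # back-pointer to the top of the previous pile
    for k, (_, j) in enumerate(pairs):
        p = bisect_left(tops, j)
        back.append(top_idx[p - 1] if p > 0 else -1)
        if p == len(tops):
            tops.append(j)
            top_idx.append(k)
        else:
            tops[p], top_idx[p] = j, k
    result: List[Pair] = []
    k = top_idx[-1] if top_idx else -1
    while k != -1:
        result.append(pairs[k])
        k = back[k]
    return result[::-1]


def patience_match(a: Sequence[str], b: Sequence[str], alo: int = 0,
                   ahi: Optional[int] = None, blo: int = 0,
                   bhi: Optional[int] = None) -> List[Pair]:
    """Matched (i, j) line pairs, increasing in both i and j."""
    ahi = len(a) if ahi is None else ahi
    bhi = len(b) if bhi is None else bhi
    head: List[Pair] = []
    while alo < ahi and blo < bhi and a[alo] == b[blo]:  # common prefix
        head.append((alo, blo))
        alo, blo = alo + 1, blo + 1
    tail: List[Pair] = []
    while alo < ahi and blo < bhi and a[ahi - 1] == b[bhi - 1]:  # common suffix
        ahi, bhi = ahi - 1, bhi - 1
        tail.append((ahi, bhi))
    anchors = _longest_increasing(_unique_common(a, b, alo, ahi, blo, bhi))
    middle: List[Pair] = []
    if anchors:  # with no anchors, the whole gap is treated as changed
        prev_i, prev_j = alo, blo
        for i, j in anchors:
            middle.extend(patience_match(a, b, prev_i, i, prev_j, j))
            middle.append((i, j))
            prev_i, prev_j = i + 1, j + 1
        middle.extend(patience_match(a, b, prev_i, ahi, prev_j, bhi))
    return head + middle + tail[::-1]


def diff_lines(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Render the match as ' ' (kept), '-' (removed) and '+' (added) lines."""
    out: List[str] = []
    i = j = 0
    for mi, mj in patience_match(a, b) + [(len(a), len(b))]:
        out.extend("-" + line for line in a[i:mi])
        out.extend("+" + line for line in b[j:mj])
        if mi < len(a):  # the final sentinel pair is not a real match
            out.append(" " + a[mi])
        i, j = mi + 1, mj + 1
    return out
```

With `a = ["a", "b", "c", "d"]` and `b = ["a", "x", "c", "d"]`, `diff_lines(a, b)` yields `[" a", "-b", "+x", " c", " d"]`. Only the unique-line anchors and the per-gap counters are held at any time, rather than a full LCS table. That is one plausible reason this family of algorithms is easier on memory, though the sketch makes no measured claim.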