On Fri, May 20, 2011 at 12:15 PM, Larry Garfield <la...@garfieldtech.com> wrote:

> I'm working with a fellow developer on an experimental project.  There are
> some PECL modules that we want to try and use in an open source project
> where we cannot guarantee that PECL modules will be available, since it's
> intended for widespread distribution on both shared hosts and custom
> hosting. The thought we had was to do a user-space port of the PECL module
> to include in the project and rely on that.  Then if the PECL module is
> installed, we don't include the library (either via an extension_loaded()
> check or just relying on autoload) and the PECL implementation gets used
> instead.  Poof, nice speed boost.
>

I do a fair amount of this with Flourish (http://flourishlib.com), since my
goal is to have all functionality work with only the extensions that come
with a standard PHP install (plus at least one database extension for the
desired database). I also back-port some functionality since my goal is for
PHP 5.1+ compatibility. When the PHP extension in question is available and
known not to contain show-stopper bugs, I'll use it. Otherwise I fall back
to a native PHP implementation of the functionality. Granted, the native PHP
is much slower, but people who are really concerned about performance can
always do the work of installing the extensions. In practice this seems to
work fairly well.
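
To make the fallback pattern concrete, here is a minimal sketch of the idea
using UTF-8 string length as the example. The function names utf8_length()
and utf8_length_native() are hypothetical and are not Flourish's actual API;
the native branch assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper names; this is an illustration, not Flourish code.
function utf8_length($string)
{
    // Prefer the C extension whenever it is loaded.
    if (extension_loaded('mbstring')) {
        return mb_strlen($string, 'UTF-8');
    }
    return utf8_length_native($string);
}

function utf8_length_native($string)
{
    // Assumes valid UTF-8: every character is one lead byte plus zero or
    // more continuation bytes (10xxxxxx), so subtracting the continuation
    // bytes from the byte length gives the character count.
    $continuation = preg_match_all('/[\x80-\xBF]/', $string, $matches);
    return strlen($string) - $continuation;
}
```

Calling code only ever uses utf8_length(), so installing mbstring later
speeds things up without any change to the callers.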

So far I have the following implemented with native PHP fallbacks:

mbstring (UTF-8 only): fUTF8
json (for 5.1): fJSON
bcmath: fNumber

I've also implemented imap/pop3 in native PHP to get around performance
issues, segfaults and similar issues. I've backported HTTP-only cookies to
5.1. There are also a whole bunch of other compatibility/portability
elements.
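
As an illustration of the HTTP-only cookie backport, one way to do it is to
build the Set-Cookie header by hand, since setcookie() only gained its
httponly parameter in PHP 5.2. This is a sketch under that assumption;
build_cookie_header() is a hypothetical name and not necessarily how Flourish
does it.

```php
<?php
// Hypothetical helper: builds a Set-Cookie header line manually so the
// HttpOnly flag can be sent even where setcookie() does not support it.
function build_cookie_header($name, $value, $expires = 0, $path = '/', $httponly = true)
{
    $header = 'Set-Cookie: ' . $name . '=' . rawurlencode($value);
    if ($expires > 0) {
        // Cookie expiry dates are expressed in GMT.
        $header .= '; expires=' . gmdate('D, d-M-Y H:i:s', $expires) . ' GMT';
    }
    $header .= '; path=' . $path;
    if ($httponly) {
        $header .= '; HttpOnly';
    }
    return $header;
}

// Usage: pass false as the second argument so earlier cookies are kept.
// header(build_cookie_header('session', 'abc 123'), false);
```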


> The questions I have are:
>
> 1) Is this even a viable approach?  It seems like it, but to my knowledge
> no one else has done this to any serious extent which makes me wonder if
> there's a reason the road less traveled is less traveled.
>

It has seemed to work fine for Flourish, although I don't have anywhere near
the traction/usage of any of the big frameworks or component libraries. I
think part of the reason this road is less traveled is because most
developers who care about more advanced functionality are more than capable
of finding an environment that has the extensions they need.


> 2) Is anyone else doing this?  No sense doing it ourselves if someone else
> already is.
>

As I mentioned, I do this in Flourish, however I do wrap the functionality
up in my own API since I am kind of particular about the usability of APIs.
Sometimes wrapping it in my own API provides the ability to skip 30% of the
functionality I don't see being frequently used, other times it allows me to
fix bugs in specific versions of PHP, such as with stream_select() in some
versions of PHP 5.2.


> 3) What would be the cleanest way to do so?  We had the thought of
> partially automating the process by having PHP auto-generate at the very
> least the stubs of any classes and functions that the module provides.
>  However, when my colleague tried using the same parser as is used for
> generating documentation he says he got several times as many classes as the
> manual says the module has.  We were using the PECL HTTP module as our
> sample (http://www.php.net/http).  (I don't know the exact details of what
> he did at the moment.)  Is that not a viable approach?  Would we be better
> off using reflection?  Is there some other tool we're not aware of?
>

Like I said, I create my own APIs. I normally will have used the C
extensions some and have some ideas about how I think they can be improved
and how I normally see them used. This then informs me about what I think
the custom API should be. Sometimes I do a pretty straight-forward
implementation though. For instance, I've implemented all of the PHP string
functions to work against UTF-8, even things like ucwords().


> If viable I'd love if this would start a trend, but we'll see where it
> goes.  I know it wouldn't work for all PECL modules, obviously, but I
> suspect it could work for several, and provide an easy way for different PHP
> projects to share backend code without needing lots of C developers.
>

I agree that not having to rely on C code is helpful, but in my opinion it
isn't because developing in C is hard/annoying/there aren't many C developers,
but because it is much harder to get special C extensions installed on many
shared hosts. Even for people who do have full access to install extensions,
it is often a pain to deal with compared to dropping in an updated PHP
script.

From my experience implementing various semi-CPU-intensive work, the keys
to making native PHP perform acceptably are relying on built-in functions
that do as much of the work as possible and thinking carefully about the
algorithms being used. For instance, I found I was able to implement the
Porter stemming algorithm with much better performance by using the preg_*
functions for string manipulation. For the diff/patch library I wrote, I
found that using the patience diff algorithm made it possible to write a
diffing library that wouldn't consume 128MB of RAM for 3000-line files, and
it was an order of magnitude faster to boot.
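
As a small illustration of leaning on the preg_* functions, here is a sketch
of step 1a of the Porter stemmer (plural removal) written as ordered
preg_replace() rules instead of manual character inspection. The function
name porter_step_1a() is hypothetical, it assumes lowercase input, and it is
not the code from my own implementation.

```php
<?php
// Sketch of Porter stemmer step 1a. Rules are tried in order and the first
// one that matches wins:
//   SSES -> SS,  IES -> I,  SS -> SS,  S -> (removed)
function porter_step_1a($word)
{
    $rules = array(
        '/sses$/' => 'ss',
        '/ies$/'  => 'i',
        '/ss$/'   => 'ss',
        '/s$/'    => '',
    );
    foreach ($rules as $pattern => $replacement) {
        // $count reports whether this rule matched the word's suffix.
        $result = preg_replace($pattern, $replacement, $word, 1, $count);
        if ($count > 0) {
            return $result;
        }
    }
    return $word;
}

// Usage: porter_step_1a('caresses') gives 'caress',
//        porter_step_1a('ponies') gives 'poni'.
```

The suffix matching happens inside the regex engine, which is the point:
the per-character work is done in C, not in a PHP loop.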

Will
