Hello,

Jeff Bearer wrote:
> 
> Hello,
> 
>         I have a PHP content management application that I've developed.  I'm
> looking to add data caching to it so the database doesn't get pounded
> all day long, the content on the site changes slowly, once or twice a
> day.
> 
>         Does anyone know of where I can look to find an application that does
> this?  I've searched and have yet to find anything that does the same
> kind of thing.  I'll take anything, a module or library that does it, or
> even some other application that does the same thing that I can look at.

You are on the right track avoiding as much database access as possible.
Database accesses will get slower and slower as your site's contents
grow. This means not only that database connections will take longer,
but also that they will eat your memory as the audience grows. That
happens not only because each database connection takes a lot of memory
from the database server process (about 1MB for MySQL), but also because
the longer a page takes to generate, the longer it stalls Web server
processes. That makes the server fork more processes to attend the
growing number of simultaneous requests, raising the amount of memory
consumed simultaneously until it is exhausted, leading to server
crashes.

The best thing to do is not just to use caches to avoid database
accesses, but also to minimize the number of dynamic pages and other
dynamically generated content. That means avoiding not only database
access, but as much PHP scripting as you can. PHP is good and provides
flexibility, but for scalability's sake avoid it wherever you can. In
practice, this means avoiding personalized content and generating and
serving static content for the non-personalized pages.
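
Just to make that concrete, the idea is to have a script, run from cron
once or twice a day when the content changes, that renders the pages
with PHP and writes the result to plain .html files that the Web server
serves directly. This is only a minimal sketch; the
generate_article_html() function and the paths are assumptions made for
the example, not part of any existing package:

<?php
// pregenerate.php - run from cron whenever the content changes

// Hypothetical function that builds the HTML of one article from the database
function generate_article_html($article_id)
{
    // ... query the database and return the finished page as a string ...
    return "<html><body>Article " . $article_id . "</body></html>";
}

$articles = array(1, 2, 3); // IDs of the pages to regenerate

foreach ($articles as $id) {
    $html = generate_article_html($id);

    // Write to a temporary file and rename it, so readers never see a
    // half-written page
    $target = "/var/www/html/articles/" . $id . ".html";
    $temporary = $target . ".tmp";
    $file = fopen($temporary, "w");
    fwrite($file, $html);
    fclose($file);
    rename($temporary, $target);
}
?>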

This is easier said than done. I know that very well because my site,
the PHP Classes repository, has been suffering outages because of that.
The greatest problem is that, I admit, I got carried away by the power
of PHP and database programming when I planned the site initially. Today
the site is so busy that it can't handle that many database accesses and
dynamically generated pages for everything.

I have been working around the problem as much as I can with caching
solutions. They reduce the problem but they don't solve it. The real
solution is to redesign the site to serve static pages before I need to
scale the hardware. Making the site static will also help with mirroring
it to other servers, something for which I have had plenty of requests.

Still, making the site static for non-personalized content only solves
part of the problem. The personalized pages can't be accessing the
database on every request either. One thing to avoid in particular is
authenticating sessions against the database on each request. I will be
solving this with a back-end daemon that caches the database accesses
needed for session authentication. The Web server processes will
establish persistent Unix domain socket connections with the daemon just
to verify session authentication. I will also study shared memory
caching to avoid the overhead of connecting to that daemon server.
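
Roughly, the Web server side of that daemon could look like the sketch
below. The socket path and the one-line text protocol are just
assumptions made for the example, since the daemon itself is not written
yet:

<?php
// Sketch of a session check against a local caching daemon over a
// Unix domain socket (no TCP port is used, hence the -1).

function session_is_valid($session_id)
{
    // pfsockopen() could be used instead to keep the connection persistent
    $connection = fsockopen("unix:///tmp/session-cache.sock", -1,
                            $error_number, $error_message, 2);
    if (!$connection) {
        return false; // daemon down: treat the session as not authenticated
    }

    // Ask the daemon whether this session id is authenticated
    fwrite($connection, "CHECK " . $session_id . "\n");
    $answer = trim(fgets($connection, 128));
    fclose($connection);

    return ($answer == "1");
}
?>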

Still, there is the problem of personalized content. Fortunately,
personalized pages are made of views that often are common even for
different users, so those parts are worth caching. That problem is
mostly solved with a caching class that I developed, which stores the
caches in local disk files. The class is freely available from here:

http://phpclasses.upperdesign.com/browse.html/package/313


> Ideally I think it would be cool if it would be a PEAR module that the
> application connects to just like the database, and it manages chaches
> and querying the database for data in the module.

My experience is that it is not worth caching just the queries; it is
better to cache the page contents generated from those queries, and thus
avoid the overhead of reprocessing the query results on each access.
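
The basic pattern is just output buffering written to a file. This is a
simplified generic sketch of the idea, not the actual API of my class;
the cache path and lifetime are arbitrary choices for the example:

<?php
// Cache the generated HTML of a page section in a file instead of
// re-running the queries that produce it on every request.

$cache_file = "/tmp/cache/front-page.html";
$cache_lifetime = 3600; // one hour

if (file_exists($cache_file)
    && (time() - filemtime($cache_file)) < $cache_lifetime) {
    // Fresh cache: just send the stored contents
    readfile($cache_file);
} else {
    // Stale or missing cache: regenerate the content, capturing the output
    ob_start();

    // ... run the database queries and echo the HTML here ...
    echo "<h1>Front page</h1>";

    $contents = ob_get_contents();
    ob_end_flush(); // send the captured output to the browser as usual

    // Store the captured output for the next requests
    $file = fopen($cache_file, "w");
    fwrite($file, $contents);
    fclose($file);
}
?>

Note that this naive version has no locking, so it suffers from exactly
the concurrency problem I describe below.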

I looked at the PEAR cache classes. PEAR provides an infrastructure for
caching data in different types of containers: database, shared memory
and files. Caching content in the database is counterproductive, because
the database is precisely what you need to avoid most. Caching content
in shared memory is probably the fastest way, but usually you are
limited in the amount of shared memory that each process may use.
Caching contents in files seems to be the most flexible way, although it
could lead to even better results when combined with shared memory
caching. Anyway, most of the time the OS's implicit filesystem caching
is already good enough to avoid most of the overhead of reading the same
files over and over again.

Anyway, if your cached data is going to be shared by multiple Web server
processes that access it concurrently, you certainly need an engine that
prevents more than one process from trying to update the cache contents
at the same time, a race condition that may happen when the stored cache
is outdated and needs to be rebuilt. The PEAR file cache class does not
address this aspect. This means that if you attempt to use it on a busy
site, you may end up with corrupted cache files that compromise what the
site displays.

Fortunately, this issue is properly handled by the class that I pointed
to above. It uses file locks to allow multiple simultaneous read
accesses but only one write access at a time. It has proven to work
robustly on a busy site like the PHP Classes repository, so you may want
to give it a serious try.
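
For reference, that multiple-readers/one-writer behaviour can be built
with PHP's flock() function. The following is only a bare sketch of the
idea, not code taken from the class; the file name and the
regenerate_section() function are assumptions for the example:

<?php
// LOCK_SH lets any number of processes read the cache file at the same
// time, while LOCK_EX guarantees that only one process rewrites it.

function regenerate_section()
{
    // ... run the queries and return the generated HTML ...
    return "<h1>Rebuilt section</h1>";
}

$cache_file = "/tmp/cache/section.html";
$cache_lifetime = 3600;

if (file_exists($cache_file)
    && (time() - filemtime($cache_file)) < $cache_lifetime) {
    // Read under a shared lock so a writer cannot truncate the file under us
    $file = fopen($cache_file, "r");
    flock($file, LOCK_SH);
    $contents = fread($file, 1000000); // read up to 1MB of cached content
    flock($file, LOCK_UN);
    fclose($file);
} else {
    // Rebuild under an exclusive lock so only one process does the work
    $file = fopen($cache_file, "a");
    flock($file, LOCK_EX);
    $contents = regenerate_section();
    ftruncate($file, 0);
    fwrite($file, $contents);
    flock($file, LOCK_UN);
    fclose($file);
}

echo $contents;
?>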

Regards,
Manuel Lemos
