Hi,

albeit *not* as a daemon, we've successfully developed a crawler in PHP within our company. It can run for hours without a leak; if I remember correctly, its peak memory consumption is below 64 MB. However, we're crawling only a small number of URLs, just around 10,000.

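For illustration only, here is a minimal sketch of how a long-running loop can release per-page data and keep an eye on its memory footprint. It is not the crawler's actual code: crawl(), the injected $fetch callable, the check interval and the 64 MB default budget are hypothetical names and values chosen for the example, and it assumes PHP 7 or later (gc_collect_cycles() only exists since PHP 5.3).

```php
<?php
// Hypothetical sketch: all names and limits below are illustrative assumptions.

const MEMORY_BUDGET_BYTES = 64 * 1024 * 1024; // assumed budget, mirrors the 64 MB figure
const CHECK_INTERVAL = 100;                   // assumed: check memory every 100 pages

/**
 * Process a list of URLs in one long-running loop.
 *
 * $fetch is injected so the sketch runs without network access;
 * a real crawler would fetch over HTTP here instead.
 */
function crawl(array $urls, callable $fetch, int $budgetBytes = MEMORY_BUDGET_BYTES): array
{
    $stats = ['processed' => 0, 'links' => 0, 'peak' => 0];

    foreach ($urls as $url) {
        $body = $fetch($url);

        // Stand-in for real processing: count anchor tags that carry an href.
        $stats['links'] += (int) preg_match_all('/<a\s[^>]*href=/i', $body);
        $stats['processed']++;

        // Release per-page data explicitly before the next iteration.
        unset($body);

        if ($stats['processed'] % CHECK_INTERVAL === 0) {
            // Collect reference cycles, then fail loudly if we are over budget.
            gc_collect_cycles();
            if (memory_get_usage() > $budgetBytes) {
                throw new RuntimeException(
                    "Memory budget exceeded after {$stats['processed']} pages"
                );
            }
        }
    }

    $stats['peak'] = memory_get_peak_usage();
    return $stats;
}

// Usage with a fake fetcher that returns a tiny page with two links.
$fakeFetch = function (string $url): string {
    return '<html><a href="/a">a</a><a href="/b">b</a></html>';
};
$stats = crawl(array_fill(0, 250, 'http://example.com/'), $fakeFetch);
printf(
    "%d pages, %d links, peak %.1f MB\n",
    $stats['processed'],
    $stats['links'],
    $stats['peak'] / 1048576
);
```

Failing fast when the budget is exceeded makes a leak visible early, instead of letting the process grow until it hits memory_limit hours into a run.
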
As Brian mentioned: free your database resources and unset unused variables. We've had one major rewrite which, besides re-architecting the whole thing for plugins/modularity, involved auditing every step to make sure resources are properly freed. Usually a PHP developer doesn't have to pay much attention to this because of the widely used process-fork model (but I guess I don't need to tell you that :).

But you'll often get beaten by PHP itself: it has quite a few leaks, and finding/tracking them down costs time and sometimes requires skill at the C level of PHP to properly understand/diagnose things. And if you were (unfortunately) successful in identifying a PHP problem, you have to report a bug, preferably with a patch/workaround attached.

For example, we've had to fight http://bugs.php.net/bug.php?id=43450 . Tracking down this PHP problem was quite time-consuming and involved multiple developers, etc. Luckily we could work around it, but it was pretty annoying.

We actually planned to release this as open source, donate it to Zend, whatever. Legally that's all settled within the company; it's just that no one has had the time for the publishing process, going over things, etc. :/

As a side note: we've hit the current limit of our crawler implementation in PHP itself: we can't do parallel fetching/processing of URLs in an efficient manner. You can get things running quickly in PHP, but doing things with style and a serious architecture hits its limits. We've gone to Java for such cases, which made sense for us anyway, as we had to move away from Zend_Search_Lucene: it had performance problems with our index, whereas Lucene/Solr was still mostly bored. It will be interesting to see if http://code.google.com/p/marjory/ can handle this. Oops, off-topic.

HTH,
- Markus

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php