On Thu, Feb 03, 2005 at 10:09:49PM -0800, Paul Rubin wrote:
> aurora <[EMAIL PROTECTED]> writes:
> > I'm lost. So what do you compares against when you said LAMP is slow?
> > What is the reference point? Is it just a general observation that
> > slashdot is slower than we like it to be?

[reordered Paul's email a bit]

> > If you mean MySQL or SQL database in general is slow, there are truth
> > in it. The best thing about SQL database is concurrent access,
> > transactional semantics and versatile querying. Turns out a lot of
> > application can really live without that. If you can rearchitect the
> > application using flat files instead of database it can often be a
> > big bloom.
>
> This is the kind of answer I had in mind.

*ding*ding*ding*  The mistake I've made most frequently is using a
database in applications.  YAGNI.  Using a database at all has its own
overhead.  Using a database badly is deadly.  Most sites would benefit
from ripping out the database and doing something simpler.  Refactoring
a database on a live system is a giant pain in the ass; simpler
file-based approaches make incremental updates easier.

The Wikipedia example has been thrown around.  I haven't looked at the
code either, but except for search, why would they need a database to
look up an individual WikiWord?  Going to the database requires reading
an index when pickle.load(open('words/W/WikiWord', 'rb')) would seem
sufficient.

> Yes, that's the basic observation, not specifically Slashdot but for
> lots of LAMP sites (some PHPBB sites are other examples) have the same
> behavior. You send a url and the server has to grind for quite a
> while coming up with the page, even though it's pretty obvious what
> kinds of dynamic stuff it needs to find. Just taking a naive approach
> with no databases but just doing everything with in-memory structures
> (better not ever crash!) would make me expect a radically faster site.
> For a site like Slashdot, which gets maybe 10 MB of comments a day,
> keeping them all in RAM isn't excessive. (You'd also dump them
> serially to a log file, no seeking or index overhead as this happened.
> On server restart you'd just read the log file back into ram).

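For concreteness, the scheme in the quoted paragraph above (everything in
RAM, changes appended serially to a log, log replayed on restart) could be
sketched roughly as follows.  This is an illustration only: the class name
LoggedStore, the JSON-lines log format, and the file name in the usage
note are assumptions, not what Slashdot or any software mentioned in this
thread actually does, and it assumes a single-threaded server.

```python
import json
import os


class LoggedStore:
    """Hypothetical in-memory dict whose changes are appended to a log file."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()
        # Append mode: every change goes to the end of the file, no seeking.
        self._log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        """Rebuild the in-memory dict by reading the log front to back."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                try:
                    key, value = json.loads(line)
                except ValueError:
                    # A torn final record (crash mid-write) ends the replay.
                    break
                self.data[key] = value

    def set(self, key, value):
        # Log first, then update RAM, so a restart never sees less than
        # what was acknowledged. flush() hands the record to the OS; it
        # does not force it to disk (os.fsync would be needed for that).
        self._log.write(json.dumps([key, value]) + "\n")
        self._log.flush()
        self.data[key] = value

    def get(self, key, default=None):
        # Reads never touch the disk.
        return self.data.get(key, default)

    def close(self):
        self._log.close()
```

Usage would look like store = LoggedStore('comments.log'), then
store.set('c1', 'first post') and store.get('c1').  Later records for the
same key simply win during replay, so the log grows until something
compacts it; that part is left out here.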
You're preaching to the choir.  I don't use any of the fancy stuff in
Twisted, but the single-threaded nature means I can keep everything in
RAM and just serialize changes to disk (to survive a restart).  This
allows you to do very naive things and pay no penalty.  My homespun
blogging software isn't as full featured as PyBlosxom but it is a few
hundred times(!) faster.  PyBlosxom pays a high price in file stats
because it allows running under CGI.  Mine would too as a CGI, but it
isn't one, so *shrug*.

> > A lot of these is just implementation. Find the right tool and the
> > right design for the job. I still don't see a case that LAMP based
> > solution is inherently slow.
>
> I don't mean LAMP is inherently slow, I just mean that a lot of
> existing LAMP sites are observably slow.

A lot of these are just implementation.  Going the dumb non-DB way won't
prevent you from making bad choices, but if a lot of bad choices are
made simply because of the DB (my assertion), dropping the DB would
avoid some of them.  I think Sourceforge has one table for all projects'
bugs & patches.  That means a never-used project's bugs take up space in
the index and slow down access to the popular projects.  Would a naive
file-based implementation have been just as bad?  Maybe.

If there is interest I'll follow up with some details on my own LAMP
software, which does live reports on gigs of data and - you guessed it -
I regret that it is database backed.  That story also involves why I
started using Python (the prototype was in PHP).

-Jack
-- 
http://mail.python.org/mailman/listinfo/python-list