On Mon, May 17, 2004 at 12:36:24AM -0700, Karsten M. Self wrote:
> This is really old, but it's straight up my alley, so...
>
> on Sat, Apr 03, 2004 at 07:08:39PM -0600, Christopher L. Everett ([EMAIL PROTECTED])
> wrote:
> > I do a lot of database work.  Sometimes I must do massive batch jobs on
> In general:
>
>   - Load a small search/criteria set into memory, and use it to
>     sequentially scan a larger dataset.
>
>   - Lose any data you don't need early on.
>
>   - When querying remote data sources, if possible, *run the query*
>     remotely, and just return the result set.  This was the trick with
>     my 20 hours -> 5 minutes process.  I defined a view on the remote
>     database, populated a small (~20k rows) table on the database
>     server, and queried the view for my result set (returning ~20k
>     records).  Querying against a 40m row table, indexed.
>
>   - Avoid disk processing by streaming / piping data between processes.
>
>   - Use hashes rather than sorts or b-trees (or get your tools to use
>     them for you).
>
>   - Think about what you're doing.
>
>   - Do as little as possible.  That's been my gag answer to "what do you
>     do", but from an optimization standpoint, it's the goal.
>
> It's both science and art.  Treat it that way.

Seriously, Karsten, have you ever considered writing a book of monographs
and epigrams?  You could title it _Karsten Recommends: Because Karsten
Knows Better than You_.  That way, instead of compulsively saving even
the e-mails that have nothing to do with my current situation, I could
simply have them all in a lovely bound volume.

Suddenly I feel the need to do heavy database work, just so I can put
all my new knowledge to use.

Cheers,
Jason Whittle
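
For anyone who wants to try the quoted advice, here is a rough Python sketch of three of the points: keep the small criteria set in memory as a hash set, stream the larger dataset past it one row at a time, and drop unneeded columns as early as possible. It is only an illustration under assumptions. The helper names (load_keys, scan), the column names, and the sample data are hypothetical and do not come from the thread.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List, Set


def load_keys(lines: Iterable[str]) -> Set[str]:
    """Read the small criteria set into an in-memory hash set."""
    return {line.strip() for line in lines if line.strip()}


def scan(rows: Iterable[Dict[str, str]], keys: Set[str],
         key_field: str, keep: List[str]) -> Iterator[Dict[str, str]]:
    """Stream the large dataset once, yielding only matching rows.

    Membership is a hash lookup, so nothing is sorted, and only the
    columns named in `keep` survive, so unneeded data is lost early.
    """
    for row in rows:
        if row[key_field] in keys:
            yield {name: row[name] for name in keep}


# Hypothetical sample data; in practice these would be open files or pipes.
criteria = io.StringIO("1002\n1004\n")
big = io.StringIO(
    "id,name,notes\n"
    "1001,ann,x\n"
    "1002,bob,y\n"
    "1003,cy,z\n"
    "1004,dee,w\n"
)

wanted = load_keys(criteria)
# csv.DictReader reads lazily, so the large input is never held in memory
# as a whole; list() is used here only to show the result.
matches = list(scan(csv.DictReader(big), wanted, "id", ["id", "name"]))
```

Because scan is a generator, its output can be fed straight into the next processing step without writing an intermediate file, which is the same idea as piping between processes.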