I had a reporting database with just about a billion rows. Each row was horribly large because the legacy schema had problems. We partitioned it out by month, which came to about 30 million rows a month. With a reasonably large box you can get that kind of data into memory, and indexes are almost unnecessary. So long as you have constraint exclusion and a good partition scheme you should be fine. Throw in a well-designed schema and you'll be cooking well into the tens of billions of rows.
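For reference, here is a minimal sketch of that kind of monthly partition scheme, in the inheritance-plus-CHECK-constraint style that constraint exclusion works against. The table and partition names are illustrative, and the columns are just the ones borrowed from the query further down:

CREATE TABLE bigtable (
    id          bigint,
    foo         integer,
    bar         integer,
    "timestamp" timestamptz
);

CREATE TABLE bigtable_2010_04 (
    CHECK ("timestamp" >= '2010-04-01' AND "timestamp" < '2010-05-01')
) INHERITS (bigtable);

CREATE TABLE bigtable_2010_05 (
    CHECK ("timestamp" >= '2010-05-01' AND "timestamp" < '2010-06-01')
) INHERITS (bigtable);

-- With constraint_exclusion enabled, a query that filters on "timestamp"
-- only scans the child tables whose CHECK constraint can match, so a
-- one-month report never touches the other months.

Routing new rows into the right child table is up to the application or an insert trigger; the point is only that the planner can prove whole months irrelevant and skip them.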
We ran self-joins of that table reasonably consistently, by the way:

SELECT lhs.id, rhs.id
  FROM bigtable lhs, bigtable rhs
 WHERE lhs.id > rhs.id
   AND '' > lhs.timestamp AND lhs.timestamp >= ''
   AND '' > rhs.timestamp AND rhs.timestamp >= ''
   AND lhs.timestamp = rhs.timestamp
   AND lhs.foo = rhs.foo
   AND lhs.bar = rhs.bar

This query really liked the timestamp index, and we had to be careful to only do it for a few days at a time. It took a few minutes each go, but it was definitely doable.

Once you get this large you do have to be careful with a few things though:

*It's somewhat easy to write super long queries or updates. This can leave lots of dead rows in your tables. Limit your longest-running queries to a day or so. Note that queries are unlikely to take that long, but updates with massive date ranges could. SELECT COUNT(*) FROM bigtable took about 30 minutes when the server wasn't under heavy load. (See the PS at the end of this message for a day-at-a-time batching sketch.)
*You sometimes get bad plans because:
**You don't or can't get enough statistics about a column.
**PostgreSQL doesn't capture statistics about two columns together. PostgreSQL has no way of knowing that columnA = 'foo' implies columnB = 'bar' about 30% of the time.

Nik

On Thu, May 27, 2010 at 5:58 AM, Massa, Harald Armin <c...@ghum.de> wrote:
> Dann,
>
>> There really are domains that big, so that there is no more normalization
>> or other processes to mitigate the problem.
>>
>> Examples:
>> Microsoft's registered customers database (all MS products bought by any
>> customer, including operating systems)
>> Tolls taken on the New Jersey road system for FY 2009
>> DNA data from the Human Genome Project
>>
>> .....
>
> please also think of our most risk-exposed users, the ones using Poker /
> Roulette simulation and analyzing software with a PostgreSQL database below.
> There are so many rounds of Poker to play .... :)
>
> Harald
>
>
> --
> GHUM Harald Massa
> persuadere et programmare
> Harald Armin Massa
> Spielberger Straße 49
> 70435 Stuttgart
> 0173/9409607
> no fx, no carrier pigeon
> -
> Using PostgreSQL is mostly about sleeping well at night.
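PS: to make the "a day or so" advice above concrete, here is a minimal sketch of batching a wide-ranging update one day per transaction. The SET clause and date literals are placeholders, and the table and column names are just the ones from the query earlier in this message:

BEGIN;
UPDATE bigtable
   SET bar = 0                        -- stand-in for the real change
 WHERE "timestamp" >= '2010-04-01'
   AND "timestamp" <  '2010-04-02';
COMMIT;

-- Then repeat with '2010-04-02' / '2010-04-03' and so on, rather than one
-- UPDATE spanning the whole month.  Each short transaction limits how many
-- dead rows pile up per pass and gives autovacuum a chance to keep up.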