Re: [PERFORM] count(*) slow on large tables

2003-10-04 Thread Christopher Kings-Lynne

On our message boards each post is a row.  The powers that be like to know
how many posts there are in total (in addition to 'today').
"select count(*) from posts" is how it has been
done on our Informix db.  With our port to PG I instead select reltuples
from pg_class.
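For reference, the approximate-count lookup described here is something like the following (assuming the table is literally named `posts`; the estimate is only as fresh as the last VACUUM or ANALYZE):

```sql
-- Planner's row estimate for the table, no heap scan required.
-- reltuples is maintained by VACUUM/ANALYZE, so it can lag reality.
SELECT reltuples::bigint AS approximate_rows
FROM pg_class
WHERE relname = 'posts';
```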
We have exactly the same situation, except we just added a 'num_replies' 
field to each thread and a 'num_posts' field to each forum, so that 
getting that information out is a very fast operation.  Because, of 
course, there are hundreds of times more reads of that information than 
writes...
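One way to keep such counter columns current is with triggers; a rough sketch, with all table and column names invented for illustration (assumes plpgsql is installed; note that every post insert then updates one thread row, which serializes concurrent posters to the same thread):

```sql
-- Hypothetical schema: threads(id, num_replies), posts(id, thread_id).
CREATE OR REPLACE FUNCTION bump_reply_count() RETURNS trigger AS '
BEGIN
    IF TG_OP = ''INSERT'' THEN
        UPDATE threads SET num_replies = num_replies + 1
         WHERE id = NEW.thread_id;
    ELSE  -- DELETE
        UPDATE threads SET num_replies = num_replies - 1
         WHERE id = OLD.thread_id;
    END IF;
    RETURN NULL;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER posts_reply_count
    AFTER INSERT OR DELETE ON posts
    FOR EACH ROW EXECUTE PROCEDURE bump_reply_count();
```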

Chris

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] [HACKERS] Index/Function organized table layout (from Re:

2003-10-04 Thread Hannu Krosing
Christopher Browne kirjutas R, 03.10.2003 kell 00:57:
> [EMAIL PROTECTED] (Jean-Luc Lachance) writes:
> > That's one of the drawbacks of MVCC.  
> > I once suggested that the transaction number and other housekeeping
> > info be included in the index, but was told to forget it...
> > It would solve once and for all the issue of seq_scan vs index_scan.
> > It would simplify the aggregate problem.
> 
> It would only simplify _one_ case, namely the case where someone cares
> about the cardinality of a relation, and it would do that at
> _considerable_ cost.
> 
> A while back I outlined how this would have to be done, and for it to
> be done efficiently, it would be anything BUT simple.  

Could this be made a TODO item, perhaps with your attack plan?
Of course, as a strictly optional feature useful only for special situations
(see below).

I am cross-posting this to [HACKERS] as it seems relevant to a problem recently
discussed there.

> It would be very hairy to implement it correctly, and all this would
> cover is the single case of "SELECT COUNT(*) FROM SOME_TABLE;"

Not really. Just yesterday there was a discussion on [HACKERS] about
implementing btree-organized tables, which would be much less needed if
the visibility info were kept in indexes. 

> If you had a single WHERE clause attached, you would have to revert to
> walking through the tuples looking for the ones that are live and
> committed, which is true for any DBMS.

If the WHERE clause could use the same index (or any index with
visibility info) then there would be no need for "walking through the
tuples" in the data relation.

The typical use case cited on [HACKERS] was time-series data, where
inserts arrive roughly in (timestamp, id) order but queries run in
(id, timestamp) order.  If the index included all relevant fields
(id, timestamp, data1, data2, ..., dataN), the query could run on the index
alone, touching just a few pages and thus vastly improving performance.  I
agree that this is not something everybody needs, but when it is needed,
it is needed badly.
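A sketch of the covering index Hannu has in mind (all names invented; with stock PostgreSQL the executor still visits the heap for visibility checks, which is exactly the limitation under discussion):

```sql
-- All columns the query needs, in query order (id, ts), so the index
-- alone could answer it if visibility info lived in the index.
CREATE INDEX series_covering_idx
    ON series (id, ts, data1, data2);

SELECT ts, data1, data2
  FROM series
 WHERE id = 42
 ORDER BY id, ts;
```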

> And it still begs the same question, of why the result of this query
> would be particularly meaningful to anyone.  I don't see the
> usefulness; I don't see the value of going to the considerable effort
> of "fixing" this purported problem.

Being able to do a fast count(*) is just a side benefit.


Hannu




Re: [PERFORM] Tuning/performance issue...

2003-10-04 Thread Jeff
On Fri, 3 Oct 2003, Bruce Momjian wrote:

>
> I have updated the FAQ to be:
>
>   In comparison to MySQL or leaner database systems, we are
>   faster for multiple users, complex queries, and a read/write query
>   load.  MySQL is faster for SELECT queries done by a few users.
>
> Is this accurate?  It seems so.
>
>

Another thing I noticed: if you use a dataset that can live in MySQL's
query cache / OS cache, it screams, until it has to hit the disk.  Then it
comes to a GRINDING HALT.

It would be nice if someone (I don't have the time now) did a comparison
of, say:
select value where name = XXX; [where XXX varies] with 1, 10, 20, 50
connections

then made progressively more complex queries, and occasionally pointed out
MySQL's silly omissions:
select * from myview where id = 1234
[Oh wait! MySQL doesn't have views.  Ooopsy!]

Wrapping up - PG is not that slow for simple queries either.  It can be
rather zippy - and PREPARE can give HUGE gains - even for simple
statements.  I've often wondered if YACC, etc. is a bottleneck (you can
only go as fast as your parser can go).
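The PREPARE gains mentioned above come from paying the parse/plan cost only once; roughly (PostgreSQL 7.3+ syntax, table and column names invented):

```sql
-- Parse and plan once ...
PREPARE get_value (text) AS
    SELECT value FROM t WHERE name = $1;

-- ... then pay only executor cost per call.
EXECUTE get_value('XXX');
```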

Hurray for PG!

And I'm giving my PG presentation Monday.  I hope to post it Tuesday after
I update it with comments I receive and remove confidential information.

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/





Re: [PERFORM] reindex/vacuum locking/performance?

2003-10-04 Thread Andrew Sullivan
On Fri, Oct 03, 2003 at 02:24:42PM -0600, Rob Nagler wrote:
> I've read some posts that says vacuum doesn't lock, but my experience
> today indicates the opposite.  It seemed that "vacuum full analyze"

VACUUM doesn't.  VACUUM FULL does.

-- 

Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                          Toronto, Ontario, Canada
<[EMAIL PROTECTED]>                     M2P 2A8
                                        +1 416 646 3304 x110




Re: [PERFORM] reindex/vacuum locking/performance?

2003-10-04 Thread Andrew Sullivan
On Fri, Oct 03, 2003 at 11:49:03PM -0400, Tom Lane wrote:
> 
> What if said SELECTs are using the index in question?

That's a good reason to build a new index and, when it's done, drop
the old one.  It still prevents writes, of course.

A

-- 

Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                          Toronto, Ontario, Canada
<[EMAIL PROTECTED]>                     M2P 2A8
                                        +1 416 646 3304 x110




Re: [PERFORM] reindex/vacuum locking/performance?

2003-10-04 Thread Andrew Sullivan
On Sat, Oct 04, 2003 at 12:29:55AM +0100, Matt Clark wrote:
> My real world experience on a *very* heavily updated OLTP type DB, following
> advice from this list (thanks guys!), is that there is essentially zero cost
> to going ahead and vacuuming as often as you feel like it.  Go crazy, and
> speed up your DB!

That's not quite true.  If vacuums start running into each other, you
can very easily start eating up all your I/O bandwidth, even if you
have lots of it.

Also, a vacuum pretty much destroys your shared buffers, so you have
to be aware of that trade-off too.  Vacuum is not free.  It's _way_
cheaper than it used to be, though.

A

-- 

Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                          Toronto, Ontario, Canada
<[EMAIL PROTECTED]>                     M2P 2A8
                                        +1 416 646 3304 x110




Re: [PERFORM] count(*) slow on large tables

2003-10-04 Thread Bruce Momjian
Christopher Browne wrote:
> [EMAIL PROTECTED] (Jean-Luc Lachance) writes:
> > That's one of the drawbacks of MVCC.  
> > I once suggested that the transaction number and other housekeeping
> > info be included in the index, but was told to forget it...
> > It would solve once and for all the issue of seq_scan vs index_scan.
> > It would simplify the aggregate problem.
> 
> It would only simplify _one_ case, namely the case where someone cares
> about the cardinality of a relation, and it would do that at
> _considerable_ cost.
> 
> A while back I outlined how this would have to be done, and for it to
> be done efficiently, it would be anything BUT simple.  
> 
> It would be very hairy to implement it correctly, and all this would
> cover is the single case of "SELECT COUNT(*) FROM SOME_TABLE;"

We do have a TODO item:

* Consider using MVCC to cache count(*) queries with no WHERE clause

The idea is to cache a recent count of the table, then have
insert/delete add +/- records to the count.  A COUNT(*) would get the
main cached record plus any visible +/- records.  This would allow the
count to return the proper value depending on the visibility of the
requesting transaction, and it would require _no_ heap or index scan.
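The bookkeeping in that TODO item can be modeled in a few lines.  A toy simulation in plain Python, deliberately ignoring the snapshot-visibility machinery that makes the real thing hard:

```python
class CountCache:
    """Toy model of the TODO item: a cached base count plus +/- deltas.

    Each insert/delete appends a small delta record instead of forcing
    a scan of the big table; COUNT(*) becomes the base count plus the
    sum of the (visible) delta records.
    """

    def __init__(self, base_count=0):
        self.base = base_count
        self.deltas = []  # one +1/-1 record per insert/delete

    def insert(self):
        self.deltas.append(+1)

    def delete(self):
        self.deltas.append(-1)

    def count(self):
        # No heap or index scan of the big table: just the cache.
        return self.base + sum(self.deltas)

    def compress(self):
        # VACUUM-like step: fold the deltas back into the base record.
        self.base += sum(self.deltas)
        self.deltas = []


cache = CountCache(base_count=1000)
cache.insert()
cache.insert()
cache.delete()
print(cache.count())  # 1001
cache.compress()
print(cache.count())  # still 1001, now held in a single record
```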

-- 
  Bruce Momjian                      |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                  |  (610) 359-1001
  +  If your life is a hard drive,   |  13 Roberts Road
  +  Christ can be your backup.      |  Newtown Square, Pennsylvania 19073



Re: [PERFORM] count(*) slow on large tables

2003-10-04 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> We do have a TODO item:
>   * Consider using MVCC to cache count(*) queries with no WHERE clause

> The idea is to cache a recent count of the table, then have
> insert/delete add +/- records to the count.  A COUNT(*) would get the
> main cached record plus any visible +/- records.  This would allow the
> count to return the proper value depending on the visibility of the
> requesting transaction, and it would require _no_ heap or index scan.

... and it would give the wrong answers.  Unless the cache is somehow
snapshot-aware, so that it can know which other transactions should be
included in your count.

regards, tom lane



Re: [PERFORM] count(*) slow on large tables

2003-10-04 Thread Bruce Momjian
Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > We do have a TODO item:
> > * Consider using MVCC to cache count(*) queries with no WHERE clause
> 
> > The idea is to cache a recent count of the table, then have
> > insert/delete add +/- records to the count.  A COUNT(*) would get the
> > main cached record plus any visible +/- records.  This would allow the
> > count to return the proper value depending on the visibility of the
> > requesting transaction, and it would require _no_ heap or index scan.
> 
> ... and it would give the wrong answers.  Unless the cache is somehow
> snapshot-aware, so that it can know which other transactions should be
> included in your count.

The cache is an ordinary table, with xid's on every row.  I meant it
would require no index/heap scans of the large table --- it would still
require a scan of the "count" table.

-- 
  Bruce Momjian                      |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                  |  (610) 359-1001
  +  If your life is a hard drive,   |  13 Roberts Road
  +  Christ can be your backup.      |  Newtown Square, Pennsylvania 19073



Re: [PERFORM] count(*) slow on large tables

2003-10-04 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> ... and it would give the wrong answers.  Unless the cache is somehow
>> snapshot-aware, so that it can know which other transactions should be
>> included in your count.

> The cache is an ordinary table, with xid's on every row.  I meant it
> would require no index/heap scans of the large table --- it would still
> require a scan of the "count" table.

Oh, that idea.  Yeah, I think we had concluded it might work.  You'd
better make the TODO item link to that discussion, because there's sure
been plenty of discussion of ideas that wouldn't work.

regards, tom lane



Re: [PERFORM] count(*) slow on large tables

2003-10-04 Thread Bruce Momjian
Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Tom Lane wrote:
> >> ... and it would give the wrong answers.  Unless the cache is somehow
> >> snapshot-aware, so that it can know which other transactions should be
> >> included in your count.
> 
> > The cache is an ordinary table, with xid's on every row.  I meant it
> > would require no index/heap scans of the large table --- it would still
> > require a scan of the "count" table.
> 
> Oh, that idea.  Yeah, I think we had concluded it might work.  You'd
> better make the TODO item link to that discussion, because there's sure
> been plenty of discussion of ideas that wouldn't work.

OK, I beefed up the TODO:

* Use a fixed row count and a +/- count with MVCC visibility rules
  to allow fast COUNT(*) queries with no WHERE clause(?)

I can always give the details if someone asks.  It doesn't seem complex
enough for a separate TODO.detail item.

-- 
  Bruce Momjian                      |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                  |  (610) 359-1001
  +  If your life is a hard drive,   |  13 Roberts Road
  +  Christ can be your backup.      |  Newtown Square, Pennsylvania 19073



Re: [PERFORM] count(*) slow on large tables

2003-10-04 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> It doesn't seem complex enough for a separate TODO.detail item.

I thought it was, if only because it is so easy to think of wrong
implementations.

regards, tom lane



[PERFORM] COUNT(*) again (was Re: [HACKERS] Index/Function organized table layout)

2003-10-04 Thread Tom Lane
Hannu Krosing <[EMAIL PROTECTED]> writes:
> Christopher Browne kirjutas R, 03.10.2003 kell 00:57:
>> A while back I outlined how this would have to be done, and for it to
>> be done efficiently, it would be anything BUT simple.  

> Could this be made a TODO item, perhaps with your attack plan. 

If I recall that discussion correctly, no one including Christopher
thought the attack plan was actually reasonable.

What this keeps coming down to is that an optimization that helps only
COUNT(*)-of-one-table-with-no-WHERE-clause would be too expensive in
development and maintenance effort to justify its existence.

At least if you insist on an exact, MVCC-correct answer.  So far as I've
seen, the actual use cases for unqualified COUNT(*) could be handled
equally well by an approximate answer.  What we should be doing rather
than wasting large amounts of time trying to devise exact solutions is
telling people to look at pg_class.reltuples for approximate answers.
We could also be looking at beefing up support for that approach ---
maybe provide some syntactic sugar for the lookup, maybe see if we can
update reltuples in more places than we do now, make sure that the
autovacuum daemon includes "keep reltuples accurate" as one of its
design goals, etc etc.
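The approximate-answer route is usable right now, since VACUUM and ANALYZE both refresh the estimate; for instance (table name invented):

```sql
-- Refresh the statistics, then read the planner's row estimate.
ANALYZE big_table;

SELECT reltuples::bigint AS estimated_rows
  FROM pg_class
 WHERE relname = 'big_table';
```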

regards, tom lane



Re: [PERFORM] COUNT(*) again (was Re: [HACKERS] Index/Function

2003-10-04 Thread Hannu Krosing
Tom Lane kirjutas L, 04.10.2003 kell 19:07:
> Hannu Krosing <[EMAIL PROTECTED]> writes:
> > Christopher Browne kirjutas R, 03.10.2003 kell 00:57:
> >> A while back I outlined how this would have to be done, and for it to
> >> be done efficiently, it would be anything BUT simple.  
> 
> > Could this be made a TODO item, perhaps with your attack plan. 
> 
> If I recall that discussion correctly, no one including Christopher
> thought the attack plan was actually reasonable.
> 
> What this keeps coming down to is that an optimization that helps only
> COUNT(*)-of-one-table-with-no-WHERE-clause would be too expensive in
> development and maintenance effort to justify its existence.

Please read further in my email ;)

The point I was trying to make was that a faster count(*) is just a side
effect.  If we could (conditionally) keep visibility info in indexes,
then this would also solve the much trickier problem of
index-organized tables.

Count(*) is *not* the only query that could benefit from not needing to
go to the actual data table for visibility info.  The much more pressing case
would be the "inverse time series" type of queries, which would
otherwise thrash cache pages badly.


Hannu




Re: [PERFORM] COUNT(*) again (was Re: [HACKERS] Index/Function organized table layout)

2003-10-04 Thread Tom Lane
Hannu Krosing <[EMAIL PROTECTED]> writes:
> The point I was trying to make was that a faster count(*) is just a side
> effect.  If we could (conditionally) keep visibility info in indexes,

I think that's not happening, conditionally or otherwise.  The atomicity
problems alone are sufficient reason why not, even before you look at
the performance issues.

regards, tom lane



[PERFORM] Uses for Index/Function organizing

2003-10-04 Thread James Rogers
On 10/4/03 2:00 AM, "Hannu Krosing" <[EMAIL PROTECTED]> wrote:
> 
> If the WHERE clause could use the same index (or any index with
> visibility info) then there would be no need for "walking through the
> tuples" in the data relation.
> 
> The typical use case cited on [HACKERS] was time-series data, where
> inserts arrive roughly in (timestamp, id) order but queries run in
> (id, timestamp) order.  If the index included all relevant fields
> (id, timestamp, data1, data2, ..., dataN), the query could run on the index
> alone, touching just a few pages and thus vastly improving performance.  I
> agree that this is not something everybody needs, but when it is needed,
> it is needed badly.



I would add that automatically index-organizing tuples isn't just useful for
time-series data (though it is a good example), but can be used to
substantially improve the query performance of any really large table in a
number of different and not always direct ways.  Once working sets routinely
exceed the size of physical RAM, buffer access/utilization efficiency often
becomes the center of performance tuning, but not one that many people know
much about.

One of the less direct ways of using btree-organized tables for improving
scalability is to "materialize" table indexes of tables that *shouldn't* be
btree-organized.  Not only can you turn tables into indexes, but you can
also turn indexes into tables, which can have advantages in some cases.


For example, I did some scalability consulting at a well-known movie rental
company with some very large Oracle databases running on big Sun boxen.  One
of the biggest problems was that their rental history table, which had a
detailed record of every movie ever rented by every customer, had grown so
large that the performance was getting painfully slow.  To make matters
worse, it and a few related tables had high concurrent usage, a mixture of
many performance-sensitive queries grabbing windows of a customer's history
plus a few broader OLAP queries which were not time sensitive.  Everything
was technically optimized in a relational and basic configuration sense, and
the database guys at the company were at a loss on how to fix this problem.
Performance of all queries was essentially bound by how fast pages could be
moved between the disk and buffers.

Issue #1:  The history rows had quite a lot of columns and the OLAP
processes used non-primary indexes, so the table was not particularly
suitable for btree-organizing.

Issue #2:  Partitioning was not an option because it would have exceeded
certain limits in Oracle (at that time, I don't know if that has changed).

Issue #3:  Although customer histories were being constantly queried, the data
needed most was really an index view of the customer's history, not the
details of the history itself.


The solution I came up with was to use a synced btree-organized partial
clone of the main history table that only contained a small number of key
columns that mattered for generating customer history indexes in the
applications that used them.  While this substantially increased the disk
space footprint for the same data (since we were cloning it), it greatly
reduced the total number of cache misses for the typical query, only
fetching the full history row pages when actually needed.  In other words,
basically concentrating more buffer traffic into a smaller number of page
buffers.  What we had was an exceedingly active but relatively compact
materialized index of the history table that could essentially stay resident
in RAM, and a much less active history table+indexes that, while less likely
to be buffered than before, had its pages accessed at such a reduced frequency
that there was a huge net performance gain because disk access plummeted.
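In PostgreSQL terms, the "materialized index" James describes could be approximated with a narrow shadow table kept in sync by triggers; a sketch under invented names (insert path only; delete/update triggers and the Oracle-specific details are omitted):

```sql
-- Narrow clone of the wide history table: only the key columns the
-- applications query on, so far more rows fit per buffer page.
CREATE TABLE history_slim (
    customer_id integer,
    rented_at   timestamp,
    movie_id    integer
);
CREATE INDEX history_slim_idx ON history_slim (customer_id, rented_at);

CREATE OR REPLACE FUNCTION history_slim_sync() RETURNS trigger AS '
BEGIN
    INSERT INTO history_slim
        VALUES (NEW.customer_id, NEW.rented_at, NEW.movie_id);
    RETURN NULL;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER history_sync
    AFTER INSERT ON history
    FOR EACH ROW EXECUTE PROCEDURE history_slim_sync();
```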

Average performance improvement for the time sensitive queries: 50-70x

So btree-organized tables can do more than make tables behave like indexes.
They can also make indexes behave like tables.  Both are very useful in some
cases when your working set exceeds the physical buffer space.  For smaller
databases this has much less utility and users need to understand the
limitations, nonetheless when tables and databases get really big it becomes
an important tool in the tool belt.

Cheers,

-James Rogers
 [EMAIL PROTECTED]




Re: [PERFORM] count(*) slow on large tables

2003-10-04 Thread Christopher Browne
Quoth [EMAIL PROTECTED] (Tom Lane):
> Bruce Momjian <[EMAIL PROTECTED]> writes:
>> We do have a TODO item:
>>  * Consider using MVCC to cache count(*) queries with no WHERE clause
>
>> The idea is to cache a recent count of the table, then have
>> insert/delete add +/- records to the count.  A COUNT(*) would get the
>> main cached record plus any visible +/- records.  This would allow the
>> count to return the proper value depending on the visibility of the
>> requesting transaction, and it would require _no_ heap or index scan.
>
> ... and it would give the wrong answers.  Unless the cache is somehow
> snapshot-aware, so that it can know which other transactions should be
> included in your count.

[That's an excellent summary that Bruce did of what came out of the
previous discussion...]

If this "cache" was a table, itself, the visibility of its records
should be identical to that of the visibility of the "real" records.
+/- records would become visible when the transaction COMMITed, at the
very same time the source records became visible.

I thought, at one point, that it would be a slick idea for "record
compression" to take place automatically; when you do a COUNT(*), the
process would include compressing multiple records down to one.
Unfortunately, that turns out to be Tremendously Evil if the same
COUNT(*) were being concurrently processed in multiple transactions.
Both would repeat much the same work, and this would ultimately lead
to one of the transactions aborting.  [I recently saw this effect
occur, um, a few times...]

For this not to have Evil Effects on unsuspecting transactions, we
would instead require some process analogous to VACUUM, where a single
transaction would be used to compress the "counts table" down to one
record per table.  Being independent of "user transactions," it could
safely compress the data without injuring unsuspecting transactions.

But in most cases, the cost of this would be pretty prohibitive.
Every transaction that adds a record to a table leads to a record
being added to table "pg_exact_row_counts".  If transactions typically
involve adding ONE row to any given table, this effectively doubles
the update traffic.  Ouch.  That means that in a _real_
implementation, it would make sense to pick and choose the tables that
would be so managed.
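A sketch of what such a selectively managed counts table might look like (all names, including pg_exact_row_counts, come from the discussion above and are hypothetical):

```sql
-- One base row per managed table, plus one +/- delta row per
-- writing transaction.
CREATE TABLE pg_exact_row_counts (
    table_oid oid,
    delta     bigint
);

-- The COUNT(*) replacement: ordinary MVCC visibility on these rows
-- gives each transaction the answer appropriate to its snapshot.
SELECT sum(delta)
  FROM pg_exact_row_counts
 WHERE table_oid = 'some_table'::regclass;

-- A VACUUM-like pass would periodically fold the deltas into one row.
```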

In my earlier arguing of "You don't really want that!", while I may
have been guilty of engaging in a _little_ hyperbole, I was certainly
_not_ being facetious overall.  At work, we tell the developers "avoid
doing COUNT(*) inside ordinary transactions!", and that is certainly
NOT a facetious comment.  I recall a case a while back where system
performance was getting brutalized by a lurking COUNT(*).  (Combined
with some other pathological behaviour, of course!)  And note that
this wasn't a query that the TODO item could address; it was of the
form "SELECT COUNT(*) FROM SOME_TABLE WHERE OWNER = VALUE;"

As you have commented elsewhere in the thread, much of the time, the
point of asking for COUNT(*) is often to get some idea of table size,
where the precise number isn't terribly important in comparison with
getting general magnitude.  Improving the ability to get approximate
values would be of some value.

I would further argue that "SELECT COUNT(*) FROM TABLE" isn't
particularly useful even when precision _is_ important.  If I'm
working on reports that would be used to reconcile things, the queries
I use are a whole lot more involved than that simple form.  It is far
more likely that I'm using a GROUP BY.

It is legitimate to get wishful and imagine that it would be nice if
we could get the value of that query "instantaneously."  It is also
legitimate to think that the effort required to implement that might
be better used on improving other things.
-- 
(reverse (concatenate 'string "ac.notelrac.teneerf" "@" "454aa"))
http://www3.sympatico.ca/cbbrowne/
"very few people approach me in real life and insist on proving they
are drooling idiots."  -- Erik Naggum, comp.lang.lisp



Re: [PERFORM] COUNT(*) again (was Re: [HACKERS] Index/Function organized

2003-10-04 Thread Bruce Momjian
Tom Lane wrote:
> Hannu Krosing <[EMAIL PROTECTED]> writes:
> > The point I was trying to make was that a faster count(*) is just a side
> > effect.  If we could (conditionally) keep visibility info in indexes,
> 
> I think that's not happening, conditionally or otherwise.  The atomicity
> problems alone are sufficient reason why not, even before you look at
> the performance issues.

What are the atomicity problems of adding a create/expire xid to the
index tuples?

-- 
  Bruce Momjian                      |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                  |  (610) 359-1001
  +  If your life is a hard drive,   |  13 Roberts Road
  +  Christ can be your backup.      |  Newtown Square, Pennsylvania 19073



Re: [PERFORM] COUNT(*) again (was Re: [HACKERS] Index/Function organized table layout)

2003-10-04 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> I think that's not happening, conditionally or otherwise.  The atomicity
>> problems alone are sufficient reason why not, even before you look at
>> the performance issues.

> What are the atomicity problems of adding a create/expire xid to the
> index tuples?

You can't update a tuple's status in just one place ... you have to
update the copies in the indexes too.

regards, tom lane
