Christopher,

You did not quite understand.  

The counts in question are the actual counts (deltas) for the
transactions.

The tuple should be something like:
xid, reltype, insert_count, update_count, delete_count
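
A minimal sketch of that tuple as a table (all names here are assumptions
for illustration, and xid is not really an integer in practice):

```sql
-- Hypothetical per-transaction delta table.
create table pg_tuple_counts (
    xid          integer,  -- transaction id that produced these deltas
    reltype      oid,      -- oid of the table the deltas apply to
    insert_count integer,
    update_count integer,
    delete_count integer
);
```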

When a COUNT(*) is issued, the committed tuples are totaled up and stored
as xid 0 (or whatever), and the committed per-transaction rows are deleted
(all of this with appropriate locks).
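
A hedged sketch of that consolidation step, assuming a delta table named
pg_tuple_counts with per-operation count columns and oid 12345 as the
example table; the temp table avoids double-counting the old xid-0 summary
row, and the explicit lock serializes concurrent consolidations:

```sql
begin;
-- Serialize consolidations for this table's rows.
lock table pg_tuple_counts in exclusive mode;

-- Total up everything for the table, including any previous xid-0 summary.
create temp table totals as
  select reltype,
         sum(insert_count) as i,
         sum(update_count) as u,
         sum(delete_count) as d
    from pg_tuple_counts
   where reltype = 12345
   group by reltype;

-- Replace all the per-transaction rows with a single xid-0 summary row.
delete from pg_tuple_counts where reltype = 12345;
insert into pg_tuple_counts
  select 0, reltype, i, u, d from totals;

drop table totals;
commit;
```

The live row count for the table would then be
sum(insert_count) - sum(delete_count) over its rows.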

I am not sure if there is a need for the update count.

JLL


Christopher Browne wrote:
> 
> In the last exciting episode, [EMAIL PROTECTED] (Jean-Luc Lachance) wrote:
> > How about keeping counts of inserts, deletes and updates per table per
> > transaction as part of the live statistics?
> 
> Aye, there's the rub.
> 
> That's exactly what _won't_ work, and that's exactly the case that is
> somewhat pathological under MVCC.
> 
> With MVCC, data "appears as if by magic" when its transaction COMMITs,
> thereby revealing the rows to the world.
> 
> Unfortunately, there's no simple way of making updates to counts
> "simply appear" when they take effect, not without turning the updates
> into a concurrency bottleneck.
> 
> Here's a bit of a wild thought...
> 
> Assume a table with schema as follows:
> create table pg_count (
>   xid integer,   ---  Actually, it's not an integer, right, Andrew? :-(
>   reltype oid,
>   count integer
> );
> 
> Every transaction, "xid," affects some number of tuples.  So, for a
> transaction #2134151 that adds 5 rows to the table with oid 12345 and
> deletes 4 rows from the table with oid 45678, part of the transaction
> would include inserting these rows:
> 
>   insert into pg_count (xid, reltype, count) values (2134151, 12345, 5);
>   insert into pg_count (xid, reltype, count) values (2134151, 45678, -4);
> 
> In order to get the count for table 12345, you could then go to
> pg_count and request:
>   select sum(count) from pg_count where reltype = 12345;
> 
> The size of this table would increase every time a transaction gets
> committed, so presumably part of VACUUM TABLE would be a
> collection/summarization, thus...
> 
>   -- Collect existing stats into 1 row
>   insert into pg_count(xid, reltype, count) values (currxid,
>     currtable, (select sum(count) from pg_count where reltype =
>     currtable));
> 
>   -- Delete the old stats
>   delete from pg_count where reltype = currtable and xid <> currxid;
> 
> This will cope with concurrency reasonably well (modulo having to make
> sure that the "collect/delete" transaction is a serialized one).
> 
> Unfortunately, if a table is updated frequently, the summary
>   select sum(count) from pg_count where reltype = 12345;
> will have to collect together quite a large number of entries, which
> makes this "less cheap."
> 
> That suggests an optimization; any time the COUNT is selected, the old
> stats can and should be collected into 1 row and the old data deleted.
> --
> wm(X,Y):-write(X),write('@'),write(Y). wm('aa454','freenet.carleton.ca').
> http://www3.sympatico.ca/cbbrowne/nonrdbms.html
> Sturgeon's Law: 90% of *EVERYTHING* is crud.
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
      joining column's datatypes do not match