Re: [PERFORM] two memory-consuming postgres processes

2008-05-03 Thread Matthew Wakeling

On Fri, 2 May 2008, PFC wrote:
CREATE TABLE derived AS SELECT ... FROM ... (perform all your derived 
calculations here)


Given what you have said (that you really want all the data in one table) 
it may be best to proceed like this:


First, take your original table, create an index on the primary key field, 
and CLUSTER on that index.


CREATE TABLE derived AS SELECT ... FROM ... ORDER BY primary key field
CREATE INDEX derived_pk ON derived(primary key field)

Repeat those last two commands ad nauseum.

Then, when you want a final full table, run:

CREATE TABLE new_original AS SELECT * FROM original, derived, derived2,
  ... WHERE original.pk = derived.pk ...

That should be a merge join, which should run really quickly, and you can 
then create all the indexes you want on it.


Matthew

--
When I first started working with sendmail, I was convinced that the cf
file had been created by someone bashing their head on the keyboard. After
a week, I realised this was, indeed, almost certainly the case.
   -- Unknown

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] two memory-consuming postgres processes

2008-05-03 Thread Merlin Moncure
On Fri, May 2, 2008 at 4:51 PM, Alexy Khrabrov <[EMAIL PROTECTED]> wrote:
>
>  On May 2, 2008, at 1:40 PM, Scott Marlowe wrote:
>
> > Again, a database protects your data from getting scrambled should the
> > program updating it quit halfway through etc...
> >
>
>  Right -- but this is a data mining work, I add a derived column to a row,
> and it's computed from that very row and a small second table which should
> fit in RAM.

Full table update of a single field is one of the worst possible
operations with PostgreSQL.  mysql is better at this because lack of
proper transactions and full table locking allow the rows to be
(mostly) updated in place.  Ideally, you should be leveraging the
power of PostgreSQL so that you can avoid the full table update if
possible.  Maybe if you step back and think about the problem you may
be able to come up with a solution that is more efficient.

Also, if you must do it this way, (as others suggest), do CREATE TABLE
new_table AS SELECT, then create keys, and drop the old table when
done.  This is much faster than update.

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Vacuum statistics

2008-05-03 Thread Francisco Reyes

[EMAIL PROTECTED] writes:


What version of Postgres you are running ?


8.2

If you are using 8.3, you can use pg_stat_all_tables.If Not you can use 
http://www.postgresql.org/docs/current/static/pgstattuple.html>http:// 
www.postgresql.org/docs/current/static/pgstattuple.html


pgstattuple is also a 8.3 function. Anything simmilar to it in 8.2?

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Vacuum statistics

2008-05-03 Thread Mark Kirkwood

Francisco Reyes wrote:

[EMAIL PROTECTED] writes:


What version of Postgres you are running ?


8.2

If you are using 8.3, you can use pg_stat_all_tables.If Not you can 
use 
http://www.postgresql.org/docs/current/static/pgstattuple.html>http:// 
www.postgresql.org/docs/current/static/pgstattuple.html


pgstattuple is also a 8.3 function. Anything simmilar to it in 8.2?



It is available as a contrib module in 8.2, but needs to be installed 
(see contrib/pgstattuple).


regards

Mark

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance