Re: [PERFORM] 15,000 tables - next step

2005-12-04 Thread William Yu

Michael Riess wrote:

Well, I'd think that's where your problem is.  Not only do you have a
(relatively speaking) small server -- you also share it with other
very-memory-hungry services!  That's not a situation I'd like to be in.
Try putting Apache and Tomcat elsewhere, and leave the bulk of the 1GB
to Postgres. 



No can do. I can try to switch to a 2GB machine, but I will not use 
several machines. Not for a 5GB database. ;-)



With 1500 shared buffers you are not really going
anywhere -- you should have ten times that at the very least.



Like I said - I tried to double the buffers and the performance did not 
improve in the least. And I also tried this on a 2GB machine, and 
swapping was not a problem. If I used 10x more buffers, I would in 
essence remove the OS buffers.


Increasing buffers does improve performance -- if you have enough memory. 
You just don't have enough memory to play with. My servers run w/ 10K 
buffers (128MB on 64-bit FC4) and they definitely run better at 10K 
versus 1500.


With that many tables, your system catalogs are probably huge. To keep 
your system catalog from continually cycling in-out of buffers/OS 
cache/disk, you need a lot more memory. Ordinarily, I'd say the 500MB 
you have available for Postgres to cache 5GB is a workable ratio. My 
servers all have similar ratios of ~1:10 and they perform pretty well -- 
*except* when the system catalogs bloated due to lack of vacuuming on 
system tables. My app regularly creates & drops thousands of temporary 
tables leaving a lot of dead rows in the system catalogs. (Nearly the 
same situation as you -- instead of 15K live tables, I had 200 live 
tables and tens of thousands of dead table records.) Even with almost 
8GB of RAM dedicated to postgres, performance on every single query -- 
no matter how small the table was -- took forever because the query 
planner had to spend a significant period of time scanning through my 
huge system catalogs to build the execution plan.
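
A rough way to see which catalogs are carrying the weight (just a sketch 
using the standard pg_class columns; adjust the list as needed):

select relname, relpages, reltuples
from pg_class
where relname in ('pg_class', 'pg_attribute', 'pg_index', 'pg_depend')
order by relpages desc;

If pg_class and pg_attribute come back with tens of thousands of pages, the 
catalogs no longer fit in cache and every planning cycle pays for it.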


While my situation was fixable by scheduling a nightly vacuum/analyze 
on the system catalogs to get rid of the bazillion dead table/index 
entries, you have no choice but to get more memory so you can stuff your 
entire system catalog into buffers/os cache. Personally, w/ 1GB of ECC 
RAM at ~$85, it's a no brainer. Get as much memory as your server can 
support.
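
The nightly job itself is nothing fancy -- roughly the following, run as a 
superuser (a sketch; vacuum whichever catalogs churn most in your setup):

vacuum analyze pg_catalog.pg_class;
vacuum analyze pg_catalog.pg_attribute;
vacuum analyze pg_catalog.pg_index;
vacuum analyze pg_catalog.pg_depend;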


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] 15,000 tables - next step

2005-12-04 Thread Michael Riess

William Yu wrote:
> Michael Riess wrote:
>>> Well, I'd think that's where your problem is.  Not only do you have a
>>> (relatively speaking) small server -- you also share it with other
>>> very-memory-hungry services!  That's not a situation I'd like to be in.
>>> Try putting Apache and Tomcat elsewhere, and leave the bulk of the 1GB
>>> to Postgres.
>>
>>
>> No can do. I can try to switch to a 2GB machine, but I will not use
>> several machines. Not for a 5GB database. ;-)

>>
>>> With 1500 shared buffers you are not really going
>>> anywhere -- you should have ten times that at the very least.
>>>
>>
>> Like I said - I tried to double the buffers and the performance did
>> not improve in the least. And I also tried this on a 2GB machine, and
>> swapping was not a problem. If I used 10x more buffers, I would in
>> essence remove the OS buffers.

>
> Increasing buffers does improve performance -- if you have enough
> memory. You just don't have enough memory to play with. My servers run
> w/ 10K buffers (128MB on 64-bit FC4) and they definitely run better
> at 10K versus 1500.

>
> With that many tables, your system catalogs are probably huge.


content2=# select sum(relpages) from pg_class where relname like 'pg_%';
  sum
---
 64088
(1 row)

:-)
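
At the default 8 kB block size that is 64088 * 8 kB, i.e. roughly 500 MB of 
system catalogs. The same query with the arithmetic folded in (a sketch):

select sum(relpages) * 8 / 1024 as catalog_mb
from pg_class where relname like 'pg_%';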


> While my situation was fixable by scheduling a nightly vacuum/analyze
> on the system catalogs to get rid of the bazillion dead table/index
> entries, you have no choice but to get more memory so you can stuff
> your entire system catalog into buffers/os cache. Personally, w/ 1GB
> of ECC RAM at ~$85, it's a no brainer. Get as much memory as your
> server can support.


The problem is that we use pre-built hardware which isn't configurable. 
We can only switch to a bigger server with 2GB, but that's tops.


I will do the following:

- switch to 10k buffers on a 1GB machine, 20k buffers on a 2GB machine
- try to optimize my connection pools to remember which apps (groups of 
30 tables) were accessed, so that there is a better chance of hitting 
cached data
- "swap out" tables which are rarely used: export the content, drop the 
table, and re-create it on the fly upon access (see the sketch below).
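
The "swap out" part would look roughly like this (table name and columns are 
made up -- just a sketch of the mechanics, using COPY to a server-side file):

-- on swap-out: dump the contents, then drop the table
copy app_0042_data to '/var/spool/pgswap/app_0042_data.copy';
drop table app_0042_data;

-- on next access: recreate and reload
create table app_0042_data (id int4 primary key, payload text);
copy app_0042_data from '/var/spool/pgswap/app_0042_data.copy';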


Thanks for your comments!

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Faster db architecture for a twisted table.

2005-12-04 Thread Andreas Pflug

Hélder M. Vieira wrote:


----- Original Message ----- From: "Andreas Pflug" 
<[EMAIL PROTECTED]>


Create a table "sibling" with parent_id, sibling_id and appropriate 
FKs, allowing the model to reflect the relation. At the same time, you 
can drop "mother" and "father", because this relation is covered too




Something like a table describing relationships and a table reflecting 
relationships from both sides, I guess:



create table relationship_type
(
relationship_type_id serial,
relationship_type_description varchar(20)
)

populated with values such as:
1 Child_of
2 Father_of
3 Brother_of
4 Sister_of
...


And then


create table person_relationships
(
source_person_id int4,
relationship_type_id int4,
target_person_id int4
)

populated with values such as:
1 1 2  (person 1 is child of person 2)
2 2 1  (person 2 is father of person 1)



This is an extended version that could describe general person 
relations, not only family relations. Still, your relationship_types 
are not precise. Since a two-way relation is described, only two are 
needed, Child_of and a combined Brother/Sister (Sibling_of); the gender 
should be taken from the persons themselves (to avoid data 
inconsistencies such as "Mary is a brother of Lucy").
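
Sketched against the tables above (assuming person carries a gender column 
and a single Sibling_of type with the hypothetical id 3), "brothers of 
person 34" becomes:

select p.id
from person_relationships r
join person p on p.id = r.target_person_id
where r.source_person_id = 34
and r.relationship_type_id = 3
and p.gender = 'male';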

But this isn't pgsql-performance stuff any more.


Regards,
Andreas

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Queries taking ages in PG 8.1, have been much faster in PG<=8.0

2005-12-04 Thread Markus Wollny

Hi!

> -----Original Message-----
> From: Tom Lane [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, 1 December 2005 17:26
> To: Markus Wollny
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Queries taking ages in PG 8.1, have
> been much faster in PG<=8.0

> It looks like "set enable_nestloop = 0" might be a workable
> hack for the immediate need. 
>
> Once you're not under deadline,
> I'd like to investigate more closely to find out why 8.1 does
> worse than 8.0 here.


I've just set up a PostgreSQL 8.0.3 installation ...

select version();
                                           version
---------------------------------------------------------------------------------------------
 PostgreSQL 8.0.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.5 (Debian 1:3.3.5-13)
(1 row)

...and restored a dump there; here's the explain analyze of the query for 8.0.3:

    QUERY PLAN
---
 Sort  (cost=5193.63..5193.63 rows=3 width=16) (actual time=7365.107..7365.110 rows=3 loops=1)
   Sort Key: source."position"
   ->  HashAggregate  (cost=5193.59..5193.60 rows=3 width=16) (actual time=7365.034..7365.041 rows=3 loops=1)
 ->  Nested Loop  (cost=0.00..5193.57 rows=3 width=16) (actual time=3190.642..7300.820 rows=11086 loops=1)
   ->  Nested Loop  (cost=0.00..3602.44 rows=4 width=20) (actual time=3169.968..5875.153 rows=11087 loops=1)
 ->  Nested Loop  (cost=0.00..1077.95 rows=750 width=16) (actual time=36.599..2778.129 rows=158288 loops=1)
   ->  Seq Scan on handy_java source  (cost=0.00..1.03 rows=3 width=14) (actual time=6.503..6.514 rows=3 loops=1)
   ->  Index Scan using idx02_performance on answer  (cost=0.00..355.85 rows=250 width=8) (actual time=10.071..732.746 rows=52763 loops=3)
 Index Cond: ((answer.question_id = 16) AND (answer.value = "outer".id))
 ->  Index Scan using pk_participant on participant  (cost=0.00..3.35 rows=1 width=4) (actual time=0.016..0.016 rows=0 loops=158288)
   Index Cond: (participant.session_id = "outer".session_id)
   Filter: ((status = 1) AND (date_trunc('month'::text, created) = date_trunc('month'::text, (now() - '2 mons'::interval))))
   ->  Index Scan using idx_answer_session_id on answer  (cost=0.00..397.77 rows=1 width=4) (actual time=0.080..0.122 rows=1 loops=11087)
 Index Cond: ("outer".session_id = answer.session_id)
 Filter: ((question_id = 6) AND (value = 1))
 Total runtime: 7365.461 ms
(16 rows)

Does this tell you anything useful? It's not on the same machine, mind you, but configuration for PostgreSQL is absolutely identical (apart from the autovacuum-lines which 8.0.3 doesn't like).

Kind regards

   Markus





Re: [PERFORM] Queries taking ages in PG 8.1, have been much faster in PG<=8.0

2005-12-04 Thread Tom Lane
"Markus Wollny" <[EMAIL PROTECTED]> writes:
>> Once you're not under deadline,
>> I'd like to investigate more closely to find out why 8.1 does
>> worse than 8.0 here.

> Does this tell you anything useful? It's not on the same machine, mind
> you, but configuration for PostgreSQL is absolutely identical (apart
> from the autovacuum-lines which 8.0.3 doesn't like).

The data is not quite the same, right?  I notice different numbers of
rows being returned.  But anyway, it seems the problem is with the upper
scan on "answers", which 8.0 does like this:

  ->  Index Scan using idx_answer_session_id on answer  (cost=0.00..397.77 
rows=1 width=4) (actual time=0.080..0.122 rows=1 loops=11087)
Index Cond: ("outer".session_id = answer.session_id)
Filter: ((question_id = 6) AND (value = 1))

and 8.1 does like this:

  ->  Bitmap Heap Scan on answer  (cost=185.85..187.26 rows=1 width=4) (actual 
time=197.490..197.494 rows=1 loops=9806)
Recheck Cond: (("outer".session_id = answer.session_id) AND 
(answer.question_id = 6) AND (answer.value = 1))
->  BitmapAnd  (cost=185.85..185.85 rows=1 width=0) (actual 
time=197.421..197.421 rows=0 loops=9806)
  ->  Bitmap Index Scan on idx_answer_session_id  (cost=0.00..2.83 
rows=236 width=0) (actual time=0.109..0.109 rows=49 loops=9806)
Index Cond: ("outer".session_id = answer.session_id)
  ->  Bitmap Index Scan on idx02_performance (cost=0.00..182.77 
rows=20629 width=0) (actual time=195.742..195.742 rows=165697 loops=9806)
Index Cond: ((question_id = 6) AND (value = 1))

It seems that checking question_id/value via the index, rather than
directly on the fetched tuple, is a net loss here.  It looks like 8.1
would have made the right plan choice if it had made a better estimate
of the combined selectivity of the question_id and value conditions,
so ultimately this is another manifestation of the lack of cross-column
statistics.  What I find interesting though is that the plain index scan
in 8.0 is so enormously cheaper than it's estimated to be.  Perhaps the
answer table in your 8.0 installation is almost perfectly ordered by
session_id?

Are you using default values for the planner cost parameters?  It looks
like reducing random_page_cost would help bring the planner estimates
into line with reality on your machines.
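
For a quick test that does not require touching postgresql.conf (a sketch; 
the built-in default is 4):

show random_page_cost;
set random_page_cost = 2;  -- session-local; re-run the EXPLAIN ANALYZE and compare

If the lower setting consistently yields the better plan, make it permanent 
in postgresql.conf.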

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Faster db architecture for a twisted table.

2005-12-04 Thread Klint Gore
On Sat, 3 Dec 2005 23:00:21 +, Rodrigo Madera <[EMAIL PROTECTED]> wrote:
> Imagine a table named Person with "first_name" and "age".
> 
> Now let's make it fancy and put a "mother" and "father" field that is
> a reference to the own table (Person). And to get even fuzzier, let's
> drop in some siblings:
> 
> CREATE TABLE person(
>id bigint PRIMARY KEY,
>first_name TEXT,
>age INT,
>mother bigint REFERENCES person,
>father bigint REFERENCES person,
>siblings bigint[]  -- array of sibling ids ("don't remember the syntax, but you get the point")
> );
> 
> Well, this is ok, but imagine a search for  "brothers of person id
> 34". We would have to search inside the record's 'siblings' array. Is
> this a bad design? is this going to be slow?

Do you need the array at all?

alter table person add column gender text;  -- type assumed; any text/char type works

select id
from person
where gender = 'male'
and (mother = (select mother from person where id = 34)
     OR father = (select father from person where id = 34));

You can change the OR depending on whether you want half brothers or not.

> What would be a better design to have this kind of relationship?
> (where you need several references to rows inside the table we are in).

We use that structure (without the sibling array) for our systems. 
Siblings are calculated from parents (in our case, livestock, there can
be hundreds).  You have to be prepared to use recursive functions and
make sure that a person doesn't appear anywhere higher in their family
tree.
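
A rough sketch of the kind of recursive check meant here (plpgsql, reusing 
the person table from above; the names are made up):

create or replace function is_ancestor(p_candidate bigint, p_person bigint)
returns boolean as $$
declare
    v_mother bigint;
    v_father bigint;
begin
    select mother, father into v_mother, v_father
    from person where id = p_person;

    if v_mother is null and v_father is null then
        return false;  -- reached the top of the tree
    end if;
    if p_candidate = v_mother or p_candidate = v_father then
        return true;   -- candidate is a direct parent
    end if;
    return (v_mother is not null and is_ancestor(p_candidate, v_mother))
        or (v_father is not null and is_ancestor(p_candidate, v_father));
end;
$$ language plpgsql;

-- before making X a parent of Y, require that is_ancestor(Y, X) is false;
-- otherwise the update would create a cycle in the family tree.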

klint.

Klint Gore
EMail : [EMAIL PROTECTED]
Snail : A.B.R.I., University of New England, Armidale NSW 2351 Australia
Fax   : +61 2 6772 5376
"Non rhyming slang - the possibilities are useless" - L.J.J.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[PERFORM] Dividing up a single 250GB RAID10 server for postgres

2005-12-04 Thread Rory Campbell-Lange
Hi. We have a server provided for a test of a web application with the
following specifications:

1 Dual core 1.8GHz Opteron chip
6 GB RAM
approx 250GB of RAID10 storage (LSI card + BBU, 4 x 15000 RPM, 16MB
cache SCSI disks)

The database itself is very unlikely to use more than 50GB of storage
-- however, we are also going to be storing pictures, movies, etc. on the
server.

I understand that it is better to run pg_xlog on a separate spindle but
we don't have that option available at present.

Normally I use ext3. I wondered if I should make my normal partitions
and then make a +/- 200GB LVM VG and then slice that initially into a
100GB ext3 data directory and a 50GB xfs postgres data area, giving
100GB to use between these as they grow. I haven't used LVM with xfs
before, however.

Advice gratefully received.
Rory


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings