Re: [PERFORM] 15,000 tables - next step
Michael Riess wrote:
>> Well, I'd think that's where your problem is. Not only do you have a
>> (relatively speaking) small server -- you also share it with other
>> very memory-hungry services! That's not a situation I'd like to be in.
>> Try putting Apache and Tomcat elsewhere, and leave the bulk of the 1GB
>> to Postgres.
>
> No can do. I can try to switch to a 2GB machine, but I will not use
> several machines. Not for a 5GB database. ;-)
>
>> With 1500 shared buffers you are not really going anywhere -- you
>> should have ten times that at the very least.
>
> Like I said - I tried to double the buffers and the performance did not
> improve in the least. And I also tried this on a 2GB machine, and
> swapping was not a problem. If I used 10x more buffers, I would in
> essence remove the OS buffers.

Increasing buffers does improve performance -- if you have enough memory. You just don't have enough memory to play with. My servers run w/ 10K buffers (128MB on 64-bit FC4) and they definitely run better at 10K versus 1500.

With that many tables, your system catalogs are probably huge. To keep your system catalog from continually cycling in and out of buffers/OS cache/disk, you need a lot more memory. Ordinarily, I'd say the 500MB you have available for Postgres to cache 5GB is a workable ratio. My servers all have similar ratios of ~1:10 and they perform pretty well -- *except* when the system catalogs bloat due to lack of vacuuming on system tables.

My app regularly creates & drops thousands of temporary tables, leaving a lot of dead rows in the system catalogs. (Nearly the same situation as you -- instead of 15K live tables, I had 200 live tables and tens of thousands of dead table records.) Even with almost 8GB of RAM dedicated to Postgres, performance on every single query -- no matter how small the table was -- took forever, because the query planner had to spend a significant period of time scanning through my huge system catalogs to build the execution plan.

While my situation was fixable by scheduling a nightly vacuum/analyze on the system catalogs to get rid of the bazillion dead table/index entries, you have no choice but to get more memory so you can stuff your entire system catalog into buffers/OS cache. Personally, w/ 1GB of ECC RAM at ~$85, it's a no-brainer. Get as much memory as your server can support.
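To illustrate the nightly vacuum/analyze mentioned above, a minimal sketch (the specific catalog list is an assumption -- these are the catalogs most churned by creating and dropping tables; run as a superuser, e.g. nightly from cron via psql):

    -- reclaim dead catalog rows left behind by table create/drop churn
    VACUUM ANALYZE pg_catalog.pg_class;
    VACUUM ANALYZE pg_catalog.pg_attribute;
    VACUUM ANALYZE pg_catalog.pg_index;
    VACUUM ANALYZE pg_catalog.pg_type;
    VACUUM ANALYZE pg_catalog.pg_depend;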
Re: [PERFORM] 15,000 tables - next step
William Yu wrote:
> Michael Riess wrote:
>>> Well, I'd think that's where your problem is. Not only do you have a
>>> (relatively speaking) small server -- you also share it with other
>>> very memory-hungry services! That's not a situation I'd like to be in.
>>> Try putting Apache and Tomcat elsewhere, and leave the bulk of the 1GB
>>> to Postgres.
>>
>> No can do. I can try to switch to a 2GB machine, but I will not use
>> several machines. Not for a 5GB database. ;-)
>>
>>> With 1500 shared buffers you are not really going anywhere -- you
>>> should have ten times that at the very least.
>>
>> Like I said - I tried to double the buffers and the performance did not
>> improve in the least. And I also tried this on a 2GB machine, and
>> swapping was not a problem. If I used 10x more buffers, I would in
>> essence remove the OS buffers.
>
> Increasing buffers does improve performance -- if you have enough
> memory. You just don't have enough memory to play with. My servers run
> w/ 10K buffers (128MB on 64-bit FC4) and they definitely run better at
> 10K versus 1500.
>
> With that many tables, your system catalogs are probably huge.

content2=# select sum(relpages) from pg_class where relname like 'pg_%';
  sum
-------
 64088
(1 row)

:-)

> While my situation was fixable by scheduling a nightly vacuum/analyze
> on the system catalogs to get rid of the bazillion dead table/index
> entries, you have no choice but to get more memory so you can stuff
> your entire system catalog into buffers/OS cache. Personally, w/ 1GB of
> ECC RAM at ~$85, it's a no-brainer. Get as much memory as your server
> can support.

The problem is that we use pre-built hardware which isn't configurable. We can only switch to a bigger server with 2GB, but that's tops.

I will do the following:

- switch to 10k buffers on a 1GB machine, 20k buffers on a 2GB machine
- try to optimize my connection pools to remember which apps (groups of 30 tables) were accessed, so that there is a better chance of using caches
- "swap out" tables which are rarely used: export the content, drop the table, and re-create it on the fly upon access (sketched below)

Thanks for your comments!
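A minimal sketch of the "swap out" idea from the list above (the table definition and file path are hypothetical; server-side COPY to a file needs superuser rights, and a real version would also have to restore indexes and constraints):

    -- hypothetical table standing in for one of the rarely used app tables
    CREATE TABLE app17_data (id int4 PRIMARY KEY, payload text);

    -- swap out: export the content, then drop the table
    COPY app17_data TO '/var/lib/pgsql/swapped/app17_data.copy';
    DROP TABLE app17_data;

    -- swap in on first access: re-create with the same definition and reload
    CREATE TABLE app17_data (id int4 PRIMARY KEY, payload text);
    COPY app17_data FROM '/var/lib/pgsql/swapped/app17_data.copy';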
Re: [PERFORM] Faster db architecture for a twisted table.
Hélder M. Vieira wrote:
> ----- Original Message -----
> From: "Andreas Pflug" <[EMAIL PROTECTED]>
>> Create a table "sibling" with parent_id, sibling_id and appropriate
>> FKs, allowing the model to reflect the relation. At the same time, you
>> can drop "mother" and "father", because this relation is covered too.
>
> Something like a table describing relationship types and a table
> reflecting relationships from both sides, I guess:
>
> create table relationship_type
> (
>   relationship_type_id serial,
>   relationship_type_description varchar(20)
> );
>
> ...populated with values such as:
>
> 1 Child_of
> 2 Father_of
> 3 Brother_of
> 4 Sister_of
> ...
>
> And then:
>
> create table person_relationships
> (
>   source_person_id int4,
>   relationship_type_id int4,
>   target_person_id int4
> );
>
> ...populated with values such as:
>
> 1 1 2 (person 1 is child of person 2)
> 2 2 1 (person 2 is father of person 1)
>
> This is an extended version that could describe general person
> relations, not only family relations.

Still, your relationship_types are not precise. Since a two-way relation is described, only the two types Child_of and Brother/Sister_of are needed; the gender should be taken from the persons themselves (to avoid data inconsistencies such as "Mary is a brother of Lucy"). But this isn't pgsql-performance stuff any more.

Regards,
Andreas
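As an illustration of that last point, a sketch of a brother lookup that takes the relation from person_relationships but the gender from the person row itself (the person table with a gender column, and type id 3 standing for the collapsed sibling relation, are assumptions):

    -- brothers of person 1: related as siblings, gender checked on the
    -- person record rather than encoded in the relationship type
    select p.id, p.first_name
      from person_relationships r
      join person p on p.id = r.target_person_id
     where r.source_person_id = 1
       and r.relationship_type_id = 3   -- hypothetical Sibling_of type
       and p.gender = 'male';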
Re: [PERFORM] Queries taking ages in PG 8.1, have been much faster in PG<=8.0
Hi!

> -----Original Message-----
> From: Tom Lane [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, 1 December 2005 17:26
> To: Markus Wollny
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Queries taking ages in PG 8.1, have been much faster in PG<=8.0
>
> It looks like "set enable_nestloop = 0" might be a workable
> hack for the immediate need.
>
> Once you're not under deadline,
> I'd like to investigate more closely to find out why 8.1 does
> worse than 8.0 here.

I've just set up a PostgreSQL 8.0.3 installation ...

select version();
                                          version
--------------------------------------------------------------------------------------------
 PostgreSQL 8.0.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.5 (Debian 1:3.3.5-13)
(1 row)

...and restored a dump there; here's the explain analyze of the query for 8.0.3:

                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=5193.63..5193.63 rows=3 width=16) (actual time=7365.107..7365.110 rows=3 loops=1)
   Sort Key: source."position"
   ->  HashAggregate  (cost=5193.59..5193.60 rows=3 width=16) (actual time=7365.034..7365.041 rows=3 loops=1)
         ->  Nested Loop  (cost=0.00..5193.57 rows=3 width=16) (actual time=3190.642..7300.820 rows=11086 loops=1)
               ->  Nested Loop  (cost=0.00..3602.44 rows=4 width=20) (actual time=3169.968..5875.153 rows=11087 loops=1)
                     ->  Nested Loop  (cost=0.00..1077.95 rows=750 width=16) (actual time=36.599..2778.129 rows=158288 loops=1)
                           ->  Seq Scan on handy_java source  (cost=0.00..1.03 rows=3 width=14) (actual time=6.503..6.514 rows=3 loops=1)
                           ->  Index Scan using idx02_performance on answer  (cost=0.00..355.85 rows=250 width=8) (actual time=10.071..732.746 rows=52763 loops=3)
                                 Index Cond: ((answer.question_id = 16) AND (answer.value = "outer".id))
                     ->  Index Scan using pk_participant on participant  (cost=0.00..3.35 rows=1 width=4) (actual time=0.016..0.016 rows=0 loops=158288)
                           Index Cond: (participant.session_id = "outer".session_id)
                           Filter: ((status = 1) AND (date_trunc('month'::text, created) = date_trunc('month'::text, (now() - '2 mons'::interval))))
               ->  Index Scan using idx_answer_session_id on answer  (cost=0.00..397.77 rows=1 width=4) (actual time=0.080..0.122 rows=1 loops=11087)
                     Index Cond: ("outer".session_id = answer.session_id)
                     Filter: ((question_id = 6) AND (value = 1))
 Total runtime: 7365.461 ms
(16 rows)

Does this tell you anything useful? It's not on the same machine, mind you, but the configuration for PostgreSQL is absolutely identical (apart from the autovacuum lines, which 8.0.3 doesn't like).

Kind regards

   Markus
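For reference, the workaround quoted above would be applied per session, along these lines (a sketch; where and for how long to set it is an assumption):

    SET enable_nestloop = off;
    -- ... run the affected query on 8.1 ...
    RESET enable_nestloop;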
Re: [PERFORM] Queries taking ages in PG 8.1, have been much faster in PG<=8.0
"Markus Wollny" <[EMAIL PROTECTED]> writes: >> Once you're not under deadline, >> I'd like to investigate more closely to find out why 8.1 does >> worse than 8.0 here. > Does this tell you anything useful? It's not on the same machine, mind > you, but configuration for PostgreSQL is absolutely identical (apart > from the autovacuum-lines which 8.0.3 doesn't like). The data is not quite the same, right? I notice different numbers of rows being returned. But anyway, it seems the problem is with the upper scan on "answers", which 8.0 does like this: -> Index Scan using idx_answer_session_id on answer (cost=0.00..397.77 rows=1 width=4) (actual time=0.080..0.122 rows=1 loops=11087) Index Cond: ("outer".session_id = answer.session_id) Filter: ((question_id = 6) AND (value = 1)) and 8.1 does like this: -> Bitmap Heap Scan on answer (cost=185.85..187.26 rows=1 width=4) (actual time=197.490..197.494 rows=1 loops=9806) Recheck Cond: (("outer".session_id = answer.session_id) AND (answer.question_id = 6) AND (answer.value = 1)) -> BitmapAnd (cost=185.85..185.85 rows=1 width=0) (actual time=197.421..197.421 rows=0 loops=9806) -> Bitmap Index Scan on idx_answer_session_id (cost=0.00..2.83 rows=236 width=0) (actual time=0.109..0.109 rows=49 loops=9806) Index Cond: ("outer".session_id = answer.session_id) -> Bitmap Index Scan on idx02_performance (cost=0.00..182.77 rows=20629 width=0) (actual time=195.742..195.742 rows=165697 loops=9806) Index Cond: ((question_id = 6) AND (value = 1)) It seems that checking question_id/value via the index, rather than directly on the fetched tuple, is a net loss here. It looks like 8.1 would have made the right plan choice if it had made a better estimate of the combined selectivity of the question_id and value conditions, so ultimately this is another manifestation of the lack of cross-column statistics. What I find interesting though is that the plain index scan in 8.0 is so enormously cheaper than it's estimated to be. Perhaps the answer table in your 8.0 installation is almost perfectly ordered by session_id? Are you using default values for the planner cost parameters? It looks like reducing random_page_cost would help bring the planner estimates into line with reality on your machines. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Faster db architecture for a twisted table.
On Sat, 3 Dec 2005 23:00:21 +, Rodrigo Madera <[EMAIL PROTECTED]> wrote:
> Imagine a table named Person with "first_name" and "age".
>
> Now let's make it fancy and put a "mother" and "father" field that is
> a reference to the own table (Person). And to get even fuzzier, let's
> drop in some siblings:
>
> CREATE TABLE person(
>    id bigint PRIMARY KEY,
>    first_name TEXT,
>    age INT,
>    mother bigint REFERENCES person,
>    father bigint REFERENCES person,
>    siblings array of bigints (don't remember the syntax, but you get the point)
> );
>
> Well, this is ok, but imagine a search for "brothers of person id
> 34". We would have to search inside the record's 'siblings' array. Is
> this a bad design? Is this going to be slow?

Do you need the array at all?

alter table person add column gender text;

select id
  from person
 where gender = 'male'
   and (mother = (select mother from person where id = 34)
     or father = (select father from person where id = 34));

You can change the OR depending on whether you want half-brothers or not.

> What would be a better design to have these kind of relationships?
> (where you need several references to rows inside the table we are).

We use that structure (without the sibling array) for our systems. Siblings are calculated from the parents (in our case, livestock, where there can be hundreds). You have to be prepared to use recursive functions, and to make sure that a person doesn't appear anywhere higher in their own family tree.

klint.

Klint Gore
EMail: [EMAIL PROTECTED]
Snail Mail: A.B.R.I., University of New England, Armidale NSW 2351 Australia
Fax: +61 2 6772 5376
"Non rhyming slang - the possibilities are useless" -- L.J.J.
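A minimal sketch of the cycle check mentioned above (the function is a hypothetical helper built on the person table from the original post; it assumes the existing data is already acyclic):

    -- returns true if candidate_id already appears among person_id's
    -- ancestors, i.e. linking them as parent/child would create a cycle
    create or replace function is_ancestor(person_id bigint, candidate_id bigint)
    returns boolean as $$
    declare
        m bigint;
        f bigint;
    begin
        select mother, father into m, f from person where id = person_id;
        if m = candidate_id or f = candidate_id then
            return true;
        end if;
        return (m is not null and is_ancestor(m, candidate_id))
            or (f is not null and is_ancestor(f, candidate_id));
    end;
    $$ language plpgsql;

    -- example: before recording person 17 as a parent of person 34,
    -- check that 34 is not already an ancestor of 17
    -- select is_ancestor(17, 34);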
[PERFORM] Dividing up a single 250GB RAID10 server for postgres
Hi. We have a server provided for a test of a web application, with the following specifications:

- 1 dual-core 1.8GHz Opteron chip
- 6GB RAM
- approx. 250GB of RAID10 storage (LSI card + BBU, 4 x 15,000 RPM, 16MB cache SCSI disks)

The database itself is very unlikely to use up more than 50GB of storage -- however, we are also going to be storing pictures, movies etc. on the server. I understand that it is better to run pg_xlog on a separate spindle, but we don't have that option available at present.

Normally I use ext3. I wondered if I should make my normal partitions, then make a +/- 200GB LVM volume group, and slice that initially into a 100GB ext3 data directory and a 50GB xfs postgres data area, leaving the remainder of the volume group free for these to grow into. I haven't used LVM with xfs before, however.

Advice gratefully received.

Rory
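For concreteness, a rough sketch of the layout described above (the device name, volume group/volume names, mount point, and growth step are all assumptions):

    # one physical volume on the big RAID10 partition, one VG on top of it
    pvcreate /dev/sda3
    vgcreate datavg /dev/sda3

    # initial slices; the rest of the VG stays free for later growth
    lvcreate -L 100G -n files  datavg
    lvcreate -L 50G  -n pgdata datavg
    mkfs.ext3 /dev/datavg/files
    mkfs.xfs  /dev/datavg/pgdata

    # growing the xfs volume later (xfs can be grown while mounted)
    lvextend -L +25G /dev/datavg/pgdata
    xfs_growfs /var/lib/pgsql    # mount point of the pgdata volume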