[PERFORM] How many cluster databases on a single server
Hi everybody,

I googled around a bit and also checked the mailing lists, but I still can't form a clear picture. We plan to use Postgres 9 and its cluster-level replication. My colleagues are asking how many database clusters (initdb) I can create and run on a single server. I mean, assuming my server has the resources, can I create 100 or even 200 clusters, each with the right configuration and meeting the requirements? Or does the Postgres architecture not support numbers like that? We are thinking of replicating nearly 200 databases from around the internet onto a single DB server. Has anyone already done something like this?

BTW, this is my first email to a PostgreSQL mailing list. If I'm doing something wrong, don't hesitate to correct me :)

Thanks
Davo
[PERFORM] Inner Join - Explicit vs Implicit Join Performance
Hi,

In PostgreSQL, is there any performance difference between queries written using "explicit join notation" vs "implicit join notation" in complex queries?

EXAMPLE: Simple "explicit join notation"

SELECT *
FROM employee INNER JOIN department
ON employee.DepartmentID = department.DepartmentID;

EXAMPLE: Simple "implicit join notation"

SELECT *
FROM employee, department
WHERE employee.DepartmentID = department.DepartmentID;

Regards,
Gnanam
Re: [PERFORM] Inner Join - Explicit vs Implicit Join Performance
Hello

No, there is no difference; you can check it with the EXPLAIN statement.

Regards
Pavel Stehule

2011/10/19 Gnanakumar :
> In PostgreSQL, is there any performance difference between queries written
> using "explicit join notation" vs "implicit join notation" in complex
> queries? [...]
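As a quick, hedged illustration of Pavel's suggestion, using the tables from the question above: run EXPLAIN on both forms and compare the output, which should be identical because the planner flattens both into the same join tree.

EXPLAIN
SELECT * FROM employee INNER JOIN department
    ON employee.DepartmentID = department.DepartmentID;

EXPLAIN
SELECT * FROM employee, department
    WHERE employee.DepartmentID = department.DepartmentID;

-- In very complex queries the planner's reordering is bounded by the
-- join_collapse_limit and from_collapse_limit settings, so the two forms
-- can start to plan differently once the number of joined tables exceeds
-- those limits.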
Re: [PERFORM] How many cluster databases on a single server
On 10/19/11 2:46 AM, d.davo...@mastertraining.it wrote:
> My colleagues are asking how many database clusters (initdb) I can create
> and run on a single server. I mean, assuming my server has the resources,
> can I create 100 or even 200 clusters? [...]

You don't need to do initdb for each one. Postgres can create many databases on a single server and manage them without difficulty. We currently operate about 300 databases on a single server. Most are small, and one is an aggregate of all the small ones. I believe there are sites that have >1000 separate databases on one server.

Postgres has a slightly different concept of a "database" than Oracle or MySQL, which is why your question about initdb is slightly off. You can indeed create several separate instances of Postgres (a separate initdb for each), but the only reason you ever need to do that is if you're running different versions of Postgres (like 8.4 and 9.0) simultaneously.

Postgres runs into problems when the total number of objects (tables, views, sequences, ...) across all databases gets very large, where "very large" is ill-defined but is somewhere between a few hundred thousand and a million. We once had a rogue process that created 5 million tables, and we had to completely abandon the installation because of some sort of N^2 phenomenon that made it impossible even to use pg_dump to save and restore the system. So the advice is "don't do dumb stuff like that", and you should be able to manage many databases.

Craig
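A minimal sketch of what Craig describes (the database names here are hypothetical): a single cluster from one initdb can hold many databases, each fully separate at the SQL level.

CREATE DATABASE site_001;
CREATE DATABASE site_002;
-- ... one per customer/site, all served by the same postmaster ...

-- List the databases in the cluster:
SELECT datname FROM pg_database WHERE NOT datistemplate;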
Re: [PERFORM] Rapidly finding maximal rows
On Tue, Oct 11, 2011 at 8:05 PM, Dave Crooke wrote:
> Hi James
>
> I'm guessing the problem is that the combination of using a view, and the
> way the view is defined with an in-line temporary table, is too complex
> for the planner to introspect into, transform, and figure out the
> equivalent direct query, so it's creating that entire temporary table
> every time you evaluate the select.
>
> Our app has some similar queries (get me the most recent row from a data
> logging table) and these work fine with a simple self-join, like this
> example (irrelevant columns omitted for discussion):
>
> select t.entity, t.time_stamp, t.data from log_table t
> where t.entity = cast('21EC2020-3AEA-1069-A2DD-08002B30309D' as uuid)
> and t.time_stamp =
>     (select max(time_stamp)
>      from log_table u
>      where t.entity = u.entity)
>
> given a schema with the obvious indexes:
>
> create table log_table
> (entity UUID,
>  time_stamp TIMESTAMP WITHOUT TIME ZONE,
>  data TEXT);
>
> create index log_table_index on log_table (entity, time_stamp);
>
> ... and the plan for the dependent sub-query does the obvious reverse
> index scan, as you'd expect/want.
>
> If you still want / need to have the view, I suspect that getting rid of
> the temp table definition will fix it ... my effort is below.
> Alternatively, you might be able to take your first example, pull out
> best_scores, and define it as a view also.
>
> CREATE VIEW best_in_school_method3 AS
> SELECT competition_name, academic_year_beginning, centre_number,
>        entry_id, total_score, (true) AS best_in_school
> FROM challenge_entries ce1
> WHERE total_score =
>     (SELECT MAX(total_score) FROM challenge_entries ce2
>      WHERE ce1.competition_name = ce2.competition_name
>      AND ce1.academic_year_beginning = ce2.academic_year_beginning
>      AND ce1.centre_number = ce2.centre_number)

This is a very common problem in SQL and has a lot of interesting solutions. In queries like this I usually use the 'ORDER BY total_score DESC LIMIT 1' trick. Modern Postgres is *usually* smart enough to convert max() to that, but not always:

WHERE total_score =
    (SELECT total_score FROM challenge_entries ce2
     WHERE ce1.competition_name = ce2.competition_name
     AND ce1.academic_year_beginning = ce2.academic_year_beginning
     AND ce1.centre_number = ce2.centre_number
     ORDER BY total_score DESC LIMIT 1)

Another clever, and more portable, way to write it, which can sometimes give a better plan, is like this (note the comparison must be ce2.total_score > ce1.total_score, so that the rows kept are the ones no other row in the group beats):

WHERE NOT EXISTS
    (SELECT 1 FROM challenge_entries ce2
     WHERE ce1.competition_name = ce2.competition_name
     AND ce1.academic_year_beginning = ce2.academic_year_beginning
     AND ce1.centre_number = ce2.centre_number
     AND ce2.total_score > ce1.total_score)

Yet another interesting way of getting the 'top' record based on a certain criterion is to write a custom aggregate. In Postgres you can aggregate over the entire record type, not just plain fields, so that running your aggregate function looks like this:

SELECT competition_name, academic_year_beginning, centre_number,
       max_challenge_entries(ce)
FROM challenge_entries ce
GROUP BY 1, 2, 3;

The transition function in SQL would look something like this (declared STRICT so that the first row seeds the aggregate state; without STRICT, the NULL initial state would propagate and the aggregate would always return NULL):

CREATE OR REPLACE FUNCTION max_challenge_entries_impl(challenge_entries, challenge_entries)
RETURNS challenge_entries AS $$
    SELECT CASE WHEN ($2).total_score > ($1).total_score THEN $2 ELSE $1 END;
$$ LANGUAGE SQL IMMUTABLE STRICT;
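For the GROUP BY query above to run, the aggregate itself also needs to be declared. A minimal sketch, assuming the STRICT transition function just shown:

CREATE AGGREGATE max_challenge_entries(challenge_entries) (
    SFUNC = max_challenge_entries_impl,
    STYPE = challenge_entries
);

-- With a STRICT transition function and no INITCOND, PostgreSQL seeds the
-- state with the first input row, so NULL handling comes for free.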
This very STL-ish approach is rarely the best way to go performance-wise, although it can give better worst-case plans in some cases (note, though, that an index on total_score can never be used to optimize it). I mention it because I find it very clean conceptually, and it can be a great approach if your 'picking' algorithm is sufficiently more complex than 'field > field' and is also otherwise not optimizable. Anyway, food for thought.

merlin
Re: [PERFORM] How many cluster databases on a single server
Hi Craig,

Thanks for your reply. I think I need to add some details to my question, namely why we would need more than one database cluster. We are thinking of using the streaming replication feature to keep a number of little DB servers around the net in sync, with the replicas hosted on one or more centralized servers. I haven't tested replication personally, but as far as I can see it syncs the whole cluster, so on the centralized server(s) we would have perfect copies of each source cluster. We certainly need to test this configuration, but first of all I was wondering if there are known drawbacks.

Thanks again.

On 10/19/2011 03:54 PM, Craig James wrote:
> You don't need to do initdb for each one. Postgres can create many
> databases on a single server and manage them without difficulty. [...]
Re: [PERFORM] How many cluster databases on a single server
"d.davo...@mastertraining.it" wrote: > We are thinking to use the Streaming Replica feature to keep in > sync a number of little DB servers around the net. The replica > should happen on one or more centralized servers. I didn't tested > the replica personally bus as I can see, it syncs the whole > Cluster DB. So, on the centralized server(s), we will have perfect > copies of the Cluster Databases. We sure need to test this > configuration but first of all I was wondering if there are known > drawbacks. We do something very much like this with about 100 standby database clusters on a single machine. We don't have any illusion that we could switch to one of these for a normal production load and have good performance with all of these competing for resources -- it's primarily to confirm that the PITR backup process is working and staying up to date, and to provide a quick source for a copy to a standby production server. The one thing I would strongly recommend is that you use a separate OS user as the owner of each cluster's data directory (and, of course, to run the cluster's service). We didn't initially do this, and had problems on recovery when the server crashed. If you search the archives you can probably dig up all the details on why this is an issue and why separate users is a good solution; but really, this is important. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] How many cluster databases on a single server
On Wed, Oct 19, 2011 at 9:02 AM, d.davo...@mastertraining.it wrote:
> We are thinking of using the streaming replication feature to keep a
> number of little DB servers around the net in sync, with the replicas
> hosted on one or more centralized servers. [...] We certainly need to
> test this configuration, but first of all I was wondering if there are
> known drawbacks.

The problem with having so many clusters on one machine is the shared memory that each one needs. Even with a relatively small shared memory segment of, say, 16MB, with 100 clusters you're going to be using 1600MB of memory on that machine for shared memory alone. You might be better off with one cluster, using Slony to replicate just the parts that need replication.
[PERFORM] does update of column with no relation imply a relation check of other column?
For example:

Table A
- id (PK)
- name

Table B
- table_a_id (PK, FK referencing A.id)
- address

When I do an insert on table B, the database checks that the value for column "table_a_id" exists in table A. But if I do an update of only the "address" column of table B, does the database check again?

My question is due to the nature of an update in Postgres, which is basically a new-version "insert".

Thanks
Re: [PERFORM] does update of column with no relation imply a relation check of other column?
On 19 Oct 2011, at 17:51, Anibal David Acosta wrote:
> When I do an insert on table B, the database checks that the value for
> column "table_a_id" exists in table A. But if I do an update of only the
> "address" column of table B, does the database check again?
>
> My question is due to the nature of an update in Postgres, which is
> basically a new-version "insert".

In short, I believe it does. No reason for it not to.
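One way to check this empirically rather than guessing (a sketch with hypothetical table names matching the question): EXPLAIN ANALYZE reports the time spent in each foreign-key trigger, so you can see what the update actually costs.

CREATE TABLE table_a (id integer PRIMARY KEY, name text);
CREATE TABLE table_b (table_a_id integer PRIMARY KEY REFERENCES table_a (id),
                      address text);
INSERT INTO table_a VALUES (1, 'example');
INSERT INTO table_b VALUES (1, 'old address');

EXPLAIN ANALYZE
UPDATE table_b SET address = 'new address' WHERE table_a_id = 1;
-- Any "Trigger for constraint table_b_table_a_id_fkey" line in the output
-- shows whether the FK check fired for this update and how long it took.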
[PERFORM] delete/recreate indexes
Hi, I'm a Postgres novice. I have this fairly simple table:

device integer not null,
"group" integer not null,  -- quoted because group is a reserved word
datum timestamp without time zone not null,
val1 numeric(7,4) not null default 0.000,
val2 numeric(7,4) not null default 0.000

The device column is a foreign key to the PK of my device table, and I have a UNIQUE INDEX on the three columns (device, "group", datum).

This is just a test database, and I want to keep the "datum" values (timestamps that span approx. one month) all bound to CURRENT_DATE. So I thought I’d just run this once (via cron) every morning:

BEGIN;
DROP INDEX data_unique;
UPDATE data SET datum = (data.datum + interval '24 hours');
CREATE UNIQUE INDEX data_unique ON public.data USING BTREE
    (device, "group", datum);
COMMIT;

But
1. it’s taking forever, and
2. I’m seeing that my disk is filling up real fast.

Any suggestions?

Alan
[PERFORM] disused indexes and performance?
I recently ran a query against our production database and saw several disused indexes. Is there any performance harm in having disused indexes out there?

Of course, I will be checking our code base and with our developers to ensure that these indexes aren't being used programmatically, to prevent redundant inserts and the like.
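For reference, a query along these lines (a hedged sketch against the standard statistics views; the results depend on when statistics were last reset) will list indexes that have never been scanned:

SELECT schemaname, relname AS table_name, indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
-- Caution: an index with idx_scan = 0 may still be enforcing a UNIQUE or
-- PRIMARY KEY constraint, so check before dropping.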
Re: [PERFORM] Heavy continuous load
"try to have as few indexes as possible and do your inserts inside transactions in batches, rather than one-by-one as individual statements. " That is the main problem. I use now few index as possible. Unfortunately the one-by-one INSERT is nature of or system. To join (batch) inserts is require spacial cache with inactivity timeout. But this timout are make more trouble for our application. The flat file has same problem. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Heavy-contgnous-load-tp4913425p4919006.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Heavy continuous load
Kevin Grittner wrote:
> Rollback activity? What rollback activity? When you're doing what?
> What is the exact message?

I meant some kind of option for saving resources. For example, MySQL has a table (storage) type with no transaction support (no rollback). This makes the whole server faster and makes it use fewer resources.
Re: [PERFORM] disused indexes and performance?
On Wed, Oct 19, 2011 at 9:24 AM, Elisa wrote:
> I recently ran a query against our production database and saw several
> disused indexes. Is there any performance harm in having disused indexes
> out there?

Sure there is: they'll slow down writes and use more disk space.

> Of course, I will be checking our code base and with our developers to
> ensure that these indexes aren't being used programmatically, to prevent
> redundant inserts and the like.

Unique indexes are usually a small set compared to the extra non-unique indexes. Also, some indexes may be seldom used but make a very big difference on the rare occasions they are used, for instance partial, functional, or full-text indexes.
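As a hedged example of that last point (table and column names hypothetical): a partial index like the one below may show almost no scans in the statistics, yet be the only thing keeping an occasional report query fast.

-- Serves only queries that filter on status = 'pending'; it stays small
-- and idle most of the time, but is critical when that query does run.
CREATE INDEX orders_pending_idx ON orders (created_at)
    WHERE status = 'pending';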
Re: [PERFORM] delete/recreate indexes
On Wed, 2011-10-19 at 08:03 -0700, alan wrote:
> So I thought I’d just run this once (via cron) every morning:
>
> BEGIN;
> DROP INDEX data_unique;
> UPDATE data SET datum = (data.datum + interval '24 hours');
> CREATE UNIQUE INDEX data_unique ON public.data USING BTREE
>     (device, "group", datum);
> COMMIT;
>
> But
> 1. it’s taking forever, and
> 2. I’m seeing that my disk is filling up real fast.

An unrestricted update will end up rewriting the whole table. It's advisable to run VACUUM afterward, so that the wasted space can be reclaimed. What version are you on? Do you have autovacuum enabled?

Also, to take a step back: why do you try to keep the timestamps changing like that? Why not store the information you need in the record (e.g. the insert time as well as the datum) and then compute the result you need using a SELECT (or make it a view for convenience)? Fundamentally, these records aren't changing; you are just trying to interpret them in the context of the current day. That should be done using a SELECT, not an UPDATE.

Regards,
Jeff Davis
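A hedged sketch of what Jeff is suggesting (the added column and the view name are hypothetical): record once when each row was inserted, and shift the timestamps at read time instead of rewriting the table every morning.

ALTER TABLE data ADD COLUMN inserted_on date NOT NULL DEFAULT current_date;

CREATE VIEW data_relative AS
SELECT device, "group",
       -- shift each row's datum forward by however many days old the row is
       datum + (current_date - inserted_on) * interval '1 day' AS datum,
       val1, val2
FROM data;

No daily UPDATE and no index rebuild, so the table (and the disk) stops churning.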
Re: [PERFORM] How many cluster databases on a single server
On 10/19/2011 05:46 PM, d.davo...@mastertraining.it wrote:
> My colleagues are asking how many database clusters (initdb) I can create
> and run on a single server. I mean, assuming my server has the resources,
> can I create 100 or even 200 clusters?

Yep. It won't be fast, but it'll work. You'll have two performance problems to deal with:

- the memory, CPU, and disk I/O overhead of all those extra postmasters, bgwriters, autovacuum daemons, etc. running for each cluster; and

- having to split the available shared memory up between the clusters, so no single cluster gets very much shared memory to use for shared_buffers.

If you keep your shared_buffers low, it should work just fine, but it won't perform as well as a single PostgreSQL cluster with lots of databases.

In the future I'm hoping someone will be enthusiastic enough to, or need to, add support for split WAL logging or partial replication, so this sort of thing isn't necessary. For now it does seem to be the best way to handle cases where different databases need different replication.

--
Craig Ringer
Re: [PERFORM] Heavy continuous load
On 10/20/2011 02:55 AM, kzsolt wrote:
> That is the main problem. I already use as few indexes as possible, but
> the one-by-one INSERTs are the nature of our system. [...]

Well, then you'll have to use an unlogged table (9.1 or newer only) to insert into, then periodically copy rows from the unlogged table into the main table, using something like PgAgent to schedule the copy.

An unlogged table is a tiny bit more like MySQL's MyISAM tables in that it doesn't have any crash-recovery features. It still supports transactions, of course, and you won't find any way to remove transaction support in PostgreSQL. One of the reasons MySQL has historically had so many bizarre behaviours, like (by default) writing invalid data as NULL, inserting zeroes for invalid dates, etc., is that MyISAM can't roll back transactions when it discovers a problem partway through, so it has to finish the job badly rather than error out and leave the job half-completed.

If you really need absolutely maximum insert performance, you should just use a flat file or a different database system. Relational databases like PostgreSQL are designed for reliable concurrency, crash safety, fast querying, and data integrity, and they provide those at the cost of slower data insertion, among other things.

--
Craig Ringer
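A hedged sketch of the unlogged-buffer pattern Craig describes (table names are hypothetical; UNLOGGED requires 9.1+): inserts land in the cheap unlogged table, and a scheduled job moves them into the durable main table in one batch.

CREATE UNLOGGED TABLE measurements_buffer
    (LIKE measurements INCLUDING DEFAULTS);

-- Run periodically from PgAgent or cron:
BEGIN;
-- Block concurrent writers so no rows slip in between the copy and the
-- truncate and get lost.
LOCK TABLE measurements_buffer IN EXCLUSIVE MODE;
INSERT INTO measurements SELECT * FROM measurements_buffer;
TRUNCATE measurements_buffer;
COMMIT;

The trade-off is exactly the one Craig names: any rows still sitting in the buffer when the server crashes are gone.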