[PERFORM] How many Cluster database on a single server

2011-10-19 Thread d.davo...@mastertraining.it

Hi everybody,
I googled around a bit and also checked the mailing lists, but I still 
can't form a clear picture. We plan to use Postgres 9 and its database 
cluster replication.
My colleagues are asking how many database clusters (initdb) I can 
create and run on a single server. I mean, supposing my server has the 
resources, can I create 100 or even 200 clusters, each with its own 
configuration, assuming the prerequisites are met?

Or does the Postgres architecture not support numbers like that?
We are thinking of replicating nearly 200 databases from around the 
internet onto a single DB server.

Has anyone already done something like this?

BTW, this is my first email to a PostgreSQL mailing list. If I'm doing 
something wrong, don't hesitate to correct me :)


Thanks
Davo

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Inner Join - Explicit vs Implicit Join Performance

2011-10-19 Thread Gnanakumar
Hi,

In PostgreSQL, is there any performance difference between queries written
using "explicit join notation" vs "implicit join notation" in complex
queries?

EXAMPLE: Simple "explicit join notation"
SELECT *
FROM   employee INNER JOIN department
ON employee.DepartmentID = department.DepartmentID;

EXAMPLE: Simple "implicit join notation"
SELECT *
FROM   employee, department 
WHERE  employee.DepartmentID = department.DepartmentID;

Regards,
Gnanam





Re: [PERFORM] Inner Join - Explicit vs Implicit Join Performance

2011-10-19 Thread Pavel Stehule
Hello

No, there is no difference; you can check it with the EXPLAIN statement.
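
For instance (a sketch using the tables from the question), running EXPLAIN on both forms should produce identical plans:

```sql
-- Explicit join notation
EXPLAIN
SELECT *
FROM   employee INNER JOIN department
       ON employee.DepartmentID = department.DepartmentID;

-- Implicit join notation: the planner should produce the same plan as above
EXPLAIN
SELECT *
FROM   employee, department
WHERE  employee.DepartmentID = department.DepartmentID;
```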

Regards

Pavel Stehule

2011/10/19 Gnanakumar :
> Hi,
>
> In PostgreSQL, is there any performance difference between queries written
> using "explicit join notation" vs "implicit join notation" in complex
> queries?
>
> EXAMPLE: Simple "explicit join notation"
> SELECT *
> FROM   employee INNER JOIN department
> ON     employee.DepartmentID = department.DepartmentID;
>
> EXAMPLE: Simple "implicit join notation"
> SELECT *
> FROM   employee, department
> WHERE  employee.DepartmentID = department.DepartmentID;
>
> Regards,
> Gnanam



Re: [PERFORM] How many Cluster database on a single server

2011-10-19 Thread Craig James

On 10/19/11 2:46 AM, d.davo...@mastertraining.it wrote:

Hi everybody,
I googled around a bit and also checked the mailing lists, but I still can't 
form a clear picture. We plan to use Postgres 9 and its database cluster replication.
My colleagues are asking how many database clusters (initdb) I can create and 
run on a single server. I mean, supposing my server has the resources, can I 
create 100 or even 200 clusters, each with its own configuration, assuming 
the prerequisites are met?
Or does the Postgres architecture not support numbers like that?
We are thinking of replicating nearly 200 databases from around the internet 
onto a single DB server.

You don't need to do initdb on each one. Postgres can create many databases on 
a single server and manage them without difficulty.
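
As a minimal sketch (the database names are made up for illustration), separate databases inside one existing cluster are created and listed like this, with no additional initdb needed:

```sql
-- Create several databases inside the one cluster
CREATE DATABASE customer_001;
CREATE DATABASE customer_002;

-- List all databases in the cluster
SELECT datname FROM pg_database ORDER BY datname;
```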

We currently operate about 300 databases on a single server.  Most are small, and 
one is an aggregate of all the small ones.  I believe there are sites that have 
>1000 separate databases on one server.

Postgres has a slightly different concept of a "database" than Oracle or MySQL, 
which is why your question about initdb is slightly off.  You can indeed create several 
separate instances of Postgres (separate initdb for each), but the only reason you ever 
need to do that is if you're running different versions of Postgres (like 8.4 and 9.0) 
simultaneously.

Postgres runs into problems when the total number of objects (tables, views, sequences, ...) across 
all databases gets very large, where "very large" is ill defined but is somewhere between 
a few hundred thousand and a million.  We once had a rogue process that created 5 million tables, 
and we had to completely abandon the installation because of some sort of N^2 phenomenon that made 
it impossible to even use pg_dump to save and restore the system.  So the advice is, "don't do 
dumb stuff like that" and you should be able to manage many databases.

Craig




Re: [PERFORM] Rapidly finding maximal rows

2011-10-19 Thread Merlin Moncure
On Tue, Oct 11, 2011 at 8:05 PM, Dave Crooke  wrote:
> Hi James
>
>
> I'm guessing the problem is that the combination of using a view and the way
> the view is defined with an in-line temporary table is too complex for the
> planner to introspect into, transform and figure out the equivalent direct
> query, and so it's creating that entire temporary table every time you
> evaluate the select.
>
> Our app has some similar queries (get me the most recent row from a data
> logging table) and these work fine with a simple self-join, like this
> example (irrelevant columns omitted for discussion)
>
> select t.entity, t.time_stamp, t.data from log_table t
> where t.entity=cast('21EC2020-3AEA-1069-A2DD-08002B30309D' as uuid)
> and t.time_stamp=
>    (select max(time_stamp)
>     from log_table u
>     where t.entity=u.entity)
>
> given a schema with the obvious indexes ...
>
> create table log_table
>    (entity UUID,
>     time_stamp TIMESTAMP WITHOUT TIME ZONE,
>     data TEXT);
>
> create index log_table_index on log_table (entity, time_stamp);
>
> .. and the plan for the dependent sub-query does the obvious reverse index
> scan as you'd expect / want.
>
>
>
> If you still want / need to have the view, I suspect that getting rid of the
> temp table definition will fix it ... my effort is below; alternatively you
> might be able to take your first example and pull out best_scores and define
> it as a view also,
>
> CREATE VIEW best_in_school_method3 AS
>   SELECT competition_name, academic_year_beginning, centre_number, entry_id,
> total_score, (true) AS best_in_school FROM challenge_entries ce1
>   WHERE total_score =
>   (SELECT MAX(total_score) FROM challenge_entries ce2
>    WHERE ce1.competition_name=ce2.competition_name
>  AND ce1.academic_year_beginning=ce2.academic_year_beginning
>  AND ce1.centre_number=ce2.centre_number
>   )

This is a very common problem in SQL and has a lot of interesting solutions.

In queries like this I usually use the 'ORDER BY total_score DESC
LIMIT 1' trick.  Modern Postgres is *usually* smart enough to convert
max() to that, but not always.

WHERE total_score =
   (SELECT total_score FROM challenge_entries ce2
    WHERE ce1.competition_name=ce2.competition_name
  AND ce1.academic_year_beginning=ce2.academic_year_beginning
  AND ce1.centre_number=ce2.centre_number
  ORDER BY total_score DESC LIMIT 1
   )

Another clever, and more portable way to write it which can sometimes
give a better plan is like this:

WHERE NOT EXISTS
(
  SELECT 1 FROM challenge_entries ce2
WHERE  ce1.competition_name=ce2.competition_name
  AND ce1.academic_year_beginning=ce2.academic_year_beginning
  AND ce1.centre_number=ce2.centre_number
  AND ce1.total_score > ce2.total_score
)

Yet another interesting way of getting the 'top' record based on a
certain criteria is to write a custom aggregate.  In postgres you can
aggregate over the entire record type, not just plain fields, so that
running your aggregate function looks like this:

SELECT competition_name, academic_year_beginning, centre_number,
max_challenge_entries(ce) FROM challenge_entries ce GROUP BY 1,2,3;

Your function aggregator in SQL would look something like this:

CREATE OR REPLACE FUNCTION
max_challenge_entries_impl(challenge_entries, challenge_entries)
returns challenge_entries  AS
$$
  SELECT CASE WHEN ($2).total_score > ($1).total_score THEN $2 ELSE $1 END;
$$ LANGUAGE SQL IMMUTABLE;
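
To actually call max_challenge_entries() as in the query above, the transition function still has to be wired into an aggregate; a minimal sketch (names assumed from this message) would be:

```sql
-- Aggregate over the whole record type; the state is simply
-- the "best row seen so far".
-- Note: if the transition function is declared STRICT, Postgres
-- automatically uses the first input row as the initial state,
-- avoiding a NULL initial state.
CREATE AGGREGATE max_challenge_entries (challenge_entries) (
    SFUNC = max_challenge_entries_impl,
    STYPE = challenge_entries
);
```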

This very STL-ish approach is rarely the best way to go
performance-wise, although it can give better worst-case plans in some
cases (note that an index on total_score can never be used for
optimization here).  I mention it because I find it very clean
conceptually, and it can be a great approach if your 'picking'
algorithm is sufficiently more complex than 'field > field' and is
otherwise not optimizable.  Anyways, food for thought.

merlin



Re: [PERFORM] How many Cluster database on a single server

2011-10-19 Thread d.davo...@mastertraining.it

Hi Craig,
thanks for your reply. I think I need to add some details to my 
question, like why we would need more than one database cluster. We are 
thinking of using the Streaming Replication feature to keep a number of 
little DB servers around the net in sync. The replication should happen 
on one or more centralized servers. I haven't tested replication 
personally, but as far as I can see, it syncs the whole cluster. So, on 
the centralized server(s), we will have perfect copies of the database 
clusters.
We certainly need to test this configuration, but first of all I was 
wondering whether there are known drawbacks.

Thanks again.


On 10/19/2011 03:54 PM, Craig James wrote:

On 10/19/11 2:46 AM, d.davo...@mastertraining.it wrote:

Hi everybody,
I googled a bit around and also checked the mailing lists but I still 
can't make an idea. We plan to use postgres 9 and the Cluster 
Database Replica.
My colleagues are asking how many Cluster Databases (initdb) can I 
create and run on a single server. I mean, supposed my server has the 
resources, can I create 100 or even 200 Cluster Databases? Everyone 
with the right configuration and in respect of the requisites?

Or the postgres architecture doesn't provide similar numbers?
We are thinking to use the replica from near 200 databases around the 
internet on a single db server.
You don't need to do initdb on each one. Postgres can create many 
databases on a single server and manage them without difficulty.


We currently operate about 300 databases on a single server.  Most are 
small, and one is an aggregate of all the small ones.  I believe there 
are sites that have >1000 separate databases on one server.


Postgres has a slightly different concept of a "database" than Oracle 
or MySQL, which is why your question about initdb is slightly off.  
You can indeed create several separate instances of Postgres (separate 
initdb for each), but the only reason you ever need to do that is if 
you're running different versions of Postgres (like 8.4 and 9.0) 
simultaneously.


Postgres runs into problems when the total number of objects (tables, 
views, sequences, ...) across all databases gets very large, where 
"very large" is ill defined but is somewhere between a few hundred 
thousand and a million.  We once had a rogue process that created 5 
million tables, and we had to completely abandon the installation 
because of some sort of N^2 phenomenon that made it impossible to even 
use pg_dump to save and restore the system.  So the advice is, "don't 
do dumb stuff like that" and you should be able to manage many databases.


Craig







Re: [PERFORM] How many Cluster database on a single server

2011-10-19 Thread Kevin Grittner
"d.davo...@mastertraining.it"  wrote:
 
> We are thinking to use the Streaming Replica feature to keep in
> sync a number of little DB servers around the net. The replica
> should happen on one or more centralized servers. I didn't tested
> the replica personally bus as I can see, it syncs the whole
> Cluster DB. So, on the centralized server(s), we will have perfect
> copies of the Cluster Databases. We sure need to test this
> configuration but first of all I was wondering if there are known
> drawbacks.
 
We do something very much like this with about 100 standby database
clusters on a single machine.  We don't have any illusion that we
could switch to one of these for a normal production load and have
good performance with all of these competing for resources -- it's
primarily to confirm that the PITR backup process is working and
staying up to date, and to provide a quick source for a copy to a
standby production server.
 
The one thing I would strongly recommend is that you use a separate
OS user as the owner of each cluster's data directory (and, of
course, to run the cluster's service).  We didn't initially do this,
and had problems on recovery when the server crashed.  If you search
the archives you can probably dig up all the details on why this is
an issue and why separate users is a good solution; but really, this
is important.
 
-Kevin



Re: [PERFORM] How many Cluster database on a single server

2011-10-19 Thread Scott Marlowe
On Wed, Oct 19, 2011 at 9:02 AM, d.davo...@mastertraining.it
 wrote:
> Hi Craig,
> thanks for your reply. I think I need to add some details on my question,
> like why we would need more than one Cluster Database. We are thinking to
> use the Streaming Replica feature to keep in sync a number of little DB
> servers around the net. The replica should happen on one or more centralized
> servers. I didn't tested the replica personally bus as I can see, it syncs
> the whole Cluster DB. So, on the centralized server(s), we will have perfect
> copies of the Cluster Databases.
> We sure need to test this configuration but first of all I was wondering if
> there are known drawbacks.

The problem with having so many clusters on one machine is the shared
memory that each one needs.  Even with a relatively small shared
memory segment of say 16MB, with 100 clusters you're going to be using
1600MB of memory on that machine for shared memory.

You might be better off with one cluster, using Slony to replicate
just the parts that need replication.



[PERFORM] does update of column with no relation imply a relation check of other column?

2011-10-19 Thread Anibal David Acosta
For example:

Table A
- id (PK)
- name

Table B
- table_a_id (PK, FK)
- address

When I do an insert on table B, the database checks whether the value for
column "table_a_id" exists in table A.

But if I do an update of column "address" of table B, does the database
check again?

My question is due to the nature of an UPDATE in Postgres, which is
basically a new-version "insert".


Thanks



Re: [PERFORM] does update of column with no relation imply a relation check of other column?

2011-10-19 Thread Greg Jaskiewicz

On 19 Oct 2011, at 17:51, Anibal David Acosta wrote:

> For example:
> Table A
> -id (PK)
> -name
>  
> Table B
> -table_a_id (PK, FK)
> -address
>  
> When I do an insert on table B, the database check if value for column 
> “table_a_id” exists in table A
> But, if I do an update of column “address” of table B, does the database 
> check again?
>  
> My question is due to the nature of and update in postgres, that basically is 
> a new version “insert”.


In short - I believe it does. No reason for it not to. 
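
One way to check this empirically (a sketch; the table definitions mirror the question) is EXPLAIN ANALYZE, which reports time spent in foreign-key triggers when they fire:

```sql
-- Hypothetical tables matching the question
CREATE TABLE table_a (id int PRIMARY KEY, name text);
CREATE TABLE table_b (
    table_a_id int PRIMARY KEY REFERENCES table_a (id),
    address    text
);

-- If the FK check fires, a "Trigger for constraint ..." line
-- with its timing appears in the output.
EXPLAIN ANALYZE
UPDATE table_b SET address = 'new address' WHERE table_a_id = 1;
```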



[PERFORM] delete/recreate indexes

2011-10-19 Thread alan
Hi

I'm a Postgres novice, so ...

I have this fairly simple table
-
 device integer not null,
 "group" integer not null,
 datum timestamp without time zone not null,
 val1 numeric(7,4) not null default 0.000,
 val2 numeric(7,4) not null default 0.000
-

The device column is a foreign-key to the PK of my device table.
and I have a UNIQUE INDEX on 3 columns device, group, datum

This is just a test database and I want to keep the "datum" values
(timestamps that span approx 1 month) all bound to CURRENT_DATE().

So I thought I’d just run this once (via cron) every morning.
BEGIN;
DROP INDEX data_unique;
UPDATE data SET datum = (data.datum + interval '24 hours');
CREATE UNIQUE INDEX data_unique ON public.data USING BTREE
(device, "group", datum);
COMMIT;

But
1.  it’s taking forever and
2.  I’m seeing that my disk is filling up real fast.

Any suggestions?

Alan



[PERFORM] disused indexes and performance?

2011-10-19 Thread Elisa
I recently ran a query against our production database and saw several
disused indexes.  Is there a performance harm in having disused
indexes out there?

Of course, I will be checking our code base and with developers to
ensure that these indexes aren't being used programmatically to
prevent redundant inserts and the like...



Re: [PERFORM] Heavy contgnous load

2011-10-19 Thread kzsolt
"try to have as few indexes as possible and do your inserts inside
transactions in batches, rather than one-by-one as individual statements."
That is the main problem. I already use as few indexes as possible.
Unfortunately, one-by-one INSERTs are the nature of our system. Batching
the inserts would require a special cache with an inactivity timeout, but
that timeout causes more trouble for our application. A flat file has the
same problem.

--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Heavy-contgnous-load-tp4913425p4919006.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.



Re: [PERFORM] Heavy contgnous load

2011-10-19 Thread kzsolt

Kevin Grittner wrote:
> Rollback activity?  What rollback activity?  When you're doing what?
> What is the exact message?
I meant some kind of option to save resources.
For example, MySQL has a table (storage) type with no transaction support
(no rollback). This makes the whole server faster and use fewer resources.

--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Heavy-contgnous-load-tp4913425p4919050.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.



Re: [PERFORM] disused indexes and performance?

2011-10-19 Thread Scott Marlowe
On Wed, Oct 19, 2011 at 9:24 AM, Elisa  wrote:
> I recently ran a query against our production database and saw several
> disused indexes.  Is there a performance harm in having disused
> indexes out there?

Sure there is: they'll slow down writes and use more disk space.

> Of course, I will be checking our code base and with developers to
> ensure that these indexes aren't being used programmatically to
> prevent redundant inserts and the like. ..

Unique indexes are usually a small set compared to extra non-unique
indexes.  Also, some indexes may be seldom used but make a very big
difference on the rare occasions they are used, for instance partial,
functional, or full-text indexes.
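
As a side note, a common way to spot candidate indexes (a sketch using the standard statistics view) is:

```sql
-- Indexes never scanned since statistics were last reset.
-- Check before dropping: unique/PK indexes enforce constraints
-- even if queries never use them.
SELECT schemaname, relname, indexrelname
FROM   pg_stat_user_indexes
WHERE  idx_scan = 0
ORDER  BY schemaname, relname;
```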



Re: [PERFORM] delete/recreate indexes

2011-10-19 Thread Jeff Davis
On Wed, 2011-10-19 at 08:03 -0700, alan wrote:
> So I thought I’d just run this once (via cron) every morning.
> BEGIN;
> DROP INDEX data_unique;
> UPDATE data SET datum = (data.datum + interval '24 hours');
> CREATE UNIQUE INDEX data_unique ON public.data USING BTREE
> (device, group, datum);
> COMMIT;
> 
> But
> 1.it’s taking forever and
> 2.I’m seeing that my disk is filling up real fast.

An unrestricted update will end up rewriting the whole table. It's
advisable to run VACUUM afterward, so that the wasted space can be
reclaimed. What version are you on? Do you have autovacuum enabled?

Also, to take a step back, why do you try to keep the timestamps
changing like that? Why not store the information you need in the record
(e.g. insert time as well as the datum) and then compute the result you
need using a SELECT (or make it a view for convenience)? Fundamentally,
these records aren't changing, you are just trying to interpret them in
the context of the current day. That should be done using a SELECT, not
an UPDATE.
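
A sketch of that approach (the inserted_on column and view name are assumptions, not part of the original schema):

```sql
-- Record when each row was inserted, once, instead of rewriting daily.
ALTER TABLE data ADD COLUMN inserted_on date NOT NULL DEFAULT CURRENT_DATE;

-- Interpret the stored timestamps relative to the current day at read time.
CREATE VIEW data_current AS
SELECT device, "group",
       datum + (CURRENT_DATE - inserted_on) * interval '1 day' AS datum,
       val1, val2
FROM   data;
```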

Regards,
Jeff Davis





Re: [PERFORM] How many Cluster database on a single server

2011-10-19 Thread Craig Ringer

On 10/19/2011 05:46 PM, d.davo...@mastertraining.it wrote:


My colleagues are asking how many Cluster Databases (initdb) can I
create and run on a single server. I mean, supposed my server has the
resources, can I create 100 or even 200 Cluster Databases?


Yep. It won't be fast, but it'll work.

You'll have two performance problems to deal with:

- The memory, CPU and disk I/O overhead of all those extra postmasters, 
bgwriters, autovacuum daemons etc running for each cluster; and


- having to split the available shared memory up between each cluster, 
so no single cluster gets very much shared memory to use for shared_buffers.


If you keep your shared_buffers low, it should work just fine, but it 
won't perform as well as a single PostgreSQL cluster with lots of databases.


In the future I'm hoping someone will be enthusiastic enough to (or need 
to) add support for split WAL logging or partial replication, so this sort 
of thing isn't necessary. For now it does seem to be the best way to handle 
cases where different databases need different replication.


--
Craig Ringer



Re: [PERFORM] Heavy contgnous load

2011-10-19 Thread Craig Ringer

On 10/20/2011 02:55 AM, kzsolt wrote:

"try to have as few indexes as possible and do your inserts inside
transactions in batches, rather than one-by-one as individual statements. "
That is the main problem. I use now few index as possible. Unfortunately the
one-by-one INSERT is nature of or system. To join (batch) inserts is require
spacial cache with inactivity timeout. But this timout are make more trouble
for our application. The flat file has same problem.


Well, then you'll have to use an unlogged table (9.1 or newer only) to 
insert into, then periodically copy rows from the unlogged table into 
the main table using something like PgAgent to schedule the copy.
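
A sketch of that pattern (table and column names are made up for illustration):

```sql
-- 9.1+: an unlogged table skips WAL, so inserts are much cheaper,
-- but its contents are lost on a crash.
CREATE UNLOGGED TABLE measurements_staging (LIKE measurements);

-- Run periodically (e.g. from PgAgent): move staged rows into the
-- durable main table in one batch.
WITH moved AS (
    DELETE FROM measurements_staging RETURNING *
)
INSERT INTO measurements SELECT * FROM moved;
```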


An unlogged table is a tiny bit more like MySQL's MyISAM tables in that 
it doesn't have any crash recovery features. It still supports 
transactions, of course, and you won't find any way to remove 
transaction support in PostgreSQL. One of the reasons MySQL has 
historically had so many bizarre behaviours, like (by default) writing 
invalid data as NULL, inserting zeroes for invalid dates, etc is because 
MyISAM can't roll back transactions when it discovers a problem partway 
through, so it has to finish the job badly rather than error out and 
leave the job half-completed.


If you really need absolutely maximum insert performance, you should 
just use a flat file or a different database system. Relational 
databases like PostgreSQL are designed for reliable concurrency, crash 
safety, fast querying, and data integrity, and they provide those at the 
cost of slower data insertion among other things.


--
Craig Ringer
