Re: [PERFORM] Hardware impact on performances

2005-03-21 Thread Richard Huxton
Camille Chafer wrote:
Hi,
I'm using PostgreSQL 8 for an MMORPG.
The mix of operations is: select 50%, update 40%, insert 10%.
I have no more than 4-5 concurrent connections to the database, but each 
of them does A LOT of queries (several per second).
The database size is about 1GB, but it'll probably be around 2GB in a
few months.
The OS will be FreeBSD (probably the 5.3 production release, or 4.10).

At this time, I'm looking for a new server. Before buying it, I'm gathering
some information.
So, my question is: what would be the best hardware for this type of need?
Of course, I'm not asking for brand names and/or prices, but for hints.

- What is the most important part of the system: CPU? RAM? Disks?
Usually disks/RAM. Since you've got a lot of updates/inserts, a
battery-backed write cache on your RAID controller would be good.

- Is a server with 2 or more CPUs much better than a server with a
single one, for a pgsql database?
With 2+ connections, each can be serviced by one CPU. Of course, if your 
disk I/O is saturated then it won't help.

- How much RAM do I need? The size of the data? Twice the size?
Ideally, enough to hold your "working set". That is, enough cache to 
store all pages/indexes you regularly access.
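
One rough way to gauge that working set is to look at relation sizes in
pg_class - just a sketch, and relpages is only an estimate that
VACUUM/ANALYZE keeps up to date:

  -- Largest tables and indexes by on-disk size (8KB pages assumed);
  -- the ones you touch constantly are roughly what you want to fit in RAM
  -- (shared_buffers plus the OS cache).
  SELECT relname, relpages * 8 / 1024 AS approx_mb
  FROM pg_class
  ORDER BY relpages DESC
  LIMIT 10;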

- I heard RAID 1+0 is better than RAID 5. Is that right? What would be the
best configuration, regarding performance and security?
It can depend - check the list archives for a lot of discussion on this.
More disks are always better.

- Does the CPU type (i386, PowerPC, ...) matter?
Dual Xeons have given people problems. A lot of people seem to think
Opteron-based systems provide good value.

- A lot of queries probably generate a lot of network output. Does the
network controller matter?
Well, obviously the more time spent handling network I/O, the less time 
you spend running queries. I'd think it would have to be a *lot* of 
activity to make a serious difference.

- And finally, the last question: is it possible to run a single PostgreSQL
database on several servers? (hardware clustering)
Not easily, and it probably wouldn't provide any performance benefit. 
Plenty of replication options though.

Thanks in advance for your answers, and sorry for my crap English (I'm
French).
Your English is perfect.
--
  Richard Huxton
  Archonet Ltd


[PERFORM] What about utility to calculate planner cost constants?

2005-03-21 Thread Tambet Matiisen

I was following the cpu_tuple_cost thread and wondering if it would be
possible to make a PQA-style utility to calculate configuration-specific
values for the planner cost constants. It could make use of the output of
log_(statement|parser|planner|executor)_stats, though I'm not sure if that
output contains anything useful for this purpose.

Otherwise it could just collect statements, run EXPLAIN ANALYZE for all
of them and then play with planner cost constants to get the estimated
values as close as possible to actual values. Something like Goal Seek
in Excel, if you pardon my reference to MS :).
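
For illustration, the estimated-versus-actual numbers such a utility would
try to reconcile are already in EXPLAIN ANALYZE output (the table and
figures below are made up):

  EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
  -- Index Scan using orders_customer_id_idx on orders
  --   (cost=0.00..123.45 rows=50 width=64)
  --   (actual time=0.081..1.924 rows=48 loops=1)
  -- Total runtime: 2.110 ms
  --
  -- The utility would nudge random_page_cost, cpu_tuple_cost, etc. until
  -- the estimated costs track the actual times across many such statements.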

A somewhat similar project seems to be pgautotune from pgFoundry, but it
only considers shared_buffers, sort_mem and vacuum_mem. It also seems to
use synthetic data instead of the actual database and actual statements from
the log. And it has been inactive for a while.

  Tambet



Re: [PERFORM] column name is "LIMIT"

2005-03-21 Thread Jan Wieck
On 3/14/2005 4:26 AM, Qingqing Zhou wrote:
So is it to make the SQL parser context-sensitive - say, the parser will
understand that in the statement "SELECT * from LIMIT", LIMIT is just a table
name instead of a keyword?
More or less, yes. To use a reserved keyword as an identifier (table or 
column name), it must be enclosed in double quotes. Double quotes are 
also used to make identifiers case sensitive. So

select someval, "SOMEVAL", "someVAL" from "user";
is a valid query retrieving 3 distinct columns from the table "user". 
There is a builtin function quote_ident() in PostgreSQL that is supposed 
to return a properly quoted string allowed as an identifier for whatever 
name is passed in. But it fails to do so for all lower case names that 
are reserved keywords.
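
To make that concrete (the quote_ident() results shown follow the behaviour
described above):

  SELECT quote_ident('someVAL');  -- returns "someVAL" (mixed case, so quoted)
  SELECT quote_ident('someval');  -- returns someval   (lower case, no quotes needed)
  SELECT quote_ident('limit');    -- returns limit     (not quoted, though it should be)

  -- A query generated from that last result is then broken:
  SELECT * FROM limit;            -- syntax error
  SELECT * FROM "limit";          -- what the generated SQL would have needed
                                  -- (assuming a table named limit exists)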

The queries Slony executes on the replicas are constructed using that 
quoting function, and therefore Slony fails to build valid SQL for 
replicated tables containing reserved-keyword identifiers. One solution 
would be to brute-force quote all identifiers in Slony ... not sure what 
the side effects performance-wise would be.

Jan
--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #


Re: [PERFORM] column name is "LIMIT"

2005-03-21 Thread Tom Lane
Jan Wieck <[EMAIL PROTECTED]> writes:
> There is a builtin function quote_ident() in PostgreSQL that is supposed 
> to return a properly quoted string allowed as an identifier for whatever 
> name is passed in. But it fails to do so for all lower case names that 
> are reserved keywords.

Not any more ...

regards, tom lane



Re: [PERFORM] What about utility to calculate planner cost constants?

2005-03-21 Thread Josh Berkus
Tambet,

> I was following the cpu_tuple_cost thread and wondering if it would be
> possible to make a PQA-style utility to calculate configuration-specific
> values for the planner cost constants. It could make use of the output of
> log_(statement|parser|planner|executor)_stats, though I'm not sure if that
> output contains anything useful for this purpose.

Yeah, that's something I need to look at.

> Otherwise it could just collect statements, run EXPLAIN ANALYZE for all
> of them and then play with planner cost constants to get the estimated
> values as close as possible to actual values. Something like Goal Seek
> in Excel, if you pardon my reference to MS :).

That's not really practical.   There are currently 5 major query tuning 
parameters, not counting the memory adjustments which really can't be left 
out.  You can't realistically test all combinations of 6 variables.

> Somewhat similar project seems to be pgautotune from pgFoundry, but it
> only considers shared_buffers, sort_mem and vacuum_mem. And it seems to
> use synthetic data instead of actual database and actual statements from
> log. And it has been inactive for a while.

Yeah, pg_autotune is a dead project.  Once we got OSDL able to run tests, we 
came up with some rules of thumb that are more useful than autotune's 
output.  More importantly, the approach doesn't scale to the 15-20 GUCs we'd 
realistically want to test.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] What needs to be done for real Partitioning?

2005-03-21 Thread Josh Berkus
Stacy,

> Luckily, they had the chance to work with a truly fantastic DBA (the
> author of an Oracle Press performance-tuning book, even) before they could
> switch back.  He convinced them to make some of their indexes global.
> Performance dramatically improved (compared with both the unpartitioned
> schema and the partitioned-and-locally-indexed schema), and they've since
> stayed with partitioned tables and a mix of local and global indexes.

Hmmm.  Wouldn't Greg's suggestion of a bitmap index which holds information on 
what values are found in what partition also solve this?  Without half of 
the overhead imposed by global indexes?

I can actually see such a bitmap as being universally useful to the 
partitioning concept ... for one, it would resolve the whole "partition on 
{value}" issue.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] What about utility to calculate planner cost constants?

2005-03-21 Thread Thomas F.O'Connell
If by "not practical" you mean "no one has implemented a multivariable 
testing approach," I'll agree with you. But multivariable testing is 
definitely a valid statistical approach to solving just such problems.

-tfo
--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
On Mar 21, 2005, at 11:51 AM, Josh Berkus wrote:
That's not really practical.  There are currently 5 major query tuning
parameters, not counting the memory adjustments, which really can't be left
out.  You can't realistically test all combinations of 6 variables.



Re: [PERFORM] What needs to be done for real Partitioning?

2005-03-21 Thread Jim C. Nasby
On Mon, Mar 21, 2005 at 09:55:03AM -0800, Josh Berkus wrote:
> Stacy,
> 
> > Luckily, they had the chance to work with a truly fantastic DBA (the
> > author of an Oracle Press performance-tuning book, even) before they could
> > switch back.  He convinced them to make some of their indexes global.
> > Performance dramatically improved (compared with both the unpartitioned
> > schema and the partitioned-and-locally-indexed schema), and they've since
> > stayed with partitioned tables and a mix of local and global indexes.
> 
> Hmmm.  Wouldn't Greg's suggestion of a bitmap index which holds information
> on what values are found in what partition also solve this?  Without half of
> the overhead imposed by global indexes?
> 
> I can actually see such a bitmap as being universally useful to the 
> partitioning concept ... for one, it would resolve the whole "partition on 
> {value}" issue.

I suspect both will have their uses. I've read quite a bit about global
vs. local indexes in Oracle, and there are definitely cases where global
is much better than local. Granted, there are some things with how Oracle
handles its catalog, etc. that might make local indexes more expensive
for them than they would be for PostgreSQL. It's also not clear how much
a 'partition bitmap' index would help.

As for the 'seqscan individual partitions' argument, that's not going to
work well at all for a case where you need to hit a relatively small
percentage of rows in a relatively large number of partitions. SELECT
... WHERE customer_id = 1 would be a good example of such a query
(assuming the table is partitioned on something like invoice_date).
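
A sketch of that situation, using inheritance-style partitions (all names
are made up):

  -- Partitioned by month of invoice_date; customer_id only has local indexes.
  CREATE TABLE invoices (invoice_id int, customer_id int, invoice_date date);
  CREATE TABLE invoices_2005_01 (
      CHECK (invoice_date >= '2005-01-01' AND invoice_date < '2005-02-01')
  ) INHERITS (invoices);
  CREATE TABLE invoices_2005_02 (
      CHECK (invoice_date >= '2005-02-01' AND invoice_date < '2005-03-01')
  ) INHERITS (invoices);
  CREATE INDEX invoices_2005_01_cust ON invoices_2005_01 (customer_id);
  CREATE INDEX invoices_2005_02_cust ON invoices_2005_02 (customer_id);

  -- Nothing here restricts invoice_date, so every partition has to be probed
  -- (or, with the seqscan approach, scanned in full):
  SELECT * FROM invoices WHERE customer_id = 1;
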
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"



Re: [PERFORM] What needs to be done for real Partitioning?

2005-03-21 Thread Jim C. Nasby
On Sat, Mar 19, 2005 at 07:05:53PM -0500, Tom Lane wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > We probably also need multi-table indexes.
> 
> As Josh says, that seems antithetical to the main point of partitioning,
> which is to be able to rapidly remove (and add) partitions of a table.
> If you have to do index cleaning before you can drop a partition, what's
> the point of partitioning?

Why would you need to do index cleaning first? Presumably the code that
checks the heap tuple an index entry points at, to ensure it is visible to
the current transaction, would be able to recognize that the partition the
tuple was in had been removed, and just ignore that index entry. Granted,
you'd need to clean the index up at some point (presumably via vacuum), but
it doesn't need to happen at partition drop time.
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"



Re: [PERFORM] What about utility to calculate planner cost constants?

2005-03-21 Thread Josh Berkus
Thomas,

> If by not practical you mean, "no one has implemented a multivariable
> testing approach," I'll agree with you. But multivariable testing is
> definitely a valid statistical approach to solving just such problems.

Well, not practical as in:  "would take either $10 million in equipment or 
10,000 hours or both"

--Josh

-- 

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] What needs to be done for real Partitioning?

2005-03-21 Thread Jim C. Nasby
I think Greg's email did a good job of putting this on track. Phase 1
should be manual, low-level support. Oracle has had partitioning
for years now, and IF they've added automated partition management, it has
only happened in 10g, which is pretty recent.

For inserts that don't currently have a defined partition to fit in, the
Oracle model might be better than tossing an error: a partitioned table
in Oracle also contains a default partition. Any rows that don't match a
defined partition go into the default partition. In many cases you'll
never have anything in the default partition, but sometimes you'll have
some partition values that occur infrequently enough in the table so as
not to warrant their own partition.
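
In PostgreSQL terms, with the manual inheritance approach the parent table
itself could play that catch-all role today, since rows inserted directly
into the parent simply stay there (a sketch with made-up names; routing is
manual here, which the real feature would automate):

  CREATE TABLE events (event_date date, payload text);
  CREATE TABLE events_2005 (
      CHECK (event_date >= '2005-01-01' AND event_date < '2006-01-01')
  ) INHERITS (events);

  INSERT INTO events_2005 VALUES ('2005-03-21', 'fits a defined partition');
  INSERT INTO events      VALUES ('1997-06-30', 'no match, lands in the default');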

There's also another partitioning application that I think is often
overlooked. I have a table with about 130M rows that is
'pseudo-partitioned' by project_id. Right now, there are 5 different
project IDs that account for the bulk of those 130M rows. Oracle
provides a means to partition on discrete values. When you do this,
there's not actually any reason to even store the partition field in the
partition tables, since it will be the same for every row in the
partition. In my case, since the smallint project ID is being aligned to
a 4-byte boundary, having this feature would save ~120M rows * 4 bytes =
480MB in the table. Granted, 480MB isn't anything for today's disk
sizes, but it makes a huge difference when you look at it from an I/O
standpoint. Knowing that a partition contains only one value of a field
or set of fields also means you can drop those fields from local indexes
without losing any effectiveness. In my case, I have 2 indexes I could
drop project_id from. Does each node in a B-tree index have the full
index key? If so, then there would be substantial I/O gains to be had
there, as well. Even if each node doesn't store the full key, it could
still be faster to handle a narrower index.
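
A union-view sketch of what I mean (hypothetical names; the project is
implied by which child table a row lives in, not stored per row):

  CREATE TABLE measurements_p17 (ts timestamp, value int);   -- project 17
  CREATE TABLE measurements_p42 (ts timestamp, value int);   -- project 42

  CREATE VIEW measurements AS
      SELECT 17::smallint AS project_id, ts, value FROM measurements_p17
      UNION ALL
      SELECT 42::smallint, ts, value FROM measurements_p42;

  -- Local indexes likewise don't need the project_id column:
  CREATE INDEX measurements_p17_ts ON measurements_p17 (ts);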

I realize this might be a more difficult case to support. It probably
couldn't be done using inheritance, though I don't know whether inheritance
or a union view is better for partitioning. In either case, this might not
be a good candidate for phase 1, but I think partitioning should be designed
with it in mind.
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"



Re: [PERFORM] What about utility to calculate planner cost constants?

2005-03-21 Thread Greg Stark
Josh Berkus  writes:

> > Otherwise it could just collect statements, run EXPLAIN ANALYZE for all
> > of them and then play with planner cost constants to get the estimated
> > values as close as possible to actual values. Something like Goal Seek
> > in Excel, if you pardon my reference to MS :).
> 
> That's not really practical.   There are currently 5 major query tuning 
> parameters, not counting the memory adjustments which really can't be left 
> out.  You can't realistically test all combinations of 6 variables.

I don't think it would be very hard at all actually.

It's just a linear algebra problem with a bunch of independent variables and a
system of equations. Solving for values for all of them is a straightforward
problem.

Of course in reality these variables aren't actually independent because the
costing model isn't perfect. But that wouldn't be a problem, it would just
reduce the accuracy of the results.

What's needed is for the explain plan to total up the costing penalties
independently. So the result would be something like

1000 * random_page_cost + 101 * sequential_page_cost + 2000 * index_tuple_cost
+ ...

In other words a tuple like <1000,101,2000,...>

And explain analyze would produce the above tuple along with the resulting
time.

Some program would have to gather these values from the log or stats data,
collect them into a large linear system, and solve for values that minimize
the divergence from the observed times.
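
Spelled out, one obvious way to define that divergence is a least-squares
fit: with x_i the counter tuple recorded for logged query i, t_i its
observed runtime, and c the vector of cost constants,

  minimize over c:   sum over i of ( x_i . c - t_i )^2

  where e.g.  x_i = <1000, 101, 2000, ...>
  and         c   = <random_page_cost, sequential_page_cost, index_tuple_cost, ...>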



(Cost penalties are currently normalized to sequential_page_cost being 1.
That could be maintained, or it could be changed to normalize to an
expected 1ms.)

(Also, explain analyze currently has overhead that makes this impractical.
Ideally it could subtract out its own overhead so the solutions would be
accurate enough to be useful.)

-- 
greg

