Re: [PERFORM] Hardware impact on performances
Camille Chafer wrote:
> Hi, I'm using PostgreSQL 8 for an MMORPG. The breakdown of operations is:
> select 50%, update 40%, insert 10%. I have no more than 4-5 concurrent
> connections to the database, but each of them runs A LOT of queries
> (several per second). The database size is about 1GB, but it will probably
> be around 2GB in a few months. The OS will be FreeBSD (probably 5.3 in
> production, or 4.10). I'm currently looking for a new server, and before
> buying it I'm gathering some information. So my question is: what would be
> the best hardware for this type of workload? Of course, I'm not asking for
> a brand and/or prices, just for hints.
>
> - What is the most important part of the system: CPU? RAM? Disks?

Usually disks/RAM. Since you've got a lot of updates/inserts, a
battery-backed write cache on your RAID controller would be good.

> - Is a server with 2 or more CPUs much better than a server with a single
>   one, for a pgsql database?

With 2+ connections, each can be serviced by one CPU. Of course, if your
disk I/O is saturated then it won't help.

> - How much RAM do I need? The size of the data? Twice the size?

Ideally, enough to hold your "working set" -- that is, enough cache to store
all the pages and indexes you regularly access.

> - I heard RAID 1+0 is better than RAID 5. Is that right? What would be the
>   best configuration, regarding performance and security?

It can depend -- check the list archives for a lot of discussion on this.
More disks is always better.

> - Does the CPU type (i386, PowerPC, ...) matter?

Dual Xeons have given problems. A lot of people seem to think Opteron-based
systems provide good value.

> - A lot of queries probably generate a lot of network output. Does the
>   network controller matter?

Well, obviously the more time spent handling network I/O, the less time you
spend running queries. I'd think it would have to be a *lot* of activity to
make a serious difference.

> - And finally, last question: is it possible to run a single PostgreSQL
>   database on several servers? (hardware clustering)

Not easily, and it probably wouldn't provide any performance benefit. Plenty
of replication options though.

> Thanks in advance for your answers, and sorry for my crap English (I'm
> French).

Your English is perfect.

--
  Richard Huxton
  Archonet Ltd
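One rough way to check whether the working set already fits in cache is to
look at the buffer hit ratio per table. A minimal sketch, assuming the
statistics collector is running with block-level stats enabled (note this
only measures PostgreSQL's own shared buffers, not the OS cache):

    -- Rough buffer cache hit ratio per table; a ratio close to 1.0 suggests
    -- that table's regularly accessed pages already fit in shared buffers.
    SELECT relname,
           heap_blks_hit,
           heap_blks_read,
           round(heap_blks_hit::numeric
                 / nullif(heap_blks_hit + heap_blks_read, 0), 3) AS hit_ratio
    FROM pg_statio_user_tables
    ORDER BY heap_blks_hit + heap_blks_read DESC;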
[PERFORM] What about utility to calculate planner cost constants?
I was following the cpu_tuple_cost thread and wondering whether it would be
possible to make a PQA-style utility to calculate configuration-specific
values for the planner cost constants. It could make use of the output of
log_(statement|parser|planner|executor)_stats, though I'm not sure if that
output contains anything useful for this purpose.

Otherwise it could just collect statements, run EXPLAIN ANALYZE for all of
them and then play with the planner cost constants to get the estimated
values as close as possible to the actual values. Something like Goal Seek
in Excel, if you pardon my reference to MS :).

A somewhat similar project seems to be pgautotune from pgFoundry, but it
only considers shared_buffers, sort_mem and vacuum_mem. It also seems to use
synthetic data instead of the actual database and actual statements from the
log, and it has been inactive for a while.

Tambet
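For reference, the per-stage statistics mentioned above can be toggled per
session (output goes to the server log). A minimal sketch of collecting raw
data for one statement; the table and predicate in the sample query are made
up for illustration:

    -- Enable per-stage timing output for this session.
    SET log_parser_stats   = on;
    SET log_planner_stats  = on;
    SET log_executor_stats = on;

    -- Capture estimated cost vs. actual time for a statement of interest.
    EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;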
Re: [PERFORM] column name is "LIMIT"
On 3/14/2005 4:26 AM, Qingqing Zhou wrote:
> So is it to make the SQL parser context-sensitive - say the parser will
> understand that in the statement "SELECT * from LIMIT", LIMIT is just a
> table name instead of a keyword?

More or less, yes. To use a reserved keyword as an identifier (table or
column name), it must be enclosed in double quotes. Double quotes are also
used to make identifiers case sensitive. So

    select someval, "SOMEVAL", "someVAL" from "user";

is a valid query retrieving 3 distinct columns from the table "user".

There is a builtin function quote_ident() in PostgreSQL that is supposed to
return a properly quoted string allowed as an identifier for whatever name
is passed in. But it fails to do so for all-lowercase names that are
reserved keywords.

The queries Slony executes on the replicas are constructed using that
quoting function, and therefore Slony fails to build valid SQL for
replicated tables containing reserved-keyword identifiers. One solution
would be to brute-force quote all identifiers in Slony ... not sure what the
side effects, performance wise, would be.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== [EMAIL PROTECTED] #
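For illustration, this is the intended behaviour of quote_ident() -- a
sketch, with the output shown in comments being what a version without the
bug described above would return:

    -- quote_ident() adds double quotes only where they are needed:
    SELECT quote_ident('someval'),  -- someval    (plain lowercase, no quotes)
           quote_ident('someVAL'),  -- "someVAL"  (mixed case must be quoted)
           quote_ident('limit');    -- "limit"    (reserved keyword; the bug
                                    -- above left this unquoted)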
Re: [PERFORM] column name is "LIMIT"
Jan Wieck <[EMAIL PROTECTED]> writes:
> There is a builtin function quote_ident() in PostgreSQL that is supposed
> to return a properly quoted string allowed as an identifier for whatever
> name is passed in. But it fails to do so for all lower case names that
> are reserved keywords.

Not any more ...

regards, tom lane
Re: [PERFORM] What about utility to calculate planner cost constants?
Tambet,

> I was following the cpu_tuple_cost thread and wondering whether it would
> be possible to make a PQA-style utility to calculate
> configuration-specific values for the planner cost constants. It could
> make use of the output of log_(statement|parser|planner|executor)_stats,
> though I'm not sure if the output contains anything useful for those
> purposes.

Yeah, that's something I need to look at.

> Otherwise it could just collect statements, run EXPLAIN ANALYZE for all
> of them and then play with planner cost constants to get the estimated
> values as close as possible to actual values. Something like Goal Seek
> in Excel, if you pardon my reference to MS :).

That's not really practical. There are currently 5 major query tuning
parameters, not counting the memory adjustments which really can't be left
out. You can't realistically test all combinations of 6 variables.

> A somewhat similar project seems to be pgautotune from pgFoundry, but it
> only considers shared_buffers, sort_mem and vacuum_mem. And it seems to
> use synthetic data instead of the actual database and actual statements
> from the log. And it has been inactive for a while.

Yeah, pg_autotune is a dead project. Once we got OSDL able to run tests, we
came up with some rules of thumb which are more useful than autotune's
output. More importantly, the approach doesn't scale to the 15-20 GUCs
which we'd realistically want to test.

--
Josh Berkus
Aglio Database Solutions
San Francisco
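The "5 major query tuning parameters" are presumably the planner cost GUCs
of that era; a sketch of inspecting them (the list is an assumption, not
stated in the post):

    SHOW random_page_cost;        -- cost of a non-sequential page fetch
    SHOW cpu_tuple_cost;          -- cost of processing one row
    SHOW cpu_index_tuple_cost;    -- cost of processing one index entry
    SHOW cpu_operator_cost;       -- cost of evaluating one operator/function
    SHOW effective_cache_size;    -- planner's assumption about available cache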
Re: [PERFORM] What needs to be done for real Partitioning?
Stacy,

> Luckily they had the chance to work with a truly fantastic DBA (the
> author of an Oracle Press performance tuning book, even) before they could
> switch back. He convinced them to make some of their indexes global.
> Performance dramatically improved (compared with both the unpartitioned
> schema and the partitioned-and-locally-indexed schema), and they've since
> stayed with partitioned tables and a mix of local and global indexes.

Hmmm. Wouldn't Greg's suggestion of a bitmap index which holds information
on what values are found in what partition also solve this? Without half of
the overhead imposed by global indexes?

I can actually see such a bitmap as being universally useful to the
partitioning concept ... for one, it would resolve the whole "partition on
{value}" issue.

--
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] What about utility to calculate planner cost constants?
If by not practical you mean, "no one has implemented a multivariable
testing approach," I'll agree with you. But multivariable testing is
definitely a valid statistical approach to solving just such problems.

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Mar 21, 2005, at 11:51 AM, Josh Berkus wrote:

> That's not really practical. There are currently 5 major query tuning
> parameters, not counting the memory adjustments which really can't be
> left out. You can't realistically test all combinations of 6 variables.
Re: [PERFORM] What needs to be done for real Partitioning?
On Mon, Mar 21, 2005 at 09:55:03AM -0800, Josh Berkus wrote:
> Stacy,
>
> > Luckily they had the chance to work with a truly fantastic DBA (the
> > author of an Oracle Press performance tuning book, even) before they
> > could switch back. He convinced them to make some of their indexes
> > global. Performance dramatically improved (compared with both the
> > unpartitioned schema and the partitioned-and-locally-indexed schema),
> > and they've since stayed with partitioned tables and a mix of local and
> > global indexes.
>
> Hmmm. Wouldn't Greg's suggestion of a bitmap index which holds
> information on what values are found in what partition also solve this?
> Without half of the overhead imposed by global indexes?
>
> I can actually see such a bitmap as being universally useful to the
> partitioning concept ... for one, it would resolve the whole "partition
> on {value}" issue.

I suspect both will have their uses. I've read quite a bit about global vs.
local indexes in Oracle, and there are definitely cases where global is
much better than local. Granted, there are some things about how Oracle
handles its catalog, etc. that might make local indexes more expensive for
them than they would be for PostgreSQL. It's also not clear how much a
'partition bitmap' index would help.

As for the 'seqscan individual partitions' argument, that's not going to
work well at all for a case where you need to hit a relatively small
percentage of rows in a relatively large number of partitions. SELECT ...
WHERE customer_id = 1 would be a good example of such a query (assuming the
table is partitioned on something like invoice_date).

--
Jim C. Nasby, Database Consultant               [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
Re: [PERFORM] What needs to be done for real Partitioning?
On Sat, Mar 19, 2005 at 07:05:53PM -0500, Tom Lane wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > We probably also need multi-table indexes.
>
> As Josh says, that seems antithetical to the main point of partitioning,
> which is to be able to rapidly remove (and add) partitions of a table.
> If you have to do index cleaning before you can drop a partition, what's
> the point of partitioning?

Why would you need to do index cleaning first? Presumably the code that
checks a heap tuple an index points at, to ensure it is visible in the
current transaction, would be able to recognize that the partition the
tuple was in has been removed, and just ignore that index entry. Granted,
you'd need to clean the index up at some point (presumably via vacuum), but
that doesn't need to happen at partition drop time.

--
Jim C. Nasby, Database Consultant               [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
Re: [PERFORM] What about utility to calculate planner cost constants?
Thomas,

> If by not practical you mean, "no one has implemented a multivariable
> testing approach," I'll agree with you. But multivariable testing is
> definitely a valid statistical approach to solving just such problems.

Well, not practical as in: "would take either $10 million in equipment or
10,000 hours, or both".

--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] What needs to be done for real Partitioning?
I think Greg's email did a good job of putting this on track. Phase 1
should be manual, low-level support. Oracle has had partitioning for years
now, and IF they've added automated partition management, it's only
happened in 10g, which is pretty recent.

For inserts that don't currently have a defined partition to fit in, the
Oracle model might be better than tossing an error: a partitioned table in
Oracle also contains a default partition. Any rows that don't match a
defined partition go into the default partition. For many cases you'll
never have anything in the default partition, but sometimes you'll have
some partition values that occur infrequently enough in the table so as not
to warrant their own partition.

There's also another partitioning application that I think is often
overlooked. I have a table with about 130M rows that is
'pseudo-partitioned' by project_id. Right now, there are 5 different
project IDs that account for the bulk of those 130M rows. Oracle provides a
means to partition on discrete values. When you do this, there's not
actually any reason to even store the partition field in the partition
tables, since it will be the same for every row in the partition. In my
case, since the smallint project ID is being aligned to a 4-byte boundary,
having this feature would save ~120M rows * 4 bytes = 480MB in the table.
Granted, 480MB isn't anything for today's disk sizes, but it makes a huge
difference when you look at it from an I/O standpoint.

Knowing that a partition contains only one value of a field or set of
fields also means you can drop those fields from local indexes without
losing any effectiveness. In my case, I have 2 indexes I could drop
project_id from. Does each node in a B-tree index have the full index key?
If so, then there would be substantial I/O gains to be had there as well.
Even if each node doesn't store the full key, it could still be faster to
handle a narrower index.

I realize this might be a more difficult case to support. It probably
couldn't be done using inheritance, though I don't know whether inheritance
or a union view is better for partitioning. In either case, this case might
not be a good candidate for phase 1, but I think partitioning should be
designed with it in mind.

--
Jim C. Nasby, Database Consultant               [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
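For illustration, the discrete-value pseudo-partitioning described above can
be approximated with table inheritance. A minimal sketch (all table, column,
and index names are made up; note this does not give the storage savings
discussed, since each child still stores project_id):

    -- Parent table; queries go against this.
    CREATE TABLE measurements (
        project_id  smallint  NOT NULL,
        taken_at    timestamp NOT NULL,
        value       numeric
    );

    -- One child per frequent project_id; the CHECK constraint documents
    -- which rows belong in each child.
    CREATE TABLE measurements_p1 (CHECK (project_id = 1))
        INHERITS (measurements);
    CREATE TABLE measurements_p2 (CHECK (project_id = 2))
        INHERITS (measurements);

    -- Local indexes can omit project_id, since it is constant per child.
    CREATE INDEX measurements_p1_taken_at ON measurements_p1 (taken_at);
    CREATE INDEX measurements_p2_taken_at ON measurements_p2 (taken_at);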
Re: [PERFORM] What about utility to calculate planner cost constants?
Josh Berkus writes:

> > Otherwise it could just collect statements, run EXPLAIN ANALYZE for all
> > of them and then play with planner cost constants to get the estimated
> > values as close as possible to actual values. Something like Goal Seek
> > in Excel, if you pardon my reference to MS :).
>
> That's not really practical. There are currently 5 major query tuning
> parameters, not counting the memory adjustments which really can't be
> left out. You can't realistically test all combinations of 6 variables.

I don't think it would be very hard at all, actually. It's just a linear
algebra problem with a bunch of independent variables and a system of
equations. Solving for values for all of them is a straightforward problem.

Of course, in reality these variables aren't actually independent because
the costing model isn't perfect. But that wouldn't be a problem; it would
just reduce the accuracy of the results.

What's needed is for the explain plan to total up the costing penalties
independently. So the result would be something like

    1000 * random_page_cost + 101 * sequential_page_cost
        + 2000 * index_tuple_cost + ...

In other words, a tuple like <1000, 101, 2000, ...>. EXPLAIN ANALYZE would
produce that tuple along with the resulting time. Some program would then
have to gather these values from the log or stats data, collect them into a
large linear system, and solve for values that minimize the divergence from
the observed times.

(Cost penalties are currently normalized to sequential_page_cost being 1.
That could be maintained, or it could be changed to be normalized to an
expected 1ms.)

(Also, EXPLAIN ANALYZE currently has overhead that makes this impractical.
Ideally it could subtract out its own overhead so the solutions would be
accurate enough to be useful.)

--
greg
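Stated a little more formally, the fit described above is an ordinary
least-squares problem; a sketch, with notation introduced here rather than
taken from the post (x_i = vector of accumulated cost-penalty counts for
query i, t_i = its observed time, c = vector of cost constants to solve for):

    % Minimize the squared divergence between predicted and observed times:
    %   min_c  \sum_{i=1}^{n} ( x_i^{\top} c - t_i )^2
    % whose closed-form solution is the normal equations:
    %   c = ( X^{\top} X )^{-1} X^{\top} t,
    %   X = \begin{pmatrix} x_1^{\top} \\ \vdots \\ x_n^{\top} \end{pmatrix}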