Re: [PERFORM] Sustained inserts per sec ... ?
On Apr 5, 2005 12:16 AM, Christopher Petrilli <[EMAIL PROTECTED]> wrote: > Looking at preliminary results from running with shared_buffers at > 16000, it seems this may be correct. Performance was flatter for a > BIT longer, but slammed right into the wall and started hitting the > 3-30 second range per COPY. I've restarted the run, with fsync turned > on (fdatasync), and we'll see. > > My fear is that it's some bizarre situation interacting with both > issues, and one that might not be solvable. Does anyone else have > much experience with this sort of sustained COPY? Well, here's the results: http://www.amber.org/~petrilli/diagrams/pgsql_copy500_comparison.png The red is the run with shared_buffers turned up, but fsync off. The blue is the run with shared_buffers turned up, but fsync on. Note that it hits the wall sooner. Unfortunately, my brain is fried, and not sure what that means! Chris -- | Christopher Petrilli | [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Sustained inserts per sec ... ?
On Mon, 2005-04-04 at 22:36 -0400, Tom Lane wrote: > Christopher Petrilli <[EMAIL PROTECTED]> writes: > > On Apr 4, 2005 12:23 PM, Tom Lane <[EMAIL PROTECTED]> wrote: > >> do a test run with *no* indexes on the table, just to see if it behaves > >> any differently? Basically I was wondering if index overhead might be > >> part of the problem. > > > http://www.amber.org/~petrilli/diagrams/pgsql_copy500_pkonly.png > > > I appologize, I forgot to kill the PK, but as you can see, the curve > > flattened out a lot. It still begins to increase in what seems like > > the same place. You can find the results themselves at: > > Yeah, this confirms the thought that the indexes are the source of > the issue. (Which is what I'd expect, because a bare INSERT ought to be > an approximately constant-time operation. But it's good to verify.) Yup, indexes are the best explanation so far - block extension needs some work, but I doubted that it was the source of this effect. > Now some amount of slowdown is to be expected as the indexes get larger, > since it ought to take roughly O(log N) time to insert a new entry in an > index of size N. The weird thing about your curves is the very sudden > jump in the insert times. Well, ISTM that the curve is far from unique. Mark's OSDL tests show them too. What was wierd, for me, was that it "resets" when you move to a new table. The index theory does accurately explain that. Perhaps the jump is not so sudden? Do I see a first small step up at about 4.5M rows, then another much bigger one at 7.5M (which looks like the only one at first glance)? > What I think might be happening is that the "working set" of pages > touched during index inserts is gradually growing, and at some point it > exceeds shared_buffers, and at that point performance goes in the toilet > because we are suddenly doing lots of reads to pull in index pages that > fell out of the shared buffer area. So this does seem to be the best explanation and it seems a good one. It's also an excellent advert for table and index partitioning, and some damning evidence against global indexes on partitioned tables (though they may still be better than the alternative...) > The indicated fix of course is to increase shared_buffers. Splitting your tables at 4M, not 10M would work even better. .. Anyway, where most of this started was with Christopher's comments: On Fri, 2005-04-01 at 14:38 -0500, Christopher Petrilli wrote: > This was an application originally written for MySQL/MYISAM, and it's > looking like PostgreSQL can't hold up for it, simply because it's "too > much database" if that makes sense. The same box, running the MySQL > implementation (which uses no transactions) runs around 800-1000 > rows/second systained. B-trees aren't unique to PostgreSQL; the explanation developed here would work equally well for any database system that used tree-based indexes. Do we still think that MySQL can do this when PostgreSQL cannot? How? Do we have performance test results showing the same application load without the degradation? We don't need to look at the source code to measure MySQL performance... Best Regards, Simon Riggs ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Sustained inserts per sec ... ?
On Apr 5, 2005 3:48 PM, Simon Riggs <[EMAIL PROTECTED]> wrote: > > Now some amount of slowdown is to be expected as the indexes get larger, > > since it ought to take roughly O(log N) time to insert a new entry in an > > index of size N. The weird thing about your curves is the very sudden > > jump in the insert times. > > Well, ISTM that the curve is far from unique. Mark's OSDL tests show > them too. What was wierd, for me, was that it "resets" when you move to > a new table. The index theory does accurately explain that. > > Perhaps the jump is not so sudden? Do I see a first small step up at > about 4.5M rows, then another much bigger one at 7.5M (which looks like > the only one at first glance)? > > > What I think might be happening is that the "working set" of pages > > touched during index inserts is gradually growing, and at some point it > > exceeds shared_buffers, and at that point performance goes in the toilet > > because we are suddenly doing lots of reads to pull in index pages that > > fell out of the shared buffer area. > > So this does seem to be the best explanation and it seems a good one. > > It's also an excellent advert for table and index partitioning, and some > damning evidence against global indexes on partitioned tables (though > they may still be better than the alternative...) > > > The indicated fix of course is to increase shared_buffers. > > Splitting your tables at 4M, not 10M would work even better. Unfortunately, given we are talking about billions of rows potentially, I'm concerned about that many tables when it comes to query time. I assume this will kick in the genetic optimizer? > Anyway, where most of this started was with Christopher's comments: > > On Fri, 2005-04-01 at 14:38 -0500, Christopher Petrilli wrote: > > This was an application originally written for MySQL/MYISAM, and it's > > looking like PostgreSQL can't hold up for it, simply because it's "too > > much database" if that makes sense. The same box, running the MySQL > > implementation (which uses no transactions) runs around 800-1000 > > rows/second systained. > > B-trees aren't unique to PostgreSQL; the explanation developed here > would work equally well for any database system that used tree-based > indexes. Do we still think that MySQL can do this when PostgreSQL > cannot? How? There are customers in production using MySQL with 10M rows/table, and I have no evidence of this behavior. I do not have the test jig for MySQL, but I can create one, which is what I will do. Note that they are using MyISAM files, so there is no ACID behavior. Also, I have seen troubling corruption issues that I've never been able to concretely identify. Above all, I've been impressed that PostgreSQL, even when it hits this wall, never corrupts anything. > Do we have performance test results showing the same application load > without the degradation? We don't need to look at the source code to > measure MySQL performance... I will see what I can do in the next few days to create a similar little test for MySQL. Chris -- | Christopher Petrilli | [EMAIL PROTECTED] ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Sustained inserts per sec ... ?
On Apr 5, 2005 3:48 PM, Simon Riggs <[EMAIL PROTECTED]> wrote: > B-trees aren't unique to PostgreSQL; the explanation developed here > would work equally well for any database system that used tree-based > indexes. Do we still think that MySQL can do this when PostgreSQL > cannot? How? > > Do we have performance test results showing the same application load > without the degradation? We don't need to look at the source code to > measure MySQL performance... http://www.amber.org/~petrilli/diagrams/comparison_mysql_pgsql.png That chart shows MySQL (using INSERT against MyISAM tables) and PostgreSQL (using COPY) running with the exact same code otherwise. Note that MySQL does hit a bit of a wall, but nothing as drastic as PostgreSQL and actually maintains something "more flat". The red and blue dashed lines are the 95th percentile point. My suspicion is that what we're seeing is WAL issues, not particularly index issues. The indices just fill up the WAL faster because there's more data. This is a wag basically, but it would seem to explain the difference. In both cases, the indices were identical. Five on each. One interesting thing... PostgreSQL starts out a good bit faster, but looses in the end. Chris -- | Christopher Petrilli | [EMAIL PROTECTED] ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Follow-Up: How to improve db performance with $7K?
Thomas F.O'Connell wrote: > I'd use two of your drives to create a mirrored partition where pg_xlog > resides separate from the actual data. > > RAID 10 is probably appropriate for the remaining drives. > > Fortunately, you're not using Dell, so you don't have to worry about > the Perc3/Di RAID controller, which is not so compatible with > Linux... Hmm...I have to wonder how true this is these days. My company has a Dell 2500 with a Perc3/Di running Debian Linux, with the 2.6.10 kernel. The controller seems to work reasonably well, though I wouldn't doubt that it's slower than a different one might be. But so far we haven't had any reliability issues with it. Now, the performance is pretty bad considering the setup -- a RAID 5 with five 73.6 gig SCSI disks (10K RPM, I believe). Reads through the filesystem come through at about 65 megabytes/sec, writes about 35 megabytes/sec (at least, so says "bonnie -s 8192"). This is on a system with a single 3 GHz Xeon and 1 gigabyte of memory. I'd expect much better read performance from what is essentially a stripe of 4 fast SCSI disks. While compatibility hasn't really been an issue, at least as far as the basics go, I still agree with your general sentiment -- stay away from the Dells, at least if they have the Perc3/Di controller. You'll probably get much better performance out of something else. -- Kevin Brown [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])