Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-17 Thread Gregory Stark
"Gregory Stark" <[EMAIL PROTECTED]> writes: > <[EMAIL PROTECTED]> writes: > >>> If you are completely over-writing an entire stripe, there's no reason to >>> read the existing data; you would just calculate the parity information from >>> the new data. Any good controller should take that approach
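The full-stripe point above can be sketched with a toy parity calculation (a hedged illustration, not any controller's actual firmware): when every data chunk of a stripe is rewritten, the new parity is simply the XOR of the new chunks, so there is nothing to read back.

```python
# Sketch: RAID5 parity for a full-stripe write.
# When all data chunks of a stripe are replaced, parity is the XOR of
# the new chunks alone -- no read-modify-write cycle is required.
def full_stripe_parity(chunks):
    """XOR equal-length data chunks together to produce the parity chunk."""
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            parity[i] ^= b
    return bytes(parity)

# Three data disks' worth of new chunks for one stripe (toy sizes):
new_chunks = [b"\x0f\x0f", b"\xf0\xf0", b"\xff\x00"]
parity = full_stripe_parity(new_chunks)

# The XOR property: any one lost chunk is rebuilt from the rest plus parity.
rebuilt = full_stripe_parity([new_chunks[1], new_chunks[2], parity])
assert rebuilt == new_chunks[0]
```

The same XOR property is why a *partial* stripe update is expensive: the controller must first read the untouched chunks (or the old data plus old parity) before it can recompute parity.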

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-17 Thread Gregory Stark
<[EMAIL PROTECTED]> writes: >> If you are completely over-writing an entire stripe, there's no reason to >> read the existing data; you would just calculate the parity information from >> the new data. Any good controller should take that approach. > > in theory yes, in practice the OS writes usua

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-16 Thread david
On Sat, 16 Aug 2008, Decibel! wrote: On Aug 13, 2008, at 2:54 PM, Henrik wrote: Additionally, you need to be careful of what size writes you're using. If you're doing random writes that perfectly align with the raid stripe size, you'll see virtually no RAID5 overhead, and you'll get the perfor

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-16 Thread Decibel!
On Aug 13, 2008, at 2:54 PM, Henrik wrote: Additionally, you need to be careful of what size writes you're using. If you're doing random writes that perfectly align with the raid stripe size, you'll see virtually no RAID5 overhead, and you'll get the performance of N-1 drives, as opposed to
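The alignment condition being described can be sketched in a few lines (the stripe unit and disk count are assumed example values, not figures from this thread):

```python
# Sketch: deciding whether a write is a full-stripe write.
# Assumed geometry -- a 4-disk RAID5 with a 64 kB per-disk stripe unit.
STRIPE_UNIT = 64 * 1024          # per-disk chunk size (assumption)
DATA_DISKS = 3                   # 4 disks = 3 data + 1 parity
FULL_STRIPE = STRIPE_UNIT * DATA_DISKS

def is_full_stripe_write(offset, length):
    """True when the write covers whole stripes, so the controller can
    compute parity from the new data alone (no read-modify-write)."""
    return offset % FULL_STRIPE == 0 and length % FULL_STRIPE == 0

print(is_full_stripe_write(0, FULL_STRIPE))   # aligned full stripe -> True
print(is_full_stripe_write(4096, 8192))       # small random write -> False
```

Writes that fail this test pay the RAID5 penalty: old data or parity must be read before the new parity can be written.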

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-14 Thread Ron Mayer
Greg Smith wrote: On Wed, 13 Aug 2008, Ron Mayer wrote: Second of all - ext3 fsync() appears to me to be *extremely* stupid. It only seems to do the correct flushing (and waiting) of a drive's cache when a file's inode has changed. This is bad, but the way PostgreSQL

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-14 Thread Scott Marlowe
I've seen it written a couple of times in this thread, and in the wikipedia article, that SOME sw raid configs don't support write barriers. This implies that some do. Which ones do and which ones don't? Does anybody have a list of them? I was mainly wondering if sw RAID0 on top of hw RAID1 wou

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-14 Thread Greg Smith
On Wed, 13 Aug 2008, Ron Mayer wrote: First off - some IDE drives don't even support the relatively recent ATA command that apparently lets the software know when a cache flush is complete. Right, so this is one reason you can't assume barriers will be available. And barriers don't work rega

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-13 Thread Ron Mayer
Scott Marlowe wrote: IDE came up corrupted every single time. Greg Smith wrote: you've drunk the kool-aid ... completely ridiculous ...unsafe fsync ... md0 RAID-1 array (aren't there issues with md and the barriers?) Alright - I'll eat my words. Or mostly. I still haven't found IDE drives

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-13 Thread Henrik
On 13 Aug 2008 at 17:13, Decibel! wrote: On Aug 11, 2008, at 9:01 AM, Jeff wrote: On Aug 11, 2008, at 5:17 AM, Henrik wrote: OK, changed the SAS RAID 10 to RAID 5 and now my random writes are handling 112 MB/sec. So it is almost twice as fast as the RAID10 with the same disks. Any ideas why?

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-13 Thread Greg Smith
On Wed, 13 Aug 2008, Ron Mayer wrote: I assume test_fsync in the postgres source distribution is a decent way to see? Not really. It takes too long (runs too many tests you don't care about) and doesn't spit out the results the way you want them--TPS, not average time. You can do it with
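A quick fsync-rate check in the spirit Greg describes — reporting TPS rather than average time — might look like the following sketch (this is a hedged illustration, not the actual replacement tool he had in mind; the file name is arbitrary):

```python
# Sketch: time a burst of write+fsync calls and report the rate as TPS.
# Run it against a file on the device under test.
import os
import time

def fsync_tps(path, iterations=200):
    """Return fsync transactions per second for small sequential writes."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        start = time.time()
        for _ in range(iterations):
            os.write(fd, b"x" * 8192)   # one 8 kB block, roughly a WAL page
            os.fsync(fd)                # ask the OS to push it to the platter
        elapsed = time.time() - start
    finally:
        os.close(fd)
    return iterations / elapsed

if __name__ == "__main__":
    print("fsync TPS: %.0f" % fsync_tps("fsync_test.dat"))
```

The useful property is the sanity bound discussed in this thread: a drive that honestly flushes its cache is limited by rotation (on the order of 100-something fsyncs/sec for a 7200 RPM disk), so a result of thousands of TPS on a single plain disk suggests the flush never reached the platters.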

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-13 Thread Decibel!
On Aug 11, 2008, at 9:01 AM, Jeff wrote: On Aug 11, 2008, at 5:17 AM, Henrik wrote: OK, changed the SAS RAID 10 to RAID 5 and now my random writes are handling 112 MB/sec. So it is almost twice as fast as the RAID10 with the same disks. Any ideas why? Are the iozone tests faulty? does IO

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-13 Thread Scott Marlowe
On Wed, Aug 13, 2008 at 8:41 AM, Ron Mayer <[EMAIL PROTECTED]> wrote: > Greg Smith wrote: > But I still am looking for any evidence that there were any > widely shipped SATA (or even IDE drives) that were at fault, > as opposed to filesystem bugs and poor settings of defaults. Well, if they're ge

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-13 Thread Ron Mayer
Greg Smith wrote: The below disk writes impossibly fast when I issue a sequence of fsyncs. 'k, I've got some homework. I'll be trying to reproduce similar results with md raid, old IDE drives, etc. I assume test_fsync in the postgres source distribution is a decent way to

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-13 Thread Ron Mayer
Scott Marlowe wrote: On Tue, Aug 12, 2008 at 10:28 PM, Ron Mayer ...wrote: Scott Marlowe wrote: I can attest to the 2.4 kernel ... ...SCSI...AFAICT the write barrier support... Tested both by pulling the power plug. The SCSI was pulled 10 times while running 600 or so concurrent pgbench thr

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-13 Thread Matthew Wakeling
On Tue, 12 Aug 2008, Ron Mayer wrote: Really old software (notably 2.4 linux kernels) didn't send cache synchronizing commands for either SCSI or ATA; Surely not true. Write cache flushing has been a known problem in the computer science world for several tens of years. The difference is that

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-13 Thread Scott Marlowe
On Tue, Aug 12, 2008 at 10:28 PM, Ron Mayer <[EMAIL PROTECTED]> wrote: > Scott Marlowe wrote: >> >> I can attest to the 2.4 kernel not being able to guarantee fsync on >> IDE drives. > > Sure. But note that it won't for SCSI either; since AFAICT the write > barrier support was implemented at the s

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-12 Thread Greg Smith
On Tue, 12 Aug 2008, Ron Mayer wrote: Really old software (notably 2.4 linux kernels) didn't send cache synchronizing commands for either SCSI or ATA; but it seems well thought through in the 2.6 kernels as described in the Linux kernel documentation. http://www.mjmwired.net/kernel/Documentatio

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-12 Thread Ron Mayer
Scott Marlowe wrote: I can attest to the 2.4 kernel not being able to guarantee fsync on IDE drives. Sure. But note that it won't for SCSI either; since AFAICT the write barrier support was implemented at the same time for both. -- Sent via pgsql-performance mailing list (pgsql-performance@

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-12 Thread Scott Carey
I'm not an expert on which and where -- it's been a while since I was exposed to the issue. From what I've read in a few places over time (storagereview.com, linux and windows patches or knowledge base articles), it happens from time to time. Drives usually get firmware updates quickly. Drivers /

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-12 Thread david
On Tue, 12 Aug 2008, Ron Mayer wrote: Scott Carey wrote: Some SATA drives were known to not flush their cache when told to. Can you name one? The ATA commands seem pretty clear on the matter, and ISTM most of the reports of these issues came from before Linux had write-barrier support. I c

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-12 Thread Scott Marlowe
On Tue, Aug 12, 2008 at 6:23 PM, Scott Carey <[EMAIL PROTECTED]> wrote: > Some SATA drives were known to not flush their cache when told to. > Some file systems don't know about this (UFS, older linux kernels, etc). > > So yes, if your OS / File System / Controller card combo properly sends the > w

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-12 Thread Ron Mayer
Scott Carey wrote: Some SATA drives were known to not flush their cache when told to. Can you name one? The ATA commands seem pretty clear on the matter, and ISTM most of the reports of these issues came from before Linux had write-barrier support. I've yet to hear of a drive with the problem

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-12 Thread Scott Carey
Some SATA drives were known to not flush their cache when told to. Some file systems don't know about this (UFS, older linux kernels, etc). So yes, if your OS / File System / Controller card combo properly sends the write cache flush command, and the drive is not a flawed one, all is well. Most sh

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-12 Thread Ron Mayer
Greg Smith wrote: some write cache in the SATA disks...Since all non-battery backed caches need to get turned off for reliable database use, you might want to double-check that on the controller that's driving the SATA disks. Is this really true? Doesn't the ATA "FLUSH CACHE" command (say,

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-12 Thread Scott Marlowe
On Tue, Aug 12, 2008 at 1:40 PM, Henrik <[EMAIL PROTECTED]> wrote: > Hi again all, > > Just wanted to give you an update. > > Talked to Dell tech support and they recommended using write-through(!) > caching in RAID10 configuration. Well, it didn't work and got even worse > performance. Someone at

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-12 Thread Henrik
Hi again all, Just wanted to give you an update. Talked to Dell tech support and they recommended using write-through(!) caching in RAID10 configuration. Well, it didn't work and got even worse performance. Anyone have an estimate of what a RAID10 on 4 15k SAS disks should generate in rand

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-11 Thread Greg Smith
On Sun, 10 Aug 2008, Henrik wrote: Normally, when a SATA implementation is running significantly faster than a SAS one, it's because there's some write cache in the SATA disks turned on (which they usually are unless you go out of your way to disable them). Lucky for me I have BBU on all my co

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-11 Thread Jeff
On Aug 11, 2008, at 5:17 AM, Henrik wrote: OK, changed the SAS RAID 10 to RAID 5 and now my random writes are handling 112 MB/sec. So it is almost twice as fast as the RAID10 with the same disks. Any ideas why? Are the iozone tests faulty? does IOzone disable the os caches? If not you ne

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-11 Thread Scott Marlowe
On Mon, Aug 11, 2008 at 6:08 AM, Henrik <[EMAIL PROTECTED]> wrote: > On 11 Aug 2008 at 12:35, Glyn Astill wrote: >> It feels like there is something fishy going on. Maybe the RAID 10 >> implementation on the PERC/6e is crap? > It's possible. We had a bunch of perc/

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-11 Thread Henrik
On 11 Aug 2008 at 12:35, Glyn Astill wrote: It feels like there is something fishy going on. Maybe the RAID 10 implementation on the PERC/6e is crap? It's possible. We had a bunch of perc/5i SAS raid cards in our servers that performed quite well in Raid 5 but were shite in Raid 10. I

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-11 Thread Glyn Astill
It feels like there is something fishy going on. Maybe the RAID 10 implementation on the PERC/6e is crap? It's possible. We had a bunch of perc/5i SAS raid cards in our servers that performed quite well in Raid 5 but were shite in Raid 10. I switched them out for Adaptec

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-11 Thread Henrik
OK, changed the SAS RAID 10 to RAID 5 and now my random writes are handling 112 MB/sec. So it is almost twice as fast as the RAID10 with the same disks. Any ideas why? Are the iozone tests faulty? What are your suggestions? Trust the IOZone tests and use RAID5 instead of RAID10, or go for RA

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-10 Thread Henrik
On 9 Aug 2008 at 00:47, Greg Smith wrote: On Fri, 8 Aug 2008, Henrik wrote: It feels like there is something fishy going on. Maybe the RAID 10 implementation on the PERC/6e is crap? Normally, when a SATA implementation is running significantly faster than a SAS one, it's because there's som

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-08 Thread david
On Fri, 8 Aug 2008, Henrik wrote: But random writes should be faster on a RAID10 as it doesn't need to calculate parity. That is why people suggest RAID 10 for databases, correct? I can understand that RAID5 can be faster with sequential writes. the key word here is "can" be faster, it depends

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-08 Thread Greg Smith
On Fri, 8 Aug 2008, Henrik wrote: It feels like there is something fishy going on. Maybe the RAID 10 implementation on the PERC/6e is crap? Normally, when a SATA implementation is running significantly faster than a SAS one, it's because there's some write cache in the SATA disks turned on (

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-08 Thread Andrej Ricnik-Bay
On 09/08/2008, Henrik <[EMAIL PROTECTED]> wrote: > But random writes should be faster on a RAID10 as it doesn't need to > calculate parity. That is why people suggest RAID 10 for databases, correct? If it had 10 spindles as opposed to 4 ... with 4 drives the "split" is (because you're striping and m

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-08 Thread Henrik
On 8 Aug 2008 at 18:44, Mark Wong wrote: On Fri, Aug 8, 2008 at 8:08 AM, Henrik <[EMAIL PROTECTED]> wrote: But random writes should be faster on a RAID10 as it doesn't need to calculate parity. That is why people suggest RAID 10 for databases, correct? I can understand that RAID5 can be faster w

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-08 Thread Mark Wong
On Fri, Aug 8, 2008 at 8:08 AM, Henrik <[EMAIL PROTECTED]> wrote: > But random writes should be faster on a RAID10 as it doesn't need to > calculate parity. That is why people suggest RAID 10 for databases, correct? > I can understand that RAID5 can be faster with sequential writes. There is some da

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-08 Thread Henrik
But random writes should be faster on a RAID10 as it doesn't need to calculate parity. That is why people suggest RAID 10 for databases, correct? I can understand that RAID5 can be faster with sequential writes. //Henke On 8 Aug 2008 at 16:53, Luke Lonergan wrote: Your expected write speed on a

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-08 Thread Luke Lonergan
Your expected write speed on a 4 drive RAID10 is two drives worth, probably 160 MB/s, depending on the generation of drives. The expected write speed for a 6 drive RAID5 is 5 drives worth, or about 400 MB/s, sans the RAID5 parity overhead. - Luke - Original Message - From: [EMAIL PROTECT
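Luke's back-of-envelope numbers can be reproduced with a couple of lines (the 80 MB/s per-drive sequential rate is an assumed figure for drives of that generation, and the RAID5 case is the best case of full-stripe writes):

```python
# Sketch: idealized sequential-write throughput estimates for RAID levels.
PER_DRIVE_MB_S = 80  # assumed per-drive sequential write rate

def raid10_write(n_drives):
    # Mirrored pairs: only half the spindles carry unique data.
    return (n_drives // 2) * PER_DRIVE_MB_S

def raid5_write(n_drives):
    # Best case, full-stripe writes: N-1 data drives, before parity overhead.
    return (n_drives - 1) * PER_DRIVE_MB_S

print(raid10_write(4))  # 160 MB/s -- two drives' worth
print(raid5_write(6))   # 400 MB/s -- five drives' worth, sans parity cost
```

These are ceilings, not measurements: small random writes on RAID5 fall far below this once the read-modify-write parity cycle kicks in, which is the crux of the whole thread.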