Oops, forgot to attach the script. Here's the script I mentioned.

On Fri, Mar 14, 2014 at 12:21 PM, Chris Allen <ca.al...@gmail.com> wrote:

> > I have already made some tests and I have not been able to make any conclusive tests proving that performance is hurt by using sparse.
>
> Yeah, it shouldn't affect the ZFS mechanics at all; the ZVOL will just lack a reservation.
>
> > Is sparse a way to provision more than 100% then?
>
> Yes. That, and it enables you to take advantage of compression on the volume. Without sparse the volume is always going to take the same amount of space away from the pool (due to the hard reservation), regardless of whether or not compression and/or dedup is on. You just have to be careful to monitor pool capacity. Bad things will happen if your SAN server runs out of space... I attached a quick and dirty script I wrote to monitor pool capacity and status, and send an e-mail alert if the pool degrades or a capacity threshold is hit. I run it from cron every 30 minutes.
>
> > For me an 8k block size for volumes seems to give more write speed.
>
> 8k is much better than 4k for me too. With 4k I tend to hit my IOPS limit easily, with not much throughput, and I get a lot of I/O delay on VMs when the SAN is fairly busy. Currently I'm leaning towards 16k, sparse, with lz4 compression. If you go the sparse route then compression is a no-brainer, as it accelerates performance on the underlying storage considerably. Compression will lower both your IOPS and your data usage, and both are good things for performance. ZFS performance drops as usage rises and gets really ugly at around 90% capacity. Some people say it starts to drop with as little as 10% used, but I have not tested this.
>
> With 16k block sizes I'm getting good compression ratios - my best volume is 2.21x, my worst 1.33x, and the average is 1.63x. So as you can see, a lot of the time my real block size on disk is going to be effectively smaller than 16k. The tradeoff here is that compression ratios will go up with a larger block size, but you'll have to do larger operations, and thus more waste will occur when the VM is doing small I/O. With a large block size on a busy SAN your I/O is going to get fragmented before it hits the disk anyway, so I think 16k is a good balance. I only have 7200 RPM drives in my array, but a ton of RAM and a big ZFS cache device, which is another reason I went with 16k: to maximize what I get when I can get it. I think with 15k RPM drives an 8k block size might be better, as your IOPS limit will be roughly double that of 7200 RPM.
>
> Dedup did not work out well for me. Aside from the huge memory consumption, it didn't save all that much space, and to save the maximum space you need to match the VM's filesystem cluster size to the ZVOL block size - which means 4k for ext4 and NTFS (unless you change it during a Windows install). Also, dedup really, really slows down zpool scrubbing and possibly rebuilds. This is one of the main reasons I avoid it: I don't want scrubs to take forever when I'm paranoid that something might be wrong.
>
> > Regarding write caching: why not simply use sync directly on the volume?
>
> Good question. I don't know.
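
For reference, the ZVOL setup being discussed above (sparse, 16k block size, lz4) and the sync property Michael is asking about boil down to a couple of zfs commands. Here is a rough, untested sketch in the same style as the attached script - the pool/volume name and size are just placeholders:

    #!/usr/bin/python
    ## rough sketch only: create a sparse ZVOL with a 16k block size and lz4
    ## compression, then force synchronous writes via the sync property
    from subprocess import Popen, PIPE, STDOUT

    def run(cmd):
        """run a command in a subshell (same helper as in the attached script)"""
        p = Popen(cmd, stdout=PIPE, stderr=STDOUT, shell=True)
        stdout, _ = p.communicate()
        return (p.returncode, stdout)

    vol = 'tank/vm-101-disk-1'   ## placeholder pool/volume name

    ## -s = sparse (no reservation), -b 16k = volblocksize, lz4 compression
    run('zfs create -s -b 16k -o compression=lz4 -V 100G ' + vol)

    ## force every write to the volume to be synchronous, instead of turning
    ## off write caching on the iSCSI LU
    ## (if you want it on the LU side instead, COMSTAR has a write-cache-disable
    ## LU property - something like: stmfadm modify-lu -p wcd=true <GUID>)
    run('zfs set sync=always ' + vol)

    ## check what the volume ended up with, including the compression ratio
    rv, out = run('zfs get volblocksize,compression,compressratio,refreservation,sync ' + vol)
    print out

Keep in mind that sync=always will hurt performance just like disabling write caching on the LU unless there is a fast ZIL device in front of the pool, as discussed further down in the thread.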

> > I have made no tests on Solaris - license costs are out of my league. I regularly test FreeBSD, Linux and OmniOS. In production I only use OmniOS (r151008, but I will migrate everything to r151014 when it is released and then only use LTS in the future).
>
> I'm in the process of trying to run away from all things Oracle at my company. We keep getting burned by them. It's so freakin' expensive, and they hold you over a barrel with patches for both hardware and software. We bought some very expensive hardware from them, and a management controller for a blade chassis had major bugs to the point it was practically unusable out of the box. Oracle would not under any circumstance supply us with the new firmware unless we spent boatloads of cash on a maintenance contract. We ended up doing this because we needed the controller to work as advertised. This is what annoys me the most about them - you buy a product, it doesn't do what is written on the box, and then you have to pay tons extra for it to do what they said it would do when you bought it. I miss Sun...
>
> On Fri, Mar 14, 2014 at 10:52 AM, Michael Rasmussen <m...@datanom.net> wrote:
>
>> On Fri, 14 Mar 2014 10:11:17 -0700 Chris Allen <ca.al...@gmail.com> wrote:
>>
>> > > It was also part of the latest 3.1. Double-click the mouse over your storage specification in Datacenter->Storage and the panel pops up. Patched panel attached.
>>
>> I forgot to mention that at the moment the code for creating ZFS storage is commented out in /usr/share/pve-manager/ext4/pvemanagerlib.js, lines 20465-20473.
>>
>> > No I haven't. As far as I understand it, sparse should not affect performance whatsoever; it only changes whether or not a reservation is created on the ZVOL. Turning off write caching on the LU should decrease performance, dramatically so, if you do not have a separate and very fast ZIL device (e.g. ZeusRAM). Every block write to the ZVOL will be done synchronously when write caching is turned off.
>>
>> I have already made some tests and I have not been able to make any conclusive tests proving that performance is hurt by using sparse. Is sparse a way to provision more than 100% then?
>>
>> > I've done some testing with regards to block size, compression, and dedup. I wanted sparse support for myself, and I figured while I was there I might as well add a flag for turning off write caching. For people with the right (and expensive!) hardware the added safety of no write caching might be worth it.
>>
>> I have done the same. For me an 8k block size for volumes seems to give more write speed. Regarding write caching: why not simply use sync directly on the volume?
>>
>> > Have you tested the ZFS storage plugin on Solaris 11.1? I first tried using it with 11.1, but they changed how the LUN assignment for the views works. In 11.0 and OmniOS the first available LUN will get used when a new view is created if no LUN is given. But in 11.1 it gets populated with a string that says "AUTO". This of course means PVE can't connect to the volume, because it can't resolve the LUN. Unfortunately I couldn't find anything in the 11.1 documentation that described how to get the LUN. I'm assuming there's some kind of mechanism in 11.1 where you can get the number on the fly, as it must handle them dynamically now. But after a lot of Googling and fiddling around I gave up and switched to OmniOS. I don't have a support contract with Oracle, so that was a no-go. Anyway, just thought I'd mention it in case you knew about it.
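
A side note on the LUN problem above: on OmniOS and 11.0 the LUN that was assigned to a view can be read back from the view entries, which is presumably what makes the lookup work there. A sketch of the idea - the GUID is a placeholder and the stmfadm output layout is from memory, so the regex may need adjusting:

    #!/usr/bin/python
    ## sketch only: read back the LUN that COMSTAR assigned to a view entry
    import re
    from subprocess import Popen, PIPE, STDOUT

    def run(cmd):
        p = Popen(cmd, stdout=PIPE, stderr=STDOUT, shell=True)
        stdout, _ = p.communicate()
        return (p.returncode, stdout)

    guid = '600144f0c0a8010000005322000a0001'   ## placeholder LU GUID

    rv, out = run('stmfadm list-view -l ' + guid)
    for line in out.split('\n'):
        m = re.search(r'LUN\s*:\s*(\S+)', line)
        if m:
            ## on 11.1 this apparently comes back as the string "AUTO" rather
            ## than a number, which is exactly what breaks the LUN lookup
            print 'view LUN:', m.group(1)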

>> > In addition to that problem, 11.1 also has a bug in its handling of the iSCSI Immediate Data feature. It doesn't implement it properly according to the iSCSI RFC, so you need to turn off Immediate Data on the client in order to connect. The patch is available to paying Oracle support customers only.
>>
>> I have made no tests on Solaris - license costs are out of my league. I regularly test FreeBSD, Linux and OmniOS. In production I only use OmniOS (r151008, but I will migrate everything to r151014 when it is released and then only use LTS in the future).
>>
>> --
>> Hilsen/Regards
>> Michael Rasmussen
>>
>> Get my public GnuPG keys:
>> michael <at> rasmussen <dot> cc
>> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
>> mir <at> datanom <dot> net
>> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
>> mir <at> miras <dot> org
>> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
>> --------------------------------------------------------------
>> /usr/games/fortune -es says:
>> I never failed to convince an audience that the best thing they
>> could do was to go away.
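
P.S. On the Immediate Data bug above: the workaround on the client side is just to disable the feature in the initiator before logging in. With the Linux open-iscsi initiator that PVE uses, that should be the ImmediateData setting in /etc/iscsi/iscsid.conf (or the per-node equivalent via iscsiadm):

    node.session.iscsi.ImmediateData = No

I'm quoting the parameter name from memory, so double-check it against your iscsid.conf before relying on it.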
#!/usr/bin/python

import os
import sys
import re
import smtplib

from socket import gethostname
from subprocess import Popen, PIPE, STDOUT

#############################################################################
## config
#############################################################################

## address to mail alerts to
mailto = 'm...@mydomain.com'

## address to mail alerts from
## (address mangled by the list archive; the format string expects the hostname)
mailfrom = 'root...@mydomain.com' % gethostname()

## mail server to use
mailserver = 'mail.mydomain.com'

## capacity threshold - percent full
## if the pool is at this capacity or higher, then generate an alert
cap_threshold = 80

#############################################################################
## objects
#############################################################################

def run(cmd):
    """run a command in a subshell"""
    p = Popen(cmd, stdout=PIPE, stderr=STDOUT, shell=True)
    stdout, stderr = p.communicate()
    return (p.returncode, stdout)

def sendmail(to, subject, body):
    """send an e-mail"""
    global mailfrom
    global mailserver
    mailer = smtplib.SMTP(mailserver)
    msg = 'From: <%s>\n' % mailfrom
    msg += 'Subject: %s \n\n %s' % (subject, body)
    mailer.sendmail(mailfrom, to, msg)

def alert(poolname, health):
    """alert the sysadmin about a zfs event"""
    global mailto
    subject = 'ZFS ALERT! host: "%s" pool: "%s" health: "%s"' % (
        gethostname(), poolname, health)
    if health == 'CAPACITY':
        rv, stdout = run('zpool list ' + poolname)
        sendmail(mailto, subject, stdout)
    else:
        ## here we send the message with an empty body first just in case zpool
        ## status blocks
        sendmail(mailto, subject, 'next message contains the pool status')
        ## now we try to get the pool status
        rv, stdout = run('zpool status ' + poolname)
        sendmail(mailto, subject, stdout)
    print 'alert sent to "%s" for pool "%s"' % (mailto, poolname)

#############################################################################
## main
#############################################################################

if __name__ == '__main__':
    rv, stdout = run('zpool list -o "name,cap,health" ')

    for l in stdout.split('\n'):
        if re.match(r'NAME', l):
            continue
        if not l:
            continue

        pool = re.split(r' +', l)
        poolname = pool[0]
        capacity = pool[1][:-1]
        health = pool[2]

        if not health == 'ONLINE':
            alert(poolname, health)
        if int(capacity) >= cap_threshold:
            alert(poolname, 'CAPACITY')
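
As mentioned above, I run the script from cron every 30 minutes. A crontab entry along these lines does it (the install path is just an example):

    */30 * * * * /usr/local/sbin/zpool-monitor.py

Make it executable and adjust the config section at the top (mail addresses, mail server, capacity threshold) before using it.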
_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel