> From: Bryan N Iotti [mailto:[email protected]]
> 
> - zfs send -R rpool@<DATE> | gzip > rpool.COMPLETE.<DATE>.gz
> 
> ... as per Oracle manual.
> 
> I was wondering why it was so slow, taking a couple of hours, then I
> paid attention to my CPU meter and understood that the normal gzip was
> running as a single thread.
> 
> I searched online for a multithreaded version and came across pigz
> (http://www.zlib.net/pigz/). I downloaded it, modified the Makefile to
> use the Solaris Studio 12.3 cc with the proper CFLAGS ("-fast", for now,
> no -m64). I then copied both pigz and unpigz to a directory in my PATH
> and modified the last command to:
> - zfs send -R rpool@<DATE> | /usr/local/bin/pigz >
> rpool.COMPLETE.<DATE>.gz

Before anything else, you shouldn't be storing your zfs send stream in a file, 
for two major reasons: if you ever need to restore, a single bit of corruption 
invalidates the whole stream; and your only option is a complete filesystem 
restore, with no per-file granularity.

Instead, you should zfs send | zfs receive every single time.  If you're in the 
unfortunate situation of receiving onto a system that doesn't support zfs, you 
can instead create a loopback file container on the receiving system, and 
create a zfs filesystem inside it, so you can zfs receive onto the destination 
filesystem.  (I don't think you even need lofiadm, the official loopback device 
tool - check the man page, but I believe zpool create accepts a plain file as a 
vdev directly, as if it were a device.)
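As a rough sketch of both approaches (the host name, pool names, snapshot name, 
and file path here are all hypothetical - substitute your own):

```shell
# Send directly into a receiving pool over ssh (the preferred route):
zfs snapshot -r rpool@backup-2013-01-15
zfs send -R rpool@backup-2013-01-15 | ssh backuphost zfs receive -Fdu tank/backups

# If the receiver has no existing pool, back a scratch pool with a plain file
# (zpool create takes an absolute file path as a vdev):
ssh backuphost 'mkfile 50g /export/zpool.img && zpool create scratch /export/zpool.img'
zfs send -R rpool@backup-2013-01-15 | ssh backuphost zfs receive -Fdu scratch
```

Either way, the data lands as a live filesystem that zfs scrub can verify, 
instead of an opaque stream file.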

Now, a few words on compression and threading.

Gzip and pigz are based on zlib.  So ... If you check your performance using 
the default compression level, or add the --fast command-line argument, I think 
you'll find the performance is significantly better with --fast, while the 
compression is not significantly worse.  This is true for both the single 
threaded, and the pigz implementations.  If you're going to continue using gzip 
and/or pigz, try using --fast in every situation, and I think your life will 
become slightly better.
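For example, here's a rough way to see the trade-off yourself (the sample file 
is hypothetical; pigz accepts the same -1/--fast through -9 level flags as gzip):

```shell
# Make ~13 MB of text-like test data (better: use a real sample of your stream):
head -c 10000000 /dev/urandom | base64 > sample.txt

time gzip -c sample.txt > sample.default.gz       # default level (-6)
time gzip --fast -c sample.txt > sample.fast.gz   # level -1

# Compare the resulting sizes against the elapsed times above:
ls -l sample.txt sample.default.gz sample.fast.gz

# The same flag works in the original pipeline:
#   zfs send -R rpool@<DATE> | pigz --fast > rpool.COMPLETE.<DATE>.gz
```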

But there are lots of different compression algorithms.  
* lzop is based on lzo, which is extremely fast but not as powerful as zlib.  
Unfortunately, no known parallel implementation.
* If you enable compression on a zfs filesystem, by default it's using lzjb, 
which is very similar in performance to lzo.  I do this for nearly every zfs 
filesystem anywhere; generally speaking, the filesystem is both faster and 
smaller with this enabled, as compared to using no compression.
* zlib is often considered the "default" just because it's common, and most 
people don't think much about it.  It's what's used in zip files and gz files.  
Call it "medium" in terms of compression and speed characteristics.
* bzip2 is much slower but somewhat stronger than zlib. There's a parallel 
implementation called pbzip2.  IMHO, there's no situation where this is the 
best option, because it's slower than either zlib or lzma, but it doesn't 
compress as well as lzma.
* xz is based on lzma.  Last I knew, they were still working out the bugs of 
the parallel implementation, but maybe by now it's ready.  If you use the 
--fast option, it generally compresses almost as fast as gzip --fast, and it 
compresses much better than either gzip or bzip2.
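A quick way to get your own numbers for the trade-offs above is to run each 
compressor at its fastest level against a representative file (the file name is 
hypothetical; drop any compressor you don't have installed - lzop is often 
missing by default):

```shell
FILE=backup.img   # hypothetical sample; use real data, not /dev/urandom
for c in gzip bzip2 xz; do
    echo "== $c -1 =="
    time $c -1 -c "$FILE" > "$FILE.$c"
done
# Compare output sizes against the times printed above:
ls -l "$FILE" "$FILE.gzip" "$FILE.bzip2" "$FILE.xz"
```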

At some point, I got tired of waiting for all the parallel implementations of 
all these things.  And even when products like pigz got released, I was 
disappointed by the cross-platform compatibility, and in some cases, the 
threading model.  So I wrote threadzip, a parallel threaded compression tool in 
python.  http://code.google.com/p/threadzip  Who knows, it might even be useful 
for you.


_______________________________________________
OpenIndiana-discuss mailing list
[email protected]
http://openindiana.org/mailman/listinfo/openindiana-discuss