Edward Ned Harvey wrote:
>> I use the excellent pbzip2
>>
>>     zfs send ... | tee >(md5sum) | pbzip2 | ssh remote ...
>>
>> Utilizes those 8 cores quite well :)
>>     
>
> This (pbzip2) sounds promising, and it must be better than what I wrote.
> ;-)  But I don't understand the syntax you've got above, using tee,
> redirecting to something in parens.  I haven't been able to do this yet on
> my own system.  Can you please give me an example to simultaneously generate
> md5sum and gzip?
>
> This is how I currently do it:
> cat somefile | multipipe "md5sum > somefile.md5sum" "gzip > somefile.gz"
> End result is:
>       somefile
>       somefile.md5sum
>       somefile.gz
>
>   
Well, the theory is simple. "tee" is quite sufficient, because it does
not just operate on files. It operates on any path it can open, and on
most systems that includes _file descriptors_ (via /dev/fd/N). Big
difference. A file descriptor can point to a whole slew of things, among
which are files, pipes, sockets, FIFOs or whatever the heck your brand
of UNIX wants to call those.

Now, the shell and OS usually give you convenient names for those:

    ls > /dev/stderr
is usually a synonym for
    ls > /dev/fd/2
or, on Linux,
    ls > /proc/self/fd/2
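To see that tee only needs a name it can open, here is a small sketch. It assumes a system with /dev/fd (such as Linux or Solaris) and a mktemp command. The shell opens descriptor 3 on a temporary file, and tee writes into it through the path /dev/fd/3.

```shell
out=$(mktemp)                            # temporary file for the demo
exec 3> "$out"                           # the shell opens descriptor 3 on it
echo hello | tee /dev/fd/3 > /dev/null   # tee opens /dev/fd/3 like any file name
exec 3>&-                                # close descriptor 3 again
cat "$out"                               # prints: hello
```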

On to the topic of pipes...

You could make explicit the 'anonymous' file descriptors that your shell
opens internally to link the pipe processes together, by using named
pipes (FIFOs), like so:

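    mkfifo /tmp/myzippipe /tmp/myhashpipe
    # start the readers first; each blocks until tee opens its FIFO
    gzip < /tmp/myzippipe > zipped_stream &
    md5sum < /tmp/myhashpipe > MD5SUMs &
    # discard tee's own copy on stdout (it would otherwise hit the terminal)
    zfs send ... | tee /tmp/myzippipe /tmp/myhashpipe > /dev/null
    wait
    rm -f /tmp/myzippipe /tmp/myhashpipe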

All that is painfully verbose, leaves dangling FIFOs behind on errors,
has security issues (FIFOs in /tmp?) and looks like a kludge. It appears
that a number of shells (I remember using this in bash and ksh; zsh has
it too, but a plain POSIX sh does not) support the nifty and obvious
shorthand known as process substitution:

    cat >(subshell command line)

which the shell replaces (just as it does during parameter, command and
glob expansion) with the path of the proper file descriptor, like

    cat /dev/fd/23

Of course the actual number is effectively arbitrary; it depends on the
shell, which descriptors are already open, etc.
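A quick way to watch the substitution happen, as a sketch assuming bash on a system with /dev/fd (such as Linux); the variable name subst_path is only illustrative. It captures the word the shell hands to echo.

```shell
# echo never opens the path; it just prints the word the shell substituted.
subst_path=$(echo >(cat > /dev/null))
printf 'expanded to: %s\n' "$subst_path"   # e.g. "expanded to: /dev/fd/63"
```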

This makes your needed multi-tee a snap:

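    # each >(...) redirects its own output; otherwise it would inherit
    # tee's stdout and end up mixed into the sort | uniq -c results
    tee >(gzip > my_log_file.gz) >(md5sum > my_log_file.md5sum) \
        >(wc -l >&2) < my_log_file | sort | uniq -c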
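For the original somefile example, a minimal sketch assuming bash (or ksh93) and GNU md5sum; the function name hash_and_gzip is hypothetical. It reads the file once and leaves somefile.md5sum and somefile.gz next to it, like the multipipe command did.

```shell
# hash_and_gzip FILE: hypothetical helper; writes FILE.md5sum and FILE.gz
# from a single read of FILE.
hash_and_gzip() {
    local f=$1
    # md5sum gets its copy of the stream through a /dev/fd/N path from tee;
    # tee's own stdout feeds gzip through an ordinary pipe.
    tee >(md5sum > "$f.md5sum") < "$f" | gzip > "$f.gz"
}
```

Usage: hash_and_gzip somefile. Because md5sum reads from a pipe, somefile.md5sum lists "-" instead of the file name. The shell also does not wait for the >(...) process, so the .md5sum file may appear a moment after the command returns.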

This will do all your heart's desires at once :) Note how the
>(subshell) notation allows you to do almost anything your shell
supports, including using aliases, functions and redirection, exactly
like you would in $(subshell) [1].
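For instance, a shell function can be called inside >(...) just as inside $(...). A minimal sketch assuming bash; count_lines and the temporary output file are hypothetical names for illustration.

```shell
count_lines() { wc -l | tr -d ' '; }   # hypothetical helper: prints the line count
out=$(mktemp)
# The function runs in the subshell behind >(...), with its output redirected.
printf 'a\nb\nc\n' | tee >(count_lines > "$out") > /dev/null
```

Once the background subshell finishes, "$out" contains 3; as before, the shell does not wait for it, so the file may be filled a moment after the command returns.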

Well, I'll stop here, because I'm sure 'man $0' in your favourite shell
(look for "Process Substitution") will tell you more, and more pertinent,
information without requiring quite so many keystrokes on my part.


Cheers,
Seth

[1] Beware that it _is_ a subshell, so you cannot update the parent
shell's variables from inside it, and certain things will not be
inherited from the parent shell (especially in security-restricted
environments).
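A minimal sketch of the variable caveat in [1], assuming bash (the variable name seen is arbitrary):

```shell
seen=no
# The assignment happens in the forked subshell behind >(...), not here.
echo data > >(seen=yes; cat > /dev/null)
echo "seen=$seen"   # prints: seen=no
```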

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
