On Tue, 9 May 2006, Nicolas Williams wrote:

> On Tue, May 09, 2006 at 01:33:33PM -0700, Darren Reed wrote:
> > Eric Schrock wrote:
> >
> > >...
> > >Asynchronous remote replication can be done today with 'zfs send' and
> > >'zfs receive', though it needs some more work to be truly useful.  It has
> > >the properties that it doesn't tax local activity, but your data will be
> > >slightly out of sync (depending on how often you sync your data;
> > >preferably every few minutes).
> > >
> >
> > Is it possible to add "tail -f"-like properties to 'zfs send'?
> >
> > I suppose what I'm thinking of for 'zfs send -f' would be to send
> > down all of the transactions that update a ZFS data set, both the
> > metadata and the data.
> >
> > The catch here would be to start the 'zfs send -f' at the same time
> > as the filesystem came online so that there weren't any transactional
> > gaps.
> >
> > Thoughts?
>
> +1
>
> Add to this some churn/replication throttling and you may not want just
> a command-line interface but a library also.
>
> E.g., if the stdout/remote connection of 'zfs send -f' blocked for a
> long time, or broke, then zfs should snapshot at the latest TXG and
> hold on to that snapshot until the output could drain and/or the
> connection be restored, then resume by sending the incremental from
> that snapshot up to the current TXG...
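
For reference, the asynchronous replication Eric describes boils down to
a snapshot/send loop along the following lines; the pool, dataset, and
host names below are made up for illustration:

    # initial full replication to the remote host
    zfs snapshot tank/home@rep1
    zfs send tank/home@rep1 | ssh remotehost zfs receive backup/home

    # some minutes later: send only what changed since @rep1
    zfs snapshot tank/home@rep2
    zfs send -i tank/home@rep1 tank/home@rep2 | \
        ssh remotehost zfs receive backup/home

A 'zfs send -f', as proposed above, would in effect run this loop
continuously inside ZFS itself.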

While I agree that zfs send is incredibly useful, after reading this post
two concerns come to mind:

a) This already sounds like we're descending the slippery slope of
'checkpointing', which is an incredibly hard problem to solve and
requires considerable hardware/software resources.  The only (arguably)
successful checkpointing implementation that I know of is the Burroughs
B7700 stack-based mainframe, where every process is a stack and
checkpointing consists of taking a snapshot of the stack that represents
the process and moving it to other (mirror) hardware.  Much of this is
implemented in hardware, to offset the excessively high "cost" of such
operations.

b) You can never successfully checkpoint an application via data
replication.  Why?  Because at some point you're trying to take a
snapshot of a process (or group of related processes) that modifies
multiple files containing inter-related data.  That is exactly what we
have relational databases for, with the concept of:

begin_transaction
    do blah op a
    do blah op b
    do blah op c
end_transaction

If anything goes wrong with operation a, b, or c, you want to back out
the entire transaction.  If remote data replication alone could solve
this problem, you would not need begin_transaction ... end_transaction
semantics, or to spend the $s on an RDBMS.

Or, stated in different terms: if remote replication resolved the issue
of maintaining application state, then one could simply replicate the
underlying files of an Oracle or MySQL database and be done with
application/site failover.  Buzzzzz ... loser.  Not possible.
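
To make that concrete, the naive approach would look something like this
(the paths are made up for illustration):

    # copy the live database's files to a mirror site
    cp /u01/oradata/*.dbf /net/mirror/oradata/

The copies may capture operation a but not operation b of an in-flight
transaction, so the replica is not a consistent database image, let
alone a recoverable application state.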

The real issue is: where do you draw the line?
And how do you manage user expectations when users are convinced that,
by mirroring the active filesystem, they have achieved site
diversity/failover?

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005