Yves> For a site I'm working on, we're looking at NAS async
Yves> replication across continents (latency > 100 ms). We've just
Yves> started looking at this, and are currently evaluating IBM SONAS,
Yves> HP Ibrix, and Isilon.

How much data are you looking at?  And how strict are your
requirements?  I.e., can a file change at both sites at the same time,
and if so, who wins the replication update battle?

Yves> The idea is:

Yves> * a file can be opened for writing on any of the NAS nodes.

Ouch, this is going to kill you, especially if a file can be opened at
each site at the same time.  It might be possible to use a cluster-based
filesystem instead, with per-file global locking.  But the overhead,
especially over 100+ ms latency, will probably be huge.
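To put a rough number on that overhead (a back-of-envelope sketch; the
two-round-trips-per-open figure is an assumption, and real cluster
filesystems may need more):

```python
# Back-of-envelope: cost of synchronous lock traffic over a high-latency WAN.
# Assumes each file open/close pair needs two lock round trips to the remote
# site (one to acquire, one to release) -- an assumption, not a measured
# figure for any particular product.

RTT_S = 0.100              # 100 ms round-trip time between continents
ROUND_TRIPS_PER_OPEN = 2   # assumed: acquire + release

def lock_overhead(opens_per_second: float) -> float:
    """Seconds of cumulative lock latency incurred per wall-clock second."""
    return opens_per_second * ROUND_TRIPS_PER_OPEN * RTT_S

# A modest workload of 50 file opens/second already accumulates 10 seconds
# of lock latency per second of work -- it can't keep up without massive
# parallelism.
print(lock_overhead(50))  # 10.0
```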

And how to handle WAN outages too...

Yves> * when a file is open for writing on one node, it is locked and
Yves> becomes read only on the other nodes (locking done by the NAS
Yves> device/filesystem, not the apps).

Key.  

Yves> * replication is done as the file gets written, not afterwards.

Umm, so what happens if I open a file for writing, truncate it, then
start writing new data, but the WAN goes down after the truncate and
before the write of the new data?  And the WAN stays down, and you need
to bring up the remote site in a standalone manner and make it the
master?
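One generic application-side mitigation (a sketch, not a feature of any
of the products mentioned): never truncate in place.  Write the new
contents to a temporary file and atomically rename it over the original,
so a whole-file replicator only ever sees the complete old file or the
complete new one:

```python
# Write-new-then-rename: the truncate-then-crash window disappears because
# the original file is replaced in a single atomic step.
import os
import tempfile

def atomic_write(path: str, data: bytes) -> None:
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)  # temp file on the same filesystem
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # data on disk before the rename
        os.replace(tmp, path)      # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp)             # clean up the partial temp file
        raise
```

Readers at either site see old-or-new, never a truncated file, though
this only helps applications you control, not arbitrary NFS clients.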

Yves> * once the file is closed on the writing node, and replication
Yves> is complete everywhere, the file is available for writing on all
Yves> the nodes again.

Yves> Does anybody have experience with something like this?

The only thing I can suggest is to use a cluster-aware filesystem,
which can be exported locally as NFS.  That *might* do the trick.  But
you might have to have all your nodes running the cluster filesystem.

We tried using Netapp's FlexCache product and it just didn't work out
for us.  This isn't quite the same thing, in that FlexCache has a
single writeable master and multiple read-only slaves.  The idea is
that the slaves only cache the content that is actually used
locally.

For us, with 20, 60 and 100ms latencies from the remote site,
performance just sucked rocks.  As did using Netapp's SnapVault
technology.  SnapMirror has been a much better performer, but having
someone create a 20Gb file will just swamp the network.

In my experience, none of the vendors are tuning their protocols for
fast, wide pipes, and the TCP bandwidth-delay product ends up killing
your performance.
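For concreteness (back-of-envelope arithmetic; reading the "20Gb"
above as 20 gigabytes, and assuming a 1 Gbit/s link as a round number):

```python
# Bandwidth-delay product: the TCP window needed to keep a fast, long pipe
# full.  With default window sizes far below this, a single stream crawls
# no matter how big the link is.

def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight to fill the pipe."""
    return bandwidth_bps / 8 * rtt_s

# A 1 Gbit/s link at 100 ms RTT needs ~12.5 MB of window to stay full:
print(bdp_bytes(1e9, 0.100) / 1e6)  # 12.5 (MB)

# Even with the window tuned, a 20 GB file takes at best ~160 seconds on
# that link -- and far longer with untuned defaults:
print(20e9 * 8 / 1e9)  # 160.0 (seconds)
```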

So I worry about people generating a single large update to a file,
which then locks the file at all the remote sites for hours or even
days.  

But hey, I'm not really in this space at all.  We gave up, beyond NFS
over the WAN for some stuff, and just put writeable stuff locally and
have users log in across the WAN to do their work at each site.  VNC and
NX are quite good for this, and it's a much easier data transport
problem to handle.

John

_______________________________________________
Discuss mailing list
Discuss@lopsa.org
http://lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
 http://lopsa.org/