> From: Doug Hughes [mailto:d...@will.to]
>  
>   I have tended to subscribe to the more cautionary advice still
> espoused by the ZFS Evil Tuning guide:
> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Disabling_the_ZIL_.28Don.27t.29
> 
> In particular:
> "Caution: Disabling the ZIL on an NFS server can lead to client side
> corruption. the ZFS pool integrity itself is not compromised by this
> tuning."

It's true.  In fact, I wrote a significant chunk of that guide.  ;-)  But
I'm always a fan of understanding the limits of the system, and choosing
your configurations and settings based on actual limitations and knowledge,
rather than following a rule of thumb.  A rule of thumb is always the "safe"
option you tell people who don't want to consider any alternative but the
conservative "Nobody ever got fired for choosing ______" choice.

It goes like this:  At all times, async writes are being buffered in RAM, no
matter what.  Only the sync writes hit the ZIL.  So if the system crashes
ungracefully while writes are in progress, you are going to lose your
in-flight async writes, no matter what.  Take it for granted: if you're in
the middle of writes when the server crashes, you're going to end up with
some data loss...  (Unless you make 100% of your writes sync, which is not
typical for an NFS server.)
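
To make the async/sync distinction concrete from the application side, here
is a minimal Python sketch.  The function names are made up for
illustration.  The only point is that a plain write returns once the data
has been handed to the OS, while fsync() asks for it to be committed to
stable storage before returning, and that second kind of request is what
the ZIL serves on the ZFS side.

```python
import os
import tempfile


def write_async(path, data):
    """Buffered write: returns once the OS has accepted the data.

    Nothing here waits for stable storage, so a crash shortly after
    this returns can still lose the data.
    """
    with open(path, "wb") as f:
        f.write(data)
    return len(data)


def write_sync(path, data):
    """Sync write: flush Python's buffer, then fsync() the descriptor.

    fsync() does not return until the OS reports the data as committed.
    Whether that report can be trusted across a server crash is exactly
    what disabling the ZIL changes.
    """
    with open(path, "wb") as f:
        f.write(data)
        f.flush()               # push Python's userspace buffer to the OS
        os.fsync(f.fileno())    # ask the OS to commit to stable storage
    return len(data)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    print(write_async(os.path.join(workdir, "async.bin"), b"buffered"))
    print(write_sync(os.path.join(workdir, "sync.bin"), b"committed"))
```

Both calls leave the same bytes in the file.  The difference is only in
what has been promised at the moment each function returns.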

No matter what, you will need to remount your clients before you can expect
them to be in a consistent state with the server.  And no matter what, your
clients will be consistent after remounting.

If the data on your NFS server represents people's work...  In every place
I've worked, I feel confident saying that users can accept something like a
15 second loss of their work in the event of a dramatic server crash,
because the server crashes so rarely.  Also acknowledged, something like 15
sec of async data will be lost unconditionally...  The only extra protection
the ZIL can possibly offer is for something like 15 sec of sync writes too.

In the above scenario, if you are willing to accept 15 sec of async data
loss, and you don't even know which applications are performing async vs
sync operations...  Then you're also willing to accept the 15 sec of
sync-mode data loss.  You can gain performance year round by disabling the
ZIL.

So why does the caution comment exist?  Because inevitably you're going to
have somebody using something like a database running over an NFS link.  Or
let's imagine a mail server spooling to the NFS server.  Or somebody is
processing credit card transactions.  You don't want to acknowledge that the
transaction has been finalized, and 15 seconds later forget about it.
The caution applies to anything where the NFS server is the backend data
store for some transactional service which cannot be interrupted and relies
on sync mode writes for consistency, and to any application which cannot
itself survive an ungraceful crash.  Then you need to honor the sync.  These
are not the types of data that are typically stored on NFS servers.  But
they could be, and surely are somewhere.  So the cautionary comment remains
intact...
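
As an illustration of the kind of application that caution is about, here
is a rough Python sketch of a transactional service that acknowledges a
record only after it has been fsync'ed.  The Journal class and its methods
are hypothetical, not taken from any real product.  If the journal lived on
an NFS mount whose server had the ZIL disabled, commit() could return True
and the record could still vanish in a server crash a few seconds later.

```python
import os
import tempfile


class Journal:
    """Hypothetical append-only journal: acknowledge only after fsync()."""

    def __init__(self, path):
        self.path = path

    def commit(self, record):
        """Append one record and return True once it is reported durable."""
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(record + "\n")
            f.flush()
            # The application trusts this call.  The acknowledgement below
            # is only as good as the storage stack's handling of sync.
            os.fsync(f.fileno())
        return True

    def recover(self):
        """Re-read every record in the journal, e.g. after a restart."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [line.rstrip("\n") for line in f]


if __name__ == "__main__":
    journal = Journal(os.path.join(tempfile.mkdtemp(), "txn.log"))
    if journal.commit("charge card, order 1"):
        print("acknowledged to customer")
    print(journal.recover())
```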

So you always have to weigh your own options and make your own decisions...
On the one hand, increase the risk of data loss from 15 sec of async to 15
sec of async and sync...  Weigh that risk against the potential gain on the
other hand...  Something like twice the performance 364.5 days a year,
depending on how you measure your performance...  Apply your own personal
beliefs in probabilities and expectations and value of your workload.
Everyone has their own opinions and philosophies.  There is no single right
answer.
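
If it helps to put numbers on that tradeoff, here is a back-of-the-envelope
Python sketch.  Everything in it is an assumption to be replaced with your
own estimates: the crash rate, the 15 sec window, and the fraction of
writes that are sync.  It also assumes writes are spread evenly over the
window, which is a simplification.

```python
def expected_loss(crashes_per_year, window_s, sync_fraction, zil_enabled):
    """Expected seconds' worth of written data lost per year.

    All inputs are illustrative guesses, not measurements.
    """
    if not 0.0 <= sync_fraction <= 1.0:
        raise ValueError("sync_fraction must be between 0 and 1")
    # With the ZIL honored, only the async share of the window is exposed.
    # With it disabled, the whole window (async and sync) is exposed.
    exposed = (1.0 - sync_fraction) if zil_enabled else 1.0
    return crashes_per_year * window_s * exposed


def added_risk(crashes_per_year, window_s, sync_fraction):
    """Extra expected loss per year from disabling the ZIL."""
    return (expected_loss(crashes_per_year, window_s, sync_fraction, False)
            - expected_loss(crashes_per_year, window_s, sync_fraction, True))


if __name__ == "__main__":
    # Assumed: one ungraceful crash a year, 15 sec window, 25% sync writes.
    print(expected_loss(1, 15.0, 0.25, zil_enabled=True))
    print(expected_loss(1, 15.0, 0.25, zil_enabled=False))
    print(added_risk(1, 15.0, 0.25))
```

The added_risk() figure is the delta to weigh against whatever performance
gain you actually measure on your own workload.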

Ultimately, the safer option to tell people who are uncomfortable making
that judgement is...  Don't disable the ZIL.  But I personally, almost
systematically, do disable the ZIL.  Because I like having optimum
performance and minimal cost, and in essentially all situations that I
encounter, the performance benefit outweighs the actual delta of risk.

_______________________________________________
Tech mailing list
Tech@lists.lopsa.org
https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
 http://lopsa.org/
