On Aug 5, 2010, at 2:24 PM, Roch Bourbonnais <roch.bourbonn...@sun.com> wrote:
>
> On 5 August 2010 at 19:49, Ross Walker wrote:
>
>> On Aug 5, 2010, at 11:10 AM, Roch <roch.bourbonn...@sun.com> wrote:
>>
>>> Ross Walker writes:
>>>> On Aug 4, 2010, at 12:04 PM, Roch <roch.bourbonn...@sun.com> wrote:
>>>>
>>>>> Ross Walker writes:
>>>>>> On Aug 4, 2010, at 9:20 AM, Roch <roch.bourbonn...@sun.com> wrote:
>>>>>>
>>>>>>> Ross asks:
>>>>>>> So on that note, ZFS should disable the disks' write cache,
>>>>>>> not enable it, despite ZFS's COW properties, because it
>>>>>>> should be resilient.
>>>>>>>
>>>>>>> No, because ZFS builds resiliency on top of unreliable parts. It is
>>>>>>> able to deal with contained failures (lost state) of the disk write
>>>>>>> cache.
>>>>>>>
>>>>>>> It can then export LUNs that have the WC enabled or
>>>>>>> disabled. But if we enable the WC on the exported LUNs, then
>>>>>>> the consumer of those LUNs must be able to do the same.
>>>>>>> The discussion at that level then needs to focus on failure groups.
>>>>>>>
>>>>>>> Ross also said:
>>>>>>> I asked this question earlier, but got no answer: while an
>>>>>>> iSCSI target is presented WCE, does it respect the flush
>>>>>>> command?
>>>>>>>
>>>>>>> Yes. I would like to say "obviously", but it's been anything
>>>>>>> but.
>>>>>>
>>>>>> Sorry to probe further, but can you expand on that "but"?
>>>>>>
>>>>>> If we had a bunch of zvols exported via iSCSI to another Solaris
>>>>>> box, which used them to form another zpool, and WCE was turned on,
>>>>>> would it be reliable?
>>>>>
>>>>> Nope. That's because all the iSCSI LUNs are in the same fault
>>>>> domain, as they share a unified back-end cache. What works,
>>>>> in principle, is mirroring SCSI channels hosted on
>>>>> different storage controllers (or N SCSI channels on N
>>>>> controllers in a RAID group).
>>>>>
>>>>> Which is why keeping the WC at its default is really
>>>>> better in general.
>>>> Well, I was actually talking about two back-end Solaris storage
>>>> servers serving up storage over iSCSI to a front-end Solaris server
>>>> serving ZFS over NFS. So I have redundancy there, but I want the
>>>> storage to be performant, so I want the iSCSI targets to have WCE,
>>>> yet I want them to be reliable and to honor cache flush requests
>>>> from the front-end NFS server.
>>>>
>>>> Does that make sense? Is it possible?
>>>
>>> Well, in response to a commit (say, after a file creation), the
>>> front-end server will end up sending flush-write-cache commands down
>>> both sides of the iSCSI mirror; these will reach the back-end
>>> servers, which will flush the disk write caches. This will all work,
>>> but probably not unleash performance the way you would like it to.
>>>
>>> If you set up the back-end servers not to honor the
>>> disk flush-write-cache requests, then the two back-end pools are at
>>> risk of corruption, mostly because of the ordering of I/Os
>>> around the ueberblock updates. If you have faith, you
>>> could consider that you won't hit corruption of both back-end
>>> pools at once, and rely on the front-end resilvering to rebuild the
>>> corrupted back end.
>>
>> So you are saying setting WCE disables cache flush on the target, and
>> setting WCD forces a flush for every WRITE?
>
> Nope. Setting the WC either way has no implication on the response to a
> flush request. We flush the cache in response to a request to do so,
> unless one sets the unsupported zfs_nocacheflush tunable; if that is
> set, the pool is at risk.
>
>> How about a way to enable WCE on the target, yet still perform a cache
>> flush when the initiator requests one, like a real SCSI target should
>> do? Or is that just not possible with zvols today?
>
> I hope I've cleared that up. I'm not sure what I said that implied
> otherwise.
>
> But if you honor the flush-write-cache request all the way down to the
> disk device, then 1, 2 or 3 layers of ZFS won't make a dent in the
> performance of an NFS "tar x".
> Only a device that accepts low-latency writes and survives a power
> outage can do that.

Understood, and thanks for the clarification. If the NFS synchronicity
has too much of a negative impact, that can be alleviated with an SSD or
NVRAM slog device on the head server.

-Ross

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
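[Archive note] The knobs discussed in this thread map to concrete commands on an OpenSolaris/COMSTAR box. A minimal administrative sketch, not a recipe: the pool name `tank` and the device name `c4t1d0` are illustrative placeholders; `stmfadm list-lu -v` (which reports the COMSTAR logical unit's write-cache-disable state) and `zpool add ... log` are the real commands, everything else is an assumption about your setup:

```shell
# Inspect a COMSTAR logical unit backed by a zvol. In the verbose
# output, "Writeback Cache" reports whether the LU advertises WCE to
# initiators; as discussed above, SYNCHRONIZE CACHE requests are
# honored regardless of this setting (unless the unsupported
# zfs_nocacheflush tunable is set -- leave it alone).
stmfadm list-lu -v

# Rather than disabling cache flushes, absorb NFS synchronous-write
# latency with a dedicated slog device on the head server
# (device name is illustrative):
zpool add tank log c4t1d0
zpool status tank
```

These commands require real ZFS/COMSTAR infrastructure and root privileges, so treat them as a configuration fragment to adapt, not run verbatim.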