Kristof,

> Jim,
>
> Yes, in step 5 commands were executed on both nodes.
>
> We did some more tests with opensolaris 2008.11 (build 101b).
>
> We managed to get AVS setup up and running, but we noticed that
> performance was really bad.
>
> When we configured a zfs volume for replication, we noticed that
> write performance went down from 50 MB/s to 5 MB/sec.
SNDR replication has three modes of operation, and I/O performance varies
quite differently for each one. They are:

1). Logging mode - As primary volume write I/Os occur, the bitmap volume is
used to scoreboard unreplicated write I/Os, at which time the write I/O
completes.

2). Resynchronization mode - A resynchronization thread traverses the
scoreboard, in block order, replicating write I/Os for each bit set.
Concurrently, as primary volume write I/Os occur, the bitmap volume is used
to scoreboard unreplicated write I/Os. Write I/Os that occur (block-order
wise) after the resynchronization point complete immediately; write I/Os
that occur before the resynchronization point must be synchronously
replicated in place. At the start of resynchronization, almost all write
I/Os complete quickly, as they occur after the resynchronization point. As
resynchronization nears completion, almost all write I/Os complete slowly,
as they occur before the resynchronization point. When the resynchronization
point reaches the end of the scoreboard, the SNDR primary and secondary
volumes are 100% identical and write-order consistent, and asynchronous
replication begins.

3). Replication mode - Primary volume write I/Os are queued to SNDR's memory
queue (or an optionally configured disk queue) and scoreboarded for
replication, at which time the write I/O completes. In the background,
multiple asynchronous flusher threads dequeue unreplicated I/Os from SNDR's
memory or disk queue and replicate them to the secondary node.

On configurations with ample system resources, write performance for both
logging mode and replication mode should be nearly identical. The duration
that a replica spends in resynchronization mode is influenced by the amount
of write I/O that occurred while the replica was in logging mode, the amount
of primary volume write I/O while resynchronization is active, the network
bandwidth and latency between primary and secondary nodes, and the I/O
performance of the remote node's secondary volume.

First-time synchronization, done after the SNDR enable "sndradm -e ...", is
identical to resynchronization, except that the bitmap volume is
intentionally set to ALL ones, forcing every block to be replicated from
primary to secondary. Now, if one configures replication before the initial
"zpool create", the SNDR primary and secondary volumes both contain
uninitialized data and thus can be considered equal, so no synchronization
is needed. This is accomplished by using the "sndradm -E ..." option, which
sets the bitmap volume to ALL zeros. This means that the switch from logging
mode to replication mode is nearly instant.

If one has a ZFS storage pool, plus available storage that can be
provisioned as zpool replacement volumes, these replacement volumes can be
enabled first with "sndradm -E ...". Then, when the "zpool replace ..."
command is invoked, the write I/Os issued by ZFS to populate the replacement
volume cause SNDR to replicate only those write I/Os. This operation is done
under SNDR's replication mode, not synchronization mode, and it is also a
ZFS background operation. Once the zpool replace is complete, the previously
used storage can be reclaimed. (A minimal command sketch of both "-E"
approaches follows at the end of this message.)

> A few notes about our test setup:
>
> * Since replication is configured in logging mode, there is zero
> network traffic
> * Since rdc_bitmap_mode has been configured for memory, and even
> more, since the bitmap device is a ramdisk.
> Any data IO on the replicated volume results only in a single memory
> bit flip (per 32k of disk space)
> * This setup is the bare minimum in the sense that the kernel driver
> only hooks disk writes and flips a bit in memory; it cannot go any
> faster!

Was the following 'test' run during resynchronization mode or replication
mode?

> The Test
>
> * All tests were performed using the following command line
> # dd if=/dev/zero of=/dev/zvol/rdsk/gold/xxVolNamexx oflag=dsync
> bs=256M count=10
>
> * Option 'dsync' is chosen to try avoiding zfs's aggressive caching.
> Moreover, usually a couple of runs were launched initially to fill
> the instant zfs cache and to force real writing to disk.
> * Option 'bs=256M' was used in order to avoid the overhead of
> copying multiple small blocks to kernel memory before disk writes. A
> larger bs size ensures max throughput. Smaller values were used
> without much difference though.
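For reference, here is a minimal sketch of the two "-E" sequences described
above. The host names, device paths, bitmap volumes, and pool name below are
illustrative placeholders (not taken from your configuration), and as with
your step 5, the SNDR enables would be issued on both nodes.

Enabling the set with "-E" before the pool exists, so that no initial
synchronization is needed:

# sndradm -E host1 /dev/rdsk/c1t1d0s0 /dev/rdsk/c1t1d0s1 \
    host2 /dev/rdsk/c1t1d0s0 /dev/rdsk/c1t1d0s1 ip async
# zpool create tank c1t1d0s0

For an existing, unreplicated pool, enabling a replacement volume with "-E"
first and then letting ZFS populate it in the background:

# sndradm -E host1 /dev/rdsk/c1t2d0s0 /dev/rdsk/c1t2d0s1 \
    host2 /dev/rdsk/c1t2d0s0 /dev/rdsk/c1t2d0s1 ip async
# zpool replace tank c1t1d0s0 c1t2d0s0

Once the resilver completes, ZFS detaches the old device and its storage can
be reclaimed.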