On Nov 20, 2009, at 11:27 AM, Adam Serediuk wrote:
> I have several X4540 Thor systems, each with one large zpool, that
> replicate data to a backup host via zfs send/recv. The process works
> quite well when there is little to no load on the source systems, but
> under load replication slows to a near crawl: unloaded, the
> replication stream runs at close to 1 Gbit/s, but it drops to
> anywhere between 0 and 5000 Kbit/s under load.
>
> This makes it difficult to keep snapshot replication working
> effectively. It appears that the zfs send operation runs at low
> priority, only making progress after other I/O has completed. Is
> there a way to raise the priority of the send to speed up
> replication?
No, unless you compile the code yourself.
> Both the source and destination systems are configured as one large
> zpool comprised of 8 raidz sets. While under load the source system
> does ~500 - 950 IOPS (from zpool iostat) with no apparent hot spots.
> It seems to me that the system should be able to perform much faster.
> Unfortunately the data on these systems takes the form of hundreds of
> millions (maybe even a billion by now) of very small files; could
> this be a factor even though the replication occurs at the block
> level?
>
> The process is currently:
> zfs send -> mbuffer -> LAN -> mbuffer -> zfs recv
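A concrete form of that pipeline might look like the following sketch. The pool, snapshot, host, and port names, and the buffer sizes, are all hypothetical; adjust to your environment.

```shell
# Receiver side: listen on a port, buffer the stream, feed zfs recv.
# (Run this first. 1 GB buffer and 128k blocks are illustrative.)
mbuffer -I 9090 -s 128k -m 1G | zfs recv -F backup/tank

# Sender side: stream an incremental send through mbuffer over the LAN.
zfs send -i tank@snap1 tank@snap2 | mbuffer -s 128k -m 1G -O backuphost:9090
```

The large memory buffer on each end helps absorb the bursty output of zfs send so the network link stays busy even when the source pool is slow to read.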
I've done some work on such things. The difficult design question is
figuring out how often to do the send. You will want to balance your
send interval against the write rate, such that the data to be sent is
still likely to be in the ARC. There is no magic formula, but you can
discover a reasonable interval empirically.
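As a sketch of such a schedule: the loop below snapshots and sends on a fixed interval, where INTERVAL is the empirical knob to tune against your write rate. The pool, host, and snapshot names are hypothetical, and it assumes an initial full send has already established a common base snapshot.

```shell
#!/bin/sh
# Sketch: take a snapshot and send the increment on a fixed interval.
# INTERVAL is the tunable: short enough that the blocks being sent are
# still resident in the ARC when the send runs.
INTERVAL=300                       # seconds between sends (tune this)
POOL=tank
HOST=backuphost
PREV="$POOL@repl-base"             # assumes an initial full send was done
while true; do
  NOW="$POOL@repl-$(date +%Y%m%d%H%M%S)"
  zfs snapshot "$NOW"
  zfs send -i "$PREV" "$NOW" | mbuffer -s 128k -m 1G -O "$HOST:9090"
  zfs destroy "$PREV"              # keep only the latest common snapshot
  PREV="$NOW"
  sleep "$INTERVAL"
done
```

Start with a generous interval, watch how the send throughput changes as you shorten it, and stop when further shortening no longer helps.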
There is a lurking RFE here somewhere: it would be nice to automatically
snapshot when some threshold of writes has occurred.
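In the absence of that RFE, the idea can be approximated in userland: poll the pool's write bandwidth and snapshot once a running total crosses a threshold. The pool name and the 10 GB threshold below are hypothetical, and the sketch assumes a zpool iostat with a parseable (-p) output mode, which current OpenZFS provides.

```shell
#!/bin/sh
# Sketch: snapshot after a threshold of writes has accumulated.
POOL=tank
THRESHOLD=$((10 * 1024 * 1024 * 1024))   # bytes written between snapshots
written=0
while true; do
  # One 10-second sample; the pool line's 7th field is write bandwidth
  # in bytes/s with -p. The second report reflects the sample interval.
  bw=$(zpool iostat -p "$POOL" 10 2 | awk 'END { print $7 }')
  written=$((written + bw * 10))
  if [ "$written" -ge "$THRESHOLD" ]; then
    zfs snapshot "$POOL@auto-$(date +%s)"
    written=0
  fi
done
```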
P.S. If you have atime enabled, which is the default, handling billions
of files will be quite a challenge.
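With that many files, every read generates an atime update, which is extra write traffic the send stream must also carry. If nothing depends on access times, one property change turns it off; the pool name "tank" here is hypothetical.

```shell
# Disable atime updates on the pool; child datasets inherit the
# setting unless they override it.
zfs set atime=off tank

# Verify the setting and where it comes from.
zfs get atime tank
```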
-- richard
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss