You have SSDs for the ZIL (logzilla) enabled, and ZIL I/O is still what is hurting your performance... Hmmm...
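Before digging into dtrace, one quick sanity check you can run while the NFS load is hitting the box (a sketch; substitute your pool name for "tank"): watch the log device's write throughput with

    zpool iostat -v tank 1

and see whether the slog (under the "logs" section of the output) is pinned near the SSD's rated write bandwidth. If it is, the logzilla is the bottleneck no matter what else you tune.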
I'll ask the stupid question (just to get it out of the way): is it possible that the logzilla is undersized? Did you gather data using Richard Elling's zilstat (included below)?

Thanks,
/jim

#! /usr/bin/ksh -p
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
# Portions Copyright 2009 Sun Microsystems, Inc.
#
# File: zilstat.d
# Author: richard.ell...@sun.com
#
# This dtrace program will help identify the ZIL activity by sampling
# writes sent to the ZIL.
# output:
#    [TIME]
#    BYTES - total bytes written to ZIL over the interval
#    BYTES/S - bytes/s written to ZIL over the interval
#    MAX-BYTES/S - maximum rate during any 1-second sample
#
##############################
# --- Process Arguments ---
#
# TODO: clean up args

### default variables
opt_pool=0
opt_time=0
filter=0
pool=
lines=-1
interval=1
count=-1

### process options
while getopts hl:p:t name
do
        case $name in
        l)      lines=$OPTARG ;;
        p)      opt_pool=1; pool=$OPTARG ;;
        t)      opt_time=1 ;;
        h|?)    ME=$(basename $0)
                cat <<-END >&2
USAGE: $ME [-t] [-l linecount] [-p poolname] [interval [count]]
    -t              # print timestamp
    -l linecount    # print header every linecount lines (default=only once)
    -p poolname     # only look at poolname
examples:
    $ME             # default output, 1 second samples
    $ME 10          # 10 second samples
    $ME 10 6        # print 6 x 10 second samples
    $ME -p rpool    # show ZIL stats for rpool only
output:
    [TIME]
    BYTES - total bytes written to ZIL over the interval
    BYTES/S - bytes/s written to ZIL over the interval
    MAX-BYTES/S - maximum rate during any 1-second sample
END
                exit 1
        esac
done
shift $(( $OPTIND - 1 ))

### option logic
if [[ "$1" > 0 ]]; then
        interval=$1; shift
fi
if [[ "$1" > 0 ]]; then
        count=$1; shift
fi
if (( opt_pool )); then
        filter=1
fi

##############################
# --- Main Program, DTrace ---
/usr/sbin/dtrace -n '
#pragma D option quiet
inline int OPT_time = '$opt_time';
inline int OPT_pool = '$opt_pool';
inline int INTERVAL = '$interval';
inline int LINES = '$lines';
inline int COUNTER = '$count';
inline int FILTER = '$filter';
inline string POOL = "'$pool'";

dtrace:::BEGIN
{
    /* starting values */
    counts = COUNTER;
    secs = INTERVAL;
    line = 0;
    last_event[""] = 0;
    nused = 0;
    max_per_sec = 0;
    nused_per_sec = 0;
}

/*
 * collect info when zil_lwb_write_start fires
 */
fbt::zil_lwb_write_start:entry
/OPT_pool == 0 || POOL == args[0]->zl_dmu_pool->dp_spa->spa_name/
{
    nused += args[1]->lwb_nused;
    nused_per_sec += args[1]->lwb_nused;
}

/*
 * Timer
 */
profile:::tick-1sec
{
    secs--;
    nused_per_sec > max_per_sec ? max_per_sec = nused_per_sec : 1;
    nused_per_sec = 0;
}

/*
 * Print header
 */
profile:::tick-1sec
/line == 0/
{
    /* print optional headers */
    OPT_time ? printf("%-20s ", "TIME") : 1;

    /* print header */
    printf("%10s %10s %10s\n", "BYTES", "BYTES/S", "MAX-BYTES/S");
    line = LINES;
}

/*
 * Print Output
 */
profile:::tick-1sec
/secs == 0/
{
    OPT_time ? printf("%-20Y ", walltimestamp) : 1;
    printf("%10d %10d %10d\n", nused, nused/INTERVAL, max_per_sec);
    nused = 0;
    nused_per_sec = 0;
    max_per_sec = 0;
    secs = INTERVAL;
    counts--;
    line--;
}

/*
 * End of program
 */
profile:::tick-1sec
/counts == 0/
{
    exit(0);
}
'
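For example, assuming you've saved the script as zilstat and made it executable, this prints six 10-second samples with timestamps:

    ./zilstat -t 10 6

Run it while your NFS clients are writing and compare BYTES/S against what the slog device can actually sustain.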
printf("%-20s ", "TIME") : 1; /* print header */ printf("%10s %10s %10s\n", "BYTES", "BYTES/S", "MAX-BYTES/S"); line = LINES; } /* * Print Output */ profile:::tick-1sec /secs == 0/ { OPT_time ? printf("%-20Y ", walltimestamp) : 1; printf("%10d %10d %10d\n", nused, nused/INTERVAL, max_per_sec); nused = 0; nused_per_sec = 0; max_per_sec = 0; secs = INTERVAL; counts--; line--; } /* * End of program */ profile:::tick-1sec /counts == 0/ { exit(0); } ' Greg Mason wrote: >> If there was a latency issue, we would see such a problem with our >> existing file server as well, which we do not. We'd also have much >> greater problems than just file server performance. >> >> So, like I've said, we've ruled out the network as an issue. > > I should also add that I've tested these Thors with the ZIL disabled, > and they scream! With the cache flush disabled, they also do quite well. > > The specific issue i'm trying to solve is the ZIL being slow when > using NFS. > > I really don't want to have to do something drastic like disabling the > ZIL to get the performance I need... _______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss