Hi Ivan,

On Jan 26, 2012, at 8:25 PM, Ivan Rodriguez wrote:

> Dear fellows,
> 
> We have a backup server with a 20 TB zpool. We transfer data to it
> using zfs snapshots every day (there are around 300 file systems in
> that pool). The storage is a Dell MD3000i connected by iSCSI, and the
> pool is currently version 10. The same storage is connected to another
> server with a smaller 3 TB pool (also zpool version 10); that server
> is working fine and speed between it and the storage is good. On the
> server with the 20 TB pool, however, performance is a problem: after
> we restart the server performance is good, but over time, say a week,
> it keeps dropping until we have to bounce the server again. (We see
> the same behavior with a newer version of Solaris, except that
> performance drops within 2 days.) There are no errors in the logs, on
> the storage, or in zpool status -v.
> 
> We suspect that the pool has some issues, probably corruption
> somewhere. We tested Solaris 10 8/11 with zpool 29,
> although we haven't updated the pool itself. With the new Solaris the
> performance is even worse, and every time

If you upgrade to zpool version 29 or later, then you will be tied to the
lawnmower (Oracle) forever. Several changes related to snapshot performance
were introduced in version 28 and earlier.

> that we restart the server we get stuff like this:
> 
> SOURCE: zfs-diagnosis, REV: 1.0
> EVENT-ID: 0168621d-3f61-c1fc-bc73-c50efaa836f4
> DESC: All faults associated with an event id have been addressed.
> Refer to http://sun.com/msg/FMD-8000-4M for more information.
> AUTO-RESPONSE: Some system components offlined because of the
> original fault may have been brought back online.
> IMPACT: Performance degradation of the system due to the original
> fault may have been recovered.
> REC-ACTION: Use fmdump -v -u <EVENT-ID> to identify the repaired components.
> [ID 377184 daemon.notice] SUNW-MSG-ID: FMD-8000-6U, TYPE: Resolved,
> VER: 1, SEVERITY: Minor
> 
> And we need to export and import the pool in order to be able to access it.

The MD3000i systems that I have used have an irritating behavior when the LUNs
are scanned (e.g., during zpool import). There is an out-of-band systems
management LUN that takes up to 1 minute to respond to a SCSI inquiry. During
a zpool import, Solaris tries to inquire each of the LUNs to see if they
contain pool parts. Depending on the various timeout values set in the iSCSI
client stack, this can be painful. I am not aware of a workaround or bug fix
on the Dell side, and the Dell docs just say "don't use that LUN".
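
To put rough numbers on why the timeouts matter, here is a toy model in
Python. It assumes the LUNs are probed one after another and that a LUN slower
than the timeout fails every attempt; both are simplifications I have not
verified against the Solaris iSCSI initiator, and the timeout, retry count,
and LUN counts below are made up for illustration.

```python
def scan_delay(lun_response_s, timeout_s, attempts):
    """Estimate total seconds spent probing LUNs one after another.

    lun_response_s: per-LUN time to answer a SCSI inquiry (seconds)
    timeout_s:      hypothetical per-attempt timeout in the client stack
    attempts:       hypothetical number of tries before giving up on a LUN
    """
    total = 0.0
    for response in lun_response_s:
        if response <= timeout_s:
            total += response              # answers within the first attempt
        else:
            total += timeout_s * attempts  # every attempt runs into the timeout
    return total

# Hypothetical setup: 8 data LUNs answering in 50 ms, one management LUN at 60 s.
luns = [0.05] * 8 + [60.0]
print(scan_delay(luns, timeout_s=120, attempts=3))  # timeout longer than the slow LUN
print(scan_delay(luns, timeout_s=30, attempts=3))   # timeout shorter than the slow LUN
```

In this model, once the timeout is shorter than the slow LUN's response time,
the retries rather than the LUN itself dominate the scan.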

> 
> Now my question: do you know whether upgrading the pool fixes any
> issues in the pool's metadata?
> We've been holding back the upgrade because we know that after the
> upgrade there is no way to return to version 10.

To remain more flexible, avoid zpool version 29 or later.
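
If you do decide to upgrade, one way to stay at or below version 28 is to
pass an explicit target with `zpool upgrade -V` instead of letting the upgrade
default to the newest version the OS supports. A minimal sketch in Python,
assuming a pool named `backup` (a placeholder, not taken from your mail) and a
zpool binary that accepts `-V`; it only prints the commands unless you call
`run(..., execute=True)`:

```python
import subprocess

MAX_PORTABLE_VERSION = 28  # per the advice above: avoid version 29 or later

def upgrade_commands(pool, target=MAX_PORTABLE_VERSION):
    """Build the zpool commands to inspect `pool` and upgrade it to `target`."""
    if target > MAX_PORTABLE_VERSION:
        raise ValueError("target version %d is above %d" % (target, MAX_PORTABLE_VERSION))
    return [
        ["zpool", "get", "version", pool],              # current pool version
        ["zpool", "upgrade", "-v"],                     # versions this OS supports
        ["zpool", "upgrade", "-V", str(target), pool],  # one-way upgrade to target
    ]

def run(commands, execute=False):
    """Print each command; run it only when execute=True."""
    for argv in commands:
        print(" ".join(argv))
        if execute:
            subprocess.check_call(argv)

if __name__ == "__main__":
    run(upgrade_commands("backup"))  # dry run: prints, changes nothing
```

The upgrade itself cannot be undone, so the dry run is the default here.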

> 
> Has anybody experienced corruption in the pool without a hardware
> failure?

Yes, but I don't think that is your current problem.

> Are there any tools or procedures to find corruption in the pool or
> in the file systems inside it (besides scrub)?

scrub is the method.
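
For reference, a scrub is started with `zpool scrub`, and its progress and
any files with unrecoverable errors are reported by `zpool status -v`. A
minimal sketch in Python, assuming a placeholder pool name `backup`; it prints
the commands and only runs them when called with `execute=True`:

```python
import subprocess

def scrub_commands(pool):
    """Build the zpool commands to start a scrub of `pool` and check on it."""
    return [
        ["zpool", "scrub", pool],         # starts the scrub in the background
        ["zpool", "status", "-v", pool],  # progress and any damaged files
    ]

def run(commands, execute=False):
    """Print each command; run it only when execute=True."""
    for argv in commands:
        print(" ".join(argv))
        if execute:
            subprocess.check_call(argv)

if __name__ == "__main__":
    run(scrub_commands("backup"))  # dry run: prints the two commands
```

On a 20 TB pool behind iSCSI the scrub may take a long time; `zpool scrub -s`
stops one that is in progress.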

> 
> So far we have gone through the connections (cables, ports and
> controllers) between the storage and the server, and everything seems
> fine. We've swapped network interfaces, cables, switch ports, etc.
> 
> 
> Any ideas would be really appreciated.

HTH,
 -- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422



_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
