[zfs-discuss] Replacing root pool disk

2012-04-12 Thread Peter Wood
Hi,

I was following the instructions in the ZFS Troubleshooting Guide on how to
replace a disk in the root pool on an x86 system. I'm using OpenIndiana, ZFS
pool v.28, with a mirrored system rpool. The replacement disk is brand new.

root:~# zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
  scan: resilvered 17.6M in 0h0m with 0 errors on Wed Apr 11 17:45:16 2012
config:

NAME                         STATE     READ WRITE CKSUM
rpool                        DEGRADED     0     0     0
  mirror-0                   DEGRADED     0     0     0
    c2t5000CCA369C55DB8d0s0  OFFLINE      0   126     0
    c2t5000CCA369D5231Cd0s0  ONLINE       0     0     0

errors: No known data errors
root:~#

I'm not very familiar with Solaris partitions and slices, so somewhere in
the format/partition commands I must have made a mistake, because when I
try to replace the disk I'm getting the following error:

root:~# zpool replace rpool c2t5000CCA369C55DB8d0s0 c2t5000CCA369C89636d0s0
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c2t5000CCA369C89636d0s0 overlaps with
/dev/dsk/c2t5000CCA369C89636d0s2
root:~#

I used -f and it worked, but I was wondering: is there a way to completely
"reset" the new disk, removing all partitions and starting from scratch?

Thank you
Peter
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Replacing root pool disk

2012-04-12 Thread Peter Wood
Thank you all for the replies. I'll try the suggested solutions.

--
Peter

On Thu, Apr 12, 2012 at 2:06 PM, Roberto Waltman  wrote:

> Cindy Swearingen wrote:
>
>>
>> We don't yet have an easy way to clear a disk label, ...
>>
>
> dd if=/dev/zero of=...  on the 1st and last 10% (roughly) of the disk has
> worked fine for me.
>
> --
> Roberto Waltman
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
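
For reference, a rough sketch of the dd approach Roberto describes, assuming
the new disk is the c2t5000CCA369C89636d0 from the error above (this destroys
whatever label is on it, so double-check the device name first; p0 addresses
the whole disk on x86, and the tail offset is a placeholder to be worked out
from the disk size with prtvtoc or format):

  # zero the start of the disk (VTOC/EFI label, partition table)
  dd if=/dev/zero of=/dev/rdsk/c2t5000CCA369C89636d0p0 bs=1024k count=100
  # zero the end of the disk (a backup EFI label can live there); seek is in
  # bs-sized (1MB) units and must be computed from the disk size
  dd if=/dev/zero of=/dev/rdsk/c2t5000CCA369C89636d0p0 bs=1024k count=100 \
      seek=<MB-offset-near-end-of-disk>
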
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Heavy write IO for no apparent reason

2013-01-17 Thread Peter Wood
I have a script that rotates hourly, daily and monthly snapshots. Each
filesystem has about 40 snapshots (zfsList.png - output of 'zfs list | grep
-v home/'; the home directory datasets, 4 users in total, are snipped from
the output).

I noticed that the hourly snapshots on the most heavily used filesystem are
about 1.2GB in size, whereas on the other system the regular NFS-exported
filesystem has snapshots of about 60MB (gallerySnapshots.png - output of
'zfs list -t snapshot -r pool01/utils/gallery').
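
If it helps to quantify it, a quick sketch (using the dataset names above)
for seeing where the snapshot space is going:

  # per-snapshot exclusive space and creation time
  zfs list -t snapshot -o name,used,refer,creation -r pool01/utils/gallery
  # how much of the dataset's space is held by snapshots vs. live data
  zfs get usedbysnapshots,usedbydataset pool01/utils/gallery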

I know that the gallery FS is in heavier use than normal, but I was told it
would be mostly reads, and based on the iostat output it seems that there is
heavy writing too.

I guess I'll schedule some downtime, disable the gallery export, and see if
that affects the number of write operations and performance in general.

Unless there is some other way to find out what these write operations are
and where they are going.
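
One option, sketched on the assumption that DTrace is usable on the box (run
as root; this only counts write(2) calls, so mmap'ed writes won't show up),
is to attribute the written bytes to processes:

  # sum bytes passed to write(2) per process, printed every 10 seconds
  dtrace -n '
    syscall::write:entry { @bytes[execname] = sum(arg2); }
    tick-10s { printa("%-20s %@d bytes\n", @bytes); trunc(@bytes); }
  '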

The 'zpool iostat -v' output is uncomfortably static. The values of
read/write operations and bandwidth are the same for hours and even days.
I'd expect at least some variations between morning and night. The load on
the servers is different for sure. Any input?

Thanks,

-- Peter


On Wed, Jan 16, 2013 at 7:49 PM, Bob Friesenhahn <
bfrie...@simple.dallas.tx.us> wrote:

> On Wed, 16 Jan 2013, Peter Wood wrote:
>
>>
>> Running zpool iostat -v (attachment zpool-IOStat.png) shows 1.22K write
>> operations on the drives and 661 on the ZIL. Compared to the other server
>> (which is in way heavier use than this one), these numbers are extremely
>> high.
>>
>> Any idea how to debug any further?
>>
>
> Do some filesystems contain many snapshots?  Do some filesystems use small
> zfs block sizes?  Have the servers been used the same?
>
> Bob
> --
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Heavy write IO for no apparent reason

2013-01-17 Thread Peter Wood
Right on, Tim. Thanks. I didn't know that. I'm sure it's documented
somewhere and I should have read it, so double thanks for explaining it.


On Thu, Jan 17, 2013 at 4:18 PM, Timothy Coalson  wrote:

> On Thu, Jan 17, 2013 at 5:33 PM, Peter Wood wrote:
>
>>
>> The 'zpool iostat -v' output is uncomfortably static. The values of
>> read/write operations and bandwidth are the same for hours and even days.
>> I'd expect at least some variations between morning and night. The load on
>> the servers is different for sure. Any input?
>>
>>
> Without a repetition time parameter, zpool iostat will print exactly once
> and exit, and the output is an average from kernel boot to "now", just like
> iostat; this is why it seems so static.  If you want to know the activity
> over 5-second intervals, use something like "zpool iostat -v 5" (repeat
> every 5 seconds) and wait for the second and later blocks.  The second and
> later blocks are averages from the previous output until "now".  I generally
> use 5-second intervals to match the 5-second commit interval on my pools.
>
> Tim
>
>
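
For the archives, the form Tim describes would look like this (pool name
taken from earlier in the thread; the first block printed is still the
boot-to-now average, so only the second and later blocks reflect current
activity):

  # per-vdev activity every 5 seconds; ignore the first block
  zpool iostat -v pool01 5
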
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Heavy write IO for no apparent reason

2013-01-17 Thread Peter Wood
Great points, Jim. I have requested more information on how the gallery
share is being used, and any temporary data will be moved out of there.

About atime: it is set to "on" right now and I've considered turning it off,
but I wasn't sure if this would affect incremental zfs send/receive.

'zfs send -i snapshot0 snapshot1' doesn't rely on the atime, right?
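
For reference, a minimal sketch of checking and turning atime off on the busy
dataset (dataset name assumed from the earlier messages). Incremental send
replicates the blocks that changed between the two snapshots and does not
depend on atime, so turning it off should not affect send/receive:

  zfs get atime pool01/utils/gallery
  zfs set atime=off pool01/utils/gallery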


On Thu, Jan 17, 2013 at 4:34 PM, Jim Klimov  wrote:

> On 2013-01-18 00:42, Bob Friesenhahn wrote:
>
>> You can install Brendan Gregg's DTraceToolkit and use it to find out who
>> and what is doing all the writing.  1.2GB in an hour is quite a lot of
>> writing.  If this is going continuously, then it may be causing more
>> fragmentation in conjunction with your snapshots.
>>
>
> As a moderately wild guess, since you're speaking of galleries,
> are these problematic filesystems often-read? By default ZFS
> updates the last access-time of files it reads, as do many other
> filesystems, and this causes avalanches of metadata updates -
> sync writes (likely) as well as fragmentation. This may also
> be a poorly traceable but considerable "used" space in frequent
> snapshots. You can verify (and unset) this behaviour with the
> ZFS FS dataset property "atime", i.e.:
>
> # zfs get atime pond/export/home
> NAME              PROPERTY  VALUE  SOURCE
> pond/export/home  atime     off    inherited from pond
>
> On another hand, verify where your software keeps the temporary
> files (i.e. during uploads as may be with galleries). Again, if
> this is a frequently snapshotted dataset (though 1 hour is not
> really that frequent) then needless temp files can be held by
> those older snapshots. Moving such temporary works to a different
> dataset with a different snapshot schedule and/or to a different
> pool (to keep related fragmentation constrained) may prove useful.
>
> HTH,
> //Jim Klimov
>
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Is there performance penalty when adding vdev to existing pool

2013-02-20 Thread Peter Wood
I'm using OpenIndiana 151a7, zpool v28, zfs v5.

When I bought my storage servers I intentionally left HDD slots available so
I could add another vdev when needed and put off that expense.

After reading some posts on the mailing list I'm getting concerned about
degrading performance due to unequal distribution of data among the vdevs.
I still have a chance to migrate the data away, add all the drives, rebuild
the pool, and start fresh.

Before going that road I was hoping to hear your opinion on what will be
the best way to handle this.

System: Supermicro with 36 hdd bays. 28 bays filled with 3TB SAS 7.2K
enterprise drives. 8 bays available to add another vdev to the pool.

Pool configuration:
# zpool status pool01
  pool: pool01
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Wed Nov 21 17:41:52 2012
config:

NAME                       STATE     READ WRITE CKSUM
pool01                     ONLINE       0     0     0
  raidz2-0                 ONLINE       0     0     0
    c8t5000CCA01AA8E3C0d0  ONLINE       0     0     0
    c8t5000CCA01AA8E3F0d0  ONLINE       0     0     0
    c8t5000CCA01AA8E394d0  ONLINE       0     0     0
    c8t5000CCA01AA8E434d0  ONLINE       0     0     0
    c8t5000CCA01AA793A0d0  ONLINE       0     0     0
    c8t5000CCA01AA79380d0  ONLINE       0     0     0
    c8t5000CCA01AA79398d0  ONLINE       0     0     0
    c8t5000CCA01AB56B10d0  ONLINE       0     0     0
  raidz2-1                 ONLINE       0     0     0
    c8t5000CCA01AB56B28d0  ONLINE       0     0     0
    c8t5000CCA01AB56B64d0  ONLINE       0     0     0
    c8t5000CCA01AB56B80d0  ONLINE       0     0     0
    c8t5000CCA01AB56BB0d0  ONLINE       0     0     0
    c8t5000CCA01AB56EA4d0  ONLINE       0     0     0
    c8t5000CCA01ABDAEBCd0  ONLINE       0     0     0
    c8t5000CCA01ABDAED0d0  ONLINE       0     0     0
    c8t5000CCA01ABDAF1Cd0  ONLINE       0     0     0
  raidz2-2                 ONLINE       0     0     0
    c8t5000CCA01ABDAF7Cd0  ONLINE       0     0     0
    c8t5000CCA01ABDAF10d0  ONLINE       0     0     0
    c8t5000CCA01ABDAF40d0  ONLINE       0     0     0
    c8t5000CCA01ABDAF60d0  ONLINE       0     0     0
    c8t5000CCA01ABDAF74d0  ONLINE       0     0     0
    c8t5000CCA01ABDAF80d0  ONLINE       0     0     0
    c8t5000CCA01ABDB04Cd0  ONLINE       0     0     0
    c8t5000CCA01ABDB09Cd0  ONLINE       0     0     0
logs
  mirror-3                 ONLINE       0     0     0
    c6t0d0                 ONLINE       0     0     0
    c6t1d0                 ONLINE       0     0     0
cache
  c6t2d0                   ONLINE       0     0     0
  c6t3d0                   ONLINE       0     0     0
spares
  c8t5000CCA01ABDB020d0    AVAIL
  c8t5000CCA01ABDB060d0    AVAIL

errors: No known data errors
#

Will adding another vdev hurt the performance?
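
For concreteness, a sketch of the eventual add (the -n flag is a dry run that
only prints the resulting layout; the device names are placeholders for the 8
new disks):

  zpool add -n pool01 raidz2 c8tNEW1d0 c8tNEW2d0 c8tNEW3d0 c8tNEW4d0 \
      c8tNEW5d0 c8tNEW6d0 c8tNEW7d0 c8tNEW8d0
  # if the layout looks right, re-run without -n to actually add the vdev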

Thank you,

-- Peter
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is there performance penalty when adding vdev to existing pool

2013-02-20 Thread Peter Wood
Currently the pool is about 20% full:
# zpool list pool01
NAME      SIZE  ALLOC   FREE  EXPANDSZ   CAP  DEDUP  HEALTH  ALTROOT
pool01   65.2T  15.4T  49.9T         -   23%  1.00x  ONLINE  -
#

The old data and the new data will be equally used after adding the vdev.

The FS holds tens of thousands of small images (~500KB) that are read and
written, with new ones added, depending on what customers are doing. It's
pretty heavy on the file system: about 800 IOPS, going up to 1500 IOPS at
times.

Performance is important.
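
One thing worth watching before and after the expansion, sketched with the
pool name from above: the capacity columns of 'zpool iostat -v' are per vdev,
so they show how unevenly the existing raidz2 vdevs fill up over time.

  # per-vdev alloc/free plus cumulative operation counts
  zpool iostat -v pool01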



On Wed, Feb 20, 2013 at 3:48 PM, Tim Cook  wrote:

>
>
>
> On Wed, Feb 20, 2013 at 5:46 PM, Bob Friesenhahn <
> bfrie...@simple.dallas.tx.us> wrote:
>
>> On Thu, 21 Feb 2013, Sašo Kiselkov wrote:
>>
>>  On 02/21/2013 12:27 AM, Peter Wood wrote:
>>>
>>>> Will adding another vdev hurt the performance?
>>>>
>>>
>>> In general, the answer is: no. ZFS will try to balance writes to
>>> top-level vdevs in a fashion that assures even data distribution. If
>>> your data is equally likely to be hit in all places, then you will not
>>> incur any performance penalties. If, OTOH, newer data is more likely to
>>> be hit than old data, then yes, newer data will be served from fewer
>>> spindles. In that case it is possible to do a send/receive of the affected
>>> datasets into new locations and then rename them.
>>>
>>
>> You have this reversed.  The older data is served from fewer spindles
>> than data written after the new vdev is added. Performance with the newer
>> data should be improved.
>>
>> Bob
>>
>
>
> That depends entirely on how full the pool is when the new vdev is added,
> and how frequently the older data changes, snapshots, etc.
>
> --Tim
>
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade

2013-03-20 Thread Peter Wood
I have two identical Supermicro boxes with 32GB ram. Hardware details at
the end of the message.

They were running OI 151.a.5 for months. The zpool configuration was one
storage zpool with 3 vdevs of 8 disks in RAIDZ2.

The OI installation is absolutely clean. Just next-next-next until done.
All I do is configure the network after install. I don't install or enable
any other services.

Then I added more disks and rebuilt the systems with OI 151.a.7, and this
time configured the zpool with 6 vdevs of 5 disks in RAIDZ.

The systems started crashing really badly. They just disappear from the
network: black and unresponsive console, no error lights but no activity
indication either. The only way out is to power cycle the system.

There is no pattern to the crashes. It may crash in 2 days or it may crash
in 2 hours.

I upgraded the memory on both systems to 128GB, to no avail. This is the max
memory they can take.

In summary, all I did was upgrade to OI 151.a.7 and reconfigure the zpool.

Any idea what could be the problem?

Thank you

-- Peter

Supermicro X9DRH-iF
Xeon E5-2620 @ 2.0 GHz 6-Core
LSI SAS9211-8i HBA
32x 3TB Hitachi HUS723030ALS640, SAS, 7.2K
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade

2013-03-20 Thread Peter Wood
I'm sorry, I should have mentioned that I can't find any errors in the logs.
The last entry in /var/adm/messages is from when I removed the keyboard after
the last reboot, and then it shows the new boot-up messages from when I boot
the system after the crash. The BIOS log is empty. I'm not sure how to check
the IPMI, but IPMI is not configured and I'm not using it.

Just another observation - the crashes are more intense the more data the
system serves (NFS).

I'm looking into firmware upgrades for the LSI now.


On Wed, Mar 20, 2013 at 8:40 AM, Will Murnane wrote:

> Does the Supermicro IPMI show anything when it crashes?  Does anything
> show up in event logs in the BIOS, or in system logs under OI?
>
>
> On Wed, Mar 20, 2013 at 11:34 AM, Peter Wood wrote:
>
>> I have two identical Supermicro boxes with 32GB ram. Hardware details at
>> the end of the message.
>>
>> They were running OI 151.a.5 for months. The zpool configuration was one
>> storage zpool with 3 vdevs of 8 disks in RAIDZ2.
>>
>> The OI installation is absolutely clean. Just next-next-next until done.
>> All I do is configure the network after install. I don't install or enable
>> any other services.
>>
>> Then I added more disks and rebuild the systems with OI 151.a.7 and this
>> time configured the zpool with 6 vdevs of 5 disks in RAIDZ.
>>
>> The systems started crashing really bad. They just disappear from the
>> network, black and unresponsive console, no error lights but no activity
>> indication either. The only way out is to power cycle the system.
>>
>> There is no pattern in the crashes. It may crash in 2 days in may crash
>> in 2 hours.
>>
>> I upgraded the memory on both systems to 128GB at no avail. This is the
>> max memory they can take.
>>
>> In summary all I did is upgrade to OI 151.a.7 and reconfigured zpool.
>>
>> Any idea what could be the problem.
>>
>> Thank you
>>
>> -- Peter
>>
>> Supermicro X9DRH-iF
>> Xeon E5-2620 @ 2.0 GHz 6-Core
>> LSI SAS9211-8i HBA
>> 32x 3TB Hitachi HUS723030ALS640, SAS, 7.2K
>>
>> ___
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>>
>>
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade

2013-03-20 Thread Peter Wood
I'm going to need some help with the crash dumps. I'm not very familiar
with Solaris.

Do I have to enable something to get the crash dumps? Where should I look
for them?

Thanks for the help.


On Wed, Mar 20, 2013 at 8:53 AM, Michael Schuster  wrote:

> How about crash dumps?
>
> michael
>
>
> On Wed, Mar 20, 2013 at 4:50 PM, Peter Wood wrote:
>
>> I'm sorry. I should have mentioned it that I can't find any errors in the
>> logs. The last entry in /var/adm/messages is that I removed the keyboard
>> after the last reboot and then it shows the new boot up messages when I
>> boot up the system after the crash. The BIOS log is empty. I'm not sure how
>> to check the IPMI but IPMI is not configured and I'm not using it.
>>
>> Just another observation - the crashes are more intense the more data the
>> system serves (NFS).
>>
>> I'm looking into FRMW upgrades for the LSI now.
>>
>>
>> On Wed, Mar 20, 2013 at 8:40 AM, Will Murnane wrote:
>>
>>> Does the Supermicro IPMI show anything when it crashes?  Does anything
>>> show up in event logs in the BIOS, or in system logs under OI?
>>>
>>>
>>> On Wed, Mar 20, 2013 at 11:34 AM, Peter Wood wrote:
>>>
>>>> I have two identical Supermicro boxes with 32GB ram. Hardware details
>>>> at the end of the message.
>>>>
>>>> They were running OI 151.a.5 for months. The zpool configuration was
>>>> one storage zpool with 3 vdevs of 8 disks in RAIDZ2.
>>>>
>>>> The OI installation is absolutely clean. Just next-next-next until
>>>> done. All I do is configure the network after install. I don't install or
>>>> enable any other services.
>>>>
>>>> Then I added more disks and rebuild the systems with OI 151.a.7 and
>>>> this time configured the zpool with 6 vdevs of 5 disks in RAIDZ.
>>>>
>>>> The systems started crashing really bad. They just disappear from the
>>>> network, black and unresponsive console, no error lights but no activity
>>>> indication either. The only way out is to power cycle the system.
>>>>
>>>> There is no pattern in the crashes. It may crash in 2 days in may crash
>>>> in 2 hours.
>>>>
>>>> I upgraded the memory on both systems to 128GB at no avail. This is the
>>>> max memory they can take.
>>>>
>>>> In summary all I did is upgrade to OI 151.a.7 and reconfigured zpool.
>>>>
>>>> Any idea what could be the problem.
>>>>
>>>> Thank you
>>>>
>>>> -- Peter
>>>>
>>>> Supermicro X9DRH-iF
>>>> Xeon E5-2620 @ 2.0 GHz 6-Core
>>>> LSI SAS9211-8i HBA
>>>> 32x 3TB Hitachi HUS723030ALS640, SAS, 7.2K
>>>>
>>>> ___
>>>> zfs-discuss mailing list
>>>> zfs-discuss@opensolaris.org
>>>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>>>>
>>>>
>>>
>>
>> ___
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>>
>>
>
>
> --
> Michael Schuster
> http://recursiveramblings.wordpress.com/
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade

2013-03-20 Thread Peter Wood
Hi Jim,

Thanks for the pointers. I'll definitely look into this.


--
Peter Blajev
IT Manager, TAAZ Inc.
Office: 858-597-0512 x125


On Wed, Mar 20, 2013 at 11:29 AM, Jim Klimov  wrote:

> On 2013-03-20 17:15, Peter Wood wrote:
>
>> I'm going to need some help with the crash dumps. I'm not very familiar
>> with Solaris.
>>
>> Do I have to enable something to get the crash dumps? Where should I
>> look for them?
>>
>
> Typically the kernel crash dumps are created as a result of kernel
> panic; also they may be forced by administrative actions like NMI.
> They require you to configure a dump volume of sufficient size (see
> dumpadm) and a /var/crash which may be a dataset on a large enough
> pool - after the reboot the dump data will be migrated there.
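
A minimal sketch of that setup, assuming the stock layout the OI installer
creates (an rpool/dump zvol and /var/crash on the root pool); verify the
actual names with dumpadm first:

  dumpadm                              # show the current dump configuration
  dumpadm -d /dev/zvol/dsk/rpool/dump  # dump to the dedicated dump zvol
  dumpadm -s /var/crash                # where savecore writes after a panic
  # after the next panic and reboot, savecore leaves a vmdump.N in /var/crash
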
>
> To "help" with the hangs you can try the BIOS watchdog (which would
> require a bmc driver, one which is known from OpenSolaris is alas
> not opensourced and not redistributable), or with a software deadman
> timer:
>
> http://www.cuddletech.com/blog/pivot/entry.php?id=1044
>
> http://wiki.illumos.org/display/illumos/System+Hangs
>
> Also, if you configure "crash dump on NMI" and set up your IPMI card,
> then you can likely gain remote access to both the server console
> ("physical" and/or serial) and may be able to trigger the NMI, too.
>
> HTH,
> //Jim
>
>
>> Thanks for the help.
>>
>>
>> On Wed, Mar 20, 2013 at 8:53 AM, Michael Schuster wrote:
>>
>> How about crash dumps?
>>
>> michael
>>
>>
>> On Wed, Mar 20, 2013 at 4:50 PM, Peter Wood wrote:
>>
>> I'm sorry. I should have mentioned it that I can't find any
>> errors in the logs. The last entry in /var/adm/messages is that
>> I removed the keyboard after the last reboot and then it shows
>> the new boot up messages when I boot up the system after the
>> crash. The BIOS log is empty. I'm not sure how to check the IPMI
>> but IPMI is not configured and I'm not using it.
>>
>> Just another observation - the crashes are more intense the more
>> data the system serves (NFS).
>>
>> I'm looking into FRMW upgrades for the LSI now.
>>
>>
>> On Wed, Mar 20, 2013 at 8:40 AM, Will Murnane wrote:
>>
>> Does the Supermicro IPMI show anything when it crashes?
>>   Does anything show up in event logs in the BIOS, or in
>> system logs under OI?
>>
>>
>> On Wed, Mar 20, 2013 at 11:34 AM, Peter Wood wrote:
>>
>> I have two identical Supermicro boxes with 32GB ram.
>> Hardware details at the end of the message.
>>
>> They were running OI 151.a.5 for months. The zpool
>> configuration was one storage zpool with 3 vdevs of 8
>> disks in RAIDZ2.
>>
>> The OI installation is absolutely clean. Just
>> next-next-next until done. All I do is configure the
>> network after install. I don't install or enable any
>> other services.
>>
>> Then I added more disks and rebuild the systems with OI
>> 151.a.7 and this time configured the zpool with 6 vdevs
>> of 5 disks in RAIDZ.
>>
>> The systems started crashing really bad. They
>> just disappear from the network, black and unresponsive
>> console, no error lights but no activity indication
>> either. The only way out is to power cycle the system.
>>
>> There is no pattern in the crashes. It may crash in 2
>> days in may crash in 2 hours.
>>
>> I upgraded the memory on both systems to 128GB at no
>> avail. This is the max memory they can take.
>>
>> In summary all I did is upgrade to OI 151.a.7 and
>> reconfigured zpool.
>>
>> Any idea what could be the problem.
>>
>> Thank you
>>
>>

Re: [zfs-discuss] [BULK] System started crashing hard after zpool reconfigure and OI upgrade

2013-03-20 Thread Peter Wood
No problem Trey. Anything will help.

Yes, I did a clean install overwriting the old OS.



>  Just to make sure, you actually did an overwrite reinstall with OI151a7
> rather than upgrading the existing OS images?   If you did a pkg
> image-update, you should be able to boot back into the oi151a5 image from
> grub.  Apologies in advance if I'm stating the obvious.
>
>  -- Trey
>
>
> On Mar 20, 2013, at 11:34 AM, "Peter Wood"  wrote:
>
>   I have two identical Supermicro boxes with 32GB ram. Hardware details
> at the end of the message.
>
>  They were running OI 151.a.5 for months. The zpool configuration was one
> storage zpool with 3 vdevs of 8 disks in RAIDZ2.
>
>  The OI installation is absolutely clean. Just next-next-next until done.
> All I do is configure the network after install. I don't install or enable
> any other services.
>
>  Then I added more disks and rebuild the systems with OI 151.a.7 and this
> time configured the zpool with 6 vdevs of 5 disks in RAIDZ.
>
>  The systems started crashing really bad. They just disappear from the
> network, black and unresponsive console, no error lights but no activity
> indication either. The only way out is to power cycle the system.
>
>  There is no pattern in the crashes. It may crash in 2 days in may crash
> in 2 hours.
>
>  I upgraded the memory on both systems to 128GB at no avail. This is the
> max memory they can take.
>
>  In summary all I did is upgrade to OI 151.a.7 and reconfigured zpool.
>
>  Any idea what could be the problem.
>
>  Thank you
>
>  -- Peter
>
>  Supermicro X9DRH-iF
>  Xeon E5-2620 @ 2.0 GHz 6-Core
>  LSI SAS9211-8i HBA
>  32x 3TB Hitachi HUS723030ALS640, SAS, 7.2K
>
>  ___
>
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade

2013-03-20 Thread Peter Wood
Great write-up, Jens.

The chance of two MBs being broken is probably low, but overheating is a very
good point. It was on my to-do list to set up IPMI, and it seems that now is
the best time to do it.

Thanks

On Wed, Mar 20, 2013 at 1:08 PM, Jens Elkner wrote:

> On Wed, Mar 20, 2013 at 08:50:40AM -0700, Peter Wood wrote:
> >I'm sorry. I should have mentioned it that I can't find any errors in
> the
> >logs. The last entry in /var/adm/messages is that I removed the
> keyboard
> >after the last reboot and then it shows the new boot up messages when
> I boot
> >up the system after the crash. The BIOS log is empty. I'm not sure
> how to
> >check the IPMI but IPMI is not configured and I'm not using it.
>
> You definitely should! Plug a cable into the dedicated network port
> and configure it (the easiest way for you is probably to jump into the BIOS
> and assign the appropriate IP address etc.). Then, for a quick look,
> point your browser to the given IP port 80 (default login is
> ADMIN/ADMIN). Also you may now configure some other details
> (accounts/passwords/roles).
>
> To track the problem, either write a script, which polls the parameters
> in question periodically or just install the latest ipmiViewer and use
> this to monitor your sensors ad hoc.
> see ftp://ftp.supermicro.com/utility/IPMIView/
>
> >Just another observation - the crashes are more intense the more data
> the
> >system serves (NFS).
> >I'm looking into FRMW upgrades for the LSI now.
>
> Latest LSI FW should be P15, for this MB type 217 (2.17), MB-BIOS C28
> (1.0b).
> However, I doubt, that your problem has anything to do with the
> SAS-ctrl or OI or ZFS.
>
> My guess is, that either your MB is broken (we had an X9DRH-iF, which
> instantly "disappeared" as soon as it got some real load) or you have
> a heat problem (watch your CPU temp e.g. via ipmiviewer). With 2GHz
> that's not very likely, but worth a try (socket placement on this board
> is not really smart IMHO).
>
> To test quickly
> - disable all additional, unneeded services in OI, which may put some
>   load on the machine (like NFS service, http and bla) and perhaps
>   even export unneeded pools (just to be sure)
> - fire up your ipmiviewer and look at the sensors (set update to
>   10s) or refresh manually often
> - start 'openssl speed -multi 32' and keep watching your cpu temp
>   sensors (with 2GHz I guess it takes ~ 12min)
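
Put concretely, a sketch of that quick test (the NFS FMRI is the usual one on
OI, -t keeps the change temporary until reboot, and the pool name is a
placeholder; adjust to what is actually running on the box):

  svcadm disable -t svc:/network/nfs/server:default
  zpool export <datapool>          # optional, per the advice above
  openssl speed -multi 32          # CPU load on all cores/threads
  # ...while watching the CPU temperature sensors in IPMIView / the BMC web UI
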
>
> I guess your machine "disappears" before the CPUs get really hot
> (broken MB). If CPUs switch off (usually first CPU2 and a little bit
> later CPU1) you have a cooling problem. If nothing happens, well, then
> it could be an OI or ZFS problem ;-)
>
> Have fun,
> jel.
> --
> Otto-von-Guericke University http://www.cs.uni-magdeburg.de/
> Department of Computer Science   Geb. 29 R 027, Universitaetsplatz 2
> 39106 Magdeburg, Germany Tel: +49 391 67 52768
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade

2013-03-20 Thread Peter Wood
I can reproduce the problem. I can crash the system.

Here are the steps I did (some steps may not be needed but I haven't tested
it):

- Clean install of OI 151.a.7 on Supermicro hardware described above (32GB
RAM though, not the 128GB)

- Create 1 zpool, 6 raidz vdevs with 5 drives each

- NFS export a dataset
  zfs set sharenfs="rw=@10.20.1/24" vol01/htmlspace

- Create zfs child dataset
  zfs create vol01/htmlspace/A

  $ zfs get -H sharenfs vol01/htmlspace/A
  vol01/htmlspace/A   sharenfs   rw=@10.20.1/24   inherited from vol01/htmlspace

- Stop NFS sharing for the child dataset

  zfs set sharenfs=off vol01/htmlspace/A

The crash is instant after the sharenfs=off command.

I thought it was a coincidence, so after reboot I tried it on another
dataset. Instant crash again. I get my prompt back but that's it. The system
is gone after that.

The NFS-exported file systems are not accessed by any system on the network.
They are not in use. That's why I wanted to stop exporting them. And even if
they were in use, this should not crash the system, right?

I can't try the other box because it is heavily used in production. At least
not until later tonight.

I thought I'd collect some advice to make each crash as useful as possible.

Any pointers are appreciated.

Thanks,

-- Peter


On Wed, Mar 20, 2013 at 8:34 AM, Peter Wood  wrote:

> I have two identical Supermicro boxes with 32GB ram. Hardware details at
> the end of the message.
>
> They were running OI 151.a.5 for months. The zpool configuration was one
> storage zpool with 3 vdevs of 8 disks in RAIDZ2.
>
> The OI installation is absolutely clean. Just next-next-next until done.
> All I do is configure the network after install. I don't install or enable
> any other services.
>
> Then I added more disks and rebuild the systems with OI 151.a.7 and this
> time configured the zpool with 6 vdevs of 5 disks in RAIDZ.
>
> The systems started crashing really bad. They just disappear from the
> network, black and unresponsive console, no error lights but no activity
> indication either. The only way out is to power cycle the system.
>
> There is no pattern in the crashes. It may crash in 2 days in may crash in
> 2 hours.
>
> I upgraded the memory on both systems to 128GB at no avail. This is the
> max memory they can take.
>
> In summary all I did is upgrade to OI 151.a.7 and reconfigured zpool.
>
> Any idea what could be the problem.
>
> Thank you
>
> -- Peter
>
> Supermicro X9DRH-iF
> Xeon E5-2620 @ 2.0 GHz 6-Core
> LSI SAS9211-8i HBA
> 32x 3TB Hitachi HUS723030ALS640, SAS, 7.2K
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss