I've managed to get the data transfer to work by rearranging my disks
so that all of them sit on the integrated SATA controller.

So I feel pretty certain that this is either an issue with the
Supermicro AOC-SAT2-MV8 card or with PCI-X on the motherboard (though
I would have thought the integrated SATA would also be on the PCI
bus?).
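
One way to check which driver and bus each controller is actually
bound to (a sketch - the driver names marvell88sx for the AOC-SAT2-MV8
and nv_sata for the onboard MCP55 ports are my assumptions, and paths
will vary per system):

```shell
# Show the device tree with bound drivers -- the Marvell card and the
# onboard nVidia ports should hang off different PCI nodes
prtconf -D | egrep -i 'marvell|nv_sata|sata'

# List attached SATA controllers and the disks behind them
cfgadm -al
```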

The motherboard, for those interested, is an H8DME-2 (not, I now find
after buying this box from Silicon Mechanics, a board that's on the
Solaris HCL...)

<http://www.supermicro.com/Aplus/motherboard/Opteron2000/MCP55/h8dme-2.cfm>

So I'm now considering one of LSI's HBAs - what do list members think
about this device:

<http://www.provantage.com/lsi-logic-lsi00117~7LSIG03X.htm>



On Thu, Mar 12, 2009 at 2:18 AM, Nathan Kroenert
<nathan.kroen...@sun.com> wrote:
> definitely time to bust out some mdb -K or boot -k and see what it's moaning
> about.
>
> I did not see the screenshot earlier... sorry about that.
>
> Nathan.
>
> Blake wrote:
>>
>> I start the cp, and then, with prstat -a, watch the cpu load for the
>> cp process climb to 25% on a 4-core machine.
>>
>> Load, measured for example with 'uptime', climbs steadily until the
>> reboot.
>>
>> Note that the machine does not dump properly, panic or hang - rather,
>> it reboots.
>>
>> I attached a screenshot earlier in this thread of the little bit of
>> error message I could see on the console.  The machine is trying to
>> dump to the dump zvol, but fails to do so.  Only sometimes do I see an
>> error on the machine's local console - most times, it simply reboots.
>>
>>
>>
>> On Thu, Mar 12, 2009 at 1:55 AM, Nathan Kroenert
>> <nathan.kroen...@sun.com> wrote:
>>>
>>> Hm -
>>>
>>> Crashes, or hangs? Moreover - how do you know a CPU is pegged?
>>>
>>> Seems like we could do a little more discovery on what the actual problem
>>> here is, as I can read it about 4 different ways.
>>>
>>> By this last piece of information, I'm guessing the system does not
>>> crash,
>>> but goes really really slow??
>>>
>>> Crash == panic == we see stack dump on console and try to take a dump
>>> hang == nothing works == no response -> might be worth looking at mdb -K
>>>       or booting with a -k on the boot line.
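
For the archives, the two approaches Nathan mentions look roughly like
this (a sketch; the GRUB kernel path is for x86 and may differ on your
build):

```shell
# Option 1: load the kernel debugger at boot. At the GRUB menu, edit
# the kernel line and append -k, e.g.:
#   kernel$ /platform/i86pc/kernel/$ISADIR/unix -k

# Option 2: load kmdb on the running system (root required); F1-A or
# Break then drops into the debugger, and ':c' continues
mdb -K

# Once in kmdb after the fault, force a panic and crash dump with:
#   $<systemdump
```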
>>>
>>> So - are we crashing, hanging, or something different?
>>>
>>> It might simply be that you are eating up all your memory, and your
>>> physical backing storage is taking a while to catch up...?
>>>
>>> Nathan.
>>>
>>> Blake wrote:
>>>>
>>>> My dump device is already on a different controller - the motherboard's
>>>> built-in nVidia SATA controller.
>>>>
>>>> The raidz2 vdev is the one I'm having trouble with (copying the same
>>>> files to the mirrored rpool on the nVidia controller works nicely).  I
>>>> do notice that, when using cp to copy the files to the raidz2 pool,
>>>> load on the machine climbs steadily until the crash, and one proc core
>>>> pegs at 100%.
>>>>
>>>> Frustrating, yes.
>>>>
>>>> On Thu, Mar 12, 2009 at 12:31 AM, Maidak Alexander J
>>>> <maidakalexand...@johndeere.com> wrote:
>>>>>
>>>>> If you're having issues with a disk controller or disk I/O driver, it's
>>>>> highly likely that a savecore to disk after the panic will fail.  I'm
>>>>> not sure how to work around this - maybe a dedicated dump device on a
>>>>> controller that uses a different driver than the one you're having
>>>>> issues with?
>>>>>
>>>>> -----Original Message-----
>>>>> From: zfs-discuss-boun...@opensolaris.org
>>>>> [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Blake
>>>>> Sent: Wednesday, March 11, 2009 4:45 PM
>>>>> To: Richard Elling
>>>>> Cc: Marc Bevand; zfs-discuss@opensolaris.org
>>>>> Subject: Re: [zfs-discuss] reboot when copying large amounts of data
>>>>>
>>>>> I guess I didn't make it clear that I had already tried using savecore
>>>>> to retrieve the core from the dump device.
>>>>>
>>>>> I added a larger zvol for dump, to make sure that I wasn't running out
>>>>> of space on the dump device:
>>>>>
>>>>> r...@host:~# dumpadm
>>>>>       Dump content: kernel pages
>>>>>        Dump device: /dev/zvol/dsk/rpool/bigdump (dedicated)
>>>>> Savecore directory: /var/crash/host
>>>>>   Savecore enabled: yes
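
For anyone replicating this, creating and activating a dedicated dump
zvol goes roughly like so (the 10g size is just an example; 'bigdump'
matches the name shown above):

```shell
# Create a dedicated zvol for crash dumps (size is an example)
zfs create -V 10g rpool/bigdump

# Point the dump subsystem at it
dumpadm -d /dev/zvol/dsk/rpool/bigdump

# Verify the new configuration
dumpadm
```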
>>>>>
>>>>> I was using the -L option only to try to get some idea of why the
>>>>> system load was climbing to 1 during a simple file copy.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 11, 2009 at 4:58 PM, Richard Elling
>>>>> <richard.ell...@gmail.com> wrote:
>>>>>>
>>>>>> Blake wrote:
>>>>>>>
>>>>>>> I'm attaching a screenshot of the console just before reboot.  The
>>>>>>> dump doesn't seem to be working, or savecore isn't working.
>>>>>>>
>>>>>>> On Wed, Mar 11, 2009 at 11:33 AM, Blake <blake.ir...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I'm working on testing this some more by doing a savecore -L right
>>>>>>>> after I start the copy.
>>>>>>>>
>>>>>>>>
>>>>>> savecore -L is not what you want.
>>>>>>
>>>>>> By default, for OpenSolaris, savecore on boot is disabled.  But the
>>>>>> core will have been dumped into the dump slice, which is not used for
>>>>>> swap.
>>>>>> So you should be able to run savecore at a later time to collect the
>>>>>> core from the last dump.
>>>>>> -- richard
>>>>>>
>>>>>>
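
Concretely, after the next boot that would be something like the
following (a sketch; the crash directory comes from dumpadm's
"Savecore directory" setting):

```shell
# Pull the most recent dump off the dedicated dump device into the
# configured savecore directory
savecore -v

# Then inspect it with the kernel debugger, e.g.:
#   mdb -k /var/crash/host/unix.0 /var/crash/host/vmcore.0
#   > ::status
#   > ::msgbuf
```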
>>>>> _______________________________________________
>>>>> zfs-discuss mailing list
>>>>> zfs-discuss@opensolaris.org
>>>>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>>>>>
>>>
>>> --
>>> //////////////////////////////////////////////////////////////////
>>> // Nathan Kroenert              nathan.kroen...@sun.com         //
>>> // Systems Engineer             Phone:  +61 3 9869-6255         //
>>> // Sun Microsystems             Fax:    +61 3 9869-6288         //
>>> // Level 7, 476 St. Kilda Road  Mobile: 0419 305 456            //
>>> // Melbourne 3004   Victoria    Australia                       //
>>> //////////////////////////////////////////////////////////////////
>>>
>
>
