I start the cp and then, with prstat -a, watch the CPU usage of the cp
process climb to 25% on the 4-core machine - that is, one core fully busy.
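
For reference, the observation can be reproduced with something like this
(the paths are placeholders; the 5-second sampling interval is arbitrary):

  cp /path/to/files/* /tank/dest/ &
  prstat -p `pgrep -x cp` 5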

The load average, as reported for example by 'uptime', climbs steadily
until the reboot.
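
To log that climb for later comparison, a simple loop works (the log path
is arbitrary):

  while true; do uptime >> /var/tmp/load.log; sleep 30; done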

Note that the machine does not dump properly, panic, or hang - rather,
it reboots.

I attached a screenshot earlier in this thread of the little bit of
error output I could see on the console.  The machine tries to dump to
the dump zvol, but fails to do so.  Only sometimes do I see an error on
the machine's local console - most of the time, it simply reboots.
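
Before the next attempt it may be worth sanity-checking the dump zvol
(name per the dumpadm output quoted below):

  zfs get volsize rpool/bigdump
  swap -l    # confirm the dump zvol isn't also in use as swap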



On Thu, Mar 12, 2009 at 1:55 AM, Nathan Kroenert
<nathan.kroen...@sun.com> wrote:
> Hm -
>
> Crashes, or hangs? Moreover - how do you know a CPU is pegged?
>
> Seems like we could do a little more discovery on what the actual problem
> here is, as I can read it about 4 different ways.
>
> By this last piece of information, I'm guessing the system does not crash,
> but goes really really slow??
>
> Crash == panic == we see a stack dump on the console and try to take a dump
> Hang  == nothing works == no response -> might be worth looking at mdb -K,
>         or booting with -k on the boot line.
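>
> To spell those out (kmdb must be available for either; a sketch, not
> system-specific advice):
>
>   mdb -K     # load kmdb and drop into it on the live system
>   boot -k    # load kmdb at boot, so a later hang can be examined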
>
> So - are we crashing, hanging, or something different?
>
> It might simply be that you are eating up all your memory, and your physical
> backing storage is taking a while to catch up....?
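>
> A quick way to test that theory (plain vmstat, sampling every 5 seconds):
>
>   vmstat 5   # watch the free column and the scan rate (sr) during the copy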
>
> Nathan.
>
> Blake wrote:
>>
>> My dump device is already on a different controller - the motherboard's
>> built-in nVidia SATA controller.
>>
>> The raidz2 vdev is the one I'm having trouble with (copying the same
>> files to the mirrored rpool on the nVidia controller works nicely).  I
>> do notice that, when using cp to copy the files to the raidz2 pool,
>> load on the machine climbs steadily until the crash, and one processor
>> core pegs at 100%.
>>
>> Frustrating, yes.
>>
>> On Thu, Mar 12, 2009 at 12:31 AM, Maidak Alexander J
>> <maidakalexand...@johndeere.com> wrote:
>>>
>>> If you're having issues with a disk controller or disk I/O driver, it's
>>> highly likely that a savecore to disk after the panic will fail.  I'm not
>>> sure how to work around this - maybe a dedicated dump device on a
>>> controller that uses a different driver than the one you're having
>>> issues with?
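>>>
>>> Roughly like this, if a suitable slice exists (the device name is just
>>> an example):
>>>
>>>   dumpadm -d /dev/dsk/c2t0d0s1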
>>>
>>> -----Original Message-----
>>> From: zfs-discuss-boun...@opensolaris.org
>>> [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Blake
>>> Sent: Wednesday, March 11, 2009 4:45 PM
>>> To: Richard Elling
>>> Cc: Marc Bevand; zfs-discuss@opensolaris.org
>>> Subject: Re: [zfs-discuss] reboot when copying large amounts of data
>>>
>>> I guess I didn't make it clear that I had already tried using savecore to
>>> retrieve the core from the dump device.
>>>
>>> I added a larger zvol for dump, to make sure that I wasn't running out of
>>> space on the dump device:
>>>
>>> r...@host:~# dumpadm
>>>       Dump content: kernel pages
>>>        Dump device: /dev/zvol/dsk/rpool/bigdump (dedicated)
>>> Savecore directory: /var/crash/host
>>>   Savecore enabled: yes
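>>>
>>> For anyone following along, creating and activating such a zvol looks
>>> roughly like this (the 16g size is illustrative):
>>>
>>>   zfs create -V 16g rpool/bigdump
>>>   dumpadm -d /dev/zvol/dsk/rpool/bigdump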
>>>
>>> I was using the -L option only to try to get some idea of why the system
>>> load was climbing to 1 during a simple file copy.
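>>>
>>> (That is, something like:
>>>
>>>   savecore -L /var/crash/host
>>>
>>> which dumps the live, running system instead of reading back a previous
>>> crash dump.)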
>>>
>>>
>>>
>>> On Wed, Mar 11, 2009 at 4:58 PM, Richard Elling
>>> <richard.ell...@gmail.com> wrote:
>>>>
>>>> Blake wrote:
>>>>>
>>>>> I'm attaching a screenshot of the console just before reboot.  The
>>>>> dump doesn't seem to be working, or savecore isn't working.
>>>>>
>>>>> On Wed, Mar 11, 2009 at 11:33 AM, Blake <blake.ir...@gmail.com> wrote:
>>>>>
>>>>>> I'm working on testing this some more by doing a savecore -L right
>>>>>> after I start the copy.
>>>>>>
>>>>>>
>>>> savecore -L is not what you want.
>>>>
>>>> By default, for OpenSolaris, savecore on boot is disabled.  But the
>>>> core will have been dumped into the dump slice, which is not used for
>>>> swap.
>>>> So you should be able to run savecore at a later time to collect the
>>>> core from the last dump.
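>>>>
>>>> That is, something along these lines (directory per the dumpadm output
>>>> above):
>>>>
>>>>   savecore -v /var/crash/host   # extract the most recent dump from the dump device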
>>>> -- richard
>>>>
>>>>
>
> --
> //////////////////////////////////////////////////////////////////
> // Nathan Kroenert              nathan.kroen...@sun.com         //
> // Systems Engineer             Phone:  +61 3 9869-6255         //
> // Sun Microsystems             Fax:    +61 3 9869-6288         //
> // Level 7, 476 St. Kilda Road  Mobile: 0419 305 456            //
> // Melbourne 3004   Victoria    Australia                       //
> //////////////////////////////////////////////////////////////////
>
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
