Hi
Weird, almost like some kind of memory corruption.

Could I see the upgrade log that got you to u6, i.e.
/var/sadm/system/logs/upgrade_log
for the u6 environment?
What kind of upgrade did you do: Live Upgrade, text-based, etc.?
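If the log is large, even just the tail of it, or something like (just a suggestion)

  egrep -in "error|fail" /var/sadm/system/logs/upgrade_log

to pull out any failures, would be a good start.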

Enda

On 11/06/08 15:41, Krzys wrote:
> Seems like core.vold.* are not being created until I try to boot from zfsBE;
> just creating zfsBE only gets core.cpio created.
> 
> 
> 
> [10:29:48] @adas: /var/crash > mdb core.cpio.5545
> Loading modules: [ libc.so.1 libavl.so.1 ld.so.1 ]
>> ::status
> debugging core file of cpio (32-bit) from adas
> file: /usr/bin/cpio
> initial argv: /usr/bin/cpio -pPcdum /.alt.tmp.b-Prb.mnt
> threading model: multi-threaded
> status: process terminated by SIGBUS (Bus Error)
>> $C
> ffbfe5b0 libc.so.1`_malloc_unlocked+0x164(30, 0, 39c28, ff, 2e2f2e2f, 0)
> ffbfe610 libc.so.1`malloc+0x4c(30, 1, e8070, 0, ff33e3c0, ff3485b8)
> ffbfe670 libsec.so.1`cacl_get+0x138(ffbfe7c4, 2, 0, 35bc0, 0, 35f98)
> ffbfe768 libsec.so.1`acl_get+0x14(37fe2, 2, 35bc0, 354c0, 1000, 1)
> ffbfe7d0 0x183b4(1, 35800, 359e8, 346b0, 34874, 34870)
> ffbfec30 main+0x28c(34708, 1, 35bc0, 166fc, 35800, 34400)
> ffbfec90 _start+0x108(0, 0, 0, 0, 0, 0)
>> $r
> %g0 = 0x00000000                 %l0 = 0x00000000
> %g1 = 0xff25638c libc.so.1`malloc+0x44 %l1 = 0x00039c28
> %g2 = 0x00037fe0                 %l2 = 0x2e2f2e2f
> %g3 = 0x00008000                 %l3 = 0x000003c8
> %g4 = 0x00000000                 %l4 = 0x2e2f2e2f
> %g5 = 0x00000000                 %l5 = 0x00000000
> %g6 = 0x00000000                 %l6 = 0xffffdc00
> %g7 = 0xff382a00                 %l7 = 0xff347344 libc.so.1`Lfree
> %o0 = 0x00000000                 %i0 = 0x00000030
> %o1 = 0x00000000                 %i1 = 0x00000000
> %o2 = 0x000e70c4                 %i2 = 0x00039c28
> %o3 = 0x00000000                 %i3 = 0x000000ff
> %o4 = 0xff33e3c0                 %i4 = 0x2e2f2e2f
> %o5 = 0xff347344 libc.so.1`Lfree %i5 = 0x00000000
> %o6 = 0xffbfe5b0                 %i6 = 0xffbfe610
> %o7 = 0xff2564a4 libc.so.1`_malloc_unlocked+0xf4 %i7 = 0xff256394
> libc.so.1`malloc+0x4c
> 
>   %psr = 0xfe001002 impl=0xf ver=0xe icc=nzvc
>                     ec=0 ef=4096 pil=0 s=0 ps=0 et=0 cwp=0x2
>     %y = 0x00000000
>    %pc = 0xff256514 libc.so.1`_malloc_unlocked+0x164
>   %npc = 0xff2564d8 libc.so.1`_malloc_unlocked+0x128
>    %sp = 0xffbfe5b0
>    %fp = 0xffbfe610
> 
>   %wim = 0x00000000
>   %tbr = 0x00000000
> 
> 
> 
> 
> 
> 
> 
> On Thu, 6 Nov 2008, Enda O'Connor wrote:
> 
>> Hi
>> try to get the stack trace from the core,
>> i.e. mdb core.vold.24978
>> ::status
>> $C
>> $r
>>
>> also run the same 3 mdb commands on the cpio core dump.
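>> (One way to capture all three to a file for mailing, just a sketch; the output
>> file name is only a placeholder:
>>   printf '::status\n$C\n$r\n' | mdb core.cpio.6208 > /tmp/mdb.cpio.out
>> )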
>>
>> Also, if you could extract some data from the truss log, i.e. a few hundred
>> lines before the first SIGBUS, that would help.
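>> For example, something along these lines (a rough sketch; the 300-line window
>> and file names are just placeholders):
>>   n=`grep -n SIGBUS /tmp/truss.out | head -1 | cut -d: -f1`
>>   start=`expr $n - 300`
>>   sed -n "${start},${n}p" /tmp/truss.out > /tmp/truss.sigbus.txt
>> would cut out the few hundred lines leading up to the first SIGBUS.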
>>
>>
>> Enda
>>
>> On 11/06/08 01:25, Krzys wrote:
>>> This is so bizarre, I am unable to get past this problem. I thought I did not
>>> have enough space on my hard drive (the new one), so I replaced it with a 72GB
>>> drive, but I am still getting that bus error. Originally, when I restarted my
>>> server it did not want to boot, so I had to power it off and back on, and it
>>> then booted up. But I keep getting this "Bus Error - core dumped".
>>>
>>> Anyway, in my /var/crash I see hundreds of core.vold files and 3 core.cpio
>>> files. I would imagine the core.cpio ones are the direct result of
>>> what I am probably experiencing.
>>>
>>> -rw-------   1 root     root     4126301 Nov  5 19:22 core.vold.24854
>>> -rw-------   1 root     root     4126301 Nov  5 19:22 core.vold.24867
>>> -rw-------   1 root     root     4126301 Nov  5 19:22 core.vold.24880
>>> -rw-------   1 root     root     4126301 Nov  5 19:22 core.vold.24893
>>> -rw-------   1 root     root     4126301 Nov  5 19:22 core.vold.24906
>>> -rw-------   1 root     root     4126301 Nov  5 19:22 core.vold.24919
>>> -rw-------   1 root     root     4126301 Nov  5 19:22 core.vold.24932
>>> -rw-------   1 root     root     4126301 Nov  5 19:22 core.vold.24950
>>> -rw-------   1 root     root     4126301 Nov  5 19:22 core.vold.24978
>>> drwxr-xr-x   3 root     root       81408 Nov  5 20:06 .
>>> -rw-------   1 root     root     31351099 Nov  5 20:06 core.cpio.6208
>>>
>>>
>>>
>>> On Wed, 5 Nov 2008, Enda O'Connor wrote:
>>>
>>>> Hi
>>>> Looks ok, some mounts left over from the previous failure.
>>>> In regard to swap and dump on the zpool, you can set their sizes:
>>>> zfs set volsize=1G rootpool/dump
>>>> zfs set volsize=1G rootpool/swap
>>>>
>>>> for instance; of course, the above is only an example of how to do it.
>>>> Alternatively, make the zvols for rootpool/dump etc. before lucreate, in which
>>>> case it will take the swap and dump sizes you have preset.
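>>>> For example (purely illustrative, the sizes here are placeholders):
>>>>   zfs create -V 2g rootpool/swap
>>>>   zfs create -V 1g rootpool/dump
>>>> and then lucreate should pick up the sizes you preset, as above.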
>>>>
>>>> But I think we need to see the coredump/truss at this point to get an idea 
>>>> of where things went wrong.
>>>> Enda
>>>>
>>>> On 11/05/08 15:38, Krzys wrote:
>>>>> I did upgrade my U5 to U6 from DVD and went through the upgrade process.
>>>>> my file system is setup as follow:
>>>>> [10:11:54] [EMAIL PROTECTED]: /root > df -h | egrep -v 
>>>>> "platform|sharefs|objfs|mnttab|proc|ctfs|devices|fd|nsr"
>>>>> Filesystem             size   used  avail capacity  Mounted on
>>>>> /dev/dsk/c1t0d0s0       16G   7.2G   8.4G    47%    /
>>>>> swap                   8.3G   1.5M   8.3G     1%    /etc/svc/volatile
>>>>> /dev/dsk/c1t0d0s6       16G   8.7G   6.9G    56%    /usr
>>>>> /dev/dsk/c1t0d0s1       16G   2.5G    13G    17%    /var
>>>>> swap                   8.5G   229M   8.3G     3%    /tmp
>>>>> swap                   8.3G    40K   8.3G     1%    /var/run
>>>>> /dev/dsk/c1t0d0s7       78G   1.2G    76G     2%    /export/home
>>>>> rootpool                33G    19K    21G     1%    /rootpool
>>>>> rootpool/ROOT           33G    18K    21G     1%    /rootpool/ROOT
>>>>> rootpool/ROOT/zfsBE     33G    31M    21G     1%    /.alt.tmp.b-UUb.mnt
>>>>> /export/home            78G   1.2G    76G     2%    /.alt.tmp.b-UUb.mnt/export/home
>>>>> /rootpool               21G    19K    21G     1%    /.alt.tmp.b-UUb.mnt/rootpool
>>>>> /rootpool/ROOT          21G    18K    21G     1%    /.alt.tmp.b-UUb.mnt/rootpool/ROOT
>>>>> swap                   8.3G     0K   8.3G     0%    /.alt.tmp.b-UUb.mnt/var/run
>>>>> swap                   8.3G     0K   8.3G     0%    /.alt.tmp.b-UUb.mnt/tmp
>>>>> [10:12:00] [EMAIL PROTECTED]: /root >
>>>>>
>>>>>
>>>>> so I have /, /usr, /var and /export/home on that primary disk. The original
>>>>> disk is 140GB and this new one is only 36GB, but the primary disk is far from
>>>>> full, so the data should easily fit on the new one.
>>>>>
>>>>> / 7.2GB
>>>>> /usr 8.7GB
>>>>> /var 2.5GB
>>>>> /export/home 1.2GB
>>>>> total space 19.6GB
>>>>> I did notice that lucreate allocated 8GB to SWAP and 4GB to DUMP
>>>>> total space needed 31.6GB
>>>>> seems like the total available disk space on my disk should be 33.92GB,
>>>>> so the two numbers are quite close. To make sure, I will
>>>>> change the disk for a 72GB one and will try again. I do not believe that I
>>>>> need to match my main disk's size of 146GB, as I am not using that much disk
>>>>> space on it. But let me try this; it might be why I am getting this
>>>>> problem...
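>>>>> (That is, 7.2 + 8.7 + 2.5 + 1.2 = 19.6GB of data, plus 8GB swap and 4GB dump
>>>>> = 31.6GB, against the roughly 33.9GB available on the new disk.)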
>>>>>
>>>>>
>>>>>
>>>>> On Wed, 5 Nov 2008, Enda O'Connor wrote:
>>>>>
>>>>>> Hi Krzys
>>>>>> Also some info on the actual system
>>>>>> i.e. what it was upgraded to u6 from, and how,
>>>>>> and an idea of how the filesystems are laid out, i.e. whether /usr is separate
>>>>>> from / and so on (maybe a df -k). You don't appear to have any zones installed,
>>>>>> just to confirm.
>>>>>> Enda
>>>>>>
>>>>>> On 11/05/08 14:07, Enda O'Connor wrote:
>>>>>>> Hi
>>>>>>> did you get a core dump?
>>>>>>> It would be nice to see the core file to get an idea of what dumped core.
>>>>>>> You might configure coreadm if it is not already done:
>>>>>>> run coreadm first; if the output looks like
>>>>>>>
>>>>>>> # coreadm
>>>>>>>      global core file pattern: /var/crash/core.%f.%p
>>>>>>>      global core file content: default
>>>>>>>        init core file pattern: core
>>>>>>>        init core file content: default
>>>>>>>             global core dumps: enabled
>>>>>>>        per-process core dumps: enabled
>>>>>>>       global setid core dumps: enabled
>>>>>>>  per-process setid core dumps: disabled
>>>>>>>      global core dump logging: enabled
>>>>>>>
>>>>>>> then all should be good, and cores should appear in /var/crash
>>>>>>>
>>>>>>> otherwise the following should configure coreadm:
>>>>>>> coreadm -g /var/crash/core.%f.%p
>>>>>>> coreadm -G all
>>>>>>> coreadm -e global
>>>>>>> coreadm -e per-process
>>>>>>>
>>>>>>>
>>>>>>> coreadm -u to load the new settings without rebooting.
>>>>>>>
>>>>>>> also might need to set the size of the core dump via
>>>>>>> ulimit -c unlimited
>>>>>>> check ulimit -a first.
>>>>>>>
>>>>>>> then rerun test and check /var/crash for core dump.
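>>>>>>> (Something like ls -lrt /var/crash will show the newest cores at the end.)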
>>>>>>>
>>>>>>> If that fails, a truss via, say,
>>>>>>> truss -fae -o /tmp/truss.out lucreate -c ufsBE -n zfsBE -p rootpool
>>>>>>>
>>>>>>> might give an indication; look for SIGBUS in the truss log.
>>>>>>>
>>>>>>> NOTE that you might want to reset coreadm and the ulimit for core dumps
>>>>>>> after this, so as not to risk filling the system with core dumps in
>>>>>>> the case of some utility core dumping in a loop, say.
>>>>>>>
>>>>>>>
>>>>>>> Enda
>>
>> -- 
>> Enda O'Connor x19781  Software Product Engineering
>> Patch System Test : Ireland : x19781/353-1-8199718
>>
>>
>>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


-- 
Enda O'Connor x19781  Software Product Engineering
Patch System Test : Ireland : x19781/353-1-8199718
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
