I think I figured it out. It is an issue with the cpio that is on my system... I am not sure exactly why, but I copied cpio over from my Solaris 9 SPARC server, and with it lucreate completed without a bus error and the system booted up using the root zpool.
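For anyone wanting to try the same workaround, something along these lines should do it (just a sketch; "sol9server" and the ".10u6" backup name are placeholders, cpio.3_sol9 is the copy shown in the listing below, adjust paths for your own setup):

    # keep the Solaris 10 U6 cpio around so it can be put back later
    cp -p /usr/bin/cpio /usr/bin/cpio.10u6

    # pull cpio over from the Solaris 9 SPARC box and drop it in place
    scp sol9server:/usr/bin/cpio /usr/bin/cpio.3_sol9
    cp -p /usr/bin/cpio.3_sol9 /usr/bin/cpio

    # confirm which binary is in place, then rerun the Live Upgrade
    ls -la /usr/bin/cpi*
    lucreate -c ufsBE -n zfsBE -p rootpool
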
The original cpio that I have on all of my Solaris 10 U6 boxes is:

[11:04:16] @adas: /usr/bin > ls -la cpi*
-r-xr-xr-x   1 root     bin        85856 May 21 18:48 cpio

Then I copied the Solaris 9 cpio to my system:
-r-xr-xr-x   1 root     root       76956 May 14 15:46 cpio.3_sol9

So the old cpio seems to work; the new cpio on Solaris 10 U6 does not. :(

[11:03:49] [EMAIL PROTECTED]: /root > zfs list
NAME                  USED  AVAIL  REFER  MOUNTPOINT
rootpool             12.0G  54.9G    19K  /rootpool
rootpool/ROOT          18K  54.9G    18K  /rootpool/ROOT
rootpool/dump           4G  58.9G    16K  -
rootpool/swap        8.00G  62.9G    16K  -

[11:04:06] [EMAIL PROTECTED]: /root > lucreate -c ufsBE -n zfsBE -p rootpool
Analyzing system configuration.
Comparing source boot environment <ufsBE> file systems with the file
system(s) you specified for the new boot environment. Determining which
file systems should be in the new boot environment.
Updating boot environment description database on all BEs.
Updating system configuration files.
The device </dev/dsk/c1t1d0s0> is not a root device for any boot environment; cannot get BE ID.
Creating configuration for boot environment <zfsBE>.
Source boot environment is <ufsBE>.
Creating boot environment <zfsBE>.
Creating file systems on boot environment <zfsBE>.
Creating <zfs> file system for </> in zone <global> on <rootpool/ROOT/zfsBE>.
Populating file systems on boot environment <zfsBE>.
Checking selection integrity.
Integrity check OK.
Populating contents of mount point </>.
Copying.
Creating shared file system mount points.
Creating compare databases for boot environment <zfsBE>.
Creating compare database for file system </var>.
Creating compare database for file system </usr>.
Creating compare database for file system </>.
Updating compare databases on boot environment <zfsBE>.
Making boot environment <zfsBE> bootable.
Creating boot_archive for /.alt.tmp.b-tvg.mnt
updating /.alt.tmp.b-tvg.mnt/platform/sun4u/boot_archive
Population of boot environment <zfsBE> successful.
Creation of boot environment <zfsBE> successful.

[12:45:04] [EMAIL PROTECTED]: /root > lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
ufsBE                      yes      yes    yes       no     -
zfsBE                      yes      no     no        yes    -

[13:14:57] [EMAIL PROTECTED]: /root >
[13:14:59] [EMAIL PROTECTED]: /root > zfs list
NAME                  USED  AVAIL  REFER  MOUNTPOINT
rootpool             24.3G  42.6G    19K  /rootpool
rootpool/ROOT        12.3G  42.6G    18K  /rootpool/ROOT
rootpool/ROOT/zfsBE  12.3G  42.6G  12.3G  /
rootpool/dump           4G  46.6G    16K  -
rootpool/swap        8.00G  50.6G    16K  -

[13:15:25] [EMAIL PROTECTED]: /root > luactivate zfsBE
A Live Upgrade Sync operation will be performed on startup of boot environment <zfsBE>.

**********************************************************************

The target boot environment has been activated. It will be used when you
reboot. NOTE: You MUST NOT USE the reboot, halt, or uadmin commands. You
MUST USE either the init or the shutdown command when you reboot. If you
do not use either init or shutdown, the system will not boot using the
target BE.

**********************************************************************

In case of a failure while booting to the target BE, the following process
needs to be followed to fallback to the currently working boot environment:

1. Enter the PROM monitor (ok prompt).

2. Change the boot device back to the original boot environment by typing:

     setenv boot-device /[EMAIL PROTECTED],600000/[EMAIL PROTECTED]/[EMAIL PROTECTED],0:a

3.
   Boot to the original boot environment by typing:

     boot

**********************************************************************

Modifying boot archive service
Activation of boot environment <zfsBE> successful.

[13:16:57] [EMAIL PROTECTED]: /root > init 6
stopping NetWorker daemons:
 nsr_shutdown -q
svc.startd: The system is coming down.  Please wait.
svc.startd: 90 system services are now being stopped.
Nov  6 13:18:09 adas syslogd: going down on signal 15
umount: /appl busy
svc.startd: The system is down.
syncing file systems... done
rebooting...

SC Alert: Host System has Reset

Probing system devices
Probing memory
Probing I/O buses

Sun Fire V210, No Keyboard
Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.22.33, 4096 MB memory installed, Serial #64938415.
Ethernet address 0:3:ba:de:e1:af, Host ID: 83dee1af.

Rebooting with command: boot
Boot device: /[EMAIL PROTECTED],600000/[EMAIL PROTECTED]/[EMAIL PROTECTED],0:a  File and args:

SunOS Release 5.10 Version Generic_137137-09 64-bit
Copyright 1983-2008 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Hardware watchdog enabled
Hostname: adas
Configuring devices.
/dev/rdsk/c1t0d0s7 is clean
Reading ZFS config: done.
Mounting ZFS filesystems: (3/3)
Nov  6 13:22:23 squid[380]: Squid Parent: child process 383 started

adas console login: root
Password:
Nov  6 13:22:38 adas login: ROOT LOGIN /dev/console
Last login: Thu Nov  6 10:44:17 from kasiczynka.ny.p
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
You have mail.
# bash
[13:22:40] @adas: /root > df -h
Filesystem             size   used  avail capacity  Mounted on
rootpool/ROOT/zfsBE     67G    12G    43G    23%    /
/devices                 0K     0K     0K     0%    /devices
ctfs                     0K     0K     0K     0%    /system/contract
proc                     0K     0K     0K     0%    /proc
mnttab                   0K     0K     0K     0%    /etc/mnttab
swap                   7.8G   360K   7.8G     1%    /etc/svc/volatile
objfs                    0K     0K     0K     0%    /system/object
sharefs                  0K     0K     0K     0%    /etc/dfs/sharetab
/platform/sun4u-us3/lib/libc_psr/libc_psr_hwcap1.so.1
                        55G    12G    43G    23%    /platform/sun4u-us3/lib/libc_psr.so.1
/platform/sun4u-us3/lib/sparcv9/libc_psr/libc_psr_hwcap1.so.1
                        55G    12G    43G    23%    /platform/sun4u-us3/lib/sparcv9/libc_psr.so.1
fd                       0K     0K     0K     0%    /dev/fd
swap                   7.8G    72K   7.8G     1%    /tmp
swap                   7.8G    56K   7.8G     1%    /var/run
/dev/dsk/c1t0d0s7       78G   1.2G    76G     2%    /export/home
rootpool                67G    21K    43G     1%    /rootpool
rootpool/ROOT           67G    18K    43G     1%    /rootpool/ROOT
[13:22:42] @adas: /root >
starting NetWorker daemons:
 nsrexecd

On Thu, 6 Nov 2008, Enda O'Connor wrote:

> Hi
> Weird, almost like some kind of memory corruption.
>
> Could I see the upgrade logs that got you to u6, ie
> /var/sadm/system/logs/upgrade_log
> for the u6 env.
> What kind of upgrade did you do: liveupgrade, text based, etc.?
>
> Enda
>
> On 11/06/08 15:41, Krzys wrote:
>> Seems like the core.vold.* files are not being created until I try to boot
>> from zfsBE; just creating zfsBE gets only core.cpio created.
>>
>>
>> [10:29:48] @adas: /var/crash > mdb core.cpio.5545
>> Loading modules: [ libc.so.1 libavl.so.1 ld.so.1 ]
>>> ::status
>> debugging core file of cpio (32-bit) from adas
>> file: /usr/bin/cpio
>> initial argv: /usr/bin/cpio -pPcdum /.alt.tmp.b-Prb.mnt
>> threading model: multi-threaded
>> status: process terminated by SIGBUS (Bus Error)
>>> $C
>> ffbfe5b0 libc.so.1`_malloc_unlocked+0x164(30, 0, 39c28, ff, 2e2f2e2f, 0)
>> ffbfe610 libc.so.1`malloc+0x4c(30, 1, e8070, 0, ff33e3c0, ff3485b8)
>> ffbfe670 libsec.so.1`cacl_get+0x138(ffbfe7c4, 2, 0, 35bc0, 0, 35f98)
>> ffbfe768 libsec.so.1`acl_get+0x14(37fe2, 2, 35bc0, 354c0, 1000, 1)
>> ffbfe7d0 0x183b4(1, 35800, 359e8, 346b0, 34874, 34870)
>> ffbfec30 main+0x28c(34708, 1, 35bc0, 166fc, 35800, 34400)
>> ffbfec90 _start+0x108(0, 0, 0, 0, 0, 0)
>>> $r
>> %g0 = 0x00000000                          %l0 = 0x00000000
>> %g1 = 0xff25638c libc.so.1`malloc+0x44    %l1 = 0x00039c28
>> %g2 = 0x00037fe0                          %l2 = 0x2e2f2e2f
>> %g3 = 0x00008000                          %l3 = 0x000003c8
>> %g4 = 0x00000000                          %l4 = 0x2e2f2e2f
>> %g5 = 0x00000000                          %l5 = 0x00000000
>> %g6 = 0x00000000                          %l6 = 0xffffdc00
>> %g7 = 0xff382a00                          %l7 = 0xff347344 libc.so.1`Lfree
>> %o0 = 0x00000000                          %i0 = 0x00000030
>> %o1 = 0x00000000                          %i1 = 0x00000000
>> %o2 = 0x000e70c4                          %i2 = 0x00039c28
>> %o3 = 0x00000000                          %i3 = 0x000000ff
>> %o4 = 0xff33e3c0                          %i4 = 0x2e2f2e2f
>> %o5 = 0xff347344 libc.so.1`Lfree          %i5 = 0x00000000
>> %o6 = 0xffbfe5b0                          %i6 = 0xffbfe610
>> %o7 = 0xff2564a4 libc.so.1`_malloc_unlocked+0xf4 %i7 = 0xff256394 libc.so.1`malloc+0x4c
>>
>> %psr = 0xfe001002 impl=0xf ver=0xe icc=nzvc
>>        ec=0 ef=4096 pil=0 s=0 ps=0 et=0 cwp=0x2
>> %y = 0x00000000
>> %pc = 0xff256514 libc.so.1`_malloc_unlocked+0x164
>> %npc = 0xff2564d8 libc.so.1`_malloc_unlocked+0x128
>> %sp = 0xffbfe5b0
>> %fp = 0xffbfe610
>>
>> %wim = 0x00000000
>> %tbr = 0x00000000
>>
>>
>> On Thu, 6 Nov 2008, Enda O'Connor wrote:
>>
>>> Hi
>>> try and get the stack trace from the core, ie
>>>   mdb core.vold.24978
>>>   ::status
>>>   $C
>>>   $r
>>>
>>> also run the same 3 mdb commands on the cpio core dump.
>>>
>>> also if you could extract some data from the truss log, ie a few hundred
>>> lines before the first SIGBUS
>>>
>>> Enda
>>>
>>> On 11/06/08 01:25, Krzys wrote:
>>>> This is so bizarre; I am unable to get past this problem. I thought I did
>>>> not have enough space on my hard drive (the new one), so I replaced it
>>>> with a 72gb drive, but I am still getting that bus error. Originally,
>>>> when I restarted my server it did not want to boot, so I had to power it
>>>> off and then back on, and it then booted up. But I constantly get this
>>>> "Bus Error - core dumped".
>>>>
>>>> Anyway, in my /var/crash I see hundreds of core.vold files and 3 core.cpio
>>>> files. I would imagine the core.cpio ones are a direct result of what I am
>>>> probably experiencing.
>>>>
>>>> -rw-------   1 root     root     4126301 Nov  5 19:22 core.vold.24854
>>>> -rw-------   1 root     root     4126301 Nov  5 19:22 core.vold.24867
>>>> -rw-------   1 root     root     4126301 Nov  5 19:22 core.vold.24880
>>>> -rw-------   1 root     root     4126301 Nov  5 19:22 core.vold.24893
>>>> -rw-------   1 root     root     4126301 Nov  5 19:22 core.vold.24906
>>>> -rw-------   1 root     root     4126301 Nov  5 19:22 core.vold.24919
>>>> -rw-------   1 root     root     4126301 Nov  5 19:22 core.vold.24932
>>>> -rw-------   1 root     root     4126301 Nov  5 19:22 core.vold.24950
>>>> -rw-------   1 root     root     4126301 Nov  5 19:22 core.vold.24978
>>>> drwxr-xr-x   3 root     root       81408 Nov  5 20:06 .
>>>> -rw-------   1 root     root    31351099 Nov  5 20:06 core.cpio.6208
>>>>
>>>>
>>>> On Wed, 5 Nov 2008, Enda O'Connor wrote:
>>>>
>>>>> Hi
>>>>> Looks ok, some mounts left over from a previous fail.
>>>>> In regards to swap and dump on the zpool, you can set them:
>>>>>   zfs set volsize=1G rootpool/dump
>>>>>   zfs set volsize=1G rootpool/swap
>>>>>
>>>>> for instance; of course the above are only an example of how to do it.
>>>>> Or make the zvols for rootpool/dump etc. before lucreate, in which case
>>>>> it will take the swap and dump sizes you have preset.
>>>>>
>>>>> But I think we need to see the coredump/truss at this point to get an
>>>>> idea of where things went wrong.
>>>>> Enda
>>>>>
>>>>> On 11/05/08 15:38, Krzys wrote:
>>>>>> I upgraded my U5 to U6 from DVD and went through the upgrade process.
>>>>>> My file system is set up as follows:
>>>>>> [10:11:54] [EMAIL PROTECTED]: /root > df -h | egrep -v
>>>>>> "platform|sharefs|objfs|mnttab|proc|ctfs|devices|fd|nsr"
>>>>>> Filesystem             size   used  avail capacity  Mounted on
>>>>>> /dev/dsk/c1t0d0s0       16G   7.2G   8.4G    47%    /
>>>>>> swap                   8.3G   1.5M   8.3G     1%    /etc/svc/volatile
>>>>>> /dev/dsk/c1t0d0s6       16G   8.7G   6.9G    56%    /usr
>>>>>> /dev/dsk/c1t0d0s1       16G   2.5G    13G    17%    /var
>>>>>> swap                   8.5G   229M   8.3G     3%    /tmp
>>>>>> swap                   8.3G    40K   8.3G     1%    /var/run
>>>>>> /dev/dsk/c1t0d0s7       78G   1.2G    76G     2%    /export/home
>>>>>> rootpool                33G    19K    21G     1%    /rootpool
>>>>>> rootpool/ROOT           33G    18K    21G     1%    /rootpool/ROOT
>>>>>> rootpool/ROOT/zfsBE     33G    31M    21G     1%    /.alt.tmp.b-UUb.mnt
>>>>>> /export/home            78G   1.2G    76G     2%    /.alt.tmp.b-UUb.mnt/export/home
>>>>>> /rootpool               21G    19K    21G     1%    /.alt.tmp.b-UUb.mnt/rootpool
>>>>>> /rootpool/ROOT          21G    18K    21G     1%    /.alt.tmp.b-UUb.mnt/rootpool/ROOT
>>>>>> swap                   8.3G     0K   8.3G     0%    /.alt.tmp.b-UUb.mnt/var/run
>>>>>> swap                   8.3G     0K   8.3G     0%    /.alt.tmp.b-UUb.mnt/tmp
>>>>>> [10:12:00] [EMAIL PROTECTED]: /root >
>>>>>>
>>>>>>
>>>>>> So I have /, /usr, /var and /export/home on that primary disk. The
>>>>>> original disk is 140gb; this new one is only 36gb, but the primary disk
>>>>>> is much less utilized, so it should easily fit.
>>>>>>
>>>>>> /             7.2GB
>>>>>> /usr          8.7GB
>>>>>> /var          2.5GB
>>>>>> /export/home  1.2GB
>>>>>> total space  19.6GB
>>>>>> I did notice that lucreate allocated 8GB to SWAP and 4GB to DUMP, so the
>>>>>> total space needed is 31.6GB. The total available space on my disk
>>>>>> should be 33.92GB, so the two numbers are quite close. To make sure, I
>>>>>> will change the disk to a 72gb one and will try again. I do not believe
>>>>>> that I need to match my main disk size of 146gb, as I am not using that
>>>>>> much disk space on it. But let me try this; it might be why I am getting
>>>>>> this problem...
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, 5 Nov 2008, Enda O'Connor wrote:
>>>>>>
>>>>>>> Hi Krzys
>>>>>>> Also some info on the actual system,
>>>>>>> ie what it was upgraded to u6 from and how,
>>>>>>> and an idea of how the filesystems are laid out, ie is usr separate
>>>>>>> from / and so on (maybe a df -k). You don't appear to have any zones
>>>>>>> installed, just to confirm.
>>>>>>> Enda
>>>>>>>
>>>>>>> On 11/05/08 14:07, Enda O'Connor wrote:
>>>>>>>> Hi
>>>>>>>> did you get a core dump?
>>>>>>>> It would be nice to see the core file to get an idea of what dumped
>>>>>>>> core; you might configure coreadm if not already done.
>>>>>>>> Run coreadm first; if the output looks like
>>>>>>>>
>>>>>>>> # coreadm
>>>>>>>>     global core file pattern: /var/crash/core.%f.%p
>>>>>>>>     global core file content: default
>>>>>>>>       init core file pattern: core
>>>>>>>>       init core file content: default
>>>>>>>>            global core dumps: enabled
>>>>>>>>       per-process core dumps: enabled
>>>>>>>>      global setid core dumps: enabled
>>>>>>>> per-process setid core dumps: disabled
>>>>>>>>     global core dump logging: enabled
>>>>>>>>
>>>>>>>> then all should be good, and cores should appear in /var/crash.
>>>>>>>>
>>>>>>>> Otherwise the following should configure coreadm:
>>>>>>>>   coreadm -g /var/crash/core.%f.%p
>>>>>>>>   coreadm -G all
>>>>>>>>   coreadm -e global
>>>>>>>>   coreadm -e per-process
>>>>>>>>
>>>>>>>> coreadm -u to load the new settings without rebooting.
>>>>>>>>
>>>>>>>> You also might need to set the size of the core dump via
>>>>>>>>   ulimit -c unlimited
>>>>>>>> check ulimit -a first.
>>>>>>>>
>>>>>>>> Then rerun the test and check /var/crash for the core dump.
>>>>>>>>
>>>>>>>> If that fails, a truss via say
>>>>>>>>   truss -fae -o /tmp/truss.out lucreate -c ufsBE -n zfsBE -p rootpool
>>>>>>>> might give an indication; look for SIGBUS in the truss log.
>>>>>>>>
>>>>>>>> NOTE that you might want to reset the coreadm and ulimit settings for
>>>>>>>> coredumps after this, in order to not risk filling the system with
>>>>>>>> coredumps in case some utility coredumps in a loop, say.
>>>>>>>>
>>>>>>>> Enda
>>>
>>> --
>>> Enda O'Connor x19781 Software Product Engineering
>>> Patch System Test : Ireland : x19781/353-1-8199718
>>>
>>>
>> _______________________________________________
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>
> --
> Enda O'Connor x19781 Software Product Engineering
> Patch System Test : Ireland : x19781/353-1-8199718
>
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss