I think I figured it out. It is an issue with the cpio that is on my system... I am not sure exactly why, but I copied cpio over from my Solaris 9 SPARC server, and with it lucreate completed without a bus error and the system booted up using the root zpool.
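For anyone wanting to try the same workaround, something along these lines should do it (just a sketch; "sol9server" and the ".10u6" backup name are placeholders, cpio.3_sol9 is the copy shown in the listing below, adjust paths for your own setup):

    # keep the Solaris 10 U6 cpio around so it can be put back later
    cp -p /usr/bin/cpio /usr/bin/cpio.10u6

    # pull cpio over from the Solaris 9 SPARC box and drop it in place
    scp sol9server:/usr/bin/cpio /usr/bin/cpio.3_sol9
    cp -p /usr/bin/cpio.3_sol9 /usr/bin/cpio

    # confirm which binary is in place, then rerun the Live Upgrade
    ls -la /usr/bin/cpi*
    lucreate -c ufsBE -n zfsBE -p rootpool
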
The original cpio that I have on all of my Solaris 10 U6 boxes is:

[11:04:16] @adas: /usr/bin > ls -la cpi*
-r-xr-xr-x   1 root     bin        85856 May 21 18:48 cpio

Then I copied the Solaris 9 cpio to my system:
-r-xr-xr-x   1 root     root       76956 May 14 15:46 cpio.3_sol9

So the old cpio seems to work; the new cpio on Solaris 10 U6 does not. :(

[11:03:49] [EMAIL PROTECTED]: /root > zfs list
NAME                  USED  AVAIL  REFER  MOUNTPOINT
rootpool             12.0G  54.9G    19K  /rootpool
rootpool/ROOT          18K  54.9G    18K  /rootpool/ROOT
rootpool/dump           4G  58.9G    16K  -
rootpool/swap        8.00G  62.9G    16K  -

[11:04:06] [EMAIL PROTECTED]: /root > lucreate -c ufsBE -n zfsBE -p rootpool
Analyzing system configuration.
Comparing source boot environment <ufsBE> file systems with the file
system(s) you specified for the new boot environment. Determining which
file systems should be in the new boot environment.
Updating boot environment description database on all BEs.
Updating system configuration files.
The device </dev/dsk/c1t1d0s0> is not a root device for any boot environment; cannot get BE ID.
Creating configuration for boot environment <zfsBE>.
Source boot environment is <ufsBE>.
Creating boot environment <zfsBE>.
Creating file systems on boot environment <zfsBE>.
Creating <zfs> file system for </> in zone <global> on <rootpool/ROOT/zfsBE>.
Populating file systems on boot environment <zfsBE>.
Checking selection integrity.
Integrity check OK.
Populating contents of mount point </>.
Copying.
Creating shared file system mount points.
Creating compare databases for boot environment <zfsBE>.
Creating compare database for file system </var>.
Creating compare database for file system </usr>.
Creating compare database for file system </>.
Updating compare databases on boot environment <zfsBE>.
Making boot environment <zfsBE> bootable.
Creating boot_archive for /.alt.tmp.b-tvg.mnt
updating /.alt.tmp.b-tvg.mnt/platform/sun4u/boot_archive
Population of boot environment <zfsBE> successful.
Creation of boot environment <zfsBE> successful.

[12:45:04] [EMAIL PROTECTED]: /root > lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
ufsBE                      yes      yes    yes       no     -
zfsBE                      yes      no     no        yes    -

[13:14:57] [EMAIL PROTECTED]: /root >
[13:14:59] [EMAIL PROTECTED]: /root > zfs list
NAME                  USED  AVAIL  REFER  MOUNTPOINT
rootpool             24.3G  42.6G    19K  /rootpool
rootpool/ROOT        12.3G  42.6G    18K  /rootpool/ROOT
rootpool/ROOT/zfsBE  12.3G  42.6G  12.3G  /
rootpool/dump           4G  46.6G    16K  -
rootpool/swap        8.00G  50.6G    16K  -

[13:15:25] [EMAIL PROTECTED]: /root > luactivate zfsBE
A Live Upgrade Sync operation will be performed on startup of boot environment <zfsBE>.

**********************************************************************

The target boot environment has been activated. It will be used when you
reboot. NOTE: You MUST NOT USE the reboot, halt, or uadmin commands. You
MUST USE either the init or the shutdown command when you reboot. If you
do not use either init or shutdown, the system will not boot using the
target BE.

**********************************************************************

In case of a failure while booting to the target BE, the following process
needs to be followed to fallback to the currently working boot environment:

1. Enter the PROM monitor (ok prompt).

2. Change the boot device back to the original boot environment by typing:

     setenv boot-device /[EMAIL PROTECTED],600000/[EMAIL PROTECTED]/[EMAIL PROTECTED],0:a

3.
   Boot to the original boot environment by typing:

     boot

**********************************************************************

Modifying boot archive service
Activation of boot environment <zfsBE> successful.

[13:16:57] [EMAIL PROTECTED]: /root > init 6
stopping NetWorker daemons:
 nsr_shutdown -q
svc.startd: The system is coming down.  Please wait.
svc.startd: 90 system services are now being stopped.
Nov  6 13:18:09 adas syslogd: going down on signal 15
umount: /appl busy
svc.startd: The system is down.
syncing file systems... done
rebooting...

SC Alert: Host System has Reset

Probing system devices
Probing memory
Probing I/O buses

Sun Fire V210, No Keyboard
Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.22.33, 4096 MB memory installed, Serial #64938415.
Ethernet address 0:3:ba:de:e1:af, Host ID: 83dee1af.

Rebooting with command: boot
Boot device: /[EMAIL PROTECTED],600000/[EMAIL PROTECTED]/[EMAIL PROTECTED],0:a  File and args:

SunOS Release 5.10 Version Generic_137137-09 64-bit
Copyright 1983-2008 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Hardware watchdog enabled
Hostname: adas
Configuring devices.
/dev/rdsk/c1t0d0s7 is clean
Reading ZFS config: done.
Mounting ZFS filesystems: (3/3)
Nov  6 13:22:23 squid[380]: Squid Parent: child process 383 started

adas console login: root
Password:
Nov  6 13:22:38 adas login: ROOT LOGIN /dev/console
Last login: Thu Nov  6 10:44:17 from kasiczynka.ny.p
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
You have mail.
# bash
[13:22:40] @adas: /root > df -h
Filesystem             size   used  avail capacity  Mounted on
rootpool/ROOT/zfsBE     67G    12G    43G    23%    /
/devices                 0K     0K     0K     0%    /devices
ctfs                     0K     0K     0K     0%    /system/contract
proc                     0K     0K     0K     0%    /proc
mnttab                   0K     0K     0K     0%    /etc/mnttab
swap                   7.8G   360K   7.8G     1%    /etc/svc/volatile
objfs                    0K     0K     0K     0%    /system/object
sharefs                  0K     0K     0K     0%    /etc/dfs/sharetab
/platform/sun4u-us3/lib/libc_psr/libc_psr_hwcap1.so.1
                        55G    12G    43G    23%    /platform/sun4u-us3/lib/libc_psr.so.1
/platform/sun4u-us3/lib/sparcv9/libc_psr/libc_psr_hwcap1.so.1
                        55G    12G    43G    23%    /platform/sun4u-us3/lib/sparcv9/libc_psr.so.1
fd                       0K     0K     0K     0%    /dev/fd
swap                   7.8G    72K   7.8G     1%    /tmp
swap                   7.8G    56K   7.8G     1%    /var/run
/dev/dsk/c1t0d0s7       78G   1.2G    76G     2%    /export/home
rootpool                67G    21K    43G     1%    /rootpool
rootpool/ROOT           67G    18K    43G     1%    /rootpool/ROOT
[13:22:42] @adas: /root >
starting NetWorker daemons:
 nsrexecd

On Thu, 6 Nov 2008, Enda O'Connor wrote:

> Hi
> Weird, almost like some kind of memory corruption.
>
> Could I see the upgrade logs that got you to u6, ie
> /var/sadm/system/logs/upgrade_log
> for the u6 env.
> What kind of upgrade did you do: liveupgrade, text based, etc.?
>
> Enda
>
> On 11/06/08 15:41, Krzys wrote:
>> Seems like the core.vold.* files are not being created until I try to boot
>> from zfsBE; just creating zfsBE gets only core.cpio created.
>>
>>
>> [10:29:48] @adas: /var/crash > mdb core.cpio.5545
>> Loading modules: [ libc.so.1 libavl.so.1 ld.so.1 ]
>>> ::status
>> debugging core file of cpio (32-bit) from adas
>> file: /usr/bin/cpio
>> initial argv: /usr/bin/cpio -pPcdum /.alt.tmp.b-Prb.mnt
>> threading model: multi-threaded
>> status: process terminated by SIGBUS (Bus Error)
>>> $C
>> ffbfe5b0 libc.so.1`_malloc_unlocked+0x164(30, 0, 39c28, ff, 2e2f2e2f, 0)
>> ffbfe610 libc.so.1`malloc+0x4c(30, 1, e8070, 0, ff33e3c0, ff3485b8)
>> ffbfe670 libsec.so.1`cacl_get+0x138(ffbfe7c4, 2, 0, 35bc0, 0, 35f98)
>> ffbfe768 libsec.so.1`acl_get+0x14(37fe2, 2, 35bc0, 354c0, 1000, 1)
>> ffbfe7d0 0x183b4(1, 35800, 359e8, 346b0, 34874, 34870)
>> ffbfec30 main+0x28c(34708, 1, 35bc0, 166fc, 35800, 34400)
>> ffbfec90 _start+0x108(0, 0, 0, 0, 0, 0)
>>> $r
>> %g0 = 0x00000000                          %l0 = 0x00000000
>> %g1 = 0xff25638c libc.so.1`malloc+0x44    %l1 = 0x00039c28
>> %g2 = 0x00037fe0                          %l2 = 0x2e2f2e2f
>> %g3 = 0x00008000                          %l3 = 0x000003c8
>> %g4 = 0x00000000                          %l4 = 0x2e2f2e2f
>> %g5 = 0x00000000                          %l5 = 0x00000000
>> %g6 = 0x00000000                          %l6 = 0xffffdc00
>> %g7 = 0xff382a00                          %l7 = 0xff347344 libc.so.1`Lfree
>> %o0 = 0x00000000                          %i0 = 0x00000030
>> %o1 = 0x00000000                          %i1 = 0x00000000
>> %o2 = 0x000e70c4                          %i2 = 0x00039c28
>> %o3 = 0x00000000                          %i3 = 0x000000ff
>> %o4 = 0xff33e3c0                          %i4 = 0x2e2f2e2f
>> %o5 = 0xff347344 libc.so.1`Lfree          %i5 = 0x00000000
>> %o6 = 0xffbfe5b0                          %i6 = 0xffbfe610
>> %o7 = 0xff2564a4 libc.so.1`_malloc_unlocked+0xf4 %i7 = 0xff256394 libc.so.1`malloc+0x4c
>>
>> %psr = 0xfe001002 impl=0xf ver=0xe icc=nzvc
>>        ec=0 ef=4096 pil=0 s=0 ps=0 et=0 cwp=0x2
>> %y = 0x00000000
>> %pc = 0xff256514 libc.so.1`_malloc_unlocked+0x164
>> %npc = 0xff2564d8 libc.so.1`_malloc_unlocked+0x128
>> %sp = 0xffbfe5b0
>> %fp = 0xffbfe610
>>
>> %wim = 0x00000000
>> %tbr = 0x00000000
>>
>>
>> On Thu, 6 Nov 2008, Enda O'Connor wrote:
>>
>>> Hi
>>> try and get the stack trace from the core, ie
>>>   mdb core.vold.24978
>>>   ::status
>>>   $C
>>>   $r
>>>
>>> also run the same 3 mdb commands on the cpio core dump.
>>>
>>> also if you could extract some data from the truss log, ie a few hundred
>>> lines before the first SIGBUS
>>>
>>> Enda
>>>
>>> On 11/06/08 01:25, Krzys wrote:
>>>> This is so bizarre; I am unable to get past this problem. I thought I did
>>>> not have enough space on my hard drive (the new one), so I replaced it
>>>> with a 72gb drive, but I am still getting that bus error. Originally,
>>>> when I restarted my server it did not want to boot, so I had to power it
>>>> off and then back on, and it then booted up. But I constantly get this
>>>> "Bus Error - core dumped".
>>>>
>>>> Anyway, in my /var/crash I see hundreds of core.vold files and 3 core.cpio
>>>> files. I would imagine the core.cpio ones are a direct result of what I am
>>>> probably experiencing.
>>>>
>>>> -rw-------   1 root     root     4126301 Nov  5 19:22 core.vold.24854
>>>> -rw-------   1 root     root     4126301 Nov  5 19:22 core.vold.24867
>>>> -rw-------   1 root     root     4126301 Nov  5 19:22 core.vold.24880
>>>> -rw-------   1 root     root     4126301 Nov  5 19:22 core.vold.24893
>>>> -rw-------   1 root     root     4126301 Nov  5 19:22 core.vold.24906
>>>> -rw-------   1 root     root     4126301 Nov  5 19:22 core.vold.24919
>>>> -rw-------   1 root     root     4126301 Nov  5 19:22 core.vold.24932
>>>> -rw-------   1 root     root     4126301 Nov  5 19:22 core.vold.24950
>>>> -rw-------   1 root     root     4126301 Nov  5 19:22 core.vold.24978
>>>> drwxr-xr-x   3 root     root       81408 Nov  5 20:06 .
>>>> -rw-------   1 root     root    31351099 Nov  5 20:06 core.cpio.6208
>>>>
>>>>
>>>> On Wed, 5 Nov 2008, Enda O'Connor wrote:
>>>>
>>>>> Hi
>>>>> Looks ok, some mounts left over from a previous fail.
>>>>> In regards to swap and dump on the zpool, you can set them:
>>>>>   zfs set volsize=1G rootpool/dump
>>>>>   zfs set volsize=1G rootpool/swap
>>>>>
>>>>> for instance; of course the above are only an example of how to do it.
>>>>> Or make the zvols for rootpool/dump etc. before lucreate, in which case
>>>>> it will take the swap and dump sizes you have preset.
>>>>>
>>>>> But I think we need to see the coredump/truss at this point to get an
>>>>> idea of where things went wrong.
>>>>> Enda
>>>>>
>>>>> On 11/05/08 15:38, Krzys wrote:
>>>>>> I upgraded my U5 to U6 from DVD and went through the upgrade process.
>>>>>> My file system is set up as follows:
>>>>>> [10:11:54] [EMAIL PROTECTED]: /root > df -h | egrep -v
>>>>>> "platform|sharefs|objfs|mnttab|proc|ctfs|devices|fd|nsr"
>>>>>> Filesystem             size   used  avail capacity  Mounted on
>>>>>> /dev/dsk/c1t0d0s0       16G   7.2G   8.4G    47%    /
>>>>>> swap                   8.3G   1.5M   8.3G     1%    /etc/svc/volatile
>>>>>> /dev/dsk/c1t0d0s6       16G   8.7G   6.9G    56%    /usr
>>>>>> /dev/dsk/c1t0d0s1       16G   2.5G    13G    17%    /var
>>>>>> swap                   8.5G   229M   8.3G     3%    /tmp
>>>>>> swap                   8.3G    40K   8.3G     1%    /var/run
>>>>>> /dev/dsk/c1t0d0s7       78G   1.2G    76G     2%    /export/home
>>>>>> rootpool                33G    19K    21G     1%    /rootpool
>>>>>> rootpool/ROOT           33G    18K    21G     1%    /rootpool/ROOT
>>>>>> rootpool/ROOT/zfsBE     33G    31M    21G     1%    /.alt.tmp.b-UUb.mnt
>>>>>> /export/home            78G   1.2G    76G     2%    /.alt.tmp.b-UUb.mnt/export/home
>>>>>> /rootpool               21G    19K    21G     1%    /.alt.tmp.b-UUb.mnt/rootpool
>>>>>> /rootpool/ROOT          21G    18K    21G     1%    /.alt.tmp.b-UUb.mnt/rootpool/ROOT
>>>>>> swap                   8.3G     0K   8.3G     0%    /.alt.tmp.b-UUb.mnt/var/run
>>>>>> swap                   8.3G     0K   8.3G     0%    /.alt.tmp.b-UUb.mnt/tmp
>>>>>> [10:12:00] [EMAIL PROTECTED]: /root >
>>>>>>
>>>>>>
>>>>>> So I have /, /usr, /var and /export/home on that primary disk. The
>>>>>> original disk is 140gb; this new one is only 36gb, but the primary disk
>>>>>> is much less utilized, so it should easily fit.
>>>>>>
>>>>>> /             7.2GB
>>>>>> /usr          8.7GB
>>>>>> /var          2.5GB
>>>>>> /export/home  1.2GB
>>>>>> total space  19.6GB
>>>>>> I did notice that lucreate allocated 8GB to SWAP and 4GB to DUMP, so the
>>>>>> total space needed is 31.6GB. The total available space on my disk
>>>>>> should be 33.92GB, so the two numbers are quite close. To make sure, I
>>>>>> will change the disk to a 72gb one and will try again. I do not believe
>>>>>> that I need to match my main disk size of 146gb, as I am not using that
>>>>>> much disk space on it. But let me try this; it might be why I am getting
>>>>>> this problem...
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, 5 Nov 2008, Enda O'Connor wrote:
>>>>>>
>>>>>>> Hi Krzys
>>>>>>> Also some info on the actual system,
>>>>>>> ie what it was upgraded to u6 from and how,
>>>>>>> and an idea of how the filesystems are laid out, ie is usr separate
>>>>>>> from / and so on (maybe a df -k). You don't appear to have any zones
>>>>>>> installed, just to confirm.
>>>>>>> Enda
>>>>>>>
>>>>>>> On 11/05/08 14:07, Enda O'Connor wrote:
>>>>>>>> Hi
>>>>>>>> did you get a core dump?
>>>>>>>> It would be nice to see the core file to get an idea of what dumped
>>>>>>>> core; you might configure coreadm if not already done.
>>>>>>>> Run coreadm first; if the output looks like
>>>>>>>>
>>>>>>>> # coreadm
>>>>>>>>     global core file pattern: /var/crash/core.%f.%p
>>>>>>>>     global core file content: default
>>>>>>>>       init core file pattern: core
>>>>>>>>       init core file content: default
>>>>>>>>            global core dumps: enabled
>>>>>>>>       per-process core dumps: enabled
>>>>>>>>      global setid core dumps: enabled
>>>>>>>> per-process setid core dumps: disabled
>>>>>>>>     global core dump logging: enabled
>>>>>>>>
>>>>>>>> then all should be good, and cores should appear in /var/crash.
>>>>>>>>
>>>>>>>> Otherwise the following should configure coreadm:
>>>>>>>>   coreadm -g /var/crash/core.%f.%p
>>>>>>>>   coreadm -G all
>>>>>>>>   coreadm -e global
>>>>>>>>   coreadm -e per-process
>>>>>>>>
>>>>>>>> coreadm -u to load the new settings without rebooting.
>>>>>>>>
>>>>>>>> You also might need to set the size of the core dump via
>>>>>>>>   ulimit -c unlimited
>>>>>>>> check ulimit -a first.
>>>>>>>>
>>>>>>>> Then rerun the test and check /var/crash for the core dump.
>>>>>>>>
>>>>>>>> If that fails, a truss via say
>>>>>>>>   truss -fae -o /tmp/truss.out lucreate -c ufsBE -n zfsBE -p rootpool
>>>>>>>> might give an indication; look for SIGBUS in the truss log.
>>>>>>>>
>>>>>>>> NOTE that you might want to reset the coreadm and ulimit settings for
>>>>>>>> coredumps after this, in order to not risk filling the system with
>>>>>>>> coredumps in case some utility coredumps in a loop, say.
>>>>>>>>
>>>>>>>> Enda
>>>
>>> --
>>> Enda O'Connor x19781 Software Product Engineering
>>> Patch System Test : Ireland : x19781/353-1-8199718
>>>
>>>
>> _______________________________________________
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>
> --
> Enda O'Connor x19781 Software Product Engineering
> Patch System Test : Ireland : x19781/353-1-8199718
>
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss