On Wed, 6 Jun 2007, Otto Moerbeek wrote:

> On Wed, 6 Jun 2007, Markus Lude wrote:
> 
> > On Tue, Jun 05, 2007 at 07:51:48AM +0200, Otto Moerbeek wrote:
> > > 
> > > On Tue, 5 Jun 2007, Markus Lude wrote:
> > > 
> > > > On Mon, Jun 04, 2007 at 06:02:59PM -0500, Emilio Perea wrote:
> > > > > I follow -current on an i386 at work and an amd64 at home, and rarely
> > > > > run into any problem which is not self-inflicted.  So when I had a weird
> > > > > experience this weekend, I assumed it was my fault.
> > > > > 
> > > > > What happened was that after the usual sequence of [build kernel;
> > > > > reboot; build userland; reboot] the system complained that it could not
> > > > > fsck wd1j and dropped into single-user mode.  wd1j is mounted on
> > > > > /usr/obj, and I thought that something in the last build had messed it
> > > > > up, so I ran "newfs wd1j" and got 
> > > > > 
> > > > >  newfs: /dev/rwd1j: Device not configured
> > > > > 
> > > > > "disklabel wd1" showed partitions d-i and k-p, but no j.  I added the
> > > > > partition, ran newfs, and everything seemed fine.  This afternoon I
> > > > > installed the i386 snapshot downloaded this morning (dated Jun 3 19:19)
> > > > > on the work pc, and after reboot it was missing the /usr/obj partition
> > > > > (sd0g in this case).
> > > > > 
> > > > > Everything seems to be working fine on both computers, but I didn't
> > > > > expect the partitions to disappear.  Did nobody else run into this
> > > > > "problem"?  Or did everybody else who saw it think it was too obvious
> > > > > to mention it to the mailing list?
> > > > 
> > > > I had a similar problem on sparc64 with a snapshot from jun 2. The
> > > > system was unable to fsck some partitions and dropped to single user
> > > > mode.
> > > > Here the problems were with the /usr, /var, /tmp and /home partitions.
> > > > Some further (and larger) partitions weren't affected.
> > > > 
> > > > I installed an older snapshot.
> > > > 
> > > > Any suggestions how to get this fixed or what to test/try?
> > > 
> > > There were some validation checks added to partitions. If a bad
> > > partition is found, it will be marked "unused". The checks were a
> > > little too strict for some cases. A fix for that went in yesterday, so
> > > try a new snap.
> > 
> > Thanks for your info.
> > 
> > After rebuilding kernel and userland the problem still exists, but now
> > the affected partitions are /var, /home and /data. Hmm. Unmounting /data
> > and doing a manual fsck -f runs without problems.
> > 
> > > If the problem persists, please report with full disklabel output.
> > 
> > $ cat /etc/fstab
> > /dev/wd0a / ffs rw 1 1
> > /dev/wd0d /tmp ffs rw,nodev,nosuid 1 2
> > /dev/wd0e /usr ffs rw,nodev 1 2
> > /dev/wd0f /var ffs rw,nodev,nosuid 1 2
> > /dev/wd0g /home ffs rw,nodev,nosuid 1 2
> > /dev/wd0h /data ffs rw,nodev,nosuid 1 2
> > /dev/wd1d /backup ffs rw,nodev,nosuid 1 2
> > 
> > with a current kernel:
> > 
> > $ sudo disklabel wd0
> > # /dev/rwd0c:
> > type: ESDI
> > disk: ESDI/IDE disk
> > label: ST3120213A      
> > flags:
> > bytes/sector: 512
> > sectors/track: 63
> > tracks/cylinder: 16
> > sectors/cylinder: 1008
> > cylinders: 16383
> > total sectors: 16514064
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> 1008 * 16383 = 16514064
> 
> > rpm: 3600
> > interleave: 1
> > trackskew: 0
> > cylinderskew: 0
> > headswitch: 0           # microseconds
> > track-to-track seek: 0  # microseconds
> > drivedata: 0 
> > 
> > 16 partitions:
> > #             size        offset  fstype [fsize bsize  cpg]
> >   a:       1024128             0  4.2BSD   2048 16384   16 # Cyl     0 -  1015
> >   b:       3072384       1024128    swap                   # Cyl  1016 -  4063
> >   c:     234441648             0  unused      0     0      # Cyl     0 -232580
> ^^^^^^^^^^^^^^^^^^^^^
> 
> Your disk size and c partition size do not match. Can you send a
> dmesg, to see what the actual size of your disk is? This is really
> needed to see what is going on.
> 
> Did you at any time edit the disk size by hand?
> 
> >   d:       2048256       4096512  4.2BSD   2048 16384   16 # Cyl  4064 -  6095
> >   e:      20479536       6144768  4.2BSD   2048 16384   16 # Cyl  6096 - 26412
> > disklabel: partition c: partition extends past end of unit
> > disklabel: partition e: partition extends past end of unit
> > 
> > older kernel:
> > $ sudo disklabel wd0
> > [...]
> > 16 partitions:
> > #             size        offset  fstype [fsize bsize  cpg]
> >   a:       1024128             0  4.2BSD      0     0   16 # Cyl     0 -  1015
> >   b:       3072384       1024128    swap                   # Cyl  1016 -  4063
> >   c:     234441648             0  unused      0     0      # Cyl     0 -232580
> >   d:       2048256       4096512  4.2BSD      0     0   16 # Cyl  4064 -  6095
> >   e:      20479536       6144768  4.2BSD      0     0   16 # Cyl  6096 - 26412
> >   f:       4095504      26624304  4.2BSD      0     0   16 # Cyl 26413 - 30475
> >   g:      20479536      30719808  4.2BSD      0     0   16 # Cyl 30476 - 50792
> >   h:     183242304      51199344  4.2BSD      0     0   16 # Cyl 50793 -232580
> > disklabel: partition c: partition extends past end of unit
> > disklabel: partition e: partition extends past end of unit
> > disklabel: partition f: offset past end of unit
> > disklabel: partition f: partition extends past end of unit
> > disklabel: partition g: offset past end of unit
> > disklabel: partition g: partition extends past end of unit
> > disklabel: partition h: offset past end of unit
> > disklabel: partition h: partition extends past end of unit
> > 
> > Any hints on how to fix this besides repartitioning and reinstalling?
> 
> If possible, please leave the disk as is, until we've done further
> diagnosis.  If that is not possible, you can use the 'e' command in
> disklabel, to set the actual size of the disk to the size (in sectors)
> reported in the dmesg.  You might need to adjust the 'c' partition as
> well. 

After having seen your dmesg, I see that your disk size is really
234441648 sectors. The disklabel says 16514064 though.  The new
consistency checks did not like that. The consistency checks have been
disabled in two steps (rev 1.44 and rev 1.66 of
sys/kern/subr_disk.c). So a current kernel should not trip on this
anymore. 
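
To illustrate the mismatch, here is a minimal Python sketch that replays
the partition table quoted above against both disk sizes. It is not the
kernel or disklabel code; the function name check_label and the two
comparisons are assumptions chosen so the messages resemble the
disklabel warnings shown in the quoted output.

```python
# Partition table from the quoted label: name -> (size, offset), in sectors.
PARTITIONS = {
    "a": (1024128, 0),
    "b": (3072384, 1024128),
    "c": (234441648, 0),
    "d": (2048256, 4096512),
    "e": (20479536, 6144768),
    "f": (4095504, 26624304),
    "g": (20479536, 30719808),
    "h": (183242304, 51199344),
}

LABEL_SIZE = 1008 * 16383   # sectors/cylinder * cylinders, as stored in the label
DMESG_SIZE = 234441648      # actual disk size in sectors, as reported in the dmesg


def check_label(total_sectors, partitions):
    """Return disklabel-style warnings for partitions that do not fit.

    Hypothetical helper: the comparisons are an assumption about what the
    check looks like, not a copy of the real implementation.
    """
    warnings = []
    for name, (size, offset) in sorted(partitions.items()):
        if offset > total_sectors:
            warnings.append(f"partition {name}: offset past end of unit")
        if offset + size > total_sectors:
            warnings.append(f"partition {name}: partition extends past end of unit")
    return warnings


if __name__ == "__main__":
    print("label size:", LABEL_SIZE)
    for line in check_label(LABEL_SIZE, PARTITIONS):
        print(line)
    print("warnings with dmesg size:", check_label(DMESG_SIZE, PARTITIONS))
```

With the size stored in the label, partitions c and e through h fail the
check; with the size from the dmesg, every partition fits.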

There remain two questions: how did the size end up being wrong in the
disklabel, and how to repair.

To the first question I can only guess; it could be you dd'ed an image
from another disk, you edited the size by hand, or we are seeing the
results of an (old?) bug in disklabel handling that now surfaced
because of the consistency checks.

The second question I already answered: using the 'e' command in
disklabel lets you set the size of the disk in the label. After that,
things should be back to normal.

Let us know how it goes.

        -Otto
