On Sun, Feb 13, 2011 at 11:30:56PM +1100, Bruce Evans wrote: > On Sat, 12 Feb 2011, Konstantin Belousov wrote: > > >Log: > > In checker, read journal by sectors. > > > > Due to UFS insistence to pretend that device sector size is 512 bytes, > > Do you mean that ffs_fsck does this? UFS doesn't exist, and ffs in the > kernel didn't use device sectors at all until recently (it used DEV_BSIZE > units, but those are not sectors but are the units for bread() and friends). Yes, the journal writer started using DEV_BSIZE as _sector_ size to write the journal blocks, and fixing this was the point of the series of commits. Journal was unoperable on the providers with non-512 byte sectors.
> newfs uses a hack to do this. When handling kernel sizes in units of > DEV_BSIZE (not a sector size!), ffs_fsck should use DEV_BSIZE (or maybe > just fsbtodb()) instead of `dev_bsize' or `sectorsize' and not > pretend that DEV_BSIZE (or the corresponding superblock value for the > conversion macro) is the sector size. > > > sector size is obtained from ioctl(DIOCGSECTORSIZE) for real devices, > > and from the label otherwise. The file images without label have to > > be made with 512 sector size. > > >Modified: head/sbin/fsck_ffs/fsck.h > >============================================================================== > >--- head/sbin/fsck_ffs/fsck.h Sat Feb 12 13:12:45 2011 > >(r218603) > >+++ head/sbin/fsck_ffs/fsck.h Sat Feb 12 13:17:14 2011 > >(r218604) > >@@ -268,6 +268,7 @@ char snapname[BUFSIZ]; /* when doing sna > >char *cdevname; /* name of device being checked */ > >long dev_bsize; /* computed value of DEV_BSIZE */ > >long secsize; /* actual disk sector size */ > >+long real_dev_bsize; > > If `secsize' is the actual disk sector size, then we don't need another > variable giving the real sector size. > > The new variable has no comment. What is the difference between a sector > size and a dev_bsize? There is enough confusion between DEV_BSIZE and > sector sizes already. DEV_BSIZE has come to means nothing to do with > sectors and little to do with devices or block sizes. It is just a minimal > i/o sizes for use in old interfaces (ones that didn't want to use more than > 32 bits for disk addresses, but now use 64 bits anyway, so that they can > address 72 bits after multiplication by DEV_BSIZE). There is no "actual" difference (I tired of the word actual when debugging the patches), they are all DEV_BSIZE. > > These variables are named and documented slightly better in mkfs: > > % extern int sectorsize; /* bytes/sector */ > % extern int realsectorsize; /* bytes/sector in hardware*/ > > At lease the fake sector size variable doesn't claim to be actual, and > the real sector size variable doesn't have extra underscores and claims > to be a sector size variable. > > >Modified: head/sbin/fsck_ffs/setup.c > >============================================================================== > >--- head/sbin/fsck_ffs/setup.c Sat Feb 12 13:12:45 2011 (r218603) > >+++ head/sbin/fsck_ffs/setup.c Sat Feb 12 13:17:14 2011 (r218604) > >@@ -446,7 +446,7 @@ sblock_init(void) > > if (sblk.b_un.b_buf == NULL || asblk.b_un.b_buf == NULL) > > errx(EEXIT, "cannot allocate space for superblock"); > > if ((lp = getdisklabel(NULL, fsreadfd))) > >- dev_bsize = secsize = lp->d_secsize; > >+ real_dev_bsize = dev_bsize = secsize = lp->d_secsize; > > else > > dev_bsize = secsize = DEV_BSIZE; > >} > > Both the variables are real enough here, provided getdisklabel() works. > Otherwise, you are in trouble and should fail instead of defaulting the > size unless the size is set by another means. newfs seems to fail, and > I'm sure newfs_msdos does fail if the sector size is unknown. > > >Modified: head/sbin/fsck_ffs/suj.c > >============================================================================== > >--- head/sbin/fsck_ffs/suj.c Sat Feb 12 13:12:45 2011 (r218603) > >+++ head/sbin/fsck_ffs/suj.c Sat Feb 12 13:17:14 2011 (r218604) > >@@ -28,6 +28,7 @@ > >__FBSDID("$FreeBSD$"); > > > >#include <sys/param.h> > >+#include <sys/disk.h> > >#include <sys/disklabel.h> > >#include <sys/mount.h> > >#include <sys/stat.h> > >@@ -201,6 +202,11 @@ opendisk(const char *devnam) > > disk->d_error); > > } > > fs = &disk->d_fs; > >+ if (real_dev_bsize == 0 && ioctl(disk->d_fd, DIOCGSECTORSIZE, > >+ &real_dev_bsize) == -1) > >+ real_dev_bsize = secsize; > >+ if (debug) > >+ printf("dev_bsize %ld\n", real_dev_bsize); > >} > > > >/* > > If this gives a different "real_dev_bsize", and this is different from the > sector size in the label or defaulted-to, then the latter is presumably > wrong, and we're left with a `secsize' that is not the "actual sector size". Yes, secsize is not the "actual sector size", it is equal to DEV_BSIZE, see setup.c:sblock_init(). > > It is a bug to replace the sector size in the label with the one obtained > here. The former should be correct, and it is the only one under user > control. It might need to be changed if one given by the ioctl is wrong > or just if it is fake. (Labels only have 32-bit disk addresses, so fake > sector sizes are needed to address more than 41 bits if the physical sector > size has is 512.) > > newfs doesn't have the precedence that I want: > - sectorsize set on the command line (-S option) has precedence. [Bug; > -S 0 is needed to cancel any previous setting of sectorsize on the > command line by -S or maybe by -T, but it is treated as an error.] > fsck_ffs unfortunately doesn't seem to have any -S or -T option. These > used to be unneeded since sector size weren't used, and labels worked > and contain enough info to locate the superblock where there is more > info. > - then try setting sectorsize using the ioctl if sectorsize is not already > set. [Bug: no error checking for this ioctl. If it fails, then it > may clobber sectorsize, which breaks the next step.] > - then set sectorsize from the label if there is a label and sectorsize > is not already set > - fail if sectorsize is not set newfs.c does not do what you describe, in fact. After going to all troubles finding the sector size, it performs the following: if (sectorsize != DEV_BSIZE) { /* XXX */ int secperblk = sectorsize / DEV_BSIZE; sectorsize = DEV_BSIZE; fssize *= secperblk; if (pp != NULL) pp->p_size *= secperblk; } And now layout calculations happen as if sector size is 512 bytes. I was puzzled for long time why libufs givess me 512 in d_bsize for a volume over 4096-bytes per sector md: disk->d_bsize = fs->fs_fsize / fsbtodb(fs, 1); > > Then the hacks involving realsectorsize: > - set it to sectorsize > - if sectorsize != DEV_BSIZE, change it to DEV_BSIZE and fix up related > quantities (I hope the change to the partition table is never written). > Now `sectorsize' is no longer "actual", though it was actual before the > hacks. We continue with the actual sector size recorded in > realsectorsize. realsectorsize is not used. You are citing the piece of code I pasted above. > > >@@ -2262,7 +2268,7 @@ suj_build(void) > > rec = (union jrec *)seg->ss_blk; > > for (i = 0; i < seg->ss_rec.jsr_cnt; off += JREC_SIZE, > > rec++) { > > /* skip the segrec. */ > >- if ((off % DEV_BSIZE) == 0) > >+ if ((off % real_dev_bsize) == 0) > > continue; > > switch (rec->rec_jrefrec.jr_op) { > > case JOP_ADDREF: > >... > > This is all sort of backwards. DEV_BSIZE is the real DEV_BSIZE. Old code > that uses units of DEV_BSIZE spells this as dev_bsize although it is > constant. New code that uses units of sectors spelled this as DEV_BSIZE > although the correct size is variable; now the new code spelled this as > real_dev_bsize, which is variable but unrelated to dev_bsize = DEV_BSIZE > :-). We need to write the journal block atomically, and device/geom level only provides that: - we must write at least sector; - any write bigger then sector is not atomic. > > I think using fragments instead of sectors would work better. For a start, > you can get the fragment size from the superblock and not have to worry > about labels of ioctls. No, it would not, since fragment write can be non-atomic. > > There is a problem locating the superblock. The kernel and various > utilities only try reading the superblock with size SBSIZE (8K). This > fails if the sector size is not a divisor of 8K. They also only try > reading at various offsets which might not be a multiple of the sector > size, but which are larger than 8K so they are more likely to be a > multiple. I fixed this problem in fsck_msdosfs where it is larger > (fsck_msdosfs assumes that the sector size is a divisor of 512, but > sector sizes like 2K and 4K are now common relative to sizes of 16K). > > Hmm, the initialization of dev_bsize is even more obfuscated than I > remembered: > > % /* > % * Compute block size that the file system is based on, > % * according to fsbtodb, and adjust superblock block number > % * so we can tell if this is an alternate later. > % */ > % super *= dev_bsize; > > We may have needed a fake sector size to read the superblock. I forget if > we got a non-fake one from the label or ioctl before here. We should have > just tried all reasonable power of 2 sizes between 1 and 1M (I use > 64K down to DOSBOOTBLOCKSIZE = 512 in fsck_msdosfs). > > % dev_bsize = sblock.fs_fsize / fsbtodb(&sblock, 1); > > This is an obfuscated way of spelling DEV_BSIZE. Although fsbtodb() > converts to blocks with fixed size DEV_BSIZE, it needs a variable shift > count to handle the variable fragment size. This shift count is > fs_fsbtodb, which was initialized in newfs to > ilog2(fs_fsize / sectorsize). But sectorsize was fake there > (always DEV_BSIZE). Suppose for example than fs_fsize is the default > (2K). Then fs_fsbtodb = ilog2(4) = 2, so fsbtodb(&sbblock, 1) is 4 > and dev_bsize = 2K / 4 = DEV_BSIZE again. It is always DEV_BSIZE again, > since the fake sectorsize in newfs is always DEV_BSIZE. The fakery is > extended to fsck_ffs by the above assignment. The above assignment is > necessary if the kernel is changed to not fake things so much, or to > fake them differently (newfs can't can change it's i/o methods but can't > change the generated fs_fsbtodb etc. unless the kernel changes. All it > could do much better is to make the conversions between different block > sizes (real and virtual either way) more explicit). However, things > are already confusing enough with a fixed DEV_BSIZE. Again, you are describing what I pasted above, and what I referred to as "UFS (ok, FFS) pretending that sector size is 512 bytes". I agree that it would be cleaner if FFS does not make such obscuring arithmetic, but it did not matter while i/o was done in units of whole fragments. Journal requirements are different.
pgpHJ1kKcmcGg.pgp
Description: PGP signature