Re: fs/udf: vm pages "overlap" while reading large dir

Bruce Evans Tue, 12 Feb 2008 18:16:10 -0800

On Tue, 12 Feb 2008, Andriy Gapon wrote:

on 12/02/2008 15:11 Bruce Evans said the following:

On Tue, 12 Feb 2008, Poul-Henning Kamp wrote:

In message <[EMAIL PROTECTED]>, Andriy Gapon writes:

3.1. for a fresh buf getlbk would assign the following:
bsize = bo->bo_bsize;
offset = blkno * bsize;

That's clearly wrong.


If units were always 512-blocks, then anything except bsize = DEV_BSIZE
would be clearly wrong.  Things aren't that simple (but probably should
be).  Even RELENG_3 has bsize = f_iosize (possibly != 512) for non-disks.


O, I missed this obvious thing!


Me too.

Bruce, thank you for putting me back on the ground :-)
Even in UDF case, when we bread() via a file or directory vnode we pass
a logical 2048-byte block number (within the file) to bread(). In this
case the code in getblk() does the right thing when it calculates offset
as blkno * 2048. Calculating it assuming 512-byte units would be a disaster.


I think the size is mnt_stat.f_iosize in general (but not for device vnodes).

And the actual reading works correctly because udf_strategy is called
for such vnodes and it translates block numbers from physical to logical
and also correctly re-calculates b_iooffset for actual reading. So
b_iooffset value set in breadn() (using bdtob) is completely ignored.


Is this setting ever used (and the corresponding setting for writing)
ever used?

I remember that this is why g_vfs_* was my primary target.
It seems that I revert my opinion back: either g_vfs_open should be
smart about setting bo_bsize, or g_vfs_strategy should be smart about
calculating bio_offset, or individual filesystems should not depend on
g_vfs_* always doing the best thing for them.


In fact, ffs already overrides the setting of bo_bsize for the device
vnode to a different wrong setting -- g_vfs_open() sets the sector size,
and ffs_mount() changes the setting to fs_bsize, but ffs actually needs
the setting to be DEV_BSIZE (I think).  Other bugs from this:
- ffs_rawread wants the sector size, and it assumes that this is in bo_bsize
  for the device vnode, but ffs_mount() has changed this to fs_bsize.
- multiple r/o mounts are supposed to work, but don't, since there is only
  one device vnode with a shared bufobj, but the bufobj needs to be
  per-file-system since all mounts write to it.  Various bad things
  including panics result.  There is a well-know panic from bo_private
  becoming garbage on unmount.  I just noticed more possibilities for
  panics.  bo_ops points to static storage, so it never becomes complete
  garbage.  However, at least ffs sets it blindly early on in
  ffs_mountfs(), before looking at the file system to see if ffs can
  mount it.  Thus if the file system is already mounted by another
  ffs, then ffs clobbers the other fs's bo_ops.  The ffs mount will
  presumably fail, leaving bo_ops clobbered.  Also, a successful
  g_vfs_open() has clobbered bo_ops, bo_private and bo_bsize a little
  earlier.  g_vfs_open() is fundamental to looking at the file system,
  since I/O is not set up until it completes.  Clobbering the pointers
  is most dangerous, but just clobbering bo_bsize breaks blkno
  calculations for any code that uses bo_bsize.

Apart from these bugs, the fix for the blkno calculations for device
vp's may be as simple as setting bo_bsize to DEV_BSIZE for the device
vp of all disk file systems (since all disk file systems use expect
this size).  Currently, ffs seems to be the only file system that
overrides g_vfs_open()'s default of the sector size.  Nothing in any
of fs/, gnu/fs/ and contrib/ references bo_bsize.

In any case, it seems that it is the g_vfs_* that is currently
inconsistent: it sets bo_bsize to a somewhat arbitrary value, but
expects that it would always be provided with correct b_iooffset (the
latter being rigidly calculated via bdtob() in buffcache code). So this
leaves filesystems dead in the water while reading via device vnode:
provide blkno in 512-byte units - screw up VM cache, provide blkno in
bo_bsize units - screw up reading from disk.


I think I/O on the disk doesn't use bo_bsize, but only the sector size
(to scale the byte count to a sector count).  Otherwise, ffs's override
would break I/O.  geom is another place that has few references to
bo_bsize -- it just clobbers it in g_vfs_open().

Not sure if the FS-s should have wits to set bo_bsize properly and/or
provide proper bo_ops - we are talking about a device vnode here.
Should filesystems be involved in the business of setting its
properties? Not sure.


Yes.  They can set it more easily as they can tell g_vfs_open() what to
set it to.  Except, until the bugs are fixed properly, g_vfs_open() can
more easily do tests to prevent clobbering.  For bo_bsize and bo_ops,
sharing a common value is safe and safe changes can be detected by
setting to a special value on last unmount.  For bo_private, a safety
check would probably disallow multiple mounts (since cp is dynamically
allocated (?)).

Bruce
_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"

Re: fs/udf: vm pages "overlap" while reading large dir

Reply via email to