Reply Sashiko comment:

> https://sashiko.dev/#/patchset/[email protected]
>
> > +   if (writesize % dio_align != 0) {
> +             ksft_test_result_skip("DIO alignment (%u) incompatible with 
> offset %zu\n",
> +                             dio_align, writesize);
> +             return;
> +     }
>
> Is this alignment check complete? 
>
> Direct I/O requires both the transfer length and the memory buffer address
> to be aligned. Later in this function, start_off is used as the buffer offset:
>       buffer = orig_buffer;
>       buffer += start_off;
> If start_off is pagesize / 2 (e.g., 2048) and writesize is pagesize * 3
> (e.g., 12288), writesize is a multiple of a 4096-byte alignment, so the test
> is not skipped.
>
> However, the memory buffer itself is only 2048-byte aligned. Will the
> subsequent write() still fail with -EINVAL on 4K-sector devices?

TL;DR: Yes, we should do both buffer address and writesize alignment
       checks to satisfy all FS types.

Looking at the kernel code: fs/iomap/direct-dio.c, the only alignment
check there is at line#413, which checks file's pos and write length.

EXT4:

ext4_file_write_iter
  ext4_dio_write_iter
    iomap_dio_rw
      __iomap_dio_rw
        iomap_dio_iter
          iomap_dio_bio_iter

  390   static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio 
*dio)
  391   {
                ...
  403
  404           /*
  405            * File systems that write out of place and always allocate new 
blocks
  406            * need each bio to be block aligned as that's the unit of 
allocation.
  407            */
  408           if (dio->flags & IOMAP_DIO_FSBLOCK_ALIGNED)
  409                   alignment = fs_block_size;
  410           else
  411                   alignment = bdev_logical_block_size(iomap->bdev);
  412
  413           if ((pos | length) & (alignment - 1))
  414                   return -EINVAL;
  415           ...

Sashiko points out the buffer-address should do alignment check as well,
I firstly suspect it based on the FS extra check before the iomap_dio_rw:

ext4_file_write_iter
  ext4_dio_write_iter
    ext4_should_use_dio
       iov_iter_alignment <--- do buffer/writesize alignment check

  842 unsigned long iov_iter_alignment(const struct iov_iter *i)
  843 {
                ...
  853                 return iov_iter_alignment_iovec(i);
                ...
  865 }

  799 static unsigned long iov_iter_alignment_iovec(const struct iov_iter *i)
  800 {
                ...
  809                   res |= (unsigned long)iov->iov_base + skip;
  812                   res |= len;
                ...
  818         return res;
  819 }

But eventually I found that this is only fallback to the buffer I/O
when the direct I/O is unsupported (go to: ext4_buffered_write_iter).
This wouldn't happen in the test as it open with O_DIRECT flag.

Then, I turned to look at Btrfs path:

btrfs_file_write_iter
  btrfs_do_write_iter
    btrfs_direct_write
      check_direct_IO <--- do buffer alignment check
      ...
      btrfs_dio_write
        __iomap_dio_rw <--- do samething like ext4

  778 static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info,
  779                                const struct iov_iter *iter, loff_t offset)
  780 {
  781         const u32 blocksize_mask = fs_info->sectorsize - 1;
  782
  783         if (offset & blocksize_mask)
  784                 return -EINVAL;
  785
  786         if (iov_iter_alignment(iter) & blocksize_mask)
  787                 return -EINVAL;
  788         return 0;
  789 }

Yes, here I found the evendice that iov_iter_alignment(iter) & blocksize_mask)
do the alignment check.

Unlike ext4 which never reaches the check for normal files, btrfs always checks
buffer alignment for every DIO operation. And it's a hard -EINVAL, not a silent
fallback to buffered I/O.

-- 
Regards,
Li Wang


Reply via email to