On 2025/07/14 14:53, Christoph Hellwig wrote:
> On Fri, Jul 11, 2025 at 05:44:26PM +0900, Damien Le Moal wrote:
>> On 7/11/25 5:09 PM, John Garry wrote:
>>> This value in io_min is used to configure any atomic write limit for the
>>> stacked device. The idea is that the atomic write unit max is a
>>> power-of-2 factor of the stripe size, and the stripe size is available
>>> in io_min.
>>>
>>> Using io_min causes issues, as:
>>> a. it may be mutated
>>> b. the check for io_min being set for determining if we are dealing with
>>> a striped device is hard to get right, as reported in [0].
>>>
>>> This series now sets chunk_sectors limit to share stripe size.
>>
>> Hmm... chunk_sectors for a zoned device is the zone size. So is this all safe
>> if we are dealing with a zoned block device that also supports atomic writes
>> ?
>
> Btw, I wonder if it's time to decouple the zone size from the chunk
> size eventually. It seems like a nice little hack, but with things
> like parity raid for zoned devices now showing up at least in academia,
> and nvme devices reporting chunk sizes the overload might not be that
> good any more.

Agreed, it would be nice to clean that up. BUT, the chunk_sectors sysfs
attribute file is reporting the zone size today. Changing that may break
applications. So I am not sure if we can actually do that, unless the sysfs
interface is considered "unstable"?

>
>> Not that I know of any such device, but better be safe, so maybe for now
>> do not enable atomic write support on zoned devices ?
>
> How would atomic writes make sense for zone devices? Because all writes
> up to the reported write pointer must be valid, the usual checks for
> partial updates are lacking, so the only use would be to figure out if a
> write got truncated. At least for file systems we detect this using the
> fs metadata that must be written on I/O completion anyway, so the only
> user would be an application with some sort of speculative writes that
> can't detect partial writes. Which sounds rather fringe and dangerous.

The only thing I can think of which would make sense is to avoid torn writes
with SAS drives. But in itself, that is extremely niche.

>
> Now we should be able to implement the software atomic writes pretty
> easily for zoned XFS, and funnily they might actually be slightly faster
> than normal writes due to the transaction batching. Now that we're
> getting reasonable test coverage we should be able to give it a spin, but
> I have a few too many things on my plate at the moment.

-- 
Damien Le Moal
Western Digital Research
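
For readers following the thread, the relationship quoted from the cover
letter ("the atomic write unit max is a power-of-2 factor of the stripe size")
can be illustrated with a small userspace sketch. This is not the kernel
implementation: both helper names are hypothetical, all values are assumed to
be in 512-byte sectors, and treating chunk_sectors == 0 as "not striped" is an
assumption made only for this example.

```c
#include <stdio.h>

/*
 * Hypothetical helper: largest power of two that evenly divides n.
 * Returns 0 when n is 0. Works by isolating the lowest set bit.
 */
static unsigned int largest_pow2_factor(unsigned int n)
{
	return n & (~n + 1u);
}

/*
 * Hypothetical helper: cap the atomic write unit max of the bottom device
 * (assumed to be a power of two, in sectors) so that an atomic write can
 * never straddle a stripe boundary on the stacked device.
 */
static unsigned int stacked_atomic_unit_max(unsigned int bottom_unit_max,
					    unsigned int chunk_sectors)
{
	unsigned int limit;

	/* Assumption for this sketch: 0 means no stripe size is set. */
	if (chunk_sectors == 0)
		return bottom_unit_max;

	limit = largest_pow2_factor(chunk_sectors);
	return bottom_unit_max < limit ? bottom_unit_max : limit;
}
```

With a stripe of 384 sectors (128 * 3), largest_pow2_factor() yields 128, so a
bottom device limit of 256 sectors would be capped to 128 while a limit of 64
sectors would be left unchanged. On a zoned device chunk_sectors carries the
zone size instead, which is why the overload discussed above matters for this
kind of calculation.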