Re: Phyr Starter

2022-01-20 Thread John Hubbard
On 1/20/22 6:12 AM, Christoph Hellwig wrote: On Tue, Jan 11, 2022 at 12:17:18AM -0800, John Hubbard wrote: Zooming in on the pinning aspect for a moment: last time I attempted to convert O_DIRECT callers from gup to pup, I recall wanting very much to record, in each bio_vec, whether these pages

Re: Phyr Starter

2022-01-20 Thread Robin Murphy
On 2022-01-20 15:27, Keith Busch wrote: On Thu, Jan 20, 2022 at 02:56:02PM +0100, Christoph Hellwig wrote: - on the input side to dma mapping the bio_vecs (or phyrs) are chained as bios or whatever the containing structure is. These already exist and have infrastructure at least in th

Re: Phyr Starter

2022-01-20 Thread Jason Gunthorpe
On Thu, Jan 20, 2022 at 03:03:40PM +0100, Christoph Hellwig wrote: > On Wed, Jan 12, 2022 at 06:37:03PM +0000, Matthew Wilcox wrote: > > But let's go further than that (which only brings us to 32 bytes per > > range). For the systems you care about which use an identity mapping, > > and have sizeo

Re: Phyr Starter

2022-01-20 Thread Christoph Hellwig
On Thu, Jan 20, 2022 at 07:27:36AM -0800, Keith Busch wrote: > It doesn't look like IOMMU page sizes are exported, or even necessarily > consistently sized on at least one arch (power). At the DMA API layer dma_get_merge_boundary is the API for it.

Re: Phyr Starter

2022-01-20 Thread Keith Busch
On Thu, Jan 20, 2022 at 02:56:02PM +0100, Christoph Hellwig wrote: > - on the input side to dma mapping the bio_vecs (or phyrs) are chained > as bios or whatever the containing structure is. These already exist > and have infrastructure at least in the block layer > - on the output side I

Re: Phyr Starter

2022-01-20 Thread Christoph Hellwig
On Tue, Jan 11, 2022 at 12:17:18AM -0800, John Hubbard wrote: > Zooming in on the pinning aspect for a moment: last time I attempted to > convert O_DIRECT callers from gup to pup, I recall wanting very much to > record, in each bio_vec, whether these pages were acquired via FOLL_PIN, > or some non-

Re: Phyr Starter

2022-01-20 Thread Christoph Hellwig
On Tue, Jan 11, 2022 at 04:26:48PM -0400, Jason Gunthorpe wrote: > What I did in RDMA was make an iterator rdma_umem_for_each_dma_block() > > The driver passes in the page size it wants and the iterator breaks up > the SGL into that size. > > So, eg on a 16k page size system the SGL would be full

Re: Phyr Starter

2022-01-20 Thread Christoph Hellwig
On Wed, Jan 12, 2022 at 06:37:03PM +0000, Matthew Wilcox wrote: > But let's go further than that (which only brings us to 32 bytes per > range). For the systems you care about which use an identity mapping, > and have sizeof(dma_addr_t) == sizeof(phys_addr_t), we can simply > point the dma_range p

Re: Phyr Starter

2022-01-20 Thread Christoph Hellwig
On Tue, Jan 11, 2022 at 11:01:42AM -0400, Jason Gunthorpe wrote: > Then why are we using get_user_phyr() at all if we are just storing it > in a sg? I think we need to stop calling the output of the phyr dma map helper a sg. Yes, a { dma_addr, len } tuple is scatter/gather I/O in its purest form,

Re: Phyr Starter

2022-01-20 Thread Christoph Hellwig
On Mon, Jan 10, 2022 at 08:41:26PM -0400, Jason Gunthorpe wrote: > > Finally, it may be possible to stop using scatterlist to describe the > > input to the DMA-mapping operation. We may be able to get struct > > scatterlist down to just dma_address and dma_length, with chaining > > handled through

Re: Phyr Starter

2022-01-20 Thread Christoph Hellwig
On Mon, Jan 10, 2022 at 07:34:49PM +0000, Matthew Wilcox wrote: > TLDR: I want to introduce a new data type: > > struct phyr { > phys_addr_t addr; > size_t len; > }; > > and use it to replace bio_vec as well as using it to replace the array > of struct pages used by get_user_pages

Re: Phyr Starter

2022-01-12 Thread Jason Gunthorpe
On Wed, Jan 12, 2022 at 06:37:03PM +0000, Matthew Wilcox wrote: > On Tue, Jan 11, 2022 at 06:53:06PM -0400, Jason Gunthorpe wrote: > > IOMMU is not common in those cases, it is slow. > > > > So you end up with 16 bytes per entry then another 24 bytes in the > > entirely redundant scatter list. Tha

Re: Phyr Starter

2022-01-12 Thread Matthew Wilcox
On Tue, Jan 11, 2022 at 06:53:06PM -0400, Jason Gunthorpe wrote: > IOMMU is not common in those cases, it is slow. > > So you end up with 16 bytes per entry then another 24 bytes in the > entirely redundant scatter list. That is now 40 bytes/page for typical > HPC case, and I can't see that being

Re: Phyr Starter

2022-01-11 Thread Logan Gunthorpe
On 2022-01-11 4:02 p.m., Jason Gunthorpe wrote: > On Tue, Jan 11, 2022 at 03:57:07PM -0700, Logan Gunthorpe wrote: >> >> >> On 2022-01-11 3:53 p.m., Jason Gunthorpe wrote: >>> I just want to share the whole API that will have to exist to >>> reasonably support this flexible array of intervals da

Re: Phyr Starter

2022-01-11 Thread Logan Gunthorpe
On 2022-01-11 3:57 p.m., Jason Gunthorpe wrote: > On Tue, Jan 11, 2022 at 03:09:13PM -0700, Logan Gunthorpe wrote: > >> Either that, or we need a wrapper that allocates an appropriately >> sized SGL to pass to any dma_map implementation that doesn't support >> the new structures. > > This is w

Re: Phyr Starter

2022-01-11 Thread Jason Gunthorpe
On Tue, Jan 11, 2022 at 03:57:07PM -0700, Logan Gunthorpe wrote: > > > On 2022-01-11 3:53 p.m., Jason Gunthorpe wrote: > > I just want to share the whole API that will have to exist to > > reasonably support this flexible array of intervals data structure.. > > Is that really worth it? I feel li

Re: Phyr Starter

2022-01-11 Thread Jason Gunthorpe
On Tue, Jan 11, 2022 at 03:09:13PM -0700, Logan Gunthorpe wrote: > Either that, or we need a wrapper that allocates an appropriately > sized SGL to pass to any dma_map implementation that doesn't support > the new structures. This is what I think we should do. If we start with RDMA then we can mo

Re: Phyr Starter

2022-01-11 Thread Logan Gunthorpe
On 2022-01-11 3:53 p.m., Jason Gunthorpe wrote: > I just want to share the whole API that will have to exist to > reasonably support this flexible array of intervals data structure.. Is that really worth it? I feel like type safety justifies replicating a bit of iteration and allocation infrast

Re: Phyr Starter

2022-01-11 Thread Jason Gunthorpe
On Tue, Jan 11, 2022 at 09:25:40PM +0000, Matthew Wilcox wrote: > > I don't need the sgt at all. I just need another list of physical > > addresses for DMA. I see no issue with a phsr_list storing either CPU > > Physical Address or DMA Physical Addresses, same data structure. > > There's a differe

Re: Phyr Starter

2022-01-11 Thread Logan Gunthorpe
On 2022-01-11 2:25 p.m., Matthew Wilcox wrote: > That's reproducing the bad decision of the scatterlist, only with > a different encoding. You end up with something like: > > struct neoscat { > dma_addr_t dma_addr; > phys_addr_t phys_addr; > size_t dma_len; > size_t phy

Re: Phyr Starter

2022-01-11 Thread Matthew Wilcox
On Tue, Jan 11, 2022 at 04:21:59PM -0400, Jason Gunthorpe wrote: > On Tue, Jan 11, 2022 at 06:33:57PM +0000, Matthew Wilcox wrote: > > > Then why are we using get_user_phyr() at all if we are just storing it > > > in a sg? > > > > I did consider just implementing get_user_sg() (actually 4 years

Re: Phyr Starter

2022-01-11 Thread Jason Gunthorpe
On Tue, Jan 11, 2022 at 10:05:40AM +0100, Daniel Vetter wrote: > If we go with page size I think hardcoding a PHYS_PAGE_SIZE KB(4) > would make sense, because thanks to x86 that's pretty much the lowest > common denominator that all hw (I know of at least) supports. Not > having to fiddle with "wh

Re: Phyr Starter

2022-01-11 Thread Jason Gunthorpe
On Tue, Jan 11, 2022 at 06:33:57PM +0000, Matthew Wilcox wrote: > > Then why are we using get_user_phyr() at all if we are just storing it > > in a sg? > > I did consider just implementing get_user_sg() (actually 4 years ago), > but that cements the use of sg as both an input and output data struc

Re: Phyr Starter

2022-01-11 Thread Matthew Wilcox
On Tue, Jan 11, 2022 at 11:01:42AM -0400, Jason Gunthorpe wrote: > On Tue, Jan 11, 2022 at 04:32:56AM +0000, Matthew Wilcox wrote: > > On Mon, Jan 10, 2022 at 08:41:26PM -0400, Jason Gunthorpe wrote: > > > On Mon, Jan 10, 2022 at 07:34:49PM +0000, Matthew Wilcox wrote: > > > > > > > Finally, it ma

Re: Phyr Starter

2022-01-11 Thread Logan Gunthorpe
On 2022-01-11 1:17 a.m., John Hubbard wrote: > On 1/10/22 11:34, Matthew Wilcox wrote: >> TLDR: I want to introduce a new data type: >> >> struct phyr { >> phys_addr_t addr; >> size_t len; >> }; >> >> and use it to replace bio_vec as well as using it to replace the array >> of

Re: Phyr Starter

2022-01-11 Thread Jason Gunthorpe
On Tue, Jan 11, 2022 at 02:01:17PM +, Matthew Wilcox wrote: > On Tue, Jan 11, 2022 at 12:17:18AM -0800, John Hubbard wrote: > > Zooming in on the pinning aspect for a moment: last time I attempted to > > convert O_DIRECT callers from gup to pup, I recall wanting very much to > > record, in each

Re: Phyr Starter

2022-01-11 Thread Jason Gunthorpe
On Tue, Jan 11, 2022 at 04:32:56AM +0000, Matthew Wilcox wrote: > On Mon, Jan 10, 2022 at 08:41:26PM -0400, Jason Gunthorpe wrote: > > On Mon, Jan 10, 2022 at 07:34:49PM +0000, Matthew Wilcox wrote: > > > > > Finally, it may be possible to stop using scatterlist to describe the > > > input to the

Re: Phyr Starter

2022-01-11 Thread Thomas Zimmermann
Hi On 11.01.22 at 14:56, Matthew Wilcox wrote: On Tue, Jan 11, 2022 at 12:40:10PM +0100, Thomas Zimmermann wrote: Hi On 10.01.22 at 20:34, Matthew Wilcox wrote: TLDR: I want to introduce a new data type: struct phyr { phys_addr_t addr; size_t len; }; Did you look at s

Re: Phyr Starter

2022-01-11 Thread Matthew Wilcox
On Tue, Jan 11, 2022 at 12:17:18AM -0800, John Hubbard wrote: > Zooming in on the pinning aspect for a moment: last time I attempted to > convert O_DIRECT callers from gup to pup, I recall wanting very much to > record, in each bio_vec, whether these pages were acquired via FOLL_PIN, > or some non-

Re: Phyr Starter

2022-01-11 Thread Matthew Wilcox
On Tue, Jan 11, 2022 at 12:40:10PM +0100, Thomas Zimmermann wrote: > Hi > > On 10.01.22 at 20:34, Matthew Wilcox wrote: > > TLDR: I want to introduce a new data type: > > > > struct phyr { > > phys_addr_t addr; > > size_t len; > > }; > > Did you look at struct dma_buf_map? [1]

Re: Phyr Starter

2022-01-11 Thread Thomas Zimmermann
Hi On 10.01.22 at 20:34, Matthew Wilcox wrote: TLDR: I want to introduce a new data type: struct phyr { phys_addr_t addr; size_t len; }; Did you look at struct dma_buf_map? [1] For graphics framebuffers, we have the problem that these buffers can be in I/O or system memor

Re: Phyr Starter

2022-01-11 Thread Daniel Vetter
Dropping some thoughts from the gpu driver perspective, feel free to tell me it's nonsense from the mm/block view :-) Generally I think we really, really need something like this that's across all subsystems and consistent. On Tue, Jan 11, 2022 at 1:41 AM Jason Gunthorpe wrote: > On Mon, Jan 10,

Re: Phyr Starter

2022-01-11 Thread John Hubbard
On 1/10/22 11:34, Matthew Wilcox wrote: TLDR: I want to introduce a new data type: struct phyr { phys_addr_t addr; size_t len; }; and use it to replace bio_vec as well as using it to replace the array of struct pages used by get_user_pages() and friends. --- This would cert

Re: Phyr Starter

2022-01-10 Thread Matthew Wilcox
On Mon, Jan 10, 2022 at 08:41:26PM -0400, Jason Gunthorpe wrote: > On Mon, Jan 10, 2022 at 07:34:49PM +0000, Matthew Wilcox wrote: > > > Finally, it may be possible to stop using scatterlist to describe the > > input to the DMA-mapping operation. We may be able to get struct > > scatterlist down

Re: Phyr Starter

2022-01-10 Thread Jason Gunthorpe
On Mon, Jan 10, 2022 at 07:34:49PM +0000, Matthew Wilcox wrote: > Finally, it may be possible to stop using scatterlist to describe the > input to the DMA-mapping operation. We may be able to get struct > scatterlist down to just dma_address and dma_length, with chaining > handled through an encl

Phyr Starter

2022-01-10 Thread Matthew Wilcox
TLDR: I want to introduce a new data type: struct phyr { phys_addr_t addr; size_t len; }; and use it to replace bio_vec as well as using it to replace the array of struct pages used by get_user_pages() and friends. --- There are two distinct problems I want to address: doing I/O