Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-09 Thread Dave Chinner
On Fri, May 08, 2015 at 11:02:28PM -0400, Rik van Riel wrote: > On 05/08/2015 09:14 PM, Linus Torvalds wrote: > > On Fri, May 8, 2015 at 9:59 AM, Rik van Riel wrote: > >> > >> However, for persistent memory, all of the files will be "in memory". > > > > Yes. However, I doubt you will find a very

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-08 Thread Linus Torvalds
On Fri, May 8, 2015 at 8:02 PM, Rik van Riel wrote: > > The TLB performance bonus of accessing the large files with > large pages may make it worthwhile to solve that hard problem. Very few people can actually measure that TLB advantage on systems with good TLBs. It's largely a myth, fed by som

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-08 Thread Rik van Riel
On 05/08/2015 09:14 PM, Linus Torvalds wrote: > On Fri, May 8, 2015 at 9:59 AM, Rik van Riel wrote: >> >> However, for persistent memory, all of the files will be "in memory". > > Yes. However, I doubt you will find a very sane rw filesystem that > then also makes them contiguous and aligns them

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-08 Thread Linus Torvalds
On Fri, May 8, 2015 at 9:59 AM, Rik van Riel wrote: > > However, for persistent memory, all of the files will be "in memory". Yes. However, I doubt you will find a very sane rw filesystem that then also makes them contiguous and aligns them at 2MB boundaries. Anything is possible, I guess, but t

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-08 Thread John Stoffel
> "Linus" == Linus Torvalds writes: Linus> On Fri, May 8, 2015 at 7:40 AM, John Stoffel wrote: >> >> Now go and look at your /home or /data/ or /work areas, where the >> endusers are actually keeping their day to day work. Photos, mp3, >> design files, source code, object code littered aro

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-08 Thread Rik van Riel
On 05/08/2015 11:54 AM, Linus Torvalds wrote: > On Fri, May 8, 2015 at 7:40 AM, John Stoffel wrote: >> >> Now go and look at your /home or /data/ or /work areas, where the >> endusers are actually keeping their day to day work. Photos, mp3, >> design files, source code, object code littered aroun

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-08 Thread Al Viro
On Fri, May 08, 2015 at 08:54:06AM -0700, Linus Torvalds wrote: > However, the big files in that list are almost immaterial from a > caching standpoint. .git/objects/pack/* caching matters a lot, though...

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-08 Thread Linus Torvalds
On Fri, May 8, 2015 at 7:40 AM, John Stoffel wrote: > > Now go and look at your /home or /data/ or /work areas, where the > endusers are actually keeping their day to day work. Photos, mp3, > design files, source code, object code littered around, etc. However, the big files in that list are alm

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-08 Thread Rik van Riel
On 05/08/2015 10:05 AM, Ingo Molnar wrote: > * Rik van Riel wrote: >> Memory trends point in one direction, file size trends in another. >> >> For persistent memory, we would not need 4kB page struct pages >> unless memory from a particular area was in small files AND those >> files were being

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-08 Thread John Stoffel
> "Ingo" == Ingo Molnar writes: Ingo> * Rik van Riel wrote: >> The disadvantage is pretty obvious too: 4kB pages would no longer be >> the fast case, with an indirection. I do not know how much of an >> issue that would be, or whether it even makes sense for 4kB pages to >> continue bein

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-08 Thread Ingo Molnar
* Rik van Riel wrote: > The disadvantage is pretty obvious too: 4kB pages would no longer be > the fast case, with an indirection. I do not know how much of an > issue that would be, or whether it even makes sense for 4kB pages to > continue being the fast case going forward. I strongly disa

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-08 Thread Rik van Riel
On 05/07/2015 03:11 PM, Ingo Molnar wrote: > Stable, global page-struct descriptors are a given for real RAM, where > we allocate a struct page for every page in nice, large, mostly linear > arrays. > > We'd really need that for pmem too, to get the full power of struct > page: and that means

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-08 Thread Al Viro
On Fri, May 08, 2015 at 11:26:01AM +0200, Ingo Molnar wrote: > > * Al Viro wrote: > > > On Fri, May 08, 2015 at 07:37:59AM +0200, Ingo Molnar wrote: > > > > > So if code does iov_iter_get_pages_alloc() on a user address that > > > has a real struct page behind it - and some other code does a

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-08 Thread Ingo Molnar
* Al Viro wrote: > On Fri, May 08, 2015 at 07:37:59AM +0200, Ingo Molnar wrote: > > > So if code does iov_iter_get_pages_alloc() on a user address that > > has a real struct page behind it - and some other code does a > > regular get_user_pages() on it, we'll have two sets of struct page > >

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-08 Thread Al Viro
On Fri, May 08, 2015 at 07:37:59AM +0200, Ingo Molnar wrote: > same as iov_iter_get_pages(), except that pages array is allocated > (kmalloc if possible, vmalloc if that fails) and left for caller to > free. Lustre and NFS ->direct_IO() switched to it. > > Signed-off-by: Al V

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-07 Thread Ingo Molnar
Al, I was wondering about the struct page rules of iov_iter_get_pages_alloc(), used in various places. There's no documentation whatsoever in lib/iov_iter.c, nor in include/linux/uio.h, and the changelog that introduced it only says: commit 91f79c43d1b54d7154b118860d81b39bad07dfff Author: A

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-07 Thread Jerome Glisse
On Thu, May 07, 2015 at 09:53:13PM +0200, Ingo Molnar wrote: > > * Ingo Molnar wrote: > > > > Is handling kernel pagefault on the vmemmap completely out of the > > > picture? So we would carve out a chunk of kernel address space > > > for those pfn and use it for vmemmap and handle pagefault

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-07 Thread Dan Williams
On Thu, May 7, 2015 at 10:43 AM, Linus Torvalds wrote: > On Thu, May 7, 2015 at 9:03 AM, Dan Williams wrote: >> >> Ok, I'll keep thinking about this and come back when we have a better >> story about passing mmap'd persistent memory around in userspace. > > Ok. And if we do decide to go with your

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-07 Thread Ingo Molnar
* Ingo Molnar wrote: > > Is handling kernel pagefault on the vmemmap completely out of the > > picture? So we would carve out a chunk of kernel address space > > for those pfn and use it for vmemmap and handle pagefault on it. > > That's pretty clever. The page fault doesn't even have to do

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-07 Thread Ingo Molnar
* Jerome Glisse wrote: > > So I think the main value of struct page is if everyone on the > > system sees the same struct page for the same pfn - not just the > > temporary IO instance. > > > > The idea of having very temporary struct page arrays misses the > > point I think: if struct page

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-07 Thread Dan Williams
On Thu, May 7, 2015 at 11:40 AM, Ingo Molnar wrote: > > * Dan Williams wrote: > >> On Thu, May 7, 2015 at 9:18 AM, Christoph Hellwig wrote: >> > On Wed, May 06, 2015 at 05:19:48PM -0700, Linus Torvalds wrote: >> >> What is the primary thing that is driving this need? Do we have a very >> >> conc

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-07 Thread Jerome Glisse
On Thu, May 07, 2015 at 09:11:07PM +0200, Ingo Molnar wrote: > > * Dave Hansen wrote: > > > On 05/07/2015 10:42 AM, Dan Williams wrote: > > > On Thu, May 7, 2015 at 10:36 AM, Ingo Molnar wrote: > > >> * Dan Williams wrote: > > >> > > >> So is there anything fundamentally wrong about creating s

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-07 Thread Ingo Molnar
* Dave Hansen wrote: > On 05/07/2015 10:42 AM, Dan Williams wrote: > > On Thu, May 7, 2015 at 10:36 AM, Ingo Molnar wrote: > >> * Dan Williams wrote: > >> > >> So is there anything fundamentally wrong about creating struct > >> page backing at mmap() time (and making sure aliased mmaps share

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-07 Thread Ingo Molnar
* Dan Williams wrote: > On Thu, May 7, 2015 at 9:18 AM, Christoph Hellwig wrote: > > On Wed, May 06, 2015 at 05:19:48PM -0700, Linus Torvalds wrote: > >> What is the primary thing that is driving this need? Do we have a very > >> concrete example? > > > > FYI, I plan to implement RAID accele

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-07 Thread Dave Hansen
On 05/07/2015 10:42 AM, Dan Williams wrote: > On Thu, May 7, 2015 at 10:36 AM, Ingo Molnar wrote: >> * Dan Williams wrote: >> So is there anything fundamentally wrong about creating struct page >> backing at mmap() time (and making sure aliased mmaps share struct >> page arrays)? > > Something l

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-07 Thread Ingo Molnar
* Dan Williams wrote: > > That looks like a layering violation and a mistake to me. If we > > want to do direct (sector_t -> sector_t) IO, with no serialization > > worries, it should have its own (simple) API - which things like > > hierarchical RAID or RDMA APIs could use. > > I'm wrapped

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-07 Thread Linus Torvalds
On Thu, May 7, 2015 at 9:03 AM, Dan Williams wrote: > > Ok, I'll keep thinking about this and come back when we have a better > story about passing mmap'd persistent memory around in userspace. Ok. And if we do decide to go with your kind of "__pfn" type, I'd probably prefer that we encode the ty

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-07 Thread Dan Williams
On Thu, May 7, 2015 at 10:36 AM, Ingo Molnar wrote: > > * Dan Williams wrote: > >> > Anyway, I did want to say that while I may not be convinced about >> > the approach, I think the patches themselves don't look horrible. >> > I actually like your "__pfn_t". So while I (very obviously) have >> >

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-07 Thread Ingo Molnar
* Dan Williams wrote: > > Anyway, I did want to say that while I may not be convinced about > > the approach, I think the patches themselves don't look horrible. > > I actually like your "__pfn_t". So while I (very obviously) have > > some doubts about this approach, it may be that the most

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-07 Thread Jerome Glisse
On Thu, May 07, 2015 at 06:18:07PM +0200, Christoph Hellwig wrote: > On Wed, May 06, 2015 at 05:19:48PM -0700, Linus Torvalds wrote: > > What is the primary thing that is driving this need? Do we have a very > > concrete example? > > FYI, I plan to implement RAID acceleration using nvdimms, and

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-07 Thread Dan Williams
On Thu, May 7, 2015 at 9:18 AM, Christoph Hellwig wrote: > On Wed, May 06, 2015 at 05:19:48PM -0700, Linus Torvalds wrote: >> What is the primary thing that is driving this need? Do we have a very >> concrete example? > > FYI, I plan to implement RAID acceleration using nvdimms, and I plan to >

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-07 Thread Christoph Hellwig
On Wed, May 06, 2015 at 05:19:48PM -0700, Linus Torvalds wrote: > What is the primary thing that is driving this need? Do we have a very > concrete example? FYI, I plan to implement RAID acceleration using nvdimms, and I plan to use pages for that. The code just merged for 4.1 can easily support

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-07 Thread Dan Williams
On Thu, May 7, 2015 at 8:58 AM, Linus Torvalds wrote: > On Thu, May 7, 2015 at 8:40 AM, Dan Williams wrote: >> >> blkdev_get(FMODE_EXCL) is the protection in this case. > > Ugh. That looks like a horrible nasty big hammer that will bite us > badly some day. Since you'd have to hold it for the who

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-07 Thread Linus Torvalds
On Thu, May 7, 2015 at 8:40 AM, Dan Williams wrote: > > blkdev_get(FMODE_EXCL) is the protection in this case. Ugh. That looks like a horrible nasty big hammer that will bite us badly some day. Since you'd have to hold it for the whole IO. But I guess it at least works. Anyway, I did want to say

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-07 Thread Dan Williams
On Thu, May 7, 2015 at 7:42 AM, Ingo Molnar wrote: > > * Ingo Molnar wrote: > >> [...] >> >> For anything more complex, that maps any of this storage to >> user-space, or exposes it to higher level struct page based APIs, >> etc., where references matter and it's more of a cache with >> potential

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-07 Thread Dan Williams
On Thu, May 7, 2015 at 8:00 AM, Linus Torvalds wrote: > On Wed, May 6, 2015 at 7:36 PM, Dan Williams wrote: >> >> My pet concrete example is covered by __pfn_t. Referencing persistent >> memory in an md/dm hierarchical storage configuration. Setting aside >> the thrash to get existing block use

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-07 Thread Linus Torvalds
On Wed, May 6, 2015 at 7:36 PM, Dan Williams wrote: > > My pet concrete example is covered by __pfn_t. Referencing persistent > memory in an md/dm hierarchical storage configuration. Setting aside > the thrash to get existing block users to do "bvec_set_page(page)" > instead of "bvec->page = pag

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-07 Thread Ingo Molnar
* Ingo Molnar wrote: > [...] > > For anything more complex, that maps any of this storage to > user-space, or exposes it to higher level struct page based APIs, > etc., where references matter and it's more of a cache with > potentially multiple users, not an IO space, the natural API is > s

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-07 Thread Ingo Molnar
* Dan Williams wrote: > > What is the primary thing that is driving this need? Do we have a > > very concrete example? > > My pet concrete example is covered by __pfn_t. Referencing > persistent memory in an md/dm hierarchical storage configuration. > Setting aside the thrash to get existi

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-06 Thread Dan Williams
On Wed, May 6, 2015 at 5:19 PM, Linus Torvalds wrote: > On Wed, May 6, 2015 at 4:47 PM, Dan Williams wrote: >> >> Conceptually better, but certainly more difficult to audit if the fake >> struct page is initialized in a subtle way that breaks when/if it >> leaks to some unwitting context. > > May

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-06 Thread Linus Torvalds
On Wed, May 6, 2015 at 4:47 PM, Dan Williams wrote: > > Conceptually better, but certainly more difficult to audit if the fake > struct page is initialized in a subtle way that breaks when/if it > leaks to some unwitting context. Maybe. It could go either way, though. In particular, with the "dyn

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-06 Thread Dan Williams
On Wed, May 6, 2015 at 3:10 PM, Linus Torvalds wrote: > On Wed, May 6, 2015 at 1:04 PM, Dan Williams wrote: >> >> The motivation for this change is persistent memory and the desire to >> use it not only via the pmem driver, but also as a memory target for I/O >> (DAX, O_DIRECT, DMA, RDMA, etc) in

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-06 Thread Linus Torvalds
On Wed, May 6, 2015 at 1:04 PM, Dan Williams wrote: > > The motivation for this change is persistent memory and the desire to > use it not only via the pmem driver, but also as a memory target for I/O > (DAX, O_DIRECT, DMA, RDMA, etc) in other parts of the kernel. I detest this approach. I'd muc

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-06 Thread Al Viro
On Wed, May 06, 2015 at 04:04:53PM -0400, Dan Williams wrote: > Changes since v1 [1]: > > 1/ added include/asm-generic/pfn.h for the __pfn_t definition and helpers. > > 2/ added kmap_atomic_pfn_t() > > 3/ rebased on v4.1-rc2 > > [1]: http://marc.info/?l=linux-kernel&m=142653770511970&w=2 > > -

[PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-06 Thread Dan Williams
Changes since v1 [1]: 1/ added include/asm-generic/pfn.h for the __pfn_t definition and helpers. 2/ added kmap_atomic_pfn_t() 3/ rebased on v4.1-rc2 [1]: http://marc.info/?l=linux-kernel&m=142653770511970&w=2 --- A lead in note, this looks scarier than it is. Most of the code thrash is autom