On Mon, 08 Jul 2013 14:32:09 +0900
HATAYAMA Daisuke <d.hatay...@jp.fujitsu.com> wrote:

> (2013/07/02 4:32), Michael Holzheu wrote:
> > For zfcpdump we can't map the HSA storage because it is only available
> > via a read interface. Therefore, for the new vmcore mmap feature, we had
> > to introduce a new mechanism to create mappings on demand.
> >
> > This patch introduces a new architecture function remap_oldmem_pfn_range()
> > that should be used to create mappings with remap_pfn_range() for oldmem
> > areas that can be directly mapped. For zfcpdump this is everything besides
> > the HSA memory. For the areas that are not mapped by
> > remap_oldmem_pfn_range(), a new generic vmcore fault handler
> > mmap_vmcore_fault() is called.
> >
> 
> This fault handler is only needed for an s390-specific issue. Other
> architectures don't need it for the time being.
> 
> Also, for the same reason, I'm doing this review based on the source code
> only. I cannot run the fault handler on a meaningful system, which is
> currently s390 only.

You can test the code on other architectures if you do not map anything in
advance. For example, you could just "return 0" in remap_oldmem_pfn_range():

/*
 * Architectures may override this function to map oldmem
 */
int __weak remap_oldmem_pfn_range(struct vm_area_struct *vma,
                                  unsigned long from, unsigned long pfn,
                                  unsigned long size, pgprot_t prot)
{
        return 0;
}

In that case the new mechanism would be used for all pages.
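
For illustration, a user-space check along these lines (a hypothetical test
program, not part of the patch; it assumes the dump is at least 16 MiB)
would then drive every page through the new fault handler:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAP_SIZE (16UL * 1024 * 1024)

int main(void)
{
	unsigned long i, sum = 0;
	int fd = open("/proc/vmcore", O_RDONLY);
	char *map;

	if (fd < 0) {
		perror("open /proc/vmcore");
		return 1;
	}
	map = mmap(NULL, MAP_SIZE, PROT_READ, MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	/* Touch each page so every access is resolved via the fault handler */
	for (i = 0; i < MAP_SIZE; i += getpagesize())
		sum += map[i];
	printf("checksum: %lu\n", sum);
	munmap(map, MAP_SIZE);
	close(fd);
	return 0;
}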

> 
> I'm also concerned that the fault handler covers the full range of vmcore,
> which could hide some kind of mmap() bug that results in a page fault.
> 
> So, the fault handler should be enclosed in #ifdef CONFIG_S390 for the
> time being.

I personally do not like that, but if Vivek and you prefer this, of course we
can do that.

What about something like:

#ifdef CONFIG_S390
static int mmap_vmcore_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
...
}
#else
static int mmap_vmcore_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
        BUG();
}
#endif

On all other architectures the default remap_oldmem_pfn_range() maps the
complete range in advance, so the fault handler should never be called
there; hence the BUG().

> 
> > This handler works as follows:
> >
> > * Get already available or new page from page cache (find_or_create_page)
> > * Check if /proc/vmcore page is filled with data (PageUptodate)
> > * If yes:
> >    Return that page
> > * If no:
> >    Fill page using __vmcore_read(), set PageUptodate, and return page
> >
> 
> This page-in logic seems good to me.
> 
> <cut>
> > @@ -225,6 +250,48 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
> >     return acc;
> >   }
> >
> > +static ssize_t read_vmcore(struct file *file, char __user *buffer,
> > +                      size_t buflen, loff_t *fpos)
> > +{
> > +   return __read_vmcore(buffer, buflen, fpos, 1);
> > +}
> > +
> > +/*
> > + * The vmcore fault handler uses the page cache and fills data using the
> > + * standard __vmcore_read() function.
> > + */
> 
> Could you describe the use case of this fault handler on s390?

Ok, I have added a short description to the fault handler comment in the
patch below.

> 
> > +static int mmap_vmcore_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> > +{
> > +   struct address_space *mapping = vma->vm_file->f_mapping;
> > +   pgoff_t index = vmf->pgoff;
> > +   struct page *page;
> > +   loff_t src;
> > +   char *buf;
> > +   int rc;
> > +
> 
> You should check whether the faulting address points to a valid range.
> If the fault happens on an invalid range, return VM_FAULT_SIGBUS.
> 
> In the s390 case, I think the range except for the HSA should be thought
> of as invalid.
> 

Hmmm, this would require another architecture backend interface. If we get
a fault for a page outside of the HSA, this would be a programming error in
our code. I am not sure if we should introduce new architecture interfaces
just for sanity checks.
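
For reference, such a check would only be a few lines at the top of the
fault handler (a sketch, assuming the total dump size is available as a
file-global vmcore_size):

	/* Reject faults beyond the end of the dump */
	if ((u64) vmf->pgoff << PAGE_SHIFT >= vmcore_size)
		return VM_FAULT_SIGBUS;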

> > +   page = find_or_create_page(mapping, index, GFP_KERNEL);
> > +   if (!page)
> > +           return VM_FAULT_OOM;
> > +   if (!PageUptodate(page)) {
> > +           src = index << PAGE_CACHE_SHIFT;
> 
> src = (loff_t)index << PAGE_CACHE_SHIFT;
> 
> loff_t is long long while index is unsigned long.

Ok.

> On s390 both might have the same byte length.

On s390x both are 64 bit. On our 32 bit s390 architecture long long is
64 bit and unsigned long is 32 bit.
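
To illustrate the truncation (hypothetical values, assuming the usual
PAGE_CACHE_SHIFT of 12): on a 32 bit kernel the shift is evaluated in
unsigned long before the result is widened to loff_t, so the high bits
are lost:

	unsigned long index = 0x200000; /* page index for offset 8 GiB */
	loff_t bad  = index << PAGE_CACHE_SHIFT;          /* 32 bit shift: 0 */
	loff_t good = (loff_t) index << PAGE_CACHE_SHIFT; /* 0x200000000 */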

> Also I prefer offset to src, but this is a minor suggestion.

Yes, I agree.

> 
> > +           buf = (void *) (page_to_pfn(page) << PAGE_SHIFT);
> 
> I found the page_to_virt() macro.

Looks like page_to_virt() is not usable on most architectures and probably
pfn_to_kaddr(pfn) would be the correct thing here. Unfortunately, it is
also not defined on s390.

But when I discussed your comment with Martin, we found out that the current

buf = (void *) (page_to_pfn(page) << PAGE_SHIFT);

is not correct on x86, because page_to_pfn(page) << PAGE_SHIFT is a
physical address (on s390 physical and kernel virtual addresses happen to
be identical, which is why it worked for us). It should be translated with
__va():

buf = __va((page_to_pfn(page) << PAGE_SHIFT));

So, would the following patch be ok for you?
---
 fs/proc/vmcore.c           |   94 +++++++++++++++++++++++++++++++++++++++++----
 include/linux/crash_dump.h |    3 +
 2 files changed, 89 insertions(+), 8 deletions(-)

--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -21,6 +21,7 @@
 #include <linux/crash_dump.h>
 #include <linux/list.h>
 #include <linux/vmalloc.h>
+#include <linux/pagemap.h>
 #include <asm/uaccess.h>
 #include <asm/io.h>
 #include "internal.h"
@@ -153,11 +154,35 @@ ssize_t __weak elfcorehdr_read_notes(cha
        return read_from_oldmem(buf, count, ppos, 0);
 }
 
+/*
+ * Architectures may override this function to map oldmem
+ */
+int __weak remap_oldmem_pfn_range(struct vm_area_struct *vma,
+                                 unsigned long from, unsigned long pfn,
+                                 unsigned long size, pgprot_t prot)
+{
+       return remap_pfn_range(vma, from, pfn, size, prot);
+}
+
+/*
+ * Copy to either kernel or user space
+ */
+static int copy_to(void *target, void *src, size_t size, int userbuf)
+{
+       if (userbuf) {
+               if (copy_to_user(target, src, size))
+                       return -EFAULT;
+       } else {
+               memcpy(target, src, size);
+       }
+       return 0;
+}
+
 /* Read from the ELF header and then the crash dump. On error, negative value is
  * returned otherwise number of bytes read are returned.
  */
-static ssize_t read_vmcore(struct file *file, char __user *buffer,
-                               size_t buflen, loff_t *fpos)
+static ssize_t __read_vmcore(char *buffer, size_t buflen, loff_t *fpos,
+                            int userbuf)
 {
        ssize_t acc = 0, tmp;
        size_t tsz;
@@ -174,7 +199,7 @@ static ssize_t read_vmcore(struct file *
        /* Read ELF core header */
        if (*fpos < elfcorebuf_sz) {
                tsz = min(elfcorebuf_sz - (size_t)*fpos, buflen);
-               if (copy_to_user(buffer, elfcorebuf + *fpos, tsz))
+               if (copy_to(buffer, elfcorebuf + *fpos, tsz, userbuf))
                        return -EFAULT;
                buflen -= tsz;
                *fpos += tsz;
@@ -192,7 +217,7 @@ static ssize_t read_vmcore(struct file *
 
                tsz = min(elfcorebuf_sz + elfnotes_sz - (size_t)*fpos, buflen);
                kaddr = elfnotes_buf + *fpos - elfcorebuf_sz;
-               if (copy_to_user(buffer, kaddr, tsz))
+               if (copy_to(buffer, kaddr, tsz, userbuf))
                        return -EFAULT;
                buflen -= tsz;
                *fpos += tsz;
@@ -208,7 +233,7 @@ static ssize_t read_vmcore(struct file *
                if (*fpos < m->offset + m->size) {
                        tsz = min_t(size_t, m->offset + m->size - *fpos, buflen);
                        start = m->paddr + *fpos - m->offset;
-                       tmp = read_from_oldmem(buffer, tsz, &start, 1);
+                       tmp = read_from_oldmem(buffer, tsz, &start, userbuf);
                        if (tmp < 0)
                                return tmp;
                        buflen -= tsz;
@@ -225,6 +250,58 @@ static ssize_t read_vmcore(struct file *
        return acc;
 }
 
+static ssize_t read_vmcore(struct file *file, char __user *buffer,
+                          size_t buflen, loff_t *fpos)
+{
+       return __read_vmcore(buffer, buflen, fpos, 1);
+}
+
+#ifdef CONFIG_S390
+/*
+ * The vmcore fault handler uses the page cache and fills data using the
+ * standard __vmcore_read() function.
+ *
+ * On s390 the fault handler is used for memory regions that can't be mapped
+ * directly with remap_pfn_range().
+ */
+static int mmap_vmcore_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+       struct address_space *mapping = vma->vm_file->f_mapping;
+       pgoff_t index = vmf->pgoff;
+       struct page *page;
+       loff_t offset;
+       char *buf;
+       int rc;
+
+       page = find_or_create_page(mapping, index, GFP_KERNEL);
+       if (!page)
+               return VM_FAULT_OOM;
+       if (!PageUptodate(page)) {
+               offset = (loff_t) index << PAGE_CACHE_SHIFT;
+               buf = __va((page_to_pfn(page) << PAGE_SHIFT));
+               rc = __read_vmcore(buf, PAGE_SIZE, &offset, 0);
+               if (rc < 0) {
+                       unlock_page(page);
+                       page_cache_release(page);
+                       return (rc == -ENOMEM) ? VM_FAULT_OOM : VM_FAULT_SIGBUS;
+               }
+               SetPageUptodate(page);
+       }
+       unlock_page(page);
+       vmf->page = page;
+       return 0;
+}
+#else
+static int mmap_vmcore_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+       BUG();
+}
+#endif
+
+static const struct vm_operations_struct vmcore_mmap_ops = {
+       .fault = mmap_vmcore_fault,
+};
+
 static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
 {
        size_t size = vma->vm_end - vma->vm_start;
@@ -242,6 +319,7 @@ static int mmap_vmcore(struct file *file
 
        vma->vm_flags &= ~(VM_MAYWRITE | VM_MAYEXEC);
        vma->vm_flags |= VM_MIXEDMAP;
+       vma->vm_ops = &vmcore_mmap_ops;
 
        len = 0;
 
@@ -283,9 +361,9 @@ static int mmap_vmcore(struct file *file
 
                        tsz = min_t(size_t, m->offset + m->size - start, size);
                        paddr = m->paddr + start - m->offset;
-                       if (remap_pfn_range(vma, vma->vm_start + len,
-                                           paddr >> PAGE_SHIFT, tsz,
-                                           vma->vm_page_prot))
+                       if (remap_oldmem_pfn_range(vma, vma->vm_start + len,
+                                                  paddr >> PAGE_SHIFT, tsz,
+                                                  vma->vm_page_prot))
                                goto fail;
                        size -= tsz;
                        start += tsz;
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -17,6 +17,9 @@ extern int __weak elfcorehdr_alloc(unsig
 extern void __weak elfcorehdr_free(unsigned long long addr);
 extern ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos);
+extern ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos);
+extern int __weak remap_oldmem_pfn_range(struct vm_area_struct *vma,
+                                        unsigned long from, unsigned long pfn,
+                                        unsigned long size, pgprot_t prot);
 
 extern ssize_t copy_oldmem_page(unsigned long, char *, size_t,
                                                unsigned long, int);
