Re: [RFC v10 0/4] pstore/block: new support logger for block devices

2019-02-26 Thread Greg Kroah-Hartman
On Tue, Feb 26, 2019 at 02:33:41PM +0800, liaoweixiong wrote:
> Why should we need pstore_block?
> 1. Most embedded intelligent equipment has no persistent RAM, since it
> increases costs. We prefer cheaper solutions, like block devices.
> In fact, there is already a sample block device logger in the MTD
> driver (drivers/mtd/mtdoops.c).
> 2. Not all equipment has a battery, which means that it loses all data
> in general RAM on power failure. Pstore has little to offer for such
> equipment.
> 
> [PATCH v10]

Why are you still labeling these as "RFC"?  No one should actually be
applying a Request For Comments patchset, as you obviously are not
thinking it is ready to be merged :(

After 10 revisions, I hope you are confident in this patchset :)

thanks,

greg k-h


Re: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support

2019-02-26 Thread Peter Zijlstra
On Wed, Feb 20, 2019 at 10:08:48AM -0500, Len Brown wrote:
> Thanks for the comments, Peter. I'll update the patch to address the
> syntax points.  (Maybe checkpatch.pl should be updated to reflect your
> preferences?).

Don't know about checkpatch; I ignore plenty of its output. I think tglx
started a document somewhere for what tip prefers, but I'm not sure
where that went.

> About macros vs C.  I agree with your preference.
> I used macros to be consistent with the existing code, and to be as
> backport friendly as possible.
> (a number of distros need to pull these patches into their supported kernels)
> Sure, I'm willing to write in a cosmetic-only patch, after the
> functional changes are upstream.

Fair enough.

> > It would've been nice to have the CPUID instruction 1F leaf reference
> > 3B-3.9 in the SDM, and maybe mention this here too.
> 
> I didn't mention SDM sections because they change -- leaving stale
> pointers in our commit messages.  The SDM is re-published 4 times per
> year.

Yah, I know. Which is why I keep all SDMs. So if you say, book 3 section
8 of Jul'17, I can find it :-)

> > You haven't explained, and I can't readily find it in the SDM either,
> > how these topology bits relate to caches and interconnects.
> >
> > Will these die thingies share LLC, or will LLC be per die. Will NUMA
> > span dies or not.
> 
> Excellent question.
> Cache enumeration in Leaf-4 is totally unchanged.
> ACPI NUMA tables are totally unchanged.

Sure; and yet Sub-NUMA-Clustering broke stuff in interesting ways. I'm
trying to get a feel for how these levels will interact with all that.

Before that SNC stuff, caches had never spanned NODEs (and I still
think that is 'creative' at best).

> From a scheduler point of view, imagine that a SKX system with 4 die
> in 4 packages was mechanically re-designed so that those 4 die resided
> in 2 double-sized packages.
> 
> They may have tweaked the links between the die, but logically it is
> identical and compatible, and the legacy kernel will function
> properly.

This example has LLC in die and yes that works.

But I can imagine things like L2 in tile and L3 across tiles but within
DIE and then it _might_ make sense to still consider the tile for
scheduling.

Another option is having the LLC off die; also not unheard of.

And then there's many creative and slightly crazy ways this can all be
combined :/

> So the effect of Leaf B,1F is that it defines the scope of MSRs.  eg.
> what processors does a die-scope MSR cover.  That is why the rest of
> the patch is about sysfs topology, and about package MSR scope.
> 
> Yes, there will be more exotic MSR situations in future products --
> the first ones are pretty simple -- something  called a
> package-scope-MSR  in the SDM today becomes a die-scope-MSR in this
> generation on a multi-die/package system.

Yes :-(

> It also reflects how many packages appear in sysfs, and this can
> effect licensing of some kinds of software.

That's just plain insanity and we should not let that affect our sysfs
interfaces.


Re: [PATCH v10 06/12] fs, arm64: untag user pointers in copy_mount_options

2019-02-26 Thread Andrey Konovalov
On Sat, Feb 23, 2019 at 12:03 AM Dave Hansen  wrote:
>
> On 2/22/19 4:53 AM, Andrey Konovalov wrote:
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -2730,7 +2730,7 @@ void *copy_mount_options(const void __user * data)
> >* the remainder of the page.
> >*/
> >   /* copy_from_user cannot cross TASK_SIZE ! */
> > - size = TASK_SIZE - (unsigned long)data;
> > + size = TASK_SIZE - (unsigned long)untagged_addr(data);
> >   if (size > PAGE_SIZE)
> >   size = PAGE_SIZE;
>
> I would have thought that copy_from_user() *is* entirely capable of
> detecting and returning an error in the case that its arguments cross
> TASK_SIZE.  It will fail and return an error, but that's what it's
> supposed to do.
>
> I'd question why this code needs to be doing its own checking in the
> first place.  Is there something subtle going on?

The comment above exact_copy_from_user() states:

Some copy_from_user() implementations do not return the exact number of
bytes remaining to copy on a fault.  But copy_mount_options() requires that.
Note that this function differs from copy_from_user() in that it will oops
on bad values of `to', rather than returning a short copy.
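
A minimal userspace sketch (not from the patch) of why the size
computation in copy_mount_options() needs an untagged address. The names
sketch_untagged_addr() and sketch_copy_size(), the 48-bit
SKETCH_TASK_SIZE and the 4 KiB SKETCH_PAGE_SIZE are assumptions chosen
for illustration; the untagging is modelled as sign extension from bit
55, which clears the top byte of a user address. With a tagged pointer
the unsigned subtraction from TASK_SIZE would wrap, so the clamp to
PAGE_SIZE would no longer reflect the real distance to the limit.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants only; real values depend on the kernel config. */
#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_TASK_SIZE (1ULL << 48)

/* Model of untagging: copy bit 55 into the top byte (bits 63:56). */
static uint64_t sketch_untagged_addr(uint64_t addr)
{
    const uint64_t top_byte = 0xffULL << 56;

    return (addr & (1ULL << 55)) ? (addr | top_byte) : (addr & ~top_byte);
}

/* Bytes that may be copied: up to one page, never past the task limit. */
static uint64_t sketch_copy_size(uint64_t user_ptr)
{
    uint64_t addr = sketch_untagged_addr(user_ptr);
    uint64_t size;

    assert(addr < SKETCH_TASK_SIZE);
    size = SKETCH_TASK_SIZE - addr;
    return size > SKETCH_PAGE_SIZE ? SKETCH_PAGE_SIZE : size;
}
```

For a tagged pointer 100 bytes below the limit, sketch_copy_size()
returns 100 rather than a full page.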


Re: [PATCH v10 07/12] fs, arm64: untag user pointers in fs/userfaultfd.c

2019-02-26 Thread Andrey Konovalov
On Sat, Feb 23, 2019 at 12:06 AM Dave Hansen  wrote:
>
> On 2/22/19 4:53 AM, Andrey Konovalov wrote:
> > userfaultfd_register() and userfaultfd_unregister() use provided user
> > pointers for vma lookups, which can only be done with untagged pointers.
>
> So, we have to patch all these sites before the tagged values get to the
> point of hitting the vma lookup functions.  Dumb question: Why don't we
> just patch the vma lookup functions themselves instead of all of these
> callers?

That might be a working approach as well. We'll still need to fix up
places where the vma fields are accessed directly. Catalin, what do
you think?


Re: [PATCH v10 04/12] mm, arm64: untag user pointers passed to memory syscalls

2019-02-26 Thread Andrey Konovalov
On Sat, Feb 23, 2019 at 12:07 AM Dave Hansen  wrote:
>
> On 2/22/19 4:53 AM, Andrey Konovalov wrote:
> > --- a/mm/mprotect.c
> > +++ b/mm/mprotect.c
> > @@ -578,6 +578,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
> >  SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
> >   unsigned long, prot)
> >  {
> > + start = untagged_addr(start);
> >   return do_mprotect_pkey(start, len, prot, -1);
> >  }
> >
> > @@ -586,6 +587,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
> >  SYSCALL_DEFINE4(pkey_mprotect, unsigned long, start, size_t, len,
> >   unsigned long, prot, int, pkey)
> >  {
> > + start = untagged_addr(start);
> >   return do_mprotect_pkey(start, len, prot, pkey);
> >  }
>
> This seems to have taken the approach of going as close as possible to
> the syscall boundary and untagging the pointer there.  I guess that's
> OK, but it does lead to more churn than necessary.  For instance, why
> not just do the untagging in do_mprotect_pkey()?

I think that makes more sense, will do in the next version, thanks!

>
> I think that's an overall design question.  I kinda asked the same thing
> about patching call sites vs. VMA lookup functions.

Replied in the other thread.


Re: [PATCH v10 00/12] arm64: untag user pointers passed to the kernel

2019-02-26 Thread Andrey Konovalov
On Fri, Feb 22, 2019 at 5:10 PM Szabolcs Nagy  wrote:
>
> On 22/02/2019 15:40, Andrey Konovalov wrote:
> > On Fri, Feb 22, 2019 at 4:35 PM Szabolcs Nagy  wrote:
> >>
> >> On 22/02/2019 12:53, Andrey Konovalov wrote:
> >>> This patchset is meant to be merged together with "arm64 relaxed ABI" [1].
> >>>
> >>> arm64 has a feature called Top Byte Ignore, which allows to embed pointer
> >>> tags into the top byte of each pointer. Userspace programs (such as
> >>> HWASan, a memory debugging tool [2]) might use this feature and pass
> >>> tagged user pointers to the kernel through syscalls or other interfaces.
> >>>
> >>> Right now the kernel is already able to handle user faults with tagged
> >>> pointers, due to these patches:
> >>>
> >>> 1. 81cddd65 ("arm64: traps: fix userspace cache maintenance emulation on a
> >>>  tagged pointer")
> >>> 2. 7dcd9dd8 ("arm64: hw_breakpoint: fix watchpoint matching for tagged
> >>> pointers")
> >>> 3. 276e9327 ("arm64: entry: improve data abort handling of tagged
> >>> pointers")
> >>>
> >>> This patchset extends tagged pointer support to syscall arguments.
> >>>
> >>> For non-memory syscalls this is done by untagging user pointers when the
> >>> kernel performs pointer checking to find out whether the pointer comes
> >>> from userspace (most notably in access_ok). The untagging is done only
> >>> when the pointer is being checked, the tag is preserved as the pointer
> >>> makes its way through the kernel.
> >>>
> >>> Since memory syscalls (mmap, mprotect, etc.) don't do memory accesses but
> >>> rather deal with memory ranges, untagged pointers are better suited to
> >>> describe memory ranges internally. Thus for memory syscalls we untag
> >>> pointers completely when they enter the kernel.
> >>
> >> i think the same is true when user pointers are compared.
> >>
> >> e.g. i suspect there may be issues with tagged robust mutex
> >> list pointers because the kernel does
> >>
> >> futex.c:3541:   while (entry != &head->list) {
> >>
> >> where entry is a user pointer that may be tagged, and
> >> &head->list is probably not tagged.
> >
> > You're right. I'll expand the cover letter in the next version to
> > describe this more accurately. The patchset however contains "mm,
> > arm64: untag user pointers in mm/gup.c" that should take care of futex
> > pointers.
>
> the robust mutex list pointer is not a futex pointer,
> i'm not sure how the mm/gup.c patch helps.

Oh, I've misinterpreted what you said, sorry.

I've looked at the robust futex list implementation, and I'm not sure
if we need to add untagging here.

> >> futex.c:3541:   while (entry != &head->list) {

Here head has whatever value user has set via the set_robust_list
syscall and it might be tagged. AFAIU this loop iterates over the
robust list stored in userspace, until it encounters the head pointer
again, at which point the kernel decides that it has iterated over the
whole list and stops. The question is whether we want the user to use
the same tag for the pointer that is passed to the set_robust_list
syscall and the pointer that is used to mark the end of the robust
list.

Catalin, what do you think?
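
A small userspace sketch (not kernel code) of the comparison issue
discussed above: two pointers to the same address compare unequal when
their top-byte tags differ, so a loop of the form
`while (entry != &head->list)` would only terminate on a match if the
end-of-list pointer carries the same tag as the registered head. The
helper names below are hypothetical, and the tag is modelled simply as
bits 63:56 of a 64-bit value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Clear the top byte (the tag) of a user address. */
static uint64_t sketch_strip_tag(uint64_t addr)
{
    return addr & ~(0xffULL << 56);
}

/* Place a tag in the top byte, leaving the address bits untouched. */
static uint64_t sketch_set_tag(uint64_t addr, uint8_t tag)
{
    uint64_t tagged = sketch_strip_tag(addr) | ((uint64_t)tag << 56);

    assert(sketch_strip_tag(tagged) == sketch_strip_tag(addr));
    return tagged;
}

/* Raw comparison: tags take part, as in `entry != &head->list`. */
static bool sketch_same_raw(uint64_t a, uint64_t b)
{
    return a == b;
}

/* Tag-insensitive comparison: only the address bits take part. */
static bool sketch_same_untagged(uint64_t a, uint64_t b)
{
    return sketch_strip_tag(a) == sketch_strip_tag(b);
}
```

Whether the kernel should use the tag-insensitive form here is exactly
the open question in this message; the sketch only shows the difference
between the two comparisons.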

>
> >>
> >>> One of the alternative approaches to untagging that was considered is to
> >>> completely strip the pointer tag as the pointer enters the kernel with
> >>> some kind of a syscall wrapper, but that won't work with the countless
> >>> number of different ioctl calls. With this approach we would need a custom
> >>> wrapper for each ioctl variation, which doesn't seem practical.
> >>>
> >>> The following testing approaches have been taken to find potential issues
> >>> with user pointer untagging:
> >>>
> >>> 1. Static testing (with sparse [3] and separately with a custom static
> >>>analyzer based on Clang) to track casts of __user pointers to integer
> >>>types to find places where untagging needs to be done.
> >>>
> >>> 2. Static testing with grep to find parts of the kernel that call
> >>>find_vma() (and other similar functions) or directly compare against
> >>>vm_start/vm_end fields of vma.
> >>>
> >>> 3. Static testing with grep to find parts of the kernel that compare
> >>>user pointers with TASK_SIZE or other similar consts and macros.
> >>>
> >>> 4. Dynamic testing: adding BUG_ON(has_tag(addr)) to find_vma() and running
> >>>a modified syzkaller version that passes tagged pointers to the kernel.
> >>>
> >>> Based on the results of the testing the required patches have been added
> >>> to the patchset.
> >>>
> >>> This patchset has been merged into the Pixel 2 kernel tree and is now
> >>> being used to enable testing of Pixel 2 phones with HWASan.
> >>>
> >>> This patchset is a prerequisite for ARM's memory tagging hardware feature
> >>> support [4].
> >>>
> >>> Thanks!
> >>>
> >>> [1] https://lkml.org/lkml/2018/12/10/402
> >>>
> >>> [2] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
> >>>
> >>> [3] 
> >>> https://githu

Re: [PATCH v2] Documentation/process/howto: Update for 4.x -> 5.x versioning

2019-02-26 Thread Zenghui Yu
Hi Jon,

On Tue, Feb 26, 2019 at 2:26 AM Jonathan Corbet  wrote:
>
> On Sun, 24 Feb 2019 23:45:23 +0800
> Zenghui Yu  wrote:
>
> > As linux-5.0 is coming up soon, the howto.rst document can be
> > updated for the new kernel version. Change all 4.x references
> > to 5.x now.
> >
> > Signed-off-by: Zenghui Yu 
>
> Overall: I think there's value in having the docs reflect current
> numbers, though it would be better if the docs as a whole were kept
> current at the same time.  howto.rst hasn't been updated yet, so this
> attention is welcome - thanks for taking a look at it.  That said, I
> really think we can do a little better.

Thanks for your reviewing and nice suggestions.  Now I have realized that
simply changing version numbers in howto.rst (like what I've done ...) is
shortsighted. And yeah, we can do it better.

> >  Documentation/process/howto.rst | 24 
> >  1 file changed, 12 insertions(+), 12 deletions(-)
> >
> > diff --git a/Documentation/process/howto.rst b/Documentation/process/howto.rst
> > index f16242b..19001e2 100644
> > --- a/Documentation/process/howto.rst
> > +++ b/Documentation/process/howto.rst
> > @@ -235,16 +235,16 @@ Linux kernel development process currently consists of a few different
> >  main kernel "branches" and lots of different subsystem-specific kernel
> >  branches.  These different branches are:
> >
> > -  - main 4.x kernel tree
> > -  - 4.x.y -stable kernel tree
> > +  - main 5.x kernel tree
> > +  - 5.x.y -stable kernel tree
> >- subsystem specific kernel trees and patches
> > -  - the 4.x -next kernel tree for integration tests
> > +  - the 5.x -next kernel tree for integration tests
>
> One thing I think we can do is to simply get rid of version numbers in a
> lot of places and make this process easier when 6.x comes around.  What
> this is really trying to say is that we have:
>
>  - Linus's mainline tree
>  - Various stable trees with multiple major numbers
>  - Subsystem-specific trees
>  - linux-next
>
> If we could rework this along those lines, it will more accurately
> reflect reality and not require updating next time.

Obviously a better classification. Will follow your suggestion and modify it.

> > -4.x kernel tree
> > +5.x kernel tree
> >  ~~~
> >
> > -4.x kernels are maintained by Linus Torvalds, and can be found on
> > -https://kernel.org in the pub/linux/kernel/v4.x/ directory.  Its development
> > +5.x kernels are maintained by Linus Torvalds, and can be found on
> > +https://kernel.org in the pub/linux/kernel/v5.x/ directory.  Its development
> >  process is as follows:
>
> And here too I think we can just say "mainline" and that they can be
> found at https://kernel.org/ or in the repo.

Will modify.

> >- As soon as a new kernel is released a two weeks window is open,
> > @@ -277,21 +277,21 @@ mailing list about kernel releases:
> >   released according to perceived bug status, not according to a
> >   preconceived timeline."*
> >
> > -4.x.y -stable kernel tree
> > +5.x.y -stable kernel tree
> >  ~
> >
> >  Kernels with 3-part versions are -stable kernels. They contain
> >  relatively small and critical fixes for security problems or significant
> > -regressions discovered in a given 4.x kernel.
> > +regressions discovered in a given 5.x kernel.
>
> Here too, especially since most of the outstanding stable kernels won't
> be 5.x for a long time.

Yes. Actually, I hesitated too when I was changing "4.x.y -stable kernel tree"
to "5.x.y -stable kernel tree" :)
Using "stable trees" instead will be better.

> >  This is the recommended branch for users who want the most recent stable
> >  kernel and are not interested in helping test development/experimental
> >  versions.
> >
> > -If no 4.x.y kernel is available, then the highest numbered 4.x
> > +If no 5.x.y kernel is available, then the highest numbered 5.x
> >  kernel is the current stable kernel.
>
> ...and this, I believe, is misleading at best.  I'd just take that
> sentence out.

Yes, I'll delete it.

> > -4.x.y are maintained by the "stable" team , and
> > +5.x.y are maintained by the "stable" team , and
> >  are released as needs dictate.  The normal release period is approximately
> >  two weeks, but it can be longer if there are no pressing problems.  A
> >  security-related problem, instead, can cause a release to happen almost
> > @@ -326,10 +326,10 @@ revisions to it, and maintainers can mark patches as under review,
> >  accepted, or rejected.  Most of these patchwork sites are listed at
> >  https://patchwork.kernel.org/.
> >
> > -4.x -next kernel tree for integration tests
> > +5.x -next kernel tree for integration tests
> >  ~~~
> >
> > -Before updates from subsystem trees are merged into the mainline 4.x
> > +Before updates from subsystem trees are merged into the mainline 5.x
> >  tree, they need to be integration-tested.  For this purpose, a sp

Re: [PATCH v10 00/12] arm64: untag user pointers passed to the kernel

2019-02-26 Thread Andrey Konovalov
On Fri, Feb 22, 2019 at 11:55 PM Dave Hansen  wrote:
>
> On 2/22/19 4:53 AM, Andrey Konovalov wrote:
> > The following testing approaches have been taken to find potential issues
> > with user pointer untagging:
> >
> > 1. Static testing (with sparse [3] and separately with a custom static
> >analyzer based on Clang) to track casts of __user pointers to integer
> >types to find places where untagging needs to be done.
>
> First of all, it's really cool that you took this approach.  Sounds like
> there was a lot of systematic work to fix up the sites in the existing
> codebase.
>
> But, isn't this a _bit_ fragile going forward?  Folks can't just "make
> sparse" to find issues with missing untags.

Yes, this static approach can only be used as a hint to find some
places where untagging is needed, but certainly not all.

> This seems like something
> where we would ideally add an __tagged annotation (or something) to the
> source tree and then have sparse rules that can look for missed untags.

This has been suggested before, search for __untagged here [1].
However there are many places in the kernel where a __user pointer is
cast to unsigned long and passed further. I'm not sure if it's
possible to apply a __tagged/__untagged kind of attribute to non-pointer
types, is it?

[1] https://patchwork.kernel.org/patch/10581535/


Re: [RFC][PATCH 0/3] arm64 relaxed ABI

2019-02-26 Thread Kevin Brodsky

On 25/02/2019 18:02, Szabolcs Nagy wrote:

On 25/02/2019 16:57, Catalin Marinas wrote:

On Tue, Feb 19, 2019 at 06:38:31PM +, Szabolcs Nagy wrote:

i think these rules work for the cases i care about, a more
tricky question is when/how to check for the new syscall abi
and when/how the TCR_EL1.TBI0 setting may be turned off.

I don't think turning TBI0 off is critical (it's handy for PAC with
52-bit VA but then it's short-lived if you want more security features
like MTE).

yes, i made a mistake assuming TBI0 off is
required for (or at least compatible with) MTE.

if TBI0 needs to be on for MTE then some of my
analysis is wrong, and i expect TBI0 to be on
in the foreseeable future.


consider the following cases (tb == top byte):

binary 1: user tb = any, syscall tb = 0
   tbi is on, "legacy binary"

binary 2: user tb = any, syscall tb = any
   tbi is on, "new binary using tb"
   for backward compat it needs to check for new syscall abi.

binary 3: user tb = 0, syscall tb = 0
   tbi can be off, "new binary",
   binary is marked to indicate unused tb,
   kernel may turn tbi off: additional pac bits.

binary 4: user tb = mte, syscall tb = mte
   like binary 3, but with mte, "new binary using mte"

so this should be "like binary 2, but with mte".


   does it have to check for new syscall abi?
   or MTE HWCAP would imply it?
   (is it possible to use mte without new syscall abi?)

I think MTE HWCAP should imply it.


in userspace we want most binaries to be like binary 3 and 4
eventually, i.e. marked as not-relying-on-tbi, if a dso is
loaded that is unmarked (legacy or new tb user), then either
the load fails (e.g. if mte is already used? or can we turn
mte off at runtime?) or tbi has to be enabled (prctl? does
this work with pac? or multi-threads?).

We could enable it via prctl. That's the plan for MTE as well (in
addition maybe to some ELF flag).


as for checking the new syscall abi: i don't see much semantic
difference between AT_HWCAP and AT_FLAGS (either way, the user
has to check a feature flag before using the feature of the
underlying system and it does not matter much if it's a syscall
abi feature or cpu feature), but i don't see anything wrong
with AT_FLAGS if the kernel prefers that.

The AT_FLAGS is aimed at capturing binary 2 case above, i.e. the
relaxation of the syscall ABI to accept tb = any. The MTE support will
have its own AT_HWCAP, likely in addition to AT_FLAGS. Arguably,
AT_FLAGS is either redundant here if MTE implies it (and no harm in
keeping it around) or the meaning is different: a tb != 0 may be checked
by the kernel against the allocation tag (i.e. get_user() could fail,
the tag is not entirely ignored).


the discussion here was mostly about binary 2,

That's because passing tb != 0 into the syscall ABI is the main blocker
here that needs clearing out before merging the MTE support. There is,
of course, a variation of binary 1 for MTE:

binary 5: user tb = mte, syscall tb = 0

but this requires a lot of C lib changes to support properly.

yes, i don't think we want to do that.

but it's ok to have both syscall tbi AT_FLAGS and MTE HWCAP.


but for
me the open question is if we can make binary 3/4 work.
(which requires some elf binary marking, that is recognised
by the kernel and dynamic loader, and efficient handling of
the TBI0 bit, ..if it's not possible, then i don't see how
mte will be deployed).

If we ignore binary 3, we can keep TBI0 = 1 permanently, whether we have
MTE or not.


and i guess on the kernel side the open question is if the
rules 1/2/3/4 can be made to work in corner cases e.g. when
pointers embedded into structs are passed down in ioctl.

We've been trying to track these down since last summer and we came to
the conclusion that it should be (mostly) fine for the non-weird memory
described above.

i think an interesting case is when userspace passes
a pointer to the kernel and later gets it back,
which is why i proposed rule 4 (kernel has to keep
the tag then).

but i wonder what's the right thing to do for sp
(user can malloc thread/sigalt/makecontext stack
which will be mte tagged in practice with mte on)
does tagged sp work? should userspace untag the
stack memory before setting it up as a stack?
(but then user pointers to that allocation may get
broken..)


Tagged SP does work, and it is actually a good idea (it avoids using the default tag 
for the stack). It would be quite easy for the kernel to tag the initial SP and the 
stack on execve(). For other stacks, it is up to userspace, as you say, and would be 
made easier by making it possible to choose how a mapping should be tagged by the 
kernel via a new mmap() flag. Some software that makes too many assumptions on the 
address of stack variables will be disturbed by a tagged SP, but this should be 
fairly rare.


In any case, I don't think this impacts this ABI proposal (beyond the fact that 
passing tagged pointers to the stack needs to be allowed).


Kevin


Re: [PATCH v10 00/12] arm64: untag user pointers passed to the kernel

2019-02-26 Thread Dave Hansen
On 2/26/19 9:18 AM, Andrey Konovalov wrote:
>> This seems like something
>> where we would ideally add an __tagged annotation (or something) to the
>> source tree and then have sparse rules that can look for missed untags.
> This has been suggested before, search for __untagged here [1].
> However there are many places in the kernel where a __user pointer is
> cast to unsigned long and passed further. I'm not sure if it's
> possible to apply a __tagged/__untagged kind of attribute to non-pointer
> types, is it?
> 
> [1] https://patchwork.kernel.org/patch/10581535/

I believe we have sparse checking __GFP_* flags.  We also have a gfp_t
for them and I'm unsure whether the sparse support is tied to _that_ or
whether it's just by tagging the type itself as being part of a discrete
address space.
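
For reference, gfp_t is declared in the kernel as a __bitwise typedef of
unsigned int, and sparse warns when such a type is mixed with plain
integers without a __force cast. Below is a rough sketch of how a
similar annotation could look for untagged addresses. Everything named
sketch_* or SKETCH_* is hypothetical and not an existing kernel or
sparse facility, and whether this approach is adequate for tagged
pointers is precisely what is being asked here. Outside of sparse
(no __CHECKER__) the annotations expand to nothing, so the code builds
as ordinary C.

```c
#include <assert.h>

/* Under sparse these become attributes; a normal compiler sees nothing. */
#ifdef __CHECKER__
#define SKETCH_BITWISE __attribute__((bitwise))
#define SKETCH_FORCE   __attribute__((force))
#else
#define SKETCH_BITWISE
#define SKETCH_FORCE
#endif

/* Hypothetical type: an address whose top-byte tag has been stripped. */
typedef unsigned long long SKETCH_BITWISE sketch_untagged_t;

/* The only place where a raw integer becomes a sketch_untagged_t. */
static sketch_untagged_t sketch_untag(unsigned long long addr)
{
    return (SKETCH_FORCE sketch_untagged_t)(addr & ~(0xffULL << 56));
}

/* Range check that accepts only untagged values, like a vma lookup. */
static int sketch_in_range(sketch_untagged_t addr,
                           sketch_untagged_t start,
                           sketch_untagged_t end)
{
    /* Convert back explicitly for the ordered comparisons. */
    unsigned long long a = (SKETCH_FORCE unsigned long long)addr;
    unsigned long long s = (SKETCH_FORCE unsigned long long)start;
    unsigned long long e = (SKETCH_FORCE unsigned long long)end;

    assert(s <= e);
    return a >= s && a < e;
}
```

With this shape, passing a plain unsigned long long to
sketch_in_range() would be flagged by sparse, while the compiled code is
unchanged.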


[PATCH v3] Documentation/process/howto: Update for 4.x -> 5.x versioning

2019-02-26 Thread Zenghui Yu
As linux-5.0 is coming up soon, the howto.rst document can be
updated for the new kernel version. Instead of changing all 4.x
references to 5.x, this time we get rid of all explicit version
numbers and rework some kernel trees' names to keep the docs
current.

Signed-off-by: Zenghui Yu 
---
 Documentation/process/howto.rst | 47 +++--
 1 file changed, 22 insertions(+), 25 deletions(-)

 Change since v2:
 - remove explicit version numbers and rework kernel trees' name
   by Jon's suggestions

diff --git a/Documentation/process/howto.rst b/Documentation/process/howto.rst
index f16242b..ad2b6c8 100644
--- a/Documentation/process/howto.rst
+++ b/Documentation/process/howto.rst
@@ -235,22 +235,21 @@ Linux kernel development process currently consists of a few different
 main kernel "branches" and lots of different subsystem-specific kernel
 branches.  These different branches are:
 
-  - main 4.x kernel tree
-  - 4.x.y -stable kernel tree
-  - subsystem specific kernel trees and patches
-  - the 4.x -next kernel tree for integration tests
+  - Linus's mainline tree
+  - Various stable trees with multiple major numbers
+  - Subsystem-specific trees
+  - linux-next integration testing tree
 
-4.x kernel tree
-~~~
+Mainline tree
+~
 
-4.x kernels are maintained by Linus Torvalds, and can be found on
-https://kernel.org in the pub/linux/kernel/v4.x/ directory.  Its development
-process is as follows:
+Mainline tree are maintained by Linus Torvalds, and can be found at
+https://kernel.org or in the repo.  Its development process is as follows:
 
   - As soon as a new kernel is released a two weeks window is open,
 during this period of time maintainers can submit big diffs to
 Linus, usually the patches that have already been included in the
--next kernel for a few weeks.  The preferred way to submit big changes
+linux-next for a few weeks.  The preferred way to submit big changes
 is using git (the kernel's source management tool, more information
 can be found at https://git-scm.com/) but plain patches are also just
 fine.
@@ -277,21 +276,19 @@ mailing list about kernel releases:
released according to perceived bug status, not according to a
preconceived timeline."*
 
-4.x.y -stable kernel tree
-~
+Various stable trees with multiple major numbers
+
 
 Kernels with 3-part versions are -stable kernels. They contain
 relatively small and critical fixes for security problems or significant
-regressions discovered in a given 4.x kernel.
+regressions discovered in a given major mainline release, with the first
+2-part of version number are the same correspondingly.
 
 This is the recommended branch for users who want the most recent stable
 kernel and are not interested in helping test development/experimental
 versions.
 
-If no 4.x.y kernel is available, then the highest numbered 4.x
-kernel is the current stable kernel.
-
-4.x.y are maintained by the "stable" team , and
+Stable trees are maintained by the "stable" team , and
 are released as needs dictate.  The normal release period is approximately
 two weeks, but it can be longer if there are no pressing problems.  A
 security-related problem, instead, can cause a release to happen almost
@@ -301,8 +298,8 @@ The file 
:ref:`Documentation/process/stable-kernel-rules.rst https://patchwork.kernel.org/.
 
-4.x -next kernel tree for integration tests
-~~~
+linux-next integration testing tree
+~~~
 
-Before updates from subsystem trees are merged into the mainline 4.x
-tree, they need to be integration-tested.  For this purpose, a special
+Before updates from subsystem trees are merged into the mainline tree,
+they need to be integration-tested.  For this purpose, a special
 testing repository exists into which virtually all subsystem trees are
 pulled on an almost daily basis:
 
https://git.kernel.org/?p=linux/kernel/git/next/linux-next.git
 
-This way, the -next kernel gives a summary outlook onto what will be
+This way, the linux-next gives a summary outlook onto what will be
 expected to go into the mainline kernel at the next merge period.
-Adventurous testers are very welcome to runtime-test the -next kernel.
+Adventurous testers are very welcome to runtime-test the linux-next.
 
 
 Bug Reporting
-- 
2.7.4



Re: [PATCH 04/14] x86 topology: Add CPUID.1F multi-die/package support

2019-02-26 Thread Dave Hansen
On 2/25/19 10:20 PM, Len Brown wrote:
> -/* leaf 0xb sub-leaf types */
> +/* extended topology sub-leaf types */
>  #define INVALID_TYPE 0
>  #define SMT_TYPE 1
>  #define CORE_TYPE2
> +#define DIE_TYPE 5

Looking in the SDM, Vol. 3A "8.9.1 Hierarchical Mapping of Shared
Resources", there are a _couple_ of new levels: Die, Tile and Module.
But, this patch only covers Dies.

Was there a reason for that?

I wonder if we'll end up with different (better) infrastructure if we do
these all at once instead of hacking them in one at a time.


Re: [PATCH v10 00/12] arm64: untag user pointers passed to the kernel

2019-02-26 Thread Luc Van Oostenryck
On Tue, Feb 26, 2019 at 06:18:25PM +0100, Andrey Konovalov wrote:
> On Fri, Feb 22, 2019 at 11:55 PM Dave Hansen  wrote:
> >
> > On 2/22/19 4:53 AM, Andrey Konovalov wrote:
> > > > The following testing approaches have been taken to find potential issues
> > > with user pointer untagging:
> > >
> > > 1. Static testing (with sparse [3] and separately with a custom static
> > >analyzer based on Clang) to track casts of __user pointers to integer
> > >types to find places where untagging needs to be done.
> >
> > First of all, it's really cool that you took this approach.  Sounds like
> > there was a lot of systematic work to fix up the sites in the existing
> > codebase.
> >
> > But, isn't this a _bit_ fragile going forward?  Folks can't just "make
> > sparse" to find issues with missing untags.
> 
> Yes, this static approach can only be used as a hint to find some
> places where untagging is needed, but certainly not all.
> 
> > This seems like something
> > where we would ideally add an __tagged annotation (or something) to the
> > source tree and then have sparse rules that can look for missed untags.
> 
> This has been suggested before, search for __untagged here [1].
> However there are many places in the kernel where a __user pointer is
> cast to unsigned long and passed further. I'm not sure if it's
> possible to apply a __tagged/__untagged kind of attribute to non-pointer
> types, is it?
> 
> [1] https://patchwork.kernel.org/patch/10581535/

This would need to be added to sparse, since it's different from what
sparse already has (the existing __bitwise and address-space concepts
don't seem to do the job here).

-- Luc Van Oostenryck


Re: [PATCH v2 00/11] LSM documentation update

2019-02-26 Thread Kees Cook
On Tue, Feb 26, 2019 at 12:49 PM Denis Efremov  wrote:
> Recent "New LSM Hooks" discussion has led me to the
> thought that it might be a good idea to slightly
> update the current documentation. The patchset adds
> nothing new to the documentation, only fixes the old
> description of hooks to reflect their current state.
>
> V2 adds the clarification on arguments for some hooks.
> The format of the documentation is also slightly updated
> for better html. However, there are still 10 hooks without
> documentation at all. I think that this should be fixed
> separately.
>
> Denis Efremov (11):
>   LSM: fix documentation for sb_copy_data hook
>   LSM: fix documentation for the syslog hook
>   LSM: fix documentation for the socket_post_create hook
>   LSM: fix documentation for the task_setscheduler hook
>   LSM: fix documentation for the socket_getpeersec_dgram hook
>   LSM: fix documentation for the path_chmod hook
>   LSM: fix documentation for the audit_* hooks
>   LSM: fix documentation for the msg_queue_* hooks
>   LSM: fix documentation for the sem_* hooks
>   LSM: fix documentation for the shm_* hooks
>   LSM: lsm_hooks.h: fix documentation format
>
>  include/linux/lsm_hooks.h | 170 ++
>  1 file changed, 81 insertions(+), 89 deletions(-)

Awesome; thanks! This fixes several warnings in "make htmldocs":

./include/linux/lsm_hooks.h:1783: warning: Function parameter or
member 'task_setioprio' not described in 'security_list_options'
./include/linux/lsm_hooks.h:1783: warning: Function parameter or
member 'task_getioprio' not described in 'security_list_options'
./include/linux/lsm_hooks.h:1783: warning: Function parameter or
member 'task_movememory' not described in 'security_list_options'
./include/linux/lsm_hooks.h:1783: warning: Function parameter or
member 'secmark_refcount_inc' not described in 'security_list_options'
./include/linux/lsm_hooks.h:1783: warning: Function parameter or
member 'secmark_refcount_dec' not described in 'security_list_options'

So, for the series:

Acked-by: Kees Cook 

If you want more work, I do notice the following warnings are still present:

./include/linux/lsm_hooks.h:1775: warning: Function parameter or
member 'quotactl' not described in 'security_list_options'
./include/linux/lsm_hooks.h:1775: warning: Function parameter or
member 'quota_on' not described in 'security_list_options'
./include/linux/lsm_hooks.h:1775: warning: Function parameter or
member 'sb_free_mnt_opts' not described in 'security_list_options'
./include/linux/lsm_hooks.h:1775: warning: Function parameter or
member 'sb_eat_lsm_opts' not described in 'security_list_options'
./include/linux/lsm_hooks.h:1775: warning: Function parameter or
member 'sb_kern_mount' not described in 'security_list_options'
./include/linux/lsm_hooks.h:1775: warning: Function parameter or
member 'sb_show_options' not described in 'security_list_options'
./include/linux/lsm_hooks.h:1775: warning: Function parameter or
member 'sb_add_mnt_opt' not described in 'security_list_options'
./include/linux/lsm_hooks.h:1775: warning: Function parameter or
member 'd_instantiate' not described in 'security_list_options'
./include/linux/lsm_hooks.h:1775: warning: Function parameter or
member 'getprocattr' not described in 'security_list_options'
./include/linux/lsm_hooks.h:1775: warning: Function parameter or
member 'setprocattr' not described in 'security_list_options'

:)

-- 
Kees Cook


Re: [RFC v10 0/4] pstore/block: new support logger for block devices

2019-02-26 Thread liaoweixiong


On 2019-02-26 19:20, Greg Kroah-Hartman wrote:
> On Tue, Feb 26, 2019 at 02:33:41PM +0800, liaoweixiong wrote:
>> Why do we need pstore_block?
>> 1. Most embedded intelligent equipment has no persistent ram, because
>> persistent ram increases costs. We prefer cheaper solutions, like block
>> devices. In fact, there is already a sample for block device logger in
>> driver MTD (drivers/mtd/mtdoops.c).
>> 2. Not all equipment has a battery, which means that all data in
>> general ram is lost on power failure. Pstore can do little for such
>> equipment.
>>
>> [PATCH v10]
> 
> Why are you still labeling these as "RFC"?  No one should actually be
> applying a Request For Comments patchset, as you obviously are not
> thinking it is ready to be merged :(
> 
> After 10 revisions, I hope you are confident in this patchset :)
> 

I'm confident in this patchset :). This is the first time I have submitted
RFC patches, and I didn't know I should change the label to PATCH. Thank
you for reminding me.

> thanks,
> 
> greg k-h
> 

-- 
liaoweixiong


[PATCH v5 01/10] arm64: Provide a command line to disable spectre_v2 mitigation

2019-02-26 Thread Jeremy Linton
There are various reasons, including benchmarking, to disable spectrev2
mitigation on a machine. Provide a command-line option to do so.

Signed-off-by: Jeremy Linton 
Cc: Jonathan Corbet 
Cc: linux-doc@vger.kernel.org
---
 Documentation/admin-guide/kernel-parameters.txt |  8 
 arch/arm64/kernel/cpu_errata.c  | 13 +
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 858b6c0b9a15..4d4d6a9537ae 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2842,10 +2842,10 @@
check bypass). With this option data leaks are possible
in the system.
 
-   nospectre_v2[X86,PPC_FSL_BOOK3E] Disable all mitigations for the Spectre variant 2
-   (indirect branch prediction) vulnerability. System may
-   allow data leaks with this option, which is equivalent
-   to spectre_v2=off.
+   nospectre_v2[X86,PPC_FSL_BOOK3E,ARM64] Disable all mitigations for
+   the Spectre variant 2 (indirect branch prediction)
+   vulnerability. System may allow data leaks with this
+   option.
 
nospec_store_bypass_disable
[HW] Disable all mitigations for the Speculative Store Bypass vulnerability
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 9950bb0cbd52..d2b2c69d31bb 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -220,6 +220,14 @@ static void qcom_link_stack_sanitization(void)
 : "=&r" (tmp));
 }
 
+static bool __nospectre_v2;
+static int __init parse_nospectre_v2(char *str)
+{
+   __nospectre_v2 = true;
+   return 0;
+}
+early_param("nospectre_v2", parse_nospectre_v2);
+
 static void
 enable_smccc_arch_workaround_1(const struct arm64_cpu_capabilities *entry)
 {
@@ -231,6 +239,11 @@ enable_smccc_arch_workaround_1(const struct arm64_cpu_capabilities *entry)
if (!entry->matches(entry, SCOPE_LOCAL_CPU))
return;
 
+   if (__nospectre_v2) {
+   pr_info_once("spectrev2 mitigation disabled by command line option\n");
+   return;
+   }
+
if (psci_ops.smccc_version == SMCCC_VERSION_1_0)
return;
 
-- 
2.20.1
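
[Editor's sketch] The patch above follows a common pattern: a bare boot
option sets a static flag, and the code that installs the mitigation returns
early when the flag is set. The userspace sketch below mimics that flow so it
can be compiled and run outside the kernel. parse_cmdline() and
should_enable_mitigation() are hypothetical stand-ins for early_param()
handling and for the check in enable_smccc_arch_workaround_1(); they are not
kernel APIs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the __nospectre_v2 flag added by the patch. */
static bool nospectre_v2_flag;

/*
 * Hypothetical stand-in for early_param(): scan a space-separated command
 * line and set the flag when the exact token "nospectre_v2" is present.
 */
static void parse_cmdline(const char *cmdline)
{
	static const char opt[] = "nospectre_v2";
	const size_t optlen = sizeof(opt) - 1;
	const char *p = cmdline;

	while (*p) {
		size_t len = strcspn(p, " ");	/* length of current token */

		if (len == optlen && strncmp(p, opt, len) == 0)
			nospectre_v2_flag = true;
		p += len;
		p += strspn(p, " ");		/* skip separating spaces */
	}
}

/* Returns true when the mitigation should be installed on this CPU. */
static bool should_enable_mitigation(bool cpu_affected)
{
	if (!cpu_affected)
		return false;
	if (nospectre_v2_flag)
		return false;	/* disabled by command line option */
	return true;
}
```

As in the patch, the option takes no value; its presence alone disables the
mitigation.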



[PATCH v11 2/4] pstore/blk: add blkoops for pstore_blk

2019-02-26 Thread liaoweixiong
blkoops is a sample for pstore/blk. It can only record oopses, not
panics, because no panic read/write APIs are registered. It supports
settings via Kconfig and module parameters. It can preserve the oops log
across a power failure if "PSTORE_BLKOOPS_BLKDEV" in Kconfig or the
"blkdev" module parameter is valid. Otherwise, it can only record data to
a ram buffer, which is dropped on reboot.

Signed-off-by: liaoweixiong 
---
 MAINTAINERS|   2 +-
 fs/pstore/Kconfig  | 114 ++
 fs/pstore/Makefile |   2 +
 fs/pstore/blkoops.c| 198 +
 include/linux/pstore_blk.h |  14 +++-
 5 files changed, 325 insertions(+), 5 deletions(-)
 create mode 100644 fs/pstore/blkoops.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 51029a4..4e9242a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12318,7 +12318,7 @@ F:  drivers/firmware/efi/efi-pstore.c
 F: drivers/acpi/apei/erst.c
 F: Documentation/admin-guide/ramoops.rst
 F: Documentation/devicetree/bindings/reserved-memory/ramoops.txt
-K: \b(pstore|ramoops)
+K: \b(pstore|ramoops|blkoops)
 
 PTP HARDWARE CLOCK SUPPORT
 M: Richard Cochran 
diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
index defcb75..7dfe00b 100644
--- a/fs/pstore/Kconfig
+++ b/fs/pstore/Kconfig
@@ -160,3 +160,117 @@ config PSTORE_BLK
help
  This enables panic and oops message to be logged to a block dev
  where it can be read back at some later point.
+
+config PSTORE_BLKOOPS
+   tristate "pstore block with oops logger"
+   depends on PSTORE_BLK
+   help
+ This is a sample for pstore block with oops logger.
+
+ It CANNOT record panic logs, as no panic read/write APIs are registered.
+
+ It CAN preserve the oops log across a power failure if
+ "PSTORE_BLKOOPS_BLKDEV" in Kconfig, "block-device" in dts, or the
+ "blkdev" module parameter is valid.
+
+ Otherwise, it can only record data to a ram buffer, which is
+ dropped on reboot.
+
+ NOTE that there are three ways to set the parameters of blkoops,
+ prioritized according to configuration flexibility. That is
+ Kconfig < device tree < module parameters. It means that a value can
+ be overridden by a higher priority setting.
+ 1. Kconfig
+It just sets a default value.
+ 2. device tree
+It is set in the device tree. It overrides the value from Kconfig,
+but can itself be overridden by module parameters.
+ 3. module parameters
+They have the highest priority. Note that blkoops falls back to a
+lower priority setting if the higher priority one is not set.
+
+config PSTORE_BLKOOPS_DMESG_SIZE
+   int "dmesg size in kbytes for blkoops"
+   depends on PSTORE_BLKOOPS
+   default 64
+   help
+ This just sets size of dmesg (dmesg_size) for pstore/blk. The value
+ must be a multiple of 4096.
+
+ NOTE that there are three ways to set the parameters of blkoops,
+ prioritized according to configuration flexibility. That is
+ Kconfig < device tree < module parameters. It means that a value can
+ be overridden by a higher priority setting.
+ 1. Kconfig
+It just sets a default value.
+ 2. device tree
+It is set in the device tree. It overrides the value from Kconfig,
+but can itself be overridden by module parameters.
+ 3. module parameters
+They have the highest priority. Note that blkoops falls back to a
+lower priority setting if the higher priority one is not set.
+
+config PSTORE_BLKOOPS_TOTAL_SIZE
+   int "total size in kbytes for blkoops"
+   depends on PSTORE_BLKOOPS
+   default 0
+   help
+ The total size in kbytes pstore/blk can use. It must be less than or
+ equal to size of block device if @blkdev valid. If @total_size is zero
+ with @blkdev, @total_size will be set to equal to size of @blkdev.
+ The value must be a multiple of 4096.
+
+ NOTE that there are three ways to set the parameters of blkoops,
+ prioritized according to configuration flexibility. That is
+ Kconfig < device tree < module parameters. It means that a value can
+ be overridden by a higher priority setting.
+ 1. Kconfig
+It just sets a default value.
+ 2. device tree
+It is set in the device tree. It overrides the value from Kconfig,
+but can itself be overridden by module parameters.
+ 3. module parameters
+They have the highest priority. Note that blkoops falls back to a
+lower priority setting if the higher priority one is not set.
+
+config PSTORE_BLKOOPS_BLKDEV
+   string "block device for blkoops"
+   depends on PSTORE_BLKOOPS
+   default ""
+   help
+ This just sets bloc

[PATCH v11 4/4] Documentation: pstore/blk: create document for pstore_blk

2019-02-26 Thread liaoweixiong
The document, at Documentation/admin-guide/pstore-block.rst,
tells users how to use pstore_blk and what to watch out for with panic
read/write.

Signed-off-by: liaoweixiong 
---
 Documentation/admin-guide/pstore-block.rst | 233 +
 MAINTAINERS|   1 +
 fs/pstore/Kconfig  |   4 +
 3 files changed, 238 insertions(+)
 create mode 100644 Documentation/admin-guide/pstore-block.rst

diff --git a/Documentation/admin-guide/pstore-block.rst b/Documentation/admin-guide/pstore-block.rst
new file mode 100644
index 000..a828274
--- /dev/null
+++ b/Documentation/admin-guide/pstore-block.rst
@@ -0,0 +1,233 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Pstore block oops/panic logger
+==============================
+
+Introduction
+------------
+
+Pstore block (pstore_blk) is an oops/panic logger that writes its logs to a block
+device before the system crashes. Pstore_blk needs the block device driver to
+register a partition path of the block device, like /dev/mmcblk0p7 for the mmc
+driver, and read/write APIs for this partition for use on panic.
+
+Pstore block concepts
+---------------------
+
+Pstore block begins at function ``blkz_register``, by which a block driver
+registers with pstore_blk. Note that the block driver should register with
+pstore_blk after the block device has been registered. The block driver passes
+a structure ``blkz_info``, which is defined in *linux/pstore_blk.h*.
+
+The following key members of ``struct blkz_info`` may be of interest to you.
+
+blkdev
+~~~~~~
+
+The block device to use. Most of the time, it is a partition of a block device.
+It's ok to keep it as NULL if you are passing ``read`` and ``write`` in blkz_info,
+as ``blkdev`` is used by blkz_default_general_read/write. If all of ``blkdev``,
+``read`` and ``write`` are NULL, no block device is in effect and the data will
+be saved in a ddr buffer.
+
+It accepts the following variants:
+
+1.  device number in hexadecimal represents itself no
+   leading 0x, for example b302.
+#. /dev/ represents the device number of disk
+#. /dev/ represents the device number of partition - device
+   number of disk plus the partition number
+#. /dev/p - same as the above, that form is used when disk
+   name of partitioned disk ends on a digit.
+#. PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF representing the unique id of
+   a partition if the partition table provides it. The UUID may be either an
+   EFI/GPT UUID, or refer to an MSDOS partition using the format -PP,
+   where  is a zero-filled hex representation of the 32-bit
+   "NT disk signature", and PP is a zero-filled hex representation of the
+   1-based partition number.
+#. PARTUUID=/PARTNROFF= to select a partition in relation to a
+   partition with a known unique id.
+#. : major and minor number of the device separated by a colon.
+
+See more in section **read/write**.
+
+total_size
+~~~~~~~~~~
+
+The total size in bytes of the block device used for pstore_blk. It **MUST** be less
+than or equal to the size of the block device if ``blkdev`` is valid. It **MUST** be a
+multiple of 4096. If ``total_size`` is zero with ``blkdev``, ``total_size`` will be
+set equal to the size of ``blkdev``.
+
+The block device area is divided into many chunks, and each event writes a chunk
+of information.
+
+dmesg_size
+~~~~~~~~~~
+
+The chunk size in bytes for dmesg(oops/panic). It **MUST** be a multiple of
+SECTOR_SIZE (Most of the time, the SECTOR_SIZE is 512). If you don't need dmesg,
+you can safely set it to 0.
+
+NOTE that the remaining space, except ``pmsg_size`` and others, belongs to
+dmesg. It means that there are multiple chunks for dmesg.
+
+Pstore_blk will log to dmesg chunks one by one, and always overwrite the oldest
+chunk if there is no free chunk.
+
+pmsg_size
+~~~~~~~~~
+
+The chunk size in bytes for pmsg. It **MUST** be a multiple of SECTOR_SIZE (Most
+of the time, the SECTOR_SIZE is 512). If you don't need pmsg, you can safely
+set it to 0.
+
+There is only one chunk for pmsg.
+
+Pmsg is a user space accessible pstore object. Writes to */dev/pmsg0* are
+appended to the chunk. On reboot the contents are available in
+/sys/fs/pstore/pmsg-pstore-blk-0.
+
+dump_oops
+~~~~~~~~~
+
+Dumping both oopses and panics can be done by setting 1 in the ``dump_oops``
+member while setting 0 in that variable dumps only the panics.
+
+read/write
+~~~~~~~~~~
+
+They are general ``read/write`` APIs. It is safe and recommended to ignore them
+and set ``blkdev`` instead.
+
+These general APIs are used all the time except on panic. The ``read`` API is
+usually used to recover data from the block device, and the ``write`` API is usually
+used to flush new data and erase to the block device.
+
+Pstore_blk will temporarily hold all new data before the block device is ready. If
+you ignore both of ``read/write`` and ``blkdev``, the old data will be lost.
+
+NOTE that the general APIs must check whether the block device is ready if
+self-defined.
+
+panic_read/panic_write
+~~~~~~~~~~~~~~~~~~~~~~

[PATCH v11 0/4] pstore/block: new support logger for block devices

2019-02-26 Thread liaoweixiong
Why do we need pstore_block?
1. Most embedded intelligent equipment has no persistent ram, because
persistent ram increases costs. We prefer cheaper solutions, like block
devices. In fact, there is already a sample for block device logger in
driver MTD (drivers/mtd/mtdoops.c).
2. Not all equipment has a battery, which means that all data in
general ram is lost on power failure. Pstore can do little for such
equipment.

[PATCH v11]
Change patchset label from RFC to PATCH

[PATCH v10]
Cancel DT support for blkoops temporarily.
On patch 1:
1. pstore/blk should unlink PSTORE_BLKDEV when unregister.
On patch 2:
1. cancel DT support temporarily. I will submit other patches to support DT
   when DT maintainers acked.
2. add spin lock to protect blkz_info when modify panic operations.
3. change default value of total size on Kconfig from 1024 to 0.

[PATCH v9]
On patch 1:
1. rename part_path/part_size, members of blkz_info, to blkdev/total_size
2. if total_size is zero, get size from @blkdev
3. support multiple variants for @blkdev, such as partuuid, major with minor,
   and /dev/. See details on Documentation.
4. get size from block device
5. add depends on CONFIG_BLOCK
On patch 2:
1. update document
On patch 3:
1. update codes for new blkzone. Blkoops support insmod without total_size.
   for example: "insmod ./blkoops.ko blkdev=93:6" (major:minor).
2. use late_initcalls rather than module_init, to avoid block device not ready.
3. support for block driver to add panic apis to blkoops. By this, block
   driver can do the least work that just provides panic operations.
On patch 5:
1. update document

[PATCH v8]
On patch 2:
1. move DT to /bindings/pstore
2. Delete details for kernel.

[PATCH v7]
On patch 1:
1. Fix line over 80 characters.
On patch 2:
1. Insert a separate patch for DT bindings.

[PATCH v6]
On patch 1:
1. Fix according to email from Kees Cook, including spelling mistakes,
   explicit overflow test, none of the zeroing etc.
2. Do not recover data but metadata of dmesg when panic.
3. No need to take recovery when do erase.
4. Do not use "blkoops" for blkzone any more because "blkoops" is used for
   other module now. (rename blkbuf to blkoops)
On patch 2:
1. Rename blkbuf to blkoops.
2. Add Kconfig/device tree/module parameters settings for blkoops.
3. Add document for device tree.
On patch 3:
1. Blkoops support pmsg.
2. Fix description for new version patch.
On patch 4:
1. Fix description for new version patch.

[PATCH v5]
On patch 1:
1. rename pstore/rom to pstore/blk
2. Do not allocate any memory in the write path of panic. So, use local
array instead in function romz_recover_dmesg_meta.
3. Add C header file "linux/fs.h" to fix implicit declaration of function
   'filp_open','kernel_read'...
On patch 3:
1. If panic, do not recover pmsg but flush if it is dirty.
2. Fix erase pmsg failed.
On patch 4:
1. Create a document for pstore/blk

[PATCH v4]
On patch 1:
1. Fix always true condition '(--i >= 0) => (0-u32max >= 0)' in function
   romz_init_zones by defining variable i as 'int' rather than
   'unsigned int'.
2. To make the code easier to read, we use macro READ_NEXT_ZONE for the
   return value of romz_dmesg_read if it needs to read the next zone.
   Moreover, we assign READ_NEXT_ZONE -1024 rather than 0.
3. Add 'FLUSH_META' to 'enum romz_flush_mode' and rename 'NOT_FLUSH' to
   'FLUSH_NONE'
4. Function romz_zone_write worked badly in FLUSH_PART mode because of a
   bad address and offset to write.
On patch 3:
NEW SUPPORT pmsg for pstore_rom.

[PATCH v3]
On patch 1:
Fix build as module error for undefined 'vfs_read' and 'vfs_write'.
Neither 'vfs_read' nor 'vfs_write' has been exported yet, so we use
'kernel_read' and 'kernel_write' instead.

[PATCH v2]
On patch 1:
Fix build as module error for redefinition of 'romz_unregister' and
'romz_register'

[PATCH v1]
On patch 1:
Core codes of pstore_rom, which works well on allwinner(sunxi) platform.
On patch 2:
A sample for pstore_rom, using general ram rather than block device.

liaoweixiong (4):
  pstore/blk: new support logger for block devices
  pstore/blk: add blkoops for pstore_blk
  pstore/blk: support pmsg for pstore block
  Documentation: pstore/blk: create document for pstore_blk

 Documentation/admin-guide/pstore-block.rst |  233 ++
 MAINTAINERS|3 +-
 fs/pstore/Kconfig  |  147 
 fs/pstore/Makefile |5 +
 fs/pstore/blkoops.c|  208 +
 fs/pstore/blkzone.c| 1242 
 include/linux/pstore_blk.h |   87 ++
 7 files changed, 1924 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/admin-guide/pstore-block.rst
 create mode 100644 fs/pstore/blkoops.c
 create mode 100644 fs/pstore/blkzone.c
 create mode 100644 include/linux/pstore_blk.h

-- 
1.9.1
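
[Editor's sketch] The v4 changelog entry about the always-true condition
'(--i >= 0)' describes a classic unsigned countdown bug. The sketch below
shows why the test cannot fail for an unsigned index and what the signed
version does instead. dec_unsigned() and unwind_zones() are hypothetical
helpers written for this illustration; they are not taken from
romz_init_zones.

```c
#include <limits.h>

/*
 * Decrementing an unsigned 0 wraps to UINT_MAX instead of going negative,
 * which is why a test like (--i >= 0) can never fail for an unsigned i.
 */
static unsigned int dec_unsigned(unsigned int i)
{
	return i - 1u;
}

/*
 * Fixed pattern: with a signed index the loop stops after index 0.
 * Clears zones[n-1] down to zones[0] and returns how many were cleared.
 */
static int unwind_zones(int *zones, int n)
{
	int i = n;
	int released = 0;

	while (--i >= 0) {
		zones[i] = 0;
		released++;
	}
	return released;
}
```

With 'unsigned int i', the same loop would index far past the array once i
wrapped, so switching the index to 'int' is the minimal fix.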



[PATCH v11 1/4] pstore/blk: new support logger for block devices

2019-02-26 Thread liaoweixiong
pstore_blk is similar to pstore_ram, but dumps logs to block devices
rather than persistent ram.

Why do we need pstore_blk?
1. Most embedded intelligent equipment has no persistent ram, because
persistent ram increases costs. We prefer cheaper solutions, like block
devices. In fact, there is already a sample for block device logger in
driver MTD (drivers/mtd/mtdoops.c).
2. Not all equipment has a battery, which means that all data in
general ram is lost on power failure. Pstore can do little for such
equipment.

pstore_blk can only dump Oops/Panic logs to block devices. It only
supports dmesg now. To make pstore_blk work, the block driver should
provide the block device and the read/write APIs for use on panic.

pstore_blk begins at 'blkz_register', by which a block driver can register
a block device with pstore_blk. Then pstore_blk divides and manages the
block device as zones, similar to pstore_ram.

It is recommended that the block driver register with pstore_blk after the
block device is ready.

pstore_blk works well on allwinner(sunxi) platform.

Signed-off-by: liaoweixiong 
---
 fs/pstore/Kconfig  |8 +
 fs/pstore/Makefile |3 +
 fs/pstore/blkzone.c| 1031 
 include/linux/pstore_blk.h |   80 
 4 files changed, 1122 insertions(+)
 create mode 100644 fs/pstore/blkzone.c
 create mode 100644 include/linux/pstore_blk.h

diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
index 8b3ba27..defcb75 100644
--- a/fs/pstore/Kconfig
+++ b/fs/pstore/Kconfig
@@ -152,3 +152,11 @@ config PSTORE_RAM
  "ramoops.ko".
 
  For more information, see Documentation/admin-guide/ramoops.rst.
+
+config PSTORE_BLK
+   tristate "Log panic/oops to a block device"
+   depends on PSTORE
+   depends on BLOCK
+   help
+ This enables panic and oops message to be logged to a block dev
+ where it can be read back at some later point.
diff --git a/fs/pstore/Makefile b/fs/pstore/Makefile
index 967b589..0ee2fc8 100644
--- a/fs/pstore/Makefile
+++ b/fs/pstore/Makefile
@@ -12,3 +12,6 @@ pstore-$(CONFIG_PSTORE_PMSG)  += pmsg.o
 
 ramoops-objs += ram.o ram_core.o
 obj-$(CONFIG_PSTORE_RAM)   += ramoops.o
+
+obj-$(CONFIG_PSTORE_BLK) += pstore_blk.o
+pstore_blk-y += blkzone.o
diff --git a/fs/pstore/blkzone.c b/fs/pstore/blkzone.c
new file mode 100644
index 000..cba55b3
--- /dev/null
+++ b/fs/pstore/blkzone.c
@@ -0,0 +1,1031 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ *
+ * blkzone.c: Block device Oops/Panic logger
+ *
+ * Copyright (C) 2019 liaoweixiong 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define MODNAME "pstore-blk"
+#define pr_fmt(fmt) MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define PSTORE_BLKDEV "/dev/pstore-blk"
+
+/**
+ * struct blkz_buffer - header of zone to flush to storage
+ *
+ * @sig: signature to indicate header (BLK_SIG xor BLKZONE-type value)
+ * @datalen: length of data in @data
+ * @data: zone data.
+ */
+struct blkz_buffer {
+#define BLK_SIG (0x43474244) /* DBGC */
+   uint32_t sig;
+   atomic_t datalen;
+   uint8_t data[];
+};
+
+/**
+ * struct blkz_dmesg_header: dmesg information
+ *
+ * @magic: magic num for dmesg header
+ * @time: trigger time
+ * @compressed: whether compressed
+ * @counter: oops/panic counter
+ * @reason: identify oops or panic
+ */
+struct blkz_dmesg_header {
+#define DMESG_HEADER_MAGIC 0x4dfc3ae5
+   uint32_t magic;
+   struct timespec64 time;
+   bool compressed;
+   uint32_t counter;
+   enum kmsg_dump_reason reason;
+   uint8_t data[0];
+};
+
+/**
+ * struct blkz_zone - zone information
+ * @off:
+ * zone offset of block device
+ * @type:
+ * frontend type for this zone
+ * @name:
+ * frontend name for this zone
+ * @buffer:
+ * pointer to data buffer managed by this zone
+ * @buffer_size:
+ * bytes in @buffer->data
+ * @should_recover:
+ * should recover from storage
+ * @dirty:
+ * mark whether the data in @buffer are dirty (not flushed to storage yet)
+ */
+struct blkz_zone {
+   unsigned long off;
+   const char *name;
+   enum pstore_type_id type;
+
+   struct blkz_buffer *buffer;
+   size_t buffer_size;
+   bool should_recover;
+   atomic_t dirty;
+};
+
+struct blkz_context {
+   struct blkz_zone **dbzs;/* dmesg block zones */
+   unsigned int dmesg_max_cnt;
+   unsigned int dmesg_read_cnt;
+   unsigned int dmesg_write_cnt;
+   /*
+* 

[PATCH v11 3/4] pstore/blk: support pmsg for pstore block

2019-02-26 Thread liaoweixiong
To enable pmsg, just set pmsg_size when the block device registers with blkzone.

Signed-off-by: liaoweixiong 
---
 fs/pstore/Kconfig  |  21 
 fs/pstore/blkoops.c|  10 ++
 fs/pstore/blkzone.c| 253 +
 include/linux/pstore_blk.h |   1 +
 4 files changed, 264 insertions(+), 21 deletions(-)

diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
index 7dfe00b..b417bf5 100644
--- a/fs/pstore/Kconfig
+++ b/fs/pstore/Kconfig
@@ -210,6 +210,27 @@ config PSTORE_BLKOOPS_DMESG_SIZE
 It is the first priority. Take care of that blkoops will take lower
 priority settings if higher priority one do not set.
 
+config PSTORE_BLKOOPS_PMSG_SIZE
+   int "pmsg size in kbytes for blkoops"
+   depends on PSTORE_BLKOOPS
+   default 64
+   help
+ This just sets size of pmsg (pmsg_size) for pstore/blk. The value must
+ be a multiple of 4096. Pmsg works only if "blkdev" is set.
+
+ NOTE that there are three ways to set the parameters of blkoops,
+ prioritized according to configuration flexibility. That is
+ Kconfig < device tree < module parameters. It means that a value can
+ be overridden by a higher priority setting.
+ 1. Kconfig
+It just sets a default value.
+ 2. device tree
+It is set in the device tree. It overrides the value from Kconfig,
+but can itself be overridden by module parameters.
+ 3. module parameters
+They have the highest priority. Note that blkoops falls back to a
+lower priority setting if the higher priority one is not set.
+
 config PSTORE_BLKOOPS_TOTAL_SIZE
int "total size in kbytes for blkoops"
depends on PSTORE_BLKOOPS
diff --git a/fs/pstore/blkoops.c b/fs/pstore/blkoops.c
index 22c0c84..05140fd 100644
--- a/fs/pstore/blkoops.c
+++ b/fs/pstore/blkoops.c
@@ -30,6 +30,10 @@
 module_param(dmesg_size, long, 0400);
 MODULE_PARM_DESC(dmesg_size, "dmesg size in kbytes");
 
+static long pmsg_size = -1;
+module_param(pmsg_size, long, 0400);
+MODULE_PARM_DESC(pmsg_size, "pmsg size in kbytes");
+
 static long total_size = -1;
 module_param(total_size, long, 0400);
 MODULE_PARM_DESC(total_size, "total size in kbytes");
@@ -47,11 +51,13 @@ struct blkz_info blkz_info = {
 
 struct blkoops_info {
unsigned long dmesg_size;
+   unsigned long pmsg_size;
unsigned long total_size;
const char *blkdev;
 };
 struct blkoops_info blkoops_info = {
.dmesg_size = CONFIG_PSTORE_BLKOOPS_DMESG_SIZE * 1024,
+   .pmsg_size = CONFIG_PSTORE_BLKOOPS_PMSG_SIZE * 1024,
.total_size = CONFIG_PSTORE_BLKOOPS_TOTAL_SIZE * 1024,
.blkdev = CONFIG_PSTORE_BLKOOPS_BLKDEV,
 };
@@ -104,6 +110,7 @@ static int blkoops_probe(struct platform_device *pdev)
 
check_size(total_size, 4096);
check_size(dmesg_size, 4096);
+   check_size(pmsg_size, 4096);
 
 #undef check_size
 
@@ -112,6 +119,7 @@ static int blkoops_probe(struct platform_device *pdev)
 * through /sys/module/blkoops/parameters/
 */
dmesg_size = blkz_info.dmesg_size;
+   pmsg_size = blkz_info.pmsg_size;
total_size = blkz_info.total_size;
if (blkz_info.blkdev)
strncpy(blkdev, blkz_info.blkdev, 80 - 1);
@@ -156,6 +164,8 @@ void blkoops_register_dummy(void)
info->blkdev = (const char *)blkdev;
if (dmesg_size >= 0)
info->dmesg_size = (unsigned long)dmesg_size * 1024;
+   if (pmsg_size >= 0)
+   info->pmsg_size = (unsigned long)pmsg_size * 1024;
} else if (info->total_size > 0 || strlen(info->blkdev)) {
pr_info("using kconfig value\n");
} else {
diff --git a/fs/pstore/blkzone.c b/fs/pstore/blkzone.c
index cba55b3..cd3d4ed 100644
--- a/fs/pstore/blkzone.c
+++ b/fs/pstore/blkzone.c
@@ -40,12 +40,14 @@
  *
  * @sig: signature to indicate header (BLK_SIG xor BLKZONE-type value)
  * @datalen: length of data in @data
+ * @start: offset into @data where the stored bytes begin
  * @data: zone data.
  */
 struct blkz_buffer {
 #define BLK_SIG (0x43474244) /* DBGC */
uint32_t sig;
atomic_t datalen;
+   atomic_t start;
uint8_t data[];
 };
 
@@ -78,6 +80,9 @@ struct blkz_dmesg_header {
  * frontend name for this zone
  * @buffer:
  * pointer to data buffer managed by this zone
+ * @oldbuf:
+ * pointer to old data buffer. It is used for single zone such as pmsg,
+ * saving the old buffer.
  * @buffer_size:
  * bytes in @buffer->data
  * @should_recover:
@@ -91,6 +96,7 @@ struct blkz_zone {
enum pstore_type_id type;
 
struct blkz_buffer *buffer;
+   struct blkz_buffer *oldbuf;
size_t buffer_size;
bool should_recover;
atomic_t dirty;
@@ -98,8 +104,10 @@ struct blkz_zone {
 
 struct blkz_context {
struct blkz_zo