date:20171227

Re: [RFC PATCH bpf-next v2 4/4] error-injection: Support fault injection framework

2017-12-27 Thread Masami Hiramatsu

On Tue, 26 Dec 2017 18:12:56 -0800
Alexei Starovoitov  wrote:

> On Tue, Dec 26, 2017 at 04:48:25PM +0900, Masami Hiramatsu wrote:
> > Support in-kernel fault-injection framework via debugfs.
> > This allows you to inject a conditional error to specified
> > function using debugfs interfaces.
> > 
> > Signed-off-by: Masami Hiramatsu 
> > ---
> >  Documentation/fault-injection/fault-injection.txt |5 +
> >  kernel/Makefile   |1 
> >  kernel/fail_function.c|  169 +
> >  lib/Kconfig.debug |   10 +
> >  4 files changed, 185 insertions(+)
> >  create mode 100644 kernel/fail_function.c
> > 
> > diff --git a/Documentation/fault-injection/fault-injection.txt b/Documentation/fault-injection/fault-injection.txt
> > index 918972babcd8..6243a588dd71 100644
> > --- a/Documentation/fault-injection/fault-injection.txt
> > +++ b/Documentation/fault-injection/fault-injection.txt
> > @@ -30,6 +30,11 @@ o fail_mmc_request
> >injects MMC data errors on devices permitted by setting
> >debugfs entries under /sys/kernel/debug/mmc0/fail_mmc_request
> >  
> > +o fail_function
> > +
> > +  injects error return on specific functions by setting debugfs entries
> > +  under /sys/kernel/debug/fail_function. No boot option supported.
> 
> I like it.
> Could you document it a bit better?

Yes, I will do that in the next series.

> In particular retval is configurable, but without an example no one
> will be able to figure out how to use it.

Ah, right. BTW, as I pointed out in the cover mail, should we store the
expected error value range in the injectable list? E.g.

ALLOW_ERROR_INJECTION(open_ctree, -1, -MAX_ERRNO)

And provide APIs to check/get it.

const struct error_range *ei_get_error_range(unsigned long addr);
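
A minimal user-space sketch of the idea above, assuming a table that maps a function address to its permitted error range. Only `ei_get_error_range()` and the name `struct error_range` come from the proposal; the field names, the `ei_table` array, the sample address and `ei_error_in_range()` are hypothetical and only illustrate how a caller might validate a requested retval.

```c
#include <stddef.h>

#define MAX_ERRNO 4095	/* same value the kernel uses in <linux/err.h> */

/* Hypothetical layout: the proposal names the struct but not its fields. */
struct error_range {
	unsigned long addr;	/* entry address of the injectable function */
	long first;		/* one end of the permitted range, e.g. -1 */
	long last;		/* other end, e.g. -MAX_ERRNO */
};

/* Hypothetical stand-in for the list built by ALLOW_ERROR_INJECTION(). */
static const struct error_range ei_table[] = {
	{ 0x1000UL, -1, -MAX_ERRNO },
};

/* Return the range registered for addr, or NULL if addr is not injectable. */
static const struct error_range *ei_get_error_range(unsigned long addr)
{
	size_t i;

	for (i = 0; i < sizeof(ei_table) / sizeof(ei_table[0]); i++)
		if (ei_table[i].addr == addr)
			return &ei_table[i];
	return NULL;
}

/* Hypothetical helper: nonzero if retval lies inside the registered range. */
static int ei_error_in_range(unsigned long addr, long retval)
{
	const struct error_range *r = ei_get_error_range(addr);
	long lo, hi;

	if (!r)
		return 0;
	/* The macro lists the range as (-1, -MAX_ERRNO), so order the ends. */
	lo = r->first < r->last ? r->first : r->last;
	hi = r->first < r->last ? r->last : r->first;
	return retval >= lo && retval <= hi;
}
```

With such a check, a debugfs write of retval 0 or -4096 for an entry registered as (-1, -MAX_ERRNO) could be rejected before the fault is armed.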


> 
> I think you can drop RFC tag from the next version of these patches.
> Thanks!

Thank you, I'll fix some errors that came from certain configurations, and
resend it.


Thanks!


-- 
Masami Hiramatsu

Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop

2017-12-27 Thread Dou Liyang


Hi Alexandru,

At 12/24/2017 04:01 AM, Alexandru Chirvasitu wrote:

On Sat, Dec 23, 2017 at 02:32:52PM +0100, Thomas Gleixner wrote:

On Sat, 23 Dec 2017, Dexuan Cui wrote:


From: Alexandru Chirvasitu [mailto:achirva...@gmail.com]
Sent: Friday, December 22, 2017 14:29

The output of that precise command run just now on a freshly-compiled
copy of that commit is attached.

On Fri, Dec 22, 2017 at 09:31:28PM +, Dexuan Cui wrote:

From: Alexandru Chirvasitu [mailto:achirva...@gmail.com]
Sent: Friday, December 22, 2017 06:21

In the absence of logs, the best I can do at the moment is attach a
picture of the screen I am presented with on the  boot
attempt.
Alex


The panic happens in irq_matrix_assign_system+0x4e/0xd0 in your picture.
IMO we should find which line of code causes the panic. I suppose
"objdump -D kernel/irq/matrix.o" can help to do that.

Thanks,
-- Dexuan


The BUG_ON panic happens at line 147:
BUG_ON(!test_and_clear_bit(bit, cm->alloc_map));



There are two bugs on your laptop:

  1. Hard lockups on both CPUs after login
  2. panic with "apic=debug"

For the second bug, please try the following patch (it needs Thomas's
confirmation :) ) on Linux 4.15-rc5. I think it can fix the panic.

If the second bug is fixed, let's get back to the first bug:

Is Linus's current head, 4.15-rc5, bad as well?

If yes, please boot with "apic=debug" and send the dmesg log.

Thanks,
dou.

8<---

irq/matrix: Remove the overused BUG_ON() in irq_matrix_assign_system()

Currently, x86 marks the preallocated legacy interrupts when initializing
IRQs (native_init_IRQ()), but clears them again if they are not activated
in vector_configure_legacy().

So, in irq_matrix_assign_system(), replacing a legacy vector that may not
be allocated in cpumap->alloc_map[] with a system vector triggers the
BUG_ON().

Remove the BUG_ON().

Signed-off-by: Dou Liyang 
---
 kernel/irq/matrix.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c
index 0ba0dd8863a7..876cbeab9ca2 100644
--- a/kernel/irq/matrix.c
+++ b/kernel/irq/matrix.c
@@ -143,11 +143,12 @@ void irq_matrix_assign_system(struct irq_matrix *m, unsigned int bit,

BUG_ON(m->online_maps > 1 || (m->online_maps && !replace));

set_bit(bit, m->system_map);
-   if (replace) {
-   BUG_ON(!test_and_clear_bit(bit, cm->alloc_map));
+
+   if (replace && test_and_clear_bit(bit, cm->alloc_map)){
cm->allocated--;
m->total_allocated--;
}
+
if (bit >= m->alloc_start && bit < m->alloc_end)
m->systembits_inalloc++;

--
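
The behavioural change in the patch above can be modelled in user space roughly as follows. This is a sketch under stated assumptions: `test_and_clear_bit_sketch()` is a non-atomic stand-in for the kernel helper, and `struct cpumap_sketch` and `release_if_allocated()` are hypothetical names that keep only the accounting touched by the hunk.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define MAP_WORDS	4

/* Non-atomic user-space stand-in for the kernel's test_and_clear_bit(). */
static int test_and_clear_bit_sketch(unsigned int bit, unsigned long *map)
{
	unsigned long mask = 1UL << (bit % BITS_PER_LONG);
	unsigned long *word = &map[bit / BITS_PER_LONG];
	int was_set = (*word & mask) != 0;

	*word &= ~mask;
	return was_set;
}

/* Hypothetical reduced model of the per-CPU map; only the fields used here. */
struct cpumap_sketch {
	unsigned int allocated;
	unsigned long alloc_map[MAP_WORDS];
};

/*
 * Mirrors the patched logic: drop the allocation count only if the bit
 * was really set, instead of crashing when it was not.
 * Returns 1 if an allocation was released, 0 otherwise.
 */
static int release_if_allocated(struct cpumap_sketch *cm, unsigned int bit,
				int replace)
{
	assert(bit < MAP_WORDS * BITS_PER_LONG);

	if (replace && test_and_clear_bit_sketch(bit, cm->alloc_map)) {
		cm->allocated--;
		return 1;
	}
	return 0;
}
```

A legacy vector that was already cleared now falls through without touching the counters, which is the case that previously hit the BUG_ON().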

Re: [PATCH 10/11 v3] ARM: s3c24xx/s3c64xx: constify gpio_led

2017-12-27 Thread Krzysztof Kozlowski

On Tue, Dec 26, 2017 at 7:50 PM, Arvind Yadav  wrote:
> gpio_led are not supposed to change at runtime.
> struct gpio_led_platform_data working with const gpio_led
> provided by . So mark the non-const structs
> as const.
>
> Signed-off-by: Arvind Yadav 
> ---
> changes in v2:
>   The GPIO LED driver can be built as a module, it can
>   be loaded after the init sections have gone away.
>   So removed '__initconst'.
> changes in v3:
>  Description was missing.
>
>  arch/arm/mach-s3c24xx/mach-h1940.c| 2 +-
>  arch/arm/mach-s3c24xx/mach-rx1950.c   | 2 +-
>  arch/arm/mach-s3c64xx/mach-hmt.c  | 2 +-
>  arch/arm/mach-s3c64xx/mach-smartq5.c  | 2 +-
>  arch/arm/mach-s3c64xx/mach-smartq7.c  | 2 +-
>  arch/arm/mach-s3c64xx/mach-smdk6410.c | 2 +-
>  6 files changed, 6 insertions(+), 6 deletions(-)

There were a few build errors reported by kbuild for your patches. Are
you sure that you compiled every file you touched?

Best regards,
Krzysztof

[PATCH] ext4: Remove repeated test in ext4_file_read_iter.

2017-12-27 Thread Sean Fu

generic_file_read_iter() already does the count test.
So ext4_file_read_iter() doesn't need to test the count again.

Signed-off-by: Sean Fu 
---
 fs/ext4/file.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index a0ae27b..87ca13e 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -67,9 +67,6 @@ static ssize_t ext4_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	if (unlikely(ext4_forced_shutdown(EXT4_SB(file_inode(iocb->ki_filp)->i_sb))))
 		return -EIO;
 
-   if (!iov_iter_count(to))
-   return 0; /* skip atime */
-
 #ifdef CONFIG_FS_DAX
if (IS_DAX(file_inode(iocb->ki_filp)))
return ext4_dax_read_iter(iocb, to);
-- 
2.6.2

Re: [PATCH 1/2] MIPS: math-emu: Do not export function `srl128`

2017-12-27 Thread Mathieu Malaterre

On Tue, Dec 26, 2017 at 7:10 PM, Aleksandar Markovic
 wrote:
>> > Fix non-fatal warning:
>> >
>> > arch/mips/math-emu/dp_maddf.c:19:6: warning: no previous prototype for ‘srl128’ [-Wmissing-prototypes]
>> >  void srl128(u64 *hptr, u64 *lptr, int count)
>> >
>> > Signed-off-by: Mathieu Malaterre 
>> > ---
>> >  arch/mips/math-emu/dp_maddf.c | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/arch/mips/math-emu/dp_maddf.c b/arch/mips/math-emu/dp_maddf.c
>> > index 7ad79ed411f5..0e2278a47f43 100644
>> > --- a/arch/mips/math-emu/dp_maddf.c
>> > +++ b/arch/mips/math-emu/dp_maddf.c
>> > @@ -16,7 +16,7 @@
>> >
>> >
>> >  /* 128 bits shift right logical with rounding. */
>> > -void srl128(u64 *hptr, u64 *lptr, int count)
>> > +static void srl128(u64 *hptr, u64 *lptr, int count)
>> >  {
>> >  u64 low;
>> >
>> > --
>> > 2.11.0
>>
>> Acked-by: Aleksandar Markovic 
>
> However, there is an already submitted patch: (the code change is identical)
>
> https://www.linux-mips.org/archives/linux-mips/2017-11/msg00044.html
>
> Status of that patch on patchwork is "Accepted".

Sorry, I did not realize you had already sent it.

Thanks

Re: [PATCH v2] arm64: dts: Hi3660: Fix up psci state id

2017-12-27 Thread Vincent Guittot

Hi Leo

On 25 December 2017 at 03:22, Leo Yan  wrote:
> Hi Vincent,
>
> [ + John, Kevin Wang ]
>
> On Fri, Dec 22, 2017 at 03:22:51PM +0100, Vincent Guittot wrote:
>> Hi Leo,
>>
>> Sorry for jumping late in the discussion but should  we also remove
>> the NAP state from the property cpu-idle-states of the CPUs because
>> this state not supported by the platform at least for now and may be
>> not in a near future ?
>
> Thanks for bringing up this.
>
> I don't want to hide anything for patch discussion :) this patch is to
> resolve the PSCI parameter mismatching issue between kernel and ARM-TF
> and it's not used to resolve the bug for CPU_NAP, so I didn't mention
> the CPU_NAP malfunction issue to avoid complex discussion context.
>
> I want to keep CPU_NAP state and track bug for CPU_NAP fixing; if we
> remove this state, I suspect we might have no chance to enable it
> anymore. Finally this is up to Hisilicon colleague decision and if they
> have time to fix this.
>
> I will check offline with Daniel and Kevin about this; and if we
> finally decide to remove it we can commit an extra patch for this later.
> What do you think?

I would prefer to remove it right now.
Removing NAP from the c-state table makes the hikey960 work correctly,
even with the current ATF and current state id. So it's the best
solution to the NAP problem IMO, and I don't see the benefit of keeping
NAP in the table until this state has been fixed. It will just add
uncertainty to the behavior of the board.
I don't see why you can't re-add it once it has been fixed.

>
>> Then, I have another question regarding the update of the
>> psci-suspend-parameter. These changes imply an update of the PSCI
>> firmware, which means that we will now have 2 different firmware
>> versions compatible with 2 different DTs.
>>
>> Is there any way to check that the ATF on the board is the one that is
>> compatible with the parameter, with something like a version? I
>> currently use the previous firmware, which works fine with the current
>> kernel and dt binding once the NAP state is removed from the table.
>> When moving to a recent kernel, I will have to take care of updating the
>> firmware, and if I need to go back to a previous kernel, I will have to
>> make sure that I have the right ATF version. This leaves a lot of chances
>> of ending up with the wrong configuration.
>
> AFAIK, we cannot distinguish the PSCI parameter by PSCI version or

And that's my main concern, because this adds a new possible regression
factor when switching between different kernel versions.

> ARM-TF version number; alternatively one simple way for checking ARM-TF
> is we can get commit ID (e.g. 83df7ce) from the ARM-TF log; so any
> ARM-TF commit ID is newer than the patch fdae60b6ba27: "Hikey960:
> Change to use recommended power state id format" should apply this
> kernel patch.
>
> NOTICE:  BL1: Booting BL31
> NOTICE:  BL31: v1.4(debug):v1.4-441-g83df7ce-dirty
> NOTICE:  BL31: Built : 17:31:35, Dec 22 2017
>
> BTW, I hope we can upgrade the Linux kernel and ARM-TF to the latest code
> base to avoid compatibility issues; the official Android release uses the
> old PSCI parameters with Hisilicon legacy boot images, so they
> work well, but if someone uses ARM-TF mainline code + Android kernel
> 4.4/4.9, there will be a compatibility issue.
>
> I am monitoring the integration ARM-TF/UEFI into Android on Hikey960,
> we need backport this patch onto Android kernel 4.4/4.9 ASAP after
> integration ARM-TF/UEFI.
>
> Thanks,
> Leo Yan
>
>> Regards,
>> Vincent
>>
>> On 12 December 2017 at 10:12, Leo Yan  wrote:
>> > Thanks a lot for Vincent Guittot careful work to find bug for 'CPU_NAP'
>> > idle state.  From ftrace log we can observe CA73 CPUs can be easily
>> > waken up from 'CPU_NAP' state but the 'waken up' CPUs doesn't handle
>> > anything and sleep again; so there have tons of trace events for CA73
>> > CPUs entering and exiting idle state.
>> >
>> > On Hi3660 CA73 has retention state 'CPU_NAP' for CPU idle, this state we
>> > set its psci parameter as '0x001' and from this parameter it can
>> > calculate state id is 1.  Unfortunately ARM trusted firmware (ARM-TF)
>> > takes 1 as a invalid value for state id, so the CPU cannot enter idle
>> > state and directly bail out to kernel.
>> >
>> > We want to create good practice for psci parameters platform definition,
>> > so review the psci specification. The spec "ARM Power State Coordination
>> > Interface - Platform Design Document (ARM DEN 0022D)" recommends state
>> > ID in chapter "6.5 Recommended StateID Encoding".  The recommended power
>> > state IDs can be presented by below listed values; and it divides into
>> > three fields, every field can use 4 bits to present power states
>> > corresponding to core level, cluster level and system level:
>> >   0: Run
>> >   1: Standby
>> >   2: Retention
>> >   3: Powerdown
>> >
>> > This commit changes psci parameter to compliance with the suggested
>> > state ID in the doc.  Except we change 'CPU
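
The recommended state ID layout described in the quoted commit message (three 4-bit fields, each holding one of the four listed states) can be sketched in C as follows. The bit positions are an assumption for illustration: core in bits [3:0], cluster in [7:4], system in [11:8]. The function and enum names are hypothetical and do not come from the kernel or ARM-TF.

```c
#include <assert.h>
#include <stdint.h>

/* Per-level states as listed in the quoted commit message. */
enum psci_level_state {
	PSCI_LVL_RUN       = 0,
	PSCI_LVL_STANDBY   = 1,
	PSCI_LVL_RETENTION = 2,
	PSCI_LVL_POWERDOWN = 3,
};

/*
 * Pack three 4-bit fields into one state ID.
 * Assumption: core in bits [3:0], cluster in [7:4], system in [11:8].
 */
static uint32_t psci_make_state_id(unsigned int core, unsigned int cluster,
				   unsigned int sys)
{
	assert(core <= 0xf && cluster <= 0xf && sys <= 0xf);
	return (uint32_t)core | ((uint32_t)cluster << 4) |
	       ((uint32_t)sys << 8);
}

/* Extract the 4-bit field for a level (0 = core, 1 = cluster, 2 = system). */
static unsigned int psci_level_field(uint32_t state_id, unsigned int level)
{
	assert(level <= 2);
	return (state_id >> (4 * level)) & 0xf;
}
```

Under this assumed layout, a core-level retention state with the cluster and system still running yields a state ID of 0x002, while a state ID of 1 would mean core-level standby.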

Re: [GIT PULL] tee dynamic shm for v4.16

2017-12-27 Thread Jens Wiklander

On Mon, Dec 25, 2017 at 01:22:18PM -0800, thomas zeng wrote:
> 
> 
> On 2017年12月21日 08:30, Arnd Bergmann wrote:
> > On Fri, Dec 15, 2017 at 2:21 PM, Jens Wiklander
> >  wrote:
> > > Hello arm-soc maintainers,
> > > 
> > > > Please pull these tee driver changes. This implements support for dynamic
> > > > shared memory in OP-TEE. More specifically, it enables mapping of
> > > > user space memory in secure world to be used as shared memory.
> > > 
> > > This has been reviewed and refined by the OP-TEE community at various
> > > places on Github during the last year. An earlier version of this pull
> > > request is used in the latest OP-TEE release (2.6.0). This has also been
> > > reviewed recently at the kernel mailing lists, with all comments from
> > > Mark Rutland  and Yury Norov
> > >  addressed as far as I can tell.
> > > 
> > > This isn't a bugfix so I'm aiming for the next merge window.
> > Given that Mark and Yury reviewed this, I'm assuming this is all
> > good and have now merged it. However I missed the entire discussion
> > about it, so I have one question about the implementation:
> > 
> > What happens when user space passes a buffer that is not
> > backed by regular memory but instead is something it has itself
> > mapped from a device with special page attributes or physical
> > properties? Could this be inconsistent when optee and user
> > space disagree on the caching attributes? Can you get into
> > trouble if you pass an area from a device that is read-only
> > in user space but writable from secure world?

Read-only memory is dealt with by calling get_user_pages_fast() with
the 'write' parameter set to 1.

Mismatch in cache attributes isn't addressed though. This is something
that should be checked in the OP-TEE driver, typically
drivers/tee/optee/core.c.

I would like to add another patch on top of this patch series to guard
against cache attributes which aren't normal cached memory. So far I
haven't been able to find a nice way of doing that, so I'd appreciate any
advice or ideas on how to deal with this.

> 
> Just recently, we have started to kick the tires of these "shm" related Gen
> Tee Driver patches.  And we have in the past encountered real world
> scenarios requiring some of the shared memory regions to be marked as
> "normal IC=0 and OC=0" in EL2 or SEL1, or else HW would misbehave. We worked
> around it by hacking the boot code, but that only works if the regions are
> pre-allocated. Since these regions can now also be managed dynamically, we
> definitely agree with Arnd Bergmann that the dynamic registration SMC
> commands, and potentially the SHM IOCTL commands, must convey cache
> intentions. Is it possible to take this requirement into consideration, in
> this iteration or a follow-on?

I'd be happy to discuss using different cache attributes outside this
patch series. We have so far avoided specifying cache attributes by
calling it normal cached memory. Now that we have one use case, we're
able to take the next step here.

Thanks,
Jens

Re: [PATCH 2/2] MIPS: math-emu: Declare ys variable as possibly unused

2017-12-27 Thread Mathieu Malaterre

Aleksandar,

On Tue, Dec 26, 2017 at 4:12 PM, Aleksandar Markovic
 wrote:
>> Fix non-fatal warning:
>>
>> arch/mips/math-emu/sp_fdp.c: In function ‘ieee754sp_fdp’:
>> arch/mips/math-emu/ieee754int.h:60:31: warning: variable ‘ys’ set but not used [-Wunused-but-set-variable]
>>   unsigned int ym; int ye; int ys; int yc
>>^
>> arch/mips/math-emu/sp_fdp.c:37:2: note: in expansion of macro ‘COMPYSP’
>>   COMPYSP;
>>   ^~~
>>
>> Signed-off-by: Mathieu Malaterre 
>> ---
>>  arch/mips/math-emu/ieee754int.h | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/mips/math-emu/ieee754int.h b/arch/mips/math-emu/ieee754int.h
>> index 06ac0e2ac7ac..cb8f04cd24bf 100644
>> --- a/arch/mips/math-emu/ieee754int.h
>> +++ b/arch/mips/math-emu/ieee754int.h
>> @@ -57,7 +57,7 @@ static inline int ieee754_class_nan(int xc)
>>  unsigned int xm; int xe; int xs __maybe_unused; int xc
>>
>>  #define COMPYSP \
>> -   unsigned int ym; int ye; int ys; int yc
>> +   unsigned int ym; int ye; int ys __maybe_unused; int yc
>>
>>  #define COMPZSP \
>>  unsigned int zm; int ze; int zs; int zc
>
> This will silence the warning, but will do it for all future cases of unused
> ys too - in other words, it may well silence even useful, valid warnings.
> Also, this introduces an inconsistency among COMPXSP, COMPYSP, and COMPZSP
> macros.
>
> A better solution would be to reduce the scope of ys, so that it is always
> used, if declared. Instead of this code segment (in 
> arch/mips/math-emu/sp_fdp.c):
>
> union ieee754sp ieee754sp_fdp(union ieee754dp x)
> {
> union ieee754sp y;
> u32 rm;
>
> COMPXDP;
> COMPYSP;
>
> EXPLODEXDP;
>
> ieee754_clearcx();
>
> FLUSHXDP;
>
> switch (xc) {
> case IEEE754_CLASS_SNAN:
> x = ieee754dp_nanxcpt(x);
> EXPLODEXDP;
> /* Fall through.  */
> case IEEE754_CLASS_QNAN:
> y = ieee754sp_nan_fdp(xs, xm);
> if (!ieee754_csr.nan2008) {
> EXPLODEYSP;
> if (!ieee754_class_nan(yc))
> y = ieee754sp_indef();
> }
> return y;
>
>
> ... should be the following: (COMPYSP is moved to a smaller code block)
>
> union ieee754sp ieee754sp_fdp(union ieee754dp x)
> {
> union ieee754sp y;
> u32 rm;
>
> COMPXDP;
>
> EXPLODEXDP;
>
> ieee754_clearcx();
>
> FLUSHXDP;
>
> switch (xc) {
> case IEEE754_CLASS_SNAN:
> x = ieee754dp_nanxcpt(x);
> EXPLODEXDP;
> /* Fall through.  */
> case IEEE754_CLASS_QNAN:
> {
> COMPYSP;
>
> y = ieee754sp_nan_fdp(xs, xm);
> if (!ieee754_csr.nan2008) {
> EXPLODEYSP;
> if (!ieee754_class_nan(yc))
> y = ieee754sp_indef();
> }
> return y;
> }
>


Thanks for the suggestion. However the sign bit is still not used, so
the warning is still there. Just for clarity did you see that:

 #define COMPXSP \
   unsigned int xm; int xe; int xs __maybe_unused; int xc

I'll give it some more thought and try to come up with something
that works.

-M
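
For readers following the warning discussion, here is a small user-space sketch of the pattern. It assumes GCC or Clang, where the kernel's `__maybe_unused` expands to the `unused` attribute; `MAYBE_UNUSED`, `DECLARE_PARTS` and `is_nan_bits()` are hypothetical names written in the style of the COMPYSP macro, not code from arch/mips/math-emu.

```c
/* GCC/Clang attribute; the kernel's __maybe_unused expands to this. */
#define MAYBE_UNUSED __attribute__((unused))

/* Hypothetical macro in the style of COMPYSP: declares a component set. */
#define DECLARE_PARTS \
	unsigned int m; int e; int s MAYBE_UNUSED; int c

/*
 * Splits a raw IEEE 754 single-precision bit pattern but only consumes the
 * exponent and mantissa. `s` is assigned and never read, which is the
 * pattern -Wunused-but-set-variable reports when the attribute is absent.
 */
static int is_nan_bits(unsigned int bits)
{
	DECLARE_PARTS;

	s = (int)(bits >> 31);			/* sign: set, never used */
	e = (int)((bits >> 23) & 0xff);		/* biased exponent */
	m = bits & 0x7fffff;			/* mantissa */
	c = (e == 0xff && m != 0);		/* NaN: all-ones exponent, nonzero mantissa */
	return c;
}
```

Narrowing the scope of the declaration does not help in this pattern, because the variable is still written and never read inside the smaller block.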

Re: [PATCH] x86/cpu, x86/pti: Do not enable PTI on AMD processors

2017-12-27 Thread Dave Hansen

On 12/26/2017 09:43 PM, Tom Lendacky wrote:
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -923,8 +923,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
>  
>   setup_force_cpu_cap(X86_FEATURE_ALWAYS);
>  
> - /* Assume for now that ALL x86 CPUs are insecure */
> - setup_force_cpu_bug(X86_BUG_CPU_INSECURE);
> + if (c->x86_vendor != X86_VENDOR_AMD)
> + setup_force_cpu_bug(X86_BUG_CPU_INSECURE);

Does this disable it in a way that it can be turned back on via the
kernel command-line?

This is a rather wide class of issues and I would rather not just
hard-code it in a way that we say one vendor has never and will never be
affected.

[PATCH v6 0/8] add support for relative references in special sections

2017-12-27 Thread Ard Biesheuvel

This adds support for emitting special sections such as initcall arrays,
PCI fixups and tracepoints as relative references rather than absolute
references. This reduces the size by 50% on 64-bit architectures, but
more importantly, it removes the need for carrying relocation metadata
for these sections in relocatable kernels (e.g., for KASLR) that need
to fix up these absolute references at boot time. On arm64, this reduces
the vmlinux footprint of such a reference by 8x (8 byte absolute reference
+ 24 byte RELA entry vs 4 byte relative reference).
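
As an illustration of what a relative reference is, the following user-space sketch stores the distance from the slot to its target in 32 bits and recovers the absolute address by adding the slot's own address back. The helper names `rel_ref_set()` and `rel_ref_get()` and the two cost constants are hypothetical; the byte counts are the arm64 figures quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Store the distance from the slot itself to the target in 32 bits. */
static void rel_ref_set(int32_t *slot, const void *target)
{
	intptr_t delta = (intptr_t)target - (intptr_t)slot;

	/* The target must lie within +/- 2 GiB of the slot. */
	assert(delta >= INT32_MIN && delta <= INT32_MAX);
	*slot = (int32_t)delta;
}

/* Recover the absolute address: slot address plus stored offset. */
static void *rel_ref_get(const int32_t *slot)
{
	return (void *)((intptr_t)slot + *slot);
}

/* Per-reference footprint quoted above for arm64, in bytes. */
enum {
	ABS_REF_COST = 8 + 24,	/* 8-byte pointer + 24-byte RELA entry */
	REL_REF_COST = 4,	/* one 32-bit place-relative offset */
};
```

Because the stored value does not depend on where the image is loaded, no boot-time fixup is needed, which is why the RELA entry disappears from the cost.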

Patch #2 was sent out before as a single patch. This series supersedes
the previous submission. This version makes relative ksymtab entries
dependent on the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS rather
than trying to infer from kbuild test robot replies for which architectures
it should be blacklisted.

Patch #1 introduces the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS,
and sets it for the main architectures that are expected to benefit the
most from this feature, i.e., 64-bit architectures or ones that use
runtime relocations.

Patches #3 - #5 implement relative references for initcalls, PCI fixups
and tracepoints, respectively, all of which produce sections on the order
of ~1000 entries on an arm64 defconfig kernel with tracing enabled. This
means we save about 28 KB of vmlinux space for each of these patches.

Patches #6 - #8 have been added in v5, and implement relative references
in jump tables for arm64 and x86. On arm64, this results in significant
space savings (650+ KB on a typical distro kernel). On x86, the savings
are not as impressive, but still worthwhile. (Note that these patches
do not rely on CONFIG_HAVE_ARCH_PREL32_RELOCATIONS, given that the
inline asm that is emitted is already per-arch)

For the arm64 kernel, all patches combined reduce the memory footprint of
vmlinux by about 1.3 MB (using a config copied from Ubuntu that has KASLR
enabled), of which ~1 MB is the size reduction of the RELA section in .init,
and the remaining 300 KB is reduction of .text/.data.

Branch:
git://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git relative-special-sections-v6

Changes since v5:
- add missing jump_label prototypes to s390 jump_label.h (#6)
- fix inverted condition in call to jump_entry_is_module_init() (#6)

Changes since v4:
- add patches to convert x86 and arm64 to use relative references for jump
  tables (#6 - #8)
- rename PCI patch and add Bjorn's ack (#4)
- rebase onto v4.15-rc5

Changes since v3:
- fix module unload issue in patch #5 reported by Jessica, by reusing the
  updated routine for_each_tracepoint_range() for the quiescent check at
  module unload time; this requires this routine to be moved before
  tracepoint_module_going() in kernel/tracepoint.c
- add Jessica's ack to #2
- rebase onto v4.14-rc1

Changes since v2:
- Revert my slightly misguided attempt to appease checkpatch, which resulted
  in needless churn and worse code. This v3 is based on v1 with a few tweaks
  that were actually reasonable checkpatch warnings: unnecessary braces (as
  pointed out by Ingo) and other minor whitespace misdemeanors.

Changes since v1:
- Remove checkpatch errors to the extent feasible: in some cases, this
  involves moving extern declarations into C files, and switching to
  struct definitions rather than typedefs. Some errors are impossible
  to fix: please find the remaining ones after the diffstat.
- Used 'int' instead of 'signed int' for the various offset fields: there
  is no ambiguity between architectures regarding its signedness (unlike
  'char')
- Refactor the different patches to be more uniform in the way they define
  the section entry type and accessors in the .h file, and avoid the need to
  add #ifdefs to the C code.

Cc: "H. Peter Anvin" 
Cc: Ralf Baechle 
Cc: Arnd Bergmann 
Cc: Heiko Carstens 
Cc: Kees Cook 
Cc: Will Deacon 
Cc: Michael Ellerman 
Cc: Thomas Garnier 
Cc: Thomas Gleixner 
Cc: "Serge E. Hallyn" 
Cc: Bjorn Helgaas 
Cc: Benjamin Herrenschmidt 
Cc: Russell King 
Cc: Paul Mackerras 
Cc: Catalin Marinas 
Cc: "David S. Miller" 
Cc: Petr Mladek 
Cc: Ingo Molnar 
Cc: James Morris 
Cc: Andrew Morton 
Cc: Nicolas Pitre 
Cc: Josh Poimboeuf 
Cc: Steven Rostedt 
Cc: Martin Schwidefsky 
Cc: Sergey Senozhatsky 
Cc: Linus Torvalds 
Cc: Jessica Yu 

Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: x...@kernel.org

Ard Biesheuvel (8):
  arch: enable relative relocations for arm64, power, x86, s390 and x86
  module: use relative references for __ksymtab entries
  init: allow initcall tables to be emitted using relative references
  PCI: Add support for relative addressing in quirk tables
  kernel: tracepoints: add support for relative references
  kernel/jump_label: abstract jump_entry member accessors
  arm64/kernel: jump_label: use relative references
  x86/kern

[PATCH] perf test shell: Add -D to check dynamic symbols for ubuntu/debian

2017-12-27 Thread Li Zhijian

On Ubuntu and Debian, 'nm -g' does not find any symbol, including "inet_pton":
root@vm-lkp-nex04-8G-5 ~# nm -g /lib/x86_64-linux-gnu/libc-2.25.so | grep inet_pton
nm: /lib/x86_64-linux-gnu/libc-2.25.so: no symbols

It looks like libc.so has a different symbol composition on different distros.

Usage: nm [option(s)] [file(s)]
 List symbols in [file(s)] (a.out by default).
 The options are:
...snip...
  -D, --dynamic          Display dynamic symbols instead of normal symbols
      --defined-only     Display only defined symbols
  -e                     (ignored)
  -f, --format=FORMAT    Use the output format FORMAT.  FORMAT can be `bsd',
                           `sysv' or `posix'.  The default is `bsd'
  -g, --extern-only      Display only external symbols

I tested on both Debian/Ubuntu and RHEL; they work as expected.

CC: Thomas Richter 
CC: Arnaldo Carvalho de Melo 
Signed-off-by: Li Zhijian 
---
 tools/perf/tests/shell/trace+probe_libc_inet_pton.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/tests/shell/trace+probe_libc_inet_pton.sh b/tools/perf/tests/shell/trace+probe_libc_inet_pton.sh
index 8b3da21..f939bd6 100755
--- a/tools/perf/tests/shell/trace+probe_libc_inet_pton.sh
+++ b/tools/perf/tests/shell/trace+probe_libc_inet_pton.sh
@@ -11,7 +11,7 @@
 . $(dirname $0)/lib/probe.sh
 
 libc=$(grep -w libc /proc/self/maps | head -1 | sed -r 's/.*[[:space:]](\/.*)/\1/g')
-nm -g $libc 2>/dev/null | fgrep -q inet_pton || exit 254
+nm -gD $libc 2>/dev/null | fgrep -q inet_pton || exit 254
 
 trace_libc_inet_pton_backtrace() {
idx=0
-- 
2.7.4

[PATCH v6 6/8] kernel/jump_label: abstract jump_entry member accessors

2017-12-27 Thread Ard Biesheuvel

In preparation for allowing architectures to use relative references
in jump_label entries [which can dramatically reduce the memory
footprint], introduce abstractions for references to the 'code' and
'key' members of struct jump_entry.
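
The accessors in this patch rely on bit 0 of the 'key' member doubling as a branch flag. The following user-space sketch shows that tagging scheme in isolation; the `_sketch` types and the `entry_*` helpers are hypothetical stand-ins and not the kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real types live in the kernel headers. */
struct static_key_sketch { int enabled; };

struct jump_entry_sketch {
	uintptr_t code;
	uintptr_t target;
	uintptr_t key;	/* pointer to the key, with bit 0 used as a flag */
};

/* Store the key pointer and fold the branch flag into its low bit. */
static void entry_set_key(struct jump_entry_sketch *e,
			  struct static_key_sketch *key, int branch)
{
	/* Bit 0 must be free, which holds for any type aligned to >= 2. */
	assert(((uintptr_t)key & 1) == 0);
	e->key = (uintptr_t)key | (branch ? 1u : 0u);
}

/* Mask the flag off again to get the real pointer back. */
static struct static_key_sketch *entry_key(const struct jump_entry_sketch *e)
{
	return (struct static_key_sketch *)(e->key & ~(uintptr_t)1);
}

/* Read only the flag bit. */
static int entry_is_branch(const struct jump_entry_sketch *e)
{
	return (int)(e->key & 1);
}
```

Hiding the masking behind accessors is what lets an architecture later change how 'key' is stored without touching the callers in kernel/jump_label.c.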

Signed-off-by: Ard Biesheuvel 
---
 arch/arm/include/asm/jump_label.h | 27 ++
 arch/arm64/include/asm/jump_label.h   | 27 ++
 arch/mips/include/asm/jump_label.h| 27 ++
 arch/powerpc/include/asm/jump_label.h | 27 ++
 arch/s390/include/asm/jump_label.h| 20 +++
 arch/sparc/include/asm/jump_label.h   | 27 ++
 arch/tile/include/asm/jump_label.h| 27 ++
 arch/x86/include/asm/jump_label.h | 27 ++
 kernel/jump_label.c   | 38 +---
 9 files changed, 225 insertions(+), 22 deletions(-)

diff --git a/arch/arm/include/asm/jump_label.h b/arch/arm/include/asm/jump_label.h
index e12d7d096fc0..7b05b404063a 100644
--- a/arch/arm/include/asm/jump_label.h
+++ b/arch/arm/include/asm/jump_label.h
@@ -45,5 +45,32 @@ struct jump_entry {
jump_label_t key;
 };
 
+static inline jump_label_t jump_entry_code(const struct jump_entry *entry)
+{
+   return entry->code;
+}
+
+static inline struct static_key *jump_entry_key(const struct jump_entry *entry)
+{
+   return (struct static_key *)((unsigned long)entry->key & ~1UL);
+}
+
+static inline bool jump_entry_is_branch(const struct jump_entry *entry)
+{
+   return (unsigned long)entry->key & 1UL;
+}
+
+static inline bool jump_entry_is_module_init(const struct jump_entry *entry)
+{
+   return entry->code == 0;
+}
+
+static inline void jump_entry_set_module_init(struct jump_entry *entry)
+{
+   entry->code = 0;
+}
+
+#define jump_label_swap	NULL
+
 #endif  /* __ASSEMBLY__ */
 #endif
diff --git a/arch/arm64/include/asm/jump_label.h b/arch/arm64/include/asm/jump_label.h
index 1b5e0e843c3a..9d6e46355c89 100644
--- a/arch/arm64/include/asm/jump_label.h
+++ b/arch/arm64/include/asm/jump_label.h
@@ -62,5 +62,32 @@ struct jump_entry {
jump_label_t key;
 };
 
+static inline jump_label_t jump_entry_code(const struct jump_entry *entry)
+{
+   return entry->code;
+}
+
+static inline struct static_key *jump_entry_key(const struct jump_entry *entry)
+{
+   return (struct static_key *)((unsigned long)entry->key & ~1UL);
+}
+
+static inline bool jump_entry_is_branch(const struct jump_entry *entry)
+{
+   return (unsigned long)entry->key & 1UL;
+}
+
+static inline bool jump_entry_is_module_init(const struct jump_entry *entry)
+{
+   return entry->code == 0;
+}
+
+static inline void jump_entry_set_module_init(struct jump_entry *entry)
+{
+   entry->code = 0;
+}
+
+#define jump_label_swap	NULL
+
 #endif  /* __ASSEMBLY__ */
 #endif /* __ASM_JUMP_LABEL_H */
diff --git a/arch/mips/include/asm/jump_label.h b/arch/mips/include/asm/jump_label.h
index e77672539e8e..70df9293dc49 100644
--- a/arch/mips/include/asm/jump_label.h
+++ b/arch/mips/include/asm/jump_label.h
@@ -66,5 +66,32 @@ struct jump_entry {
jump_label_t key;
 };
 
+static inline jump_label_t jump_entry_code(const struct jump_entry *entry)
+{
+   return entry->code;
+}
+
+static inline struct static_key *jump_entry_key(const struct jump_entry *entry)
+{
+   return (struct static_key *)((unsigned long)entry->key & ~1UL);
+}
+
+static inline bool jump_entry_is_branch(const struct jump_entry *entry)
+{
+   return (unsigned long)entry->key & 1UL;
+}
+
+static inline bool jump_entry_is_module_init(const struct jump_entry *entry)
+{
+   return entry->code == 0;
+}
+
+static inline void jump_entry_set_module_init(struct jump_entry *entry)
+{
+   entry->code = 0;
+}
+
+#define jump_label_swap	NULL
+
 #endif  /* __ASSEMBLY__ */
 #endif /* _ASM_MIPS_JUMP_LABEL_H */
diff --git a/arch/powerpc/include/asm/jump_label.h b/arch/powerpc/include/asm/jump_label.h
index 9a287e0ac8b1..412b2699c9f6 100644
--- a/arch/powerpc/include/asm/jump_label.h
+++ b/arch/powerpc/include/asm/jump_label.h
@@ -59,6 +59,33 @@ struct jump_entry {
jump_label_t key;
 };
 
+static inline jump_label_t jump_entry_code(const struct jump_entry *entry)
+{
+   return entry->code;
+}
+
+static inline struct static_key *jump_entry_key(const struct jump_entry *entry)
+{
+   return (struct static_key *)((unsigned long)entry->key & ~1UL);
+}
+
+static inline bool jump_entry_is_branch(const struct jump_entry *entry)
+{
+   return (unsigned long)entry->key & 1UL;
+}
+
+static inline bool jump_entry_is_module_init(const struct jump_entry *entry)
+{
+   return entry->code == 0;
+}
+
+static inline void jump_entry_set_module_init(struct jump_entry *entry)
+{
+   entry->code = 0;
+}
+
+#define jump_label_swap	NULL
+
 #else
 #define ARCH_STATIC_BRANCH(LABEL, KEY) \
 1098:  nop;\
diff --git a/arch/s390/include/asm/jump_label.

[PATCH v6 5/8] kernel: tracepoints: add support for relative references

2017-12-27 Thread Ard Biesheuvel

To avoid the need for relocating absolute references to tracepoint
structures at boot time when running relocatable kernels (which may
take a disproportionate amount of space), add the option to emit
these tables as relative references instead.

Cc: Steven Rostedt 
Cc: Ingo Molnar 
Signed-off-by: Ard Biesheuvel 
---
 include/linux/tracepoint.h | 19 ++--
 kernel/tracepoint.c| 50 +++-
 2 files changed, 42 insertions(+), 27 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index a26ffbe09e71..d02bf1a695e8 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -228,6 +228,19 @@ extern void syscall_unregfunc(void);
return static_key_false(&__tracepoint_##name.key);  \
}
 
+#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
+#define __TRACEPOINT_ENTRY(name)\
+   asm("   .section \"__tracepoints_ptrs\", \"a\"   \n" \
+   "   .balign 4\n" \
+   "   .long " VMLINUX_SYMBOL_STR(__tracepoint_##name) " - .\n" \
+   "   .previous\n")
+#else
+#define __TRACEPOINT_ENTRY(name)\
+   static struct tracepoint * const __tracepoint_ptr_##name __used  \
+   __attribute__((section("__tracepoints_ptrs"))) = \
+   &__tracepoint_##name
+#endif
+
 /*
  * We have no guarantee that gcc and the linker won't up-align the tracepoint
  * structures, so we create an array of pointers that will be used for iteration
@@ -237,11 +250,9 @@ extern void syscall_unregfunc(void);
static const char __tpstrtab_##name[]\
__attribute__((section("__tracepoints_strings"))) = #name;   \
struct tracepoint __tracepoint_##name\
-   __attribute__((section("__tracepoints"))) =  \
+   __attribute__((section("__tracepoints"), used)) =\
{ __tpstrtab_##name, STATIC_KEY_INIT_FALSE, reg, unreg, NULL };\
-   static struct tracepoint * const __tracepoint_ptr_##name __used  \
-   __attribute__((section("__tracepoints_ptrs"))) = \
-   &__tracepoint_##name;
+   __TRACEPOINT_ENTRY(name);
 
 #define DEFINE_TRACE(name) \
DEFINE_TRACE_FN(name, NULL, NULL);
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 685c50ae6300..05649fef106c 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -327,6 +327,28 @@ int tracepoint_probe_unregister(struct tracepoint *tp, void *probe, void *data)
 }
 EXPORT_SYMBOL_GPL(tracepoint_probe_unregister);
 
+static void for_each_tracepoint_range(struct tracepoint * const *begin,
+   struct tracepoint * const *end,
+   void (*fct)(struct tracepoint *tp, void *priv),
+   void *priv)
+{
+   if (!begin)
+   return;
+
+   if (IS_ENABLED(CONFIG_HAVE_ARCH_PREL32_RELOCATIONS)) {
+   const int *iter;
+
+   for (iter = (const int *)begin; iter < (const int *)end; iter++)
+   fct((struct tracepoint *)((unsigned long)iter + *iter),
+   priv);
+   } else {
+   struct tracepoint * const *iter;
+
+   for (iter = begin; iter < end; iter++)
+   fct(*iter, priv);
+   }
+}
+
 #ifdef CONFIG_MODULES
 bool trace_module_has_bad_taint(struct module *mod)
 {
@@ -391,15 +413,9 @@ EXPORT_SYMBOL_GPL(unregister_tracepoint_module_notifier);
  * Ensure the tracer unregistered the module's probes before the module
  * teardown is performed. Prevents leaks of probe and data pointers.
  */
-static void tp_module_going_check_quiescent(struct tracepoint * const *begin,
-   struct tracepoint * const *end)
+static void tp_module_going_check_quiescent(struct tracepoint *tp, void *priv)
 {
-   struct tracepoint * const *iter;
-
-   if (!begin)
-   return;
-   for (iter = begin; iter < end; iter++)
-   WARN_ON_ONCE((*iter)->funcs);
+   WARN_ON_ONCE(tp->funcs);
 }
 
 static int tracepoint_module_coming(struct module *mod)
@@ -450,8 +466,9 @@ static void tracepoint_module_going(struct module *mod)
 * Called the going notifier before checking for
 * quiescence.
 */
-   tp_module_going_check_quiescent(mod->tracepoints_ptrs,
-   mod->tracepoints_ptrs + mod->num_tracepoints);
+   for_each_tracepoint_range(mod->tracepoints_ptrs,
+   mod->tracepoints_ptrs + mod->num_tracepoints,
+   tp_module_going_check_quiescent, NULL);
break;
}
}
@@ -503,1

[PATCH v6 3/8] init: allow initcall tables to be emitted using relative references

2017-12-27 Thread Ard Biesheuvel

Allow the initcall tables to be emitted using relative references that
are only half the size on 64-bit architectures and don't require fixups
at runtime on relocatable kernels.
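
The idea of hiding the entry representation behind one typedef and one accessor can be sketched in plain C as follows. This is an illustration under assumptions, not the patch itself: `USE_ABSOLUTE_ENTRIES` is a hypothetical switch standing in for the Kconfig symbol, `entry_set()` fills the table at run time (the patch emits the offsets from inline asm), and casting between function pointers and integers relies on behaviour that common toolchains provide but ISO C does not guarantee.

```c
#include <assert.h>
#include <stdint.h>

typedef int (*initcall_t)(void);

#ifndef USE_ABSOLUTE_ENTRIES	/* hypothetical switch, not a real Kconfig symbol */
typedef int32_t initcall_entry_t;	/* 4 bytes per entry */

static void entry_set(initcall_entry_t *entry, initcall_t fn)
{
	intptr_t delta = (intptr_t)fn - (intptr_t)entry;

	assert(delta >= INT32_MIN && delta <= INT32_MAX);
	*entry = (int32_t)delta;
}

static initcall_t initcall_from_entry(initcall_entry_t *entry)
{
	return (initcall_t)((intptr_t)entry + *entry);
}
#else
typedef initcall_t initcall_entry_t;	/* pointer-sized entry */

static void entry_set(initcall_entry_t *entry, initcall_t fn)
{
	*entry = fn;
}

static initcall_t initcall_from_entry(initcall_entry_t *entry)
{
	return *entry;
}
#endif

static int order[2];
static int calls;

static int first_init(void)  { order[calls++] = 1; return 0; }
static int second_init(void) { order[calls++] = 2; return 0; }

static initcall_entry_t table[2];

/* Fill the table, run every entry in order, and count non-zero returns. */
static int run_initcalls(void)
{
	initcall_entry_t *fn;
	int failures = 0;

	entry_set(&table[0], first_init);
	entry_set(&table[1], second_init);
	for (fn = table; fn < table + 2; fn++)
		failures += initcall_from_entry(fn)() != 0;
	return failures;
}
```

The loop is identical in both configurations; only the typedef and the accessor change, which mirrors how the callers in the patch only need `initcall_t *` replaced by `initcall_entry_t *`.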

Cc: Petr Mladek 
Cc: Sergey Senozhatsky 
Cc: Steven Rostedt 
Cc: James Morris 
Cc: "Serge E. Hallyn" 
Signed-off-by: Ard Biesheuvel 
---
 include/linux/init.h   | 44 +++-
 init/main.c| 32 +++---
 kernel/printk/printk.c |  4 +-
 security/security.c|  4 +-
 4 files changed, 53 insertions(+), 31 deletions(-)

diff --git a/include/linux/init.h b/include/linux/init.h
index ea1b31101d9e..125bbea99c6b 100644
--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -109,8 +109,24 @@
 typedef int (*initcall_t)(void);
 typedef void (*exitcall_t)(void);
 
-extern initcall_t __con_initcall_start[], __con_initcall_end[];
-extern initcall_t __security_initcall_start[], __security_initcall_end[];
+#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
+typedef signed int initcall_entry_t;
+
+static inline initcall_t initcall_from_entry(initcall_entry_t *entry)
+{
+   return (initcall_t)((unsigned long)entry + *entry);
+}
+#else
+typedef initcall_t initcall_entry_t;
+
+static inline initcall_t initcall_from_entry(initcall_entry_t *entry)
+{
+   return *entry;
+}
+#endif
+
+extern initcall_entry_t __con_initcall_start[], __con_initcall_end[];
+extern initcall_entry_t __security_initcall_start[], __security_initcall_end[];
 
 /* Used for contructor calls. */
 typedef void (*ctor_fn_t)(void);
@@ -160,9 +176,20 @@ extern bool initcall_debug;
  * as KEEP() in the linker script.
  */
 
-#define __define_initcall(fn, id) \
+#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
+#define ___define_initcall(fn, id, __sec)  \
+   __ADDRESSABLE(fn)   \
+   asm(".section   \"" #__sec ".init\", \"a\"  \n" \
+   "__initcall_" #fn #id ":\n" \
+   ".long "VMLINUX_SYMBOL_STR(fn) " - .\n" \
+   ".previous  \n");
+#else
+#define ___define_initcall(fn, id, __sec) \
static initcall_t __initcall_##fn##id __used \
-   __attribute__((__section__(".initcall" #id ".init"))) = fn;
+   __attribute__((__section__(#__sec ".init"))) = fn;
+#endif
+
+#define __define_initcall(fn, id) ___define_initcall(fn, id, .initcall##id)
 
 /*
  * Early initcalls run before initializing SMP.
@@ -201,13 +228,8 @@ extern bool initcall_debug;
 #define __exitcall(fn) \
static exitcall_t __exitcall_##fn __exit_call = fn
 
-#define console_initcall(fn)   \
-   static initcall_t __initcall_##fn   \
-   __used __section(.con_initcall.init) = fn
-
-#define security_initcall(fn)  \
-   static initcall_t __initcall_##fn   \
-   __used __section(.security_initcall.init) = fn
+#define console_initcall(fn)   ___define_initcall(fn,, .con_initcall)
+#define security_initcall(fn)  ___define_initcall(fn,, .security_initcall)
 
 struct obs_kernel_param {
const char *str;
diff --git a/init/main.c b/init/main.c
index 7b606fc48482..2cbe3c2804ab 100644
--- a/init/main.c
+++ b/init/main.c
@@ -845,18 +845,18 @@ int __init_or_module do_one_initcall(initcall_t fn)
 }
 
 
-extern initcall_t __initcall_start[];
-extern initcall_t __initcall0_start[];
-extern initcall_t __initcall1_start[];
-extern initcall_t __initcall2_start[];
-extern initcall_t __initcall3_start[];
-extern initcall_t __initcall4_start[];
-extern initcall_t __initcall5_start[];
-extern initcall_t __initcall6_start[];
-extern initcall_t __initcall7_start[];
-extern initcall_t __initcall_end[];
-
-static initcall_t *initcall_levels[] __initdata = {
+extern initcall_entry_t __initcall_start[];
+extern initcall_entry_t __initcall0_start[];
+extern initcall_entry_t __initcall1_start[];
+extern initcall_entry_t __initcall2_start[];
+extern initcall_entry_t __initcall3_start[];
+extern initcall_entry_t __initcall4_start[];
+extern initcall_entry_t __initcall5_start[];
+extern initcall_entry_t __initcall6_start[];
+extern initcall_entry_t __initcall7_start[];
+extern initcall_entry_t __initcall_end[];
+
+static initcall_entry_t *initcall_levels[] __initdata = {
__initcall0_start,
__initcall1_start,
__initcall2_start,
@@ -882,7 +882,7 @@ static char *initcall_level_names[] __initdata = {
 
 static void __init do_initcall_level(int level)
 {
-   initcall_t *fn;
+   initcall_entry_t *fn;
 
strcpy(initcall_command_line, saved_command_line);
parse_args(initcall_level_names[level],
@@ -892,7 +892,7 @@ static void __init do_initcall_level(int level)
   NULL, &repair_env_string);
 
for (fn = initcall_levels[level]; fn < initcall_levels[level+1]; fn++)
-   do_one_initcall(*fn);
+   do_one_ini

[PATCH v6 4/8] PCI: Add support for relative addressing in quirk tables

2017-12-27 Thread Ard Biesheuvel

Allow the PCI quirk tables to be emitted in a way that avoids absolute
references to the hook functions. This reduces the size of the entries,
and, more importantly, makes them invariant under runtime relocation
(e.g., for KASLR).
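
One detail worth spelling out is that the offset is taken relative to the address of the `hook_offset` field itself, not to the start of the entry. The sketch below models that in userspace C under stated assumptions: `struct dev`, `struct fixup_rel`, `fixup_set_hook()` and `fixup_hook()` are hypothetical names, the field `dev_class` stands in for `class`, the device ID is made up, and the hook is installed at run time rather than emitted by the assembler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pci_dev. */
struct dev {
	int quirks_applied;
};

/* Fixed-width layout matching the .short/.short/.long/.long/.long sequence. */
struct fixup_rel {
	uint16_t vendor;
	uint16_t device;
	uint32_t dev_class;
	uint32_t class_shift;
	int32_t  hook_offset;	/* distance from this field to the hook */
};

typedef void (*hook_fn)(struct dev *);

static void fixup_set_hook(struct fixup_rel *f, hook_fn hook)
{
	intptr_t delta = (intptr_t)hook - (intptr_t)&f->hook_offset;

	assert(delta >= INT32_MIN && delta <= INT32_MAX);
	f->hook_offset = (int32_t)delta;
}

/* Same arithmetic as the pci_do_fixups() hunk: &field + field. */
static hook_fn fixup_hook(const struct fixup_rel *f)
{
	return (hook_fn)((intptr_t)&f->hook_offset + f->hook_offset);
}

static void demo_quirk(struct dev *d)
{
	d->quirks_applied++;
}

/* Illustrative IDs only. */
static struct fixup_rel demo_fixup = { 0x8086, 0x1234, 0, 0, 0 };

static int demo_apply(void)
{
	struct dev d = { 0 };

	fixup_set_hook(&demo_fixup, demo_quirk);
	fixup_hook(&demo_fixup)(&d);
	return d.quirks_applied;
}
```

With these fixed-width fields the entry is 16 bytes; on a typical LP64 target the pointer-based layout would be padded to 24 bytes.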

Acked-by: Bjorn Helgaas 
Signed-off-by: Ard Biesheuvel 
---
 drivers/pci/quirks.c | 13 ++---
 include/linux/pci.h  | 20 
 2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 10684b17d0bd..b6d51b4d5ce1 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3556,9 +3556,16 @@ static void pci_do_fixups(struct pci_dev *dev, struct pci_fixup *f,
 f->vendor == (u16) PCI_ANY_ID) &&
(f->device == dev->device ||
 f->device == (u16) PCI_ANY_ID)) {
-   calltime = fixup_debug_start(dev, f->hook);
-   f->hook(dev);
-   fixup_debug_report(dev, calltime, f->hook);
+   void (*hook)(struct pci_dev *dev);
+#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
+   hook = (void *)((unsigned long)&f->hook_offset +
+   f->hook_offset);
+#else
+   hook = f->hook;
+#endif
+   calltime = fixup_debug_start(dev, hook);
+   hook(dev);
+   fixup_debug_report(dev, calltime, hook);
}
 }
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index c170c9250c8b..e8c34afb5d4a 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1792,7 +1792,11 @@ struct pci_fixup {
u16 device; /* You can use PCI_ANY_ID here of course */
u32 class;  /* You can use PCI_ANY_ID here too */
unsigned int class_shift;   /* should be 0, 8, 16 */
+#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
+   signed int hook_offset;
+#else
void (*hook)(struct pci_dev *dev);
+#endif
 };
 
 enum pci_fixup_pass {
@@ -1806,12 +1810,28 @@ enum pci_fixup_pass {
pci_fixup_suspend_late, /* pci_device_suspend_late() */
 };
 
+#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
+#define __DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class,  \
+   class_shift, hook)  \
+   __ADDRESSABLE(hook) \
+   asm(".section " #sec ", \"a\"   \n" \
+   ".balign 16 \n" \
+   ".short "   #vendor ", " #device "  \n" \
+   ".long "#class ", " #class_shift "  \n" \
+   ".long "VMLINUX_SYMBOL_STR(hook) " - .  \n" \
+   ".previous  \n");
+#define DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class,\
+ class_shift, hook)\
+   __DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class,   \
+ class_shift, hook)
+#else
 /* Anonymous variables would be nice... */
 #define DECLARE_PCI_FIXUP_SECTION(section, name, vendor, device, class, \
  class_shift, hook)\
 static const struct pci_fixup __PASTE(__pci_fixup_##name,__LINE__) __used \
 __attribute__((__section__(#section), aligned((sizeof(void *))))) \
 = { vendor, device, class, class_shift, hook };
+#endif
 
 #define DECLARE_PCI_FIXUP_CLASS_EARLY(vendor, device, class,   \
 class_shift, hook) \
-- 
2.11.0

[PATCH v6 8/8] x86/kernel: jump_table: use relative references

2017-12-27 Thread Ard Biesheuvel

Similar to the arm64 case, 64-bit x86 can benefit from using 32-bit
relative references rather than 64-bit absolute ones when emitting
struct jump_entry instances. Not only does this reduce the memory
footprint of the entries themselves by 50%, it also removes the need
for carrying relocation metadata on relocatable builds (i.e., for KASLR),
which saves a fair chunk of .init space as well (although the savings
are not as dramatic as on arm64).

Signed-off-by: Ard Biesheuvel 
---
 arch/x86/include/asm/jump_label.h | 35 +++-
 arch/x86/kernel/jump_label.c  | 59 ++--
 tools/objtool/special.c   |  4 +-
 3 files changed, 65 insertions(+), 33 deletions(-)

diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index 009ff2699d07..91c01af96907 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -36,8 +36,8 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
asm_volatile_goto("1:"
".byte " __stringify(STATIC_KEY_INIT_NOP) "\n\t"
".pushsection __jump_table,  \"aw\" \n\t"
-   _ASM_ALIGN "\n\t"
-   _ASM_PTR "1b, %l[l_yes], %c0 + %c1 \n\t"
+   ".balign 4\n\t"
+   ".long 1b - ., %l[l_yes] - ., %c0 + %c1 - .\n\t"
".popsection \n\t"
: :  "i" (key), "i" (branch) : : l_yes);
 
@@ -52,8 +52,8 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool
".byte 0xe9\n\t .long %l[l_yes] - 2f\n\t"
"2:\n\t"
".pushsection __jump_table,  \"aw\" \n\t"
-   _ASM_ALIGN "\n\t"
-   _ASM_PTR "1b, %l[l_yes], %c0 + %c1 \n\t"
+   ".balign 4\n\t"
+   ".long 1b - ., %l[l_yes] - ., %c0 + %c1 - .\n\t"
".popsection \n\t"
: :  "i" (key), "i" (branch) : : l_yes);
 
@@ -69,19 +69,26 @@ typedef u32 jump_label_t;
 #endif
 
 struct jump_entry {
-   jump_label_t code;
-   jump_label_t target;
-   jump_label_t key;
+   s32 code;
+   s32 target;
+   s32 key;
 };
 
 static inline jump_label_t jump_entry_code(const struct jump_entry *entry)
 {
-   return entry->code;
+   return (jump_label_t)&entry->code + entry->code;
+}
+
+static inline jump_label_t jump_entry_target(const struct jump_entry *entry)
+{
+   return (jump_label_t)&entry->target + entry->target;
 }
 
 static inline struct static_key *jump_entry_key(const struct jump_entry *entry)
 {
-   return (struct static_key *)((unsigned long)entry->key & ~1UL);
+   unsigned long key = (unsigned long)&entry->key + entry->key;
+
+   return (struct static_key *)(key & ~1UL);
 }
 
 static inline bool jump_entry_is_branch(const struct jump_entry *entry)
@@ -99,7 +106,7 @@ static inline void jump_entry_set_module_init(struct jump_entry *entry)
entry->code = 0;
 }
 
-#define jump_label_swap NULL
+void jump_label_swap(void *a, void *b, int size);
 
 #else  /* __ASSEMBLY__ */
 
@@ -114,8 +121,8 @@ static inline void jump_entry_set_module_init(struct jump_entry *entry)
.byte   STATIC_KEY_INIT_NOP
.endif
.pushsection __jump_table, "aw"
-   _ASM_ALIGN
-   _ASM_PTR .Lstatic_jump_\@, \target, \key
+   .balign 4
+   .long   .Lstatic_jump_\@ - ., \target - ., \key - .
.popsection
 .endm
 
@@ -130,8 +137,8 @@ static inline void jump_entry_set_module_init(struct jump_entry *entry)
 .Lstatic_jump_after_\@:
.endif
.pushsection __jump_table, "aw"
-   _ASM_ALIGN
-   _ASM_PTR .Lstatic_jump_\@, \target, \key + 1
+   .balign 4
+   .long   .Lstatic_jump_\@ - ., \target - ., \key - . + 1
.popsection
 .endm
 
diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
index e56c95be2808..cc5034b42335 100644
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -52,22 +52,24 @@ static void __jump_label_transform(struct jump_entry *entry,
 * Jump label is enabled for the first time.
 * So we expect a default_nop...
 */
-   if (unlikely(memcmp((void *)entry->code, default_nop, 5)
-!= 0))
-   bug_at((void *)entry->code, __LINE__);
+   if (unlikely(memcmp((void *)jump_entry_code(entry),
+   default_nop, 5) != 0))
+   bug_at((void *)jump_entry_code(entry),
+  __LINE__);
} else {
/*
 * ...otherwise expect an ideal_nop. Otherwise
 * something went horribly wrong.
 */
-   if (unlikely(memcmp((voi

[PATCH v6 7/8] arm64/kernel: jump_label: use relative references

2017-12-27 Thread Ard Biesheuvel

On a randomly chosen distro kernel build for arm64, vmlinux.o shows the
following sections, containing jump label entries, and the associated
RELA relocation records, respectively:

  ...
  [38088] __jump_table  PROGBITS   00e19f30
   0002ea10    WA   0 0 8
  [38089] .rela__jump_table RELA   01fd8bb0
   0008be30  0018   I  38178   38088 8
  ...

In other words, we have 190 KB worth of 'struct jump_entry' instances,
and 573 KB worth of RELA entries to relocate each entry's code, target
and key members. This means the RELA section occupies 10% of the .init
segment, and the two sections combined represent 5% of vmlinux's entire
memory footprint.
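
The figures above can be cross-checked from the two section sizes in the listing. The following small sketch does the arithmetic; the enum and function names are hypothetical, and the only inputs are the sizes 0x2ea10 and 0x8be30 and the RELA entry size 0x18 shown in the listing, plus the three 64-bit fields per `struct jump_entry`.

```c
#include <assert.h>

enum {
	JUMP_TABLE_BYTES = 0x2ea10,	/* size of __jump_table in the listing */
	RELA_BYTES       = 0x8be30,	/* size of .rela__jump_table */
	RELA_REC_BYTES   = 0x18,	/* entry size of the RELA section */
	ABS_ENTRY_BYTES  = 3 * 8,	/* code, target, key as 64-bit values */
	REL_ENTRY_BYTES  = 3 * 4,	/* the same fields as 32-bit offsets */
};

/* Number of struct jump_entry instances in the section. */
static long jump_entries(void)
{
	assert(JUMP_TABLE_BYTES % ABS_ENTRY_BYTES == 0);
	return JUMP_TABLE_BYTES / ABS_ENTRY_BYTES;
}

/* Number of RELA records needed to relocate them. */
static long rela_records(void)
{
	assert(RELA_BYTES % RELA_REC_BYTES == 0);
	return RELA_BYTES / RELA_REC_BYTES;
}

/* Size of the table once each field shrinks to 32 bits. */
static long relative_table_bytes(void)
{
	return jump_entries() * REL_ENTRY_BYTES;
}
```

This works out to 7958 entries and 23874 RELA records, i.e. exactly three records per entry, one each for code, target and key. With 32-bit fields the table shrinks to 95496 bytes, half of the original 190992.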

So let's switch from 64-bit absolute references to 32-bit relative
references: this reduces the size of the __jump_table by 50%, and gets
rid of the RELA section entirely.

Note that this requires some extra care in the sorting routine, given
that the offsets change when entries are moved around in the jump_entry
table.
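
To make the sorting caveat concrete: when two entries trade places, each stored offset has to be corrected by the distance between the two slots, otherwise it would resolve to a different address than before. The sketch below is a userspace model, assuming a completion of the swap along the lines the (truncated) hunk suggests; `resolve()`, `point_at()`, `sites` and `tab` are hypothetical, and the low bit that the real key field carries is not modelled.

```c
#include <assert.h>
#include <stdint.h>

struct jump_entry {
	int32_t code;
	int32_t target;
	int32_t key;
};

/* Absolute address a relative field refers to. */
static uintptr_t resolve(const int32_t *field)
{
	return (uintptr_t)((intptr_t)field + *field);
}

/* Aim a relative field at dest. */
static void point_at(int32_t *field, const void *dest)
{
	intptr_t delta = (intptr_t)dest - (intptr_t)field;

	assert(delta >= INT32_MIN && delta <= INT32_MAX);
	*field = (int32_t)delta;
}

/* Swap two entries so that every field still resolves to its old destination. */
static void jump_label_swap(void *a, void *b, int size)
{
	intptr_t delta = (intptr_t)a - (intptr_t)b;
	struct jump_entry *jea = a;
	struct jump_entry *jeb = b;
	struct jump_entry tmp = *jea;

	(void)size;	/* both entries have the same fixed size */

	/* Contents of b move to a: the slot address grows by delta. */
	jea->code   = (int32_t)(jeb->code - delta);
	jea->target = (int32_t)(jeb->target - delta);
	jea->key    = (int32_t)(jeb->key - delta);

	/* Old contents of a move to b: the slot address shrinks by delta. */
	jeb->code   = (int32_t)(tmp.code + delta);
	jeb->target = (int32_t)(tmp.target + delta);
	jeb->key    = (int32_t)(tmp.key + delta);
}

static char sites[6];			/* fake code/target/key locations */
static struct jump_entry tab[2];

/* Returns 1 if, after the swap, each entry resolves to the other's old sites. */
static int demo_swap_keeps_destinations(void)
{
	point_at(&tab[0].code, &sites[0]);
	point_at(&tab[0].target, &sites[1]);
	point_at(&tab[0].key, &sites[2]);
	point_at(&tab[1].code, &sites[3]);
	point_at(&tab[1].target, &sites[4]);
	point_at(&tab[1].key, &sites[5]);

	jump_label_swap(&tab[0], &tab[1], sizeof(tab[0]));

	return resolve(&tab[0].code) == (uintptr_t)&sites[3] &&
	       resolve(&tab[0].target) == (uintptr_t)&sites[4] &&
	       resolve(&tab[0].key) == (uintptr_t)&sites[5] &&
	       resolve(&tab[1].code) == (uintptr_t)&sites[0] &&
	       resolve(&tab[1].target) == (uintptr_t)&sites[1] &&
	       resolve(&tab[1].key) == (uintptr_t)&sites[2];
}
```

A plain byte-wise swap would copy the offsets unchanged, so every field of the moved entries would be off by the distance between the two slots.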

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/include/asm/jump_label.h | 27 
 arch/arm64/kernel/jump_label.c  | 22 +---
 2 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/include/asm/jump_label.h b/arch/arm64/include/asm/jump_label.h
index 9d6e46355c89..5cec68616125 100644
--- a/arch/arm64/include/asm/jump_label.h
+++ b/arch/arm64/include/asm/jump_label.h
@@ -30,8 +30,8 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
 {
asm goto("1: nop\n\t"
 ".pushsection __jump_table,  \"aw\"\n\t"
-".align 3\n\t"
-".quad 1b, %l[l_yes], %c0\n\t"
+".align 2\n\t"
+".long 1b - ., %l[l_yes] - ., %c0 - .\n\t"
 ".popsection\n\t"
 :  :  "i"(&((char *)key)[branch]) :  : l_yes);
 
@@ -44,8 +44,8 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool
 {
asm goto("1: b %l[l_yes]\n\t"
 ".pushsection __jump_table,  \"aw\"\n\t"
-".align 3\n\t"
-".quad 1b, %l[l_yes], %c0\n\t"
+".align 2\n\t"
+".long 1b - ., %l[l_yes] - ., %c0 - .\n\t"
 ".popsection\n\t"
 :  :  "i"(&((char *)key)[branch]) :  : l_yes);
 
@@ -57,19 +57,26 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool
 typedef u64 jump_label_t;
 
 struct jump_entry {
-   jump_label_t code;
-   jump_label_t target;
-   jump_label_t key;
+   s32 code;
+   s32 target;
+   s32 key;
 };
 
 static inline jump_label_t jump_entry_code(const struct jump_entry *entry)
 {
-   return entry->code;
+   return (jump_label_t)&entry->code + entry->code;
+}
+
+static inline jump_label_t jump_entry_target(const struct jump_entry *entry)
+{
+   return (jump_label_t)&entry->target + entry->target;
 }
 
 static inline struct static_key *jump_entry_key(const struct jump_entry *entry)
 {
-   return (struct static_key *)((unsigned long)entry->key & ~1UL);
+   unsigned long key = (unsigned long)&entry->key + entry->key;
+
+   return (struct static_key *)(key & ~1UL);
 }
 
 static inline bool jump_entry_is_branch(const struct jump_entry *entry)
@@ -87,7 +94,7 @@ static inline void jump_entry_set_module_init(struct jump_entry *entry)
entry->code = 0;
 }
 
-#define jump_label_swap NULL
+void jump_label_swap(void *a, void *b, int size);
 
 #endif  /* __ASSEMBLY__ */
 #endif /* __ASM_JUMP_LABEL_H */
diff --git a/arch/arm64/kernel/jump_label.c b/arch/arm64/kernel/jump_label.c
index c2dd1ad3e648..2b8e459e91f7 100644
--- a/arch/arm64/kernel/jump_label.c
+++ b/arch/arm64/kernel/jump_label.c
@@ -25,12 +25,12 @@
 void arch_jump_label_transform(struct jump_entry *entry,
   enum jump_label_type type)
 {
-   void *addr = (void *)entry->code;
+   void *addr = (void *)jump_entry_code(entry);
u32 insn;
 
if (type == JUMP_LABEL_JMP) {
-   insn = aarch64_insn_gen_branch_imm(entry->code,
-  entry->target,
+   insn = aarch64_insn_gen_branch_imm(jump_entry_code(entry),
+  jump_entry_target(entry),
   AARCH64_INSN_BRANCH_NOLINK);
} else {
insn = aarch64_insn_gen_nop();
@@ -50,4 +50,20 @@ void arch_jump_label_transform_static(struct jump_entry *entry,
 */
 }
 
+void jump_label_swap(void *a, void *b, int size)
+{
+   long delta = (unsigned long)a - (unsigned long)b;
+   struct jump_entry *jea = a;
+   struct jump_entry *jeb = b;
+   struct jump_entry tmp = *jea;
+
+   jea->code   = jeb->code - delta;
+   jea->t

[GIT PULL] sound fixes for 4.15-rc6

2017-12-27 Thread Takashi Iwai

Linus,

please pull sound fixes for v4.15-rc6 from:

  git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git tags/sound-4.15-rc6

The topmost commit is 44be77c590f381bc629815ac789b8b15ecc4ddcf



sound fixes for 4.15-rc6

It seems that Santa overslept with a bunch of gifts; the majority of
changes here are various device-specific ASoC fixes, most notably the
revert of rcar IOMMU support and fsl_ssi AC97 fixes, but also lots of
small fixes for codecs.  Besides that, the usual HD-audio quirks and
fixes are included, too.



Abhijeet Kumar (1):
  ASoC: nau8825: fix issue that pop noise when start capture

Adam Thomson (2):
  ASoC: da7219: Correct IRQ level in DT binding example
  ASoC: da7218: Correct IRQ level in DT binding example

Alexandre Belloni (1):
  ASoC: atmel-classd: select correct Kconfig symbol

Andrew F. Davis (1):
  ASoC: tlv320aic31xx: Fix GPIO1 register definition

Bard Liao (1):
  ASoC: rt5645: reset RT5645_AD_DA_MIXER at probe

Ben Hutchings (1):
  ASoC: wm_adsp: Fix validation of firmware and coeff lengths

Brian Norris (1):
  ASoC: rt5514-spi: only enable wakeup when fully initialized

Guenter Roeck (1):
  ASoC: amd: Add error checking to probe function

Guneshwor Singh (1):
  ASoC: Intel: Skylake: Do not check dev_type for dmic link type

Hui Wang (3):
  ALSA: hda - Add MIC_NO_PRESENCE fixup for 2 HP machines
  ALSA: hda - fix headset mic detection issue on a Dell machine
  ALSA: hda - change the location for one mic on a Lenovo machine

Jiada Wang (2):
  ASoC: rsnd: ssiu: clear SSI_MODE for non TDM Extended modes
  ASoC: rsnd: ssi: fix race condition in rsnd_ssi_pointer_update

Johan Hovold (2):
  ASoC: da7218: fix fix child-node lookup
  ASoC: twl4030: fix child-node lookup

Kuninori Morimoto (2):
  ASoC: rcar: revert IOMMU support so far
  ASoC: rsnd: fixup ADG register mask

Maciej S. Szmigiero (2):
  ASoC: fsl_ssi: AC'97 ops need regmap, clock and cleaning up on failure
  ASoC: fsl_ssi: serialize AC'97 register access operations

Naveen Manohar (2):
  ASoC: Intel: kbl: Modify map for Headset Playback to fix pop-noise
  ASoC: Intel: Change kern log level to avoid unwanted messages

Nicolin Chen (1):
  ASoC: fsl_asrc: Fix typo in a field define

Srinivas Kandagatla (1):
  ASoC: codecs: msm8916-wcd: Fix supported formats

Stefan Potyra (1):
  ASoC: rockchip: disable clock on error

Takashi Iwai (2):
  ALSA: hda: Drop useless WARN_ON()
  ALSA: hda - Fix missing COEF init for ALC225/295/299

oder_ch...@realtek.com (3):
  ASoC: rt5514: Make sure the DMIC delay will be happened after normal SUPPLY widgets power on
  ASoC: rt5514: Add the sanity check for the driver_data in the resume function
  ASoC: rt5663: Fix the wrong result of the first jack detection

---
 Documentation/devicetree/bindings/sound/da7218.txt |  2 +-
 Documentation/devicetree/bindings/sound/da7219.txt |  2 +-
 sound/hda/hdac_i915.c  |  2 +-
 sound/pci/hda/patch_conexant.c | 29 
 sound/pci/hda/patch_realtek.c  | 14 +++-
 sound/soc/amd/acp-pcm-dma.c|  7 ++
 sound/soc/atmel/Kconfig|  2 +-
 sound/soc/codecs/da7218.c  |  2 +-
 sound/soc/codecs/msm8916-wcd-analog.c  |  2 +-
 sound/soc/codecs/msm8916-wcd-digital.c |  4 +-
 sound/soc/codecs/nau8825.c |  1 +
 sound/soc/codecs/rt5514-spi.c  | 15 ++--
 sound/soc/codecs/rt5514.c  |  2 +-
 sound/soc/codecs/rt5645.c  |  2 +
 sound/soc/codecs/rt5663.c  |  4 +
 sound/soc/codecs/rt5663.h  |  4 +
 sound/soc/codecs/tlv320aic31xx.h   |  2 +-
 sound/soc/codecs/twl4030.c |  4 +-
 sound/soc/codecs/wm_adsp.c | 12 +--
 sound/soc/fsl/fsl_asrc.h   |  4 +-
 sound/soc/fsl/fsl_ssi.c| 44 ---
 sound/soc/intel/boards/kbl_rt5663_max98927.c   |  2 +-
 .../soc/intel/boards/kbl_rt5663_rt5514_max98927.c  |  2 +-
 sound/soc/intel/skylake/skl-nhlt.c | 15 ++--
 sound/soc/intel/skylake/skl-topology.c |  2 +-
 sound/soc/rockchip/rockchip_spdif.c| 18 +++--
 sound/soc/sh/rcar/adg.c|  6 +-
 sound/soc/sh/rcar/core.c   |  4 +-
 sound/soc/sh/rcar/dma.c| 86 ++
 sound/soc/sh/rcar/ssi.c| 16 ++--
 sound/soc/sh/rcar/ssiu.c   |  5 +-
 31 files changed, 173 insertions(+), 143 deletions(-)

diff --git a/Documentation/devicetree/bindings/sou

[PATCH v6 1/8] arch: enable relative relocations for arm64, power, s390 and x86

2017-12-27 Thread Ard Biesheuvel

Before updating certain subsystems to use place-relative 32-bit
relocations in special sections, to save space and reduce the
number of absolute relocations that need to be processed at runtime
by relocatable kernels, introduce the Kconfig symbol and define it
for some architectures that should be able to support and benefit
from it.

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Martin Schwidefsky 
Cc: Heiko Carstens 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: x...@kernel.org
Signed-off-by: Ard Biesheuvel 
---
 arch/Kconfig| 10 ++
 arch/arm64/Kconfig  |  1 +
 arch/arm64/kernel/vmlinux.lds.S |  2 +-
 arch/powerpc/Kconfig|  1 +
 arch/s390/Kconfig   |  1 +
 arch/x86/Kconfig|  1 +
 6 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 400b9e1b2f27..dbc036a7bd1b 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -959,4 +959,14 @@ config REFCOUNT_FULL
  against various use-after-free conditions that can be used in
  security flaw exploits.
 
+config HAVE_ARCH_PREL32_RELOCATIONS
+   bool
+   help
+ May be selected by an architecture if it supports place-relative
+ 32-bit relocations, both in the toolchain and in the module loader,
+ in which case relative references can be used in special sections
+ for PCI fixups, initcalls, etc., which are only half the size on 64-bit
+ architectures, and don't require runtime relocation on relocatable
+ kernels.
+
 source "kernel/gcov/Kconfig"
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c9a7e9e1414f..66c7b9ab2a3d 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -89,6 +89,7 @@ config ARM64
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
+   select HAVE_ARCH_PREL32_RELOCATIONS
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_TRACEHOOK
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 7da3e5c366a0..49ae5b43fe2b 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -156,7 +156,7 @@ SECTIONS
CON_INITCALL
SECURITY_INITCALL
INIT_RAM_FS
-   *(.init.rodata.* .init.bss) /* from the EFI stub */
+   *(.init.rodata.* .init.bss .init.discard.*) /* EFI stub */
}
.exit.data : {
ARM_EXIT_KEEP(EXIT_DATA)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index c51e6ce42e7a..e172478e2ae7 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -177,6 +177,7 @@ config PPC
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if COMPAT
+   select HAVE_ARCH_PREL32_RELOCATIONS
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_TRACEHOOK
 select ARCH_HAS_STRICT_KERNEL_RWX   if ((PPC_BOOK3S_64 || PPC32) && !RELOCATABLE && !HIBERNATION)
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 829c67986db7..ed29d1ebecd9 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -129,6 +129,7 @@ config S390
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_JUMP_LABEL
select CPU_NO_EFFICIENT_FFS if !HAVE_MARCH_Z9_109_FEATURES
+   select HAVE_ARCH_PREL32_RELOCATIONS
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_SOFT_DIRTY
select HAVE_ARCH_TRACEHOOK
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d4fc98c50378..9f2bb853aedb 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -115,6 +115,7 @@ config X86
select HAVE_ARCH_MMAP_RND_BITS  if MMU
select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if MMU && COMPAT
select HAVE_ARCH_COMPAT_MMAP_BASES  if MMU && COMPAT
+   select HAVE_ARCH_PREL32_RELOCATIONS
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_TRACEHOOK
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
-- 
2.11.0

[PATCH v6 2/8] module: use relative references for __ksymtab entries

2017-12-27 Thread Ard Biesheuvel

An ordinary arm64 defconfig build has ~64 KB worth of __ksymtab
entries, each consisting of two 64-bit fields containing absolute
references, to the symbol itself and to a char array containing
its name, respectively.

When we build the same configuration with KASLR enabled, we end
up with an additional ~192 KB of relocations in the .init section,
i.e., one 24 byte entry for each absolute reference, which all need
to be processed at boot time.

Given how the struct kernel_symbol that describes each entry is
completely local to module.c (except for the references emitted
by EXPORT_SYMBOL() itself), we can easily modify it to contain
two 32-bit relative references instead. This reduces the size of
the __ksymtab section by 50% for all 64-bit architectures, and
gets rid of the runtime relocations entirely for architectures
implementing KASLR, either via standard PIE linking (arm64) or
using custom host tools (x86).
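
The numbers in the first two paragraphs are consistent with each other: 64 KB of 16-byte entries is 4096 entries, and two absolute references per entry at 24 bytes of relocation data each gives 4096 * 48 bytes = 192 KB. What the relative layout might look like can be sketched as below. This is a userspace model under assumptions, not the patch's definition: `struct ksym_rel`, its field names, the accessors and the three demo symbols are hypothetical, and the table is built at run time instead of being emitted by `EXPORT_SYMBOL()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Two 32-bit place-relative references instead of two pointers. */
struct ksym_rel {
	int32_t value_offset;
	int32_t name_offset;
};

static int32_t rel32(const void *field, const void *dest)
{
	intptr_t delta = (intptr_t)dest - (intptr_t)field;

	assert(delta >= INT32_MIN && delta <= INT32_MAX);
	return (int32_t)delta;
}

static void *ksym_value(const struct ksym_rel *s)
{
	return (void *)((intptr_t)&s->value_offset + s->value_offset);
}

static const char *ksym_name(const struct ksym_rel *s)
{
	return (const char *)((intptr_t)&s->name_offset + s->name_offset);
}

/* Demo "exports", listed in name order so that a binary search works. */
static int sym_alpha = 1;
static int sym_beta = 2;
static int sym_gamma = 3;
static const char name_alpha[] = "alpha";
static const char name_beta[] = "beta";
static const char name_gamma[] = "gamma";

static struct ksym_rel symtab[3];

static void symtab_add(struct ksym_rel *s, const void *value, const char *name)
{
	s->value_offset = rel32(&s->value_offset, value);
	s->name_offset = rel32(&s->name_offset, name);
}

/* bsearch() comparator: key is a name, elem is a table entry. */
static int cmp_name(const void *key, const void *elem)
{
	return strcmp(key, ksym_name(elem));
}

/* Look a symbol up by name; returns NULL if it is not in the table. */
static int *lookup(const char *name)
{
	const struct ksym_rel *s;

	symtab_add(&symtab[0], &sym_alpha, name_alpha);
	symtab_add(&symtab[1], &sym_beta, name_beta);
	symtab_add(&symtab[2], &sym_gamma, name_gamma);

	s = bsearch(name, symtab, 3, sizeof(symtab[0]), cmp_name);
	return s ? ksym_value(s) : NULL;
}
```

Note that the comparator has to go through the name accessor rather than read a pointer directly, which is the kind of change the lookup code in module.c needs once the fields become offsets.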

Note that the binary search involving __ksymtab contents relies
on each section being sorted by symbol name. This is implemented
based on the input section names, not the names in the ksymtab
entries, so this patch does not interfere with that.

Given that the use of place-relative relocations requires support
both in the toolchain and in the module loader, we cannot enable
this feature for all architectures. So make it dependent on whether
CONFIG_HAVE_ARCH_PREL32_RELOCATIONS is defined.

Cc: Arnd Bergmann 
Cc: Andrew Morton 
Cc: Ingo Molnar 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Nicolas Pitre 
Acked-by: Jessica Yu 
Signed-off-by: Ard Biesheuvel 
---
 arch/x86/include/asm/Kbuild   |  1 +
 arch/x86/include/asm/export.h |  5 ---
 include/asm-generic/export.h  | 12 -
 include/linux/compiler.h  | 11 +
 include/linux/export.h| 46 +++-
 kernel/module.c   | 33 +++---
 6 files changed, 84 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/Kbuild b/arch/x86/include/asm/Kbuild
index 5d6a53fd7521..3e8a88dcaa1d 100644
--- a/arch/x86/include/asm/Kbuild
+++ b/arch/x86/include/asm/Kbuild
@@ -9,5 +9,6 @@ generated-y += xen-hypercalls.h
 generic-y += clkdev.h
 generic-y += dma-contiguous.h
 generic-y += early_ioremap.h
+generic-y += export.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
diff --git a/arch/x86/include/asm/export.h b/arch/x86/include/asm/export.h
deleted file mode 100644
index 2a51d66689c5..
--- a/arch/x86/include/asm/export.h
+++ /dev/null
@@ -1,5 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifdef CONFIG_64BIT
-#define KSYM_ALIGN 16
-#endif
-#include 
diff --git a/include/asm-generic/export.h b/include/asm-generic/export.h
index 719db1968d81..97ce606459ae 100644
--- a/include/asm-generic/export.h
+++ b/include/asm-generic/export.h
@@ -5,12 +5,10 @@
 #define KSYM_FUNC(x) x
 #endif
 #ifdef CONFIG_64BIT
-#define __put .quad
 #ifndef KSYM_ALIGN
 #define KSYM_ALIGN 8
 #endif
 #else
-#define __put .long
 #ifndef KSYM_ALIGN
 #define KSYM_ALIGN 4
 #endif
@@ -25,6 +23,16 @@
 #define KSYM(name) name
 #endif
 
+.macro __put, val, name
+#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
+   .long   \val - ., \name - .
+#elif defined(CONFIG_64BIT)
+   .quad   \val, \name
+#else
+   .long   \val, \name
+#endif
+.endm
+
 /*
  * note on .section use: @progbits vs %progbits nastiness doesn't matter,
  * since we immediately emit into those sections anyway.
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 52e611ab9a6c..fe752d365334 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -327,4 +327,15 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
compiletime_assert(__native_word(t),\
"Need native word sized stores/loads for atomicity.")
 
+/*
+ * Force the compiler to emit 'sym' as a symbol, so that we can reference
+ * it from inline assembler. Necessary in case 'sym' could be inlined
+ * otherwise, or eliminated entirely due to lack of references that are
+ * visible to the compiler.
+ */
+#define __ADDRESSABLE(sym) \
+   static void *__attribute__((section(".discard.text"), used))\
+   __PASTE(__discard_##sym, __LINE__)(void)\
+   { return (void *)&sym; }\
+
 #endif /* __LINUX_COMPILER_H */
diff --git a/include/linux/export.h b/include/linux/export.h
index 1a1dfdb2a5c6..5112d0c41512 100644
--- a/include/linux/export.h
+++ b/include/linux/export.h
@@ -24,12 +24,6 @@
 #define VMLINUX_SYMBOL_STR(x) __VMLINUX_SYMBOL_STR(x)
 
 #ifndef __ASSEMBLY__
-struct kernel_symbol
-{
-   unsigned long value;
-   const char *name;
-};
-
 #ifdef MODULE
 extern struct module __this_module;
 #define THIS_MODULE (&__this_module)
@@ -60,17 +54,47 @@ extern struct module __this_module;
 #define __CRC_SYMBOL(sym, sec)
 #endif
 
+#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
+#include 
+/*
+ * Emit the ksymtab entry as a pair

Re: [PATCH V8 3/3] OPP: Allow "opp-hz" and "opp-microvolt" to contain magic values

2017-12-27 Thread Viresh Kumar

On 26-12-17, 14:29, Rob Herring wrote:
> On Mon, Dec 18, 2017 at 03:51:30PM +0530, Viresh Kumar wrote:

> > +On some platforms the exact frequency or voltage may be hidden from the OS 
> > by
> > +the firmware and the "opp-hz" or the "opp-microvolt" properties may contain
> > +magic values that represent the frequency or voltage in a firmware 
> > dependent
> > +way, for example an index of an array in the firmware.
> 
> I'm still not convinced this is a good idea.

You were kind-of a few days back :)

lkml.kernel.org/r/CAL_JsqK-qtAaM_Ou5NtxcWR3F_q=8rmpjum-vqgtkhbtwe5...@mail.gmail.com

So here is the deal:

- I proposed "domain-performance-state" property for this stuff
  initially.
- But Kevin didn't like that and proposed reusing "opp-hz" and
  "opp-microvolt", which we all agreed to multiple times.
- And we are back to the same discussion now, and it's painful and time
  killing for all of us.

TBH, I don't have too strong preferences about any of the suggestions
you guys have and I need you guys to tell me what binding changes to
do here and I will do that.

> If you have firmware 
> partially managing things, then I think we should have platform specific 
> bindings or drivers. 

What about the initial idea then, like "performance-state" for the
power domains ? All platforms will anyway replicate that binding only.

> This is complex enough I'm not taking silence from Stephen as an okay.

Sure, but I am not sure how to make him speak :)

-- 
viresh

RE: [PATCH 2/2 v4] scsi: ufs: introduce sysfs entries exposing UFS health info

2017-12-27 Thread Avri Altman



> -Original Message-
> From: linux-scsi-ow...@vger.kernel.org [mailto:linux-scsi-
> ow...@vger.kernel.org] On Behalf Of Greg Kroah-Hartman
> Sent: Thursday, December 21, 2017 10:00 AM
> To: Jaegeuk Kim 
> Cc: linux-kernel@vger.kernel.org; linux-s...@vger.kernel.org; Jaegeuk Kim
> 
> Subject: Re: [PATCH 2/2 v4] scsi: ufs: introduce sysfs entries exposing UFS
> health info
> 
> On Wed, Dec 20, 2017 at 02:13:25PM -0800, Jaegeuk Kim wrote:
> > This patch adds a new sysfs group, namely health, via:
> >
> >/sys/devices/soc/X.ufshc/health/
As device health is just one piece of the device management information,
I think you should address this in a more comprehensive way
and set hooks for much more device info:
allow access to device descriptors, attributes and flags.
The attributes and flags should be placed in separate subfolders.
The LUN-specific descriptors and attributes should be placed in a luns
subfolder, and then per descriptor / attribute type.
You might also like to consider differentiating read and write, to
control those types of access as well.

Cheers,
Avri

Re: [PATCH v3 0/3] create sysfs representation of ACPI HMAT

2017-12-27 Thread Brice Goglin

Le 22/12/2017 à 23:53, Dan Williams a écrit :
> On Thu, Dec 21, 2017 at 12:31 PM, Brice Goglin  wrote:
>> Le 20/12/2017 à 23:41, Ross Zwisler a écrit :
> [..]
>> Hello
>>
>> I can confirm that HPC runtimes are going to use these patches (at least
>> all runtimes that use hwloc for topology discovery, but that's the vast
>> majority of HPC anyway).
>>
>> We really didn't like KNL exposing a hacky SLIT table [1]. We had to
>> explicitly detect that specific crazy table to find out which NUMA nodes
>> were local to which cores, and to find out which NUMA nodes were
>> HBM/MCDRAM or DDR. And then we had to hide the SLIT values to the
>> application because the reported latencies didn't match reality. Quite
>> annoying.
>>
>> With Ross' patches, we can easily get what we need:
>> * which NUMA nodes are local to which CPUs? /sys/devices/system/node/
>> can only report a single local node per CPU (doesn't work for KNL and
>> upcoming architectures with HBM+DDR+...)
>> * which NUMA nodes are slow/fast (for both bandwidth and latency)
>> And we can still look at SLIT under /sys/devices/system/node if really
>> needed.
>>
>> And of course having this in sysfs is much better than parsing ACPI
>> tables that are only accessible to root :)
> On this point, it's not clear to me that we should allow these sysfs
> entries to be world readable. Given /proc/iomem now hides physical
> address information from non-root we at least need to be careful not
> to undo that with new sysfs HMAT attributes. Once you need to be root
> for this info, is parsing binary HMAT vs sysfs a blocker for the HPC
> use case?

I don't think it would be a blocker.

> Perhaps we can enlist /proc/iomem or a similar enumeration interface
> to tell userspace the NUMA node and whether the kernel thinks it has
> better or worse performance characteristics relative to base
> system-RAM, i.e. new IORES_DESC_* values. I'm worried that if we start
> publishing absolute numbers in sysfs userspace will default to looking
> for specific magic numbers in sysfs vs asking the kernel for memory
> that has performance characteristics relative to base "System RAM". In
> other words the absolute performance information that the HMAT
> publishes is useful to the kernel, but it's not clear that userspace
> needs that vs a relative indicator for making NUMA node preference
> decisions.

Some HPC users will benchmark the machine to discover actual
performance numbers anyway.
However, most users won't do this. They will want to know relative
performance of different nodes. If you normalize HMAT values by dividing
them with system-RAM values, that's likely OK. If you just say "that
node is faster than system RAM", it's not precise enough.
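
The normalization mentioned above could look roughly like the sketch below. It is only an illustration: the structs, the function name and the percentage representation are made up here and are not part of the HMAT patches or of hwloc.

```c
#include <assert.h>

/* Hypothetical absolute figures for one NUMA node (not a kernel type). */
struct node_perf {
	unsigned int bandwidth_mbps;	/* higher is better */
	unsigned int latency_ns;	/* lower is better */
};

/* Hypothetical relative figures: 100 means "same as base system RAM". */
struct node_perf_rel {
	unsigned int bandwidth_pct;
	unsigned int latency_pct;
};

/*
 * Divide a node's absolute values by the base system-RAM values so that
 * userspace compares ratios instead of absolute numbers.
 */
static struct node_perf_rel normalize_to_base(struct node_perf node,
					      struct node_perf base)
{
	struct node_perf_rel rel;

	assert(base.bandwidth_mbps > 0 && base.latency_ns > 0);
	rel.bandwidth_pct = (unsigned int)
		((unsigned long long)node.bandwidth_mbps * 100 /
		 base.bandwidth_mbps);
	rel.latency_pct = (unsigned int)
		((unsigned long long)node.latency_ns * 100 / base.latency_ns);
	return rel;
}
```

With ratios like these a runtime can tell "4x the bandwidth at 1.5x the latency" apart from "slightly faster", which a plain faster/slower flag cannot express.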

Brice

[PATCH] Device tree binding for Avago APDS990X light sensor

2017-12-27 Thread Pavel Machek

From: Filip Matijević 

This prepares binding for light sensor used in Nokia N9. 

Signed-off-by: Filip Matijević 
Signed-off-by: Pavel machek 

---

Patches to convert APDS990X driver to device tree and to switch to iio
are available.

diff --git a/Documentation/devicetree/bindings/misc/avago-apds990x.txt 
b/Documentation/devicetree/bindings/misc/avago-apds990x.txt
new file mode 100644
index 000..e038146
--- /dev/null
+++ b/Documentation/devicetree/bindings/misc/avago-apds990x.txt
@@ -0,0 +1,39 @@
+Avago APDS990X driver
+
+Required properties:
+- compatible: "avago,apds990x"
+- reg: address on the I2C bus
+- interrupts: external interrupt line number
+- Vdd-supply: power supply for VDD
+- Vled-supply: power supply for LEDA
+- ga: Glass attenuation
+- cf1: Clear channel factor 1
+- irf1: IR channel factor 1
+- cf2: Clear channel factor 2
+- irf2: IR channel factor 2
+- df: Device factor
+- pdrive: IR current, one of APDS_IRLED_CURR_XXXmA values
+- ppcount: Proximity pulse count
+
+Example (Nokia N9):
+
+   als_ps@39 {
+   compatible = "avago,apds990x";
+   reg = <0x39>;
+
+   interrupt-parent = <&gpio3>;
+   interrupts = <19 10>; /* gpio_83, IRQF_TRIGGER_FALLING | 
IRQF_TRIGGER_LOW */
+
+   Vdd-supply = <&vaux1>;
+   Vled-supply = <&vbat>;
+
+   ga  = <168834>;
+   cf1 = <4096>;
+   irf1= <7824>;
+   cf2 = <877>;
+   irf2= <1575>;
+   df  = <52>;
+
+   pdrive  = <0x2>; /* APDS_IRLED_CURR_25mA */
+   ppcount = <5>;
+   };

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



[PATCH v5 0/2] fs: fat: add ioctl to modify fat filesystem partion volume label

2017-12-27 Thread ChenGuanqiao

The FAT filesystem partition volume label can be read with
FAT_IOCTL_GET_VOLUME_LABEL and written with FAT_IOCTL_SET_VOLUME_LABEL.

The FAT volume label (volume name) is stored identically in the boot sector
and in the root directory. Thus, the boot sector also needs to be updated
when the label is written.

v5:
1. find the volume label entry through the scan function.
2. the volume label only retains the d-characters (reference from Ecma-107).
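
As a rough userspace illustration of the d-character rule in item 2, a label check could be sketched as follows. This is not the code from the patch: the function names are hypothetical, and the character set (digits, capital letters and underscore) is an assumption about what Ecma-107 means by d-characters. Unused bytes are set to 0x20, as the comment in the patch requires.

```c
#include <stddef.h>

#define LABEL_LEN 11	/* same length as MSDOS_NAME in the patch */

/* Assumed d-character set: 0-9, A-Z and '_'. */
static int is_d_character(unsigned char c)
{
	return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || c == '_';
}

/*
 * Accept a leading run of d-characters, then require the rest of the field
 * to be unused (NUL or space) and normalize it to 0x20.
 * Returns 0 on success, -1 if the label is not acceptable.
 */
static int format_label(unsigned char label[LABEL_LEN])
{
	size_t i;

	for (i = 0; i < LABEL_LEN; i++) {
		if (label[i] == '\0' || label[i] == 0x20)
			break;
		if (!is_d_character(label[i]))
			return -1;
	}
	for (; i < LABEL_LEN; i++) {
		if (label[i] != '\0' && label[i] != 0x20)
			return -1;
		label[i] = 0x20;
	}
	return 0;
}
```

Rejecting a bad label instead of silently rewriting it is a choice made for this sketch only.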

v4:
1. read/write volume label from/to the location of the respective version.
2. correct volume label check reference from mkfs.fat.
3. fixed some code issue.

v3:
1. write volume label both boot sector and root directory.

v2:
1. add filesystem version check.
2. add directory permissions check.
3. add volume label string check.
4. fixed part of return value.
5. fixed some indent issue.
6. remove sync_dirty_buffer().

ChenGuanqiao (2):
  fs: fat: Add fat filesystem partition volume label in local structure
  fs: fat: add ioctl method in fat filesystem driver

 fs/fat/dir.c  |  29 +++
 fs/fat/fat.h  |   2 +
 fs/fat/file.c | 116 ++
 fs/fat/inode.c|  15 --
 include/uapi/linux/msdos_fs.h |   2 +
 5 files changed, 161 insertions(+), 3 deletions(-)

-- 
2.11.0

[PATCH v5 2/2] fs: fat: add ioctl method in fat filesystem driver

2017-12-27 Thread ChenGuanqiao

Signed-off-by: ChenGuanqiao 
---
 fs/fat/file.c | 116 ++
 1 file changed, 116 insertions(+)

diff --git a/fs/fat/file.c b/fs/fat/file.c
index 4724cc9ad650..517941c7bce4 100644
--- a/fs/fat/file.c
+++ b/fs/fat/file.c
@@ -15,11 +15,35 @@
 #include 
 #include 
 #include 
+#include 
 #include "fat.h"
 
 static long fat_fallocate(struct file *file, int mode,
  loff_t offset, loff_t len);
 
+/* the characters in this field shall be d-characters, and unused byte shall 
be set to 0x20. */
+static int fat_format_d_characters(char *label, unsigned long len)
+{
+   int i;
+
+   for (i=0; ivol_id, user_attr);
 }
 
+static int fat_ioctl_get_volume_label(struct inode *inode,
+ u8 __user *vol_label)
+{
+   int err = 0;
+   struct fat_slot_info sinfo;
+
+   err = fat_scan_volume_label(inode, &sinfo);
+   if (err)
+   goto out;
+
+   if (copy_to_user(vol_label, sinfo.de->name, MSDOS_NAME))
+   err = -EFAULT;
+
+   brelse(sinfo.bh);
+out:
+   return err;
+}
+
+static int fat_ioctl_set_volume_label(struct file *file,
+ u8 __user *vol_label)
+{
+   int err = 0;
+   u8 label[MSDOS_NAME];
+   struct buffer_head *bh;
+   struct fat_boot_sector *b;
+   struct fat_slot_info sinfo;
+   struct inode *inode = file_inode(file);
+   struct super_block *sb = inode->i_sb;
+   struct msdos_sb_info *sbi = MSDOS_SB(sb);
+
+   if (copy_from_user(label, vol_label, sizeof(label))) {
+   err = -EFAULT;
+   goto out;
+   }
+
+   fat_format_d_characters(label, sizeof(label));
+   err = mnt_want_write_file(file);
+   if (err)
+   goto out;
+
+   /* Update sector's vol_label */
+   bh = sb_bread(sb, 0);
+   if (bh == NULL) {
+   fat_msg(sb, KERN_ERR,
+   "unable to read boot sector to write volume label");
+   err = -EIO;
+   goto out_drop_file;
+   }
+
+   b = (struct fat_boot_sector *)bh->b_data;
+
+   lock_buffer(bh);
+   if (sbi->fat_bits == 32)
+   memcpy(b->fat32.vol_label, label, sizeof(label));
+   else
+   memcpy(b->fat16.vol_label, label, sizeof(label));
+
+   mark_buffer_dirty(bh);
+   unlock_buffer(bh);
+   err = sync_dirty_buffer(bh);
+   brelse(bh);
+   if (err)
+   goto out_drop_file;
+
+   /* updates root directory's vol_label */
+   err = fat_scan_volume_label(inode, &sinfo);
+   if (err)
+   goto out_drop_file;
+
+   bh = sinfo.bh;
+   lock_buffer(bh);
+   memcpy(sinfo.de->name, label, sizeof(sinfo.de->name));
+   mark_buffer_dirty(bh);
+   unlock_buffer(bh);
+   err = sync_dirty_buffer(bh);
+   brelse(bh);
+   if (err)
+   goto out_drop_file;
+
+   memcpy(sbi->vol_label, label, sizeof(sbi->vol_label));
+
+out_drop_file:
+   mnt_drop_write_file(file);
+ out:
+   return err;
+}
+
 long fat_generic_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 {
struct inode *inode = file_inode(filp);
u32 __user *user_attr = (u32 __user *)arg;
+   u8 __user *user_vol_label = (u8 __user *)arg;
 
switch (cmd) {
case FAT_IOCTL_GET_ATTRIBUTES:
@@ -133,6 +245,10 @@ long fat_generic_ioctl(struct file *filp, unsigned int 
cmd, unsigned long arg)
return fat_ioctl_set_attributes(filp, user_attr);
case FAT_IOCTL_GET_VOLUME_ID:
return fat_ioctl_get_volume_id(inode, user_attr);
+   case FAT_IOCTL_GET_VOLUME_LABEL:
+   return fat_ioctl_get_volume_label(inode, user_vol_label);
+   case FAT_IOCTL_SET_VOLUME_LABEL:
+   return fat_ioctl_set_volume_label(filp, user_vol_label);
default:
return -ENOTTY; /* Inappropriate ioctl for device */
}
-- 
2.11.0

[PATCH v5 1/2] fs: fat: Add fat filesystem partition volume label in local structure

2017-12-27 Thread ChenGuanqiao

1. Read the volume label when the FAT filesystem driver loads.
2. Add interface to scan volume label entry.

Signed-off-by: ChenGuanqiao 
---
 fs/fat/dir.c  | 29 +
 fs/fat/fat.h  |  2 ++
 fs/fat/inode.c| 15 ---
 include/uapi/linux/msdos_fs.h |  2 ++
 4 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/fs/fat/dir.c b/fs/fat/dir.c
index 81cecbe6d7cf..b369953979e3 100644
--- a/fs/fat/dir.c
+++ b/fs/fat/dir.c
@@ -881,6 +881,35 @@ static int fat_get_short_entry(struct inode *dir, loff_t 
*pos,
return -ENOENT;
 }
 
+static int fat_get_volume_label_entry(struct inode *dir, loff_t *pos,
+  struct buffer_head **bh,
+  struct msdos_dir_entry **de)
+{
+   while (fat_get_entry(dir, pos, bh, de) >= 0)
+   if (((*de)->attr & ATTR_VOLUME) && (*de)->attr != ATTR_EXT)
+   return 0;
+   return -ENOENT;
+}
+
+int fat_scan_volume_label(struct inode *dir, struct fat_slot_info *sinfo)
+{
+   struct super_block *sb = dir->i_sb;
+
+   sinfo->slot_off = 0;
+   sinfo->bh = NULL;
+   while (fat_get_volume_label_entry(dir, &sinfo->slot_off,
+ &sinfo->bh, &sinfo->de) >= 0) {
+   sinfo->slot_off -= sizeof(*sinfo->de);
+   sinfo->nr_slots = 1;
+   sinfo->i_pos = fat_make_i_pos(sb, sinfo->bh, sinfo->de);
+
+   return 0;
+   }
+
+   return -ENOENT;
+}
+EXPORT_SYMBOL_GPL(fat_scan_volume_label);
+
 /*
  * The ".." entry can not provide the "struct fat_slot_info" information
  * for inode, nor a usable i_pos. So, this function provides some information
diff --git a/fs/fat/fat.h b/fs/fat/fat.h
index 051dac1ce3be..9e8d525d52c9 100644
--- a/fs/fat/fat.h
+++ b/fs/fat/fat.h
@@ -85,6 +85,7 @@ struct msdos_sb_info {
int dir_per_block;/* dir entries per block */
int dir_per_block_bits;   /* log2(dir_per_block) */
unsigned int vol_id;/*volume ID*/
+   char vol_label[11]; /*volume label*/
 
int fatent_shift;
const struct fatent_operations *fatent_ops;
@@ -299,6 +300,7 @@ extern int fat_dir_empty(struct inode *dir);
 extern int fat_subdirs(struct inode *dir);
 extern int fat_scan(struct inode *dir, const unsigned char *name,
struct fat_slot_info *sinfo);
+extern int fat_scan_volume_label(struct inode *dir, struct fat_slot_info 
*sinfo);
 extern int fat_scan_logstart(struct inode *dir, int i_logstart,
 struct fat_slot_info *sinfo);
 extern int fat_get_dotdot_entry(struct inode *dir, struct buffer_head **bh,
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 30c52394a7ad..e73379a41d49 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -45,12 +45,14 @@ struct fat_bios_param_block {
 
u8  fat16_state;
u32 fat16_vol_id;
+   u8  fat16_vol_label[11];
 
u32 fat32_length;
u32 fat32_root_cluster;
u16 fat32_info_sector;
u8  fat32_state;
u32 fat32_vol_id;
+   u8  fat32_vol_label[11];
 };
 
 static int fat_default_codepage = CONFIG_FAT_DEFAULT_CODEPAGE;
@@ -1460,12 +1462,16 @@ static int fat_read_bpb(struct super_block *sb, struct 
fat_boot_sector *b,
 
bpb->fat16_state = b->fat16.state;
bpb->fat16_vol_id = get_unaligned_le32(b->fat16.vol_id);
+   memcpy(bpb->fat16_vol_label, b->fat16.vol_label,
+  sizeof(bpb->fat16_vol_label));
 
bpb->fat32_length = le32_to_cpu(b->fat32.length);
bpb->fat32_root_cluster = le32_to_cpu(b->fat32.root_cluster);
bpb->fat32_info_sector = le16_to_cpu(b->fat32.info_sector);
bpb->fat32_state = b->fat32.state;
bpb->fat32_vol_id = get_unaligned_le32(b->fat32.vol_id);
+   memcpy(bpb->fat32_vol_label, b->fat32.vol_label,
+  sizeof(bpb->fat32_vol_label));
 
/* Validate this looks like a FAT filesystem BPB */
if (!bpb->fat_reserved) {
@@ -1723,11 +1729,14 @@ int fat_fill_super(struct super_block *sb, void *data, 
int silent, int isvfat,
brelse(fsinfo_bh);
}
 
-   /* interpret volume ID as a little endian 32 bit integer */
-   if (sbi->fat_bits == 32)
+   /* interpret volume ID and label as a little endian 32 bit integer */
+   if (sbi->fat_bits == 32) {
sbi->vol_id = bpb.fat32_vol_id;
-   else /* fat 16 or 12 */
+   memcpy(sbi->vol_label, bpb.fat32_vol_label, 
sizeof(sbi->vol_label));
+   } else { /* fat 16 or 12 */
sbi->vol_id = bpb.fat16_vol_id;
+   memcpy(sbi->vol_label, bpb.fat16_vol_label, 
sizeof(sbi->vol_label));
+   }
 
sbi->dir_per_block = sb->s_blocksize / sizeof(struct msdos_dir_entry);
sbi->dir_per_block_bits = ffs(sbi->dir_per_block) - 1;
diff --git a/include/uapi/linux/msd

[PATCH] bq24190: Simplify code in property_is_writeable

2017-12-27 Thread Pavel Machek

Simplify function that should be trivial.

Signed-off-by: Pavel machek 

diff --git a/drivers/power/supply/bq24190_charger.c 
b/drivers/power/supply/bq24190_charger.c
index 35ff406..4ea8f0a 100644
--- a/drivers/power/supply/bq24190_charger.c
+++ b/drivers/power/supply/bq24190_charger.c
@@ -1193,8 +1193,6 @@ static int bq24190_charger_set_property(struct 
power_supply *psy,
 static int bq24190_charger_property_is_writeable(struct power_supply *psy,
enum power_supply_property psp)
 {
-   int ret;
-
switch (psp) {
case POWER_SUPPLY_PROP_ONLINE:
case POWER_SUPPLY_PROP_TEMP_ALERT_MAX:
@@ -1202,13 +1200,10 @@ static int bq24190_charger_property_is_writeable(struct 
power_supply *psy,
case POWER_SUPPLY_PROP_CONSTANT_CHARGE_CURRENT:
case POWER_SUPPLY_PROP_CONSTANT_CHARGE_VOLTAGE:
case POWER_SUPPLY_PROP_INPUT_CURRENT_LIMIT:
-   ret = 1;
-   break;
+   return 1;
default:
-   ret = 0;
+   return 0;
}
-
-   return ret;
 }
 
 static void bq24190_input_current_limit_work(struct work_struct *work)

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



[RFC PATCH v2 0/2] Reduce IOTLB flush when pass-through dGPU devices

2017-12-27 Thread Suravee Suthikulpanit

From: Suravee Suthikulpanit 

Currently, when passing a dGPU through to a guest VM, thousands of IOTLB
flush commands are sent from the IOMMU to the end-point device. This causes
performance issues when launching new VMs, and could cause IOTLB invalidation
time-outs on certain dGPUs.

This can be avoided by adopting the new fast IOTLB flush APIs.

Cc: Alex Williamson 
Cc: Joerg Roedel 

Changes from V1: (https://lkml.org/lkml/2017/11/17/764)

  * Rebased on top of v4.15-rc5

  * Patch 1/2: Fix iommu_tlb_range_add() size parameter to use unmapped
instead of len. (per Alex)

  * Patch 1/2: Use a list to keep track of unmapped IOVAs for VFIO remote
unpinning. However, I am still not sure whether a list is the best
way to keep track of the IOVAs. (per Alex)

  * Patch 2/2: Fix logic due to missing spin unlock. (per Tom)

Suravee Suthikulpanit (2):
  vfio/type1: Adopt fast IOTLB flush interface when unmap IOVAs
  iommu/amd: Add support for fast IOTLB flushing

 drivers/iommu/amd_iommu.c   | 73 -
 drivers/iommu/amd_iommu_init.c  |  7 
 drivers/iommu/amd_iommu_types.h |  7 
 drivers/vfio/vfio_iommu_type1.c | 89 +++--
 4 files changed, 163 insertions(+), 13 deletions(-)

-- 
1.8.3.1

[RFC PATCH v2 2/2] iommu/amd: Add support for fast IOTLB flushing

2017-12-27 Thread Suravee Suthikulpanit

Implement the newly added IOTLB flushing interface for AMD IOMMU.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd_iommu.c   | 73 -
 drivers/iommu/amd_iommu_init.c  |  7 
 drivers/iommu/amd_iommu_types.h |  7 
 3 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 7d5eb00..42fe365 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -129,6 +129,12 @@ struct dma_ops_domain {
 static struct iova_domain reserved_iova_ranges;
 static struct lock_class_key reserved_rbtree_key;
 
+struct amd_iommu_flush_entries {
+   struct list_head list;
+   unsigned long iova;
+   size_t size;
+};
+
 /
  *
  * Helper functions
@@ -3043,7 +3049,6 @@ static size_t amd_iommu_unmap(struct iommu_domain *dom, 
unsigned long iova,
unmap_size = iommu_unmap_page(domain, iova, page_size);
mutex_unlock(&domain->api_lock);
 
-   domain_flush_tlb_pde(domain);
domain_flush_complete(domain);
 
return unmap_size;
@@ -3163,6 +3168,69 @@ static bool amd_iommu_is_attach_deferred(struct 
iommu_domain *domain,
return dev_data->defer_attach;
 }
 
+static void amd_iommu_flush_iotlb_all(struct iommu_domain *domain)
+{
+   struct protection_domain *dom = to_pdomain(domain);
+
+   domain_flush_tlb_pde(dom);
+}
+
+static void amd_iommu_iotlb_range_add(struct iommu_domain *domain,
+ unsigned long iova, size_t size)
+{
+   struct amd_iommu_flush_entries *entry, *p;
+   unsigned long flags;
+   bool found = false;
+
+   spin_lock_irqsave(&amd_iommu_flush_list_lock, flags);
+   list_for_each_entry(p, &amd_iommu_flush_list, list) {
+   if (iova != p->iova)
+   continue;
+
+   if (size > p->size) {
+   p->size = size;
+   pr_debug("%s: update range: iova=%#lx, size = %#lx\n",
+__func__, p->iova, p->size);
+   }
+   found = true;
+   break;
+   }
+
+   if (!found) {
+   entry = kzalloc(sizeof(struct amd_iommu_flush_entries),
+   GFP_KERNEL);
+   if (entry) {
+   pr_debug("%s: new range: iova=%lx, size=%#lx\n",
+__func__, iova, size);
+
+   entry->iova = iova;
+   entry->size = size;
+   list_add(&entry->list, &amd_iommu_flush_list);
+   }
+   }
+   spin_unlock_irqrestore(&amd_iommu_flush_list_lock, flags);
+}
+
+static void amd_iommu_iotlb_sync(struct iommu_domain *domain)
+{
+   struct protection_domain *pdom = to_pdomain(domain);
+   struct amd_iommu_flush_entries *entry, *next;
+   unsigned long flags;
+
+   /* Note:
+* Currently, IOMMU driver just flushes the whole IO/TLB for
+* a given domain. So, just remove entries from the list here.
+*/
+   spin_lock_irqsave(&amd_iommu_flush_list_lock, flags);
+   list_for_each_entry_safe(entry, next, &amd_iommu_flush_list, list) {
+   list_del(&entry->list);
+   kfree(entry);
+   }
+   spin_unlock_irqrestore(&amd_iommu_flush_list_lock, flags);
+
+   domain_flush_tlb_pde(pdom);
+}
+
 const struct iommu_ops amd_iommu_ops = {
.capable = amd_iommu_capable,
.domain_alloc = amd_iommu_domain_alloc,
@@ -3181,6 +3249,9 @@ static bool amd_iommu_is_attach_deferred(struct 
iommu_domain *domain,
.apply_resv_region = amd_iommu_apply_resv_region,
.is_attach_deferred = amd_iommu_is_attach_deferred,
.pgsize_bitmap  = AMD_IOMMU_PGSIZES,
+   .flush_iotlb_all = amd_iommu_flush_iotlb_all,
+   .iotlb_range_add = amd_iommu_iotlb_range_add,
+   .iotlb_sync = amd_iommu_iotlb_sync,
 };
 
 /*
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 6fe2d03..e8f8cee 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -185,6 +185,12 @@ struct ivmd_header {
 bool amd_iommu_force_isolation __read_mostly;
 
 /*
+ * IOTLB flush list
+ */
+LIST_HEAD(amd_iommu_flush_list);
+spinlock_t amd_iommu_flush_list_lock;
+
+/*
  * List of protection domains - used during resume
  */
 LIST_HEAD(amd_iommu_pd_list);
@@ -2490,6 +2496,7 @@ static int __init early_amd_iommu_init(void)
__set_bit(0, amd_iommu_pd_alloc_bitmap);
 
spin_lock_init(&amd_iommu_pd_lock);
+   spin_lock_init(&amd_iommu_flush_list_lock);
 
/*
 * now the data structures are allocated and basically initialized
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index f6b24c7..c3f4a7e 1006

[RFC PATCH v2 1/2] vfio/type1: Adopt fast IOTLB flush interface when unmap IOVAs

2017-12-27 Thread Suravee Suthikulpanit

VFIO IOMMU type1 currently unmaps IOVA pages synchronously, which requires
IOTLB flushing for every unmapping. This results in large IOTLB flushing
overhead when handling pass-through devices that have a large number of
mapped IOVAs.

This can be avoided by using the new IOTLB flushing interface.
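
To make the saving concrete, here is a small userspace model of the two strategies. It is a sketch only: the mock_* helpers are hypothetical stand-ins that merely count flushes, and they do not reproduce the real IOMMU API. SYNC_MAX mirrors VFIO_IOMMU_TLB_SYNC_MAX from this patch.

```c
#include <stddef.h>

#define SYNC_MAX 512	/* mirrors VFIO_IOMMU_TLB_SYNC_MAX */

/* Stand-in for the hardware: only counts how often a flush is issued. */
struct mock_iommu {
	unsigned long flushes;
	unsigned long pending;	/* ranges queued since the last sync */
};

/* Models a synchronous unmap: every call flushes. */
static void mock_unmap_sync(struct mock_iommu *m)
{
	m->flushes++;
}

/* Models unmap_fast + range_add: the range is only queued. */
static void mock_unmap_fast(struct mock_iommu *m)
{
	m->pending++;
}

/* Models tlb_sync: one flush covers everything queued so far. */
static void mock_tlb_sync(struct mock_iommu *m)
{
	if (m->pending) {
		m->flushes++;
		m->pending = 0;
	}
}

static unsigned long unmap_all_sync(size_t nr_ranges)
{
	struct mock_iommu m = { 0, 0 };
	size_t i;

	for (i = 0; i < nr_ranges; i++)
		mock_unmap_sync(&m);
	return m.flushes;
}

static unsigned long unmap_all_batched(size_t nr_ranges)
{
	struct mock_iommu m = { 0, 0 };
	size_t i;

	for (i = 0; i < nr_ranges; i++) {
		mock_unmap_fast(&m);
		if (m.pending >= SYNC_MAX)
			mock_tlb_sync(&m);
	}
	mock_tlb_sync(&m);	/* flush whatever is left in the last batch */
	return m.flushes;
}
```

In this model the flush count drops from one per unmapped range to one per batch of up to SYNC_MAX ranges.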

Cc: Alex Williamson 
Cc: Joerg Roedel 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/vfio/vfio_iommu_type1.c | 89 +++--
 1 file changed, 77 insertions(+), 12 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index e30e29a..f000844 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -102,6 +102,13 @@ struct vfio_pfn {
atomic_tref_count;
 };
 
+struct vfio_regions{
+   struct list_head list;
+   dma_addr_t iova;
+   phys_addr_t phys;
+   size_t len;
+};
+
 #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)\
(!list_empty(&iommu->domain_list))
 
@@ -479,6 +486,40 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, 
dma_addr_t iova,
return unlocked;
 }
 
+/*
+ * Generally, VFIO needs to unpin remote pages after each IOTLB flush.
+ * Therefore, when using IOTLB flush sync interface, VFIO need to keep track
+ * of these regions (currently using a list).
+ *
+ * This value specifies maximum number of regions for each IOTLB flush sync.
+ */
+#define VFIO_IOMMU_TLB_SYNC_MAX512
+
+static long vfio_sync_and_unpin(struct vfio_dma *dma, struct vfio_domain 
*domain,
+   struct list_head *regions, bool do_accounting)
+{
+   long unlocked = 0;
+   struct vfio_regions *entry, *next;
+
+   iommu_tlb_sync(domain->domain);
+
+   list_for_each_entry_safe(entry, next, regions, list) {
+   unlocked += vfio_unpin_pages_remote(dma,
+   entry->iova,
+   entry->phys >> PAGE_SHIFT,
+   entry->len >> PAGE_SHIFT,
+   false);
+   list_del(&entry->list);
+   kfree(entry);
+   }
+
+   if (do_accounting) {
+   vfio_lock_acct(dma->task, -unlocked, NULL);
+   return 0;
+   }
+   return unlocked;
+}
+
 static int vfio_pin_page_external(struct vfio_dma *dma, unsigned long vaddr,
  unsigned long *pfn_base, bool do_accounting)
 {
@@ -653,7 +694,10 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, 
struct vfio_dma *dma,
 {
dma_addr_t iova = dma->iova, end = dma->iova + dma->size;
struct vfio_domain *domain, *d;
+   struct list_head unmapped_regions;
+   struct vfio_regions *entry;
long unlocked = 0;
+   int cnt = 0;
 
if (!dma->size)
return 0;
@@ -661,6 +705,8 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, 
struct vfio_dma *dma,
if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu))
return 0;
 
+   INIT_LIST_HEAD(&unmapped_regions);
+
/*
 * We use the IOMMU to track the physical addresses, otherwise we'd
 * need a much more complicated tracking system.  Unfortunately that
@@ -698,24 +744,36 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, 
struct vfio_dma *dma,
break;
}
 
-   unmapped = iommu_unmap(domain->domain, iova, len);
-   if (WARN_ON(!unmapped))
+   entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+   if (!entry)
break;
 
-   unlocked += vfio_unpin_pages_remote(dma, iova,
-   phys >> PAGE_SHIFT,
-   unmapped >> PAGE_SHIFT,
-   false);
+   unmapped = iommu_unmap_fast(domain->domain, iova, len);
+   if (WARN_ON(!unmapped)) {
+   kfree(entry);
+   break;
+   }
+
+   iommu_tlb_range_add(domain->domain, iova, unmapped);
+   entry->iova = iova;
+   entry->phys = phys;
+   entry->len  = unmapped;
+   list_add_tail(&entry->list, &unmapped_regions);
+   cnt++;
iova += unmapped;
 
+   if (cnt >= VFIO_IOMMU_TLB_SYNC_MAX) {
+   unlocked += vfio_sync_and_unpin(dma, domain, 
&unmapped_regions,
+   do_accounting);
+   cnt = 0;
+   }
cond_resched();
}
 
+   if (cnt)
+   unlocked += vfio_sync_and_unpin(dma, domain, &unmapped_regions,
+   do_account

[PATCH] ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE

2017-12-27 Thread Gang He

If we can't get the inode lock immediately in
ocfs2_inode_lock_with_page() when reading a page, we should not
return directly, since this will lead to a softlockup problem.
Instead, take a blocking lock and immediately unlock it before
returning. This avoids wasting CPU on lots of retries, improves
fairness in getting the lock among multiple nodes, and increases
efficiency when the same file is modified frequently from multiple
nodes.
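
The effect on the retry count can be pictured with a toy, single-threaded model. Everything below is hypothetical and only imitates the control flow: sim_lock is not the DLM lock, and "ticks" simply stand for the number of attempts during which another node still holds the lock.

```c
#include <stdbool.h>

/* Toy lock: a remote node holds it for 'busy_ticks' more attempts. */
struct sim_lock {
	unsigned int busy_ticks;
};

/* Non-blocking attempt: fails while the remote holder is still busy. */
static bool sim_trylock(struct sim_lock *l)
{
	if (l->busy_ticks > 0) {
		l->busy_ticks--;
		return false;
	}
	return true;
}

/* Blocking acquire followed by release: returns once the holder is done. */
static void sim_lock_then_unlock(struct sim_lock *l)
{
	l->busy_ticks = 0;
}

/*
 * Count how often the page read has to be restarted (the
 * AOP_TRUNCATED_PAGE path) before the non-blocking lock succeeds.
 */
static unsigned int count_read_retries(struct sim_lock *l,
				       bool wait_before_retry)
{
	unsigned int retries = 0;

	while (!sim_trylock(l)) {
		if (wait_before_retry)
			sim_lock_then_unlock(l);
		retries++;
	}
	return retries;
}
```

Without the wait, the model retries once per tick the lock stays busy; with the wait it retries exactly once.
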
The softlockup problem looks like,
Kernel panic - not syncing: softlockup: hung tasks
CPU: 0 PID: 885 Comm: multi_mmap Tainted: G L 4.12.14-6.1-default #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
  
  dump_stack+0x5c/0x82
  panic+0xd5/0x21e
  watchdog_timer_fn+0x208/0x210
  ? watchdog_park_threads+0x70/0x70
  __hrtimer_run_queues+0xcc/0x200
  hrtimer_interrupt+0xa6/0x1f0
  smp_apic_timer_interrupt+0x34/0x50
  apic_timer_interrupt+0x96/0xa0
  
 RIP: 0010:unlock_page+0x17/0x30
 RSP: :af154080bc88 EFLAGS: 0246 ORIG_RAX: ff10
 RAX: dead0100 RBX: f21e009f5300 RCX: 0004
 RDX: dead00ff RSI: 0202 RDI: f21e009f5300
 RBP:  R08:  R09: af154080bb00
 R10: af154080bc30 R11: 0040 R12: 993749a39518
 R13:  R14: f21e009f5300 R15: f21e009f5300
  ocfs2_inode_lock_with_page+0x25/0x30 [ocfs2]
  ocfs2_readpage+0x41/0x2d0 [ocfs2]
  ? pagecache_get_page+0x30/0x200
  filemap_fault+0x12b/0x5c0
  ? recalc_sigpending+0x17/0x50
  ? __set_task_blocked+0x28/0x70
  ? __set_current_blocked+0x3d/0x60
  ocfs2_fault+0x29/0xb0 [ocfs2]
  __do_fault+0x1a/0xa0
  __handle_mm_fault+0xbe8/0x1090
  handle_mm_fault+0xaa/0x1f0
  __do_page_fault+0x235/0x4b0
  trace_do_page_fault+0x3c/0x110
  async_page_fault+0x28/0x30
 RIP: 0033:0x7fa75ded638e
 RSP: 002b:7ffd6657db18 EFLAGS: 00010287
 RAX: 55c7662fb700 RBX: 0001 RCX: 55c7662fb700
 RDX: 1770 RSI: 7fa75e909000 RDI: 55c7662fb700
 RBP: 0003 R08: 000e R09: 
 R10: 0483 R11: 7fa75ded61b0 R12: 7fa75e90a770
 R13: 000e R14: 1770 R15: 

Fixes: 1cce4df04f37 ("ocfs2: do not lock/unlock() inode DLM lock")
Signed-off-by: Gang He 
---
 fs/ocfs2/dlmglue.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 4689940..5193218 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -2486,6 +2486,15 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
ret = ocfs2_inode_lock_full(inode, ret_bh, ex, OCFS2_LOCK_NONBLOCK);
if (ret == -EAGAIN) {
unlock_page(page);
+   /*
+* If we can't get inode lock immediately, we should not return
+* directly here, since this will lead to a softlockup problem.
+* The method is to get a blocking lock and immediately unlock
+* before returning, this can avoid CPU resource waste due to
+* lots of retries, and benefits fairness in getting lock.
+*/
+   if (ocfs2_inode_lock(inode, ret_bh, ex) == 0)
+   ocfs2_inode_unlock(inode, ex);
ret = AOP_TRUNCATED_PAGE;
}
 
-- 
1.8.5.6

Re: [v3,21/27] watchdog: replace devm_ioremap_nocache with devm_ioremap

2017-12-27 Thread Yisheng Xie

Hi Guenter,

On 2017/12/27 1:28, Guenter Roeck wrote:
> On Sat, Dec 23, 2017 at 07:01:37PM +0800, Yisheng Xie wrote:
>> > Default ioremap is ioremap_nocache, so devm_ioremap has the same
>> > function with devm_ioremap_nocache, which can just be killed to
>> > save the size of devres.o
>> > 
>> > This patch is to use use devm_ioremap instead of devm_ioremap_nocache,
>> > which should not have any function change but prepare for killing
>> > devm_ioremap_nocache.
>> > 
> I don't have issues with the patch itself - on mips, the definitions
> _are_ the same - but with the description. It is not universally correct
> that the definitions are the same. Please update and resubmit.
> 

Right, there are 4 archs that give ioremap a different meaning. And at
present, I still do not know why.

For this patch itself, maybe I can update and resubmit.

Thanks and Merry Christmas.
Yisheng

> Guenter
>

Re: [PATCH] clk: mediatek: adjust dependency of reset.c to avoid unexpectedly being built

2017-12-27 Thread Sean Wang

On Tue, 2017-12-26 at 17:19 -0800, Stephen Boyd wrote:
> On 12/26, sean.w...@mediatek.com wrote:
> > From: Sean Wang 
> > 
> > commit 74cb0d6dde8 ("clk: mediatek: fixup test-building of MediaTek clock
> > drivers") can let the build system looking into the directory where the
> > clock drivers resides and then allow test-building the drivers.
> > 
> > But the change also gives rise to certain incorrect behavior which is
> > reset.c being built even not depending on either COMPILE_TEST or
> > ARCH_MEDIATEK alternative dependency. To get rid of reset.c being built
> > unexpectedly on the other platforms, it would be a good change that the
> > file should be built depending on its own specific configuration rather
> > than just on generic RESET_CONTROLLER one.
> > 
> > Signed-off-by: Sean Wang 
> > Cc: Jean Delvare 
> 
> I've typically seen vendor Kconfigs select the RESET_CONTROLLER
> framework if the vendor Kconfig is enabled. Any reason that same
> method isn't followed here?
> 

I just thought that adding an explicit dependency in Kconfig would be good
regardless of what the vendor Kconfig selects.

But I believe a reset controller is always present on every MediaTek SoC;
at least it can be found in the infracfg and pericfg subsystems, which are
really fundamental hardware blocks. So it's still quite reasonable to
add "RESET_CONTROLLER" to the vendor Kconfig.

Once we do that in the vendor Kconfig, this Kconfig entry could become
something like this:

config RESET_MEDIATEK
   bool "MediaTek Reset Driver"
   depends on ARCH_MEDIATEK || (RESET_CONTROLLER && COMPILE_TEST)
   help
 This enables the reset controller driver used on MediaTek SoCs.

where COMPILE_TEST still has to depend on RESET_CONTROLLER to avoid any
compiling error.

I'll make the next version based on the above and the relevant vendor
Kconfig changes.

Sean

[PATCH] PCI: exynos: remove the deprecated phy codes

2017-12-27 Thread Jaehoon Chung

pci-exynos has been updated to use the PHY framework
(drivers/phy/samsung/phy-exynos-pcie.c).
Remove the deprecated PHY code from pci-exynos.c and
use phy-exynos-pcie.c instead.

Update the binding documentation accordingly.

Signed-off-by: Jaehoon Chung 
---
 .../bindings/pci/samsung,exynos5440-pcie.txt   |  58 ++
 drivers/pci/dwc/pci-exynos.c   | 219 ++---
 2 files changed, 22 insertions(+), 255 deletions(-)

diff --git a/Documentation/devicetree/bindings/pci/samsung,exynos5440-pcie.txt b/Documentation/devicetree/bindings/pci/samsung,exynos5440-pcie.txt
index 34a11bfbfb60..651d957d1051 100644
--- a/Documentation/devicetree/bindings/pci/samsung,exynos5440-pcie.txt
+++ b/Documentation/devicetree/bindings/pci/samsung,exynos5440-pcie.txt
@@ -6,9 +6,6 @@ and thus inherits all the common properties defined in designware-pcie.txt.
 Required properties:
 - compatible: "samsung,exynos5440-pcie"
 - reg: base addresses and lengths of the PCIe controller,
-   the PHY controller, additional register for the PHY controller.
-   (Registers for the PHY controller are DEPRECATED.
-Use the PHY framework.)
 - reg-names : First name should be set to "elbi".
And use the "config" instead of getting the configuration address space
from "ranges".
@@ -23,49 +20,8 @@ For other common properties, refer to
 
 Example:
 
-SoC-specific DT Entry:
+SoC-specific DT Entry (with using PHY framework):
 
-   pcie@29 {
-   compatible = "samsung,exynos5440-pcie", "snps,dw-pcie";
-   reg = <0x29 0x1000
-   0x27 0x1000
-   0x271000 0x40>;
-   interrupts = <0 20 0>, <0 21 0>, <0 22 0>;
-   clocks = <&clock 28>, <&clock 27>;
-   clock-names = "pcie", "pcie_bus";
-   #address-cells = <3>;
-   #size-cells = <2>;
-   device_type = "pci";
-   ranges = <0x0800 0 0x4000 0x4000 0 0x1000   /* configuration space */
- 0x8100 0 0  0x40001000 0 0x0001   /* downstream I/O */
- 0x8200 0 0x40011000 0x40011000 0 0x1ffef000>; /* non-prefetchable memory */
-   #interrupt-cells = <1>;
-   interrupt-map-mask = <0 0 0 0>;
-   interrupt-map = <0 0 0 0 &gic GIC_SPI 21 IRQ_TYPE_LEVEL_HIGH>;
-   num-lanes = <4>;
-   };
-
-   pcie@2a {
-   compatible = "samsung,exynos5440-pcie", "snps,dw-pcie";
-   reg = <0x2a 0x1000
-   0x272000 0x1000
-   0x271040 0x40>;
-   interrupts = <0 23 0>, <0 24 0>, <0 25 0>;
-   clocks = <&clock 29>, <&clock 27>;
-   clock-names = "pcie", "pcie_bus";
-   #address-cells = <3>;
-   #size-cells = <2>;
-   device_type = "pci";
-   ranges = <0x0800 0 0x6000 0x6000 0 0x1000   /* configuration space */
- 0x8100 0 0  0x60001000 0 0x0001   /* downstream I/O */
- 0x8200 0 0x60011000 0x60011000 0 0x1ffef000>; /* non-prefetchable memory */
-   #interrupt-cells = <1>;
-   interrupt-map-mask = <0 0 0 0>;
-   interrupt-map = <0 0 0 0 &gic GIC_SPI 24 IRQ_TYPE_LEVEL_HIGH>;
-   num-lanes = <4>;
-   };
-
-With using PHY framework:
pcie_phy0: pcie-phy@27 {
...
reg = <0x27 0x1000>, <0x271000 0x40>;
@@ -74,13 +30,21 @@ With using PHY framework:
};
 
pcie@29 {
-   ...
+   compatible = "samsung,exynos5440-pcie", "snps,dw-pcie";
reg = <0x29 0x1000>, <0x4000 0x1000>;
reg-names = "elbi", "config";
+   clocks = <&clock 28>, <&clock 27>;
+   clock-names = "pcie", "pcie_bus";
+   #address-cells = <3>;
+   #size-cells = <2>;
+   device_type = "pci";
phys = <&pcie_phy0>;
ranges = <0x8100 0 0  0x60001000 0 0x0001
  0x8200 0 0x60011000 0x60011000 0 0x1ffef000>;
-   ...
+   #interrupt-cells = <1>;
+   interrupt-map-mask = <0 0 0 0>;
+   interrupt-map = <0 0 0 0 &gic GIC_SPI 21 IRQ_TYPE_LEVEL_HIGH>;
+   num-lanes = <4>;
};
 
 Board-specific DT Entry:
diff --git a/drivers/pci/dwc/pci-exynos.c b/drivers/pci/dwc/pci-exynos.c
index 5596fdedbb94..56f32aeebd0a 100644
--- a/drivers/pci/dwc/pci-exynos.c
+++ b/drivers/pci/dwc/pci-exynos.c
@@ -55,49 +55,8 @@
 #define PCIE_ELBI_SLV_ARMISC   0x120
 #define PCIE_ELBI_SLV_DBI_ENABLE   BIT(21)
 
-/* PCIe Purple registers */
-#define PCIE_PHY_GLOBAL_RESET  0x000
-#define PCIE_PHY_COMMON_RESET  0x004
-#define PCIE_PH

Re: [PATCH 4/4] KVM: nVMX: initialize more non-shadowed fields in prepare_vmcs02_full

2017-12-27 Thread Paolo Bonzini

On 25/12/2017 04:09, Wanpeng Li wrote:
> 2017-12-21 20:43 GMT+08:00 Paolo Bonzini :
>> These fields are also simple copies of the data in the vmcs12 struct.
>> For some of them, prepare_vmcs02 was skipping the copy when the field
>> was unused.  In prepare_vmcs02_full, we copy them always as long as the
>> field exists on the host, because the corresponding execution control
>> might be one of the shadowed fields.
> 
> Why we don't need to copy them always before the patchset?

Before these patches, we only copy them if the corresponding processor
control is enabled.  For example, we only copy the EOI exit bitmap if
APICv is enabled by L1.  Here we could have

   write to EOI exit bitmap
   vmlaunch (calls prepare_vmcs02_full)
   enable APICv (but EOI exit bitmap fields are clean)
   vmresume (doesn't call prepare_vmcs02_full)

The vmresume doesn't call prepare_vmcs02_full, so the EOI exit bitmap
must be copied every time prepare_vmcs02_full runs.

Paolo
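
A minimal sketch of the sequence described above, written as a userspace toy
model. The toy_* types and helpers are hypothetical stand-ins, not KVM code:
they model a single field (the EOI exit bitmap), a single control (APICv),
and the assumption that the full prepare step runs on vmlaunch but not on
vmresume.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins; none of these names exist in KVM. */
struct toy_vmcs12 {
	uint64_t eoi_exit_bitmap0;	/* value L1 wrote into its vmcs12 */
	bool apicv_enabled;		/* execution control toggled by L1 */
};

struct toy_vmcs02 {
	uint64_t eoi_exit_bitmap0;	/* value the hardware would use */
};

/* Old behaviour: copy the field only if its control is enabled right now. */
static void toy_prepare_full_conditional(struct toy_vmcs02 *v02,
					 const struct toy_vmcs12 *v12)
{
	if (v12->apicv_enabled)
		v02->eoi_exit_bitmap0 = v12->eoi_exit_bitmap0;
}

/* New behaviour: copy the field every time the full prepare step runs. */
static void toy_prepare_full_always(struct toy_vmcs02 *v02,
				    const struct toy_vmcs12 *v12)
{
	v02->eoi_exit_bitmap0 = v12->eoi_exit_bitmap0;
}

/* Replays the four steps and returns the bitmap vmcs02 holds at vmresume. */
static uint64_t toy_replay(bool always_copy)
{
	struct toy_vmcs12 v12 = { 0, false };
	struct toy_vmcs02 v02 = { 0 };

	v12.eoi_exit_bitmap0 = 0xff;		/* write to EOI exit bitmap */

	if (always_copy)			/* vmlaunch: full prepare runs */
		toy_prepare_full_always(&v02, &v12);
	else
		toy_prepare_full_conditional(&v02, &v12);

	v12.apicv_enabled = true;		/* enable APICv; bitmap is clean */

	/* vmresume: the full prepare step is skipped, v02 is used as-is. */
	return v02.eoi_exit_bitmap0;
}
```

In this model the conditional copy leaves vmcs02 with a stale bitmap at
vmresume, while the unconditional copy has already picked up the value L1
wrote before APICv was enabled.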

Re: [next] ath10k: wmi: remove redundant integer fc

2017-12-27 Thread Kalle Valo

Colin Ian King  wrote:

> Variable fc is being assigned but never used, so remove it. Cleans
> up the clang warning:
> 
> warning: Value stored to 'fc' is never read
> 
> Signed-off-by: Colin Ian King 
> Signed-off-by: Kalle Valo 

Patch applied to ath-next branch of ath.git, thanks.

a0709dfd7ff8 ath10k: wmi: remove redundant integer fc

-- 
https://patchwork.kernel.org/patch/10119831/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

Re: wil6210: fix build warnings without CONFIG_PM

2017-12-27 Thread Kalle Valo

Arnd Bergmann  wrote:

> The #ifdef checks are hard to get right, in this case some functions
> should have been left inside a CONFIG_PM_SLEEP check as seen by this
> message:
> 
> drivers/net/wireless/ath/wil6210/pcie_bus.c:489:12: error: 
> 'wil6210_pm_resume' defined but not used [-Werror=unused-function]
> drivers/net/wireless/ath/wil6210/pcie_bus.c:484:12: error: 
> 'wil6210_pm_suspend' defined but not used [-Werror=unused-function]
> 
> Using an __maybe_unused is easier here, so I'm replacing all the
> other #ifdef in this file as well for consistency.
> 
> Fixes: 94162666cd51 ("wil6210: run-time PM when interface down")
> Signed-off-by: Arnd Bergmann 
> Signed-off-by: Kalle Valo 

Patch applied to ath-next branch of ath.git, thanks.

203dab8395d9 wil6210: fix build warnings without CONFIG_PM

-- 
https://patchwork.kernel.org/patch/10119565/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
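
A minimal userspace sketch of the __maybe_unused approach the commit message
describes. All toy_* names and TOY_CONFIG_PM_SLEEP are hypothetical; the
kernel provides __maybe_unused itself, and the local definition below only
exists so the sketch builds outside the kernel with gcc or clang.

```c
#include <stddef.h>

/* Local fallback so this builds in userspace; the kernel defines this. */
#ifndef __maybe_unused
#define __maybe_unused __attribute__((unused))
#endif

/* Hypothetical stand-in for a config option that is switched off. */
#define TOY_CONFIG_PM_SLEEP 0

struct toy_pm_ops {
	const char *name;
	int (*suspend)(void);
	int (*resume)(void);
};

/*
 * The callbacks are defined unconditionally. When the option is off nothing
 * references them, and the attribute keeps -Wunused-function quiet without
 * wrapping each definition in its own #ifdef.
 */
static int __maybe_unused toy_suspend(void)
{
	return 1;
}

static int __maybe_unused toy_resume(void)
{
	return 2;
}

static const struct toy_pm_ops toy_ops = {
	.name = "toy",
#if TOY_CONFIG_PM_SLEEP
	.suspend = toy_suspend,
	.resume = toy_resume,
#endif
};

/* Reports whether the sleep callbacks were wired into the ops table. */
static int toy_has_sleep_ops(void)
{
	return toy_ops.suspend != NULL && toy_ops.resume != NULL;
}
```

The only conditional left is the one around the ops table entries; the
function bodies no longer need a guard that has to match it exactly.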

Re: [PATCH v4] hwrng: exynos - add Samsung Exynos True RNG driver

2017-12-27 Thread Łukasz Stelmach

It was <2017-12-22 pią 19:30>, when Philippe Ombredanne wrote:
> On Fri, Dec 22, 2017 at 5:38 PM, Łukasz Stelmach  
> wrote:
>> It was <2017-12-22 pią 14:34>, when Philippe Ombredanne wrote:
>>> Łukasz,
>>>
>>> On Fri, Dec 22, 2017 at 2:23 PM, Łukasz Stelmach  
>>> wrote:
>>>> Add support for True Random Number Generator found in Samsung Exynos
>>>> 5250+ SoCs.
>>>>
>>>> Signed-off-by: Łukasz Stelmach
>>>> Reviewed-by: Krzysztof Kozlowski
>>>
>>> 
>>>
>>>> --- /dev/null
>>>> +++ b/drivers/char/hw_random/exynos-trng.c
>>>> @@ -0,0 +1,245 @@
>>>> +/*
>>>> + * RNG driver for Exynos TRNGs
>>>> + *
>>>> + * Author: Łukasz Stelmach
>>>> + *
>>>> + * Copyright 2017 (c) Samsung Electronics Software, Inc.
>>>> + *
>>>> + * Based on the Exynos PRNG driver drivers/crypto/exynos-rng by
>>>> + * Krzysztof Kozłowski
>>>> + *
>>>> + * This program is free software; you can redistribute it and/or modify
>>>> + * it under the terms of the GNU General Public License as published by
>>>> + * the Free Software Foundation;
>>>> + *
>>>> + * This program is distributed in the hope that it will be useful,
>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>> + * GNU General Public License for more details.
>>>> + */
>>>
>>>
>>> Would you mind using the new SPDX tags documented in Thomas patch set
>>> [1] rather than this fine but longer legalese?
>>>
>>> And if you could spread the word to others in your team this would be very 
>>> nice.
>>> See also this fine article posted by Mauro on the Samsung Open Source
>>> Group Blog [2]
>>> Thank you!
>>
>> Cool! We've been using SPDX to tag RPM packages in Tizen for three years or
>> more. ;-)
>
> Very nice! any pubic pointers?
 ^

I assume you request an URL of a publicly available web-page ;-)

https://wiki.tizen.org/Packaging/Guidelines#License_Tag

-- 
Łukasz Stelmach
Samsung R&D Institute Poland
Samsung Electronics


Re: [Ocfs2-devel] [PATCH] ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE

2017-12-27 Thread piaojun

Hi Gang,

Do you mean that too many retries in the loop cost lots of CPU time and
block the page-fault interrupt? We should not add any delay in
ocfs2_fault(), right? And I am still a little confused about why your
method solves this problem.

thanks,
Jun

On 2017/12/27 17:29, Gang He wrote:
> If we can't get inode lock immediately in the function
> ocfs2_inode_lock_with_page() when reading a page, we should not
> return directly here, since this will lead to a softlockup problem.
> The method is to get a blocking lock and immediately unlock before
> returning, this can avoid CPU resource waste due to lots of retries,
> and benefits fairness in getting lock among multiple nodes, increase
> efficiency in case modifying the same file frequently from multiple
> nodes.
> The softlockup problem looks like,
> Kernel panic - not syncing: softlockup: hung tasks
> CPU: 0 PID: 885 Comm: multi_mmap Tainted: G L 4.12.14-6.1-default #1
> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> Call Trace:
>   
>   dump_stack+0x5c/0x82
>   panic+0xd5/0x21e
>   watchdog_timer_fn+0x208/0x210
>   ? watchdog_park_threads+0x70/0x70
>   __hrtimer_run_queues+0xcc/0x200
>   hrtimer_interrupt+0xa6/0x1f0
>   smp_apic_timer_interrupt+0x34/0x50
>   apic_timer_interrupt+0x96/0xa0
>   
>  RIP: 0010:unlock_page+0x17/0x30
>  RSP: :af154080bc88 EFLAGS: 0246 ORIG_RAX: ff10
>  RAX: dead0100 RBX: f21e009f5300 RCX: 0004
>  RDX: dead00ff RSI: 0202 RDI: f21e009f5300
>  RBP:  R08:  R09: af154080bb00
>  R10: af154080bc30 R11: 0040 R12: 993749a39518
>  R13:  R14: f21e009f5300 R15: f21e009f5300
>   ocfs2_inode_lock_with_page+0x25/0x30 [ocfs2]
>   ocfs2_readpage+0x41/0x2d0 [ocfs2]
>   ? pagecache_get_page+0x30/0x200
>   filemap_fault+0x12b/0x5c0
>   ? recalc_sigpending+0x17/0x50
>   ? __set_task_blocked+0x28/0x70
>   ? __set_current_blocked+0x3d/0x60
>   ocfs2_fault+0x29/0xb0 [ocfs2]
>   __do_fault+0x1a/0xa0
>   __handle_mm_fault+0xbe8/0x1090
>   handle_mm_fault+0xaa/0x1f0
>   __do_page_fault+0x235/0x4b0
>   trace_do_page_fault+0x3c/0x110
>   async_page_fault+0x28/0x30
>  RIP: 0033:0x7fa75ded638e
>  RSP: 002b:7ffd6657db18 EFLAGS: 00010287
>  RAX: 55c7662fb700 RBX: 0001 RCX: 55c7662fb700
>  RDX: 1770 RSI: 7fa75e909000 RDI: 55c7662fb700
>  RBP: 0003 R08: 000e R09: 
>  R10: 0483 R11: 7fa75ded61b0 R12: 7fa75e90a770
>  R13: 000e R14: 1770 R15: 
> 
> Fixes: 1cce4df04f37 ("ocfs2: do not lock/unlock() inode DLM lock")
> Signed-off-by: Gang He 
> ---
>  fs/ocfs2/dlmglue.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 4689940..5193218 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -2486,6 +2486,15 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
>   ret = ocfs2_inode_lock_full(inode, ret_bh, ex, OCFS2_LOCK_NONBLOCK);
>   if (ret == -EAGAIN) {
>   unlock_page(page);
> + /*
> +  * If we can't get inode lock immediately, we should not return
> +  * directly here, since this will lead to a softlockup problem.
> +  * The method is to get a blocking lock and immediately unlock
> +  * before returning, this can avoid CPU resource waste due to
> +  * lots of retries, and benefits fairness in getting lock.
> +  */
> + if (ocfs2_inode_lock(inode, ret_bh, ex) == 0)
> + ocfs2_inode_unlock(inode, ex);
>   ret = AOP_TRUNCATED_PAGE;
>   }
>  
>
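
A toy model of the idea in the quoted patch, assuming a deliberately
simplified lock: a remote holder keeps it for a fixed number of "ticks", a
non-blocking attempt costs one tick, and a blocking attempt waits until the
holder is done. The toy_* names are hypothetical and nothing here is ocfs2
code; the sketch only counts how often the caller has to restart the read.

```c
#include <stdbool.h>

/* A remote node holds the lock for 'busy' more ticks. */
struct toy_lock {
	int busy;
};

/* Non-blocking attempt: costs one tick and fails while the lock is held. */
static bool toy_trylock(struct toy_lock *l)
{
	if (l->busy > 0) {
		l->busy--;
		return false;
	}
	return true;
}

/* Blocking attempt: the caller sleeps until the holder is done. */
static void toy_lock_blocking(struct toy_lock *l)
{
	l->busy = 0;
}

/*
 * Returns how many times the read path is restarted before the
 * non-blocking attempt finally succeeds.
 */
static int toy_read_restarts(int busy_ticks, bool wait_before_retry)
{
	struct toy_lock l = { busy_ticks };
	int restarts = 0;

	while (!toy_trylock(&l)) {
		if (wait_before_retry)
			toy_lock_blocking(&l);	/* lock, then unlock at once */
		restarts++;	/* models returning AOP_TRUNCATED_PAGE */
	}
	return restarts;
}
```

In this model the retry loop spins once per tick without the blocking step,
and restarts only once with it, because the caller sleeps in the blocking
acquisition instead of looping through the fault path.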

Re: [PATCH] backlight: otm3225a: add support for ORISE OTM3225A LCD SoC

2017-12-27 Thread Felix Brack

Hello Jingoo,

Many thanks for taking the time to review my patch! Your suggestions
will be implemented in v2 which I will post soon.

kind regards, Felix

On 22.12.2017 18:23, Jingoo Han wrote:
> On Wednesday, December 20, 2017 12:58 PM, Felix Brack wrote:
>>
>> This patch adds a LCD driver supporting the OTM3225A LCD SoC
>> from ORISE Technology. This device can drive TFT LC panels having a
>> resolution of 240x320 pixels. After initializing the OTM3225A using
>> its SPI interface it switches to use 16-bit RGB as external
>> display interface.
>>
>> Signed-off-by: Felix Brack 
>> ---
>>  drivers/video/backlight/Kconfig|   7 ++
>>  drivers/video/backlight/Makefile   |   1 +
>>  drivers/video/backlight/otm3225a.c | 210
>> +
>>  3 files changed, 218 insertions(+)
>>  create mode 100644 drivers/video/backlight/otm3225a.c
>>
>> diff --git a/drivers/video/backlight/Kconfig
>> b/drivers/video/backlight/Kconfig
>> index 4e1d2ad..06e187b 100644
>> --- a/drivers/video/backlight/Kconfig
>> +++ b/drivers/video/backlight/Kconfig
>> @@ -150,6 +150,13 @@ config LCD_HX8357
>>If you have a HX-8357 LCD panel, say Y to enable its LCD control
>>driver.
>>
>> +  config LCD_OTM3225A
>> +tristate "ORISE Technology OTM3225A support"
>> +depends on SPI
>> +help
>> +  If you have a panel based on the OTM3225A controller
>> +  chip then say y to include a driver for it.
>> +
>>  endif # LCD_CLASS_DEVICE
>>
>>  #
>> diff --git a/drivers/video/backlight/Makefile
>> b/drivers/video/backlight/Makefile
>> index 8905129..b177b91 100644
>> --- a/drivers/video/backlight/Makefile
>> +++ b/drivers/video/backlight/Makefile
>> @@ -17,6 +17,7 @@ obj-$(CONFIG_LCD_S6E63M0)  += s6e63m0.o
>>  obj-$(CONFIG_LCD_TDO24M)+= tdo24m.o
>>  obj-$(CONFIG_LCD_TOSA)  += tosa_lcd.o
>>  obj-$(CONFIG_LCD_VGG2432A4) += vgg2432a4.o
>> +obj-$(CONFIG_LCD_OTM3225A)  += otm3225a.o
> 
> All entries of Kconfig were sorted alphabetically 4 years ago to reduce
> patch collisions. So please add it in alphabetical order as below.
> 
> @@ -13,6 +13,7 @@ obj-$(CONFIG_LCD_LD9040)  += ld9040.o
>  obj-$(CONFIG_LCD_LMS283GF05)   += lms283gf05.o
>  obj-$(CONFIG_LCD_LMS501KF03)   += lms501kf03.o
>  obj-$(CONFIG_LCD_LTV350QV) += ltv350qv.o
> +obj-$(CONFIG_LCD_OTM3225A) += otm3225a.o
>  obj-$(CONFIG_LCD_PLATFORM) += platform_lcd.o
> 
> 
>>
>>  obj-$(CONFIG_BACKLIGHT_88PM860X)+= 88pm860x_bl.o
>>  obj-$(CONFIG_BACKLIGHT_AAT2870) += aat2870_bl.o
>> diff --git a/drivers/video/backlight/otm3225a.c
>> b/drivers/video/backlight/otm3225a.c
>> new file mode 100644
>> index 000..0de75f8
>> --- /dev/null
>> +++ b/drivers/video/backlight/otm3225a.c
>> @@ -0,0 +1,210 @@
>> +/*
>> + * Driver for ORISE Technology OTM3225A SOC for TFT LCD
>> + *
>> + * Copyright (C) 2014-2017, EETS GmbH, Felix Brack 
> 
> Please change the year of copyright as below.
> 
> + * Copyright (C) 2017, EETS GmbH, Felix Brack 
> 
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> +
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> +
>> + * This driver implements a lcd device for the ORISE OTM3225A display
>> + * controller. The control interface to the display is SPI and the
>> display's
>> + * memory is updated over the 16-bit RGB interface.
>> + * The main source of information for writing this driver was provided by
>> the
>> + * OTM3225A datasheet from ORISE Technology. Some information arise from
>> the
>> + * ILI9328 datasheet from ILITEK as well as from the datasheets and
>> sample code
>> + * provided by Crystalfontz America Inc. who sells the CFAF240320A-032T,
>> a 3.2"
>> + * TFT LC display using the OTM3225A controller.
>> + */
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
> 
> So please add these headers in alphabetical order
> for readability.
> 
>> +
>> +#define OTM3225A_INDEX_REG  0x70
>> +#define OTM3225A_DATA_REG   0x72
>> +
>> +struct otm3225a_data {
>> +struct spi_device *spi;
>> +struct lcd_device *ld;
>> +int power;
>> +};
>> +
>> +struct otm3225a_spi_instruction {
>> +unsigned char reg;  /* register to write */
>> +unsigned short value;   /* data to write to 'reg' */
>> +unsigned short delay;   /* delay in ms after write */
>> +};
>> +
>> +static struct otm3225a_spi_instruction display_init[] = {
>> +{ 0x01, 0x, 0 }, { 0x02, 0x0700, 0 }, { 0x03, 0x50A0, 0 },
>> +{ 0x04, 0x, 0 },

Re: [PATCH v5 03/78] xarray: Add the xa_lock to the radix_tree_root

2017-12-27 Thread Kirill A. Shutemov

On Tue, Dec 26, 2017 at 07:43:40PM -0800, Matthew Wilcox wrote:
> On Tue, Dec 26, 2017 at 07:54:40PM +0300, Kirill A. Shutemov wrote:
> > On Fri, Dec 15, 2017 at 02:03:35PM -0800, Matthew Wilcox wrote:
> > > From: Matthew Wilcox 
> > > 
> > > This results in no change in structure size on 64-bit x86 as it fits in
> > > the padding between the gfp_t and the void *.
> > 
> > > The patch does more than described in the subject and commit message.
> > > At first I was confused why do you need to touch idr here. It took few
> > > minutes to figure it out.
> > 
> > Could you please add more into commit message about lockname and xa_ locking
> > interface since you introduce it here?
> 
> Sure!  How's this?
> 
> xarray: Add the xa_lock to the radix_tree_root
> 
> This results in no change in structure size on 64-bit x86 as it fits in
> the padding between the gfp_t and the void *.
> 
> Initialising the spinlock requires a name for the benefit of lockdep,
> so RADIX_TREE_INIT() now needs to know the name of the radix tree it's
> initialising, and so do IDR_INIT() and IDA_INIT().
> 
> Also add the xa_lock() and xa_unlock() family of wrappers to make it
> easier to use the lock.  If we could rely on -fplan9-extensions in
> the compiler, we could avoid all of this syntactic sugar, but that
> wasn't added until gcc 4.6.
> 

Looks great, thanks.

-- 
 Kirill A. Shutemov
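
A userspace sketch of what a lock embedded in the structure plus thin
wrapper macros could look like, as the revised commit message describes.
The toy_* names are hypothetical, and toy_spinlock_t is a stand-in that only
records whether it is held; it is not the kernel's spinlock_t.

```c
#include <stdbool.h>

/* Stand-in for spinlock_t: only tracks whether the lock is held. */
typedef struct {
	bool held;
} toy_spinlock_t;

static void toy_spin_lock(toy_spinlock_t *l)
{
	l->held = true;
}

static void toy_spin_unlock(toy_spinlock_t *l)
{
	l->held = false;
}

struct toy_xarray {
	toy_spinlock_t xa_lock;
	unsigned int xa_flags;
	void *xa_head;
};

/* Wrappers so callers do not have to spell out the member name. */
#define toy_xa_lock(xa)		toy_spin_lock(&(xa)->xa_lock)
#define toy_xa_unlock(xa)	toy_spin_unlock(&(xa)->xa_lock)

/*
 * Stores an entry under the lock. Returns true if the lock was held during
 * the update and released afterwards.
 */
static bool toy_store_head(struct toy_xarray *xa, void *entry)
{
	bool was_held;

	toy_xa_lock(xa);
	was_held = xa->xa_lock.held;
	xa->xa_head = entry;
	toy_xa_unlock(xa);

	return was_held && !xa->xa_lock.held;
}
```

Callers then write toy_xa_lock(&tree) rather than reaching into the
structure, which is the convenience the wrapper family is meant to provide.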

Re: [PATCH v5 03/78] xarray: Add the xa_lock to the radix_tree_root

2017-12-27 Thread Kirill A. Shutemov

On Tue, Dec 26, 2017 at 07:58:15PM -0800, Matthew Wilcox wrote:
> On Tue, Dec 26, 2017 at 07:43:40PM -0800, Matthew Wilcox wrote:
> > Also add the xa_lock() and xa_unlock() family of wrappers to make it
> > easier to use the lock.  If we could rely on -fplan9-extensions in
> > the compiler, we could avoid all of this syntactic sugar, but that
> > wasn't added until gcc 4.6.
> 
> Oh, in case anyone's wondering, here's how I'd do it with plan9 extensions:
> 
> struct xarray {
> spinlock_t;
> int xa_flags;
> void *xa_head;
> };
> 
> ...
> spin_lock_irqsave(&mapping->pages, flags);
> __delete_from_page_cache(page, NULL);
> spin_unlock_irqrestore(&mapping->pages, flags);
> ...
> 
> The plan9 extensions permit passing a pointer to a struct which has an
> unnamed element to a function which is expecting a pointer to the type
> of that element.  The compiler does any necessary arithmetic to produce 
> a pointer.  It's exactly as if I had written:
> 
> spin_lock_irqsave(&mapping->pages.xa_lock, flags);
> __delete_from_page_cache(page, NULL);
> spin_unlock_irqrestore(&mapping->pages.xa_lock, flags);
> 
> More details here: https://9p.io/sys/doc/compiler.html

Yeah, that's neat.

Dealing with old compilers is frustrating...

-- 
 Kirill A. Shutemov
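
For readers without the extension at hand, the pointer arithmetic mentioned
in the quoted mail can be written out in standard C. This is only a sketch:
the toy_* names are hypothetical, the lock member is named here (the
extension would allow it to be unnamed), and it is deliberately not the first
member so that the offset is non-zero.

```c
#include <stddef.h>

typedef struct {
	int held;
} toy_spinlock_t;

struct toy_xarray {
	int xa_flags;
	toy_spinlock_t xa_lock;	/* named here; unnamed with the extension */
	void *xa_head;
};

static void toy_spin_lock(toy_spinlock_t *l)
{
	l->held = 1;
}

/*
 * Done by hand: convert a pointer to the outer struct into a pointer to the
 * embedded lock by adding the member's offset. With the extension enabled,
 * the compiler would perform this conversion implicitly at the call site.
 */
static toy_spinlock_t *toy_embedded_lock(struct toy_xarray *xa)
{
	return (toy_spinlock_t *)((char *)xa +
				  offsetof(struct toy_xarray, xa_lock));
}
```

Calling toy_spin_lock(toy_embedded_lock(&xa)) has the same effect as
toy_spin_lock(&xa.xa_lock), which mirrors the equivalence shown in the
quoted example.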

[PATCH 1/4] PCI/AER: factor out error reporting from AER

2017-12-27 Thread Oza Pawandeep

This patch factors out error reporting callbacks, which are currently
tightly coupled with AER.
DPC should be able to call these callbacks when DPC trigger event occurs.

Signed-off-by: Oza Pawandeep 

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 6402f7f..fd053e5 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -462,7 +462,7 @@ static void ghes_do_proc(struct ghes *ghes,
 * use, so treat it as a fatal AER error.
 */
if (gdata->flags & CPER_SEC_RESET)
-   aer_severity = AER_FATAL;
+   aer_severity = PCI_ERR_AER_FATAL;
 
aer_recover_queue(pcie_err->device_id.segment,
  pcie_err->device_id.bus,
diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
index 223e4c3..d669497 100644
--- a/drivers/pci/pcie/Makefile
+++ b/drivers/pci/pcie/Makefile
@@ -6,7 +6,7 @@
 # Build PCI Express ASPM if needed
 obj-$(CONFIG_PCIEASPM) += aspm.o
 
-pcieportdrv-y  := portdrv_core.o portdrv_pci.o portdrv_bus.o
+pcieportdrv-y  := portdrv_core.o portdrv_pci.o portdrv_bus.o pcie-err.o
 pcieportdrv-$(CONFIG_ACPI) += portdrv_acpi.o
 
 obj-$(CONFIG_PCIEPORTBUS)  += pcieportdrv.o
diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
index 5449e5c..bc9db53 100644
--- a/drivers/pci/pcie/aer/aerdrv.h
+++ b/drivers/pci/pcie/aer/aerdrv.h
@@ -76,36 +76,6 @@ struct aer_rpc {
 */
 };
 
-struct aer_broadcast_data {
-   enum pci_channel_state state;
-   enum pci_ers_result result;
-};
-
-static inline pci_ers_result_t merge_result(enum pci_ers_result orig,
-   enum pci_ers_result new)
-{
-   if (new == PCI_ERS_RESULT_NO_AER_DRIVER)
-   return PCI_ERS_RESULT_NO_AER_DRIVER;
-
-   if (new == PCI_ERS_RESULT_NONE)
-   return orig;
-
-   switch (orig) {
-   case PCI_ERS_RESULT_CAN_RECOVER:
-   case PCI_ERS_RESULT_RECOVERED:
-   orig = new;
-   break;
-   case PCI_ERS_RESULT_DISCONNECT:
-   if (new == PCI_ERS_RESULT_NEED_RESET)
-   orig = PCI_ERS_RESULT_NEED_RESET;
-   break;
-   default:
-   break;
-   }
-
-   return orig;
-}
-
 extern struct bus_type pcie_port_bus_type;
 void aer_isr(struct work_struct *work);
 void aer_print_error(struct pci_dev *dev, struct aer_err_info *info);
diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index 7448052..758e744 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -165,7 +165,7 @@ static bool is_error_source(struct pci_dev *dev, struct aer_err_info *e_info)
return false;
 
/* Check if error is recorded */
-   if (e_info->severity == AER_CORRECTABLE) {
+   if (e_info->severity == PCI_ERR_AER_CORRECTABLE) {
pci_read_config_dword(dev, pos + PCI_ERR_COR_STATUS, &status);
pci_read_config_dword(dev, pos + PCI_ERR_COR_MASK, &mask);
} else {
@@ -234,189 +234,6 @@ static bool find_source_device(struct pci_dev *parent,
return true;
 }
 
-static int report_error_detected(struct pci_dev *dev, void *data)
-{
-   pci_ers_result_t vote;
-   const struct pci_error_handlers *err_handler;
-   struct aer_broadcast_data *result_data;
-   result_data = (struct aer_broadcast_data *) data;
-
-   device_lock(&dev->dev);
-   dev->error_state = result_data->state;
-
-   if (!dev->driver ||
-   !dev->driver->err_handler ||
-   !dev->driver->err_handler->error_detected) {
-   if (result_data->state == pci_channel_io_frozen &&
-   dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) {
-   /*
-* In case of fatal recovery, if one of down-
-* stream device has no driver. We might be
-* unable to recover because a later insmod
-* of a driver for this device is unaware of
-* its hw state.
-*/
-   dev_printk(KERN_DEBUG, &dev->dev, "device has %s\n",
-  dev->driver ?
-  "no AER-aware driver" : "no driver");
-   }
-
-   /*
-* If there's any device in the subtree that does not
-* have an error_detected callback, returning
-* PCI_ERS_RESULT_NO_AER_DRIVER prevents calling of
-* the subsequent mmio_enabled/slot_reset/resume
-* callbacks of "any" device in the subtree. All the
-* devices in the subtree

[PATCH 0/4] Address error and recovery for AER and DPC

2017-12-27 Thread Oza Pawandeep

This patch set brings in support for DPC and AER to co-exist and not to
race for recovery.

The current implementation of AER and error message broadcasting to the
EP driver is tightly coupled and limited to the AER service driver.
It is important to factor out broadcasting and other link handling
callbacks, so that callbacks are handled appropriately not only when AER
is triggered, but also when DPC is triggered, or when both are triggered
simultaneously (e.g. on ERR_FATAL).
With the code modularized, the race between AER and DPC is handled
gracefully.
For example, when DPC is active and has kicked in, AER should not attempt
recovery, because DPC takes care of it.

DPC should enumerate the devices after recovering the link, which is
achieved by implementing error_resume callback.

Oza Pawandeep (4):
  PCI/AER: factor out error reporting from AER
  PCI/DPC/AER: Address Concurrency between AER and DPC
  PCI/ERR: Do not do recovery if DPC service is active
  PCI/DPC: Enumerate the devices after DPC trigger event

 drivers/acpi/apei/ghes.c   |   2 +-
 drivers/pci/pcie/Makefile  |   2 +-
 drivers/pci/pcie/aer/aerdrv.h  |  30 ---
 drivers/pci/pcie/aer/aerdrv_core.c | 306 +
 drivers/pci/pcie/aer/aerdrv_errprint.c |  27 ++-
 drivers/pci/pcie/pcie-dpc.c| 127 ++-
 drivers/pci/pcie/pcie-err.c| 392 +
 drivers/pci/pcie/portdrv.h |   2 +
 include/linux/aer.h|   4 -
 include/linux/pci.h|  23 ++
 10 files changed, 569 insertions(+), 346 deletions(-)
 create mode 100644 drivers/pci/pcie/pcie-err.c

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.,
a Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.

[PATCH 3/4] PCI/ERR: Do not do recovery if DPC service is active

2017-12-27 Thread Oza Pawandeep

If AER attempts to do recovery for any device, and DPC is active on
any upstream port, AER should not do recovery, since it will be handled
by DPC.

Change-Id: Ida507ce9145f420e35302db34e967f1b421e15c9
Signed-off-by: Oza Pawandeep 

diff --git a/drivers/pci/pcie/pcie-err.c b/drivers/pci/pcie/pcie-err.c
index 8bac584..1f01e76 100644
--- a/drivers/pci/pcie/pcie-err.c
+++ b/drivers/pci/pcie/pcie-err.c
@@ -267,6 +267,22 @@ pci_ers_result_t pci_broadcast_error_message(struct pci_dev *dev,
return result_data.result;
 }
 
+/*
+ * pcie_port_upstream_bridge - returns immediate upstream bridge.
+ * dev: pcie device
+ */
+static struct pci_dev *pcie_port_upstream_bridge(struct pci_dev *dev)
+{
+   struct pci_dev *parent;
+
+   parent = pci_upstream_bridge(dev);
+
+   if (parent && pci_is_pcie(parent))
+   return parent;
+
+   return NULL;
+}
+
 /**
  * pci_do_recovery - handle nonfatal/fatal error recovery process
  * @dev: pointer to a pci_dev data structure of agent detecting an error
@@ -280,9 +296,29 @@ void pci_do_recovery(struct pci_dev *dev, int severity)
 {
pci_ers_result_t status, result = PCI_ERS_RESULT_RECOVERED;
enum pci_channel_state state;
+   struct pcie_port_service_driver *driver;
+   struct pci_dev *pdev = dev;
 
mutex_lock(&pci_err_recovery_lock);
 
+   if (severity != PCI_ERR_DPC_FATAL) {
+   /*
+* DPC service could be running in RP
+* or any upstream switch.
+*/
+   do {
+   driver = pci_find_dpc_service(pdev);
+   if (driver) {
+   dev_printk(KERN_NOTICE, &dev->dev,
+   "AER: Recovery to be done by DPC %s\n",
+   pci_name(dev));
+   mutex_unlock(&pci_err_recovery_lock);
+   return;
+   }
+   pdev = pcie_port_upstream_bridge(pdev);
+   } while (pdev);
+   }
+
if ((severity == PCI_ERR_AER_FATAL) ||
(severity == PCI_ERR_DPC_FATAL))
state = pci_channel_io_frozen;
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.,
a Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.
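
A minimal userspace sketch of the rule stated in this patch's description:
walk from the device towards the root and let DPC own recovery if any port
on the way has DPC active. struct toy_port and toy_dpc_owns_recovery() are
hypothetical; the real code works on struct pci_dev and port services.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device in the PCIe hierarchy. */
struct toy_port {
	struct toy_port *upstream;	/* NULL at the root */
	bool dpc_active;		/* DPC service bound to this port */
};

/*
 * Returns true if the device itself or any port above it has DPC active,
 * in which case the AER path should leave recovery to DPC.
 */
static bool toy_dpc_owns_recovery(const struct toy_port *dev)
{
	const struct toy_port *p;

	/* Advance one level per iteration until the root has been checked. */
	for (p = dev; p != NULL; p = p->upstream) {
		if (p->dpc_active)
			return true;
	}
	return false;
}
```

The loop variable, not the starting device, is what moves upstream on each
iteration, so the walk terminates once the root has been checked.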

Re: [PATCH v5 15/15] devicetree: bindings: Document qcom,pvs

2017-12-27 Thread Sricharan R

Hi Rob,

On 12/26/2017 11:06 PM, Rob Herring wrote:
> On Thu, Dec 21, 2017 at 5:53 AM, Sricharan R  wrote:
>> Hi Rob,
>>
>> On 12/21/2017 2:48 AM, Rob Herring wrote:
>>> On Wed, Dec 20, 2017 at 11:55:33AM +0530, Sricharan R wrote:
>>>> Hi Viresh,
>>>>
>>>> On 12/20/2017 8:56 AM, Viresh Kumar wrote:
>>>>> On 19-12-17, 21:25, Sricharan R wrote:
>>>>>> +  cpu@0 {
>>>>>> +  compatible = "qcom,krait";
>>>>>> +  enable-method = "qcom,kpss-acc-v1";
>>>>>> +  device_type = "cpu";
>>>>>> +  reg = <0>;
>>>>>> +  qcom,acc = <&acc0>;
>>>>>> +  qcom,saw = <&saw0>;
>>>>>> +  clocks = <&kraitcc 0>;
>>>>>> +  clock-names = "cpu";
>>>>>> +  cpu-supply = <&smb208_s2a>;
>>>>>> +  operating-points-v2 = <&cpu_opp_table>;
>>>>>> +  };
>>>>>> +
>>>>>> +  qcom,pvs {
>>>>>> +  qcom,pvs-format-a;
>>>>>> +  };
>>>>>
>>>>> Not sure what Rob is going to say on that :)
>>>>>
>>>>
>>>>  Yes. Would be good to know the best way.
>>>
>>> Seems like this should be a property of an efuse node either implied by
>>> the compatible or a separate property. What determines format A vs. B?
>>>
>>
>>  Yes, this efuse registers are part of the eeprom (qfprom) tied to the soc.
>>  So this property (details like bitfields and register offsets that it 
>> represents)
>>  can be put soc specific and nvmem apis can be used to read
>>  the registers. Does something like below look ok ?
>>
>>  qcom,pvs {
>> compatible = "qcom,pvs-ipq8064";
>> nvmem-cells = <&pvs_efuse>;
>>  }
> 
> Why do you need this node? It doesn't look like it corresponds to a
> h/w block. It looks like you are just creating it to instantiate a
> driver.
> 
>>  qfprom: qfprom@70 {
>> compatible  = "qcom,qfprom";
> 
> Either this or...
> 
>> reg = <0x0070 0x1000>;
>> #address-cells  = <1>;
>> #size-cells = <1>;
>> ranges;
>> pvs_efuse: pvs {
> 
> a compatible here should be specific enough so the OS can know what
> the bits are.

 In fact, the above "qcom,pvs" node is required mainly to act as a consumer
 of the nvmem data provider ("qcom,qfprom") (using nvmem-cells = <&pvs_efuse>).
 Then "qfprom" can be made to contain a "format_a" or "format_b" specific cell.

 So all that is needed is for nvmem-cells = <&pvs_efuse_phandle> to be
 available somewhere. The requirement is similar to what is now done by
 "operating-points-v2-ti-cpu" and ti-cpufreq.c. There, the
 "operating-points-v2-ti-cpu" node contains the syscon register used to read
 the efuse values. Similarly, does defining a new
 "operating-points-v2-krait-cpu", which would contain the nvmem-cells
 property, look ok? This would avoid defining a new qcom,pvs node.
 
cpu@0 {
	compatible = "qcom,krait";
	enable-method = "qcom,kpss-acc-v1";
	device_type = "cpu";
	reg = <0>;
	qcom,acc = <&acc0>;
	qcom,saw = <&saw0>;
	clocks = <&kraitcc 0>;
	clock-names = "cpu";
	cpu-supply = <&smb208_s2a>;
	operating-points-v2 = <&cpu_opp_table>;
};

cpu_opp_table: opp_table {
	compatible = "operating-points-v2-krait-cpu";

	nvmem-cells = <&pvs_efuse_format_a>;
	/*
	 * Missing opp-shared property means CPUs switch DVFS states
	 * independently.
	 */

	opp-14 {
		opp-hz = /bits/ 64 <14>;
		opp-microvolt-speed0-pvs0-v0 = <125>;
		opp-microvolt-speed0-pvs1-v0 = <1175000>;
		opp-microvolt-speed0-pvs2-v0 = <1125000>;
		opp-microvolt-speed0-pvs3-v0 = <105>;
	};
	...
};

qfprom: qfprom@70 {
	compatible = "qcom,qfprom";
	reg = <0x0070 0x1000>;
	#address-cells = <1>;
	#size-cells = <1>;
	ranges;
	pvs_efuse_format_a: pvs {
		reg = <0xc0 0x8>;
	};
};
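
A rough illustration of how a consumer of the OPP table above could pick
the matching opp-microvolt-<name> property once the fuse has been read.
This is a plain C sketch, not driver code: the helper name is made up,
and the speed, PVS and version numbers are assumed to be already decoded
from the nvmem cell (the format A/B bit layout is not shown here).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the suffix that selects one of the
 * opp-microvolt-<suffix> properties in the OPP table.
 * speed, pvs and version are assumed to be decoded from the efuse
 * already; that decoding is outside the scope of this sketch.
 *
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int krait_opp_prop_suffix(char *buf, size_t len,
				 int speed, int pvs, int version)
{
	int n;

	assert(buf != NULL && len > 0);

	n = snprintf(buf, len, "speed%d-pvs%d-v%d", speed, pvs, version);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With speed 0, PVS bin 1 and version 0 the suffix names the
opp-microvolt-speed0-pvs1-v0 property from the example table.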

Regards,
 Sricharan

-- 
"QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation

[PATCH 4/4] PCI/DPC: Enumerate the devices after DPC trigger event

2017-12-27 Thread Oza Pawandeep

Implement the error_resume callback in DPC, which enumerates the devices
beneath the port after a DPC trigger event.

Signed-off-by: Oza Pawandeep 

diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c
index e7ced58..78e557f 100644
--- a/drivers/pci/pcie/pcie-dpc.c
+++ b/drivers/pci/pcie/pcie-dpc.c
@@ -161,6 +161,43 @@ static void dpc_wait_link_inactive(struct dpc_dev *dpc)
dev_warn(dev, "Link state not disabled for DPC event\n");
 }
 
+static bool dpc_wait_link_active(struct pci_dev *pdev)
+{
+   unsigned long timeout = jiffies + HZ;
+   u16 lnk_status;
+   bool ret = true;
+
+   pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
+
+   while (!(lnk_status & PCI_EXP_LNKSTA_DLLLA) &&
+   !time_after(jiffies, timeout)) {
+   msleep(10);
+   pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
+   }
+
+   if (!(lnk_status & PCI_EXP_LNKSTA_DLLLA)) {
+   dev_warn(&pdev->dev, "Link state not enabled after DPC event\n");
+   ret = false;
+   }
+
+   return ret;
+}
+
+/**
+ * dpc_error_resume - enumerate the devices beneath
+ * @pdev: pointer to Root Port's pci_dev data structure
+ *
+ * Invoked by Port Bus driver during nonfatal recovery.
+ */
+static void dpc_error_resume(struct pci_dev *pdev)
+{
+   if (dpc_wait_link_active(pdev)) {
+   pci_lock_rescan_remove();
+   pci_rescan_bus(pdev->bus);
+   pci_unlock_rescan_remove();
+   }
+}
+
 /**
  * dpc_reset_link - reset link DPC  routine
  * @dev: pointer to Root Port's pci_dev data structure
@@ -419,6 +456,7 @@ static void dpc_remove(struct pcie_device *dev)
.service= PCIE_PORT_SERVICE_DPC,
.probe  = dpc_probe,
.remove = dpc_remove,
+   .error_resume   = dpc_error_resume,
.reset_link = dpc_reset_link,
 };
 
diff --git a/drivers/pci/pcie/pcie-err.c b/drivers/pci/pcie/pcie-err.c
index 1f01e76..9c4377c 100644
--- a/drivers/pci/pcie/pcie-err.c
+++ b/drivers/pci/pcie/pcie-err.c
@@ -231,7 +231,8 @@ pci_ers_result_t pci_reset_link(struct pci_dev *dev, int severity)
 pci_ers_result_t pci_broadcast_error_message(struct pci_dev *dev,
enum pci_channel_state state,
char *error_mesg,
-   int (*cb)(struct pci_dev *, void *))
+   int (*cb)(struct pci_dev *, void *),
+   int severity)
 {
struct pci_err_broadcast_data result_data;
 
@@ -243,6 +244,15 @@ pci_ers_result_t pci_broadcast_error_message(struct pci_dev *dev,
result_data.result = PCI_ERS_RESULT_RECOVERED;
 
if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+   /* If DPC is triggered, call resume error handler
+* because, at this point we can safely assume that
+* link recovery has happened.
+*/
+   if ((severity == PCI_ERR_DPC_FATAL) &&
+   (cb == pci_report_resume)) {
+   cb(dev, NULL);
+   return PCI_ERS_RESULT_RECOVERED;
+   }
/*
 * If the error is reported by a bridge, we think this error
 * is related to the downstream link of the bridge, so we
@@ -328,7 +338,8 @@ void pci_do_recovery(struct pci_dev *dev, int severity)
status = pci_broadcast_error_message(dev,
state,
"error_detected",
-   pci_report_error_detected);
+   pci_report_error_detected,
+   severity);
 
if ((severity == PCI_ERR_AER_FATAL) ||
(severity == PCI_ERR_DPC_FATAL)) {
@@ -337,11 +348,15 @@ void pci_do_recovery(struct pci_dev *dev, int severity)
goto failed;
}
 
+   if (severity == PCI_ERR_DPC_FATAL)
+   goto resume;
+
if (status == PCI_ERS_RESULT_CAN_RECOVER)
status = pci_broadcast_error_message(dev,
state,
"mmio_enabled",
-   pci_report_mmio_enabled);
+   pci_report_mmio_enabled,
+   severity);
 
if (status == PCI_ERS_RESULT_NEED_RESET) {
/*
@@ -352,16 +367,19 @@ void pci_do_recovery(struct pci_dev *dev, int severity)
status = pci_broadcast_error_message(dev,
state,
"slot_reset",
-   pci_report_slot_reset);
+   pci_report_slot_reset,
+   severity);
}
 
if (status != PCI_ERS_RESULT_RECOVERED)
goto failed;
 
+resume:
pci_broadcast_error_message(dev,
state,
"resume",
-   pci_repo

[PATCH 2/4] PCI/DPC/AER: Address Concurrency between AER and DPC

2017-12-27 Thread Oza Pawandeep

This patch addresses the race condition between AER and DPC for recovery.

The current DPC driver does not do recovery, e.g. it does not call the
endpoint driver's callbacks, which sanitize the device.
With this patch, the DPC driver implements the reset_link callback and
calls pci_do_recovery().

Signed-off-by: Oza Pawandeep 

diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c
index 2d976a6..e7ced58 100644
--- a/drivers/pci/pcie/pcie-dpc.c
+++ b/drivers/pci/pcie/pcie-dpc.c
@@ -15,6 +15,9 @@
 #include 
 #include 
 #include "../pci.h"
+#include "portdrv.h"
+
+static pci_ers_result_t dpc_reset_link(struct pci_dev *pdev);
 
 struct rp_pio_header_log_regs {
u32 dw0;
@@ -67,6 +70,60 @@ struct dpc_dev {
"Memory Request Completion Timeout", /* Bit Position 18 */
 };
 
+static int find_dpc_dev_iter(struct device *device, void *data)
+{
+   struct pcie_port_service_driver *service_driver;
+   struct device **dev;
+
+   dev = (struct device **) data;
+
+   if (device->bus == &pcie_port_bus_type && device->driver) {
+   service_driver = to_service_driver(device->driver);
+   if (service_driver->service == PCIE_PORT_SERVICE_DPC) {
+   *dev = device;
+   return 1;
+   }
+   }
+
+   return 0;
+}
+
+struct device *pci_find_dpc_dev(struct pci_dev *pdev)
+{
+   struct device *dev = NULL;
+
+   device_for_each_child(&pdev->dev, &dev, find_dpc_dev_iter);
+
+   return dev;
+}
+
+static int find_dpc_service_iter(struct device *device, void *data)
+{
+   struct pcie_port_service_driver *service_driver, **drv;
+
+   drv = (struct pcie_port_service_driver **) data;
+
+   if (device->bus == &pcie_port_bus_type && device->driver) {
+   service_driver = to_service_driver(device->driver);
+   if (service_driver->service == PCIE_PORT_SERVICE_DPC) {
+   *drv = service_driver;
+   return 1;
+   }
+   }
+
+   return 0;
+}
+
+struct pcie_port_service_driver *pci_find_dpc_service(struct pci_dev *dev)
+{
+   struct pcie_port_service_driver *drv = NULL;
+
+   device_for_each_child(&dev->dev, &drv, find_dpc_service_iter);
+
+   return drv;
+}
+EXPORT_SYMBOL(pci_find_dpc_service);
+
 static int dpc_wait_rp_inactive(struct dpc_dev *dpc)
 {
unsigned long timeout = jiffies + HZ;
@@ -104,11 +161,23 @@ static void dpc_wait_link_inactive(struct dpc_dev *dpc)
dev_warn(dev, "Link state not disabled for DPC event\n");
 }
 
-static void interrupt_event_handler(struct work_struct *work)
+/**
+ * dpc_reset_link - reset link DPC  routine
+ * @dev: pointer to Root Port's pci_dev data structure
+ *
+ * Invoked by Port Bus driver when performing link reset at Root Port.
+ */
+static pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
 {
-   struct dpc_dev *dpc = container_of(work, struct dpc_dev, work);
-   struct pci_dev *dev, *temp, *pdev = dpc->dev->port;
struct pci_bus *parent = pdev->subordinate;
+   struct pci_dev *dev, *temp;
+   struct dpc_dev *dpc;
+   struct pcie_device *pciedev;
+   struct device *devdpc;
+
+   devdpc = pci_find_dpc_dev(pdev);
+   pciedev = to_pcie_device(devdpc);
+   dpc = get_service_data(pciedev);
 
pci_lock_rescan_remove();
list_for_each_entry_safe_reverse(dev, temp, &parent->devices,
@@ -125,7 +194,7 @@ static void interrupt_event_handler(struct work_struct *work)
 
dpc_wait_link_inactive(dpc);
if (dpc->rp && dpc_wait_rp_inactive(dpc))
-   return;
+   return PCI_ERS_RESULT_DISCONNECT;
if (dpc->rp && dpc->rp_pio_status) {
pci_write_config_dword(pdev,
  dpc->cap_pos + PCI_EXP_DPC_RP_PIO_STATUS,
@@ -135,6 +204,17 @@ static void interrupt_event_handler(struct work_struct *work)
 
pci_write_config_word(pdev, dpc->cap_pos + PCI_EXP_DPC_STATUS,
PCI_EXP_DPC_STATUS_TRIGGER | PCI_EXP_DPC_STATUS_INTERRUPT);
+
+   return PCI_ERS_RESULT_RECOVERED;
+}
+
+static void interrupt_event_handler(struct work_struct *work)
+{
+   struct dpc_dev *dpc = container_of(work, struct dpc_dev, work);
+   struct pci_dev *pdev = dpc->dev->port;
+
+   /* From DPC point of view error is always FATAL. */
+   pci_do_recovery(pdev, PCI_ERR_DPC_FATAL);
 }
 
 static void dpc_rp_pio_print_tlp_header(struct device *dev,
@@ -339,6 +419,7 @@ static void dpc_remove(struct pcie_device *dev)
.service= PCIE_PORT_SERVICE_DPC,
.probe  = dpc_probe,
.remove = dpc_remove,
+   .reset_link = dpc_reset_link,
 };
 
 static int __init dpc_service_init(void)
diff --git a/drivers/pci/pcie/pcie-err.c b/drivers/pci/pcie/pcie-err.c
index d59866c..8bac584 100644
--- a/drivers/pci/pcie/pcie-err.c
+++ b/drivers/pci/pcie/pcie-err.c
@@ -176,7 +176,7 @@ static pci_ers_result_t pci_defaul

Re: [PATCH] backlight: otm3225a: add support for ORISE OTM3225A LCD SoC

2017-12-27 Thread Felix Brack

Hello Daniel,

On 22.12.2017 18:33, Daniel Thompson wrote:
> On 22/12/17 17:23, Jingoo Han wrote:
>>> diff --git a/drivers/video/backlight/otm3225a.c
>>> b/drivers/video/backlight/otm3225a.c
>>> new file mode 100644
>>> index 000..0de75f8
>>> --- /dev/null
>>> +++ b/drivers/video/backlight/otm3225a.c
>>> @@ -0,0 +1,210 @@
>>> +/*
>>> + * Driver for ORISE Technology OTM3225A SOC for TFT LCD
>>> + *
>>> + * Copyright (C) 2014-2017, EETS GmbH, Felix Brack 
>>
>> Please change the year of copyright as below.
>>
>> + * Copyright (C) 2017, EETS GmbH, Felix Brack 
> 
> ... and include (or just rely entirely on) a SPDX header to describe the
> licensing of the file.
>
Thanks for the hint. I have opted for this first line in the source
file: "// SPDX-License-Identifier: GPL-2.0"
> 
> Daniel.

kind regards, Felix

Re: [PATCH v5 05/78] xarray: Replace exceptional entries

2017-12-27 Thread Kirill A. Shutemov

On Tue, Dec 26, 2017 at 07:05:34PM -0800, Matthew Wilcox wrote:
> On Tue, Dec 26, 2017 at 08:15:42PM +0300, Kirill A. Shutemov wrote:
> > >  28 files changed, 249 insertions(+), 240 deletions(-)
> > 
> > Everything looks fine to me after a quick scan, but that's a lot of changes for
> > one patch...
> 
> Yeah.  It's pretty mechanical though.
> 
> > > - if (radix_tree_exceptional_entry(page)) {
> > > + if (xa_is_value(page)) {
> > >   if (!invalidate_exceptional_entry2(mapping,
> > >  index, page))
> > >   ret = -EBUSY;
> > 
> > invalidate_exceptional_entry? Are we going to leave the terminology here as is?
> 
> That is a great question.  If the page cache wants to call its value
> entries exceptional entries, it can continue to do that.  I think there's
> a better name for them, but I'm not sure what it is.  Right now, the
> page cache uses value entries to store:
> 
> 1. Shadow entries (for workingset)
> 2. Swap entries (for shmem)
> 3. DAX entries
> 
> I can't come up with a good name for these three things.  'nonpage' is
> the only thing which hasn't immediately fallen off my ideas list.

Yeah, naming problem...

> But I think renaming exceptional entries in the page cache is a great idea,
> and I don't want to do it as part of this patch set ;-)

Fair enough.
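
For readers following along, a minimal userspace sketch of the idea
behind a "value entry": an integer is stored in the same slot as a
pointer and told apart by a tag bit. The encoding below (bit 0 set,
payload shifted left by one) is an assumption based on how the xarray
API ended up in later mainline kernels; this posting of the series may
use a different layout. The sketch_ names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch only, not the kernel implementation.
 * A value entry stores an integer shifted left by one with bit 0 set.
 * Pointers to normally aligned objects have bit 0 clear, so the two
 * kinds of entry can share one slot.
 */
static inline void *sketch_mk_value(unsigned long v)
{
	return (void *)(uintptr_t)((v << 1) | 1);
}

/* True if the slot holds a tagged integer rather than a pointer. */
static inline bool sketch_is_value(const void *entry)
{
	return ((uintptr_t)entry & 1) != 0;
}

/* Recover the integer from a value entry. */
static inline unsigned long sketch_to_value(const void *entry)
{
	return (unsigned long)((uintptr_t)entry >> 1);
}
```

Shadow, swap and DAX entries in the page cache would all be integers
encoded this way, which is why a single predicate can replace
radix_tree_exceptional_entry() at the call sites.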

-- 
 Kirill A. Shutemov

Re: [PATCH v5 06/78] xarray: Change definition of sibling entries

2017-12-27 Thread Kirill A. Shutemov

On Tue, Dec 26, 2017 at 07:13:26PM -0800, Matthew Wilcox wrote:
> On Tue, Dec 26, 2017 at 08:21:53PM +0300, Kirill A. Shutemov wrote:
> > > +/**
> > > + * xa_is_internal() - Is the entry an internal entry?
> > > + * @entry: Entry retrieved from the XArray
> > > + *
> > > + * Return: %true if the entry is an internal entry.
> > > + */
> > 
> > What does "internal entry" mean? Is it just a term for a non-value and
> > non-data pointer entry? Do we allow anybody besides the xarray implementation
> > to use internal entries?
> > 
> > Do we have it documented?
> 
> We do!  include/linux/radix-tree.h has it documented right now:

Looks good. Thanks.

-- 
 Kirill A. Shutemov

Re: [PATCH V1 3/4] usb: serial: f81534: add output pin control

2017-12-27 Thread Johan Hovold

On Thu, Dec 21, 2017 at 05:49:45PM +0800, Ji-Ze Hong (Peter Hong) wrote:
> Hi Johan,
> 
> Johan Hovold 於 2017/12/19 上午 12:06 寫道:
> > On Thu, Nov 16, 2017 at 03:46:08PM +0800, Ji-Ze Hong (Peter Hong) wrote:
> >> +static int f81534_set_port_output_pin(struct usb_serial_port *port)
> >> +{
> >> +  struct f81534_serial_private *serial_priv;
> >> +  struct f81534_port_private *port_priv;
> >> +  struct usb_serial *serial;
> >> +  const struct f81534_port_out_pin *pins;
> >> +  int status;
> >> +  int i;
> >> +  u8 value;
> >> +  u8 idx;
> >> +
> >> +  serial = port->serial;
> >> +  serial_priv = usb_get_serial_data(serial);
> >> +  port_priv = usb_get_serial_port_data(port);
> >> +
> >> +  idx = F81534_CONF_GPIO_OFFSET + port_priv->phy_num;
> >> +  value = serial_priv->conf_data[idx];
> >> +  pins = &f81534_port_out_pins[port_priv->phy_num];
> >> +
> >> +  for (i = 0; i < ARRAY_SIZE(pins->pin); ++i) {
> >> +  status = f81534_set_mask_register(serial,
> >> +  pins->pin[i].reg_addr, pins->pin[i].reg_mask,
> >> +  value & BIT(i) ? pins->pin[i].reg_mask : 0);
> >> +  if (status)
> >> +  return status;
> >> +  }
> > 
> > You're using 24 (get or set) accesses to update these three registers
> > here. Why not read them out (if necessary), determine their new values
> > and then write them back when done instead?
> > 
> 
> In this code, I only read/write 3 registers (0x2ae8, 0x2a90, 0x2a80),
> but some registers are read/written more than once. Should I move the
> code from port_probe() to attach() and rewrite it as:
> 	1: read the 3 registers
> 	2: update them with the desired values for the 12 pins
> 	3: write them back
> Is it ok?

Do you expect these pins to ever be changed after probe? If not, then
perhaps it can be moved to attach(), but otherwise I guess they should
be set at port_probe(). By using shadow registers, you should be able to
reduce the number of device accesses, but perhaps it's not worth the
complexity.
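
To make the shadow-register idea concrete, here is a small userspace
sketch, not f81534 code. Each register is read once into a local copy,
all pin changes are applied to the copies, and each register is written
back once. For 12 pins spread over three registers that is 6 device
accesses instead of 24. The struct names, the accessor callbacks and the
fake device are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_REGS 3

/* Hypothetical pin description: which register, which bit mask. */
struct pin_desc {
	unsigned int reg_idx;
	uint8_t mask;
};

/* Hypothetical device accessors supplied by the caller. */
struct reg_io {
	int (*read)(void *ctx, unsigned int reg_idx, uint8_t *val);
	int (*write)(void *ctx, unsigned int reg_idx, uint8_t val);
	void *ctx;
};

/* Bit i of values gives the desired state of pins[i]. */
static int set_pins_shadowed(const struct reg_io *io,
			     const struct pin_desc *pins, size_t num_pins,
			     uint32_t values)
{
	uint8_t shadow[NUM_REGS];
	unsigned int r;
	size_t i;
	int status;

	if (num_pins > 32)
		return -1;

	/* 1: read each register once into its shadow copy. */
	for (r = 0; r < NUM_REGS; r++) {
		status = io->read(io->ctx, r, &shadow[r]);
		if (status)
			return status;
	}

	/* 2: apply all pin values to the shadows, no device access. */
	for (i = 0; i < num_pins; i++) {
		if (pins[i].reg_idx >= NUM_REGS)
			return -1;
		if (values & (UINT32_C(1) << i))
			shadow[pins[i].reg_idx] |= pins[i].mask;
		else
			shadow[pins[i].reg_idx] &= (uint8_t)~pins[i].mask;
	}

	/* 3: write each register back once. */
	for (r = 0; r < NUM_REGS; r++) {
		status = io->write(io->ctx, r, shadow[r]);
		if (status)
			return status;
	}

	return 0;
}

/* Fake in-memory device that counts accesses, for trying the sketch. */
struct fake_dev {
	uint8_t regs[NUM_REGS];
	unsigned int accesses;
};

static int fake_read(void *ctx, unsigned int reg_idx, uint8_t *val)
{
	struct fake_dev *dev = ctx;

	dev->accesses++;
	*val = dev->regs[reg_idx];
	return 0;
}

static int fake_write(void *ctx, unsigned int reg_idx, uint8_t val)
{
	struct fake_dev *dev = ctx;

	dev->accesses++;
	dev->regs[reg_idx] = val;
	return 0;
}
```

Whether the saving matters depends on how long one register access takes
on the real device, which is the open question in this thread.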

Do you have a rough idea about how long these register updates take? I
was just worried that these changes will add up to really long probe
times.

Thanks,
Johan

Re: [PATCH 4.14 108/159] kvm, mm: account kvm related kmem slabs to kmemcg

2017-12-27 Thread Paolo Bonzini

On 23/12/2017 10:24, Greg Kroah-Hartman wrote:
> For many subsystems, the maintainers _never_ mark patches for stable.
> Others, they catch maybe half of the things they should be applying.
> 
> KVM is one such example of the "half" group, they mark patches as
> resolving CVE issues at times, yet don't mark them for stable.  So when
> I see a patch like this, it triggers the "oh, look, KVM doing the same
> thing again", so I take the patch and of course cc: the
> developers/maintainers so they can object if they want to.

In general there are some cases where I tend to be conservative on
applying the "stable" tag, for example:

1) sometimes I'm not very familiar with API changes in the other
subsystems (this was the case for this patch).  If I am not sure of the
amount of backporting effort required, and the bug is not super
important, I don't mark it as stable because I don't want to later drop
a complex backport on the floor.  I prefer to have fewer patches
applied, but know that the fixes are backported to all branches.

2) not all bugs are equal; a WARN_ON_ONCE from a syzkaller testcase for
example doesn't really matter to a cloud provider that uses KVM, because
invalid API usage is not controlled by the customer.  But an oops or
BUG_ON probably *will* get CCed to stable.  So some patches for
syzkaller bugs may be CCed, some may not.

IIRC the CVE that you mention was a guest user->kernel escalation, but
it didn't affect Linux guests at all, and it couldn't be fixed
completely on Windows guests because Windows has another bug in the same
area.  Plus, I knew there would be different conflicts on all LTS
branches, so I decided to not mark it for stable.  I did dutifully
provide a backport when someone (either you or Ben Hutchings) asked for
one, though.

It does happen that Radim or I forget to Cc stable, so I'm okay with you
picking up more patches than what I mark and I will happily do the
backports for you.  Still, there is some thought put into whether to CC
stable or not. :)

Thanks,

Paolo

> Over time you get to know what subsystems are like this and what are
> not.  MM is one that is really good, I almost never take a mm patch
> without being told explicitly to do so.  Others are horrible and never
> mark anything, so stuff has to be picked up manually through Sasha's
> process or through other ways.
> 
> So it's not a perfect system, but it seems to work "good enough", and if
> you ever have any questions about any patch, always feel free to ask,
> there's usually a story behind almost every one...

Re: [PATCH] spi: Add a sysfs interface to instantiate devices

2017-12-27 Thread Mark Brown

On Sat, Dec 23, 2017 at 09:58:51AM +0100, Geert Uytterhoeven wrote:

> >> > > + struct spi_board_info bi = {
> >> > > + .modalias = "spidev",

> I would make it a little bit more generic and extract the modalias from the
> string written.

Right, that'd be much better.



Re: [PATCH 0/3] mtd: spi-nor: fix DMA-unsafe buffer issue between MTD and SPI

2017-12-27 Thread Mark Brown

On Tue, Dec 26, 2017 at 06:45:28PM +, Trent Piepho wrote:

> Or, since this only fixes instances of DMA-unsafe buffers used in
> access to SPI NOR flash chips, and since there are other SPI master
> interface users, those chip specific fixes in some/all spi master
> drivers are still needed to fix transfers not originated via spi-nor? 

SPI client drivers are *supposed* to use DMA safe memory already.  How
often that happens in cases where it matters is a separate question, we
definitely have users with smaller transfers that don't do the right
thing but they're normally done using PIO anyway.


Re: [Ocfs2-devel] [PATCH] ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE

2017-12-27 Thread Gang He

Hi Jun,


> Hi Gang,
> 
> Do you mean that too many retries in the loop cost lots of CPU time and
> block the page-fault interrupt? We should not add any delay in
> ocfs2_fault(), right? And I am still a little confused about why your
> method can solve this problem.
You can see the related code in filemap_fault(): if ocfs2 fails to read a
page because it cannot get the inode lock in non-blocking mode, the VFS
layer invokes the ocfs2 readpage callback again in a loop, which leads to
a softlockup (see the back trace below).
So we should take a blocking lock, which lets the DLM grant the lock to
this node and also avoids the CPU spinning in the retry loop.
Second, based on my testing, the patch also improves efficiency when the
same file is modified frequently from multiple nodes, since lock
acquisition becomes fairer.
In fact, this code was changed by commit 1cce4df04f37 ("ocfs2: do not
lock/unlock() inode DLM lock"). Before that commit the code was the same
as in this patch, so this patch can be considered a revert of that
commit, with clearer comments added.
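
The pattern can be sketched outside ocfs2 as follows. This is a
single-threaded model, not the real DLM code: the lock type, the helper
names and SKETCH_RETRY (standing in for AOP_TRUNCATED_PAGE) are all
hypothetical, and the "blocking" call only models the hand-over that a
real lock would sleep for.

```c
#include <errno.h>

#define SKETCH_RETRY 1	/* stand-in for AOP_TRUNCATED_PAGE */

/* Toy lock that records how often the slow path was taken. */
struct sketch_lock {
	int held;		/* 1 while some owner holds the lock */
	int blocking_waits;	/* number of blocking acquisitions */
};

/* Non-blocking attempt: 0 on success, -EAGAIN if already held. */
static int sketch_trylock(struct sketch_lock *l)
{
	if (l->held)
		return -EAGAIN;
	l->held = 1;
	return 0;
}

/*
 * Blocking attempt. A real lock would sleep here until the current
 * owner releases it; the model just performs the hand-over.
 */
static int sketch_lock_blocking(struct sketch_lock *l)
{
	l->blocking_waits++;
	l->held = 1;
	return 0;
}

static void sketch_unlock(struct sketch_lock *l)
{
	l->held = 0;
}

/*
 * Try the lock without blocking. On contention, wait for it once and
 * drop it right away, then ask the caller to retry. The caller sleeps
 * in the blocking call instead of spinning through failed retries.
 * Returns 0 with the lock held, or SKETCH_RETRY with it not held.
 */
static int lock_with_page(struct sketch_lock *l)
{
	int ret = sketch_trylock(l);

	if (ret == -EAGAIN) {
		if (sketch_lock_blocking(l) == 0)
			sketch_unlock(l);
		ret = SKETCH_RETRY;
	}

	return ret;
}
```

The retry after SKETCH_RETRY normally succeeds on the fast path, because
the lock was just granted to this caller and released again.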
 
Thanks
Gang


> 
> thanks,
> Jun
> 
> On 2017/12/27 17:29, Gang He wrote:
>> If we can't get inode lock immediately in the function
>> ocfs2_inode_lock_with_page() when reading a page, we should not
>> return directly here, since this will lead to a softlockup problem.
>> The method is to get a blocking lock and immediately unlock before
>> returning, this can avoid CPU resource waste due to lots of retries,
>> and benefits fairness in getting lock among multiple nodes, increase
>> efficiency in case modifying the same file frequently from multiple
>> nodes.
>> The softlockup problem looks like,
>> Kernel panic - not syncing: softlockup: hung tasks
>> CPU: 0 PID: 885 Comm: multi_mmap Tainted: G L 4.12.14-6.1-default #1
>> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>> Call Trace:
>>   
>>   dump_stack+0x5c/0x82
>>   panic+0xd5/0x21e
>>   watchdog_timer_fn+0x208/0x210
>>   ? watchdog_park_threads+0x70/0x70
>>   __hrtimer_run_queues+0xcc/0x200
>>   hrtimer_interrupt+0xa6/0x1f0
>>   smp_apic_timer_interrupt+0x34/0x50
>>   apic_timer_interrupt+0x96/0xa0
>>   
>>  RIP: 0010:unlock_page+0x17/0x30
>>  RSP: :af154080bc88 EFLAGS: 0246 ORIG_RAX: ff10
>>  RAX: dead0100 RBX: f21e009f5300 RCX: 0004
>>  RDX: dead00ff RSI: 0202 RDI: f21e009f5300
>>  RBP:  R08:  R09: af154080bb00
>>  R10: af154080bc30 R11: 0040 R12: 993749a39518
>>  R13:  R14: f21e009f5300 R15: f21e009f5300
>>   ocfs2_inode_lock_with_page+0x25/0x30 [ocfs2]
>>   ocfs2_readpage+0x41/0x2d0 [ocfs2]
>>   ? pagecache_get_page+0x30/0x200
>>   filemap_fault+0x12b/0x5c0
>>   ? recalc_sigpending+0x17/0x50
>>   ? __set_task_blocked+0x28/0x70
>>   ? __set_current_blocked+0x3d/0x60
>>   ocfs2_fault+0x29/0xb0 [ocfs2]
>>   __do_fault+0x1a/0xa0
>>   __handle_mm_fault+0xbe8/0x1090
>>   handle_mm_fault+0xaa/0x1f0
>>   __do_page_fault+0x235/0x4b0
>>   trace_do_page_fault+0x3c/0x110
>>   async_page_fault+0x28/0x30
>>  RIP: 0033:0x7fa75ded638e
>>  RSP: 002b:7ffd6657db18 EFLAGS: 00010287
>>  RAX: 55c7662fb700 RBX: 0001 RCX: 55c7662fb700
>>  RDX: 1770 RSI: 7fa75e909000 RDI: 55c7662fb700
>>  RBP: 0003 R08: 000e R09: 
>>  R10: 0483 R11: 7fa75ded61b0 R12: 7fa75e90a770
>>  R13: 000e R14: 1770 R15: 
>> 
>> Fixes: 1cce4df04f37 ("ocfs2: do not lock/unlock() inode DLM lock")
>> Signed-off-by: Gang He 
>> ---
>>  fs/ocfs2/dlmglue.c | 9 +
>>  1 file changed, 9 insertions(+)
>> 
>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>> index 4689940..5193218 100644
>> --- a/fs/ocfs2/dlmglue.c
>> +++ b/fs/ocfs2/dlmglue.c
>> @@ -2486,6 +2486,15 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
>>  ret = ocfs2_inode_lock_full(inode, ret_bh, ex, OCFS2_LOCK_NONBLOCK);
>>  if (ret == -EAGAIN) {
>>  unlock_page(page);
>> +/*
>> + * If we can't get inode lock immediately, we should not return
>> + * directly here, since this will lead to a softlockup problem.
>> + * The method is to get a blocking lock and immediately unlock
>> + * before returning, this can avoid CPU resource waste due to
>> + * lots of retries, and benefits fairness in getting lock.
>> + */
>> +if (ocfs2_inode_lock(inode, ret_bh, ex) == 0)
>> +ocfs2_inode_unlock(inode, ex);
>>  ret = AOP_TRUNCATED_PAGE;
>>  }
>>  
>>

Applied "regmap: debugfs: document why we don't create the debugfs entries" to the regmap tree

2017-12-27 Thread Mark Brown

The patch

   regmap: debugfs: document why we don't create the debugfs entries

has been applied to the regmap tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap.git 

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

From 078711d7f88d33b0adebb402a1bcb2aa89afe68b Mon Sep 17 00:00:00 2001
From: Bartosz Golaszewski 
Date: Fri, 22 Dec 2017 18:42:08 +0100
Subject: [PATCH] regmap: debugfs: document why we don't create the debugfs
 entries

This is a follow-up to commit a5ba91c380b8 ("regmap: debugfs: emit a
debug message when locking is disabled"). I figured that a user may
see this message, grep the code, come to this place and he still won't
know why we actually disabled debugfs.

Add a comment explaining the reason.

Signed-off-by: Bartosz Golaszewski 
Signed-off-by: Mark Brown 
---
 drivers/base/regmap/regmap-debugfs.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/base/regmap/regmap-debugfs.c b/drivers/base/regmap/regmap-debugfs.c
index ae962b756863..f3266334063e 100644
--- a/drivers/base/regmap/regmap-debugfs.c
+++ b/drivers/base/regmap/regmap-debugfs.c
@@ -529,6 +529,13 @@ void regmap_debugfs_init(struct regmap *map, const char *name)
struct regmap_range_node *range_node;
const char *devname = "dummy";
 
+   /*
+* Userspace can initiate reads from the hardware over debugfs.
+* Normally internal regmap structures and buffers are protected with
+* a mutex or a spinlock, but if the regmap owner decided to disable
+* all locking mechanisms, this is no longer the case. For safety:
+* don't create the debugfs entries if locking is disabled.
+*/
if (map->debugfs_disable) {
dev_dbg(map->dev, "regmap locking disabled - not creating debugfs entries\n");
return;
-- 
2.15.0

Applied "regmap: Add one flag to indicate if a hwlock should be used" to the regmap tree

2017-12-27 Thread Mark Brown

The patch

   regmap: Add one flag to indicate if a hwlock should be used

has been applied to the regmap tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap.git 

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

From a4887813c3a9481ab87c8a71ab1de50b975cc823 Mon Sep 17 00:00:00 2001
From: Baolin Wang 
Date: Mon, 25 Dec 2017 14:37:09 +0800
Subject: [PATCH] regmap: Add one flag to indicate if a hwlock should be used

Hwlock id 0 is valid for the hardware spinlock core, but regmap currently
treats id 0 as an invalid value. Add an extra flag to the regmap config to
indicate whether a hardware spinlock should be used, so that id 0 becomes
a valid id for regmap to request.

Signed-off-by: Baolin Wang 
Signed-off-by: Mark Brown 
---
 drivers/base/regmap/regmap.c | 2 +-
 include/linux/regmap.h   | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/base/regmap/regmap.c b/drivers/base/regmap/regmap.c
index f25ab18ca057..d23a5c99b639 100644
--- a/drivers/base/regmap/regmap.c
+++ b/drivers/base/regmap/regmap.c
@@ -671,7 +671,7 @@ struct regmap *__regmap_init(struct device *dev,
map->lock = config->lock;
map->unlock = config->unlock;
map->lock_arg = config->lock_arg;
-   } else if (config->hwlock_id) {
+   } else if (config->use_hwlock) {
map->hwlock = hwspin_lock_request_specific(config->hwlock_id);
if (!map->hwlock) {
ret = -ENXIO;
diff --git a/include/linux/regmap.h b/include/linux/regmap.h
index 15eddc1353ba..c78e0057df66 100644
--- a/include/linux/regmap.h
+++ b/include/linux/regmap.h
@@ -317,6 +317,7 @@ typedef void (*regmap_unlock)(void *);
  *
  * @ranges: Array of configuration entries for virtual address ranges.
  * @num_ranges: Number of range configuration entries.
+ * @use_hwlock: Indicate if a hardware spinlock should be used.
  * @hwlock_id: Specify the hardware spinlock id.
  * @hwlock_mode: The hardware spinlock mode, should be HWLOCK_IRQSTATE,
  *  HWLOCK_IRQ or 0.
@@ -365,6 +366,7 @@ struct regmap_config {
const struct regmap_range_cfg *ranges;
unsigned int num_ranges;
 
+   bool use_hwlock;
unsigned int hwlock_id;
unsigned int hwlock_mode;
 };
-- 
2.15.0
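
A short sketch of why the extra flag is needed, using a trimmed-down
stand-in for the config structure. The sketch_config type and the two
helper functions are hypothetical and only mirror the condition that the
patch changes in __regmap_init().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the hwlock part of the regmap config. */
struct sketch_config {
	bool use_hwlock;
	unsigned int hwlock_id;
};

/* Old check: id 0 cannot be told apart from "no hwlock requested". */
static bool wants_hwlock_old(const struct sketch_config *c)
{
	return c->hwlock_id != 0;
}

/* New check: the flag decides, so every id including 0 is usable. */
static bool wants_hwlock_new(const struct sketch_config *c)
{
	return c->use_hwlock;
}
```

With use_hwlock set and hwlock_id left at 0, the old check skips the
hardware spinlock while the new check requests it.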

Re: [PATCH v6 00/11] Intel SGX Driver

2017-12-27 Thread Dr. Greg Wettstein

On Dec 12,  3:07pm, Pavel Machek wrote:
} Subject: Re: [PATCH v6 00/11] Intel SGX Driver

Good morning, I hope this note finds the holiday season going well for
everyone.  This note is a bit delayed due to the holidays, my
apologies.

Pretty wide swath on this e-mail but will include the copy list due to
the possible general interest and impact of these issues.  We have
done an independent implementation of Intel's platform software (PSW),
directed at the use of SGX on intelligent network endpoint devices, so
we have some experience with the issues under discussion.

> On Sat 2017-11-25 21:29:17, Jarkko Sakkinen wrote:
> > Intel(R) SGX is a set of CPU instructions that can be used by
> > applications to set aside private regions of code and data. The
> > code outside the enclave is disallowed to access the memory inside
> > the enclave by the CPU access control.  In a way you can think
> > that SGX provides inverted sandbox. It protects the application
> > from a malicious host.

> Would you list guarantees provided by SGX?

Obviously, confidentiality and integrity.  SGX was designed to address
an Iago threat model, a very difficult challenge to address in
reality.

On SGX capable platforms, the Memory Encryption Engine (MEE) is an
integrated component of the hardware MMU, as SGX is a virtual memory
play.  As a result, the executable code and data are encrypted in main
memory and only decrypted when the data is fed from memory onto the
hardware fetch queues.  Regardless of anything else, this has
implications with respect to cold boot attacks, if an architect
chooses to worry about that threat modality.

In reality, we believe the guarantee that is most important is
integrity, given the issues below.

> For example, host can still observe timing of cachelines being
> accessed by "protected" app, right? Can it also introduce bit flips?

Timing attacks are the bane of SGX, just as they are throughout the
rest of the commodity architectures.  Jarkko cited Beecham's work,
which is a good reference.  Oakland's work on controlled side-channel
attacks is also a very good, and fundamental, read on the issues
involved.

Microsoft Research and Georgia Tech have a paper out discussing the
use of transactional memory to mitigate these.

I don't have the citation immediately available, but a bit-flip attack
has also been described on enclaves.  Due to the nature of the
architecture, they tend to crash the enclave so they are more in the
category of a denial-of-service attack, rather than a functional
confidentiality or integrity compromise.

At the end of the day, giving up complete observational and functional
control to an adversary is a difficult challenge to address.  There is
also a large difference between attacks that can be conducted in a
carefully controlled lab environment and what an adversary or malware
can implement in practice.

Platforms which require security assurances ultimately need a root of
trust.  That either comes from a TPM or a Trusted Execution
Environment like SGX.  Realistically, we think the future involves an
integration of both technologies.  The only other alternative is
perfect software and I think the jury has already weighed in on that.

The advantage of SGX over a TPM is that it is blindingly fast with
respect to performance.  The IMA community has been involved in a
debate over the list digest patches in order to overcome performance
issues with TPM based extension measurements.  We lifted most of the
IMA infrastructure into an SGX enclave and demonstrated significant
performance impacts as a result.

The bigger question, for community integration, is the availability of
hardware.  I see Jarkko's patches are based on the notion of having
flexible launch control available, ie. the ability to program the
relevant MSR's with the checksum of the identity modulus which is to
serve as the root of trust.  I'm not sure there is any hardware in the
wild that currently supports this, Jarkko comments?

Even with that, the question arises as to what is going to be trusted
to program those registers.  The obvious candidate for this is
TXT/tboot which underscores a future involving the integration of
these technologies.

Unfortunately, in the security field it is way more fun, and seemingly
advantageous from a reputational perspective, to break things than to
build solutions :-)(

> Pavel

I hope the above clarifications are helpful.

Best wishes for a pleasant holiday weekend to everyone.

Dr. Greg

}-- End of excerpt from Pavel Machek

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.   Specializing in information infra-structure
Fargo, ND  58102   development.
PH: 701-281-1686
FAX: 701-281-3949   EMAIL: g...@enjellic.com
--
"I suppose that could could happen but he wouldn't know a Galois Field
 if it kicked him in the nuts."

Re: [PATCH] USB: serial: ftdi_sio: add id for Airbus DS P8GR

2017-12-27 Thread Johan Hovold

On Wed, Dec 20, 2017 at 08:47:44PM +0100, Max Schulze wrote:
> Add AIRBUS_DS_P8GR device IDs to ftdi_sio driver.
> 
> Signed-off-by: Max Schulze 

Thanks for the patch. Note that I moved the new defines to try to keep
(some of) the ids sorted on VID, and dropped the comment header in the
id-table before applying. 

Johan

Re: [PATCH 10/11 v3] ARM: s3c24xx/s3c64xx: constify gpio_led

2017-12-27 Thread arvindY


Hi,

On Wednesday 27 December 2017 01:49 PM, Krzysztof Kozlowski wrote:

On Tue, Dec 26, 2017 at 7:50 PM, Arvind Yadav  wrote:

gpio_led are not supposed to change at runtime.
struct gpio_led_platform_data working with const gpio_led
provided by . So mark the non-const structs
as const.

Signed-off-by: Arvind Yadav 
---
changes in v2:
   The GPIO LED driver can be built as a module, it can
   be loaded after the init sections have gone away.
   So removed '__initconst'.
changes in v3:
  Description was missing.

  arch/arm/mach-s3c24xx/mach-h1940.c| 2 +-
  arch/arm/mach-s3c24xx/mach-rx1950.c   | 2 +-
  arch/arm/mach-s3c64xx/mach-hmt.c  | 2 +-
  arch/arm/mach-s3c64xx/mach-smartq5.c  | 2 +-
  arch/arm/mach-s3c64xx/mach-smartq7.c  | 2 +-
  arch/arm/mach-s3c64xx/mach-smdk6410.c | 2 +-
  6 files changed, 6 insertions(+), 6 deletions(-)

There were few build errors reported by kbuild for your patches. Are
you sure that you compiled every file you touch?

Best regards,
Krzysztof

Yes, I got a few build errors, which I have fixed, and I sent an updated patch.
I have now cross-checked it, and there are no build failures.

Regards
arvind

[PATCH] PCI: imx6: Add PHY reference clock source support

2017-12-27 Thread Ilya Ledvich

i.MX7D variant of the IP can use either Crystal Oscillator input
or internal clock input as a Reference Clock input for PCIe PHY.
Add support for an optional property 'pcie-phy-refclk-internal'.
If present then an internal clock input is used as PCIe PHY
reference clock source. By default an external oscillator input
is still used.
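
As a rough illustration of the selection logic (a userspace sketch, not
the driver code): the boolean property only decides whether the
reference clock select bit is set or cleared in a masked register
update. The bit position and the helper names below are assumptions
made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position, chosen for illustration only. */
#define REFCLK_SEL_BIT (1u << 5)

/* Userspace model of a masked read-modify-write register update. */
static uint32_t update_bits(uint32_t old, uint32_t mask, uint32_t val)
{
	return (old & ~mask) | (val & mask);
}

/* Map the boolean DT property to the value written under the mask. */
static uint32_t apply_refclk_sel(uint32_t gpr12, bool use_internal)
{
	return update_bits(gpr12, REFCLK_SEL_BIT,
			   use_internal ? REFCLK_SEL_BIT : 0);
}
```

With the property absent the bit is cleared, which matches the
existing default of an external oscillator.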

Verified on Compulab SBC-iMX7 Single Board Computer.

Signed-off-by: Ilya Ledvich 
---
 Documentation/devicetree/bindings/pci/fsl,imx6q-pcie.txt | 5 +
 drivers/pci/dwc/pci-imx6.c   | 8 +++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/pci/fsl,imx6q-pcie.txt b/Documentation/devicetree/bindings/pci/fsl,imx6q-pcie.txt
index 7b1e48b..f9cf11e 100644
--- a/Documentation/devicetree/bindings/pci/fsl,imx6q-pcie.txt
+++ b/Documentation/devicetree/bindings/pci/fsl,imx6q-pcie.txt
@@ -50,6 +50,11 @@ Additional required properties for imx7d-pcie:
   - "pciephy"
   - "apps"
 
+Additional optional properties for imx7d-pcie:
+- pcie-phy-refclk-internal: If present then an internal PLL input is used as
+  PCIe PHY reference clock source. By default an external oscillator input
+  is used.
+
 Example:
 
pcie@0x0100 {
diff --git a/drivers/pci/dwc/pci-imx6.c b/drivers/pci/dwc/pci-imx6.c
index b734835..a616192 100644
--- a/drivers/pci/dwc/pci-imx6.c
+++ b/drivers/pci/dwc/pci-imx6.c
@@ -61,6 +61,7 @@ struct imx6_pcie {
u32 tx_swing_low;
int link_gen;
struct regulator*vpcie;
+   boolpciephy_refclk_sel;
 };
 
 /* Parameters for the waiting for PCIe PHY PLL to lock on i.MX7 */
@@ -474,7 +475,9 @@ static void imx6_pcie_init_phy(struct imx6_pcie *imx6_pcie)
switch (imx6_pcie->variant) {
case IMX7D:
regmap_update_bits(imx6_pcie->iomuxc_gpr, IOMUXC_GPR12,
-  IMX7D_GPR12_PCIE_PHY_REFCLK_SEL, 0);
+  IMX7D_GPR12_PCIE_PHY_REFCLK_SEL,
+  imx6_pcie->pciephy_refclk_sel ?
+  IMX7D_GPR12_PCIE_PHY_REFCLK_SEL : 0);
break;
case IMX6SX:
regmap_update_bits(imx6_pcie->iomuxc_gpr, IOMUXC_GPR12,
@@ -840,6 +843,9 @@ static int imx6_pcie_probe(struct platform_device *pdev)
imx6_pcie->vpcie = NULL;
}
 
+   imx6_pcie->pciephy_refclk_sel =
+   of_property_read_bool(node, "pcie-phy-refclk-internal");
+
platform_set_drvdata(pdev, imx6_pcie);
 
ret = imx6_add_pcie_port(imx6_pcie, pdev);
-- 
1.9.1

[PATCH v2] MIPS: Use proper Return keyword

2017-12-27 Thread Mathieu Malaterre

For reference:
* https://www.kernel.org/doc/html/latest/doc-guide/kernel-doc.html#function-documentation

Fix non-fatal warning:

arch/mips/kernel/branch.c:418: warning: Excess function parameter 'returns' description in '__compute_return_epc_for_insn'
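
For illustration, a minimal kernel-doc comment using the "Return:"
section instead of a bogus "@returns:" parameter could look like the
sketch below. The function parse_digit() is a hypothetical helper
invented for this example; it is not part of branch.c.

```c
#include <errno.h>

/**
 * parse_digit() - convert an ASCII digit to its value (hypothetical helper)
 * @c: character to convert
 *
 * Return: the digit value (0-9) on success, -EINVAL if @c is not a digit.
 */
static int parse_digit(char c)
{
	if (c < '0' || c > '9')
		return -EINVAL;
	return c - '0';
}
```

Only real parameters get an "@name:" line; the return value is
described in its own section, so no excess-parameter warning is
expected for a comment shaped like this.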

Signed-off-by: Mathieu Malaterre 
---
v2: Actually use the correct keyword

 arch/mips/kernel/branch.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/kernel/branch.c b/arch/mips/kernel/branch.c
index b79ed9af9886..e48f6c0a9e4a 100644
--- a/arch/mips/kernel/branch.c
+++ b/arch/mips/kernel/branch.c
@@ -399,7 +399,7 @@ int __MIPS16e_compute_return_epc(struct pt_regs *regs)
  *
  * @regs:  Pointer to pt_regs
  * @insn:  branch instruction to decode
- * @returns:   -EFAULT on error and forces SIGILL, and on success
+ * Return: -EFAULT on error and forces SIGILL, and on success
  * returns 0 or BRANCH_LIKELY_TAKEN as appropriate after
  * evaluating the branch.
  *
-- 
2.11.0

Re: [PATCH 1/2] ARM: dts: imx6: RDU2: disable internal watchdog

2017-12-27 Thread Fabio Estevam

Hi Andrey,

On Wed, Dec 27, 2017 at 1:56 AM, Andrey Smirnov
 wrote:
> The system has an external watchdog in the environment processor
> so the internal watchdog is of no use.
>
> Cc: Sascha Hauer 
> Cc: Fabio Estevam 
> Cc: Rob Herring 
> Cc: Mark Rutland 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: devicet...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: cphe...@gmail.com
> Signed-off-by: Lucas Stach 
> Signed-off-by: Andrey Smirnov 

Patch looks good.

Just not clear if the authorship comes from you or Lucas.

If Lucas is the original author then his name should appear in the From field.

Re: [PATCH 2/2] ARM: dts: imx6: RDU2: correct RTC compatible

2017-12-27 Thread Fabio Estevam

Hi Andrey,

On Wed, Dec 27, 2017 at 1:56 AM, Andrey Smirnov
 wrote:
> The RTC is manufactured by Maxim. This is a cosmetic fix, as Linux
> doesn't match the vendor string for i2c devices.
>
> Cc: Sascha Hauer 
> Cc: Fabio Estevam 
> Cc: Rob Herring 
> Cc: Mark Rutland 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: devicet...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: cphe...@gmail.com
> Signed-off-by: Lucas Stach 
> Signed-off-by: Andrey Smirnov 

This patch seems to be from Lucas:
https://patchwork.kernel.org/patch/10099397/

so his name should appear in the From field.

Anyway, this patch has been sent earlier and we suggested to keep the
existing binding, which is the documented form:
https://patchwork.kernel.org/patch/10099397/

Re: [PATCH v2] gpio: winbond: add driver

2017-12-27 Thread Linus Walleij

On Wed, Dec 27, 2017 at 1:24 AM, William Breathitt Gray
 wrote:

> Here are the error messages printed on my system when I add a select
> ISA_BUS_API line to the GPIO_WINBOND Kconfig option in the version 2 of
> your patch:
>
> drivers/gpio/Kconfig:13:error: recursive dependency detected!
> For a resolution refer to Documentation/kbuild/kconfig-language.txt
> subsection "Kconfig recursive dependency limitations"
> drivers/gpio/Kconfig:13:symbol GPIOLIB is selected by STX104
> For a resolution refer to Documentation/kbuild/kconfig-language.txt
> subsection "Kconfig recursive dependency limitations"
> drivers/iio/adc/Kconfig:659:symbol STX104 depends on ISA_BUS_API
> For a resolution refer to Documentation/kbuild/kconfig-language.txt
> subsection "Kconfig recursive dependency limitations"
> arch/Kconfig:818:   symbol ISA_BUS_API is selected by GPIO_WINBOND
> For a resolution refer to Documentation/kbuild/kconfig-language.txt
> subsection "Kconfig recursive dependency limitations"
> drivers/gpio/Kconfig:701:   symbol GPIO_WINBOND depends on GPIOLIB

So STX104 depends on ISA_BUS_API which in turn is
selected by GPIO_WINBOND which also depends on GPIOLIB.

> The issue seems to relate to the select GPIOLIB line for the STX104
> Kconfig option (which has a ISA_BUS_API dependency). Switching GPIOLIB
> to be a dependency, or alternatively selecting ISA_BUS_API, alleviates
> the recursion.
>
> Linus, is my use of select GPIOLIB for the STX104 Kconfig option
> appropriate in this context -- or should it instead be part of the
> depends on line? The STX104 driver includes linux/gpio/driver.h and
> makes use of the devm_gpiochip_add_data function to add support for some
> minor auxiliary GPIO lines on the STX104 device.

In the STX104 case, it seems to be appropriate to
select GPIOLIB, as it is a GPIO provider, not consumer.

Usually I prefer that drivers just select what they need so I don't
have to run around in the whole kernel tree and turn things on
to the left and right before I can finally select my driver, but
maybe that is just me.

The other ISA GPIO drivers depend on ISA_BUS_API. I guess that,
unlike the symbol GPIOLIB, it cannot be universally selected,
so shouldn't this driver also just depend on ISA_BUS_API,
with the symbol selected from the machine or wherever?

Yours,
Linus Walleij

[PATCH] ASoC: rt5514-spi: Check the validity of drvdata pointer on resume

2017-12-27 Thread Marc Zyngier

The rt5514-spi driver seems to assume the validity of the drvdata pointer
on resume, although it may not be populated, leading to a not-so-nice crash.

This stems from the fact that rt5514_spi_pcm_probe() is never called on
my system (a kevin Chromebook). No idea why, but if it can happen, it
is worth fixing.
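
The guard amounts to checking the private data pointer before
dereferencing it. A minimal userspace sketch of that check follows; the
struct and function names are stand-ins made up for the example, not
the driver's own.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's private data. */
struct dsp_state {
	void *substream;	/* non-NULL while a stream is active */
};

/* True only when the private data exists and a stream is active. */
static bool resume_needs_copy(const struct dsp_state *dsp)
{
	return dsp && dsp->substream;
}
```

The short-circuit evaluation of && ensures the member access never
happens when the pointer is NULL.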

Fixes: e9c50aa6bd39 ("ASoC: rt5514-spi: check irq status to schedule data copy in resume function")
Signed-off-by: Marc Zyngier 
---
 sound/soc/codecs/rt5514-spi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/codecs/rt5514-spi.c b/sound/soc/codecs/rt5514-spi.c
index 2df91db765ac..9255afcf2c3a 100644
--- a/sound/soc/codecs/rt5514-spi.c
+++ b/sound/soc/codecs/rt5514-spi.c
@@ -482,7 +482,7 @@ static int __maybe_unused rt5514_resume(struct device *dev)
if (device_may_wakeup(dev))
disable_irq_wake(irq);
 
-   if (rt5514_dsp->substream) {
+   if (rt5514_dsp && rt5514_dsp->substream) {
rt5514_spi_burst_read(RT5514_IRQ_CTRL, (u8 *)&buf, sizeof(buf));
if (buf[0] & RT5514_IRQ_STATUS_BIT)
rt5514_schedule_copy(rt5514_dsp);
-- 
2.14.2

Re: [RESEND PATCH] blackfin: defconfig: Cleanup from old Kconfig options

2017-12-27 Thread Linus Walleij

On Tue, Dec 26, 2017 at 2:30 PM, Krzysztof Kozlowski  wrote:

> Remove old, dead Kconfig options (in order appearing in this commit):
>  - EXPERIMENTAL is gone since v3.9;
>  - INET_LRO: commit 7bbf3cae65b6 ("ipv4: Remove inet_lro library");
>  - USB_DEVICE_CLASS: commit 007bab91324e ("USB: remove
>CONFIG_USB_DEVICE_CLASS");
>  - HID_SUPPORT: commit 1f41a6a99476 ("HID: Fix the generic Kconfig
>options");
>  - NETDEV_1000 and NETDEV_1: commit f860b0522f65 ("drivers/net:
>Kconfig and Makefile cleanup"); NET_ETHERNET should be replaced with
>just ETHERNET but that is separate change;
>  - RCU_CPU_STALL_DETECTOR: commit a00e0d714fbd ("rcu: Remove conditional
>compilation for RCU CPU stall warnings");
>  - MISC_DEVICES: commit 7c5763b8453a ("drivers: misc: Remove
>MISC_DEVICES config option");
>
> Signed-off-by: Krzysztof Kozlowski 

Acked-by: Linus Walleij 

This architecture is becoming a burden :/

Last blackfin pull request was in June 2014 by Steven Miao.

Steven, what's up with this? I don't mind the arch/blackfin as much
as the plethora of boardfiles that need to be maintained as soon as
we change some platform data or so, and it's just a mystery whether
things really get tested and any of the changes made here since 2014
are screwing up the blackfin. No ACKs or Tested-by's ever appear.

Is there active testing of mainline with blackfin?

Yours,
Linus Walleij

Re: [RESEND PATCH] blackfin: defconfig: Cleanup from old Kconfig options

2017-12-27 Thread Adam Borowski

On Wed, Dec 27, 2017 at 12:29:33PM +0100, Linus Walleij wrote:
> On Tue, Dec 26, 2017 at 2:30 PM, Krzysztof Kozlowski  wrote:
> 
> > Remove old, dead Kconfig options (in order appearing in this commit):
> >
> > Signed-off-by: Krzysztof Kozlowski 
> 
> Acked-by: Linus Walleij 
> 
> This architecture is becoming a burden :/
> 
> Last blackfin pull request was in June 2014 by Steven Miao.

April 2015, actually.  Doesn't change the conclusion, though.

> Steven, what's up with this? I don't mind the arch/blackfin as much
> as the plethora of boardfiles that need to be maintained as soon as
> we change some platform data or so, and it's just a mystery whether
> things really get tested and any of the changes made here since 2014
> are screwing up the blackfin. No ACKs or Tested-by's ever appear.
> 
> Is there active testing of mainline with blackfin?

After multiple pings, I just (two days ago) sent the following:
https://marc.info/?l=linux-kernel&m=151421639229901

So even if there's no actual testing, at the very least it'd be good to mark
the architecture as orphaned, so people don't wait for ACKs that never come.


Meow!
-- 
// If you believe in so-called "intellectual property", please immediately
// cease using counterfeit alphabets.  Instead, contact the nearest temple
// of Amon, whose priests will provide you with scribal services for all
// your writing needs, for Reasonable And Non-Discriminatory prices.

RE: [PATCH 1/4] lockd: convert nlm_host.h_count from atomic_t to refcount_t

2017-12-27 Thread Reshetova, Elena

> On Fri, Dec 22, 2017 at 09:25:53AM -0500, J. Bruce Fields wrote:
> > On Fri, Dec 22, 2017 at 09:29:15AM +, Reshetova, Elena wrote:
> > >
> > > On Wed, Nov 29, 2017 at 01:15:43PM +0200, Elena Reshetova wrote:
> > > > atomic_t variables are currently used to implement reference
> > > > counters with the following properties:
> > > >  - counter is initialized to 1 using atomic_set()
> > > >  - a resource is freed upon counter reaching zero
> > > >  - once counter reaches zero, its further
> > > >increments aren't allowed
> > > >  - counter schema uses basic atomic operations
> > > >(set, inc, inc_not_zero, dec_and_test, etc.)
> > >
> > > >Whoops, I forgot that this doesn't apply to h_count.
> > >
> > > >Well, it's confusing, because h_count is actually used in two different
> > > >ways: depending on whether a nlm_host represents a client or server, it
> > > >may have the above properties or not.
> > >
> > >
> > > So, what happens when it is not having the above properties? Is the object
> > > being reused or?
> >
> > The object isn't destroyed when the counter hits zero--zero is just
> > taken as a hint to some garbage collection algorithm that it would be OK
> > to destroy it.  So decrementing to or incrementing from zero is OK.
> 
> In more detail: the nlm_host objects that are used on the NFS server to
> represent NFS clients are put by nlmsvc_release_host, and then may
> eventually be freed by nlm_gc_hosts.
> 
> The nlm_host objects that are used on the NFS client to represent NFS
> servers are put (and freed when h_count goes to zero) by
> nlmclnt_release_host.
> 
> In both cases reference are taken by nlm_get_host.  It would be possible
> to replace nlm_get_host by two different functions if that would help.
> Most callers are obviously only client-side or server-side.  The only
> exception is next_host_state.  It could be passed a pointer to the "get"
> function it should use.
> 
> After that we might actually just want to define separate client and
> server structs like:
> 
>   struct nlm_clnt_host {
>   struct nlm_host ch_host;
>   refcount_t  ch_count;
>   ...
>   }
> 
>   struct nlm_srv_host {
>   struct nlm_host sh_host;
>   refcount_t  sh_count;
>   ...
>   }
> 
> rather than have a single h_count which is used in two confusingly
> different ways.  There are also some other nlm_host fields that really
> only make sense for client or server.

This sounds reasonable to me, but obviously it is a bigger change and I might
not have enough knowledge of NFS to make it correctly.

In any case, even for the current server case, where freeing might not happen
and the object gets re-used later on, is it possible to simply re-initialize
the object (and its reference counter) properly before reusing it?
I think this is the only thing that is needed from the correct refcounting POV
in this case, so instead of using refcount_inc() on a reused object, you would
explicitly do refcount_set(counter, 1) when reuse happens.
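
To make the suggestion concrete, here is a small userspace model of the
idea. It is only a sketch: it is not atomic, it is not the kernel's
refcount_t, and all names in it are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical, non-atomic model of a reference counter. */
struct ref {
	unsigned int count;
};

static void ref_set(struct ref *r, unsigned int n)
{
	r->count = n;
}

/* Refuse to increment from zero: zero means "already released". */
static bool ref_inc_not_zero(struct ref *r)
{
	if (r->count == 0)
		return false;
	r->count++;
	return true;
}

/* Returns true when the last reference was dropped. */
static bool ref_dec_and_test(struct ref *r)
{
	return --r->count == 0;
}

/* Reuse path: re-initialize explicitly instead of incrementing from zero. */
static void ref_reuse(struct ref *r)
{
	ref_set(r, 1);
}
```

In this model an object whose counter has reached zero can only come
back to life through ref_reuse(), which makes the reuse point explicit.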


Best Regards,
Elena
> 
> --b.

[PATCH] clk: qcom: Add support for controlling Fabia PLL

2017-12-27 Thread Amit Nischal

Fabia PLL is a Digital Frequency Locked Loop (DFLL) clock
generator which has a wide range of frequency output. It
supports dynamic updating of the output frequency
("frequency slewing") without the need to turn off the PLL
before configuration. Add support for the initial configuration
and programming sequence to control Fabia PLLs.

Signed-off-by: Amit Nischal 
---
 drivers/clk/qcom/clk-alpha-pll.c | 305 +++
 drivers/clk/qcom/clk-alpha-pll.h |  16 ++
 2 files changed, 321 insertions(+)

diff --git a/drivers/clk/qcom/clk-alpha-pll.c b/drivers/clk/qcom/clk-alpha-pll.c
index ad7478b..947607d 100644
--- a/drivers/clk/qcom/clk-alpha-pll.c
+++ b/drivers/clk/qcom/clk-alpha-pll.c
@@ -58,6 +58,8 @@
 #define PLL_TEST_CTL(p)  ((p)->offset + (p)->regs[PLL_OFF_TEST_CTL])
 #define PLL_TEST_CTL_U(p)  ((p)->offset + (p)->regs[PLL_OFF_TEST_CTL_U])
 #define PLL_STATUS(p)  ((p)->offset + (p)->regs[PLL_OFF_STATUS])
+#define PLL_OPMODE(p)  ((p)->offset + (p)->regs[PLL_OFF_OPMODE])
+#define PLL_FRAC(p)  ((p)->offset + (p)->regs[PLL_OFF_FRAC])

 const u8 clk_alpha_pll_regs[][PLL_OFF_MAX_REGS] = {
[CLK_ALPHA_PLL_TYPE_DEFAULT] =  {
@@ -90,6 +92,18 @@
[PLL_OFF_TEST_CTL] = 0x1c,
[PLL_OFF_STATUS] = 0x24,
},
+   [CLK_ALPHA_PLL_TYPE_FABIA] =  {
+   [PLL_OFF_L_VAL] = 0x04,
+   [PLL_OFF_USER_CTL] = 0x0c,
+   [PLL_OFF_USER_CTL_U] = 0x10,
+   [PLL_OFF_CONFIG_CTL] = 0x14,
+   [PLL_OFF_CONFIG_CTL_U] = 0x18,
+   [PLL_OFF_TEST_CTL] = 0x1c,
+   [PLL_OFF_TEST_CTL_U] = 0x20,
+   [PLL_OFF_STATUS] = 0x24,
+   [PLL_OFF_OPMODE] = 0x2c,
+   [PLL_OFF_FRAC] = 0x38,
+   },
 };

 /*
@@ -107,6 +121,12 @@
 #define PLL_HUAYRA_N_MASK  0xff
 #define PLL_HUAYRA_ALPHA_WIDTH 16

+#define FABIA_OPMODE_STANDBY   0x0
+#define FABIA_OPMODE_RUN   0x1
+
+#define FABIA_PLL_OUT_MASK 0x7
+#define FABIA_PLL_RATE_MARGIN  500
+
 #define pll_alpha_width(p) \
((PLL_ALPHA_VAL_U(p) - PLL_ALPHA_VAL(p) == 4) ? \
 ALPHA_REG_BITWIDTH : ALPHA_REG_16BIT_WIDTH)
@@ -819,3 +839,288 @@ static int clk_alpha_pll_postdiv_set_rate(struct clk_hw *hw, unsigned long rate,
.recalc_rate = clk_alpha_pll_postdiv_recalc_rate,
 };
 EXPORT_SYMBOL_GPL(clk_alpha_pll_postdiv_ro_ops);
+
+void clk_fabia_pll_configure(struct clk_alpha_pll *pll, struct regmap *regmap,
+const struct alpha_pll_config *config)
+{
+   u32 val, mask;
+
+   if (config->l)
+   regmap_write(regmap, PLL_L_VAL(pll), config->l);
+
+   if (config->alpha)
+   regmap_write(regmap, PLL_FRAC(pll), config->alpha);
+
+   if (config->config_ctl_val)
+   regmap_write(regmap, PLL_CONFIG_CTL(pll),
+   config->config_ctl_val);
+
+   if (config->post_div_mask) {
+   mask = config->post_div_mask;
+   val = config->post_div_val;
+   regmap_update_bits(regmap, PLL_USER_CTL(pll), mask, val);
+   }
+
+   regmap_update_bits(regmap, PLL_MODE(pll), PLL_UPDATE_BYPASS,
+   PLL_UPDATE_BYPASS);
+
+   regmap_update_bits(regmap, PLL_MODE(pll), PLL_RESET_N, PLL_RESET_N);
+}
+
+static int alpha_pll_fabia_enable(struct clk_hw *hw)
+{
+   int ret;
+   struct clk_alpha_pll *pll = to_clk_alpha_pll(hw);
+   u32 val, opmode_val;
+
+   ret = regmap_read(pll->clkr.regmap, PLL_MODE(pll), &val);
+   if (ret)
+   return ret;
+
+   /* If in FSM mode, just vote for it */
+   if (val & PLL_VOTE_FSM_ENA) {
+   ret = clk_enable_regmap(hw);
+   if (ret)
+   return ret;
+   return wait_for_pll_enable_active(pll);
+   }
+
+   /* Read opmode value */
+   ret = regmap_read(pll->clkr.regmap, PLL_OPMODE(pll), &opmode_val);
+   if (ret)
+   return ret;
+
+   /* Skip if PLL is already running */
+   if ((opmode_val & FABIA_OPMODE_RUN) && (val & PLL_OUTCTRL))
+   return 0;
+
+   /* Disable PLL output */
+   ret = regmap_update_bits(pll->clkr.regmap, PLL_MODE(pll),
+   PLL_OUTCTRL, 0);
+   if (ret)
+   return ret;
+
+   /* Set Operation mode to STANDBY */
+   ret = regmap_write(pll->clkr.regmap, PLL_OPMODE(pll),
+   FABIA_OPMODE_STANDBY);
+   if (ret)
+   return ret;
+
+   /* PLL should be in STANDBY mode before continuing */
+   mb();
+
+   /* Bring PLL out of reset */
+   ret = regmap_update_bits(pll->clkr.regmap, PLL_MODE(pll),
+   PLL_RESET_N, PLL_RES

[PATCH IMPROVEMENT/BUGFIX 1/1] block, bfq: limit tags for writes and async I/O

2017-12-27 Thread Paolo Valente

Asynchronous I/O can easily starve synchronous I/O (both sync reads
and sync writes), by consuming all request tags. Similarly, storms of
synchronous writes, such as those that sync(2) may trigger, can starve
synchronous reads. In their turn, these two problems may also cause
BFQ to lose control of latency for interactive and soft real-time
applications. For example, on a PLEXTOR PX-256M5S SSD, LibreOffice
Writer takes 0.6 seconds to start if the device is idle, but it takes
more than 45 seconds (!) if there are sequential writes in the
background.

This commit addresses this issue by limiting the maximum percentage of
tags that asynchronous I/O requests and synchronous write requests can
consume. In particular, this commit grants a higher threshold to
synchronous writes, to prevent the latter from being starved by
asynchronous I/O.
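
As a sketch of the arithmetic involved (userspace code with
hypothetical helper names; the fractions are the ones stated in the
comments of the patch, i.e. 50% and 75% without weight-raising, and
roughly 18% and 37% with it):

```c
/* Hypothetical helper, standing in for the kernel's max(). */
static unsigned int umax(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/*
 * depths[wr][sync]: wr is 1 if some queue is weight-raised,
 * sync is 0 for async I/O and 1 for sync writes.
 */
static void compute_depths(unsigned int shift, unsigned int depths[2][2])
{
	unsigned int word = 1U << shift;	/* tags per bitmap word */

	depths[0][0] = umax(word >> 1, 1U);		/* 50% */
	depths[0][1] = umax((word * 3) >> 2, 1U);	/* 75% */
	depths[1][0] = umax((word * 3) >> 4, 1U);	/* ~18% */
	depths[1][1] = umax((word * 6) >> 4, 1U);	/* ~37% */
}
```

Shifting the full word size right, rather than subtracting from the
shift, keeps the computation well defined for small shift values; the
lower bound of 1 guarantees that every class can get at least one tag.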

According to the above test, LibreOffice Writer now starts in about
1.2 seconds on average, regardless of the background workload, and
apart from rare outliers. To check this improvement, run, e.g.,
sudo ./comm_startup_lat.sh bfq 5 5 seq 10 "lowriter --terminate_after_init"
for the comm_startup_lat benchmark in the S suite [1].

[1] https://github.com/Algodev-github/S

Signed-off-by: Paolo Valente 
---
 block/bfq-iosched.c | 77 +
 block/bfq-iosched.h | 12 +
 2 files changed, 89 insertions(+)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index e33c5c4c9856..6f75015d18c0 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -417,6 +417,82 @@ static struct request *bfq_choose_req(struct bfq_data *bfqd,
}
 }
 
+/*
+ * See the comments on bfq_limit_depth for the purpose of
+ * the depths set in the function.
+ */
+static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt)
+{
+   bfqd->sb_shift = bt->sb.shift;
+
+   /*
+* In-word depths if no bfq_queue is being weight-raised:
+* leaving 25% of tags only for sync reads.
+*
+* In next formulas, right-shift the value
+* (1U<<bfqd->sb_shift), instead of computing directly
+* (1U<<(bfqd->sb_shift - something)), to be robust against
+* any possible value of bfqd->sb_shift, without having to
+* limit 'something'.
+*/
+   /* no more than 50% of tags for async I/O */
+   bfqd->word_depths[0][0] = max((1U<<bfqd->sb_shift)>>1, 1U);
+   /*
+* no more than 75% of tags for sync writes (25% extra tags
+* w.r.t. async I/O, to prevent async I/O from starving sync
+* writes)
+*/
+   bfqd->word_depths[0][1] = max(((1U<<bfqd->sb_shift) * 3)>>2, 1U);
+
+   /*
+* In-word depths in case some bfq_queue is being weight-
+* raised: leaving ~63% of tags for sync reads. This is the
+* highest percentage for which, in our tests, application
+* start-up times didn't suffer from any regression due to tag
+* shortage.
+*/
+   /* no more than ~18% of tags for async I/O */
+   bfqd->word_depths[1][0] = max(((1U<<bfqd->sb_shift) * 3)>>4, 1U);
+   /* no more than ~37% of tags for sync writes (~20% extra tags) */
+   bfqd->word_depths[1][1] = max(((1U<<bfqd->sb_shift) * 6)>>4, 1U);
+}
+
+/*
+ * Async I/O can easily starve sync I/O (both sync reads and sync
+ * writes), by consuming all tags. Similarly, storms of sync writes,
+ * such as those that sync(2) may trigger, can starve sync reads.
+ * Limit depths of async I/O and sync writes so as to counter both
+ * problems.
+ */
+static void bfq_limit_depth(unsigned int op, struct blk_mq_alloc_data *data)
+{
+   struct blk_mq_tags *tags = blk_mq_tags_from_data(data);
+   struct bfq_data *bfqd = data->q->elevator->elevator_data;
+   struct sbitmap_queue *bt;
+
+   if (op_is_sync(op) && !op_is_write(op))
+   return;
+
+   if (data->flags & BLK_MQ_REQ_RESERVED) {
+   if (unlikely(!tags->nr_reserved_tags)) {
+   WARN_ON_ONCE(1);
+   return;
+   }
+   bt = &tags->breserved_tags;
+   } else
+   bt = &tags->bitmap_tags;
+
+   if (unlikely(bfqd->sb_shift != bt->sb.shift))
+   bfq_update_depths(bfqd, bt);
+
+   data->shallow_depth =
+   bfqd->word_depths[!!bfqd->wr_busy_queues][op_is_sync(op)];
+
+   bfq_log(bfqd, "[%s] wr_busy %d sync %d depth %u",
+   __func__, bfqd->wr_busy_queues, op_is_sync(op),
+   data->shallow_depth);
+}
+
 static struct bfq_queue *
 bfq_rq_pos_tree_lookup(struct bfq_data *bfqd, struct rb_root *root,
 sector_t sector, struct rb_node **ret_parent,
@@ -5265,6 +5341,7 @@ static struct elv_fs_entry bfq_attrs[] = {
 
 static struct elevator_type iosched_bfq_mq = {
.ops.mq = {
+   .limit_depth= bfq_limit_depth,
.prepare_request= bfq_prepare_request,
.finish_

[PATCH IMPROVEMENT/BUGFIX 0/1] block, bfq: address starvation caused by tag consumption

2017-12-27 Thread Paolo Valente

Hi Jens, all,
here's the patch I anticipated in my last email. It addresses
(serious) starvation problems caused by request-tag exhaustion, as
explained in more detail in the commit message. I started from the
solution in the function kyber_limit_depth, but then I had to define
more articulate limits, to counter starvation also in cases not
covered in kyber_limit_depth.

If this solution proves to be effective, I'm willing to port it
somehow to the other schedulers.

Thanks,
Paolo

Paolo Valente (1):
  block, bfq: limit tags for writes and async I/O

 block/bfq-iosched.c | 77 +
 block/bfq-iosched.h | 12 +
 2 files changed, 89 insertions(+)

--
2.15.1

Re: [PATCH] clk: mediatek: remove unnecessary include header from reset.c

2017-12-27 Thread Jean Delvare

Hi Sean, Stephen,

On Wed, 27 Dec 2017 11:33:00 +0800, Sean Wang wrote:
> On Tue, 2017-12-26 at 17:10 -0800, Stephen Boyd wrote:
> > drivers/clk/mediatek/reset.c:64:6: warning: symbol 'mtk_register_reset_controller' was not declared. Should it be static?
> 
> It cannot be static since the function would be referenced in other
> files under the same folder
> 
> 
> One point that confused me is that I didn't see the warning
> when I did these build tests, even after adding -Werror and -Wall to
> build all files under drivers/clk/mediatek. My toolchain is based on gcc
> version 5.2.0 (GCC).

I tested and I get the warning here (gcc 4.8.5 on SUSE) but only after
setting CONFIG_RESET_CONTROLLER=y. Without it,
drivers/clk/mediatek/reset.o is never built, so no warning can be
generated.

> If the warning is still there, the include "clk-mtk.h" should stay
> because the declaration it needs is in clk-mtk.h

Agreed.

-- 
Jean Delvare
SUSE L3 Support

Re: [PATCH v2] x86/kexec: Exclude GART aperture from vmcore

2017-12-27 Thread Borislav Petkov

On Wed, Dec 27, 2017 at 03:44:49PM +0800, Baoquan He wrote:
> > yes, instead of crashing the machine (because GART may be initialized in the
> > 2nd kernel, overlapping the 1st kernel memory, which the 2nd kernel with its
> > fake e820 map sees as unused).
> > 
> > I'd say this is an improvement.
> 
> I don't get what you said. If 'iommu=off' only specified in 1st kernel,
> kdump kernel will think the memory which GART bar pointed as a hole.
> This is incorrect. I don't see the improvement.

So he says, this memory is unused. Why is that incorrect?!?

Why do I care about dumping unused memory?!?!

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

[PATCH 0/2] Add efuse driver for Ingenic JZ4780 SoC

2017-12-27 Thread Mathieu Malaterre

This patchset brings support for read-only access to the JZ4780 efuse as found
on MIPS Creator CI20.

To keep the driver as simple as possible, it was not possible to re-use most of
the nvmem core functionalities. This driver is not compatible with the original
efuse driver as found in the custom linux kernel from upstream (1); in
particular, it exposes neither
`/sys/devices/platform/*/chip_id` nor `/sys/devices/platform/*/user_id` to users.

The goal of this driver is to provide access to the MAC address to the dm9000
driver.

(1) https://github.com/ZubairLK/CI20_linux/commit/6efd4ffca7dcfaff0794ab60cd6922ce96c60419

Mathieu Malaterre (1):
  dts: Probe efuse for CI20

PrasannaKumar Muralidharan (1):
  nvmem: add driver for JZ4780 efuse

 .../ABI/testing/sysfs-driver-jz4780-efuse  |  16 ++
 .../bindings/nvmem/ingenic,jz4780-efuse.txt|  17 ++
 MAINTAINERS|   5 +
 arch/mips/boot/dts/ingenic/jz4780.dtsi |  40 ++-
 arch/mips/configs/ci20_defconfig   |   2 +
 drivers/nvmem/Kconfig  |  10 +
 drivers/nvmem/Makefile |   2 +
 drivers/nvmem/jz4780-efuse.c   | 274 +
 8 files changed, 354 insertions(+), 12 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-driver-jz4780-efuse
 create mode 100644 Documentation/devicetree/bindings/nvmem/ingenic,jz4780-efuse.txt
 create mode 100644 drivers/nvmem/jz4780-efuse.c

-- 
2.11.0

[PATCH 1/2] nvmem: add driver for JZ4780 efuse

2017-12-27 Thread Mathieu Malaterre

From: PrasannaKumar Muralidharan 

This patch brings support for the JZ4780 efuse. Currently it only exposes
read-only access to the entire 8K bits efuse memory.
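
For reference, the segment layout described in the ABI document of this
patch can be modelled as a small lookup table. This is a userspace
sketch only; the struct and function names are hypothetical and not
taken from the driver.

```c
#include <stddef.h>

struct efuse_segment {
	unsigned int offset;	/* byte offset into the efuse */
	unsigned int bits;	/* segment size in bits */
	const char *name;
};

/* Offsets and sizes as listed in the ABI document. */
static const struct efuse_segment efuse_map[] = {
	{ 0x000,   64, "Random Number" },
	{ 0x008,  128, "Ingenic Chip ID" },
	{ 0x018,  128, "Customer ID" },
	{ 0x028, 3520, "Reserved" },
	{ 0x1E0,    8, "Protect Segment" },
	{ 0x1E1, 2296, "HDMI Key" },
	{ 0x300, 2048, "Security boot key" },
};

#define EFUSE_NSEG (sizeof(efuse_map) / sizeof(efuse_map[0]))

/* Sum of all segment sizes, in bits. */
static unsigned int efuse_total_bits(void)
{
	unsigned int total = 0;
	size_t i;

	for (i = 0; i < EFUSE_NSEG; i++)
		total += efuse_map[i].bits;
	return total;
}

/* Find the segment containing a byte offset, or NULL if out of range. */
static const struct efuse_segment *efuse_find(unsigned int byte_off)
{
	size_t i;

	for (i = 0; i < EFUSE_NSEG; i++) {
		const struct efuse_segment *s = &efuse_map[i];

		if (byte_off >= s->offset &&
		    byte_off < s->offset + s->bits / 8)
			return s;
	}
	return NULL;
}
```

The segment sizes add up to 8192 bits, consistent with the 8K bits
mentioned above.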

Tested-by: Mathieu Malaterre 
Signed-off-by: PrasannaKumar Muralidharan 
---
 .../ABI/testing/sysfs-driver-jz4780-efuse  |  16 ++
 .../bindings/nvmem/ingenic,jz4780-efuse.txt|  17 ++
 MAINTAINERS|   5 +
 arch/mips/boot/dts/ingenic/jz4780.dtsi |  40 ++-
 drivers/nvmem/Kconfig  |  10 +
 drivers/nvmem/Makefile |   2 +
 drivers/nvmem/jz4780-efuse.c   | 274 +
 7 files changed, 352 insertions(+), 12 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-driver-jz4780-efuse
 create mode 100644 Documentation/devicetree/bindings/nvmem/ingenic,jz4780-efuse.txt
 create mode 100644 drivers/nvmem/jz4780-efuse.c

diff --git a/Documentation/ABI/testing/sysfs-driver-jz4780-efuse b/Documentation/ABI/testing/sysfs-driver-jz4780-efuse
new file mode 100644
index ..bb6f5d6ceea0
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-driver-jz4780-efuse
@@ -0,0 +1,16 @@
+What:  /sys/devices/*//nvmem
+Date:  December 2017
+Contact:   PrasannaKumar Muralidharan 
+Description:   read-only access to the efuse on the Ingenic JZ4780 SoC
+   The SoC has a one time programmable 8K efuse that is
+   split into segments. The driver supports read only.
+   The segments are
+   0x000   64 bit Random Number
+   0x008  128 bit Ingenic Chip ID
+   0x018  128 bit Customer ID
+   0x028 3520 bit Reserved
+   0x1E0    8 bit Protect Segment
+   0x1E1 2296 bit HDMI Key
+   0x300 2048 bit Security boot key
+Users: any user space application which wants to read the Chip
+   and Customer ID
diff --git a/Documentation/devicetree/bindings/nvmem/ingenic,jz4780-efuse.txt b/Documentation/devicetree/bindings/nvmem/ingenic,jz4780-efuse.txt
new file mode 100644
index ..cd6d67ec22fc
--- /dev/null
+++ b/Documentation/devicetree/bindings/nvmem/ingenic,jz4780-efuse.txt
@@ -0,0 +1,17 @@
+Ingenic JZ EFUSE driver bindings
+
+Required properties:
+- "compatible" Must be set to "ingenic,jz4780-efuse"
+- "reg"Register location and length
+- "clocks" Handle for the ahb clock for the efuse.
+- "clock-names"Must be "bus_clk"
+
+Example:
+
+efuse: efuse@134100d0 {
+   compatible = "ingenic,jz4780-efuse";
+   reg = <0x134100D0 0xFF>;
+
+   clocks = <&cgu JZ4780_CLK_AHB2>;
+   clock-names = "bus_clk";
+};
diff --git a/MAINTAINERS b/MAINTAINERS
index a6e86e20761e..7a050c20c533 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6902,6 +6902,11 @@ M:   Zubair Lutfullah Kakakhel 

 S: Maintained
 F: drivers/dma/dma-jz4780.c
 
+INGENIC JZ4780 EFUSE Driver
+M: PrasannaKumar Muralidharan 
+S: Maintained
+F: drivers/nvmem/jz4780-efuse.c
+
 INGENIC JZ4780 NAND DRIVER
 M: Harvey Hunt 
 L: linux-...@lists.infradead.org
diff --git a/arch/mips/boot/dts/ingenic/jz4780.dtsi b/arch/mips/boot/dts/ingenic/jz4780.dtsi
index 9b5794667aee..3fb9d916a2ea 100644
--- a/arch/mips/boot/dts/ingenic/jz4780.dtsi
+++ b/arch/mips/boot/dts/ingenic/jz4780.dtsi
@@ -224,21 +224,37 @@
reg = <0x10002000 0x100>;
};
 
-   nemc: nemc@1341 {
-   compatible = "ingenic,jz4780-nemc";
-   reg = <0x1341 0x1>;
-   #address-cells = <2>;
+
+   ahb2: ahb2 {
+   compatible = "simple-bus";
+   #address-cells = <1>;
#size-cells = <1>;
-   ranges = <1 0 0x1b00 0x100
- 2 0 0x1a00 0x100
- 3 0 0x1900 0x100
- 4 0 0x1800 0x100
- 5 0 0x1700 0x100
- 6 0 0x1600 0x100>;
+   ranges = <>;
+
+   nemc: nemc@1341 {
+   compatible = "ingenic,jz4780-nemc";
+   reg = <0x1341 0x1>;
+   #address-cells = <2>;
+   #size-cells = <1>;
+   ranges = <1 0 0x1b00 0x100
+ 2 0 0x1a00 0x100
+ 3 0 0x1900 0x100
+ 4 0 0x1800 0x100
+ 5 0 0x1700 0x100
+ 6 0 0x1600 0x100>;
+
+   clocks = <&cgu JZ4780_CLK_NEMC>;
+
+   status = "disabled";
+   };
 
-   clocks = <&cgu JZ4780_CLK_NEMC>;
+   efuse: efuse@134100d0 {
+   compatible

[PATCH 2/2] dts: Probe efuse for CI20

2017-12-27 Thread Mathieu Malaterre

Signed-off-by: Mathieu Malaterre 
---
 arch/mips/configs/ci20_defconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/mips/configs/ci20_defconfig b/arch/mips/configs/ci20_defconfig
index b5f4ad8f2c45..62c63617e97a 100644
--- a/arch/mips/configs/ci20_defconfig
+++ b/arch/mips/configs/ci20_defconfig
@@ -171,3 +171,5 @@ CONFIG_STACKTRACE=y
 # CONFIG_FTRACE is not set
 CONFIG_CMDLINE_BOOL=y
 CONFIG_CMDLINE="earlycon console=ttyS4,115200 clk_ignore_unused"
+CONFIG_NVMEM=y
+CONFIG_JZ4780_EFUSE=y
-- 
2.11.0

[RFC PATCH] memory-hotplug: add sysfs immovable_mem attribute

2017-12-27 Thread Chao Fan

Sometimes users specify the memory regions in immovable nodes on the
kernel command line, such as with "kernel_core" or the "immovable_mem="
option in the patchset that I have sent. But users don't know where those
memory regions are. So add this interface to print them.

The output looks like this: "nn@ss,nn@ss,...". "nn" is the size of a memory
region and "ss" is the start address of that region.

Signed-off-by: Chao Fan 
---
 drivers/base/memory.c | 50 ++
 1 file changed, 50 insertions(+)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 1d60b58a8c19..9cadf1a9dccb 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -25,6 +25,7 @@
 
 #include 
 #include 
+#include 
 
 static DEFINE_MUTEX(mem_sysfs_mutex);
 
@@ -389,6 +390,52 @@ static ssize_t show_phys_device(struct device *dev,
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
+/*
+ * Immovable memory region
+ */
+
+static ssize_t
+show_immovable_mem(struct device *dev, struct device_attribute *attr,
+  char *buf)
+{
+   struct acpi_table_header *table_header = NULL;
+   struct acpi_srat_mem_affinity *ma;
+   struct acpi_subtable_header *th;
+   unsigned long long table_size;
+   unsigned long long table_end;
+   char pbuf[35], *p = buf;
+   int len;
+
+   acpi_get_table(ACPI_SIG_SRAT, 0, &table_header);
+
+   table_size = sizeof(struct acpi_table_srat);
+   table_end = (unsigned long)table_header + table_header->length;
+   th = (struct acpi_subtable_header *)((unsigned long)
+ table_header + table_size);
+
+   while (((unsigned long)th) +
+  sizeof(struct acpi_subtable_header) < table_end) {
+   if (th->type == 1) {
+   ma = (struct acpi_srat_mem_affinity *)th;
+   if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)
+   continue;
+   len = sprintf(pbuf, "%llx@%llx",
+  ma->length, ma->base_address);
+   if (p != buf) {
+   *p = ',';
+   p++;
+   }
+   memcpy(p, pbuf, len);
+   p = p + len;
+   }
+   th = (struct acpi_subtable_header *)((unsigned long)
+ th + th->length);
+   }
+   return sprintf(buf, "%s\n", buf);
+}
+
+static DEVICE_ATTR(immovable_mem, 0444, show_immovable_mem, NULL);
+
 static void print_allowed_zone(char *buf, int nid, unsigned long start_pfn,
unsigned long nr_pages, int online_type,
struct zone *default_zone)
@@ -798,6 +845,9 @@ static struct attribute *memory_root_attrs[] = {
 #endif
 
&dev_attr_block_size_bytes.attr,
+#ifdef CONFIG_MEMORY_HOTREMOVE
+   &dev_attr_immovable_mem.attr,
+#endif
&dev_attr_auto_online_blocks.attr,
NULL
 };
-- 
2.14.3
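
For readers who want to consume this attribute from user space, the
"nn@ss,nn@ss,..." format described in the commit message can be parsed
roughly as follows. This is only a sketch under assumptions: the names
mem_region and parse_immovable_mem are hypothetical and not part of the
patch, and the values are taken to be hexadecimal because the patch prints
them with "%llx".

```c
#include <stdlib.h>

/* Hypothetical container for one "size@start" pair. */
struct mem_region {
	unsigned long long size;
	unsigned long long start;
};

/*
 * Parse "nn@ss,nn@ss,..." where nn and ss are hexadecimal.
 * Returns the number of regions stored in out[], or -1 if the
 * input is malformed or holds more than max regions.
 */
static int parse_immovable_mem(const char *s, struct mem_region *out, int max)
{
	int n = 0;
	char *end;

	while (*s && *s != '\n') {
		if (n == max)
			return -1;
		out[n].size = strtoull(s, &end, 16);
		if (end == s || *end != '@')
			return -1;	/* missing size or '@' separator */
		s = end + 1;
		out[n].start = strtoull(s, &end, 16);
		if (end == s)
			return -1;	/* missing start address */
		n++;
		/* Step over the ',' between regions, if there is one. */
		s = (*end == ',') ? end + 1 : end;
	}
	return n;
}
```

A caller would read the attribute into a buffer and pass it in together
with an array large enough for the expected number of regions.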

Re: [RFC PATCH] memory-hotplug: add sysfs immovable_mem attribute

2017-12-27 Thread Chao Fan

The result in my virtual machine looks like this:
[root@localhost ~]# cat /sys/devices/system/memory/immovable_mem 
a@0,1f30@10,1f40@1f40,1f40@3e80,1f40@5dc0,1f40@7d00,1f40@9c40,480@bb80,1ac0@1,1f40@11ac0,1f40@13a00,1f40@15940

Any comments will be welcome.

Thanks,
Chao Fan

On Wed, Dec 27, 2017 at 08:30:12PM +0800, Chao Fan wrote:
>In sometimes users specify the memory region in immovable node in
>some kernel commandline, such as "kernel_core" or the "immovable_mem="
>in the patchset that I have send. But users don't know the memory
>region. So add this interface to print it.
>
>It will show like this: "nn@ss,nn@ss,...". "nn" means the size of memory
>region, "ss" means the start position of this region.
>
>Signed-off-by: Chao Fan 
>---
> drivers/base/memory.c | 50 ++
> 1 file changed, 50 insertions(+)
>
>diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>index 1d60b58a8c19..9cadf1a9dccb 100644
>--- a/drivers/base/memory.c
>+++ b/drivers/base/memory.c
>@@ -25,6 +25,7 @@
> 
> #include 
> #include 
>+#include 
> 
> static DEFINE_MUTEX(mem_sysfs_mutex);
> 
>@@ -389,6 +390,52 @@ static ssize_t show_phys_device(struct device *dev,
> }
> 
> #ifdef CONFIG_MEMORY_HOTREMOVE
>+/*
>+ * Immovable memory region
>+ */
>+
>+static ssize_t
>+show_immovable_mem(struct device *dev, struct device_attribute *attr,
>+ char *buf)
>+{
>+  struct acpi_table_header *table_header = NULL;
>+  struct acpi_srat_mem_affinity *ma;
>+  struct acpi_subtable_header *th;
>+  unsigned long long table_size;
>+  unsigned long long table_end;
>+  char pbuf[35], *p = buf;
>+  int len;
>+
>+  acpi_get_table(ACPI_SIG_SRAT, 0, &table_header);
>+
>+  table_size = sizeof(struct acpi_table_srat);
>+  table_end = (unsigned long)table_header + table_header->length;
>+  th = (struct acpi_subtable_header *)((unsigned long)
>+table_header + table_size);
>+
>+  while (((unsigned long)th) +
>+ sizeof(struct acpi_subtable_header) < table_end) {
>+  if (th->type == 1) {
>+  ma = (struct acpi_srat_mem_affinity *)th;
>+  if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)
>+  continue;
>+  len = sprintf(pbuf, "%llx@%llx",
>+ ma->length, ma->base_address);
>+  if (p != buf) {
>+  *p = ',';
>+  p++;
>+  }
>+  memcpy(p, pbuf, len);
>+  p = p + len;
>+  }
>+  th = (struct acpi_subtable_header *)((unsigned long)
>+th + th->length);
>+  }
>+  return sprintf(buf, "%s\n", buf);
>+}
>+
>+static DEVICE_ATTR(immovable_mem, 0444, show_immovable_mem, NULL);
>+
> static void print_allowed_zone(char *buf, int nid, unsigned long start_pfn,
>   unsigned long nr_pages, int online_type,
>   struct zone *default_zone)
>@@ -798,6 +845,9 @@ static struct attribute *memory_root_attrs[] = {
> #endif
> 
>   &dev_attr_block_size_bytes.attr,
>+#ifdef CONFIG_MEMORY_HOTREMOVE
>+  &dev_attr_immovable_mem.attr,
>+#endif
>   &dev_attr_auto_online_blocks.attr,
>   NULL
> };
>-- 
>2.14.3
>

Re: [Ocfs2-devel] [PATCH] ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE

2017-12-27 Thread Changwei Ge

Hi Gang,
I like your fix.
It looks good to me.

On 2017/12/27 17:30, Gang He wrote:
> If we can't get inode lock immediately in the function
> ocfs2_inode_lock_with_page() when reading a page, we should not
> return directly here, since this will lead to a softlockup problem.
> The method is to get a blocking lock and immediately unlock before
> returning, this can avoid CPU resource waste due to lots of retries,
> and benefits fairness in getting lock among multiple nodes, increase
> efficiency in case modifying the same file frequently from multiple
> nodes.
> The softlockup problem looks like,
> Kernel panic - not syncing: softlockup: hung tasks
> CPU: 0 PID: 885 Comm: multi_mmap Tainted: G L 4.12.14-6.1-default #1
> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> Call Trace:
>
>dump_stack+0x5c/0x82
>panic+0xd5/0x21e
>watchdog_timer_fn+0x208/0x210
>? watchdog_park_threads+0x70/0x70
>__hrtimer_run_queues+0xcc/0x200
>hrtimer_interrupt+0xa6/0x1f0
>smp_apic_timer_interrupt+0x34/0x50
>apic_timer_interrupt+0x96/0xa0
>
>   RIP: 0010:unlock_page+0x17/0x30
>   RSP: :af154080bc88 EFLAGS: 0246 ORIG_RAX: ff10
>   RAX: dead0100 RBX: f21e009f5300 RCX: 0004
>   RDX: dead00ff RSI: 0202 RDI: f21e009f5300
>   RBP:  R08:  R09: af154080bb00
>   R10: af154080bc30 R11: 0040 R12: 993749a39518
>   R13:  R14: f21e009f5300 R15: f21e009f5300
>ocfs2_inode_lock_with_page+0x25/0x30 [ocfs2]
>ocfs2_readpage+0x41/0x2d0 [ocfs2]
>? pagecache_get_page+0x30/0x200
>filemap_fault+0x12b/0x5c0
>? recalc_sigpending+0x17/0x50
>? __set_task_blocked+0x28/0x70
>? __set_current_blocked+0x3d/0x60
>ocfs2_fault+0x29/0xb0 [ocfs2]
>__do_fault+0x1a/0xa0
>__handle_mm_fault+0xbe8/0x1090
>handle_mm_fault+0xaa/0x1f0
>__do_page_fault+0x235/0x4b0
>trace_do_page_fault+0x3c/0x110
>async_page_fault+0x28/0x30
>   RIP: 0033:0x7fa75ded638e
>   RSP: 002b:7ffd6657db18 EFLAGS: 00010287
>   RAX: 55c7662fb700 RBX: 0001 RCX: 55c7662fb700
>   RDX: 1770 RSI: 7fa75e909000 RDI: 55c7662fb700
>   RBP: 0003 R08: 000e R09: 
>   R10: 0483 R11: 7fa75ded61b0 R12: 7fa75e90a770
>   R13: 000e R14: 1770 R15: 
> 
> Fixes: 1cce4df04f37 ("ocfs2: do not lock/unlock() inode DLM lock")
> Signed-off-by: Gang He 
Reviewed-by: Changwei Ge 

> ---
>   fs/ocfs2/dlmglue.c | 9 +
>   1 file changed, 9 insertions(+)
> 
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 4689940..5193218 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -2486,6 +2486,15 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
>   ret = ocfs2_inode_lock_full(inode, ret_bh, ex, OCFS2_LOCK_NONBLOCK);
>   if (ret == -EAGAIN) {
>   unlock_page(page);
> + /*
> +  * If we can't get inode lock immediately, we should not return
> +  * directly here, since this will lead to a softlockup problem.
> +  * The method is to get a blocking lock and immediately unlock
> +  * before returning, this can avoid CPU resource waste due to
> +  * lots of retries, and benefits fairness in getting lock.
> +  */
> + if (ocfs2_inode_lock(inode, ret_bh, ex) == 0)
> + ocfs2_inode_unlock(inode, ex);
>   ret = AOP_TRUNCATED_PAGE;
>   }
>   
>

Re: [PATCH] objtool: Fix clang enum conversion warning

2017-12-27 Thread Lukas Bulwahn



On Tue, 26 Dec 2017, Nick Desaulniers wrote:


I sent a similar one recently:
https://patchwork.kernel.org/patch/10131815/ (maybe Josh is just
forwarding me an earlier fix?)

Reviewed-by: Nick Desaulniers 



I actually submitted this (other) patch to LKML on 2017-12-10:

https://patchwork.kernel.org/patch/10103977/

I also pointed this out on the llvmlinux mailing list:

https://lists.linuxfoundation.org/pipermail/llvmlinux/2017-December/001535.html

(The mail might not have been distributed to its recipients yet, because I
have been on the llvmlinux mailing list for only a few days and might not
have been whitelisted to get through that list's spam filtering.)

Nick submitted another patch to LKML on 2017-12-24 (see above).

The source code change is the same, but the commit message was
different. Now the third patch, from Josh here, is another identical patch with
yet another commit message, combining information from both patches.

Assuming that the authorship of this one-line change does not matter, as
it is largely suggested by the clang compiler anyway, and we want to move
the change forward, we should decide which of the three patches to move
forward. I can give my Reviewed-by and Tested-by to any of them.

Lukas

[PATCH] sched: cgroup: export nr_running for each cpu cgroup

2017-12-27 Thread Yafang Shao

Exporting nr_running for each cpu cgroup helps us monitor containers
conveniently.
The total number of threads in a cpu cgroup can be obtained from the tasks
file, or from the pids subsystem.
But we still do not know how many processes are running in a container
unless we traverse the status of all processes, which is a little
expensive.
Hence export nr_running.

Signed-off-by: Yafang Shao 
---
 kernel/sched/core.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 644fa2e..926575a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6648,10 +6648,17 @@ static int cpu_cfs_stat_show(struct seq_file *sf, void *v)
 {
struct task_group *tg = css_tg(seq_css(sf));
struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
+   unsigned int nr_running = 0;
+   int cpu;
+
+   for_each_online_cpu(cpu)
+   if (tg->cfs_rq[cpu])
+   nr_running += tg->cfs_rq[cpu]->h_nr_running;
 
seq_printf(sf, "nr_periods %d\n", cfs_b->nr_periods);
seq_printf(sf, "nr_throttled %d\n", cfs_b->nr_throttled);
seq_printf(sf, "throttled_time %llu\n", cfs_b->throttled_time);
+   seq_printf(sf, "nr_running %u\n", nr_running);
 
return 0;
 }
-- 
1.8.3.1
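
A monitoring tool could pick the new field out of the "key value" lines
that cpu_cfs_stat_show() emits. The following is a minimal sketch, assuming
the stat text has already been read into a buffer; stat_lookup is a
hypothetical helper name, not something the patch provides.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Look up "key value" in a buffer of newline-separated stat lines.
 * Returns 0 and stores the value in *out on success, -1 if the key
 * is not present.
 */
static int stat_lookup(const char *text, const char *key,
		       unsigned long long *out)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		/* Match the key only at the start of a line, followed by a space. */
		if (strncmp(line, key, klen) == 0 && line[klen] == ' ') {
			*out = strtoull(line + klen + 1, NULL, 10);
			return 0;
		}
		line = strchr(line, '\n');
		if (line)
			line++;	/* step past the newline */
	}
	return -1;
}
```

Matching on the key plus a trailing space keeps "nr_running" from being
confused with any longer key that happens to share the same prefix.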

Re: [SIL2review] [PATCH] fixdep: free memory on second error path of do_config_file

2017-12-27 Thread Lukas Bulwahn



On Mon, 18 Dec 2017, Masahiro Yamada wrote:


2017-12-15 17:23 GMT+09:00 Nicholas Mc Guire :

On Thu, Dec 14, 2017 at 08:54:10PM +0100, Lukas Bulwahn wrote:

Commit dee81e988674 ("fixdep: faster CONFIG_ search") introduced a memory
leak when `map = mmap(...)` was replaced with `map = malloc(...)` and
`read(fd, map, ...)`. It introduced a new second error path, which does not
free the allocated memory for `map`. We now correct that behavior and free
`map` before the do_config_file() function returns.

Facebook's static analysis tool Infer (http://fbinfer.com) found this
memory leak:

  scripts/basic/fixdep.c:297: error: MEMORY_LEAK
memory dynamically allocated by call to `malloc()` at line 290, \
column 8 is not reachable after line 297, column 3.

Fixes: dee81e988674 ("fixdep: faster CONFIG_ search")

Signed-off-by: Lukas Bulwahn 

Reviewed-by: Nicholas Mc Guire 


---
 scripts/basic/fixdep.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/scripts/basic/fixdep.c b/scripts/basic/fixdep.c
index bbf62cb..131c450 100644
--- a/scripts/basic/fixdep.c
+++ b/scripts/basic/fixdep.c
@@ -296,6 +296,7 @@ static void do_config_file(const char *filename)
  if (read(fd, map, st.st_size) != st.st_size) {
  perror("fixdep: read");
  close(fd);
+ free(map);


 This looks reasonable, but actually it is not clear why do_config_file()
should return at all if read fails, as the read error would go unnoticed
in the current code and allow the build to continue. So this probably
should be an exit(2) here and not a return, which would then take care
of the free() anyway.


I agree.

This should be exit(2).

You can also remove close(), since
the operating system will close it anyway.




 At least I do not see the rationale for allowing continuation if the read
failed, as the file should not be empty nor mismatch the expected st.st_size.
If it were due to an EINTR then it still should terminate, as
EINTR was not handled and we thus could miss a valid dependency.

 Note: this probably also should be applied to the if (!map) condition
before as well, as at that point it is known that map > 0 and a malloc()
failure would allow skipping parse_config_file() for a valid config.


I agree.  We should change this too.
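
The direction agreed on above (exit on a failed malloc() or read() rather
than return) could look roughly like this. It is only a sketch:
read_file_or_die is a hypothetical stand-alone helper, not the actual
do_config_file() from scripts/basic/fixdep.c, and the error messages are
illustrative.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Read a whole file into a NUL-terminated heap buffer.
 * Any failure terminates the process with exit(2), so no error path
 * has to free the buffer or close the descriptor by hand.
 */
static char *read_file_or_die(const char *filename)
{
	struct stat st;
	char *map;
	int fd;

	fd = open(filename, O_RDONLY);
	if (fd < 0) {
		perror(filename);
		exit(2);
	}
	if (fstat(fd, &st) < 0) {
		perror("fstat");
		exit(2);
	}
	map = malloc(st.st_size + 1);
	if (!map) {
		perror("malloc");
		exit(2);
	}
	if (read(fd, map, st.st_size) != st.st_size) {
		perror("read");
		exit(2);	/* process exit releases map and fd */
	}
	map[st.st_size] = '\0';
	close(fd);
	return map;
}
```

With every failure fatal, a short read can no longer let the build
continue silently with an unparsed config file.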



I created a new patch with your requested changes, but I did not consider 
it a second version of this patch by the time of writing. You can find the 
new patch at:


v1: https://patchwork.kernel.org/patch/10126547/
v2: https://patchwork.kernel.org/patch/10128297/

This patch here is now superseded by the new patch. So, this patch here 
can be abandoned.


Lukas

Re: [PATCH v2] x86/kexec: Exclude GART aperture from vmcore

2017-12-27 Thread Baoquan He

On 12/27/17 at 01:25pm, Borislav Petkov wrote:
> On Wed, Dec 27, 2017 at 03:44:49PM +0800, Baoquan He wrote:
> > > yes, instead of crashing the machine (because GART may be initialized in the
> > > 2nd kernel, overlapping the 1st kernel memory, which the 2nd kernel with its
> > > fake e820 map sees as unused).
> > > 
> > > I'd say this is an improvement.
> > 
> > I don't get what you said. If 'iommu=off' only specified in 1st kernel,
> > kdump kernel will think the memory which GART bar pointed as a hole.
> > This is incorrect. I don't see the improvement.
> 
> So he says, this memory is unused. Why is that incorrect?!?

If 'iommu=off' is specified in the 1st kernel, that region will be normal
memory, and important kernel data could be written there. The kdump
kernel, however, will treat that region as the GART aperture, and trying to
read data from it will cause the error which Jiri originally tried to fix.

[PATCH 2/5] kasan: don't use __builtin_return_address(1)

2017-12-27 Thread Dmitry Vyukov

__builtin_return_address(1) is unreliable without frame pointers.
With defconfig on kmalloc_pagealloc_invalid_free test I am getting:

BUG: KASAN: double-free or invalid-free in   (null)

Pass caller PC from callers explicitly.
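
The idea of passing the caller PC explicitly can be illustrated in user
space as below. This is a sketch under assumptions: report_invalid_free,
checked_free and last_reported_ip are hypothetical names, and the code
relies on the GCC/Clang builtin at level 0, which does not depend on frame
pointers the way level 1 does.

```c
#include <stdio.h>

/* Remembers the most recent caller PC, purely for demonstration. */
static unsigned long last_reported_ip;

/* Hypothetical reporter: receives the caller's PC as an argument. */
static void report_invalid_free(void *object, unsigned long ip)
{
	last_reported_ip = ip;
	fprintf(stderr, "invalid free of %p, called from %#lx\n", object, ip);
}

/*
 * Entry point: capture our own return address (level 0) here and
 * pass it down, instead of having the reporter walk up the stack
 * with __builtin_return_address(1).
 */
__attribute__((noinline)) static void checked_free(void *object)
{
	report_invalid_free(object,
			    (unsigned long)__builtin_return_address(0));
}
```

In the kernel, the _RET_IP_ macro expands to the same level-0 builtin cast
to unsigned long.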

Signed-off-by: Dmitry Vyukov 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: kasan-...@googlegroups.com
---
 include/linux/kasan.h | 9 +
 mm/kasan/kasan.c  | 8 
 mm/kasan/kasan.h  | 2 +-
 mm/kasan/report.c | 4 ++--
 mm/slab.c | 6 +++---
 mm/slub.c | 8 
 6 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index fc9e642533f8..f0d13c30acc6 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -56,14 +56,14 @@ void kasan_poison_object_data(struct kmem_cache *cache, void *object);
 void kasan_init_slab_obj(struct kmem_cache *cache, const void *object);
 
 void kasan_kmalloc_large(const void *ptr, size_t size, gfp_t flags);
-void kasan_kfree_large(void *ptr);
+void kasan_kfree_large(void *ptr, unsigned long ip);
 void kasan_poison_kfree(void *ptr);
 void kasan_kmalloc(struct kmem_cache *s, const void *object, size_t size,
  gfp_t flags);
 void kasan_krealloc(const void *object, size_t new_size, gfp_t flags);
 
 void kasan_slab_alloc(struct kmem_cache *s, void *object, gfp_t flags);
-bool kasan_slab_free(struct kmem_cache *s, void *object);
+bool kasan_slab_free(struct kmem_cache *s, void *object, unsigned long ip);
 
 struct kasan_cache {
int alloc_meta_offset;
@@ -108,7 +108,7 @@ static inline void kasan_init_slab_obj(struct kmem_cache *cache,
const void *object) {}
 
 static inline void kasan_kmalloc_large(void *ptr, size_t size, gfp_t flags) {}
-static inline void kasan_kfree_large(void *ptr) {}
+static inline void kasan_kfree_large(void *ptr, unsigned long ip) {}
 static inline void kasan_poison_kfree(void *ptr) {}
 static inline void kasan_kmalloc(struct kmem_cache *s, const void *object,
size_t size, gfp_t flags) {}
@@ -117,7 +117,8 @@ static inline void kasan_krealloc(const void *object, size_t new_size,
 
 static inline void kasan_slab_alloc(struct kmem_cache *s, void *object,
   gfp_t flags) {}
-static inline bool kasan_slab_free(struct kmem_cache *s, void *object)
+static inline bool kasan_slab_free(struct kmem_cache *s, void *object,
+  unsigned long ip)
 {
return false;
 }
diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index ecb64fda79e6..32f555ded938 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -501,7 +501,7 @@ static void kasan_poison_slab_free(struct kmem_cache *cache, void *object)
kasan_poison_shadow(object, rounded_up_size, KASAN_KMALLOC_FREE);
 }
 
-bool kasan_slab_free(struct kmem_cache *cache, void *object)
+bool kasan_slab_free(struct kmem_cache *cache, void *object, unsigned long ip)
 {
s8 shadow_byte;
 
@@ -511,7 +511,7 @@ bool kasan_slab_free(struct kmem_cache *cache, void *object)
 
shadow_byte = READ_ONCE(*(s8 *)kasan_mem_to_shadow(object));
if (shadow_byte < 0 || shadow_byte >= KASAN_SHADOW_SCALE_SIZE) {
-   kasan_report_invalid_free(object, __builtin_return_address(1));
+   kasan_report_invalid_free(object, ip);
return true;
}
 
@@ -601,10 +601,10 @@ void kasan_poison_kfree(void *ptr)
kasan_poison_slab_free(page->slab_cache, ptr);
 }
 
-void kasan_kfree_large(void *ptr)
+void kasan_kfree_large(void *ptr, unsigned long ip)
 {
if (ptr != page_address(virt_to_head_page(ptr)))
-   kasan_report_invalid_free(ptr, __builtin_return_address(1));
+   kasan_report_invalid_free(ptr, ip);
/* The object will be poisoned by page_alloc. */
 }
 
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index 57f517d1dfce..2792de927fcd 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -107,7 +107,7 @@ static inline const void *kasan_shadow_to_mem(const void *shadow_addr)
 
 void kasan_report(unsigned long addr, size_t size,
bool is_write, unsigned long ip);
-void kasan_report_invalid_free(void *object, void *ip);
+void kasan_report_invalid_free(void *object, unsigned long ip);
 
 #if defined(CONFIG_SLAB) || defined(CONFIG_SLUB)
 void quarantine_put(struct kasan_free_meta *info, struct kmem_cache *cache);
diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index 55916ad21722..75206991ece0 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -326,12 +326,12 @@ static void print_shadow_for_address(const void *addr)
}
 }
 
-void kasan_report_invalid_free(void *object, void *ip)
+void kasan_report_invalid_free(void *object, unsigned long ip)
 {
unsigned long flags;
 
kasan_start_report(&flags);
-   pr_err("BUG: KASAN: double-free or invalid-free in %pS\n", ip);
+   pr_err("BUG

[PATCH 1/5] kasan: detect invalid frees for large objects

2017-12-27 Thread Dmitry Vyukov

Detect frees of pointers into the middle of large heap objects.

I dropped const from kasan_kfree_large() because it starts propagating
through a bunch of functions in kasan_report.c, slab/slub nearest_obj(),
all of their local variables, fixup_red_left(), etc.

Signed-off-by: Dmitry Vyukov 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: kasan-...@googlegroups.com
---
 include/linux/kasan.h |  4 ++--
 lib/test_kasan.c  | 33 +
 mm/kasan/kasan.c  | 12 +---
 mm/kasan/kasan.h  |  3 +--
 mm/kasan/report.c |  3 +--
 mm/slub.c |  4 ++--
 6 files changed, 44 insertions(+), 15 deletions(-)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index e3eb834c9a35..fc9e642533f8 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -56,7 +56,7 @@ void kasan_poison_object_data(struct kmem_cache *cache, void *object);
 void kasan_init_slab_obj(struct kmem_cache *cache, const void *object);
 
 void kasan_kmalloc_large(const void *ptr, size_t size, gfp_t flags);
-void kasan_kfree_large(const void *ptr);
+void kasan_kfree_large(void *ptr);
 void kasan_poison_kfree(void *ptr);
 void kasan_kmalloc(struct kmem_cache *s, const void *object, size_t size,
  gfp_t flags);
@@ -108,7 +108,7 @@ static inline void kasan_init_slab_obj(struct kmem_cache *cache,
const void *object) {}
 
 static inline void kasan_kmalloc_large(void *ptr, size_t size, gfp_t flags) {}
-static inline void kasan_kfree_large(const void *ptr) {}
+static inline void kasan_kfree_large(void *ptr) {}
 static inline void kasan_poison_kfree(void *ptr) {}
 static inline void kasan_kmalloc(struct kmem_cache *s, const void *object,
size_t size, gfp_t flags) {}
diff --git a/lib/test_kasan.c b/lib/test_kasan.c
index 2724f86c4cef..e9c5d765be66 100644
--- a/lib/test_kasan.c
+++ b/lib/test_kasan.c
@@ -94,6 +94,37 @@ static noinline void __init kmalloc_pagealloc_oob_right(void)
ptr[size] = 0;
kfree(ptr);
 }
+
+static noinline void __init kmalloc_pagealloc_uaf(void)
+{
+   char *ptr;
+   size_t size = KMALLOC_MAX_CACHE_SIZE + 10;
+
+   pr_info("kmalloc pagealloc allocation: use-after-free\n");
+   ptr = kmalloc(size, GFP_KERNEL);
+   if (!ptr) {
+   pr_err("Allocation failed\n");
+   return;
+   }
+
+   kfree(ptr);
+   ptr[0] = 0;
+}
+
+static noinline void __init kmalloc_pagealloc_invalid_free(void)
+{
+   char *ptr;
+   size_t size = KMALLOC_MAX_CACHE_SIZE + 10;
+
+   pr_info("kmalloc pagealloc allocation: invalid-free\n");
+   ptr = kmalloc(size, GFP_KERNEL);
+   if (!ptr) {
+   pr_err("Allocation failed\n");
+   return;
+   }
+
+   kfree(ptr + 1);
+}
 #endif
 
 static noinline void __init kmalloc_large_oob_right(void)
@@ -505,6 +536,8 @@ static int __init kmalloc_tests_init(void)
kmalloc_node_oob_right();
 #ifdef CONFIG_SLUB
kmalloc_pagealloc_oob_right();
+   kmalloc_pagealloc_uaf();
+   kmalloc_pagealloc_invalid_free();
 #endif
kmalloc_large_oob_right();
kmalloc_oob_krealloc_more();
diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index 8aaee42fcfab..ecb64fda79e6 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -511,8 +511,7 @@ bool kasan_slab_free(struct kmem_cache *cache, void *object)
 
shadow_byte = READ_ONCE(*(s8 *)kasan_mem_to_shadow(object));
if (shadow_byte < 0 || shadow_byte >= KASAN_SHADOW_SCALE_SIZE) {
-   kasan_report_double_free(cache, object,
-   __builtin_return_address(1));
+   kasan_report_invalid_free(object, __builtin_return_address(1));
return true;
}
 
@@ -602,12 +601,11 @@ void kasan_poison_kfree(void *ptr)
kasan_poison_slab_free(page->slab_cache, ptr);
 }
 
-void kasan_kfree_large(const void *ptr)
+void kasan_kfree_large(void *ptr)
 {
-   struct page *page = virt_to_page(ptr);
-
-   kasan_poison_shadow(ptr, PAGE_SIZE << compound_order(page),
-   KASAN_FREE_PAGE);
+   if (ptr != page_address(virt_to_head_page(ptr)))
+   kasan_report_invalid_free(ptr, __builtin_return_address(1));
+   /* The object will be poisoned by page_alloc. */
 }
 
 int kasan_module_alloc(void *addr, size_t size)
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index 7c0bcd1f4c0d..57f517d1dfce 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -107,8 +107,7 @@ static inline const void *kasan_shadow_to_mem(const void *shadow_addr)
 
 void kasan_report(unsigned long addr, size_t size,
bool is_write, unsigned long ip);
-void kasan_report_double_free(struct kmem_cache *cache, void *object,
-   void *ip);
+void kasan_report_invalid_free(void *object, void *ip);
 
 #if defined(CONFIG_SLAB) || defined(CONFIG_SLUB)
 void quarantine_put

Re: [PATCH 1/3] staging: irda: fix type from "unsigned" to "unsigned int"

2017-12-27 Thread Greg KH

On Tue, Dec 26, 2017 at 09:52:54PM -0800, JI-HUN KIM wrote:
> Clean up checkpatch warning:
> WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
> 
> Signed-off-by: JI-HUN KIM 
> ---
>  drivers/staging/irda/drivers/esi-sir.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Please read drivers/staging/irda/TODO

sorry.

greg k-h

[PATCH 0/5] kasan: detect invalid frees

2017-12-27 Thread Dmitry Vyukov

KASAN detects double-frees, but does not detect invalid-frees
(when a pointer into the middle of a heap object is passed to free).
We recently had a very unpleasant case in crypto code which freed
an inner object inside of a heap allocation. This went unnoticed
during the free, but totally corrupted the heap and later led to a bunch
of random crashes all over the kernel code.

Detect invalid frees.

Dmitry Vyukov (5):
  kasan: detect invalid frees for large objects
  kasan: don't use __builtin_return_address(1)
  kasan: detect invalid frees for large mempool objects
  kasan: unify code between kasan_slab_free() and kasan_poison_kfree()
  kasan: detect invalid frees

 include/linux/kasan.h | 13 
 lib/test_kasan.c  | 83 +++
 mm/kasan/kasan.c  | 57 +++
 mm/kasan/kasan.h  |  3 +-
 mm/kasan/report.c |  5 ++--
 mm/mempool.c  |  6 ++--
 mm/slab.c |  6 ++--
 mm/slub.c | 10 +++
 8 files changed, 135 insertions(+), 48 deletions(-)

-- 
2.15.1.620.gb9897f4670-goog

[PATCH 5/5] kasan: detect invalid frees

2017-12-27 Thread Dmitry Vyukov

Detect frees of pointers into the middle of heap objects.

Signed-off-by: Dmitry Vyukov 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: kasan-...@googlegroups.com
---
 lib/test_kasan.c | 50 ++
 mm/kasan/kasan.c |  6 ++
 2 files changed, 56 insertions(+)

diff --git a/lib/test_kasan.c b/lib/test_kasan.c
index e9c5d765be66..a808d81b409d 100644
--- a/lib/test_kasan.c
+++ b/lib/test_kasan.c
@@ -523,6 +523,54 @@ static noinline void __init kasan_alloca_oob_right(void)
*(volatile char *)p;
 }
 
+static noinline void __init kmem_cache_double_free(void)
+{
+   char *p;
+   size_t size = 200;
+   struct kmem_cache *cache;
+
+   cache = kmem_cache_create("test_cache", size, 0, 0, NULL);
+   if (!cache) {
+   pr_err("Cache allocation failed\n");
+   return;
+   }
+   pr_info("double-free on heap object\n");
+   p = kmem_cache_alloc(cache, GFP_KERNEL);
+   if (!p) {
+   pr_err("Allocation failed\n");
+   kmem_cache_destroy(cache);
+   return;
+   }
+
+   kmem_cache_free(cache, p);
+   kmem_cache_free(cache, p);
+   kmem_cache_destroy(cache);
+}
+
+static noinline void __init kmem_cache_invalid_free(void)
+{
+   char *p;
+   size_t size = 200;
+   struct kmem_cache *cache;
+
+   cache = kmem_cache_create("test_cache", size, 0, SLAB_TYPESAFE_BY_RCU,
+ NULL);
+   if (!cache) {
+   pr_err("Cache allocation failed\n");
+   return;
+   }
+   pr_info("invalid-free of heap object\n");
+   p = kmem_cache_alloc(cache, GFP_KERNEL);
+   if (!p) {
+   pr_err("Allocation failed\n");
+   kmem_cache_destroy(cache);
+   return;
+   }
+
+   kmem_cache_free(cache, p + 1);
+   kmem_cache_destroy(cache);
+}
+
 static int __init kmalloc_tests_init(void)
 {
/*
@@ -560,6 +608,8 @@ static int __init kmalloc_tests_init(void)
ksize_unpoisons_memory();
copy_user_test();
use_after_scope_test();
+   kmem_cache_double_free();
+   kmem_cache_invalid_free();
 
kasan_restore_multi_shot(multishot);
 
diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index 578843fab5dc..3fb497d4fbf8 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -495,6 +495,12 @@ static bool __kasan_slab_free(struct kmem_cache *cache, void *object,
s8 shadow_byte;
unsigned long rounded_up_size;
 
+   if (unlikely(nearest_obj(cache, virt_to_head_page(object), object) !=
+   object)) {
+   kasan_report_invalid_free(object, ip);
+   return true;
+   }
+
/* RCU slabs could be legally used after free within the RCU period */
if (unlikely(cache->flags & SLAB_TYPESAFE_BY_RCU))
return false;
-- 
2.15.1.620.gb9897f4670-goog
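
The check this patch adds to __kasan_slab_free() compares the freed pointer
with the start of the nearest object. For a slab of equally sized objects
the idea boils down to the arithmetic below. This is a simplified
user-space sketch: nearest_obj_start and is_invalid_free are hypothetical
names, and the real nearest_obj() in the slab allocators may account for
layout details that this model ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr down to the start of the object that contains it. */
static uintptr_t nearest_obj_start(uintptr_t slab_base, size_t obj_size,
				   uintptr_t addr)
{
	assert(obj_size > 0 && addr >= slab_base);
	return slab_base + ((addr - slab_base) / obj_size) * obj_size;
}

/* A free is invalid when the pointer is not exactly an object start. */
static bool is_invalid_free(uintptr_t slab_base, size_t obj_size,
			    uintptr_t addr)
{
	return nearest_obj_start(slab_base, obj_size, addr) != addr;
}
```

With 200-byte objects, as in the test cache above, a pointer at p + 1
rounds down to p and is therefore reported as an invalid free.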

[PATCH 4/5] kasan: unify code between kasan_slab_free() and kasan_poison_kfree()

2017-12-27 Thread Dmitry Vyukov

Both of these functions deal with freeing of slab objects.
However, kasan_poison_kfree() mishandles SLAB_TYPESAFE_BY_RCU
(it must also not poison such objects) and does not detect double-frees.

Unify code between these functions.
This solves both problems and allows adding more common code
(e.g. detection of invalid frees).

Signed-off-by: Dmitry Vyukov 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: kasan-...@googlegroups.com
---
 mm/kasan/kasan.c | 28 
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index 77c103748728..578843fab5dc 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -489,21 +489,11 @@ void kasan_slab_alloc(struct kmem_cache *cache, void *object, gfp_t flags)
kasan_kmalloc(cache, object, cache->object_size, flags);
 }
 
-static void kasan_poison_slab_free(struct kmem_cache *cache, void *object)
-{
-   unsigned long size = cache->object_size;
-   unsigned long rounded_up_size = round_up(size, KASAN_SHADOW_SCALE_SIZE);
-
-   /* RCU slabs could be legally used after free within the RCU period */
-   if (unlikely(cache->flags & SLAB_TYPESAFE_BY_RCU))
-   return;
-
-   kasan_poison_shadow(object, rounded_up_size, KASAN_KMALLOC_FREE);
-}
-
-bool kasan_slab_free(struct kmem_cache *cache, void *object, unsigned long ip)
+static bool __kasan_slab_free(struct kmem_cache *cache, void *object,
+ unsigned long ip, bool quarantine)
 {
s8 shadow_byte;
+   unsigned long rounded_up_size;
 
/* RCU slabs could be legally used after free within the RCU period */
if (unlikely(cache->flags & SLAB_TYPESAFE_BY_RCU))
@@ -515,9 +505,10 @@ bool kasan_slab_free(struct kmem_cache *cache, void *object, unsigned long ip)
return true;
}
 
-   kasan_poison_slab_free(cache, object);
+   rounded_up_size = round_up(cache->object_size, KASAN_SHADOW_SCALE_SIZE);
+   kasan_poison_shadow(object, rounded_up_size, KASAN_KMALLOC_FREE);
 
-   if (unlikely(!(cache->flags & SLAB_KASAN)))
+   if (!quarantine || unlikely(!(cache->flags & SLAB_KASAN)))
return false;
 
set_track(&get_alloc_info(cache, object)->free_track, GFP_NOWAIT);
@@ -525,6 +516,11 @@ bool kasan_slab_free(struct kmem_cache *cache, void *object, unsigned long ip)
return true;
 }
 
+bool kasan_slab_free(struct kmem_cache *cache, void *object, unsigned long ip)
+{
+   return __kasan_slab_free(cache, object, ip, true);
+}
+
 void kasan_kmalloc(struct kmem_cache *cache, const void *object, size_t size,
   gfp_t flags)
 {
@@ -602,7 +598,7 @@ void kasan_poison_kfree(void *ptr, unsigned long ip)
kasan_poison_shadow(ptr, PAGE_SIZE << compound_order(page),
KASAN_FREE_PAGE);
} else {
-   kasan_poison_slab_free(page->slab_cache, ptr);
+   __kasan_slab_free(page->slab_cache, ptr, ip, false);
}
 }
 
-- 
2.15.1.620.gb9897f4670-goog
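
To illustrate the control flow that the unification patch above introduces, here is a minimal userspace sketch under stated assumptions. It models the shadow state of one object as a single byte and the quarantine as a flag. struct toy_obj, the TOY_* values, toy_reports and the toy_* functions are all hypothetical names for illustration only; the real code reads KASAN shadow memory and calls the report and quarantine helpers.

```c
#include <stdbool.h>

/* Hypothetical shadow states; the kernel uses its own shadow encoding. */
enum toy_shadow { TOY_ALLOCATED = 0, TOY_FREED = 0xFB };

struct toy_obj {
	unsigned char shadow;	/* toy stand-in for the object's shadow byte */
	bool rcu;		/* object belongs to a SLAB_TYPESAFE_BY_RCU cache */
	bool in_quarantine;	/* object was handed to the quarantine */
};

static int toy_reports;		/* number of double-free reports so far */

/*
 * Common free path, mirroring the shape of __kasan_slab_free() in the patch:
 * skip RCU caches, report double frees, poison, then optionally quarantine.
 * Returns true when the caller must not release the object itself.
 */
static bool toy_slab_free_common(struct toy_obj *o, bool quarantine)
{
	if (o->rcu)
		return false;		/* may be legally used after free */

	if (o->shadow != TOY_ALLOCATED) {
		toy_reports++;		/* already freed: double free */
		return true;
	}

	o->shadow = TOY_FREED;		/* poison the object */

	if (!quarantine)
		return false;

	o->in_quarantine = true;
	return true;
}

/* Regular free path: poison and quarantine. */
static bool toy_slab_free(struct toy_obj *o)
{
	return toy_slab_free_common(o, true);
}

/* Mempool path: poison only, never quarantine. */
static void toy_poison_kfree(struct toy_obj *o)
{
	toy_slab_free_common(o, false);
}
```

Because both entry points share one helper, the mempool path picks up the RCU exemption and the double-free check without duplicating either.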

Re: [PATCH 2/2] dts: Probe efuse for CI20

2017-12-27 Thread Greg Kroah-Hartman

On Wed, Dec 27, 2017 at 01:27:02PM +0100, Mathieu Malaterre wrote:
> Signed-off-by: Mathieu Malaterre 

I know I can't take patches without any changelog text at all, and
really, you shouldn't ever create such a thing :)

thanks,

greg k-h

[PATCH 3/5] kasan: detect invalid frees for large mempool objects

2017-12-27 Thread Dmitry Vyukov

Detect frees of pointers into the middle of mempool objects.

I did a one-off test, but it turned out to be very tricky,
so I reverted it. First, mempool does not call kasan_poison_kfree()
unless the allocation function fails. I stubbed an allocation function
to fail on the second and subsequent allocations. But then mempool stopped
calling kasan_poison_kfree() at all, because it does so only when the
allocation function is mempool_kmalloc(). We could support this
special failing test allocation function in mempool, but it also
can't live with the kasan tests, because these are in a module.

Signed-off-by: Dmitry Vyukov 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: kasan-...@googlegroups.com
---
 include/linux/kasan.h |  4 ++--
 mm/kasan/kasan.c  | 11 ---
 mm/mempool.c  |  6 +++---
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index f0d13c30acc6..fc45f8952d1e 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -57,7 +57,7 @@ void kasan_init_slab_obj(struct kmem_cache *cache, const void *object);
 
 void kasan_kmalloc_large(const void *ptr, size_t size, gfp_t flags);
 void kasan_kfree_large(void *ptr, unsigned long ip);
-void kasan_poison_kfree(void *ptr);
+void kasan_poison_kfree(void *ptr, unsigned long ip);
 void kasan_kmalloc(struct kmem_cache *s, const void *object, size_t size,
  gfp_t flags);
 void kasan_krealloc(const void *object, size_t new_size, gfp_t flags);
@@ -109,7 +109,7 @@ static inline void kasan_init_slab_obj(struct kmem_cache *cache,
 
 static inline void kasan_kmalloc_large(void *ptr, size_t size, gfp_t flags) {}
 static inline void kasan_kfree_large(void *ptr, unsigned long ip) {}
-static inline void kasan_poison_kfree(void *ptr) {}
+static inline void kasan_poison_kfree(void *ptr, unsigned long ip) {}
 static inline void kasan_kmalloc(struct kmem_cache *s, const void *object,
size_t size, gfp_t flags) {}
 static inline void kasan_krealloc(const void *object, size_t new_size,
diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index 32f555ded938..77c103748728 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -588,17 +588,22 @@ void kasan_krealloc(const void *object, size_t size, gfp_t flags)
kasan_kmalloc(page->slab_cache, object, size, flags);
 }
 
-void kasan_poison_kfree(void *ptr)
+void kasan_poison_kfree(void *ptr, unsigned long ip)
 {
struct page *page;
 
page = virt_to_head_page(ptr);
 
-   if (unlikely(!PageSlab(page)))
+   if (unlikely(!PageSlab(page))) {
+   if (ptr != page_address(page)) {
+   kasan_report_invalid_free(ptr, ip);
+   return;
+   }
kasan_poison_shadow(ptr, PAGE_SIZE << compound_order(page),
KASAN_FREE_PAGE);
-   else
+   } else {
kasan_poison_slab_free(page->slab_cache, ptr);
+   }
 }
 
 void kasan_kfree_large(void *ptr, unsigned long ip)
diff --git a/mm/mempool.c b/mm/mempool.c
index 7d8c5a0010a2..5c9dce34719b 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -103,10 +103,10 @@ static inline void poison_element(mempool_t *pool, void *element)
 }
 #endif /* CONFIG_DEBUG_SLAB || CONFIG_SLUB_DEBUG_ON */
 
-static void kasan_poison_element(mempool_t *pool, void *element)
+static __always_inline void kasan_poison_element(mempool_t *pool, void *element)
 {
if (pool->alloc == mempool_alloc_slab || pool->alloc == mempool_kmalloc)
-   kasan_poison_kfree(element);
+   kasan_poison_kfree(element, _RET_IP_);
if (pool->alloc == mempool_alloc_pages)
kasan_free_pages(element, (unsigned long)pool->pool_data);
 }
@@ -119,7 +119,7 @@ static void kasan_unpoison_element(mempool_t *pool, void *element, gfp_t flags)
kasan_alloc_pages(element, (unsigned long)pool->pool_data);
 }
 
-static void add_element(mempool_t *pool, void *element)
+static __always_inline void add_element(mempool_t *pool, void *element)
 {
BUG_ON(pool->curr_nr >= pool->min_nr);
poison_element(pool, element);
-- 
2.15.1.620.gb9897f4670-goog

Re: [RFC PATCH] memory-hotplug: add sysfs immovable_mem attribute

2017-12-27 Thread Greg KH

On Wed, Dec 27, 2017 at 08:30:12PM +0800, Chao Fan wrote:
> Sometimes users specify the memory region in an immovable node via a
> kernel command-line option, such as "kernel_core" or the "immovable_mem="
> option in the patchset that I have sent. But users don't know the memory
> region, so add this interface to print it.
> 
> It will show like this: "nn@ss,nn@ss,...". "nn" means the size of memory
> region, "ss" means the start position of this region.
> 
> Signed-off-by: Chao Fan 
> ---
>  drivers/base/memory.c | 50 ++
>  1 file changed, 50 insertions(+)

Why did you not also create the needed Documentation/ABI/ file update?

That's required for sysfs attributes.

> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 1d60b58a8c19..9cadf1a9dccb 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -25,6 +25,7 @@
>  
>  #include 
>  #include 
> +#include 
>  
>  static DEFINE_MUTEX(mem_sysfs_mutex);
>  
> @@ -389,6 +390,52 @@ static ssize_t show_phys_device(struct device *dev,
>  }
>  
>  #ifdef CONFIG_MEMORY_HOTREMOVE
> +/*
> + * Immovable memory region
> + */
> +
> +static ssize_t
> +show_immovable_mem(struct device *dev, struct device_attribute *attr,
> +char *buf)
> +{
> + struct acpi_table_header *table_header = NULL;
> + struct acpi_srat_mem_affinity *ma;
> + struct acpi_subtable_header *th;
> + unsigned long long table_size;
> + unsigned long long table_end;
> + char pbuf[35], *p = buf;
> + int len;
> +
> + acpi_get_table(ACPI_SIG_SRAT, 0, &table_header);
> +
> + table_size = sizeof(struct acpi_table_srat);
> + table_end = (unsigned long)table_header + table_header->length;
> + th = (struct acpi_subtable_header *)((unsigned long)
> +   table_header + table_size);
> +
> + while (((unsigned long)th) +
> +sizeof(struct acpi_subtable_header) < table_end) {
> + if (th->type == 1) {
> + ma = (struct acpi_srat_mem_affinity *)th;
> + if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)
> + continue;
> + len = sprintf(pbuf, "%llx@%llx",
> +ma->length, ma->base_address);

sysfs is "one value per file", and if you ever have to care if you are
overrunning the length of the buffer, that's a huge hint you are doing
something wrong here.

sorry,

greg k-h

Re: BUG warnings in 4.14.9

2017-12-27 Thread Greg Kroah-Hartman

On Wed, Dec 27, 2017 at 04:25:00AM +, alexander.le...@verizon.com wrote:
> On Tue, Dec 26, 2017 at 10:54:37PM +0200, Ido Schimmel wrote:
> >On Tue, Dec 26, 2017 at 07:59:55PM +0100, Willy Tarreau wrote:
> >> Guys,
> >>
> >> Chris reported the bug below and confirmed that reverting commit
> >> 9704f81 (ipv6: grab rt->rt6i_ref before allocating pcpu rt) seems to
> >> have fixed the issue for him. This patch is a94b9367 in mainline.
> >>
> >> I personally have no opinion on the patch, just found it because it
> >> was the only one touching this area between 4.14.8 and 4.14.9 :-)
> >>
> >> Should this be reverted or maybe fixed differently ?
> >
> >Maybe I'm missing something, but how come this patch even made its way
> >into 4.14.y? It's part of a series to RCU-ify IPv6 FIB lookup that went
> >into 4.15.
> >
> >Anyway, the mentioned bug was already fixed by commit 951f788a80ff
> >("ipv6: fix a BUG in rt6_get_pcpu_route()") when the code was still in
> >net-next.
> 
> Uh, you're right. Greg, please just revert 9704f81. Thanks!

Now reverted, sorry about this.

greg k-h

Re: [PATCH 2/2] usb: quirks: Add reset-resume quirk for Dell DW1820 QCA Rome Bluetooth

2017-12-27 Thread Greg Kroah-Hartman

On Tue, Dec 26, 2017 at 10:01:46PM +0100, Marcel Holtmann wrote:
> Hi Greg,
> 
> > Commit ("fd865802c66bc451dc515ed89360f84376ce1a56 Bluetooth: btusb: fix
> > QCA Rome suspend/resume") enables reset_resume in btusb_probe(). This
> > makes the device reset during btusb_open(), so firmware loading gets
> > interrupted as a result.
> > 
> > We still want to reset the device to solve the original issue, but we
> > should do it before btusb_open().
> > 
> > Hence, add reset-resume quirk in usb core instead of btusb.
> > 
> > Cc: sta...@vger.kernel.org
> > Cc: Leif Liddy 
> > Cc: Matthias Kaehlcke 
> > Cc: Brian Norris 
> > Cc: Daniel Drake 
> > Signed-off-by: Kai-Heng Feng 
> > 
> > ---
> > drivers/usb/core/quirks.c | 3 +++
> > 1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c
> > index a10b346b9777..96951104c45b 100644
> > --- a/drivers/usb/core/quirks.c
> > +++ b/drivers/usb/core/quirks.c
> > @@ -197,6 +197,9 @@ static const struct usb_device_id usb_quirk_list[] = {
> > { USB_DEVICE(0x0b05, 0x17e0), .driver_info =
> > USB_QUIRK_IGNORE_REMOTE_WAKEUP },
> > 
> > +   /* QCA Rome Bluetooth in Dell DW1820 wireless module */
> > +   { USB_DEVICE(0x0cf3, 0xe007), .driver_info = USB_QUIRK_RESET_RESUME },
> > +
> 
> can I get an ACK from you to take this patch through bluetooth-next tree? Or 
> are you planning to take it?

It's not in my queue at all, so I didn't even have the chance to take it
:)

Acked-by: Greg Kroah-Hartman

cancel_work_sync() can cause priority inversion

2017-12-27 Thread Nikita Yushchenko

Hi

For those who care about linux RT behavior:

While analyzing traces, I just found a priority inversion caused by an RT
task calling cancel_work_sync() while the work item in question was executing
in a non-RT kworker that had been preempted for a significant time.

WBR,
Nikita Yushchenko

Re: [PATCH 2/2 v4] scsi: ufs: introduce sysfs entries exposing UFS health info

2017-12-27 Thread Greg Kroah-Hartman

On Wed, Dec 27, 2017 at 09:00:10AM +, Avri Altman wrote:
> 
> 
> > -Original Message-
> > From: linux-scsi-ow...@vger.kernel.org [mailto:linux-scsi-
> > ow...@vger.kernel.org] On Behalf Of Greg Kroah-Hartman
> > Sent: Thursday, December 21, 2017 10:00 AM
> > To: Jaegeuk Kim 
> > Cc: linux-kernel@vger.kernel.org; linux-s...@vger.kernel.org; Jaegeuk Kim
> > 
> > Subject: Re: [PATCH 2/2 v4] scsi: ufs: introduce sysfs entries exposing UFS
> > health info
> > 
> > On Wed, Dec 20, 2017 at 02:13:25PM -0800, Jaegeuk Kim wrote:
> > > This patch adds a new sysfs group, namely health, via:
> > >
> > >/sys/devices/soc/X.ufshc/health/
> As device health is just one piece of information out of the device 
> management,
> I think that you should address this in a more comprehensive way,
> And set hooks for much more device info:
> Allow access to device descriptors, attributes and flags.

Add-on patches are easy to create for this if people really want and
need it :)

> The attributes and flags should be placed in separate subfolders

Why?  What is that going to help with?

> The LUN specific descriptors and attributes should be placed in a luns
> subfolder, and then per descriptor / attribute type

Again, why?

> You might also like to consider differentiating read and write -
> to control those type of accesses as well.

What do you mean by this exactly?

As it is, this is a step forward in getting attributes that people are
asking for and already using, into the kernel tree.  Please don't object
because not all attributes that are possible are being added here, it
should be trivial to add more as needed, right?

I'm really tired of seeing all of the various out-of-tree forks of this
driver, it's about time that someone works to get those features merged,
right?

thanks,

greg k-h
