date:20181211

Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]

2018-12-11 Thread David Gibson

On Tue, Dec 11, 2018 at 08:01:43AM -0600, Bjorn Helgaas wrote:
> Hi David,
> 
> I see you're still working on this, but if you do end up going this
> direction eventually, would you mind splitting this into two patches:
> 1) rename the quirk to make it more generic (but not changing any
> behavior), and 2) add the ConnectX devices to the quirk.  That way
> the ConnectX change is smaller and more easily
> understood/reverted/etc.

Sure.  Would it make sense to send (1) as an independent cleanup,
while I'm still working out exactly what (if anything) we need for
(2)?

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [PATCH v4 1/2] x86/fpu: track AVX-512 usage of tasks

2018-12-11 Thread Dave Hansen

On 12/11/18 4:34 PM, Li, Aubrey wrote:
>> Is there a reason we shouldn't do:
>>
>>  if (!cpu_feature_enabled(X86_FEATURE_AVX512F))
>>  update_avx512_state(fpu);
>>
>> ?
>>
> Why _!_ ?

Sorry, got it backwards.  I think I was considering having you do a

if (!cpu_feature_enabled(X86_FEATURE_AVX512F))
return;

inside of update_avx512_state(), but I got the state mixed up in my head.

You don't need the '!'.

Re: [PATCH 0/2] Graph fixes for using multiple endpoints per port

2018-12-11 Thread Tony Lindgren

* Kuninori Morimoto  [181212 00:12]:
> 
> Hi Tony, again
> 
> > > The issue I have with that it does not then follow the binding doc :)
> > > 
> > > See this part in Documentation/devicetree/bindings/graph.txt:
> > > 
> > >  "If a single port is connected to more than one remote device, an
> > >  'endpoint' child node must be provided for each link."
> 
> My understanding is that 1 "port" is for 1 "physical interface".

Yes I agree.

> In sound case, it is 1 "DAI".
> And, 1 "endpoint" is for 1 "connection".

Yes. So I have 1 physical port (mcbsp) TDM split between
two codecs (cpcap and mdm).

>   "If a single port is connected to more than one remote device, an
>   'endpoint' child node must be provided for each link."
> 
> This means, "If 1 DAI (*1) is connected to multiple remote DAIs (*2),
> this connection is indicated by multiple "endpoint"" or something like that.
> 
> (*2)
> DAIA-endpoint---endpoint--\
> DAIB-endpoint---endpoint---DAI (*1)
> DAIC-endpoint---endpoint--/

Yeah. So the only thing missing is parsing multiple endpoints
at the DAI end :) And that's why the two patches I posted.

> > > Isn't the I2C TDM case the same as "single port connected to
> > > more than one remote device" rather than multiple ports?
> 
> I re-checked https://lkml.org/lkml/2018/3/28/405.
> Are MDM6600 / OMAP4 CPU portion, and are CPCAP / WL1285 Codec portion ?

MDM6600 is a Qualcomm modem running Motorola custom firmware
CPCAP is a Motorola custom PMIC for omap4430 SoC
WL1285 is a WLAN and Bluetooth chip

So following your drawing above:

There are two separate McBSP instances, here's the one
in question:

MDM6600 modem-----endpoint--\
CPCAP PMIC codec2-endpoint---McBSP3 on SoC
WL1285 Bluetooth--endpoint--/

The other McBSP instance is dedicated for SoC audio:

CPCAP PMIC codec1-endpoint---McBSP3 on SoC

> Then, it is not yet supported (on ALSA SoC level?).

Only the cpcap codec is in the mainline currently, the mdm
codec driver has not yet been posted as it depends on some
ts 27.010 serdev patches.

> If my memory was correct, Lars-Peter had some idea for Mux,
> But, not yet implemented I think.

Hmm well I don't think much else is needed currently, we
already have everything needed at the ASoC level. See yet
another WIP patch configuring the TDM for mdm codec voice
call by the existing cpcap codec driver below just by
implementing .set_tdm_slot function.

> audio-graph[-scu] / simple-card[-scu] are supporting DPCM,
> but it is for multiple CPU - single Codec.

Well with my patches I certainly have the above configuration
working just fine with two audio-graph-card instances connected
to a single physical McBSP port.

Regards,

Tony

8< 
diff --git a/sound/soc/codecs/cpcap.c b/sound/soc/codecs/cpcap.c
--- a/sound/soc/codecs/cpcap.c
+++ b/sound/soc/codecs/cpcap.c
@@ -16,6 +16,14 @@
 #include 
 #include 
 
+/* Register 512 CPCAP_REG_VAUDIOC --- Audio Regulator and Bias Voltage */
+#define CPCAP_BIT_AUDIO_LOW_PWR           6
+#define CPCAP_BIT_AUD_LOWPWR_SPEED        5
+#define CPCAP_BIT_VAUDIOPRISTBY           4
+#define CPCAP_BIT_VAUDIO_MODE1            2
+#define CPCAP_BIT_VAUDIO_MODE0            1
+#define CPCAP_BIT_V_AUDIO_EN              0
+
 /* Register 513 CPCAP_REG_CC --- CODEC */
 #define CPCAP_BIT_CDC_CLK2                15
 #define CPCAP_BIT_CDC_CLK1                14
@@ -251,6 +259,8 @@ struct cpcap_audio {
int codec_clk_id;
int codec_freq;
int codec_format;
+
+   unsigned int voice_call:1;
 };
 
 static int cpcap_st_workaround(struct snd_soc_dapm_widget *w,
@@ -1370,6 +1380,114 @@ static int cpcap_voice_set_dai_fmt(struct snd_soc_dai *codec_dai,
return 0;
 }
 
+/*
+ * Configure voice call if cpcap->voice_call is set.
+ *
+ * We can configure most with snd_soc_dai_set_sysclk(), snd_soc_dai_set_fmt()
+ * and snd_soc_dai_set_tdm_slot(). This function configures the rest of the
+ * cpcap related hardware pieces as the CPU is not involved in the voice call.
+ */
+static int cpcap_voice_call(struct snd_soc_dai *dai)
+{
+   struct snd_soc_component *component = dai->component;
+   struct cpcap_audio *cpcap = snd_soc_component_get_drvdata(component);
+   int mask, err;
+
+   /* Maybe enable modem to codec VAUDIO_MODE1? */
+   mask = BIT(CPCAP_BIT_VAUDIO_MODE1);
+   err = regmap_update_bits(cpcap->regmap, CPCAP_REG_VAUDIOC,
+mask, cpcap->voice_call ? mask : 0);
+   if (err)
+   return err;
+
+   /* Maybe clear MIC1_MUX? */
+   mask = BIT(CPCAP_BIT_MIC1_MUX);
+   err = regmap_update_bits(cpcap->regmap, CPCAP_REG_TXI,
+mask, cpcap->voice_call ? 0 : mask);
+   if (err)
+   return err;
+
+   /* Maybe set MIC2_MUX? */
+   mask = BIT(CPCAP_BIT_MB_ON1L) | BIT(CPCAP_BIT_MB_ON1R) |
+   BIT(CPCAP_BIT_MIC2_MUX) | BIT(CPCAP_BIT_MIC2_PGA_EN);
+   err = regmap_update_bits(cpcap->regmap, CPCAP_REG_TXI,
+
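
The quoted diff is cut off above. The regmap_update_bits() calls in it share one idiom: build a mask, then pass either the mask or 0 as the value depending on cpcap->voice_call. A minimal userspace sketch of that read-modify-write follows, assuming a plain 16-bit variable stands in for the register. The helpers `update_bits()` and `set_voice_call()` and the `BIT16` macro are hypothetical names for illustration, not regmap API, and bit position 2 is taken as an assumption from the CPCAP_BIT_VAUDIO_MODE1 define in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define BIT16(n)          ((uint16_t)(1u << (n)))
#define VAUDIO_MODE1_BIT  2	/* assumed from CPCAP_BIT_VAUDIO_MODE1 in the patch */

static_assert(VAUDIO_MODE1_BIT < 16, "bit must fit a 16-bit register");

/*
 * Only the bits selected by mask change; they take their new value
 * from val. All other bits of the old register value are preserved.
 */
static uint16_t update_bits(uint16_t old, uint16_t mask, uint16_t val)
{
	return (uint16_t)((old & ~mask) | (val & mask));
}

/* Set the mode bit when voice_call is nonzero, clear it otherwise. */
static uint16_t set_voice_call(uint16_t reg, int voice_call)
{
	uint16_t mask = BIT16(VAUDIO_MODE1_BIT);

	return update_bits(reg, mask, voice_call ? mask : 0);
}
```

Passing `voice_call ? mask : 0` as the value lets one call either set or clear the bit, which is why the patch can toggle the configuration from a single flag.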

Re: [PATCH] selftests/seccomp: Remove SIGSTOP si_pid check

2018-12-11 Thread Kees Cook

On Thu, Dec 6, 2018 at 3:50 PM Kees Cook  wrote:
>
> Commit f149b3155744 ("signal: Never allocate siginfo for SIGKILL or SIGSTOP")
> means that the seccomp selftest cannot check si_pid under SIGSTOP anymore.
> Since it's believed[1] there are no other userspace things depending on the
> old behavior, this removes the behavioral check in the selftest, since it's
> more an "extra" sanity check (which turns out, maybe, not to have been
> useful to test).
>
> [1] 
> https://lkml.kernel.org/r/cagxu5jjazaozp1qfz66tyrtbuywqb+un2soa1vlhpccoiyv...@mail.gmail.com
>
> Reported-by: Tycho Andersen 
> Suggested-by: Eric W. Biederman 
> Signed-off-by: Kees Cook 
> ---
> Shuah, can you make sure that Linus gets this before v4.20 is released? 
> Thanks!

Ping. Shuah, can you get this to Linus (or should I send it directly?)

Thanks!

-Kees


> ---
>  tools/testing/selftests/seccomp/seccomp_bpf.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
> index e1473234968d..c9a2abf8be1b 100644
> --- a/tools/testing/selftests/seccomp/seccomp_bpf.c
> +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
> @@ -2731,9 +2731,14 @@ TEST(syscall_restart)
> ASSERT_EQ(child_pid, waitpid(child_pid, &status, 0));
> ASSERT_EQ(true, WIFSTOPPED(status));
> ASSERT_EQ(SIGSTOP, WSTOPSIG(status));
> -   /* Verify signal delivery came from parent now. */
> ASSERT_EQ(0, ptrace(PTRACE_GETSIGINFO, child_pid, NULL, &info));
> -   EXPECT_EQ(getpid(), info.si_pid);
> +   /*
> +* There is no siginfo on SIGSTOP any more, so we can't verify
> +* signal delivery came from parent now (getpid() == info.si_pid).
> +* 
> https://lkml.kernel.org/r/cagxu5jjazaozp1qfz66tyrtbuywqb+un2soa1vlhpccoiyv...@mail.gmail.com
> +* At least verify the SIGSTOP via PTRACE_GETSIGINFO.
> +*/
> +   EXPECT_EQ(SIGSTOP, info.si_signo);
>
> /* Restart nanosleep with SIGCONT, which triggers restart_syscall. */
> ASSERT_EQ(0, kill(child_pid, SIGCONT));
> --
> 2.17.1
>
>
> --
> Kees Cook



-- 
Kees Cook

Re: [PATCH v10 3/4] seccomp: add a return code to trap to userspace

2018-12-11 Thread Kees Cook

On Sun, Dec 9, 2018 at 10:24 AM Tycho Andersen  wrote:
>
> This patch introduces a means for syscalls matched in seccomp to notify
> some other task that a particular filter has been triggered.
>
> The motivation for this is primarily for use with containers. For example,
> if a container does an init_module(), we obviously don't want to load this
> untrusted code, which may be compiled for the wrong version of the kernel
> anyway. Instead, we could parse the module image, figure out which module
> the container is trying to load and load it on the host.
>
> As another example, containers cannot mount() in general since various
> filesystems assume a trusted image. However, if an orchestrator knows that
> e.g. a particular block device has not been exposed to a container for
> writing, it may want to allow the container to mount that block device (that
> is, handle the mount for it).
>
> This patch adds functionality that is already possible via at least two
> other means that I know about, both of which involve ptrace(): first, one
> could ptrace attach, and then iterate through syscalls via PTRACE_SYSCALL.
> Unfortunately this is slow, so a faster version would be to install a
> filter that does SECCOMP_RET_TRACE, which triggers a PTRACE_EVENT_SECCOMP.
> Since ptrace allows only one tracer, if the container runtime is that
> tracer, users inside the container (or outside) trying to debug it will not
> be able to use ptrace, which is annoying. It also means that older
> distributions based on Upstart cannot boot inside containers using ptrace,
> since upstart itself uses ptrace to monitor services while starting.
>
> The actual implementation of this is fairly small, although getting the
> synchronization right was/is slightly complex.
>
> Finally, it's worth noting that the classic seccomp TOCTOU of reading
> memory data from the task still applies here, but can be avoided with
> careful design of the userspace handler: if the userspace handler reads all
> of the task memory that is necessary before applying its security policy,
> the tracee's subsequent memory edits will not be read by the tracer.
>
> Signed-off-by: Tycho Andersen 
> CC: Kees Cook 
> CC: Andy Lutomirski 
> CC: Oleg Nesterov 
> CC: Eric W. Biederman 
> CC: "Serge E. Hallyn" 
> Acked-by: Serge Hallyn 
> CC: Christian Brauner 
> CC: Tyler Hicks 
> CC: Akihiro Suda 

This takes care of everything I mentioned (and has incorporated LOTS
of people's suggestions), so I think it's ready for -next. I've
applied this and am doing local testing now.

Thanks for keeping with this!

-Kees

> ---
> v2: * make id a u64; the idea here being that it will never overflow,
>   because 64 is huge (one syscall every nanosecond => wrap every 584
>   years) (Andy)
> * prevent nesting of user notifications: if someone is already attached to
>   the tree in one place, nobody else can attach to the tree (Andy)
> * notify the listener of signals the tracee receives as well (Andy)
> * implement poll
> v3: * lockdep fix (Oleg)
> * drop unnecessary WARN()s (Christian)
> * rearrange error returns to be more pretty (Christian)
> * fix build in !CONFIG_SECCOMP_USER_NOTIFICATION case
> v4: * fix implementation of poll to use poll_wait() (Jann)
> * change listener's fd flags to be 0 (Jann)
> * hoist filter initialization out of ifdefs to its own function
>   init_user_notification()
> * add some more testing around poll() and closing the listener while a
>   syscall is in action
> * s/GET_LISTENER/NEW_LISTENER, since you can't _get_ a listener, but it
>   creates a new one (Matthew)
> * correctly handle pid namespaces, add some testcases (Matthew)
> * use EINPROGRESS instead of EINVAL when a notification response is
>   written twice (Matthew)
> * fix comment typo from older version (SEND vs READ) (Matthew)
> * whitespace and logic simplification (Tobin)
> * add some Documentation/ bits on userspace trapping
> v5: * fix documentation typos (Jann)
> * add signalled field to struct seccomp_notif (Jann)
> * switch to using ioctls instead of read()/write() for struct passing
>   (Jann)
> * add an ioctl to ensure an id is still valid
> v6: * docs typo fixes, update docs for ioctl() change (Christian)
> v7: * switch struct seccomp_knotif's id member to a u64 (derp :)
> * use notify_lock in IS_ID_VALID query to avoid racing
> * s/signalled/signaled (Tyler)
> * fix docs to reflect that ids are not globally unique (Tyler)
> * add a test to check -ERESTARTSYS behavior (Tyler)
> * drop CONFIG_SECCOMP_USER_NOTIFICATION (Tyler)
> * reorder USER_NOTIF in seccomp return codes list (Tyler)
> * return size instead of sizeof(struct user_notif) (Tyler)
> * ENOENT instead of EINVAL when invalid id is passed (Tyler)
> * drop CONFIG_SECCOMP_USER_NOTIFICATION guards (Tyler)
> * s/IS_ID_VALID/ID_VALID and switch ioctl to be "well behaved" (Tyler)
> * add a n
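
The v2 changelog entry above estimates that a u64 id handed out once per nanosecond wraps roughly every 584 years. The arithmetic can be checked with a short sketch. The function name `years_until_wrap()` and both constants are hypothetical, and a 365-day year is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define ONE_PER_NANOSECOND  1000000000ULL		/* ids handed out per second */
#define SEC_PER_YEAR        (365ULL * 24 * 60 * 60)	/* assumes a 365-day year */

/*
 * Whole years until a 64-bit counter wraps when it is incremented
 * ids_per_sec times per second.
 */
static uint64_t years_until_wrap(uint64_t ids_per_sec)
{
	uint64_t seconds;

	assert(ids_per_sec > 0);
	seconds = UINT64_MAX / ids_per_sec;
	return seconds / SEC_PER_YEAR;
}
```

With one id per nanosecond the counter lasts 2^64 / 10^9, about 1.8 * 10^10 seconds, which is 584 whole years.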

Re: [PATCH v10 4/4] samples: add an example of seccomp user trap

2018-12-11 Thread Kees Cook

On Tue, Dec 11, 2018 at 2:24 PM Serge E. Hallyn  wrote:
>
> On Sun, Dec 09, 2018 at 11:24:14AM -0700, Tycho Andersen wrote:
> > The idea here is just to give a demonstration of how one could safely use
> > the SECCOMP_RET_USER_NOTIF feature to do mount policies. This particular
> > policy is (as noted in the comment) not very interesting, but it serves to
> > illustrate how one might apply a policy dodging the various TOCTOU issues.
> >
> > Signed-off-by: Tycho Andersen 
> > CC: Kees Cook 
> > CC: Andy Lutomirski 
> > CC: Oleg Nesterov 
> > CC: Eric W. Biederman 
> > CC: "Serge E. Hallyn" 
> > CC: Christian Brauner 
> > CC: Tyler Hicks 
> > CC: Akihiro Suda 
> > ---
> > v5: new in v5
> > v7: updates for v7 API changes
> > v8: * add some more comments about what's happening in main() (Kees)
> > * move from ptrace API to SECCOMP_FILTER_FLAG_NEW_LISTENER
> > v9: * s/mknod/mount in error message
> > * switch to the SECCOMP_GET_NOTIF_SIZES API
> > * add a note about getting ENOENT from SECCOMP_IOCTL_NOTIF_SEND
> > ---
> >  samples/seccomp/.gitignore  |   1 +
> >  samples/seccomp/Makefile|   7 +-
> >  samples/seccomp/user-trap.c | 375 
> >  3 files changed, 382 insertions(+), 1 deletion(-)
> >
> > diff --git a/samples/seccomp/.gitignore b/samples/seccomp/.gitignore
> > index 78fb78184291..d1e2e817d556 100644
> > --- a/samples/seccomp/.gitignore
> > +++ b/samples/seccomp/.gitignore
> > @@ -1,3 +1,4 @@
> >  bpf-direct
> >  bpf-fancy
> >  dropper
> > +user-trap
> > diff --git a/samples/seccomp/Makefile b/samples/seccomp/Makefile
> > index cf34ff6b4065..4920903c8009 100644
> > --- a/samples/seccomp/Makefile
> > +++ b/samples/seccomp/Makefile
> > @@ -1,6 +1,6 @@
> >  # SPDX-License-Identifier: GPL-2.0
> >  ifndef CROSS_COMPILE
> > -hostprogs-$(CONFIG_SAMPLE_SECCOMP) := bpf-fancy dropper bpf-direct
> > +hostprogs-$(CONFIG_SAMPLE_SECCOMP) := bpf-fancy dropper bpf-direct user-trap
> >
> >  HOSTCFLAGS_bpf-fancy.o += -I$(objtree)/usr/include
> >  HOSTCFLAGS_bpf-fancy.o += -idirafter $(objtree)/include
> > @@ -16,6 +16,10 @@ HOSTCFLAGS_bpf-direct.o += -I$(objtree)/usr/include
> >  HOSTCFLAGS_bpf-direct.o += -idirafter $(objtree)/include
> >  bpf-direct-objs := bpf-direct.o
> >
> > +HOSTCFLAGS_user-trap.o += -I$(objtree)/usr/include
> > +HOSTCFLAGS_user-trap.o += -idirafter $(objtree)/include
> > +user-trap-objs := user-trap.o
> > +
> >  # Try to match the kernel target.
> >  ifndef CONFIG_64BIT
> >
> > @@ -33,6 +37,7 @@ HOSTCFLAGS_bpf-fancy.o += $(MFLAG)
> >  HOSTLDLIBS_bpf-direct += $(MFLAG)
> >  HOSTLDLIBS_bpf-fancy += $(MFLAG)
> >  HOSTLDLIBS_dropper += $(MFLAG)
> > +HOSTLDLIBS_user-trap += $(MFLAG)
> >  endif
> >  always := $(hostprogs-m)
> >  endif
> > diff --git a/samples/seccomp/user-trap.c b/samples/seccomp/user-trap.c
> > new file mode 100644
> > index ..61267cb59c8e
> > --- /dev/null
> > +++ b/samples/seccomp/user-trap.c
> > @@ -0,0 +1,375 @@
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
> > +
> > +static int seccomp(unsigned int op, unsigned int flags, void *args)
> > +{
> > + errno = 0;
> > + return syscall(__NR_seccomp, op, flags, args);
> > +}
> > +
> > +static int send_fd(int sock, int fd)
> > +{
> > + struct msghdr msg = {};
> > + struct cmsghdr *cmsg;
> > + char buf[CMSG_SPACE(sizeof(int))] = {0}, c = 'c';
> > + struct iovec io = {
> > + .iov_base = &c,
> > + .iov_len = 1,
> > + };
> > +
> > + msg.msg_iov = &io;
> > + msg.msg_iovlen = 1;
> > + msg.msg_control = buf;
> > + msg.msg_controllen = sizeof(buf);
> > + cmsg = CMSG_FIRSTHDR(&msg);
> > + cmsg->cmsg_level = SOL_SOCKET;
> > + cmsg->cmsg_type = SCM_RIGHTS;
> > + cmsg->cmsg_len = CMSG_LEN(sizeof(int));
> > + *((int *)CMSG_DATA(cmsg)) = fd;
> > + msg.msg_controllen = cmsg->cmsg_len;
> > +
> > + if (sendmsg(sock, &msg, 0) < 0) {
> > + perror("sendmsg");
> > + return -1;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static int recv_fd(int sock)
> > +{
> > + struct msghdr msg = {};
> > + struct cmsghdr *cmsg;
> > + char buf[CMSG_SPACE(sizeof(int))] = {0}, c = 'c';
> > + struct iovec io = {
> > + .iov_base = &c,
> > + .iov_len = 1,
> > + };
> > +
> > + msg.msg_iov = &io;
> > + msg.msg_iovlen = 1;
> > + msg.msg_control = buf;
> > + msg.msg_controllen = sizeof(buf);
> > +
> > + if (recvmsg(sock, &msg, 0) < 0) {
> > + perror("recvmsg");
> > + return -1;
> > + }
> > +
> > + cmsg = CMSG_FIRSTHDR(&msg);
> > +
> > + return *((int *)CM
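
The quoted sample is cut off above. As a separate illustration of the SCM_RIGHTS technique it uses, here is a minimal sketch of passing a file descriptor over a Unix socket. This is not the code from the patch: `pass_fd()`, `take_fd()` and `fd_roundtrip()` are hypothetical names, and the control-message validation in `take_fd()` is an added assumption about what a careful receiver might check.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized for one int, aligned for struct cmsghdr. */
union fd_ctl {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send fd over sock as SCM_RIGHTS ancillary data with a 1-byte payload. */
static int pass_fd(int sock, int fd)
{
	char c = 'c';
	struct iovec io = { .iov_base = &c, .iov_len = 1 };
	union fd_ctl u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	msg.msg_iov = &io;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	assert(cmsg != NULL);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));

	return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}

/* Receive one fd from sock; returns the new descriptor or -1. */
static int take_fd(int sock)
{
	char c;
	struct iovec io = { .iov_base = &c, .iov_len = 1 };
	union fd_ctl u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;
	int fd;

	memset(&u, 0, sizeof(u));
	msg.msg_iov = &io;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) < 0)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
	return fd;
}

/* Pass a pipe's write end through a socketpair and write through the copy. */
static int fd_roundtrip(void)
{
	int sv[2], p[2], fd, ok;
	char out = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0 || pipe(p) < 0)
		return -1;
	if (pass_fd(sv[0], p[1]) < 0)
		return -1;
	fd = take_fd(sv[1]);
	if (fd < 0)
		return -1;

	ok = write(fd, "x", 1) == 1 && read(p[0], &out, 1) == 1 && out == 'x';
	close(fd); close(p[0]); close(p[1]); close(sv[0]); close(sv[1]);
	return ok ? 0 : -1;
}
```

The received descriptor is a new number in the receiver that refers to the same open file, which is what lets a task hand a seccomp listener fd to another process.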

Re: use generic DMA mapping code in powerpc V4

2018-12-11 Thread Benjamin Herrenschmidt

On Tue, 2018-12-11 at 19:17 +0100, Christian Zigotzky wrote:
> X5000 (P5020 board): U-Boot loads the kernel and the dtb file. Then the 
> kernel starts but it doesn't find any hard disks (partitions). That 
> means this is also the bad commit for the P5020 board.

What are the disks hanging off ? A PCIe device of some sort ?

Can you send good & bad dmesg logs ?

Ben.

Re: [PATCH] kfifo: add memory barrier in kfifo to prevent data loss

2018-12-11 Thread Kees Cook

On Mon, Dec 10, 2018 at 7:41 PM  wrote:
>
> From: Yulei Zhang 
>
> Earlier this year we spotted what may be two issues in the kernel
> kfifo.
>
> One was reported by Xiao Guangrong to the linux kernel list:
> https://lkml.org/lkml/2018/5/11/58
> The current kfifo implementation is missing a memory barrier on
> the read side, so without a proper barrier between reading
> kfifo->in and fetching the data there is a potential ordering
> issue.
>
> Besides that, there is another potential issue in kfifo. Please
> consider the following case, where at the beginning:
> ring->size = 4
> ring->out = 0
> ring->in = 4
>
> Consumer                             Producer
> -----------------------------------  -----------------------------------
> index = ring->out; /* index == 0 */
> ring->out++; /* ring->out == 1 */
> < Re-Order >
>                                      out = ring->out;
>                                      if (ring->in - out >= ring->mask)
>                                              return -EFULL;
>                                      /* see the ring is not full */
>                                      index = ring->in & ring->mask;
>                                      /* index == 0 */
>                                      ring->data[index] = new_data;
>                                      ring->in++;
>
> data = ring->data[index];
> /* you will find the old data is overwritten by the new_data */
>
> In order to avoid the issue:
> 1) for the consumer, we should read the ring->data[] out before
> updating ring->out
> 2) for the producer, we should read ring->out before updating
> ring->data[]
>
> So in this patch we introduce the following four functions, which
> are wrapped with proper memory barriers and used in pairs to make
> sure the in and out indexes are fetched and updated in order, to
> avoid data loss.
>
> kfifo_read_index_in()
> kfifo_write_index_in()
> kfifo_read_index_out()
> kfifo_write_index_out()
>
> Signed-off-by: Yulei Zhang 
> Signed-off-by: Guangrong Xiao 

I've added some more people to CC that might want to see this. Thanks
for sending this!

-Kees
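
The ordering rules in points 1) and 2) quoted above can be sketched in userspace with C11 atomics. This is a minimal single-producer, single-consumer ring under stated assumptions, not the kernel's kfifo code. `struct ring`, `ring_put()` and `ring_get()` are hypothetical names, and the acquire/release pairs stand in for the barriers the patch adds.

```c
#include <assert.h>
#include <stdatomic.h>

#define RING_SIZE 4u
#define RING_MASK (RING_SIZE - 1u)

static_assert((RING_SIZE & RING_MASK) == 0, "RING_SIZE must be a power of two");

struct ring {
	_Atomic unsigned int in;	/* written only by the producer */
	_Atomic unsigned int out;	/* written only by the consumer */
	int data[RING_SIZE];
};

/* Producer: read out (acquire) before touching data[]. */
static int ring_put(struct ring *r, int v)
{
	unsigned int in = atomic_load_explicit(&r->in, memory_order_relaxed);
	/* Pairs with the release store in ring_get(): the consumer has
	 * finished reading a slot before we see it as free. */
	unsigned int out = atomic_load_explicit(&r->out, memory_order_acquire);

	if (in - out >= RING_SIZE)
		return -1;	/* full */

	r->data[in & RING_MASK] = v;
	/* Publish the slot only after the data store is visible. */
	atomic_store_explicit(&r->in, in + 1, memory_order_release);
	return 0;
}

/* Consumer: read data[] out before updating out (release). */
static int ring_get(struct ring *r, int *v)
{
	unsigned int out = atomic_load_explicit(&r->out, memory_order_relaxed);
	/* Pairs with the release store in ring_put(). */
	unsigned int in = atomic_load_explicit(&r->in, memory_order_acquire);

	if (in == out)
		return -1;	/* empty */

	*v = r->data[out & RING_MASK];
	/* Free the slot only after the data has been read. */
	atomic_store_explicit(&r->out, out + 1, memory_order_release);
	return 0;
}
```

The release store to `out` is what prevents the reordering shown in the quoted table: the producer cannot observe the incremented index until the consumer's read of the slot has completed.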

> ---
>  include/linux/kfifo.h |  70 ++-
>  lib/kfifo.c   | 107 +++---
>  2 files changed, 136 insertions(+), 41 deletions(-)
>
> diff --git a/include/linux/kfifo.h b/include/linux/kfifo.h
> index 89fc8dc7bf38..3bd2a869ca7e 100644
> --- a/include/linux/kfifo.h
> +++ b/include/linux/kfifo.h
> @@ -286,6 +286,71 @@ __kfifo_uint_must_check_helper( \
>  }) \
>  )
>
> +/**
> + * kfifo_read_index_in - returns the in index of the fifo
> + * @fifo: address of the kfifo to be used
> + *
> + * add memory read barrier to make sure the fifo->in index
> + * is fetched first before write data to the fifo, which
> + * is paired with the write barrier in kfifo_write_index_in
> + */
> +#define kfifo_read_index_in(kfifo) \
> +({ \
> +   typeof((kfifo) + 1) __tmp = (kfifo); \
> +   struct __kfifo *__kfifo = __tmp; \
> +   unsigned int __val = READ_ONCE(__kfifo->in); \
> +   smp_rmb(); \
> +   __val; \
> +})
> +
> +/**
> + * kfifo_write_index_in - updates the in index of the fifo
> + * @fifo: address of the kfifo to be used
> + *
> + * add memory write barrier to make sure the data entry is
> + * updated before increase the fifo->in
> + */
> +#define kfifo_write_index_in(kfifo, val) \
> +({ \
> +   typeof((kfifo) + 1) __tmp = (kfifo); \
> +   struct __kfifo *__kfifo = __tmp; \
> +   unsigned int __val = (val); \
> +   smp_wmb(); \
> +   WRITE_ONCE(__kfifo->in, __val); \
> +})
> +
> +/**
> + * kfifo_read_index_out - returns the out index of the fifo
> + * @fifo: address of the kfifo to be used
> + *
> + * add memory barrier to make sure the fifo->out index is
> + * fetched before read data from the fifo, which is paired
> + * with the memory barrier in kfifo_write_index_out
> + */
> +#define kfifo_read_index_out(kfifo) \
> +({ \
> +   typeof((kfifo) + 1) __tmp = (kfifo); \
> +   struct __kfifo *__kfifo = __tmp; \
> +   unsigned int __val = smp_load_acquire(&__kfifo->out); \
> +   __val; \
> +})
> +
> +/**
> + * kfifo_write_index_out - updates the out index of the fifo
> + * @fifo: address of the kfifo to be used
> + *
> + * add a memory barrier to make sure the entry is read out before
> + * the fifo->out index is updated, so that the producer cannot
> + * overwrite the entry
> + */
> +#define kfifo_write_index_out(kfifo, val) \
> +({ \
> +   typeof((kfifo) + 1) __tmp = (kfifo); \
> +   struct __kfifo *__kfifo = __tmp; \
> +   unsigned int __val = (val); \
> +   smp_store_release(&__kfifo->out, __val); \
> +})
> +
>  /**
>   * kfifo_skip - skip output data
>   * @fifo: address of the fifo to be used
> @@ -298,7 +363,7 @@ __kfifo_uint_must_check_helper( \
> if (__recsize) \
> __kfifo_skip_r(__kfifo, __recsize); \
> else \
> -   __kfifo->out++; \
> +   kfifo_write_index_out(__kfifo, __kfifo->out++); \
>  })
>
>  /**
> @@ -740,7 +805,8 @@ __kfifo_uint_must_check_hel

Re: [PATCH 0/2] Graph fixes for using multiple endpoints per port

2018-12-11 Thread Tony Lindgren

* Tony Lindgren  [181212 00:43]:
> The other McBSP instance is dedicated for SoC audio:
> 
> CPCAP PMIC codec1-endpoint---McBSP3 on SoC

Sorry this should be McBSP2, not McBSP3 above for
the SoC dedicated audio port.

Tony

Re: [PATCHv3] panic: avoid deadlocks in re-entrant console drivers

2018-12-11 Thread Daniel Wang

Is it okay to tag this commit with `Cc: sta...@vger.kernel.org` so
that it'll get applied to the stable trees once merged into Linus's
tree, if it's not too late? Otherwise I'll follow up on the stable
merges separately. Thanks for making the changes anyway.

On Thu, Nov 22, 2018 at 5:12 AM Petr Mladek  wrote:
>
> On Thu 2018-11-01 09:08:08, Petr Mladek wrote:
> > On Thu 2018-11-01 10:48:21, Sergey Senozhatsky wrote:
> > > On (10/31/18 13:27), Petr Mladek wrote:
> > > > >
> > > > > Signed-off-by: Sergey Senozhatsky 
> > > >
> > > > The patch makes sense to me. The locks should stay busted also for
> > > > console_flush_on_panic().
> > > >
> > > > With the added #include :
> > > >
> > > > Reviewed-by: Petr Mladek 
> > >
> > > Thanks!
> > >
> > > Since there are no objections - how shall we route it? Via printk tree?
> >
> > Good question. OK, I am going to put it into printk.git unless I hear
> > complaints within the next couple of days.
>
> I have pushed this into printk.git, branch for-4.21.
>
> Best Regards,
> Petr



-- 
Best,
Daniel



Re: [PATCH] selftests/seccomp: Remove SIGSTOP si_pid check

2018-12-11 Thread shuah


On 12/11/18 5:43 PM, Kees Cook wrote:
> On Thu, Dec 6, 2018 at 3:50 PM Kees Cook  wrote:
>>
>> Commit f149b3155744 ("signal: Never allocate siginfo for SIGKILL or SIGSTOP")
>> means that the seccomp selftest cannot check si_pid under SIGSTOP anymore.
>> Since it's believed[1] there are no other userspace things depending on the
>> old behavior, this removes the behavioral check in the selftest, since it's
>> more an "extra" sanity check (which turns out, maybe, not to have been
>> useful to test).
>>
>> [1] 
>> https://lkml.kernel.org/r/cagxu5jjazaozp1qfz66tyrtbuywqb+un2soa1vlhpccoiyv...@mail.gmail.com
>>
>> Reported-by: Tycho Andersen 
>> Suggested-by: Eric W. Biederman 
>> Signed-off-by: Kees Cook 
>> ---
>> Shuah, can you make sure that Linus gets this before v4.20 is released? Thanks!
>
> Ping. Shuah, can you get this to Linus (or should I send it directly?)



I will send this. Thanks for the ping.

-- Shuah

[PATCH 00/11] staging: iio: adt7316: dac fixes

2018-12-11 Thread Jeremy Fertic

Here are some dac related fixes for adt7316. I'm testing with an adt7516
over i2c to an orange pi pc. I've attempted to test any functionality that
these patches are touching.

Jeremy Fertic (11):
  staging: iio: adt7316: fix register and bit definitions
  staging: iio: adt7316: invert the logic of the check for an ldac pin
  staging: iio: adt7316: fix dac_bits assignment
  staging: iio: adt7316: fix handling of dac high resolution option
  staging: iio: adt7316: fix the dac read calculation
  staging: iio: adt7316: fix the dac write calculation
  staging: iio: adt7316: use correct variable in DAC_internal_Vref read
  staging: iio: adt7316: allow adt751x to use internal vref for all dacs
  staging: iio: adt7316: remove dac vref buffer bypass from adt751x
  staging: iio: adt7316: change interpretation of write to dac update mode
  staging: iio: adt7316: correct spelling of ADT7316_DA_EN_VIA_DAC_LDCA

 drivers/staging/iio/addac/adt7316.c | 89 ++---
 1 file changed, 43 insertions(+), 46 deletions(-)

-- 
2.19.1

[PATCH 01/11] staging: iio: adt7316: fix register and bit definitions

2018-12-11 Thread Jeremy Fertic

Change two register addresses and one bit definition to match the
datasheet.

Signed-off-by: Jeremy Fertic 
---
 drivers/staging/iio/addac/adt7316.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/iio/addac/adt7316.c b/drivers/staging/iio/addac/adt7316.c
index dc93e85808e0..1fa4a4c2b4f3 100644
--- a/drivers/staging/iio/addac/adt7316.c
+++ b/drivers/staging/iio/addac/adt7316.c
@@ -59,8 +59,8 @@
 #define ADT7316_CONFIG1                0x18
 #define ADT7316_CONFIG2                0x19
 #define ADT7316_CONFIG3                0x1A
-#define ADT7316_LDAC_CONFIG            0x1B
-#define ADT7316_DAC_CONFIG             0x1C
+#define ADT7316_DAC_CONFIG             0x1B
+#define ADT7316_LDAC_CONFIG            0x1C
 #define ADT7316_INT_MASK1              0x1D
 #define ADT7316_INT_MASK2              0x1E
 #define ADT7316_IN_TEMP_OFFSET         0x1F
@@ -117,7 +117,7 @@
  */
 #define ADT7316_ADCLK_22_5             0x1
 #define ADT7316_DA_HIGH_RESOLUTION     0x2
-#define ADT7316_DA_EN_VIA_DAC_LDCA     0x4
+#define ADT7316_DA_EN_VIA_DAC_LDCA     0x8
 #define ADT7516_AIN_IN_VREF            0x10
 #define ADT7316_EN_IN_TEMP_PROP_DACA   0x20
 #define ADT7316_EN_EX_TEMP_PROP_DACB   0x40
-- 
2.19.1

Re: [PATCH v2] MAINTAINERS: add maintainers for ChromeOS EC sub-drivers

2018-12-11 Thread Chanwoo Choi

Hi Enric,

On 2018-12-12 04:09, Enric Balletbo i Serra wrote:
> There are multiple ChromeOS EC sub-drivers spread across different
> subsystems. As all of them are related to Chrome, add Benson and
> myself as maintainers for all these sub-drivers.
> 
> Signed-off-by: Enric Balletbo i Serra 
> ---
> 
> Changes in v2:
> - Fix typo in Benson email address.
> 
>  MAINTAINERS | 24 
>  1 file changed, 24 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index a24129b0b043..bbe7180e2851 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3625,6 +3625,30 @@ S: Maintained
>  T:   git git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platform.git
>  F:   drivers/platform/chrome/
>  
> +CHROMEOS EC SUBDRIVERS
> +M:   Benson Leung 
> +M:   Enric Balletbo i Serra 
> +S:   Maintained
> +F:   Documentation/devicetree/bindings/extcon/extcon-usbc-cros-ec.txt
> +F:   Documentation/devicetree/bindings/input/cros-ec-keyb.txt
> +F:   Documentation/devicetree/bindings/pwm/google,cros-ec-pwm.txt
> +F:   Documentation/devicetree/bindings/i2c/i2c-cros-ec-tunnel.txt
> +F:   Documentation/devicetree/bindings/mfd/cros-ec.txt
> +F:   Documentation/ABI/testing/sysfs-bus-iio-cros-ec
> +F:   drivers/extcon/extcon-usbc-cros-ec.c
> +F:   drivers/i2c/busses/i2c-cros-ec-tunnel.c
> +F:   drivers/iio/accel/cros_ec*
> +F:   drivers/iio/common/cros_ec_sensors/
> +F:   drivers/iio/light/cros_ec*
> +F:   drivers/iio/pressure/cros_ec*
> +F:   drivers/input/keyboard/cros_ec*
> +F:   drivers/mfd/cros_ec*
> +F:   drivers/power/supply/cros_usbpd-charger.c
> +F:   drivers/pwm/pwm-cros-ec.c
> +F:   drivers/rtc/rtc-cros-ec.c
> +F:   include/linux/iio/common/cros_ec_sensors_core.h
> +F:   include/linux/mfd/cros_ec*
> +
>  CIRRUS LOGIC AUDIO CODEC DRIVERS
>  M:   Brian Austin 
>  M:   Paul Handrigan 
> 

For extcon part,
Acked-by: Chanwoo Choi 

-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

[PATCH 02/11] staging: iio: adt7316: invert the logic of the check for an ldac pin

2018-12-11 Thread Jeremy Fertic

ADT7316_DA_EN_VIA_DAC_LDCA is set when the dac and ldac registers are being
used to update the dacs instead of the ldac pin. ADT7516_SEL_AIN3 is an adc
input that shares the ldac pin. Only set these bits if an ldac pin is not
being used.

Signed-off-by: Jeremy Fertic 
---
 drivers/staging/iio/addac/adt7316.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/iio/addac/adt7316.c b/drivers/staging/iio/addac/adt7316.c
index 1fa4a4c2b4f3..e5e1f9d6357f 100644
--- a/drivers/staging/iio/addac/adt7316.c
+++ b/drivers/staging/iio/addac/adt7316.c
@@ -2130,7 +2130,7 @@ int adt7316_probe(struct device *dev, struct adt7316_bus *bus,
return ret;
}
 
-   if (chip->ldac_pin) {
+   if (!chip->ldac_pin) {
chip->config3 |= ADT7316_DA_EN_VIA_DAC_LDCA;
if ((chip->id & ID_FAMILY_MASK) == ID_ADT75XX)
chip->config1 |= ADT7516_SEL_AIN3;
-- 
2.19.1
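
The inverted check in this patch can be read as a small policy: the register-update bits are set only when no ldac pin is wired up. A minimal sketch of that policy follows, under loud assumptions. `struct chip_cfg` and `apply_ldac_policy()` are hypothetical names, `SEL_AIN3_PLACEHOLDER` is a made-up value because the patch does not show ADT7516_SEL_AIN3, and the 0x8 flag value is taken from patch 01/11.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DA_EN_VIA_DAC_LDCA    0x8	/* value as corrected in patch 01/11 */
#define SEL_AIN3_PLACEHOLDER  0x8	/* hypothetical: real value not shown in the patch */

static_assert(DA_EN_VIA_DAC_LDCA <= UINT8_MAX, "flag must fit an 8-bit config register");

/* Stand-in for the two cached config registers touched by the check. */
struct chip_cfg {
	uint8_t config1;
	uint8_t config3;
};

/*
 * Without an ldac pin, DAC updates go through the dac/ldac registers,
 * and on the adt75xx family the shared pin is free to serve as AIN3.
 * With an ldac pin, neither bit is set.
 */
static struct chip_cfg apply_ldac_policy(struct chip_cfg cfg,
					 bool has_ldac_pin, bool is_adt75xx)
{
	if (!has_ldac_pin) {
		cfg.config3 |= DA_EN_VIA_DAC_LDCA;
		if (is_adt75xx)
			cfg.config1 |= SEL_AIN3_PLACEHOLDER;
	}
	return cfg;
}
```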

[PATCH 07/11] staging: iio: adt7316: use correct variable in DAC_internal_Vref read

2018-12-11 Thread Jeremy Fertic

The dac internal vref settings are part of the ldac config register rather
than the dac config register. Change the variable being used so the read
returns the correct result.

Signed-off-by: Jeremy Fertic 
---
 drivers/staging/iio/addac/adt7316.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/iio/addac/adt7316.c b/drivers/staging/iio/addac/adt7316.c
index 77ef3c209b67..98101a7157d2 100644
--- a/drivers/staging/iio/addac/adt7316.c
+++ b/drivers/staging/iio/addac/adt7316.c
@@ -1056,10 +1056,10 @@ static ssize_t adt7316_show_DAC_internal_Vref(struct device *dev,
 
if ((chip->id & ID_FAMILY_MASK) == ID_ADT75XX)
return sprintf(buf, "0x%x\n",
-   (chip->dac_config & ADT7516_DAC_IN_VREF_MASK) >>
+   (chip->ldac_config & ADT7516_DAC_IN_VREF_MASK) >>
ADT7516_DAC_IN_VREF_OFFSET);
return sprintf(buf, "%d\n",
-  !!(chip->dac_config & ADT7316_DAC_IN_VREF));
+  !!(chip->ldac_config & ADT7316_DAC_IN_VREF));
 }
 
 static ssize_t adt7316_store_DAC_internal_Vref(struct device *dev,
-- 
2.19.1

[PATCH 11/11] staging: iio: adt7316: correct spelling of ADT7316_DA_EN_VIA_DAC_LDCA

2018-12-11 Thread Jeremy Fertic

Change LDCA to LDAC.

Signed-off-by: Jeremy Fertic 
---
 drivers/staging/iio/addac/adt7316.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/iio/addac/adt7316.c b/drivers/staging/iio/addac/adt7316.c
index 58b462ad0c83..020d695ded97 100644
--- a/drivers/staging/iio/addac/adt7316.c
+++ b/drivers/staging/iio/addac/adt7316.c
@@ -119,7 +119,7 @@
  */
 #define ADT7316_ADCLK_22_5 0x1
 #define ADT7316_DA_HIGH_RESOLUTION 0x2
-#define ADT7316_DA_EN_VIA_DAC_LDCA 0x8
+#define ADT7316_DA_EN_VIA_DAC_LDAC 0x8
 #define ADT7516_AIN_IN_VREF0x10
 #define ADT7316_EN_IN_TEMP_PROP_DACA   0x20
 #define ADT7316_EN_EX_TEMP_PROP_DACB   0x40
@@ -847,7 +847,7 @@ static ssize_t adt7316_show_DAC_update_mode(struct device *dev,
struct iio_dev *dev_info = dev_to_iio_dev(dev);
struct adt7316_chip_info *chip = iio_priv(dev_info);
 
-   if (!(chip->config3 & ADT7316_DA_EN_VIA_DAC_LDCA))
+   if (!(chip->config3 & ADT7316_DA_EN_VIA_DAC_LDAC))
return sprintf(buf, "manual\n");
 
switch (chip->dac_config & ADT7316_DA_EN_MODE_MASK) {
@@ -876,7 +876,7 @@ static ssize_t adt7316_store_DAC_update_mode(struct device *dev,
u8 data;
int ret;
 
-   if (!(chip->config3 & ADT7316_DA_EN_VIA_DAC_LDCA))
+   if (!(chip->config3 & ADT7316_DA_EN_VIA_DAC_LDAC))
return -EPERM;
 
ret = kstrtou8(buf, 10, &data);
@@ -907,7 +907,7 @@ static ssize_t adt7316_show_all_DAC_update_modes(struct device *dev,
struct iio_dev *dev_info = dev_to_iio_dev(dev);
struct adt7316_chip_info *chip = iio_priv(dev_info);
 
-   if (chip->config3 & ADT7316_DA_EN_VIA_DAC_LDCA)
+   if (chip->config3 & ADT7316_DA_EN_VIA_DAC_LDAC)
return sprintf(buf, "0 - auto at any MSB DAC writing\n"
"1 - auto at MSB DAC AB and CD writing\n"
"2 - auto at MSB DAC ABCD writing\n"
@@ -929,7 +929,7 @@ static ssize_t adt7316_store_update_DAC(struct device *dev,
u8 data;
int ret;
 
-   if (chip->config3 & ADT7316_DA_EN_VIA_DAC_LDCA) {
+   if (chip->config3 & ADT7316_DA_EN_VIA_DAC_LDAC) {
if ((chip->dac_config & ADT7316_DA_EN_MODE_MASK) !=
ADT7316_DA_EN_MODE_LDAC)
return -EPERM;
@@ -2128,7 +2128,7 @@ int adt7316_probe(struct device *dev, struct adt7316_bus *bus,
}
 
if (!chip->ldac_pin) {
-   chip->config3 |= ADT7316_DA_EN_VIA_DAC_LDCA;
+   chip->config3 |= ADT7316_DA_EN_VIA_DAC_LDAC;
if ((chip->id & ID_FAMILY_MASK) == ID_ADT75XX)
chip->config1 |= ADT7516_SEL_AIN3;
}
-- 
2.19.1

[PATCH 10/11] staging: iio: adt7316: change interpretation of write to dac update mode

2018-12-11 Thread Jeremy Fertic

Based on the output of adt7316_show_all_DAC_update_modes() and
adt7316_show_DAC_update_mode(), adt7316_store_DAC_update_mode() should
expect the user to enter an integer input from 0 to 3. The user input is
currently expected to account for the actual bit positions in the register.
For example, choosing option 3 would require a write of 0x30 (actually 48
since it expects base 10). To address this inconsistency, create a shift
macro to be used in the valid input check as well as the calculation for
the register write.

Signed-off-by: Jeremy Fertic 
---
I'm not sure if this patch is appropriate since it's making a user visible
change. I've included it since the driver is still in staging.

 drivers/staging/iio/addac/adt7316.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/iio/addac/adt7316.c b/drivers/staging/iio/addac/adt7316.c
index bca599d8c51c..58b462ad0c83 100644
--- a/drivers/staging/iio/addac/adt7316.c
+++ b/drivers/staging/iio/addac/adt7316.c
@@ -129,6 +129,7 @@
  */
 #define ADT7316_DA_2VREF_CH_MASK   0xF
 #define ADT7316_DA_EN_MODE_MASK0x30
+#define ADT7316_DA_EN_MODE_SHIFT   4
 #define ADT7316_DA_EN_MODE_SINGLE  0x00
 #define ADT7316_DA_EN_MODE_AB_CD   0x10
 #define ADT7316_DA_EN_MODE_ABCD0x20
@@ -879,11 +880,11 @@ static ssize_t adt7316_store_DAC_update_mode(struct device *dev,
return -EPERM;
 
ret = kstrtou8(buf, 10, &data);
-   if (ret || data > ADT7316_DA_EN_MODE_MASK)
+   if (ret || data > (ADT7316_DA_EN_MODE_MASK >> ADT7316_DA_EN_MODE_SHIFT))
return -EINVAL;
 
dac_config = chip->dac_config & (~ADT7316_DA_EN_MODE_MASK);
-   dac_config |= data;
+   dac_config |= data << ADT7316_DA_EN_MODE_SHIFT;
 
ret = chip->bus.write(chip->bus.client, ADT7316_DAC_CONFIG, dac_config);
if (ret)
-- 
2.19.1
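
The input handling described in the patch above can be sketched in isolation. This is only an illustration under stated assumptions: the two macros are copied from the patch, while the helper name dac_set_update_mode() and its -1 error return are hypothetical and not part of the driver.

```c
#include <stdint.h>

/* Values copied from the patch. */
#define ADT7316_DA_EN_MODE_MASK		0x30
#define ADT7316_DA_EN_MODE_SHIFT	4

/*
 * Hypothetical helper: fold a user-visible mode (0..3) into a cached
 * DAC config byte. Returns 0 on success, -1 if the mode is out of range.
 */
int dac_set_update_mode(uint8_t *dac_config, uint8_t mode)
{
	/* The largest valid mode is the mask shifted down to bit 0. */
	if (mode > (ADT7316_DA_EN_MODE_MASK >> ADT7316_DA_EN_MODE_SHIFT))
		return -1;

	/* Clear the old mode bits, then place the new mode in bits 5:4. */
	*dac_config &= (uint8_t)~ADT7316_DA_EN_MODE_MASK;
	*dac_config |= (uint8_t)(mode << ADT7316_DA_EN_MODE_SHIFT);
	return 0;
}
```

With this shape, a write of 3 sets both mode bits, whereas the old code would have needed the raw register value 48.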

[PATCH 08/11] staging: iio: adt7316: allow adt751x to use internal vref for all dacs

2018-12-11 Thread Jeremy Fertic

With adt7516/7/9, internal vref is available for dacs a and b, dacs c and
d, or all dacs. The driver doesn't currently support internal vref for all
dacs. Change the else if to an if so both bits are checked rather than
just one or the other.

Signed-off-by: Jeremy Fertic 
---
 drivers/staging/iio/addac/adt7316.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/iio/addac/adt7316.c b/drivers/staging/iio/addac/adt7316.c
index 98101a7157d2..3348fdf08f2e 100644
--- a/drivers/staging/iio/addac/adt7316.c
+++ b/drivers/staging/iio/addac/adt7316.c
@@ -1081,7 +1081,7 @@ static ssize_t adt7316_store_DAC_internal_Vref(struct device *dev,
ldac_config = chip->ldac_config & (~ADT7516_DAC_IN_VREF_MASK);
if (data & 0x1)
ldac_config |= ADT7516_DAC_AB_IN_VREF;
-   else if (data & 0x2)
+   if (data & 0x2)
ldac_config |= ADT7516_DAC_CD_IN_VREF;
} else {
ret = kstrtou8(buf, 16, &data);
-- 
2.19.1
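
The effect of replacing the else if with a plain if can be shown with a small standalone sketch. The bit values below are placeholders chosen for illustration (the patch does not show the real definitions), and vref_bits_from_input() is a hypothetical name.

```c
#include <stdint.h>

/* Placeholder bit values; the driver's real definitions are not shown above. */
#define DAC_AB_IN_VREF	0x10
#define DAC_CD_IN_VREF	0x20

/*
 * Hypothetical helper: map user input (bit 0 = DACs A/B, bit 1 = DACs C/D)
 * to register bits. Both bits are tested independently, so an input of 3
 * selects internal vref for all four DACs.
 */
uint8_t vref_bits_from_input(uint8_t data)
{
	uint8_t bits = 0;

	if (data & 0x1)
		bits |= DAC_AB_IN_VREF;
	if (data & 0x2)
		bits |= DAC_CD_IN_VREF;
	return bits;
}
```

With an else if in place of the second test, an input of 3 would only ever set the A/B bit.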

[PATCH 04/11] staging: iio: adt7316: fix handling of dac high resolution option

2018-12-11 Thread Jeremy Fertic

The dac high resolution option enables or disables 10 bit dac resolution
for the adt7316/7 and adt7516/7 when they're set to output voltage
proportional to temperature. Remove the "1 (12 bits)" output from the show
function since that is not an option for this mode. Return "1 (10 bits)"
if the device is one of the above mentioned chips and the dac high
resolution mode is enabled. In the store function, return an error if the
device does not support this mode.

Signed-off-by: Jeremy Fertic 
---
 drivers/staging/iio/addac/adt7316.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/iio/addac/adt7316.c b/drivers/staging/iio/addac/adt7316.c
index a9990e7f2a4d..eee7c04f93f4 100644
--- a/drivers/staging/iio/addac/adt7316.c
+++ b/drivers/staging/iio/addac/adt7316.c
@@ -632,9 +632,7 @@ static ssize_t adt7316_show_da_high_resolution(struct device *dev,
struct adt7316_chip_info *chip = iio_priv(dev_info);
 
if (chip->config3 & ADT7316_DA_HIGH_RESOLUTION) {
-   if (chip->id == ID_ADT7316 || chip->id == ID_ADT7516)
-   return sprintf(buf, "1 (12 bits)\n");
-   if (chip->id == ID_ADT7317 || chip->id == ID_ADT7517)
+   if (chip->id != ID_ADT7318 && chip->id != ID_ADT7519)
return sprintf(buf, "1 (10 bits)\n");
}
 
@@ -651,10 +649,12 @@ static ssize_t adt7316_store_da_high_resolution(struct device *dev,
u8 config3;
int ret;
 
+   if (chip->id == ID_ADT7318 || chip->id == ID_ADT7519)
+   return -EPERM;
+
+   config3 = chip->config3 & (~ADT7316_DA_HIGH_RESOLUTION);
if (buf[0] == '1')
-   config3 = chip->config3 | ADT7316_DA_HIGH_RESOLUTION;
-   else
-   config3 = chip->config3 & (~ADT7316_DA_HIGH_RESOLUTION);
+   config3 |= ADT7316_DA_HIGH_RESOLUTION;
 
ret = chip->bus.write(chip->bus.client, ADT7316_CONFIG3, config3);
if (ret)
-- 
2.19.1

[PATCH 05/11] staging: iio: adt7316: fix the dac read calculation

2018-12-11 Thread Jeremy Fertic

The calculation of the current dac value is using the wrong bits of the
dac lsb register. Create two macros to shift the lsb register value into
lsb position, depending on whether the dac is 10 or 12 bit. Initialize
data to 0 so, with an 8 bit dac, the msb register value can be bitwise
ORed with data.

Signed-off-by: Jeremy Fertic 
---
 drivers/staging/iio/addac/adt7316.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/iio/addac/adt7316.c b/drivers/staging/iio/addac/adt7316.c
index eee7c04f93f4..b7d12d003ddc 100644
--- a/drivers/staging/iio/addac/adt7316.c
+++ b/drivers/staging/iio/addac/adt7316.c
@@ -47,6 +47,8 @@
 #define ADT7516_MSB_AIN3   0xA
 #define ADT7516_MSB_AIN4   0xB
 #define ADT7316_DA_DATA_BASE   0x10
+#define ADT7316_DA_10_BIT_LSB_SHIFT6
+#define ADT7316_DA_12_BIT_LSB_SHIFT4
 #define ADT7316_DA_MSB_DATA_REGS   4
 #define ADT7316_LSB_DAC_A  0x10
 #define ADT7316_MSB_DAC_A  0x11
@@ -1403,7 +1405,7 @@ static IIO_DEVICE_ATTR(ex_analog_temp_offset, 0644,
 static ssize_t adt7316_show_DAC(struct adt7316_chip_info *chip,
int channel, char *buf)
 {
-   u16 data;
+   u16 data = 0;
u8 msb, lsb, offset;
int ret;
 
@@ -1428,7 +1430,11 @@ static ssize_t adt7316_show_DAC(struct adt7316_chip_info *chip,
if (ret)
return -EIO;
 
-   data = (msb << offset) + (lsb & ((1 << offset) - 1));
+   if (chip->dac_bits == 12)
+   data = lsb >> ADT7316_DA_12_BIT_LSB_SHIFT;
+   else if (chip->dac_bits == 10)
+   data = lsb >> ADT7316_DA_10_BIT_LSB_SHIFT;
+   data |= msb << offset;
 
return sprintf(buf, "%d\n", data);
 }
-- 
2.19.1
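
The read calculation from the patch above can be tried outside the driver with a sketch like the following. The two shift macros are copied from the patch; the helper name dac_regs_to_code() is hypothetical, and offset = dac_bits - 8 is an assumption, since the hunk does not show where offset is assigned.

```c
#include <stdint.h>

/* Values copied from the patch. */
#define ADT7316_DA_10_BIT_LSB_SHIFT	6
#define ADT7316_DA_12_BIT_LSB_SHIFT	4

/* Hypothetical helper: rebuild the DAC code from raw MSB/LSB register bytes. */
uint16_t dac_regs_to_code(unsigned int dac_bits, uint8_t msb, uint8_t lsb)
{
	/* Assumed: number of low-order bits kept in the LSB register. */
	unsigned int offset = dac_bits - 8;
	uint16_t data = 0;

	/* The LSB register is left-justified, so shift it down first. */
	if (dac_bits == 12)
		data = lsb >> ADT7316_DA_12_BIT_LSB_SHIFT;
	else if (dac_bits == 10)
		data = lsb >> ADT7316_DA_10_BIT_LSB_SHIFT;

	/* For an 8 bit DAC, data is still 0 and only the MSB contributes. */
	data |= (uint16_t)(msb << offset);
	return data;
}
```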

[PATCH 03/11] staging: iio: adt7316: fix dac_bits assignment

2018-12-11 Thread Jeremy Fertic

The only assignment to dac_bits is in adt7316_store_da_high_resolution().
This function enables or disables 10 bit dac resolution for the adt7316/7
and adt7516/7 when they're set to output voltage proportional to
temperature. Remove these assignments since they're unnecessary for the
dac high resolution functionality.

Instead, assign a value to dac_bits in adt7316_probe() since the number
of dac bits might be needed as soon as the device is registered and
available to userspace.

Signed-off-by: Jeremy Fertic 
---
 drivers/staging/iio/addac/adt7316.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/staging/iio/addac/adt7316.c b/drivers/staging/iio/addac/adt7316.c
index e5e1f9d6357f..a9990e7f2a4d 100644
--- a/drivers/staging/iio/addac/adt7316.c
+++ b/drivers/staging/iio/addac/adt7316.c
@@ -651,17 +651,10 @@ static ssize_t adt7316_store_da_high_resolution(struct device *dev,
u8 config3;
int ret;
 
-   chip->dac_bits = 8;
-
-   if (buf[0] == '1') {
+   if (buf[0] == '1')
config3 = chip->config3 | ADT7316_DA_HIGH_RESOLUTION;
-   if (chip->id == ID_ADT7316 || chip->id == ID_ADT7516)
-   chip->dac_bits = 12;
-   else if (chip->id == ID_ADT7317 || chip->id == ID_ADT7517)
-   chip->dac_bits = 10;
-   } else {
+   else
config3 = chip->config3 & (~ADT7316_DA_HIGH_RESOLUTION);
-   }
 
ret = chip->bus.write(chip->bus.client, ADT7316_CONFIG3, config3);
if (ret)
@@ -2123,6 +2116,13 @@ int adt7316_probe(struct device *dev, struct adt7316_bus *bus,
else
return -ENODEV;
 
+   if (chip->id == ID_ADT7316 || chip->id == ID_ADT7516)
+   chip->dac_bits = 12;
+   else if (chip->id == ID_ADT7317 || chip->id == ID_ADT7517)
+   chip->dac_bits = 10;
+   else
+   chip->dac_bits = 8;
+
chip->ldac_pin = devm_gpiod_get_optional(dev, "adi,ldac", GPIOD_OUT_LOW);
if (IS_ERR(chip->ldac_pin)) {
ret = PTR_ERR(chip->ldac_pin);
-- 
2.19.1

[PATCH 09/11] staging: iio: adt7316: remove dac vref buffer bypass from adt751x

2018-12-11 Thread Jeremy Fertic

The option to allow the external vref to bypass the reference buffer is
only available for adt7316/7/8. Remove the attributes for adt751x as
well as the chip->id checks from the show and store functions.

Signed-off-by: Jeremy Fertic 
---
 drivers/staging/iio/addac/adt7316.c | 14 --
 1 file changed, 14 deletions(-)

diff --git a/drivers/staging/iio/addac/adt7316.c b/drivers/staging/iio/addac/adt7316.c
index 3348fdf08f2e..bca599d8c51c 100644
--- a/drivers/staging/iio/addac/adt7316.c
+++ b/drivers/staging/iio/addac/adt7316.c
@@ -964,9 +964,6 @@ static ssize_t adt7316_show_DA_AB_Vref_bypass(struct device *dev,
struct iio_dev *dev_info = dev_to_iio_dev(dev);
struct adt7316_chip_info *chip = iio_priv(dev_info);
 
-   if ((chip->id & ID_FAMILY_MASK) == ID_ADT75XX)
-   return -EPERM;
-
return sprintf(buf, "%d\n",
!!(chip->dac_config & ADT7316_VREF_BYPASS_DAC_AB));
 }
@@ -981,9 +978,6 @@ static ssize_t adt7316_store_DA_AB_Vref_bypass(struct device *dev,
u8 dac_config;
int ret;
 
-   if ((chip->id & ID_FAMILY_MASK) == ID_ADT75XX)
-   return -EPERM;
-
dac_config = chip->dac_config & (~ADT7316_VREF_BYPASS_DAC_AB);
if (buf[0] == '1')
dac_config |= ADT7316_VREF_BYPASS_DAC_AB;
@@ -1009,9 +1003,6 @@ static ssize_t adt7316_show_DA_CD_Vref_bypass(struct device *dev,
struct iio_dev *dev_info = dev_to_iio_dev(dev);
struct adt7316_chip_info *chip = iio_priv(dev_info);
 
-   if ((chip->id & ID_FAMILY_MASK) == ID_ADT75XX)
-   return -EPERM;
-
return sprintf(buf, "%d\n",
!!(chip->dac_config & ADT7316_VREF_BYPASS_DAC_CD));
 }
@@ -1026,9 +1017,6 @@ static ssize_t adt7316_store_DA_CD_Vref_bypass(struct device *dev,
u8 dac_config;
int ret;
 
-   if ((chip->id & ID_FAMILY_MASK) == ID_ADT75XX)
-   return -EPERM;
-
dac_config = chip->dac_config & (~ADT7316_VREF_BYPASS_DAC_CD);
if (buf[0] == '1')
dac_config |= ADT7316_VREF_BYPASS_DAC_CD;
@@ -1713,8 +1701,6 @@ static struct attribute *adt7516_attributes[] = {
&iio_dev_attr_DAC_update_mode.dev_attr.attr,
&iio_dev_attr_all_DAC_update_modes.dev_attr.attr,
&iio_dev_attr_update_DAC.dev_attr.attr,
-   &iio_dev_attr_DA_AB_Vref_bypass.dev_attr.attr,
-   &iio_dev_attr_DA_CD_Vref_bypass.dev_attr.attr,
&iio_dev_attr_DAC_internal_Vref.dev_attr.attr,
&iio_dev_attr_VDD.dev_attr.attr,
&iio_dev_attr_in_temp.dev_attr.attr,
-- 
2.19.1

[PATCH 06/11] staging: iio: adt7316: fix the dac write calculation

2018-12-11 Thread Jeremy Fertic

The lsb calculation is not masking the correct bits from the user input.
Subtract 1 from (1 << offset) to correctly set up the mask to be applied
to user input.

The lsb register stores its value starting at the bit 7 position.
adt7316_store_DAC() currently assumes the value is at the other end of the
register. Shift the lsb value before storing it in a new variable lsb_reg,
and write this variable to the lsb register.

Signed-off-by: Jeremy Fertic 
---
 drivers/staging/iio/addac/adt7316.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/iio/addac/adt7316.c b/drivers/staging/iio/addac/adt7316.c
index b7d12d003ddc..77ef3c209b67 100644
--- a/drivers/staging/iio/addac/adt7316.c
+++ b/drivers/staging/iio/addac/adt7316.c
@@ -1442,7 +1442,7 @@ static ssize_t adt7316_show_DAC(struct adt7316_chip_info *chip,
 static ssize_t adt7316_store_DAC(struct adt7316_chip_info *chip,
 int channel, const char *buf, size_t len)
 {
-   u8 msb, lsb, offset;
+   u8 msb, lsb, lsb_reg, offset;
u16 data;
int ret;
 
@@ -1460,9 +1460,13 @@ static ssize_t adt7316_store_DAC(struct adt7316_chip_info *chip,
return -EINVAL;
 
if (chip->dac_bits > 8) {
-   lsb = data & (1 << offset);
+   lsb = data & ((1 << offset) - 1);
+   if (chip->dac_bits == 12)
+   lsb_reg = lsb << ADT7316_DA_12_BIT_LSB_SHIFT;
+   else
+   lsb_reg = lsb << ADT7316_DA_10_BIT_LSB_SHIFT;
ret = chip->bus.write(chip->bus.client,
-   ADT7316_DA_DATA_BASE + channel * 2, lsb);
+   ADT7316_DA_DATA_BASE + channel * 2, lsb_reg);
if (ret)
return -EIO;
}
-- 
2.19.1
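
The write path described in the patch above can be sketched as a standalone function. The shift values match the macros used in the patch; the helper name dac_code_to_regs() is hypothetical, and both offset = dac_bits - 8 and msb = data >> offset are assumptions, because the hunk does not show those lines.

```c
#include <stdint.h>

/* Values matching the macros used in the patch. */
#define ADT7316_DA_10_BIT_LSB_SHIFT	6
#define ADT7316_DA_12_BIT_LSB_SHIFT	4

/* Hypothetical helper: split a DAC code into MSB and LSB register bytes. */
void dac_code_to_regs(unsigned int dac_bits, uint16_t data,
		      uint8_t *msb, uint8_t *lsb_reg)
{
	/* Assumed: number of low-order bits that go to the LSB register. */
	unsigned int offset = dac_bits - 8;
	/* Mask out exactly the low 'offset' bits of the user value. */
	uint8_t lsb = (uint8_t)(data & ((1u << offset) - 1u));

	*msb = (uint8_t)(data >> offset);

	/* The LSB register is left-justified, so shift the bits up. */
	if (dac_bits == 12)
		*lsb_reg = (uint8_t)(lsb << ADT7316_DA_12_BIT_LSB_SHIFT);
	else if (dac_bits == 10)
		*lsb_reg = (uint8_t)(lsb << ADT7316_DA_10_BIT_LSB_SHIFT);
	else
		*lsb_reg = 0;	/* 8 bit parts have no LSB bits to write */
}
```

For comparison, the old mask (1 << offset) kept only a single bit of the input, bit 4 on a 12 bit part, rather than its low four bits.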

Re: [PATCH v2] mm, memory_hotplug: Don't bail out in do_migrate_range prematurely

2018-12-11 Thread Wei Yang

On Tue, Dec 11, 2018 at 02:53:12PM +0100, Oscar Salvador wrote:
>v1 -> v2:
>- Keep branch to decrease refcount and print out
>  the failed pfn/page
>- Modified changelog per Michal's feedback
>- move put_page() out of the if/else branch
>
>---
>From f81da873be9a5b7845249d1e62a423f054c487d5 Mon Sep 17 00:00:00 2001
>From: Oscar Salvador 
>Date: Tue, 11 Dec 2018 11:45:19 +0100
>Subject: [PATCH] mm, memory_hotplug: Don't bail out in do_migrate_range
> prematurely
>
>do_migrate_ranges() takes a memory range and tries to isolate the

I think it is do_migrate_range().

>pages to put them into a list.
>This list will be later on used in migrate_pages() to know
>the pages we need to migrate.
>
>Currently, if we fail to isolate a single page, we put all already
>isolated pages back to their LRU and we bail out from the function.
>This is quite suboptimal, as this will force us to start over again
>because scan_movable_pages will give us the same range.
>If there is no chance that we can isolate that page, we will loop here
>forever.
>
>Issue debugged in [1] has proved that.
>During the debugging of that issue, it was noticed that if
>do_migrate_ranges() fails to isolate a single page, we will
>just discard the work we have done so far and bail out, which means
>that scan_movable_pages() will find again the same set of pages.
>
>Instead, we can just skip the error, keep isolating as many pages
>as possible and then proceed with the call to migrate_pages().
>
>This will allow us to do as much work as possible at once.
>
>[1] https://lkml.org/lkml/2018/12/6/324
>
>Signed-off-by: Oscar Salvador 
>---
> mm/memory_hotplug.c | 18 ++
> 1 file changed, 2 insertions(+), 16 deletions(-)
>
>diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>index 86ab673fc4e3..68e740b1768e 100644
>--- a/mm/memory_hotplug.c
>+++ b/mm/memory_hotplug.c
>@@ -1339,7 +1339,6 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>   unsigned long pfn;
>   struct page *page;
>   int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
>-  int not_managed = 0;
>   int ret = 0;
>   LIST_HEAD(source);
> 
>@@ -1388,7 +1387,6 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>   else
>   ret = isolate_movable_page(page, ISOLATE_UNEVICTABLE);
>   if (!ret) { /* Success */
>-  put_page(page);
>   list_add_tail(&page->lru, &source);
>   move_pages--;
>   if (!__PageMovable(page))
>@@ -1398,22 +1396,10 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>   } else {
>   pr_warn("failed to isolate pfn %lx\n", pfn);
>   dump_page(page, "isolation failed");
>-  put_page(page);
>-  /* Because we don't have big zone->lock. we should
>- check this again here. */
>-  if (page_count(page)) {
>-  not_managed++;
>-  ret = -EBUSY;
>-  break;
>-  }
>   }
>+  put_page(page);
>   }
>   if (!list_empty(&source)) {
>-  if (not_managed) {
>-  putback_movable_pages(&source);
>-  goto out;
>-  }
>-
>   /* Allocate a new page from the nearest neighbor node */
>   ret = migrate_pages(&source, new_node_page, NULL, 0,
>   MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
>@@ -1426,7 +1412,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>   putback_movable_pages(&source);
>   }
>   }
>-out:
>+
>   return ret;
> }
> 
>-- 
>2.13.7

-- 
Wei Yang
Help you, Help me
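
The control-flow change in the quoted patch (skip a page that fails to isolate instead of abandoning the whole batch) can be illustrated with a userspace sketch. Nothing here is kernel API: collect_isolated(), the isolate_fn callback type and isolate_even_only() are hypothetical stand-ins for the isolation step and the source list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the isolation step: true means success. */
typedef bool (*isolate_fn)(unsigned long pfn);

/* Toy isolation policy used for illustration: only even pfns succeed. */
bool isolate_even_only(unsigned long pfn)
{
	return (pfn % 2) == 0;
}

/*
 * Walk [start, end) and collect every pfn that can be isolated.
 * A failure is skipped rather than aborting, so the caller still
 * gets to process everything that did succeed.
 */
size_t collect_isolated(unsigned long start, unsigned long end,
			isolate_fn isolate, unsigned long *out, size_t cap)
{
	size_t n = 0;
	unsigned long pfn;

	for (pfn = start; pfn < end && n < cap; pfn++) {
		if (!isolate(pfn))
			continue;	/* skip the failure, keep going */
		out[n++] = pfn;
	}
	return n;
}
```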

Re: [PATCH v4 1/2] x86/fpu: track AVX-512 usage of tasks

2018-12-11 Thread Li, Aubrey

On 2018/12/12 8:14, Arjan van de Ven wrote:
> On 12/11/2018 3:46 PM, Li, Aubrey wrote:
>> On 2018/12/12 1:18, Dave Hansen wrote:
>>> On 12/10/18 4:24 PM, Aubrey Li wrote:
>>>> The tracking turns on the usage flag at the next context switch of
>>>> the task, but requires 3 consecutive context switches with no usage
>>>> to clear it. This decay is required because well-written AVX-512
>>>> applications are expected to clear this state when not actively using
>>>> AVX-512 registers.
>>>
>>> One concern about this:  Given a HZ=1000 system, this means that the
>>> flag needs to get scanned every ~3ms.  That's a pretty good amount of
>>> scanning on a system with hundreds or thousands of tasks running around.
>>>
>>> How many tasks does this scale to until you're eating up an entire CPU
>>> or two just scanning /proc?
>>>
>>
>> Do we have a real requirement to do this in practical environment?
>> AFAIK, 1s or even 5s is good enough in some customers environment.
> 
> maybe instead of a 1/0 bit, it's useful to store the timestamp of the last
> time we found the task to use avx? (need to find a good time unit)
> 
> 
Are you suggesting kernel does not do any translation, just provide a fact
to the user space tool and let user space tool to decide how to use this info?

So how does user space tool use this timestamp in your mind?

Thanks,
-Aubrey
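
The decay rule quoted from the changelog (set the flag on use, clear it only after 3 consecutive context switches without use) can be sketched as a small state machine. This is an illustration only and not the code from the patch: struct avx512_track and avx512_track_on_switch() are hypothetical names.

```c
#include <stdbool.h>

/* From the quoted changelog: 3 idle context switches clear the flag. */
#define AVX512_DECAY_SWITCHES	3

/* Hypothetical per-task tracking state. */
struct avx512_track {
	bool used;			/* usage flag exposed to userspace */
	unsigned int idle_switches;	/* consecutive switches without use */
};

/* Called once per context switch; in_use says whether AVX-512 state was live. */
void avx512_track_on_switch(struct avx512_track *t, bool in_use)
{
	if (in_use) {
		t->used = true;
		t->idle_switches = 0;	/* any use restarts the decay */
		return;
	}

	if (t->used && ++t->idle_switches >= AVX512_DECAY_SWITCHES) {
		t->used = false;
		t->idle_switches = 0;
	}
}
```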

linux-next: manual merge of the net-next tree with the char-misc.current tree

2018-12-11 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  MAINTAINERS

between commit:

  cb4f131e1f2c ("MAINTAINERS: Patch monkey for the Hyper-V code")

from the char-misc.current tree and commit:

  b255e500c8dc ("net: documentation: build a directory structure for drivers")

from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc MAINTAINERS
index 4a2a30380bed,a676b40ac1d6..
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@@ -6930,11 -6902,9 +6931,11 @@@ Hyper-V CORE AND DRIVER
  M:"K. Y. Srinivasan" 
  M:Haiyang Zhang 
  M:Stephen Hemminger 
 +M:Sasha Levin 
 +T:git git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git
  L:de...@linuxdriverproject.org
 -S:Maintained
 +S:Supported
- F:Documentation/networking/netvsc.txt
+ F:Documentation/networking/device_drivers/microsoft/netvsc.txt
  F:arch/x86/include/asm/mshyperv.h
  F:arch/x86/include/asm/trace/hyperv.h
  F:arch/x86/include/asm/hyperv-tlfs.h



Re: [PATCH v4 1/2] x86/fpu: track AVX-512 usage of tasks

2018-12-11 Thread Dave Hansen

On 12/11/18 4:59 PM, Li, Aubrey wrote:
>> maybe instead of a 1/0 bit, it's useful to store the timestamp of the last
>> time we found the task to use avx? (need to find a good time unit)
>>
>>
> Are you suggesting kernel does not do any translation, just provide a fact
> to the user space tool and let user space tool to decide how to use this info?
> 
> So how does user space tool use this timestamp in your mind?

Couple of things...

If we have a timestamp, we don't need a decay.  That means less tuning
and also less work at runtime in the common case.

Let's say we report milliseconds.  The app itself looks at the number
and could now say the following:

1. task A used AVX512 1ms ago, I might need to segregate it
2. task B used AVX512 1ms ago, time to stop segregating it
3. task C used AVX512 1ms ago, and was using it 100ms ago (during the
   last scan), it's a regular AVX512 user.

That way, you don't have to *catch* tasks in the window between use and
the end of the decay period.
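
How a user-space tool might consume such a timestamp can be sketched as follows. Everything in this block is an assumption made for illustration: the enum, the function name avx512_classify() and the recent_ms threshold are hypothetical, and no such kernel interface is defined in this thread.

```c
/* Hypothetical verdicts a scanning tool could reach for one task. */
enum avx512_verdict {
	AVX512_IDLE,		/* no recent use: stop segregating */
	AVX512_NEW_USER,	/* recent use now, none at the last scan */
	AVX512_REGULAR_USER	/* recent use at both scans */
};

/*
 * since_use_ms:      time since last AVX-512 use, read at this scan
 * prev_since_use_ms: the same value as read at the previous scan
 * recent_ms:         tool-chosen threshold for "recent" (assumption)
 */
enum avx512_verdict avx512_classify(unsigned long since_use_ms,
				    unsigned long prev_since_use_ms,
				    unsigned long recent_ms)
{
	if (since_use_ms > recent_ms)
		return AVX512_IDLE;
	if (prev_since_use_ms <= recent_ms)
		return AVX512_REGULAR_USER;
	return AVX512_NEW_USER;
}
```

Because the value is an age rather than a flag, the tool does not need to sample inside a decay window; it only compares ages across scans.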

Re: [alsa-devel] [PATCH] ALSA: hda/realtek - Add quirks for Dell Optiplex 5060, 7060 for audio out while using headphones

2018-12-11 Thread Hui Wang


On 2018/12/12 6:49 AM, Girija Kumar Kasinadhuni wrote:

> There are two audio ports on these machines - headset and mic.

Looks like the two audio ports are headset and lineout.

> There's no sound when hp is inserted into the headset port,
> though the hp is detected. Without pulseaudio, this issue is
> seen on both the Optiplex machines. The issue goes away with the
> "options snd-hda-intel model=dell-headset-dock" quirk in the
> /etc/modprobe.d/alsa-base.conf or /etc/modprobe.d/alsa.conf
> Getting this fix into the kernel instead of fixing in an
> application layer above, or in a .conf file. This is also helpful
> for audio clients using the ALSA api like cras from ChromiumOS.


Did you test audio playback and recording on two audio jacks after 
applying the patch?


And Node 0x16 of this codec is "Vendor Defined Widget", but the fixup 
will set it to "dock line out", it looks weird.




Signed-off-by: Girija Kumar Kasinadhuni 
---
alsa-info:

  Optiplex 5060 -
  - without fix:
http://www.alsa-project.org/db/?f=2cf3c4354e82bde4aa480612a914535982fa3f91
  - with fix: 
http://www.alsa-project.org/db/?f=7a928960400ec828b82632411284ad8a63727b92
  Optiplex 7060
  - without fix: 
http://www.alsa-project.org/db/?f=69fa554faa712e7cad1f188be4a5192353d64f2c
  - with fix: 
http://www.alsa-project.org/db/?f=80ce1fb178b377b27b0795ac1c55e8cbee1b5b96

  sound/pci/hda/patch_realtek.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index 8d75597028ee..581f6a288abf 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -6444,6 +6444,8 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
SND_PCI_QUIRK(0x1028, 0x064b, "Dell", ALC293_FIXUP_DELL1_MIC_NO_PRESENCE),
SND_PCI_QUIRK(0x1028, 0x0665, "Dell XPS 13", ALC288_FIXUP_DELL_XPS_13),
SND_PCI_QUIRK(0x1028, 0x0669, "Dell Optiplex 9020m", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE),
+   SND_PCI_QUIRK(0x1028, 0x085b, "Dell Optiplex 5060", ALC269_FIXUP_DELL2_MIC_NO_PRESENCE),
+   SND_PCI_QUIRK(0x1028, 0x085a, "Dell Optiplex 7060", ALC269_FIXUP_DELL2_MIC_NO_PRESENCE),
SND_PCI_QUIRK(0x1028, 0x069a, "Dell Vostro 5480", ALC290_FIXUP_SUBWOOFER_HSJACK),
SND_PCI_QUIRK(0x1028, 0x06c7, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE),
SND_PCI_QUIRK(0x1028, 0x06d9, "Dell", ALC293_FIXUP_DELL1_MIC_NO_PRESENCE),

Re: [PATCH 1/3] mm, memory_hotplug: try to migrate full pfn range

2018-12-11 Thread Wei Yang

On Tue, Dec 11, 2018 at 03:27:39PM +0100, Michal Hocko wrote:
>From: Michal Hocko 
>
>do_migrate_range has been limiting the number of pages to migrate to 256
>for some reason which is not documented. Even if the limit made some
>sense back then when it was introduced it doesn't really serve a good
>purpose these days. If the range contains huge pages then
>we break out of the loop too early and go through LRU and pcp
>caches draining and scan_movable_pages is quite suboptimal.
>
>The only reason to limit the number of pages I can think of is to reduce
>the potential time to react on the fatal signal. But even then the
>number of pages is a questionable metric because even a single page
>might block migration in a non-killable state (e.g. __unmap_and_move).
>
>Remove the limit and offline the full requested range (this is one
>membblock worth of pages with the current code). Should we ever get a

s/membblock/memblock/

Or would "memory block" be more accurate? "memblock" might be confused
with the lower-level memblock facility.

>report that offlining takes too long to react on fatal signal then we
>should rather fix the core migration to use killable waits and bailout
>on a signal.
>
>Reviewed-by: David Hildenbrand 
>Reviewed-by: Pavel Tatashin 
>Reviewed-by: Oscar Salvador 
>Signed-off-by: Michal Hocko 
>---
> mm/memory_hotplug.c | 8 ++--
> 1 file changed, 2 insertions(+), 6 deletions(-)
>
>diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>index c82193db4be6..6263c8cd4491 100644
>--- a/mm/memory_hotplug.c
>+++ b/mm/memory_hotplug.c
>@@ -1339,18 +1339,16 @@ static struct page *new_node_page(struct page *page, unsigned long private)
>   return new_page_nodemask(page, nid, &nmask);
> }
> 
>-#define NR_OFFLINE_AT_ONCE_PAGES  (256)
> static int
> do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> {
>   unsigned long pfn;
>   struct page *page;
>-  int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
>   int not_managed = 0;
>   int ret = 0;
>   LIST_HEAD(source);
> 
>-  for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
>+  for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>   if (!pfn_valid(pfn))
>   continue;
>   page = pfn_to_page(pfn);
>@@ -1362,8 +1360,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>   ret = -EBUSY;
>   break;
>   }
>-  if (isolate_huge_page(page, &source))
>-  move_pages -= 1 << compound_order(head);
>+  isolate_huge_page(page, &source);
>   continue;
>   } else if (PageTransHuge(page))
>   pfn = page_to_pfn(compound_head(page))
>@@ -1382,7 +1379,6 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>   if (!ret) { /* Success */
>   put_page(page);
>   list_add_tail(&page->lru, &source);
>-  move_pages--;
>   if (!__PageMovable(page))
>   inc_node_page_state(page, NR_ISOLATED_ANON +
>   page_is_file_cache(page));
>-- 
>2.19.2

-- 
Wei Yang
Help you, Help me

Re: 4.14 backport request for dbdda842fe96f: "printk: Add console owner and waiter logic to load balance console writes"

2018-12-11 Thread Daniel Wang

> Let's first figure out if it works.

I would still like to try applying your patches that went into
printk.git, but for now I wonder if we can get Steven's patch into
4.14 first. At least we know it mitigated the issue, even if it did not
fundamentally address it, and we've agreed it's an innocuous change
that doesn't risk breaking stable.

I haven't done this before so I'll need your help. What's the next
step to actually get Steven's patch *in* linux-4.14.y? According to
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
I am supposed to send an email with the patch ID and subject, which
are both mentioned in this email. Should I send another one? What's
the process like? Thanks!

On Thu, Nov 8, 2018 at 10:47 PM Sergey Senozhatsky
 wrote:
>
> On (11/01/18 09:05), Daniel Wang wrote:
> > > Another deadlock scenario could be the following one:
> > >
> > > printk()
> > >  console_trylock()
> > >   down_trylock()
> > >    raw_spin_lock_irqsave(&sem->lock, flags)
> > >     <NMI>
> > >      panic()
> > >       console_flush_on_panic()
> > >        console_trylock()
> > >         raw_spin_lock_irqsave(&sem->lock, flags)  // deadlock
> > >
> > > There are no patches addressing this one at the moment. And it's
> > > unclear if you are hitting this scenario.
> >
> > I am not sure, but Steven's patches did make the deadlock I saw go away...
>
> You certainly can find cases when "busy spin on console_sem owner" logic
> can reduce some possibilities.
>
> But spin_lock(&lock); NMI; spin_lock(&lock); code is still in the kernel.
>
> > A little swamped by other things lately but I'll run a test with it.
> > If it works, would you recommend taking your patch alone
>
> Let's first figure out if it works.
>
> -ss



--
Best,
Daniel


RE: [PATCH 1/4] arm64: hyperv: Add core Hyper-V include files

2018-12-11 Thread Michael Kelley

From: Will Deacon  Sent: Friday, December 7, 2018 5:43 AM

> > hyperv-tlfs.h defines Hyper-V interfaces from the Hyper-V Top Level
> > Functional Spec (TLFS). The TLFS is distinctly oriented to x86/x64,
> > and Hyper-V has not separated out the architecture-dependent parts into
> > x86/x64 vs. ARM64. So hyperv-tlfs.h includes information for ARM64
> > that is not yet formally published. The TLFS is available here:
> 
> When do you plan to publish the spec? It's pretty hard to review this stuff
> without knowing what it's supposed to look like.

I don't have a commitment from the Hyper-V team on when an updated TLFS
that covers ARM64 will be published.  I'm on the Linux side, and Hyper-V is a
separate group, but I'll raise the topic again with them.

> 
> >   docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs
> >
> > mshyperv.h defines Linux-specific structures and routines for
> > interacting with Hyper-V. It is split into an ARM64 specific file
> > and an architecture independent file in include/asm-generic.
> >
> > Signed-off-by: Michael Kelley 
> > Signed-off-by: K. Y. Srinivasan 
> > ---
> >  MAINTAINERS  |   3 +
> >  arch/arm64/include/asm/hyperv-tlfs.h | 338 +++
> >  arch/arm64/include/asm/mshyperv.h| 116 +
> >  include/asm-generic/mshyperv.h   | 240 +++
> >  4 files changed, 697 insertions(+)
> >  create mode 100644 arch/arm64/include/asm/hyperv-tlfs.h
> >  create mode 100644 arch/arm64/include/asm/mshyperv.h
> >  create mode 100644 include/asm-generic/mshyperv.h
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index f4855974f325..72f19cef4c48 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -6835,6 +6835,8 @@ F:arch/x86/include/asm/trace/hyperv.h
> >  F: arch/x86/include/asm/hyperv-tlfs.h
> >  F: arch/x86/kernel/cpu/mshyperv.c
> >  F: arch/x86/hyperv
> > +F: arch/arm64/include/asm/hyperv-tlfs.h
> > +F: arch/arm64/include/asm/mshyperv.h
> >  F: drivers/hid/hid-hyperv.c
> >  F: drivers/hv/
> >  F: drivers/input/serio/hyperv-keyboard.c
> > @@ -6846,6 +6848,7 @@ F:drivers/video/fbdev/hyperv_fb.c
> >  F: net/vmw_vsock/hyperv_transport.c
> >  F: include/linux/hyperv.h
> >  F: include/uapi/linux/hyperv.h
> > +F: include/asm-generic/mshyperv.h
> >  F: tools/hv/
> >  F: Documentation/ABI/stable/sysfs-bus-vmbus
> >
> > diff --git a/arch/arm64/include/asm/hyperv-tlfs.h b/arch/arm64/include/asm/hyperv-tlfs.h
> > new file mode 100644
> > index ..924e37600e92
> > --- /dev/null
> > +++ b/arch/arm64/include/asm/hyperv-tlfs.h
> > @@ -0,0 +1,338 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +/*
> > + * This file contains definitions from the Hyper-V Hypervisor Top-Level
> > + * Functional Specification (TLFS):
> > + * https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs.microsoft.com%2Fen-us%2Fvirtualization%2Fhyper-v-on-windows%2Freference%2Ftlfs&data=02%7C01%7Cmikelley%40microsoft.com%7Ce1bdbb31db064623174f08d65c49cc3b%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636797869386921804&sdata=RiD05cDWC%2FPnXnis6U7EcfEfCjvb54uuKHRfifQhMEM%3D&reserved=0
> 
> As mentioned elsewhere, please use a better link here and drop the license
> boilerplate below.

Agreed.  Will do so here and in other files in the next version of the patch.

> 
> > + *
> > + * Copyright (C) 2018, Microsoft, Inc.
> > + *
> > + * Author : Michael Kelley 
> > + *
> > + * This program is free software; you can redistribute it and/or modify it
> > + * under the terms of the GNU General Public License version 2 as published
> > + * by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful, but
> > + * WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> > + * NON INFRINGEMENT.  See the GNU General Public License for more
> > + * details.
> > + */
> > +
> > +#ifndef _ASM_ARM64_HYPERV_H
> > +#define _ASM_ARM64_HYPERV_H
> 
> __ASM_HYPER_V_H please

OK.

> 
> > +
> > +#include 
> > +
> > +/*
> > + * These Hyper-V registers provide information equivalent to the CPUID
> > + * instruction on x86/x64.
> > + */
> > +#define HV_REGISTER_HYPERVISOR_VERSION 0x0100 /*CPUID 0x4002 */
> > +#define HV_REGISTER_PRIVILEGES_AND_FEATURES 0x0200 /*CPUID 0x4003 */
> > +#define HV_REGISTER_FEATURES 0x0201 /*CPUID 0x4004 */
> > +#define HV_REGISTER_IMPLEMENTATION_LIMITS 0x0202 /*CPUID 0x4005 */
> > +#define HV_ARM64_REGISTER_INTERFACE_VERSION 0x00090006 /*CPUID 0x4001 */
> > +
> > +/*
> > + * Feature identification. HvRegisterPrivilegesAndFeaturesInfo returns a
> > + * 128-bit value with flags indicating which features are available to the
> > + * partition based upon the current partition privileges. The 128-bit
> > + * value is broken up with different por

Re: [PATCH v16 06/16] lib: fdt: add a helper function for handling memory range property

2018-12-11 Thread AKASHI, Takahiro

On Tue, Dec 11, 2018 at 10:09:07AM +, James Morse wrote:
> Hi Akashi,
> 
> On 11/12/2018 06:17, AKASHI, Takahiro wrote:
> > On Fri, Dec 07, 2018 at 10:12:47AM +, James Morse wrote:
> >> On 06/12/2018 15:54, Will Deacon wrote:
> >>> On Thu, Dec 06, 2018 at 08:47:04AM -0600, Rob Herring wrote:
> >>>> On Wed, Nov 14, 2018 at 11:52 PM AKASHI Takahiro
> >>>>  wrote:
> >>>>>
> >>>>> Added function, fdt_setprop_reg(), will be used later to handle
> >>>>> kexec-specific property in arm64's kexec_file implementation.
> >>>>> It will possibly be merged into libfdt in the future.
> >>>>
> >>>> You generally can't modify libfdt files. Any changes will be blown
> >>>> away with the next dtc sync (there's one in -next now). Though here
> >>>> you are creating a new location with fdt code. lib/ is just a shim to
> >>>> the actual libfdt code. Don't put any implementation there. You can
> >>>> add this to drivers/of/fdt_address.c for the short term, but it still
> >>>> needs to go upstream.
> >>>>
> >>>> Otherwise, the implementation looks fine to me.
> >>>
> >>> I agree, but I don't think there's a real need for us to hack
> >>> drivers/of/fdt_address.c in the meantime -- let's just target upstream
> >>> and not carry this in the kernel.
> >>>
> >>> Akashi -- for now, I'll drop the kdump parts of this series which rely
> >>> on this helper. The majority of the series is actually independent and
> >>> can go in as-is.
> >>>
> >>> I've pushed out a kexec branch to the arm64 tree for you to take a look
> >>> at:
> >>>
> >>> https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=kexec
> >>
> >> I gave this a quick spin. Without the elfcorehdr/usable-memory-range arm64 needs
> >> to explicitly forbid kdump via kexec_file_load. (like powerpc does already).
> > 
> > Thank you for pointing this out.
> > 
> >> Without this kdump works, but the second kernel overwrites the first as those DT
> >> properties are missing.
> >>
> >> I'll post a patch momentarily,
> > 
> > Fine, but anyhow I'm going to submit a new version (*without* kdump),
> > I will fix the issue along with others.
> 
> I had a quick look at the arm64 for-next/core branch. Will has queued the
> non-kdump parts of this series.
> 
> If you have changes, they need to be against the arm64 tree.

Okay!

-Takahiro Akashi

> 
> Thanks,
> 
> James
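
The helper discussed in this thread has to turn an (address, size) pair into the cell layout that a devicetree "reg"-style property uses. The sketch below illustrates only that encoding step and is not the fdt_setprop_reg() from the patch: demo_put_cells() and demo_pack_reg() are hypothetical names, and the cell counts are passed in by the caller on the assumption that they were read from #address-cells and #size-cells.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Store 'val' as 'cells' big-endian 32-bit cells (most significant cell
 * first) and return the number of bytes written.
 */
size_t demo_put_cells(uint8_t *buf, uint64_t val, int cells)
{
	size_t n = 0;
	int i;

	assert(cells == 1 || cells == 2);
	for (i = cells - 1; i >= 0; i--) {
		uint32_t cell = (uint32_t)(val >> (32 * i));

		buf[n++] = (uint8_t)(cell >> 24);
		buf[n++] = (uint8_t)(cell >> 16);
		buf[n++] = (uint8_t)(cell >> 8);
		buf[n++] = (uint8_t)cell;
	}
	return n;
}

/* Pack one <address size> entry; 'buf' must hold 4 * (addr_cells + size_cells) bytes. */
size_t demo_pack_reg(uint8_t *buf, uint64_t addr, uint64_t size,
		     int addr_cells, int size_cells)
{
	size_t n = demo_put_cells(buf, addr, addr_cells);

	n += demo_put_cells(buf + n, size, size_cells);
	return n;
}
```

With two address cells and two size cells, one entry occupies 16 bytes; the resulting buffer is what a caller would then hand to a property setter.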

Re: [PATCH] nvme-rdma: complete requests from ->timeout

2018-12-11 Thread Jaesoo Lee

Please drop this patch. However, I would be happy if this bug could be
fixed as soon as possible.

Nitzan, would you mind sending your patch for review?
On Tue, Dec 11, 2018 at 3:39 PM Sagi Grimberg  wrote:
>
> > I cannot reproduce the bug with the patch; in my failure scenarios, it
> > seems that completing the request on errors in nvme_rdma_send_done
> > makes __nvme_submit_sync_cmd to be unblocked. Also, I think this is
> > safe from the double completions.
> >
> > However, it seems that nvme_rdma_timeout code is still not free from
> > the double completion problem. So, it looks promising to me if you
> > could separate out the nvme_rdma_wr_error handling code as a new
> > patch.
>
> Guys, can you please send proper patches so we can review properly?
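
The double-completion concern raised above comes down to two paths (an error handler and a timeout handler) racing to finish the same request. One generic way to make that safe is an atomic "completed" flag that only the first caller can set. The sketch below shows that idea in user-space C11; it is an assumption for illustration, not what nvme-rdma does, and demo_request, demo_request_init() and demo_complete_once() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request with a one-shot completion flag. */
struct demo_request {
	atomic_bool completed;
	int completions;	/* how many times the completion ran */
};

void demo_request_init(struct demo_request *rq)
{
	atomic_init(&rq->completed, false);
	rq->completions = 0;
}

/*
 * Complete the request at most once. Returns true for the caller that
 * won the race and actually completed it, false for every later caller.
 */
bool demo_complete_once(struct demo_request *rq)
{
	bool expected = false;

	assert(rq);
	if (!atomic_compare_exchange_strong(&rq->completed, &expected, true))
		return false;
	rq->completions++;
	return true;
}
```

Whichever handler loses the compare-and-exchange simply returns without touching the request again.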

Re: [PATCH v2 2/3] reset: imx7: Add support for i.MX8MQ IP block variant

2018-12-11 Thread Andrey Smirnov

On Tue, Dec 11, 2018 at 2:43 PM Rob Herring  wrote:
>
> On Tue, Nov 27, 2018 at 08:37:37PM -0800, Andrey Smirnov wrote:
> > Add bits and pieces needed to support IP block variant found on
> > i.MX8MQ SoCs.
> >
> > Cc: p.za...@pengutronix.de
> > Cc: Fabio Estevam 
> > Cc: cphe...@gmail.com
> > Cc: l.st...@pengutronix.de
> > Cc: Leonard Crestez 
> > Cc: "A.s. Dong" 
> > Cc: Richard Zhu 
> > Cc: Rob Herring 
> > Cc: devicet...@vger.kernel.org
> > Cc: linux-...@nxp.com
> > Cc: linux-arm-ker...@lists.infradead.org
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Andrey Smirnov 
> > ---
> >  drivers/reset/Kconfig|   2 +-
> >  drivers/reset/reset-imx7.c   | 106 +++
> >  include/dt-bindings/reset/imx8mq-reset.h |  64 ++
>
> This goes in the binding patch.

OK, I was following the precedent of the original submission:

https://lore.kernel.org/linux-arm-kernel/20170227205408.ajeu3stjpl45dnqf@rob-hp-laptop/

Will split and submit v3.

Thanks,
Andrey Smirnov

Re: [PATCH] usb: typec: tcpm: Extend the matching rules on PPS APDO selection

2018-12-11 Thread Kyle Tso

On Fri, Dec 7, 2018 at 1:56 AM Guenter Roeck  wrote:
>
> On Thu, Dec 06, 2018 at 11:02:27AM +0800, Kyle Tso wrote:
> > Current matching rules ensure that the voltage range of selected Source
> > Capability is entirely within the range defined in one of the Sink
> > Capabilities. This is reasonable but not practical because Sink may not
> > support wide range of voltage when sinking power while Source could
> > advertise its capabilities in a relatively wider range. For example, a
> > Source PDO advertising 3.3V-11V@3A (9V Prog of Fixed Nominal Voltage)
> > will not be selected if the Sink requires 5V-12V@3A PPS power. However,
> > the Sink could work well if the requested voltage range in RDOs is
> > 5V-11V@3A.
> >
>
> Maybe a graphical description would help ?
>
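> Currently accepted:
>     |--------- source -----|
> |----------- sink ----------------|
>
> Currently not accepted:
>
> |--------- source -----|
>     |----------- sink ----------------|
>
>               |--------- source -----|
> |----------- sink ----------------|
>
> |--------- source -----------|
>     |----- sink ------|
>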

Sorry for the late reply. I was on vacation.
Thanks for the suggestion. I will update the commit message.

> > To improve the usability, change the matching rules to what is listed
> > below:
> > a. The Source PDO is selectable if any portion of the voltage range
> >overlaps one of the Sink PDO's voltage range.
> > b. The maximum operational voltage will be the lower one between the
> >selected Source PDO and the matching Sink PDO.
> > c. The maximum power will be the maximum operational voltage times the
> >maximum current defined in the selected Source PDO
> > d. Select the Source PDO with the highest maximum power
> >
> > Signed-off-by: Kyle Tso 
>
> Makes sense to me. I am a bit concerned that it might cause odd regressions,
> though. Did you test it with a few adapters ?
>
> With the expectation that you did,
>
> Reviewed-by: Guenter Roeck 
>

Yes, I have tested this patch with 3 different brands of adapters with
PPS capabilities.

thanks,
Kyle
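
Rules a. to d. from the commit message above can be sketched as a small user-space function. This is a minimal illustration under stated assumptions, not the tcpm implementation: demo_pps_apdo and demo_select_pps() are hypothetical names, voltages are in mV, currents in mA, power in mW, and ranges that merely touch at one voltage are treated as overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a PPS APDO. */
struct demo_pps_apdo {
	uint32_t min_mv;
	uint32_t max_mv;
	uint32_t max_ma;
};

/*
 * Returns the index of the selected source APDO, or -1 if none overlaps
 * any sink APDO. On success *out_max_mv is the maximum operational
 * voltage (rule b) and *out_mw the maximum power (rule c).
 */
int demo_select_pps(const struct demo_pps_apdo *src, size_t nr_src,
		    const struct demo_pps_apdo *snk, size_t nr_snk,
		    uint32_t *out_max_mv, uint32_t *out_mw)
{
	int best = -1;
	size_t i, j;

	assert(out_max_mv && out_mw);
	*out_max_mv = 0;
	*out_mw = 0;

	for (i = 0; i < nr_src; i++) {
		for (j = 0; j < nr_snk; j++) {
			uint32_t lo = src[i].min_mv > snk[j].min_mv ?
				      src[i].min_mv : snk[j].min_mv;
			uint32_t hi = src[i].max_mv < snk[j].max_mv ?
				      src[i].max_mv : snk[j].max_mv;
			uint32_t mw;

			/* Rule a: skip pairs whose voltage ranges do not overlap. */
			if (lo > hi)
				continue;
			/* Rules b and c: lower max voltage times source max current. */
			mw = (uint32_t)((uint64_t)hi * src[i].max_ma / 1000);
			/* Rule d: keep the candidate with the highest power. */
			if (mw > *out_mw) {
				best = (int)i;
				*out_max_mv = hi;
				*out_mw = mw;
			}
		}
	}
	return best;
}
```

For the example in the commit message, a 3.3V-11V@3A source against a 5V-12V sink gives a maximum operational voltage of 11 V and a maximum power of 33 W.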

Re: [PATCH] srcu: Remove srcu_queue_delayed_work_on()

2018-12-11 Thread Joel Fernandes

On Tue, Dec 11, 2018 at 12:12:38PM +0100, Sebastian Andrzej Siewior wrote:
> srcu_queue_delayed_work_on() disables preemption (and therefore CPU
> hotplug in RCU's case) and then checks based on its own accounting if a
> CPU is online. If the CPU is online it uses queue_delayed_work_on()
> otherwise it falls back to queue_delayed_work().
> The problem here is that queue_work() on -RT does not work with disabled
> preemption.
> 
> queue_work_on() works also on an offlined CPU. queue_delayed_work_on()
> has the problem that it is possible to program a timer on an offlined
> CPU. This timer will fire once the CPU is online again. But until then,
> the timer remains programmed and nothing will happen.
> Add a local timer which will fire (as requested per delay) on the local
> CPU and then enqueue the work on the specific CPU.
> 
> RCUtorture testing with SRCU-P for 24h showed no problems.
> 
> Signed-off-by: Sebastian Andrzej Siewior 
> ---
>  include/linux/srcutree.h |  3 ++-
>  kernel/rcu/srcutree.c| 57 ++--
>  kernel/rcu/tree.c|  4 ---
>  kernel/rcu/tree.h|  8 --
>  4 files changed, 27 insertions(+), 45 deletions(-)
> 
> diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h
> index 6f292bd3e7db7..0faa978c98807 100644
> --- a/include/linux/srcutree.h
> +++ b/include/linux/srcutree.h
> @@ -45,7 +45,8 @@ struct srcu_data {
>   unsigned long srcu_gp_seq_needed;   /* Furthest future GP needed. */
>   unsigned long srcu_gp_seq_needed_exp;   /* Furthest future exp GP. */
>   bool srcu_cblist_invoking;  /* Invoking these CBs? */
> - struct delayed_work work;   /* Context for CB invoking. */
> + struct timer_list delay_work;   /* Delay for CB invoking */
> + struct work_struct work;/* Context for CB invoking. */
>   struct rcu_head srcu_barrier_head;  /* For srcu_barrier() use. */
>   struct srcu_node *mynode;   /* Leaf srcu_node. */
>   unsigned long grpmask;  /* Mask for leaf srcu_node */
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index 3600d88d8956b..7f041f2435df9 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -58,6 +58,7 @@ static bool __read_mostly srcu_init_done;
>  static void srcu_invoke_callbacks(struct work_struct *work);
>  static void srcu_reschedule(struct srcu_struct *ssp, unsigned long delay);
>  static void process_srcu(struct work_struct *work);
> +static void srcu_delay_timer(struct timer_list *t);
>  
>  /* Wrappers for lock acquisition and release, see raw_spin_lock_rcu_node(). */
>  #define spin_lock_rcu_node(p)\
> @@ -156,7 +157,8 @@ static void init_srcu_struct_nodes(struct srcu_struct *ssp, bool is_static)
>   snp->grphi = cpu;
>   }
>   sdp->cpu = cpu;
> - INIT_DELAYED_WORK(&sdp->work, srcu_invoke_callbacks);
> + INIT_WORK(&sdp->work, srcu_invoke_callbacks);
> + timer_setup(&sdp->delay_work, srcu_delay_timer, 0);
>   sdp->ssp = ssp;
>   sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);
>   if (is_static)
> @@ -386,13 +388,19 @@ void _cleanup_srcu_struct(struct srcu_struct *ssp, bool quiesced)
>   } else {
>   flush_delayed_work(&ssp->work);
>   }
> - for_each_possible_cpu(cpu)
> + for_each_possible_cpu(cpu) {
> + struct srcu_data *sdp = per_cpu_ptr(ssp->sda, cpu);
> +
>   if (quiesced) {
> - if (WARN_ON(delayed_work_pending(&per_cpu_ptr(ssp->sda, cpu)->work)))
> + if (WARN_ON(timer_pending(&sdp->delay_work)))
> + return; /* Just leak it! */
> + if (WARN_ON(work_pending(&sdp->work)))
>   return; /* Just leak it! */
>   } else {
> - flush_delayed_work(&per_cpu_ptr(ssp->sda, cpu)->work);
> + del_timer_sync(&sdp->delay_work);
> + flush_work(&sdp->work);
>   }
> + }
>   if (WARN_ON(rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)) != SRCU_STATE_IDLE) ||
>   WARN_ON(srcu_readers_active(ssp))) {
>   pr_info("%s: Active srcu_struct %p state: %d\n",
> @@ -463,39 +471,23 @@ static void srcu_gp_start(struct srcu_struct *ssp)
>   WARN_ON_ONCE(state != SRCU_STATE_SCAN1);
>  }
>  
> -/*
> - * Track online CPUs to guide callback workqueue placement.
> - */
> -DEFINE_PER_CPU(bool, srcu_online);
>  
> -void srcu_online_cpu(unsigned int cpu)
> +static void srcu_delay_timer(struct timer_list *t)
>  {
> - WRITE_ONCE(per_cpu(srcu_online, cpu), true);
> + struct srcu_data *sdp = container_of(t, struct srcu_data, delay_work);
> +
> + queue_work_on(sdp->cpu, rcu_gp_wq, &sdp->work);
>  }
>  
> -void srcu_offline_cpu(unsigned int cpu)
>
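
The new srcu_delay_timer() in the quoted patch relies on container_of() to get from the timer that fired back to the per-CPU structure that embeds it, and then queues the work on that structure's CPU. A user-space sketch of that recovery step, assuming simplified stand-in types (demo_timer, demo_srcu_data and demo_container_of are hypothetical and only mirror the shape of the kernel code):

```c
#include <assert.h>
#include <stddef.h>

/* User-space equivalent of the kernel's container_of(), without type checks. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for struct timer_list and struct srcu_data. */
struct demo_timer {
	unsigned long expires;
};

struct demo_srcu_data {
	int cpu;
	struct demo_timer delay_work;
};

/*
 * What a timer callback would do first: recover the enclosing structure
 * from the timer pointer. Returns the CPU the work would be queued on.
 */
int demo_delay_timer_cpu(struct demo_timer *t)
{
	struct demo_srcu_data *sdp =
		demo_container_of(t, struct demo_srcu_data, delay_work);

	assert(sdp != NULL);
	return sdp->cpu;
}
```

The callback only receives the timer pointer, so embedding the timer in the per-CPU data is what makes the target CPU recoverable without a separate lookup.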

[PATCH] MAINTAINERS: Change entry name of samsung exynos bus frequency driver

2018-12-11 Thread Chanwoo Choi

The Samsung Exynos device drivers have the 'SAMSUNG EXYNOS' prefix
in front of the specific device driver name. In order to keep the
naming format consistent, change the entry name of the bus frequency driver
for Samsung Exynos and then reorder it alphabetically.
- old : BUS FREQUENCY DRIVER FOR SAMSUNG EXYNOS
- new : SAMSUNG EXYNOS BUS FREQUENCY DRIVER

Also, remove the unneeded git repository information.

Signed-off-by: Chanwoo Choi 
---
 MAINTAINERS | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8119141a926f..f180b8499036 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3248,15 +3248,6 @@ S:   Odd fixes
 F: Documentation/media/v4l-drivers/bttv*
 F: drivers/media/pci/bt8xx/bttv*
 
-BUS FREQUENCY DRIVER FOR SAMSUNG EXYNOS
-M: Chanwoo Choi 
-L: linux...@vger.kernel.org
-L: linux-samsung-...@vger.kernel.org
-T: git git://git.kernel.org/pub/scm/linux/kernel/git/mzx/devfreq.git
-S: Maintained
-F: drivers/devfreq/exynos-bus.c
-F: Documentation/devicetree/bindings/devfreq/exynos-bus.txt
-
 BUSLOGIC SCSI DRIVER
 M: Khalid Aziz 
 L: linux-s...@vger.kernel.org
@@ -13102,6 +13093,14 @@ S: Supported
 F: sound/soc/samsung/
 F: Documentation/devicetree/bindings/sound/samsung*
 
+SAMSUNG EXYNOS BUS FREQUENCY DRIVER
+M: Chanwoo Choi 
+L: linux...@vger.kernel.org
+L: linux-samsung-...@vger.kernel.org
+S: Maintained
+F: drivers/devfreq/exynos-bus.c
+F: Documentation/devicetree/bindings/devfreq/exynos-bus.txt
+
 SAMSUNG EXYNOS PSEUDO RANDOM NUMBER GENERATOR (RNG) DRIVER
 M: Krzysztof Kozlowski 
 L: linux-cry...@vger.kernel.org
-- 
1.9.1

Re: [PATCH 1/1] sched/headers: fix thread_info. is overwritten by STACK_END_MAGIC

2018-12-11 Thread Wang, Dongsheng

Hello all,

Any comments about this patch?

Cheers,
Dongsheng

On 2018/11/30 10:19, Wang, Dongsheng wrote:
> On 2018/11/30 10:04, Wang, Dongsheng wrote:
>> On 2018/11/30 5:22, Kees Cook wrote:
>>> On Tue, Nov 27, 2018 at 8:38 PM Wang, Dongsheng
>>>  wrote:
 Hello Kees,

 On 2018/11/28 6:38, Kees Cook wrote:
> On Thu, Nov 22, 2018 at 11:54 PM, Wang Dongsheng
>  wrote:
>> When select ARCH_TASK_STRUCT_ON_STACK the first of thread_info variable
>> is overwritten by STACK_END_MAGIC. In fact, the ARCH_TASK_STRUCT_ON_STACK
>> is not a real task on stack, it's only init_task on init_stack.
>>
>> Commit 0500871f21b2 ("Construct init thread stack in the linker script
>> rather than by union") added this macro and put task_struct into
>> thread_union. This brings us the following possibilities:
>> TASK_ON_STACK    THREAD_INFO_IN_TASK    STACK
>>                                         ----- <-- thread_info & stack
>>       N                   N             |   |      --- <-- task
>>                                         |   |      | |
>>                                         -----      ---
>>
>>                                         ----- <-- stack
>>       N                   Y             |   |      --- <-- task(Including thread_info)
>>                                         |   |      | |
>>                                         -----      ---
>>
>>                                         ----- <-- stack & task & thread_info
>>       Y                   N             |   |
>>                                         |   |
>>                                         -----
>>
>>                                         ----- <-- stack & task(Including thread_info)
>>       Y                   Y             |   |
>>                                         |   |
>>                                         -----
>> The kernel has handled the first two cases correctly.
>>
>> For the third case:
>> TASK_ON_STACK: Y. THREAD_INFO_IN_TASK: N. this case
>> should never happen, because the task and thread_info will overlap. So
>> when TASK_ON_STACK is selected, THREAD_INFO_IN_TASK must be selected too.
>>
>> For the fourth case:
>> When the task is on the stack, the end of stack should add a
>> sizeof(task_struct) offset.
>>
>> This patch handles the third and fourth cases.
>>
>> Fixes: 0500871f21b2 ("Construct init thread stack in the linker ...")
>>
>> Signed-off-by: Wang Dongsheng 
>> Signed-off-by: Shunyong Yang 
>> ---
>>  arch/Kconfig | 1 +
>>  include/linux/sched/task_stack.h | 5 -
>>  2 files changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/Kconfig b/arch/Kconfig
>> index e1e540ffa979..0a2c73e73195 100644
>> --- a/arch/Kconfig
>> +++ b/arch/Kconfig
>> @@ -251,6 +251,7 @@ config ARCH_HAS_SET_MEMORY
>>  # Select if arch init_task must go in the __init_task_data section
>>  config ARCH_TASK_STRUCT_ON_STACK
>> bool
>> +   depends on THREAD_INFO_IN_TASK || IA64
> The "IA64" part shouldn't be needed since IA64 already selects it.
>
> Since it's selected, it also can't have a depends, IIUC.
 Since the IA64 thread_info including task_struct, it doesn't need to
 select THREAD_INFO_IN_TASK.
 So we need to allow IA64 select ARCH_TASK_STRUCT_ON_STACK without
 THREAD_INFO.
>>> Okay.
>>>
>>  # Select if arch has its private alloc_task_struct() function
>>  config ARCH_TASK_STRUCT_ALLOCATOR
>> diff --git a/include/linux/sched/task_stack.h b/include/linux/sched/task_stack.h
>> index 6a841929073f..624c48defb9e 100644
>> --- a/include/linux/sched/task_stack.h
>> +++ b/include/linux/sched/task_stack.h
>> @@ -7,6 +7,7 @@
>>   */
>>
>>  #include 
>> +#include 
>>  #include 
>>
>>  #ifdef CONFIG_THREAD_INFO_IN_TASK
>> @@ -25,7 +26,9 @@ static inline void *task_stack_page(const struct task_struct *task)
>>
>>  static inline unsigned long *end_of_stack(const struct task_struct *task)
>>  {
>> -   return task->stack;
>> +   if (!IS_ENABLED(CONFIG_ARCH_TASK_STRUCT_ON_STACK) || task != &init_task)
>> +   return task->stack;
>> +   return (unsigned long *)(task + 1);
>>  }
> This seems like a strange place for the change. It feels more like
> init_task has been defined incorrectly.
 The init_task will put into init_stack when ARCH_TASK_STRUCT_ON_STACK is
 selected.
 include/asm-generic/vmlinux.lds.h:
 #define INIT_TASK_DATA(align)\
 . = ALIGN(align);\
 __start_init_task = .;
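
The fourth case in the commit message (task_struct placed at the start of the stack area) can be illustrated in user space. This is a sketch only: demo_task, demo_thread_union and demo_end_of_stack() are hypothetical simplifications, a downward-growing stack is assumed, and the real change is the end_of_stack() hunk quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced task structure. */
struct demo_task {
	unsigned long *stack;	/* lowest address of the stack area */
	int pid;
};

/* The task shares its storage with the stack, as with init_task/init_stack. */
union demo_thread_union {
	struct demo_task task;
	unsigned long stack[1024];
};

/*
 * Lowest usable stack word. When the task lives at the bottom of its own
 * stack area, the usable stack starts right after the task structure, so
 * an end-of-stack marker written there cannot overwrite the task.
 */
unsigned long *demo_end_of_stack(const struct demo_task *task, bool task_on_stack)
{
	assert(task);
	if (!task_on_stack)
		return task->stack;
	return (unsigned long *)(task + 1);
}
```

With task_on_stack set, the returned pointer is sizeof(struct demo_task) bytes above the start of the area, which matches the "(task + 1)" expression in the patch.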

Re: [PATCH][RFC v2] ACPI: acpi_pad: Do not launch acpi_pad threads on idle cpus

2018-12-11 Thread Yu Chen

Hi,
On Tue, Dec 11, 2018 at 04:37:54PM +0800, joeyli wrote:
> Hi Yu Chen,
> 
> Thanks for your response!
> 
> On Tue, Dec 11, 2018 at 11:12:21AM +0800, Yu Chen wrote:
> > Hi Joey,
> > On Mon, Dec 10, 2018 at 02:31:53PM +0800, joeyli wrote:
> > > Hi Chen Yu and ACPI experts,
> > > 
> > > On Sat, May 05, 2018 at 07:53:22PM +0800, Chen Yu wrote:
> > > > According to current implementation of acpi_pad driver,
> > > > it does not make sense to spawn any power saving threads
> > > > on the cpus which are already idle - it might bring
> > > > unnecessary overhead on these idle cpus and causes power
> > > > waste. So verify the condition that if the number of 'busy'
> > > > cpus exceeds the amount of the 'forced idle' cpus is met.
> > > > This is applicable due to round-robin attribute of the
> > > > power saving threads, otherwise ignore the setting/ACPI
> > > > notification.
> > > > 
> > > > Suggested-by: Lenny Szubowicz 
> > > > Suggested-by: Len Brown 
> > > > Cc: "Rafael J. Wysocki" 
> > > > Cc: Lenny Szubowicz 
> > > > Cc: Len Brown 
> > > > Cc: Jacob Pan 
> > > > Cc: Rui Zhang 
> > > > Cc: linux-a...@vger.kernel.org
> > > > Signed-off-by: Chen Yu 
> > > 
> > > Do you have any news on this patch? Why was it not merged into the
> > > mainline kernel?
> > > 
> > We are evaluating if this could be integrated into idle injection framework.
> > May I know if there's any requirement/background from SUSE on this?
> > 
> 
> I am also looking at your patch and the idle injection framework. Currently I
> do not have a good suggestion for your patches yet. But I will try to have my
> knowledge ready when you send out a new version.
>
Thanks. I mean, has SUSE received reports from customers who encountered this issue?
BTW, may I know the status of the hibernation encryption, please? (Using the TPM?)

Best,
Yu(Ryan)
> Thanks a lot!
> Joey Lee

[PATCH] Fix mm->owner point to a task that does not exists.

2018-12-11 Thread gchen . guomin

From: guomin chen 

 Under normal circumstances, when do_exit() runs, mm->owner is
 updated in exit_mm(). But when a kernel process calls
 unuse_mm() and then exits, mm->owner cannot be updated, and it
 will point to a task that has been released.

 Below is my issue on vhost_net:
A and B are two kernel processes (such as vhost_worker),
C is a user space process (such as qemu), and all
three use the mm of the user process C.
Now, because user process C exits abnormally, the owner of this
mm becomes A. When A calls unuse_mm() and exits, this mm->owner
still points to A, which has been released.
When B accesses this mm->owner again, A has already been released.

 Process A                    Process B
 vhost_worker()               vhost_worker()
  ---------                    ---------
  use_mm()                     use_mm()
   ...
  unuse_mm()
     tsk->mm=NULL
   do_exit()                   page fault
    exit_mm()                  access mm->owner
   can't update owner          kernel Oops

                               unuse_mm()

Cc: "Michael S. Tsirkin" 
Cc: Jason Wang 
Cc: 
Cc: 
Cc: 
Cc: 
Signed-off-by: guomin chen 
---
 drivers/vhost/vhost.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 6b98d8e..7f8586a 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -368,6 +368,10 @@ static int vhost_worker(void *data)
}
}
unuse_mm(dev->mm);
+   /* current->mm needs to be processed by the later exit_mm() */
+   task_lock(current);
+   current->mm = dev->mm;
+   task_unlock(current);
set_fs(oldfs);
return 0;
 }
-- 
1.8.3.1

Re: [PATCH 1/2 v8] resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED'

2018-12-11 Thread lijiang

On 2018-12-05 05:33, Lendacky, Thomas wrote:
> On 11/29/2018 09:37 PM, Dave Young wrote:
>> + more people
>>
>> On 11/29/18 at 04:09pm, Lianbo Jiang wrote:
>>> When doing kexec_file_load, the first kernel needs to pass the e820
>>> reserved ranges to the second kernel. But kernel can not exactly
>>> match the e820 reserved ranges when walking through the iomem resources
>>> with the descriptor 'IORES_DESC_NONE', because several e820 types
>>> (e.g. E820_TYPE_RESERVED_KERN/E820_TYPE_RAM/E820_TYPE_UNUSABLE/
>>> E820_TYPE_RESERVED) are converted to the descriptor 'IORES_DESC_NONE'. It
>>> may pass these four types to the kdump kernel, which is not the desired result.
>>>
>>> So, this patch adds a new I/O resource descriptor 'IORES_DESC_RESERVED'
>>> for the iomem resources search interfaces. It is helpful to exactly
>>> match the reserved resource ranges when walking through iomem resources.
>>>
>>> In addition, since the new descriptor 'IORES_DESC_RESERVED' is introduced,
>>> these code originally related to the descriptor 'IORES_DESC_NONE' need to
>>> be updated. Otherwise, it will be easily confused and also cause some
>>> errors. Because the 'E820_TYPE_RESERVED' type is converted to the new
>>> descriptor 'IORES_DESC_RESERVED' instead of 'IORES_DESC_NONE', it has been
>>> changed.
>>>
>>> Suggested-by: Dave Young 
>>> Signed-off-by: Lianbo Jiang 
>>> ---
>>>  arch/ia64/kernel/efi.c |  4 
>>>  arch/x86/kernel/e820.c |  2 +-
>>>  arch/x86/mm/ioremap.c  | 13 -
>>>  include/linux/ioport.h |  1 +
>>>  kernel/resource.c  |  6 +++---
>>>  5 files changed, 21 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/arch/ia64/kernel/efi.c b/arch/ia64/kernel/efi.c
>>> index 8f106638913c..1841e9b4db30 100644
>>> --- a/arch/ia64/kernel/efi.c
>>> +++ b/arch/ia64/kernel/efi.c
>>> @@ -1231,6 +1231,10 @@ efi_initialize_iomem_resources(struct resource *code_resource,
>>> break;
>>>  
>>> case EFI_RESERVED_TYPE:
>>> +   name = "reserved";
>>
>> Ingo updated X86 code to use "Reserved",  I think it would be good to do
>> same for this case as well
>>
>>> +   desc = IORES_DESC_RESERVED;
>>> +   break;
>>> +
>>> case EFI_RUNTIME_SERVICES_CODE:
>>> case EFI_RUNTIME_SERVICES_DATA:
>>> case EFI_ACPI_RECLAIM_MEMORY:
>>
>> Originally, above 3 are all "reserved", so probably they all should be
>> IORES_DESC_RESERVED.
>>
>> Can any IA64 people review this?
>>
>>> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
>>> index 50895c2f937d..57fafdafb860 100644
>>> --- a/arch/x86/kernel/e820.c
>>> +++ b/arch/x86/kernel/e820.c
>>> @@ -1048,10 +1048,10 @@ static unsigned long __init e820_type_to_iores_desc(struct e820_entry *entry)
>>> case E820_TYPE_NVS: return IORES_DESC_ACPI_NV_STORAGE;
>>> case E820_TYPE_PMEM:return IORES_DESC_PERSISTENT_MEMORY;
>>> case E820_TYPE_PRAM:return IORES_DESC_PERSISTENT_MEMORY_LEGACY;
>>> +   case E820_TYPE_RESERVED:return IORES_DESC_RESERVED;
>>> case E820_TYPE_RESERVED_KERN:   /* Fall-through: */
>>> case E820_TYPE_RAM: /* Fall-through: */
>>> case E820_TYPE_UNUSABLE:/* Fall-through: */
>>> -   case E820_TYPE_RESERVED:/* Fall-through: */
>>> default:return IORES_DESC_NONE;
>>> }
>>>  }
>>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
>>> index 5378d10f1d31..fea2ef99415d 100644
>>> --- a/arch/x86/mm/ioremap.c
>>> +++ b/arch/x86/mm/ioremap.c
>>> @@ -83,7 +83,18 @@ static bool __ioremap_check_ram(struct resource *res)
>>>  
>>>  static int __ioremap_check_desc_other(struct resource *res)
>>>  {
>>> -   return (res->desc != IORES_DESC_NONE);
>>> +   /*
>>> +* But now, the 'E820_TYPE_RESERVED' type is converted to the new
>>> +* descriptor 'IORES_DESC_RESERVED' instead of 'IORES_DESC_NONE',
>>> +* it has been changed. And the value of 'mem_flags.desc_other'
>>> +* is equal to 'true' if we don't strengthen the condition in this
>>> +* function, that is wrong. Because originally it is equal to
>>> +* 'false' for the same reserved type.
>>> +*
>>> +* So, that would be nice to keep it the same as before.
>>> +*/
>>> +   return ((res->desc != IORES_DESC_NONE) &&
>>> +   (res->desc != IORES_DESC_RESERVED));
>>>  }
>>
>> Added Tom since he added the check function.  Is it possible to only
>> check explicit valid desc types instead of excluding IORES_DESC_NONE?
> 
> Sorry for the delay...
> 
> The original intent of the check was to map most memory as encrypted under
> SEV if it was marked with a specific descriptor, since it was likely to
> not be MMIO. I tried converting most things that mapped memory to memremap
> vs ioremap, but ACPI was one area that I left alone and this check catches
> the mapping of the ACPI tables. I supp
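
The interaction between the two quoted hunks (the e820 type mapping and the strengthened __ioremap_check_desc_other()) can be sketched in isolation. Everything below uses hypothetical demo_ names and a reduced set of types; it is an illustration of the reasoning, not the kernel code. Once the reserved type maps to its own descriptor, a check of the form "desc != NONE" would start returning true for reserved ranges, so the check also has to exclude the new descriptor to keep the old result.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced versions of the e820 types and I/O descriptors. */
enum demo_e820_type {
	DEMO_E820_RAM,
	DEMO_E820_RESERVED,
	DEMO_E820_ACPI,
	DEMO_E820_UNUSABLE,
};

enum demo_iores_desc {
	DEMO_DESC_NONE,
	DEMO_DESC_ACPI_TABLES,
	DEMO_DESC_RESERVED,
};

/* Mapping after the change: reserved ranges get their own descriptor. */
enum demo_iores_desc demo_type_to_desc(enum demo_e820_type type)
{
	switch (type) {
	case DEMO_E820_ACPI:
		return DEMO_DESC_ACPI_TABLES;
	case DEMO_E820_RESERVED:
		return DEMO_DESC_RESERVED;
	case DEMO_E820_RAM:
	case DEMO_E820_UNUSABLE:
	default:
		return DEMO_DESC_NONE;
	}
}

/* Strengthened check: the new descriptor is excluded as well. */
bool demo_check_desc_other(enum demo_iores_desc desc)
{
	return desc != DEMO_DESC_NONE && desc != DEMO_DESC_RESERVED;
}

/* The result for reserved ranges stays false, as it was before the change. */
void demo_check_mapping(void)
{
	assert(!demo_check_desc_other(demo_type_to_desc(DEMO_E820_RESERVED)));
	assert(demo_check_desc_other(demo_type_to_desc(DEMO_E820_ACPI)));
}
```

Listing the accepted descriptors explicitly, as suggested in the review, would be the alternative to excluding values one by one.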

RE: [PATCH 1/2 v3] powerpc/fsl: Use new clockgen binding

2018-12-11 Thread Andy Tang



> -----Original Message-----
> From: Scott Wood 
> Sent: November 26, 2018 9:19
> To: Andy Tang 
> Cc: mturque...@baylibre.com; sb...@kernel.org; robh...@kernel.org;
> mark.rutl...@arm.com; b...@kernel.crashing.org; pau...@samba.org;
> m...@ellerman.id.au; linux-...@vger.kernel.org;
> devicet...@vger.kernel.org; linux-kernel@vger.kernel.org;
> linuxppc-...@lists.ozlabs.org
> Subject: Re: [PATCH 1/2 v3] powerpc/fsl: Use new clockgen binding
> 
> On Wed, 2018-10-31 at 14:57 +0800, Yuantian Tang wrote:
> > From: Scott Wood 
> >
> > The driver retains compatibility with old device trees, but we don't
> > want the old nodes lying around to be copied, or used as a reference
> > (some of the mux options are incorrect), or even just being clutter.
> >
> >
> > +sysclk: sysclk {
> > +   compatible = "fixed-clock";
> > +   #clock-cells = <0>;
> > +   clock-frequency = <1>;
> > +   clock-output-names = "sysclk";
> > +};
> > +
> >  clockgen: global-utilities@e1000 {
> 
> The U-Boot fixup won't work with this.  U-Boot patches the frequency
> directly into the clockgen node (BTW, this is another reason to preserve
> the generic 1.0/2.0 compatible string).  The new binding does not require
> an input clock node when it is provided as clock-frequency directly in the
> clockgen node -- and the sysclk node was not in my original patch (nor did
> you note that you made changes from that original).  Why did you add it?
> 
> I would just remove it when applying, but I'm concerned that this indicates
> a lack of testing (and I don't have the hardware access to test it myself,
> except on t4240) -- unless the 100 MHz sysclk just happened to be correct
> on the machines you tested (which would also be a test coverage
> problem)?
[Andy] You are right. Sysclk may not be useful anymore.
U-Boot will fix up the clockgen node correctly. Please apply this patch without
sysclk. We will test it and catch the error if the clock is not fixed up correctly.

BTW, which git tree are you going to apply it on? This one?
https://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux.git/log/?h=next

BR,
Andy
> 
> -Scott
>

Re: [PATCH 1/1] ACPI / tables: add DSDT AmlCode new declaration name support

2018-12-11 Thread Wang, Dongsheng

Hello all,

Any comments about this patch?

Cheers
Dongsheng

On 2018/11/23 16:12, Wang, Dongsheng wrote:
> Hello Robert,
>
> Do you have any comments about this patch?
> Thanks.
>
>
> Cheers
> Dongsheng
>
> On 2018/11/13 18:46, Wang, Dongsheng wrote:
>> The new naming rule is added in acpica version 20180427.
>> So the dsdt aml code name changes from "AmlCode" to "dsdt_aml_code".
>>
>> The patch that introduces naming rules is:
>> https://github.com/acpica/acpica/commit/f9a88a4c1cd020b6a5475d63b29626852a0b5f37
>>
>> Tested:
>> ACPICA release version 20180427+.
>> ARM64: QCOM QDF2400
>> GCC: 4.8.5 20150623
>>
>> Signed-off-by: Wang Dongsheng 
>> ---
>>  drivers/acpi/Kconfig  |  2 +-
>>  drivers/acpi/tables.c | 10 --
>>  2 files changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
>> index 9705fc986da9..15ab53a52fdc 100644
>> --- a/drivers/acpi/Kconfig
>> +++ b/drivers/acpi/Kconfig
>> @@ -336,7 +336,7 @@ config ACPI_CUSTOM_DSDT_FILE
>>See Documentation/acpi/dsdt-override.txt
>>  
>>Enter the full path name to the file which includes the AmlCode
>> -  declaration.
>> +  or dsdt_aml_code declaration.
>>  
>>If unsure, don't enter a file name.
>>  
>> diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
>> index a3d012b08fc5..297020bbaade 100644
>> --- a/drivers/acpi/tables.c
>> +++ b/drivers/acpi/tables.c
>> @@ -713,6 +713,9 @@ acpi_os_physical_table_override(struct acpi_table_header *existing_table,
>>table_length);
>>  }
>>  
>> +static void *amlcode __attribute__ ((weakref("AmlCode")));
>> +static void *dsdt_amlcode __attribute__ ((weakref("dsdt_aml_code")));
>> +
>>  acpi_status
>>  acpi_os_table_override(struct acpi_table_header *existing_table,
>> struct acpi_table_header **new_table)
>> @@ -723,8 +726,11 @@ acpi_os_table_override(struct acpi_table_header *existing_table,
>>  *new_table = NULL;
>>  
>>  #ifdef CONFIG_ACPI_CUSTOM_DSDT
>> -if (strncmp(existing_table->signature, "DSDT", 4) == 0)
>> -*new_table = (struct acpi_table_header *)AmlCode;
>> +if (!strncmp(existing_table->signature, "DSDT", 4)) {
>> +*new_table = (struct acpi_table_header *)&amlcode;
>> +if (!(*new_table))
>> +*new_table = (struct acpi_table_header *)&dsdt_amlcode;
>> +}
>>  #endif
>>  if (*new_table != NULL)
>>  acpi_table_taint(existing_table);
>
>

Re: [PATCH v1 2/2] usb:cdns3 Add Cadence USB3 DRD Driver

2018-12-11 Thread Peter Chen

> >> +tmode = le16_to_cpu(ctrl->wIndex);
> >> +
> >> +if (!set || (tmode & 0xff) != 0)
> >> +return -EINVAL;
> >> +
> >> +switch (tmode >> 8) {
> >> +case TEST_J:
> >> +case TEST_K:
> >> +case TEST_SE0_NAK:
> >> +case TEST_PACKET:
> >> +cdns3_set_register_bit(&priv_dev->regs->usb_cmd,
> >> +   USB_CMD_STMODE |
> >> +   USB_STS_TMODE_SEL(tmode - 1));
> >
> >I'm 90% sure this won't work. There's a reason why we only enter the
> >requested test mode from status stage. How have you tested this?
>

What's the reason?
It can work, although the code differs a little from the above. I
tested it using the USBxHSETT tool on Windows.

> From USB spec:
> "The transition to test mode must be complete no later than 3 ms after the
> completion of the status stage of the request."
> But I don't remember any issues with this test on our other drivers. Maybe
> the status stage is armed in this case by the controller. I have to confirm
> how it works with the hardware team.
> The driver doesn't know when the status stage was completed. We don't have
> any event on status stage completion.  I haven't checked it yet with a tester
> on this driver.
>

> >> +irqreturn_t ret = IRQ_NONE;
> >> +unsigned long flags;
> >> +u32 reg;
> >> +
> >> +priv_dev = cdns->gadget_dev;
> >> +spin_lock_irqsave(&priv_dev->lock, flags);
> >
> >you're already running in hardirq context. Why do you need this lock at
> >all? It would be better to use the hardirq handler to mask your
> >interrupts, so they don't fire again, then use the top-half (softirq)
> >handler to actually handle the interrupts.
>

This controller may run in an SMP environment, so register and flag
accesses need to be protected against concurrent access from other CPUs.

> Yes, spin_lock_irqsave is not necessary here.
>
> Do you mean replacing devm_request_irq with a request_threaded_irq ?
> I have single interrupt line shared between  Host, Driver, DRD/OTG.
> I'm not sure if it will work more efficiently.
>
> >
> >> +/* check USB device interrupt */
> >> +reg = readl(&priv_dev->regs->usb_ists);
> >> +writel(reg, &priv_dev->regs->usb_ists);
> >> +
> >> +if (reg) {
> >> +dev_dbg(priv_dev->dev, "IRQ: usb_ists: %08X\n", reg);
> >
> >I strongly advise against using dev_dbg() for debugging. Even more so
> >inside your IRQ handler.

Felipe, I use Dynamic Debug for debugging, and show debug messages with
"dmesg" after testing/debugging. I see dwc3 uses tracing; are there any
benefits to switching to tracing?

> OK, it's not necessary in this place, especially since a trace point that
> does the same is invoked inside cdns3_check_usb_interrupt_proceed.
>

Peter

RE: [PATCH] spi: lpspi: Fix CLK pin becomes low before one transfer

2018-12-11 Thread Clark Wang

Hi Mark,

These five patches are based on the patch series "spi: lpspi: Add Slave Mode
support for LPSPI". Since that series makes big changes, should I hold these
five patches until the series is applied to your git branch?

Thanks for your patience.

Regards,
Clark Wang

> -Original Message-
> From: Mark Brown 
> Sent: Tuesday, December 11, 2018 22:30
> To: Clark Wang 
> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] spi: lpspi: Fix CLK pin becomes low before one transfer
> 
> On Tue, Dec 04, 2018 at 06:23:19AM +, Clark Wang wrote:
> > Remove Reset operation in fsl_lpspi_config(). This RST may cause both
> > CLK and CS pins go from high to low level under cs-gpio mode.
> > Add fsl_lpspi_reset() function after one message transfer to clear all
> > flags in use.
> 
> This doesn't apply against current code, please check and resend.

RE: [PATCH] usb: renesas_usbhs: mark PM functions as __maybe_unused

2018-12-11 Thread Yoshihiro Shimoda

Hi Arnd,

Thank you for the patch!

> From: Arnd Bergmann, Sent: Tuesday, December 11, 2018 7:06 PM
> To: Greg Kroah-Hartman 
> Cc: Arnd Bergmann ; Yoshihiro Shimoda 
> ; Felipe Balbi
> ; Simon Horman ; 
> Chris Brandt ;
> linux-...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [PATCH] usb: renesas_usbhs: mark PM functions as __maybe_unused
> 
> Without CONFIG_PM, we get a new build warning here:
> 
> drivers/usb/renesas_usbhs/common.c:860:12: error: 'usbhsc_resume' defined but not used [-Werror=unused-function]
>  static int usbhsc_resume(struct device *dev)
> ^
> drivers/usb/renesas_usbhs/common.c:844:12: error: 'usbhsc_suspend' defined but not used [-Werror=unused-function]
>  static int usbhsc_suspend(struct device *dev)
> ^~
> 
> No idea why this never happened before, but it's trivial to work
> around by marking the functions as __maybe_unused so the compiler
> can silently discard them.

This build warning is caused by commit d54d334e75b9 ("usb: renesas_usbhs:
Use SIMPLE_DEV_PM_OPS macro").

So, I'd like to add the following tag on this commit log.

Fixes: d54d334e75b9 ("usb: renesas_usbhs: Use SIMPLE_DEV_PM_OPS macro")

Also, we can remove "No idea why this never happened before, but"
from the commit log.

> Signed-off-by: Arnd Bergmann 

After revising the commit log a little:

Reviewed-by: Yoshihiro Shimoda 

Best regards,
Yoshihiro Shimoda

> ---
>  drivers/usb/renesas_usbhs/common.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/usb/renesas_usbhs/common.c b/drivers/usb/renesas_usbhs/common.c
> index 02c1d2bf4f86..2ff7991f4517 100644
> --- a/drivers/usb/renesas_usbhs/common.c
> +++ b/drivers/usb/renesas_usbhs/common.c
> @@ -841,7 +841,7 @@ static int usbhs_remove(struct platform_device *pdev)
>   return 0;
>  }
> 
> -static int usbhsc_suspend(struct device *dev)
> +static __maybe_unused int usbhsc_suspend(struct device *dev)
>  {
>   struct usbhs_priv *priv = dev_get_drvdata(dev);
>   struct usbhs_mod *mod = usbhs_mod_get_current(priv);
> @@ -857,7 +857,7 @@ static int usbhsc_suspend(struct device *dev)
>   return 0;
>  }
> 
> -static int usbhsc_resume(struct device *dev)
> +static __maybe_unused int usbhsc_resume(struct device *dev)
>  {
>   struct usbhs_priv *priv = dev_get_drvdata(dev);
>   struct platform_device *pdev = usbhs_priv_to_pdev(priv);
> --
> 2.20.0

Re: [PATCH 0/2] Graph fixes for using multiple endpoints per port

2018-12-11 Thread Kuninori Morimoto



Hi Tony

> > https://patchwork.kernel.org/patch/10712877/
> 
> Hmm, so do you have multiple separate ports at the "&sound" node
> hardware? If so then yeah multiple ports make sense.
>
> But if you only a single physical (I2S?) port at the
> "&sound" node hardware, then IMO you should only have one
> port and multiple endpoints there according to the graph.txt
> binding doc.
> 
> In my McBSP case there is only a single physical I2S port
> that can be TDM split into timeslots.

Mine has 4 DAIs. Each DAI can output 2ch.
These will be merged into 8ch TDM, which goes to the Codec.
But hmm.. it is 4 DAIs, but 1 "physical" interface...

So, your patch seems correct, but it will break DPCM...
I will confirm it.

Best regards
---
Kuninori Morimoto

RE: [PATCH] spi: lpspi: Add cs-gpio support

2018-12-11 Thread Clark Wang



> -Original Message-
> From: Mark Brown 
> Sent: Tuesday, December 11, 2018 22:26
> To: Clark Wang 
> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] spi: lpspi: Add cs-gpio support
> 
> On Tue, Dec 04, 2018 at 06:24:59AM +, Clark Wang wrote:
> 
> > Add cs-gpio feature for LPSPI. Use fsl_lpspi_prepare_message() and
> > fsl_lpspi_unprepare_message() to enable and control cs line.
> > These two functions will be only called at the beginning and the
> > ending of a message transfer.
> 
> > Still support using the mode without cs-gpio. It depends on if
> > attribute cs-gpio has been configured in dts file.
> 
> Why is this not using the core support for GPIO chip selects?  Note that you
> can't just implement chip select in the prepare and unprepare, drivers can
> toggle chip select within a message so the code should be looking at the
> individual transfers to see if cs_change is set and acting accordingly.

Ok, I will try to use the core support for GPIO chip selects.

Regards,
Clark Wang

RE: [PATCH] spi: lpspi: enable runtime pm for lpspi

2018-12-11 Thread Clark Wang



> -Original Message-
> From: Mark Brown 
> Sent: Tuesday, December 11, 2018 22:28
> To: Clark Wang 
> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] spi: lpspi: enable runtime pm for lpspi
> 
> On Tue, Dec 04, 2018 at 06:19:48AM +, Clark Wang wrote:
> > From: Han Xu 
> >
> > Enable the runtime power management for lpspi module.
> >
> > Signed-off-by: Han Xu 
> > Reviewed-by: Frank Li 
> > ---
> 
> There's no signed-off-by from you here?  These are required, please see
> submitting-patches.rst.

Oh, my Signed-off-by should be here. Because of the big changes relative to
4.9 in our git branch, I did some adaptation work.

Regards,
Clark Wang

[tip:timers/core] timekeeping: Convert to DEFINE_SHOW_ATTRIBUTE

2018-12-11 Thread tip-bot for Yangtao Li

Commit-ID:  5b20c6fd6a60e182243da31c47f2ebff5b0e3d57
Gitweb: https://git.kernel.org/tip/5b20c6fd6a60e182243da31c47f2ebff5b0e3d57
Author: Yangtao Li 
AuthorDate: Tue, 11 Dec 2018 11:37:44 -0500
Committer:  Thomas Gleixner 
CommitDate: Tue, 11 Dec 2018 18:13:35 -0800

timekeeping: Convert to DEFINE_SHOW_ATTRIBUTE

Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.

Signed-off-by: Yangtao Li 
Signed-off-by: Thomas Gleixner 
Cc: john.stu...@linaro.org
Cc: sb...@kernel.org
Link: https://lkml.kernel.org/r/20181211163744.22133-1-tiny.win...@gmail.com

---
 kernel/time/timekeeping_debug.c | 15 ++-
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/kernel/time/timekeeping_debug.c b/kernel/time/timekeeping_debug.c
index f811882cfd13..86489950d690 100644
--- a/kernel/time/timekeeping_debug.c
+++ b/kernel/time/timekeeping_debug.c
@@ -19,7 +19,7 @@
 
 static unsigned int sleep_time_bin[NUM_BINS] = {0};
 
-static int tk_debug_show_sleep_time(struct seq_file *s, void *data)
+static int tk_debug_sleep_time_show(struct seq_file *s, void *data)
 {
unsigned int bin;
seq_puts(s, "  time (secs)count\n");
@@ -33,18 +33,7 @@ static int tk_debug_show_sleep_time(struct seq_file *s, void *data)
}
return 0;
 }
-
-static int tk_debug_sleep_time_open(struct inode *inode, struct file *file)
-{
-   return single_open(file, tk_debug_show_sleep_time, NULL);
-}
-
-static const struct file_operations tk_debug_sleep_time_fops = {
-   .open   = tk_debug_sleep_time_open,
-   .read   = seq_read,
-   .llseek = seq_lseek,
-   .release= single_release,
-};
+DEFINE_SHOW_ATTRIBUTE(tk_debug_sleep_time);
 
 static int __init tk_debug_sleep_time_init(void)
 {

Re: [RFT PATCH v1 1/4] Documentation: DT: arm: add support for sockets defining package boundaries

2018-12-11 Thread Rob Herring

On Thu, 29 Nov 2018 15:28:17 -0800, Atish Patra wrote:
> From: Sudeep Holla 
> 
> The current ARM DT topology description provides the operating system
> with a topological view of the system that is based on leaf nodes
> representing either cores or threads (in an SMT system) and a
> hierarchical set of cluster nodes that creates a hierarchical topology
> view of how those cores and threads are grouped.
> 
> However this hierarchical representation of clusters does not allow to
> describe what topology level actually represents the physical package or
> the socket boundary, which is a key piece of information to be used by
> an operating system to optimize resource allocation and scheduling.
> 
> Lets add a new "socket" node type in the cpu-map node to describe the
> same.
> 
> Signed-off-by: Sudeep Holla 
> ---
>  Documentation/devicetree/bindings/arm/topology.txt | 52 --
>  1 file changed, 39 insertions(+), 13 deletions(-)
> 

Reviewed-by: Rob Herring

Re: [PATCH v2 1/4] vmalloc: New flags for safe vfree on special perms

2018-12-11 Thread Andy Lutomirski

On Tue, Dec 11, 2018 at 4:12 PM Rick Edgecombe
 wrote:
>
> This adds two new flags VM_IMMEDIATE_UNMAP and VM_HAS_SPECIAL_PERMS, for
> enabling vfree operations to immediately clear executable TLB entries to freed
> pages, and handle freeing memory with special permissions.
>
> In order to support vfree being called on memory that might be RO, the vfree
> deferred list node is moved to a kmalloc allocated struct, from where it is
> today, reusing the allocation being freed.
>
> arch_vunmap is a new __weak function that implements the actual unmapping and
> resetting of the direct map permissions. It can be overridden by more efficient
> architecture specific implementations.
>
> For the default implementation, it uses architecture agnostic methods which are
> equivalent to what most usages do before calling vfree. So now it is just
> centralized here.
>
> This implementation derives from two sketches from Dave Hansen and Andy
> Lutomirski.
>
> Suggested-by: Dave Hansen 
> Suggested-by: Andy Lutomirski 
> Suggested-by: Will Deacon 
> Signed-off-by: Rick Edgecombe 
> ---
>  include/linux/vmalloc.h |  2 ++
>  mm/vmalloc.c| 73 +
>  2 files changed, 69 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 398e9c95cd61..872bcde17aca 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -21,6 +21,8 @@ struct notifier_block;/* in notifier.h */
>  #define VM_UNINITIALIZED   0x0020  /* vm_struct is not fully initialized */
>  #define VM_NO_GUARD        0x0040  /* don't add guard page */
>  #define VM_KASAN           0x0080  /* has allocated kasan shadow memory */
> +#define VM_IMMEDIATE_UNMAP 0x0200  /* flush before releasing pages */
> +#define VM_HAS_SPECIAL_PERMS   0x0400  /* may be freed with special perms */
>  /* bits [20..32] reserved for arch specific ioremap internals */
>
>  /*
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 97d4b25d0373..02b284d2245a 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -18,6 +18,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -38,6 +39,11 @@
>
>  #include "internal.h"
>
> +struct vfree_work {
> +   struct llist_node node;
> +   void *addr;
> +};
> +
>  struct vfree_deferred {
> struct llist_head list;
> struct work_struct wq;
> @@ -50,9 +56,13 @@ static void free_work(struct work_struct *w)
>  {
> struct vfree_deferred *p = container_of(w, struct vfree_deferred, wq);
> struct llist_node *t, *llnode;
> +   struct vfree_work *cur;
>
> -   llist_for_each_safe(llnode, t, llist_del_all(&p->list))
> -   __vunmap((void *)llnode, 1);
> +   llist_for_each_safe(llnode, t, llist_del_all(&p->list)) {
> +   cur = container_of(llnode, struct vfree_work, node);
> +   __vunmap(cur->addr, 1);
> +   kfree(cur);
> +   }
>  }
>
>  /*** Page table manipulation functions ***/
> @@ -1494,6 +1504,48 @@ struct vm_struct *remove_vm_area(const void *addr)
> return NULL;
>  }
>
> +/*
> + * This function handles unmapping and resetting the direct map as efficiently
> + * as it can with cross arch functions. The three categories of architectures
> + * are:
> + *   1. Architectures with no set_memory implementations and no direct map
> + *  permissions.
> + *   2. Architectures with set_memory implementations but no direct map
> + *  permissions
> + *   3. Architectures with set_memory implementations and direct map permissions
> + */
> +void __weak arch_vunmap(struct vm_struct *area, int deallocate_pages)

My general preference is to avoid __weak functions -- they don't
optimize well.  Instead, I prefer either:

#ifndef arch_vunmap
void arch_vunmap(...);
#endif

or

#ifdef CONFIG_HAVE_ARCH_VUNMAP
...
#endif


> +{
> +   unsigned long addr = (unsigned long)area->addr;
> +   int immediate = area->flags & VM_IMMEDIATE_UNMAP;
> +   int special = area->flags & VM_HAS_SPECIAL_PERMS;
> +
> +   /*
> +* In case of 2 and 3, use this general way of resetting the permissions
> +* on the directmap. Do NX before RW, in case of X, so there is no W^X
> +* violation window.
> +*
> +* For case 1 these will be noops.
> +*/
> +   if (immediate)
> +   set_memory_nx(addr, area->nr_pages);
> +   if (deallocate_pages && special)
> +   set_memory_rw(addr, area->nr_pages);

Can you elaborate on the intent here?  VM_IMMEDIATE_UNMAP means "I
want that alias gone before any deallocation happens".
VM_HAS_SPECIAL_PERMS means "I mucked with the direct map -- fix it for
me, please".  deallocate means "this was vfree -- please free the
pages".  I'm not convinced that all the various combinations make
sense.  Do we really need both flags?

(VM_IMMED

Re: [PATCH] printk: Remove print_prefix() calls with NULL buffer.

2018-12-11 Thread Sergey Senozhatsky

On (12/11/18 18:49), Tetsuo Handa wrote:
> 
> We can save lines/size by removing print_prefix() with buf == NULL.
> This patch makes no functional change.
> 
> Signed-off-by: Tetsuo Handa 

Looks good to me,
Reviewed-by: Sergey Senozhatsky 


Shouldn't this also have "Suggested-by: Petr Mladek" ?

-ss

Re: [RFT PATCH v1 2/4] dt-binding: cpu-topology: Move cpu-map to a common binding.

2018-12-11 Thread Rob Herring

On Mon, Dec 03, 2018 at 09:23:42AM -0800, Atish Patra wrote:
> On 12/3/18 8:55 AM, Sudeep Holla wrote:
> > On Thu, Nov 29, 2018 at 03:28:18PM -0800, Atish Patra wrote:
> > > cpu-map binding can be used to described cpu topology for both
> > > RISC-V & ARM. It makes more sense to move the binding to document
> > > to a common place.
> > > 
> > > The relevant discussion can be found here.
> > > https://lkml.org/lkml/2018/11/6/19
> > > 
> > 
> > Looks good to me apart from a minor query below in the example.
> > 
> > Reviewed-by: Sudeep Holla 
> > 
> > > Signed-off-by: Atish Patra 
> > > ---
> > >   .../{arm/topology.txt => cpu/cpu-topology.txt} | 81 ++
> > >   1 file changed, 67 insertions(+), 14 deletions(-)
> > >   rename Documentation/devicetree/bindings/{arm/topology.txt => cpu/cpu-topology.txt} (86%)
> > > 
> > > diff --git a/Documentation/devicetree/bindings/arm/topology.txt b/Documentation/devicetree/bindings/cpu/cpu-topology.txt
> > > similarity index 86%
> > > rename from Documentation/devicetree/bindings/arm/topology.txt
> > > rename to Documentation/devicetree/bindings/cpu/cpu-topology.txt
> > > index 66848355..1de6fbce 100644
> > > --- a/Documentation/devicetree/bindings/arm/topology.txt
> > > +++ b/Documentation/devicetree/bindings/cpu/cpu-topology.txt
> > 
> > [...]
> > 
> > > +Example 3: HiFive Unleashed (RISC-V 64 bit, 4 core system)
> > > +
> > > +cpus {
> > > + #address-cells = <2>;
> > > + #size-cells = <2>;
> > > + compatible = "sifive,fu540g", "sifive,fu500";
> > > + model = "sifive,hifive-unleashed-a00";
> > > +
> > > + ...
> > > +
> > > + cpu-map {
> > > + cluster0 {
> > > + core0 {
> > > + cpu = <&L12>;
> > > + };
> > > + core1 {
> > > + cpu = <&L15>;
> > > + };
> > > + core2 {
> > > + cpu0 = <&L18>;
> > > + };
> > > + core3 {
> > > + cpu0 = <&L21>;
> > > + };
> > > + };
> > > + };
> > > +
> > > + L12: cpu@1 {
> > > + device_type = "cpu";
> > > + compatible = "sifive,rocket0", "riscv";
> > > + reg = <0x1>;
> > > + }
> > > +
> > > + L15: cpu@2 {
> > > + device_type = "cpu";
> > > + compatible = "sifive,rocket0", "riscv";
> > > + reg = <0x2>;
> > > + }
> > > + L18: cpu@3 {
> > > + device_type = "cpu";
> > > + compatible = "sifive,rocket0", "riscv";
> > > + reg = <0x3>;
> > > + }
> > > + L21: cpu@4 {
> > > + device_type = "cpu";
> > > + compatible = "sifive,rocket0", "riscv";
> > > + reg = <0x4>;
> > > + }
> > > +};
> > 
> > The labels for the CPUs drew my attention. Is it intentionally random
> > (or even specific) or just chosen to show anything can be used as labels ?
> 
> SiFive generates the device tree from RTL directly. So I am not sure if they
> assign random numbers or a particular algorithm chooses the label. I tried
> to put the exact ones that is available publicly.
> 
> https://github.com/riscv/riscv-device-tree-doc/blob/master/examples/sifive-hifive_unleashed-microsemi.dts

Oh, that's really terrible. I wouldn't care if this was just source
level stuff, but labels are part of the ABI with overlays.

Rob

Re: [PATCH v2 4/4] x86/vmalloc: Add TLB efficient x86 arch_vunmap

2018-12-11 Thread Andy Lutomirski

On Tue, Dec 11, 2018 at 4:12 PM Rick Edgecombe
 wrote:
>
> This adds a more efficient x86 architecture specific implementation of
> arch_vunmap, that can free any type of special permission memory with only 1
> TLB flush.
>
> In order to enable this, _set_pages_p and _set_pages_np are made non-static
> and renamed set_pages_p_noflush and set_pages_np_noflush to better communicate
> their different (non-flushing) behavior from the rest of the set_pages_*
> functions.
>
> The method for doing this with only 1 TLB flush was suggested by Andy
> Lutomirski.
>
> Suggested-by: Andy Lutomirski 
> Signed-off-by: Rick Edgecombe 
> ---
>  arch/x86/include/asm/set_memory.h |  2 +
>  arch/x86/mm/Makefile  |  3 +-
>  arch/x86/mm/pageattr.c| 11 +++--
>  arch/x86/mm/vmalloc.c | 71 +++
>  4 files changed, 80 insertions(+), 7 deletions(-)
>  create mode 100644 arch/x86/mm/vmalloc.c
>
> diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
> index 07a25753e85c..70ee81e8914b 100644
> --- a/arch/x86/include/asm/set_memory.h
> +++ b/arch/x86/include/asm/set_memory.h
> @@ -84,6 +84,8 @@ int set_pages_x(struct page *page, int numpages);
>  int set_pages_nx(struct page *page, int numpages);
>  int set_pages_ro(struct page *page, int numpages);
>  int set_pages_rw(struct page *page, int numpages);
> +int set_pages_np_noflush(struct page *page, int numpages);
> +int set_pages_p_noflush(struct page *page, int numpages);
>
>  extern int kernel_set_to_readonly;
>  void set_kernel_text_rw(void);
> diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
> index 4b101dd6e52f..189681f863a6 100644
> --- a/arch/x86/mm/Makefile
> +++ b/arch/x86/mm/Makefile
> @@ -13,7 +13,8 @@ CFLAGS_REMOVE_mem_encrypt_identity.o  = -pg
>  endif
>
>  obj-y  :=  init.o init_$(BITS).o fault.o ioremap.o extable.o pageattr.o mmap.o \
> -   pat.o pgtable.o physaddr.o setup_nx.o tlb.o cpu_entry_area.o
> +   pat.o pgtable.o physaddr.o setup_nx.o tlb.o cpu_entry_area.o \
> +   vmalloc.o
>
>  # Make sure __phys_addr has no stackprotector
>  nostackp := $(call cc-option, -fno-stack-protector)
> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> index db7a10082238..db0a4dfb5a7f 100644
> --- a/arch/x86/mm/pageattr.c
> +++ b/arch/x86/mm/pageattr.c
> @@ -2248,9 +2248,7 @@ int set_pages_rw(struct page *page, int numpages)
> return set_memory_rw(addr, numpages);
>  }
>
> -#ifdef CONFIG_DEBUG_PAGEALLOC
> -
> -static int __set_pages_p(struct page *page, int numpages)
> +int set_pages_p_noflush(struct page *page, int numpages)

Maybe set_pages_rwp_noflush()?

> diff --git a/arch/x86/mm/vmalloc.c b/arch/x86/mm/vmalloc.c
> new file mode 100644
> index ..be9ea42c3dfe
> --- /dev/null
> +++ b/arch/x86/mm/vmalloc.c
> @@ -0,0 +1,71 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * vmalloc.c: x86 arch version of vmalloc.c
> + *
> + * (C) Copyright 2018 Intel Corporation
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; version 2
> + * of the License.

This paragraph may be redundant with the SPDX line.

> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +static void set_area_direct_np(struct vm_struct *area)
> +{
> +   int i;
> +
> +   for (i = 0; i < area->nr_pages; i++)
> +   set_pages_np_noflush(area->pages[i], 1);
> +}
> +
> +static void set_area_direct_prw(struct vm_struct *area)
> +{
> +   int i;
> +
> +   for (i = 0; i < area->nr_pages; i++)
> +   set_pages_p_noflush(area->pages[i], 1);
> +}
> +
> +void arch_vunmap(struct vm_struct *area, int deallocate_pages)
> +{
> +   int immediate = area->flags & VM_IMMEDIATE_UNMAP;
> +   int special = area->flags & VM_HAS_SPECIAL_PERMS;
> +
> +   /* Unmap from vmalloc area */
> +   remove_vm_area(area->addr);
> +
> +   /* If no need to reset directmap perms, just check if need to flush */
> +   if (!(deallocate_pages || special)) {
> +   if (immediate)
> +   vm_unmap_aliases();
> +   return;
> +   }
> +
> +   /* From here we need to make sure to reset the direct map perms */
> +
> +   /*
> +* If the area being freed does not have any extra capabilities, we can
> +* just reset the directmap to RW before freeing.
> +*/
> +   if (!immediate) {
> +   set_area_direct_prw(area);
> +   vm_unmap_aliases();
> +   return;
> +   }
> +
> +   /*
> +* If the vm being freed has security sensitive capabilities such as
> +* executable we need to make sure there is no W window on the directmap
> +* before removing the X in the TLB. So we set not present first so we
> +* can flush without any other CPU picking up the mapping. T

Re: rcu_preempt caused oom

2018-12-11 Thread Paul E. McKenney

On Wed, Dec 12, 2018 at 01:37:40AM +, He, Bo wrote:
> We reproduced the hung_task panic with the patch "Improve diagnostics for
> failed RCU grace-period start", but unfortunately, maybe due to the loglevel,
> show_rcu_gp_kthreads didn't print any logs. We will improve the build and
> rerun the test to double check.

Well, at least the diagnostics didn't prevent the problem from happening.  ;-)

Thanx, Paul

> -Original Message-
> From: Paul E. McKenney  
> Sent: Tuesday, December 11, 2018 12:47 PM
> To: He, Bo 
> Cc: Steven Rostedt ; linux-kernel@vger.kernel.org; 
> j...@joshtriplett.org; mathieu.desnoy...@efficios.com; 
> jiangshan...@gmail.com; Zhang, Jun ; Xiao, Jin 
> ; Zhang, Yanmin ; Bai, Jie A 
> 
> Subject: Re: rcu_preempt caused oom
> 
> On Mon, Dec 10, 2018 at 04:38:38PM -0800, Paul E. McKenney wrote:
> > On Mon, Dec 10, 2018 at 06:56:18AM +, He, Bo wrote:
> > > Hi, 
> > >We have start the test with the CONFIG_PROVE_RCU=y, and also add one 
> > > 2s to detect the preempt rcu hang, hope we can get more useful logs 
> > > tomorrow.
> > >I also enclosed the config and the debug patches for you review.
> > 
> > I instead suggest the (lightly tested) debug patch shown below, which 
> > tracks wakeups of RCU's grace-period kthreads and dumps them out if a 
> > given requested grace period fails to start.  Again, it is necessary 
> > to build with CONFIG_PROVE_RCU=y, that is, with CONFIG_PROVE_LOCKING=y.
> 
> Right.  This time without commenting out the wakeup as a test of the 
> diagnostic.  :-/
> 
> Please use the patch below instead of the one that I sent in my previous 
> email.
> 
>   Thanx, Paul
> 
> 
> 
> commit adfc7dff659495a3433d5084256be59eee0ac6df
> Author: Paul E. McKenney 
> Date:   Mon Dec 10 16:33:59 2018 -0800
> 
> rcu: Improve diagnostics for failed RCU grace-period start
> 
> Backported from v4.21/v5.0
> 
> If a grace period fails to start (for example, because you commented
> out the last two lines of rcu_accelerate_cbs_unlocked()), rcu_core()
> will invoke rcu_check_gp_start_stall(), which will notice and complain.
> However, this complaint is lacking crucial debugging information such
> as when the last wakeup executed and what the value of ->gp_seq was at
> that time.  This commit therefore removes the current pr_alert() from
> rcu_check_gp_start_stall(), instead invoking show_rcu_gp_kthreads(),
> which has been updated to print the needed information, which is collected
> by rcu_gp_kthread_wake().
> 
> Signed-off-by: Paul E. McKenney 
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 0b760c1369f7..4bcd8753e293 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -626,25 +626,57 @@ void rcu_sched_force_quiescent_state(void)
>  }
>  EXPORT_SYMBOL_GPL(rcu_sched_force_quiescent_state);
>  
> +/*
> + * Convert a ->gp_state value to a character string.
> + */
> +static const char *gp_state_getname(short gs)
> +{
> + if (gs < 0 || gs >= ARRAY_SIZE(gp_state_names))
> + return "???";
> + return gp_state_names[gs];
> +}
> +
> +/*
> + * Return the root node of the specified rcu_state structure.
> + */
> +static struct rcu_node *rcu_get_root(struct rcu_state *rsp)
> +{
> + return &rsp->node[0];
> +}
> +
>  /*
>   * Show the state of the grace-period kthreads.
>   */
>  void show_rcu_gp_kthreads(void)
>  {
>   int cpu;
> + unsigned long j;
> + unsigned long ja;
> + unsigned long jr;
> + unsigned long jw;
>   struct rcu_data *rdp;
>   struct rcu_node *rnp;
>   struct rcu_state *rsp;
>  
> + j = jiffies;
>   for_each_rcu_flavor(rsp) {
> - pr_info("%s: wait state: %d ->state: %#lx\n",
> - rsp->name, rsp->gp_state, rsp->gp_kthread->state);
> + ja = j - READ_ONCE(rsp->gp_activity);
> + jr = j - READ_ONCE(rsp->gp_req_activity);
> + jw = j - READ_ONCE(rsp->gp_wake_time);
> + pr_info("%s: wait state: %s(%d) ->state: %#lx delta ->gp_activity %lu ->gp_req_activity %lu ->gp_wake_time %lu ->gp_wake_seq %ld ->gp_seq %ld ->gp_seq_needed %ld ->gp_flags %#x\n",
> + rsp->name, gp_state_getname(rsp->gp_state),
> + rsp->gp_state,
> + rsp->gp_kthread ? rsp->gp_kthread->state : 0x1L,
> + ja, jr, jw, (long)READ_ONCE(rsp->gp_wake_seq),
> + (long)READ_ONCE(rsp->gp_seq),
> + (long)READ_ONCE(rcu_get_root(rsp)->gp_seq_needed),
> + READ_ONCE(rsp->gp_flags));
>   rcu_for_each_node_breadth_first(rsp, rnp) {
>   if (ULONG_CMP_GE(rsp->gp_seq, rnp->gp_seq_needed))
>   continue;
>

Re: [RFC PATCH 4/4] x86/TSC: Use RDTSCP

2018-12-11 Thread Andy Lutomirski

> On Dec 11, 2018, at 3:39 PM, Borislav Petkov  wrote:
>
>> On Tue, Dec 11, 2018 at 11:12:41PM +, Lendacky, Thomas wrote:
>> It does seem overloaded in that sense, but the feature means that LFENCE
>> is serializing and so can be used in rdtsc_ordered. In the same sense,
>> barrier_nospec is looking for whether LFENCE is serializing and preferring
>> that over MFENCE since it is lighter weight.
>>
>> In light of how they're being used now, they could probably stand to be
>> renamed in some way.
>
> Actually, come to think of it, what really matters here is whether
> LFENCE is serializing or not. Because if so, you wanna replace with LFENCE
> as it is lighter. And in that case a single alternative() - not _2() -
> should suffice.
>
> BUT(!), that still is not good enough if you do some qemu CPU models
> like pentium or so which don't even have MFENCE and cause stuff like
> this:
>
> https://lkml.kernel.org/r/20181123200307.ga6...@roeck-us.net
>
> Which means, that you *do* have to alternate between
>
> * no insn at all
> * MFENCE
> * LFENCE, if it is serializing
>
> so barrier_nospec() does the right thing, AFAICS. And this is why we
> need an ALTERNATIVE_3() to add RDTSCP into the mix too.
>
> WRT renaming, I guess we can do something like:
>
> * X86_FEATURE_MFENCE_RDTSC -> X86_FEATURE_MFENCE - to mean that CPU has
> MFENCE support.
>
> and
>
> * X86_FEATURE_LFENCE_RDTSC -> X86_FEATURE_LFENCE_SERIALIZING
>
> Or something to that effect.

This makes me nervous, since no one knows what “serializing” means.
IIRC AMD specifically documents that MFENCE is required before RDTSC
to get sensible ordering.  So it’s entirely plausible to me that
LFENCE is okay for Spectre mitigation but MFENCE is needed for RDTSC
on some CPU.
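
For reference, here is a minimal userspace sketch of the three-plus-one way choice Borislav describes in the quoted text (no insn, MFENCE, serializing LFENCE, RDTSCP). The enum, struct and function names are hypothetical, and the priority order is an assumption rather than something settled in this thread. In particular, it does not answer whether LFENCE is sufficient for RDTSC ordering on every CPU.

```c
#include <stdbool.h>

/* Hypothetical names: a userspace model, not the kernel's alternatives code. */
enum tsc_barrier {
	TSC_BARRIER_NONE,	/* CPU has no usable fence (e.g. an old Pentium model) */
	TSC_BARRIER_MFENCE,	/* MFENCE before RDTSC */
	TSC_BARRIER_LFENCE,	/* LFENCE before RDTSC, only if LFENCE is serializing */
	TSC_BARRIER_RDTSCP,	/* RDTSCP instead of fence + RDTSC */
};

struct cpu_caps {
	bool has_mfence;
	bool lfence_serializing;
	bool has_rdtscp;
};

/* Assumed priority: RDTSCP, then serializing LFENCE, then MFENCE, else nothing. */
enum tsc_barrier pick_tsc_barrier(const struct cpu_caps *c)
{
	if (c->has_rdtscp)
		return TSC_BARRIER_RDTSCP;
	if (c->lfence_serializing)
		return TSC_BARRIER_LFENCE;
	if (c->has_mfence)
		return TSC_BARRIER_MFENCE;
	return TSC_BARRIER_NONE;
}
```

If it turns out that some CPU needs MFENCE for RDTSC even though LFENCE is fine for Spectre mitigation, the second branch is the one that would have to change.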

Re: Can we drop upstream Linux x32 support?

2018-12-11 Thread Andy Lutomirski

> On Dec 11, 2018, at 3:35 PM, Thorsten Glaser  wrote:
>
> Andy Lutomirski dixit:
>
>> What happens if someone adds a struct like:
>>
>> struct nasty_on_x32 {
>> __kernel_long_t a;
>> void * __user b;
>> };
>>
>> On x86_64, that's two 8-byte fields.  On x86_32, it's two four-byte
>> fields.  On x32, it's an 8-byte field and a 4-byte field.  Now what?
>
> Yes, that’s indeed ugly. I understand. But don’t we already have
> this problem with architectures which support multiple ABIs at the
> same time? An amd64 kernel with i386 userspace comes to mind, or
> the multiple MIPS ABIs.

That’s the thing, though: the whole generic kernel compat
infrastructure assumes there are at most two ABIs: native and, if
enabled and relevant, compat. x32 breaks this entirely.
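
To make the three layouts concrete, here is a small sketch using fixed-width stand-ins for the struct quoted above. The struct and function names are hypothetical; the field widths are the ones stated in the quoted text (8+8 on x86_64, 4+4 on x86_32, 8+4 on x32).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Fixed-width models of how userspace sees
 *   struct nasty_on_x32 { __kernel_long_t a; void * __user b; };
 * under each ABI. These names are illustrative only.
 */
struct nasty_x86_64 { uint64_t a; uint64_t b; };	/* 8-byte long, 8-byte pointer */
struct nasty_i386   { uint32_t a; uint32_t b; };	/* 4-byte long, 4-byte pointer */
struct nasty_x32    { uint64_t a; uint32_t b; };	/* 8-byte long, 4-byte pointer */

/*
 * x32 matches neither of the other two, so it needs its own conversion:
 * 'a' is copied as-is, 'b' is zero-extended to a native pointer width.
 */
struct nasty_x86_64 nasty_from_x32(const struct nasty_x32 *in)
{
	struct nasty_x86_64 out;

	out.a = in->a;
	out.b = in->b;
	return out;
}
```

A native/compat pair covers the first two structs; the third is the case the generic infrastructure has no slot for.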

>
>> I'm sure we could have some magic gcc plugin or other nifty tool that
>> gives us:
>>
>> copy_from_user(struct struct_name, kernel_ptr, user_ptr);
>
> Something like that might be useful. Generate call stubs, which
> then call the syscall implementation with the actual user-space
> struct contents as arguments. Hm, that might be too generic to
> be useful. Generate macros that can read from or write specific
> structures to userspace?
>
> I think something like this could solve other more general problems
> as well, so it might be “nice to have anyway”. Of course it’s work,
> and I’m not involved enough in Linux kernel programming to be able
> to usefully help with it (doing too much elsewhere already).
>
>> actually do this work.  Instead we get ad hoc fixes for each syscall,
>> along the lines of preadv64v2(), which get done when somebody notices
>
> Yes, that’s absolutely ugly and ridiculous and all kinds of bad.
>
> On the other hand, from my current experience, someone (Arnd?) noticed
> all the currently existing baddies for x32 already and fixed them.
>
> New syscalls are indeed an issue, but perhaps something generating
> copyinout stubs could help. This might allow other architectures
> that could do with a new ABI but have until now feared the overhead
> as well. (IIRC, m68k could do with a new ABI that reserves a register
> for TLS, but Geert would know. At the same time, time_t and off_t could
> be bumped to 64 bit. Something like that. If changing sizes of types
> shared between kernel and user spaces is not something feared…)

Magic autogenerated stubs would be great.  Difficult, too, given
unions, multiplexers, cmsg, etc.

I suppose I will see how bad it would be to split out the x32 syscall
table and at least isolate the mess to some extent.

IMO the real right solution would be to push the whole problem to
userspace: get an ILP32 system working with almost or entirely LP64
syscalls.  POSIX support might have to be a bit flexible, but still.
How hard would it be to have __attribute__((ilp64)), with an optional
warning if any embedded structs are not ilp64?  This plus a wrapper to
make sure that mmap puts everything below 4GB ought to do the trick.
Or something like what arm64 is proposing, where the kernel ABI has
32-bit long, doesn’t seem too horrible.

Re: [PATCH] printk: Add caller information to printk() output.

2018-12-11 Thread Sergey Senozhatsky

On (12/11/18 19:26), Tetsuo Handa wrote:
> @@ -688,12 +701,21 @@ static ssize_t msg_print_ext_header(char *buf, size_t size,
>   struct printk_log *msg, u64 seq)
>  {
>   u64 ts_usec = msg->ts_nsec;
> + char from[18];

[..]

> +#ifdef CONFIG_PRINTK_FROM
> +static size_t print_from(u32 id, char *buf)
> +{
> + char from[12];

Are those supposed to be of different sizes: 18 and 12?

-ss

RFC [PATCH 1/1] locking/lockdep: Fix nest lock warning on unlock

2018-12-11 Thread Derek Basehore

The function __lock_acquire checks that the nest lock passed in as an
argument is held. The issue with this is that __lock_acquire is used
for internal bookkeeping on lock_release. This produces a false
positive lockdep warning on unlock. Since you explicitly don't need to
hold the nest lock on unlock, this is an issue.

This fixes the problem by only checking the nest lock on the actual
lock acquire step.

Signed-off-by: Derek Basehore 
---
 kernel/locking/lockdep.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 1efada2dd9dd..2e7297ee6596 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -3155,15 +3155,15 @@ EXPORT_SYMBOL_GPL(lockdep_init_map);
 struct lock_class_key __lockdep_no_validate__;
 EXPORT_SYMBOL_GPL(__lockdep_no_validate__);
 
-static int
-print_lock_nested_lock_not_held(struct task_struct *curr,
-   struct held_lock *hlock,
+static void
+print_lock_nested_lock_not_held(struct lockdep_map *lock,
+   struct lockdep_map *nest_lock,
unsigned long ip)
 {
if (!debug_locks_off())
-   return 0;
+   return;
if (debug_locks_silent)
-   return 0;
+   return;
 
pr_warn("\n");
pr_warn("==\n");
@@ -3171,22 +3171,21 @@ print_lock_nested_lock_not_held(struct task_struct *curr,
print_kernel_ident();
pr_warn("--\n");
 
-   pr_warn("%s/%d is trying to lock:\n", curr->comm, task_pid_nr(curr));
-   print_lock(hlock);
+   pr_warn("%s/%d is trying to lock:\n", current->comm,
+   task_pid_nr(current));
+   pr_warn("%s\n", lock->name);
 
pr_warn("\nbut this task is not holding:\n");
-   pr_warn("%s\n", hlock->nest_lock->name);
+   pr_warn("%s\n", nest_lock->name);
 
pr_warn("\nstack backtrace:\n");
dump_stack();
 
pr_warn("\nother info that might help us debug this:\n");
-   lockdep_print_held_locks(curr);
+   lockdep_print_held_locks(current);
 
pr_warn("\nstack backtrace:\n");
dump_stack();
-
-   return 0;
 }
 
 static int __lock_is_held(const struct lockdep_map *lock, int read);
@@ -3335,9 +3334,6 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
}
chain_key = iterate_chain_key(chain_key, class_idx);
 
-   if (nest_lock && !__lock_is_held(nest_lock, -1))
-   return print_lock_nested_lock_not_held(curr, hlock, ip);
-
if (!validate_chain(curr, lock, hlock, chain_head, chain_key))
return 0;
 
@@ -3843,6 +3839,9 @@ void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
trace_lock_acquire(lock, subclass, trylock, read, check, nest_lock, ip);
__lock_acquire(lock, subclass, trylock, read, check,
   irqs_disabled_flags(flags), nest_lock, ip, 0, 0);
+   if (nest_lock && !__lock_is_held(nest_lock, -1))
+   print_lock_nested_lock_not_held(lock, nest_lock, ip);
+
current->lockdep_recursion = 0;
raw_local_irq_restore(flags);
 }
-- 
2.20.0.rc2.403.gdbc3b29805-goog

RFC [PATCH 0/1] Fix lockdep false positive

2018-12-11 Thread Derek Basehore

I'm not sure if I'm breaking any detection with this patch since I
haven't looked at that lockdep code before. I do know that the unlock
order for locks with a nest lock should not matter, though.
Specifically, you should be able to unlock the nest lock followed by
all the locks nested underneath it.
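
To illustrate the intended behaviour, here is a toy userspace model (not lockdep itself; every name in it is hypothetical). The nest lock is checked only on a real acquire, while the bookkeeping path that release reuses never looks at it, so unlocking the nest lock first produces no warning.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_HELD 8

/* Toy stand-in for the per-task held-lock stack. */
struct toy_task {
	const void *held[TOY_MAX_HELD];
	size_t depth;
	unsigned int nest_warnings;	/* counts "nest lock not held" reports */
};

bool toy_is_held(const struct toy_task *t, const void *lock)
{
	for (size_t i = 0; i < t->depth; i++)
		if (t->held[i] == lock)
			return true;
	return false;
}

/* Shared bookkeeping; deliberately knows nothing about nest locks. */
bool toy_push(struct toy_task *t, const void *lock)
{
	if (t->depth == TOY_MAX_HELD)
		return false;
	t->held[t->depth++] = lock;
	return true;
}

/* Real acquire: the only place where the nest lock is checked. */
bool toy_acquire(struct toy_task *t, const void *lock, const void *nest_lock)
{
	if (!toy_push(t, lock))
		return false;
	if (nest_lock && !toy_is_held(t, nest_lock))
		t->nest_warnings++;
	return true;
}

/* Release: drop 'lock', then re-add the locks above it via bookkeeping only. */
bool toy_release(struct toy_task *t, const void *lock)
{
	const void *above[TOY_MAX_HELD];
	size_t n = 0, i;

	for (i = 0; i < t->depth; i++)
		if (t->held[i] == lock)
			break;
	if (i == t->depth)
		return false;	/* not held */

	for (size_t j = i + 1; j < t->depth; j++)
		above[n++] = t->held[j];
	t->depth = i;
	for (size_t j = 0; j < n; j++)
		toy_push(t, above[j]);
	return true;
}
```

If toy_release() re-added the remaining locks through toy_acquire() with their nest lock, releasing the nest lock first would bump nest_warnings, which is the false positive described above.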

Derek Basehore (1):
  locking/lockdep: Fix nest lock warning on unlock

 kernel/locking/lockdep.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

-- 
2.20.0.rc2.403.gdbc3b29805-goog

Re: [PATCH v9 3/4] dt-bindings: pps: pps-gpio PPS ECHO implementation

2018-12-11 Thread tom burkart


Quoting Rob Herring :


On Mon, Dec 10, 2018 at 8:17 PM tom burkart  wrote:


Quoting Rob Herring :

> On Mon, Nov 26, 2018 at 10:06 PM tom burkart  wrote:
>>
>> Quoting Rob Herring :
>>
>> > On Thu, Nov 22, 2018 at 3:49 AM Tom Burkart  wrote:
>> >>
>> >> This patch implements the device tree changes required for the pps
>> >> echo functionality for pps-gpio, that sysfs claims is available
>> >> already.
>> >>
>> >> This patch was originally written by Lukas Senger as part of a masters
>> >> thesis project and modified for inclusion into the linux kernel by Tom
>> >> Burkart.
>> >>
>> >> Signed-off-by: Lukas Senger 
>> >> Signed-off-by: Tom Burkart 
>> >> ---
>> >>  Documentation/devicetree/bindings/pps/pps-gpio.txt | 9 +
>> >>  1 file changed, 9 insertions(+)
>> >>
>> >> diff --git a/Documentation/devicetree/bindings/pps/pps-gpio.txt
>> >> b/Documentation/devicetree/bindings/pps/pps-gpio.txt
>> >> index 1155d49c2699..e09f6f2405c5 100644
>> >> --- a/Documentation/devicetree/bindings/pps/pps-gpio.txt
>> >> +++ b/Documentation/devicetree/bindings/pps/pps-gpio.txt
>> >> @@ -7,10 +7,15 @@ Required properties:
>> >>  - compatible: should be "pps-gpio"
>> >>  - gpios: one PPS GPIO in the format described by ../gpio/gpio.txt
>> >>
>> >> +Additional required properties for the PPS ECHO functionality:
>> >> +- echo-gpios: one PPS ECHO GPIO in the format described by ../gpio/gpio.txt
>> >> +- echo-active-ms: duration in ms of the active portion of the echo pulse
>> >> +
>> >>  Optional properties:
>> >>  - assert-falling-edge: when present, assert is indicated by a falling edge
>> >> (instead of by a rising edge)
>> >>  - capture-clear: when present, also capture the PPS clear event
>> >> +- invert-pps-echo: when present, invert the PPS ECHO pulse
>> >
>> > Why do you need this? Can't you just make the echo gpio GPIO_ACTIVE_LOW?
>> >
>> > BTW, using the flag probably should have been done for
>> > 'assert-falling-edge' as well.
>>
>> The hardware I use expects a positive-going echo pulse, however, it
>> was really easy to give users the option to have it inverted in case
>> they use different hardware that expects a negative-going edge.
>
> It will be even easier to implement if you use GPIO_ACTIVE_LOW or
> GPIO_ACTIVE_HIGH as appropriate. If the flag is set appropriately,
> then gpiod_set_value(gpio, 1) asserts the pulse and
> gpiod_set_value(gpio, 0) deasserts it no matter which way the h/w is
> wired. You can then get rid of invert_pps_echo in the driver.

Hi Rob,
I have looked at the appropriate changes to my code to implement the
above.  However, there is no GPIO_ACTIVE_* as part of the gpiod_flags
enum (include/linux/gpio/consumer.h).

What am I missing?


You don't need to know in the driver. You set the gpio state to 1 for
active, 0 for inactive and that does the right thing based on the flag
in the DT. IOW, if the DT defines the GPIO as active low, then setting
the gpio state to 1 will result in the GPIO being 0V. It's a bit
annoying and confusing at first until you realize the driver can just
handle either polarity transparently.
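
A rough userspace model of the idea, in case it helps. The toy_* names are hypothetical stand-ins and not the gpiod API; the active_low field plays the role of the GPIO_ACTIVE_LOW flag from the DT.

```c
#include <stdbool.h>

/* Toy GPIO line: 'active_low' models the polarity flag from the DT. */
struct toy_gpio {
	bool active_low;
	int physical;	/* 0 = line at 0V, 1 = line driven high */
};

/* The caller passes the logical state; polarity is applied here, not in the driver. */
void toy_gpiod_set_value(struct toy_gpio *g, int logical)
{
	bool asserted = logical != 0;

	g->physical = (asserted != g->active_low) ? 1 : 0;
}

/* Driver-side echo pulse: identical code for either wiring. */
void toy_echo_pulse(struct toy_gpio *g, int trace[2])
{
	toy_gpiod_set_value(g, 1);	/* assert */
	trace[0] = g->physical;
	toy_gpiod_set_value(g, 0);	/* deassert */
	trace[1] = g->physical;
}
```

With active_low clear the line goes high then low; with it set the line goes low then high, and toy_echo_pulse() never needs an invert option.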


Hi Rob,
thanks a lot!  The penny finally dropped.

My patch v12 will have the changes implemented (no invert-pps-echo).

Tom

Re: [PATCH] printk: Add caller information to printk() output.

2018-12-11 Thread Sergey Senozhatsky

On (12/12/18 11:25), Sergey Senozhatsky wrote:
> On (12/11/18 19:26), Tetsuo Handa wrote:
> > @@ -688,12 +701,21 @@ static ssize_t msg_print_ext_header(char *buf, size_t size,
> > struct printk_log *msg, u64 seq)
> >  {
> > u64 ts_usec = msg->ts_nsec;
> > +   char from[18];
> 
> [..]
> 
> > +#ifdef CONFIG_PRINTK_FROM
> > +static size_t print_from(u32 id, char *buf)
> > +{
> > +   char from[12];
> 
> Are those supposed to be of different sizes: 18 and 12?

Yeah, they are. strlen(",from="). Sorry for the noise.
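
Spelled out as a small sketch (the macro and function names are hypothetical; only the two sizes and the ",from=" prefix come from the patch):

```c
#include <stddef.h>

#define FROM_ID_LEN	12		/* char from[12] in print_from() */
#define FROM_PREFIX	",from="	/* prefix added in msg_print_ext_header() */

/* Room needed for the prefix plus the caller id buffer. */
size_t ext_header_from_len(void)
{
	/* sizeof() counts the trailing NUL, so subtract it to get strlen(). */
	return (sizeof(FROM_PREFIX) - 1) + FROM_ID_LEN;
}
```

strlen(",from=") is 6, so the larger buffer is 6 + 12 = 18.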

The patch looks good to me.

-ss

Re: [RFT PATCH v1 2/4] dt-binding: cpu-topology: Move cpu-map to a common binding.

2018-12-11 Thread Rob Herring

On Thu, Nov 29, 2018 at 03:28:18PM -0800, Atish Patra wrote:
> cpu-map binding can be used to described cpu topology for both
> RISC-V & ARM. It makes more sense to move the binding to document
> to a common place.
> 
> The relevant discussion can be found here.
> https://lkml.org/lkml/2018/11/6/19
> 
> Signed-off-by: Atish Patra 
> ---
>  .../{arm/topology.txt => cpu/cpu-topology.txt} | 81 
> ++
>  1 file changed, 67 insertions(+), 14 deletions(-)
>  rename Documentation/devicetree/bindings/{arm/topology.txt => 
> cpu/cpu-topology.txt} (86%)
> 
> diff --git a/Documentation/devicetree/bindings/arm/topology.txt 
> b/Documentation/devicetree/bindings/cpu/cpu-topology.txt
> similarity index 86%
> rename from Documentation/devicetree/bindings/arm/topology.txt
> rename to Documentation/devicetree/bindings/cpu/cpu-topology.txt
> index 66848355..1de6fbce 100644
> --- a/Documentation/devicetree/bindings/arm/topology.txt
> +++ b/Documentation/devicetree/bindings/cpu/cpu-topology.txt
> @@ -1,12 +1,12 @@
>  ===
> -ARM topology binding description
> +CPU topology binding description
>  ===
>  
>  ===
>  1 - Introduction
>  ===
>  
> -In an ARM system, the hierarchy of CPUs is defined through three entities 
> that
> +In a SMP system, the hierarchy of CPUs is defined through three entities that
>  are used to describe the layout of physical CPUs in the system:
>  
>  - socket
> @@ -14,9 +14,6 @@ are used to describe the layout of physical CPUs in the 
> system:
>  - core
>  - thread
>  
> -The cpu nodes (bindings defined in [1]) represent the devices that
> -correspond to physical CPUs and are to be mapped to the hierarchy levels.
> -
>  The bottom hierarchy level sits at core or thread level depending on whether
>  symmetric multi-threading (SMT) is supported or not.
>  
> @@ -25,33 +22,37 @@ threads existing in the system and map to the hierarchy 
> level "thread" above.
>  In systems where SMT is not supported "cpu" nodes represent all cores present
>  in the system and map to the hierarchy level "core" above.
>  
> -ARM topology bindings allow one to associate cpu nodes with hierarchical 
> groups
> +CPU topology bindings allow one to associate cpu nodes with hierarchical 
> groups
>  corresponding to the system hierarchy; syntactically they are defined as 
> device
>  tree nodes.
>  
> -The remainder of this document provides the topology bindings for ARM, based
> -on the Devicetree Specification, available from:
> +Currently, only ARM/RISC-V intend to use this cpu topology binding but it 
> may be
> +used for any other architecture as well.
>  
> -https://www.devicetree.org/specifications/
> +The remainder of this document provides the topology bindings for 
> ARM/RISC-V, based

You already said who the current users are, so why restrict it to ARM
and RISC-V here?

> +on the Devicetree Specification, available at [4].
> +
> +The cpu nodes (bindings defined in [1] for ARM or [2] for RISC-V) represent 
> the devices that
> +correspond to physical CPUs and are to be mapped to the hierarchy levels.

The cpu topology isn't dependent on anything beyond what the DT spec 
says for cpu nodes so I think this can be simplified to just refer to 
the spec.

Plus, shouldn't [2] (numa) be [3] here?

>  If not stated otherwise, whenever a reference to a cpu node phandle is made 
> its
>  value must point to a cpu node compliant with the cpu node bindings as
> -documented in [1].
> +documented in [1] or [3] for respective ISA.
>  A topology description containing phandles to cpu nodes that are not 
> compliant
> -with bindings standardized in [1] is therefore considered invalid.
> +with bindings standardized in [1] or [3] is therefore considered invalid.
>  
>  ===
>  2 - cpu-map node
>  ===
>  
> -The ARM CPU topology is defined within the cpu-map node, which is a direct
> +The ARM/RISC-V CPU topology is defined within the cpu-map node, which is a 
> direct
>  child of the cpus node and provides a container where the actual topology
>  nodes are listed.
>  
>  - cpu-map node
>  
> - Usage: Optional - On ARM SMP systems provide CPUs topology to the OS.
> -   ARM uniprocessor systems do not require a topology
> + Usage: Optional - On SMP systems provide CPUs topology to the OS.
> +   Uniprocessor systems do not require a topology
> description and therefore should not define a
> cpu-map node.
>  
> @@ -494,8 +495,60 @@ cpus {
>   };
>  };
>  
> +Example 3: HiFive Unleashed (RISC-V 64 bit, 4 core system)
> +
> +cpus {
> + #address-cells = <2>;
> + #size-cells = <2>;
> + compatible = "sifive,fu540g", "sifive,fu500";
> + model = "sifive,hifive-unleashed-a00";

This i

Re: [PATCH 1/2] dt-binding: remoteproc: Remove lpass_aon clock from adsp pil clock list

2018-12-11 Thread Rob Herring

On Fri, 30 Nov 2018 12:59:09 +0530, Rohit kumar wrote:
> LPASS_Audio_Wrapper_AON clock is on by default. Remove
> it from lpass clock list to avoid voting for it.
> 
> Signed-off-by: Rohit kumar 
> ---
>  Documentation/devicetree/bindings/remoteproc/qcom,adsp-pil.txt | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 

Reviewed-by: Rob Herring

Re: [PATCH 1/6] media: dt-bindings: media: sun6i: Separate H3 compatible from A31

2018-12-11 Thread Rob Herring

On Fri, 30 Nov 2018 15:58:44 +0800, Chen-Yu Tsai wrote:
> The CSI controller found on the H3 (and H5) is a reduced version of the
> one found on the A31. It only has 1 channel, instead of 4 channels for
> time-multiplexed BT.656. Since the H3 is a reduced version, it cannot
> "fallback" to a compatible that implements more features than it
> supports.
> 
> Split out the H3 compatible as a separate entry, with no fallback.
> 
> Fixes: b7eadaa3a02a ("media: dt-bindings: media: sun6i: Add A31 and H3
> compatibles")
> Signed-off-by: Chen-Yu Tsai 
> ---
>  Documentation/devicetree/bindings/media/sun6i-csi.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 

Reviewed-by: Rob Herring

[tip:x86/urgent] x86/mm: Fix decoy address handling vs 32-bit builds

2018-12-11 Thread tip-bot for Dan Williams

Commit-ID:  51c3fbd89d7554caa3290837604309f8d8669d99
Gitweb: https://git.kernel.org/tip/51c3fbd89d7554caa3290837604309f8d8669d99
Author: Dan Williams 
AuthorDate: Tue, 11 Dec 2018 07:49:39 -0800
Committer:  Thomas Gleixner 
CommitDate: Tue, 11 Dec 2018 18:28:20 -0800

x86/mm: Fix decoy address handling vs 32-bit builds

A decoy address is used by set_mce_nospec() to update the cache attributes
for a page that may contain poison (multi-bit ECC error) while attempting
to minimize the possibility of triggering a speculative access to that
page.

When reserve_memtype() is handling a decoy address it needs to convert it
to its real physical alias. The conversion, AND'ing with __PHYSICAL_MASK,
is broken for a 32-bit physical mask and reserve_memtype() is passed the
last physical page. Gert reports triggering the:

BUG_ON(start >= end);

...assertion when running a 32-bit non-PAE build on a platform that has
a driver resource at the top of physical memory:

BIOS-e820: [mem 0xfff0-0x] reserved

Given that the decoy address scheme is only targeted at 64-bit builds and
assumes that the top of physical address space is free for use as a decoy
address range, simply bypass address sanitization in the 32-bit case.

Lastly, there was no need to crash the system when this failure occurred,
and no need to crash future systems if the assumptions of decoy addresses
are ever violated. Change the BUG_ON() to a WARN() with an error return.

Fixes: 510ee090abc3 ("x86/mm/pat: Prepare {reserve, free}_memtype() for...")
Reported-by: Gert Robben 
Signed-off-by: Dan Williams 
Signed-off-by: Thomas Gleixner 
Tested-by: Gert Robben 
Cc: sta...@vger.kernel.org
Cc: Andy Shevchenko 
Cc: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: platform-driver-...@vger.kernel.org
Cc: 
Link: https://lkml.kernel.org/r/154454337985.789277.1213328839166465.st...@dwillia2-desk3.amr.corp.intel.com

---
 arch/x86/mm/pat.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 08013524fba1..4fe956a63b25 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -519,8 +519,13 @@ static u64 sanitize_phys(u64 address)
 * for a "decoy" virtual address (bit 63 clear) passed to
 * set_memory_X(). __pa() on a "decoy" address results in a
 * physical address with bit 63 set.
+*
+* Decoy addresses are not present for 32-bit builds, see
+* set_mce_nospec().
 */
-   return address & __PHYSICAL_MASK;
+   if (IS_ENABLED(CONFIG_X86_64))
+   return address & __PHYSICAL_MASK;
+   return address;
 }
 
 /*
@@ -546,7 +551,11 @@ int reserve_memtype(u64 start, u64 end, enum page_cache_mode req_type,
 
start = sanitize_phys(start);
end = sanitize_phys(end);
-   BUG_ON(start >= end); /* end is exclusive */
+   if (start >= end) {
+   WARN(1, "%s failed: [mem %#010Lx-%#010Lx], req %s\n", __func__,
+   start, end - 1, cattr_name(req_type));
+   return -EINVAL;
+   }
 
if (!pat_enabled()) {
/* This is identical to page table setting without PAT */
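
A userspace sketch of why the unconditional mask can trip the start >= end check on 32-bit, and what the change above does instead. The toy_* names are hypothetical, the 0xffffffff mask for a 32-bit non-PAE build and the 52-bit mask for 64-bit are assumptions, and the addresses in the usage note are illustrative rather than taken from the report.

```c
#include <errno.h>
#include <stdint.h>

#define TOY_PHYS_MASK_32	0xffffffffULL		/* assumed 32-bit non-PAE mask */
#define TOY_PHYS_MASK_64	((1ULL << 52) - 1)	/* assumed 64-bit mask */

/* Old behaviour: always mask, even where decoy addresses do not exist. */
uint64_t toy_sanitize_phys_old(uint64_t address, uint64_t phys_mask)
{
	return address & phys_mask;
}

/* New behaviour: only strip the decoy bit on 64-bit builds. */
uint64_t toy_sanitize_phys(uint64_t address, int is_64bit, uint64_t phys_mask)
{
	if (is_64bit)
		return address & phys_mask;
	return address;
}

/* Range check with an error return instead of a crash. 'end' is exclusive. */
int toy_reserve_check(uint64_t start, uint64_t end, int is_64bit,
		      uint64_t phys_mask)
{
	start = toy_sanitize_phys(start, is_64bit, phys_mask);
	end = toy_sanitize_phys(end, is_64bit, phys_mask);
	if (start >= end)
		return -EINVAL;
	return 0;
}
```

For a range ending at the top of a 32-bit physical address space, the exclusive end is 0x100000000; masking it with 0xffffffff yields 0, which is below any start. Skipping the mask on 32-bit keeps the end intact, and a genuinely bad range now gets -EINVAL.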

Re: Can we drop upstream Linux x32 support?

2018-12-11 Thread Thorsten Glaser

Andy Lutomirski dixit:

>That’s the thing, though: the whole generic kernel compat
>infrastructure assumes there are at most two ABIs: native and, if
>enabled and relevant, compat. x32 breaks this entirely.

MIPS had o32, n32, n64 since like forever.

ARM has old ABI, EABI and now 64-bit.

Other architectures are yet to come.

>IMO the real right solution would be to push the whole problem to
>userspace: get an ILP32 system working with almost or entirely LP64

Is this a reflex of Linux kernel developers? ;-)

I doubt that userspace is the right place for this, remember
the recent glibc vs. syscalls debate. It would also need to
multiply across various libcs.

>How hard would it be to have __attribute__((ilp64)), with an optional
>warning if any embedded structs are not ilp64?  This plus a wrapper to

You mean LP64. Impossible, because LP64 vs. ILP32 is not the only
difference between amd64 and x32.

bye,
//mirabilos
-- 
I believe no one can invent an algorithm. One just happens to hit upon it
when God enlightens him. Or only God invents algorithms, we merely copy them.
If you don't believe in God, just consider God as Nature if you won't deny
existence.  -- Coywolf Qi Hunt

Re: [PATCH] dt-bindings: arm: mrvl: amend Browstone compatible string

2018-12-11 Thread Rob Herring

On Sun, Dec 02, 2018 at 12:40:08PM +0100, Lubomir Rintel wrote:
> The Brownstone board is compatible with "mrvl,mmp2". The actual DTS
> already contains the string -- add it to the binding doc as well.
> 
> Signed-off-by: Lubomir Rintel 
> ---
>  Documentation/devicetree/bindings/arm/mrvl/mrvl.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied.

Rob

RE: [PATCH v3] thermal: qoriq: add multiple sensors support

2018-12-11 Thread Andy Tang



> -Original Message-
> From: Eduardo Valentin 
> Sent: 2018年11月30日 1:21
> To: Daniel Lezcano 
> Cc: Andy Tang ; rui.zh...@intel.com;
> linux...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v3] thermal: qoriq: add multiple sensors support
> 
> On Wed, Nov 21, 2018 at 10:41:36AM +0100, Daniel Lezcano wrote:
> > On 21/11/2018 10:16, Andy Tang wrote:
> > > Hi Daniel,
> > >
> > > > Thanks for your explanation. The problem is these two trees are not synced
> > > > well.
> > >
> > > Git log on Rui's tree next branch:
> > > 2dfef65 thermal: qoriq: Switch to SPDX identifier
> > > 1a893a5 thermal: qoriq: Simplify the 'site' variable assignment
> > > f1506a6 thermal: qoriq: Use devm_thermal_zone_of_sensor_register()
> > > c30d5d5 thermal: qoriq: constify thermal_zone_of_device_ops
> > > structures
> > > 0e77488 thermal: qoriq: remove useless call for
> > > of_thermal_get_trip_points()
> > > 4352844 thermal: qoriq: Add thermal management support
> > >
> > > Git log on linux-soc-thermal tree branch next:
> > > 6017e2a thermal: qoriq: add i.mx8mq support
> > > 9b96566 thermal: Convert to using %pOFn instead of device_node.name
> > > c30d5d5 thermal: qoriq: constify thermal_zone_of_device_ops
> > > structures
> > > 0e77488 thermal: qoriq: remove useless call for
> > > of_thermal_get_trip_points()
> > > 4352844 thermal: qoriq: Add thermal management support
> > >
> > > > You can see that the first 2-3 commits on these two trees are different.
> > > >
> > > > The strange thing is they seem to be synced well on Linus' tree:
> > > > 0ef7791 Merge branch 'linus' of
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal
> > > > 6017e2a thermal: qoriq: add i.mx8mq support
> > > 9b96566 thermal: Convert to using %pOFn instead of device_node.name
> > > 2dfef65 thermal: qoriq: Switch to SPDX identifier
> > > 1a893a5 thermal: qoriq: Simplify the 'site' variable assignment
> > > f1506a6 thermal: qoriq: Use devm_thermal_zone_of_sensor_register()
> > > c30d5d5 thermal: qoriq: constify thermal_zone_of_device_ops
> > > structures
> > > 0e77488 thermal: qoriq: remove useless call for
> > > of_thermal_get_trip_points()
> > > 4352844 thermal: qoriq: Add thermal management support
> > >
> > > > Currently my patch was created based on Rui's tree; probably I should
> > > > rebase it to the soc tree.
> > > But whichever tree I use, it can't be merged to Linus' tree without 
> > > conflict.
> > >
> > > Something I missed?
> >
> > No.
> >
> > Eduardo, Rui,
> >
> > why not create a 'thermal' group ala 'tip' group with a single tree
> > and two branches:
> >
> > thermal/next
> > thermal/fixes
> >
> >  - Rui takes the core changes.
> >  - Eduardo takes the SoC changes.
> >
> >  - Both commit to thermal/next
> >  - Both commit to thermal/fixes
> >  - Both merge thermal/fixes into thermal/core as often as possible.
> >
> > That will help to have a more up to date branch, simplify the patch
> > submission path and reduce the latency for the merge windows.
> >
> > If you need help, I can take care of applying the fixes only and merge
> > them to thermal/next.
> >
> > That is how the tip subsystem works, Peter Zijlstra, Ingo Molnar,
> > Thomas Gleixner, have all permissions to commit in the tip tree but
> > they take care of their subsystems. If one is away for vacations or
> > whatever, someone else can take over during the absence.
> >
> 
> Yeah, that is a setup people have been following. It does not necessarily mean
> it will work for all cases though.
> 
> I believe regardless of process and tree setup what we are lacking here is a
> documentation of how things are being done.
> 
> As I mentioned, I will work on writing something up to document at least what
> we have today before any change in process gets in place.
> 
[Andy] Besides the document, what about this patch? Could it be merged before 
the document is ready?

BR,
Andy
> >
> >
> >
> >
> > >> -Original Message-
> > >> From: Daniel Lezcano 
> > >> Sent: 2018年11月21日 16:44
> > >> To: Andy Tang ; rui.zh...@intel.com;
> > >> edubez...@gmail.com
> > >> Cc: linux...@vger.kernel.org; linux-kernel@vger.kernel.org
> > >> Subject: Re: [PATCH v3] thermal: qoriq: add multiple sensors
> > >> support
> > >>
> > >> On 21/11/2018 02:34, Andy Tang wrote:
> > >>> Hi all,
> > >>>
> > >>> Do you have any comments on this patch?
> > >>>
> > > >>> I found for our thermal driver(qoriq_thermal.c) there are
> > > >>> differences between the following two git trees:
> > > >>> git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux.git
> > > >>> branch: next
> > > >>>
> > > >>> git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal.git
> > > >>> branch: next
> > >>>
> > >>> Could you please clarify which git tree/branch should I use?
> > >>
> > >> SoC changes are submitted against linux-soc-thermal.git.
> > >>
> > >> Generic thermal framework are sent against Zhang Rui's tree but it
> > >> happens somet

Re: [RFC PATCH v3 0/4] x86: Add exception fixup for SGX ENCLU

2018-12-11 Thread Dr. Greg

On Tue, Dec 11, 2018 at 03:10:52PM -0800, Andy Lutomirski wrote:

Good evening, I hope the day has gone well for everyone.

> > > > On Dec 11, 2018, at 8:52 AM, Sean Christopherson 
> > > >  wrote:
> > > >
> > > > This isn't fundamentally different than forcing all EENTER
> > > > calls through the vDSO, which is also per-process.
> > > > Technically this is more flexible in that regard since
> > > > userspace gets to choose where their one ENCLU gets to reside.
> > > > Userspace can have per-enclave entry flows so long as the
> > > > actual ENCLU[EENTER] is common, same as vDSO.

> > > Right. The problem is that user libraries have a remarkably hard
> > > time agreeing on where their one copy of anything lives.

> > Are you concerned about userspace shooting themselves in the foot,
> > e.g.  unknowingly overwriting their handler?  Requiring
> > unregister->register to change the handler would mitigate that
> > issue for the most part.  Or we could even say it's a write-once
> > property.
> >
> > That obviously doesn't solve the issue of a userspace application
> > deliberately using two different libraries to run enclaves in a
> > single process, but I have a hard time envisioning a scenario
> > where someone would want to use two different *SGX* libraries in a
> > single process.  Don't most of the signal issues arise due to
> > loading multiple libraries that provide *different* services
> > needing to handle signals?

> I can easily imagine two SGX libraries that know nothing about each
> other running in the same process.  One or both could be PKCS#11
> modules, for example.

Very good, I see that Sean agreed with this down thread.  I was
concerned that our discussion was lacking precision and we were
talking past one another.

> I suspect that Linux will eventually want the ability for libraries
> to register exception handlers, but that's not going to get designed
> and implemented quickly enough for SGX's initial Linux rollout.  A
> vDSO helper like in your earlier series should solve most of the
> problem without any contention issues.

Let me see if I can impart some framework for additional clarity as
discussions proceed forward.

I believe it would be helpful if we could agree to refer to a body of
code, possibly in library form, that loads, initializes and executes
an enclave as an 'SGX runtime'.  In this framework, the term 'library'
refers to code that an application links to for domain specific
functionality, ie. libpkcs11, libkrb5, libsasl.  These 'libraries' may
implement enclaves, using 'SGX runtimes' of their choice, to improve
their security guarantees.

In this model it is the 'SGX runtime' that is responsible for
registering SGX exception handlers under their management.

In order for mainline Linux SGX support to be relevant, it must admit
mutually distrusting 'SGX runtimes' in the same process context.  The
SGX exception handler architecture must also support the notion of
'nested enclave' invocation where an enclave may execute an OCALL and
then go on to execute an enclave, possibly based on a different 'SGX
runtime', before returning into its previous enclave.

Hopefully the above will help assist further discussions.

Have a good evening.

Dr. Greg

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.   Specializing in information infra-structure
Fargo, ND  58102development.
PH: 701-281-1686
FAX: 701-281-3949   EMAIL: g...@enjellic.com
--
"If you think nobody cares if you're alive, try missing a couple of car
 payments."
-- Earl Wilson

Re: [PATCH ghak59 V3 0/4] audit: config_change normalizations and event record gathering

2018-12-11 Thread Richard Guy Briggs

On 2018-12-11 18:26, Paul Moore wrote:
> On Tue, Dec 11, 2018 at 5:41 PM Richard Guy Briggs  wrote:
> > On 2018-12-11 17:31, Paul Moore wrote:
> > > On Mon, Dec 10, 2018 at 5:18 PM Richard Guy Briggs  
> > > wrote:
> 
> ...
> 
> > > > Richard Guy Briggs (4):
> > > >   audit: give a clue what CONFIG_CHANGE op was involved
> > > >   audit: add syscall information to CONFIG_CHANGE records
> > > >   audit: hand taken context to audit_kill_trees for syscall logging
> > > >   audit: extend config_change mark/watch/tree rule changes
> > > >
> > > >  kernel/audit.c  | 33 +++--
> > > >  kernel/audit.h  |  4 ++--
> > > >  kernel/audit_fsnotify.c |  4 ++--
> > > >  kernel/audit_tree.c | 28 +++-
> > > >  kernel/audit_watch.c|  8 +---
> > > >  kernel/auditfilter.c|  2 +-
> > > >  kernel/auditsc.c| 12 ++--
> > > >  7 files changed, 54 insertions(+), 37 deletions(-)
> > >
> > > In order to make sure expectations are set appropriately, as we are at
> > > -rc6 right now this is not something that would go into audit/next now
> > > (assuming everything looks okay on review), it would go into
> > > audit/next *after* the upcoming merge window.
> >
> > I agree it is a bit late for this.  I wasn't expecting it to go in this
> > one.  I'm filling the queue since I'm blocked on other review for
> > ghak81(5.5wks), ghak90(5.5wks), ghak100(3.5wks).  ghak90 missed another
> > merge window.
> 
> As discussed previously, GHAK81
> (https://github.com/linux-audit/audit-kernel/issues/81) is something
> that I consider part of the audit container ID work (GHAK90).  I
> believe it's time to stop treating it as a separate issue.

Fine by me.  It was included in the ghak90 patchset this time and still
is in v5, waiting to get the questions replied to that arose out of the
review of v4 around Hallowe'en.

> The audit container ID work, GHAK90
> (https://github.com/linux-audit/audit-kernel/issues/90), is where all
> the dragons lie.  That one takes a good deal of time to review, and
> quite frankly I'm really the only one who seems to be looking at it
> anymore, so it takes a bit longer.

We're working on finding other reviewers.

> Besides the fact that GHAK100
> (https://github.com/linux-audit/audit-kernel/issues/100) was marked as
> a RFC, I've been waiting to hear back from the VFS folks if they are
> comfortable with it.  Miklos Szeredi in particular had some concerns
> and it isn't clear to me from that thread that his concerns have been
> resolved.

I'm fine with Miklos' concerns and have ideas to address them.  I'd be
quite interested in your quick review to see if it is headed in the
right direction and I'm also hoping for opinions from you and the vfs
guys on Steve's question.

> paul moore

- RGB

--
Richard Guy Briggs 
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

Re: [PATCH] usb: typec: tcpm: Extend the matching rules on PPS APDO selection

2018-12-11 Thread Kyle Tso

On Mon, Dec 10, 2018 at 7:36 PM Adam Thomson
 wrote:
>
> On 10 December 2018 09:01, Adam Thomson wrote:
>
> > On 06 December 2018 03:02, Kyle Tso wrote:
> >
> > > Current matching rules ensure that the voltage range of selected
> > > Source Capability is entirely within the range defined in one of the
> > > Sink Capabilities. This is reasonable but not practical because Sink
> > > may not support wide range of voltage when sinking power while Source
> > > > could advertise its capabilities in a relatively wider range. For
> > > > example, a Source PDO advertising 3.3V-11V@3A (9V Prog of Fixed
> > > > Nominal Voltage) will not be selected if the Sink requires 5V-12V@3A
> > > > PPS power. However, the Sink could work well if the requested voltage
> > > > range in RDOs is 5V-11V@3A.
> >
> > > Is there a real world example of a sink requiring the 5V - 12V range? In
> > > that scenario could we not add an additional sink capability which allows
> > > for this range to be supported, and the current implementation should work
> > > just fine?
>
> Ok, I maybe should have waited until after my morning coffee to respond. So
> because the lower limit on the sink side is higher than the advertised source's
> PPS minimum voltage it never gets selected? Personally I'd prefer to keep the
> upper limit checking as is as I think that's an additional safety benefit
> helping to prevent over-voltage scenarios. I think if a PPS APDO can supply up
> to 11V then the system should be capable of handling that voltage, otherwise
> it shouldn't be considered at all. The Source provides limits checking as well
> to make sure the Sink doesn't request a value above the maximum voltage limit
> for that selected APDO.
>

If the over-voltage occurs, it means:
1. the adapter malfunctioned. or
2. the code on the Sink accidentally requests a voltage level which is
over the limit of the Sink.

For 1., it is difficult to predict the behavior of a malfunctioning
adapter. The over-voltage event may happen even if the Sink doesn't
select the APDO from this broken adapter.
For 2., it is difficult to predict the behavior of careless code as well.

> For the lower limit I'm more inclined to agree with allowing a higher minimum
> on the sink side as that's less of a safety/damage issue as I understand it.
> FWIW, what is the real world scenario? What happens if voltage drops below 5V?
>

Some products (in Sink mode) have under-voltage protection (the lower
bound might be around 3.8V - 4V before the calculation of IR-drop) that
will cause the disconnection.

thanks,
Kyle
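
The extended matching rule discussed in this thread amounts to intersecting
the Source and Sink voltage windows instead of requiring the Source window to
sit entirely inside the Sink window. A minimal userspace sketch, assuming
millivolt units; the names `volt_range` and `pps_overlap` are hypothetical
and are not taken from the tcpm code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical type for illustration: a PPS voltage window in millivolts. */
struct volt_range {
	unsigned int min_mv;
	unsigned int max_mv;
};

/*
 * Compute the voltage window usable by both sides and store it in *out.
 * Returns false when the Source and Sink windows do not overlap at all.
 */
static bool pps_overlap(struct volt_range src, struct volt_range snk,
			struct volt_range *out)
{
	assert(src.min_mv <= src.max_mv && snk.min_mv <= snk.max_mv);

	/* The usable window starts at the higher minimum... */
	out->min_mv = src.min_mv > snk.min_mv ? src.min_mv : snk.min_mv;
	/* ...and ends at the lower maximum. */
	out->max_mv = src.max_mv < snk.max_mv ? src.max_mv : snk.max_mv;

	return out->min_mv <= out->max_mv;
}
```

With the example from the patch description, a 3.3V-11V Source and a 5V-12V
Sink yield a 5V-11V request window. Keeping the stricter upper-limit check
that Adam prefers would mean additionally rejecting any Source whose maximum
exceeds the Sink maximum.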

Re: [PATCH net] net: mvneta: fix operation for 64K PAGE_SIZE

2018-12-11 Thread Jisheng Zhang

Hi,

On Tue, 11 Dec 2018 13:56:49 +0100 Marcin Wojtas wrote:

> Recent changes in the mvneta driver reworked allocation
> and handling of the ingress buffers to use entire pages.
> Apart from that in SW BM scenario the HW must be informed
> via PRXDQS about the biggest possible incoming buffer
> that can be propagated by RX descriptors.
> 
> The BufferSize field was filled according to the MTU-dependent
> pkt_size value. Later change to PAGE_SIZE broke RX operation
> when using 64K pages, as the field is simply too small.
> 
> This patch conditionally limits the value passed to the BufferSize
> of the PRXDQS register, depending on the PAGE_SIZE used.
> On the occasion remove now unused frag_size field of the mvneta_port
> structure.
> 
> Fixes: 562e2f467e71 ("net: mvneta: Improve the buffer allocation
> method for SWBM")

IMHO, we'd better revert 562e2f467e71 and 7e47fd84b56bb

The issue commit 562e2f467e71 wants to solve is due to commit 7e47fd84b56bb.
It looks a bit weird to introduce a regression and then submit another commit
(in the same patch set) to solve it.

Per my test, after reverting 562e2f467e71 and 7e47fd84b56bb, I can't reproduce
what's claimed in commit 562e2f467e71 -- "With system having a small memory
(around 256MB), the state "cannot allocate memory to refill with new buffer"
is reach pretty quickly."


> 
> Signed-off-by: Marcin Wojtas 
> ---
>  drivers/net/ethernet/marvell/mvneta.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> index e5397c8..61b2349 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -408,7 +408,6 @@ struct mvneta_port {
>   struct mvneta_pcpu_stats __percpu   *stats;
>  
>   int pkt_size;
> - unsigned int frag_size;
>   void __iomem *base;
>   struct mvneta_rx_queue *rxqs;
>   struct mvneta_tx_queue *txqs;
> @@ -2905,7 +2904,9 @@ static void mvneta_rxq_hw_init(struct mvneta_port *pp,
>   if (!pp->bm_priv) {
>   /* Set Offset */
>   mvneta_rxq_offset_set(pp, rxq, 0);
> - mvneta_rxq_buf_size_set(pp, rxq, pp->frag_size);
> + mvneta_rxq_buf_size_set(pp, rxq, PAGE_SIZE < SZ_64K ?
> + PAGE_SIZE :
> + MVNETA_RX_BUF_SIZE(pp->pkt_size));
>   mvneta_rxq_bm_disable(pp, rxq);
>   mvneta_rxq_fill(pp, rxq, rxq->size);
>   } else {
> @@ -3760,7 +3761,6 @@ static int mvneta_open(struct net_device *dev)
>   int ret;
>  
>   pp->pkt_size = MVNETA_RX_PKT_SIZE(pp->dev->mtu);
> - pp->frag_size = PAGE_SIZE;
>  
>   ret = mvneta_setup_rxqs(pp);
>   if (ret)

[PATCH v12 3/3] pps: pps-gpio pps-echo implementation

2018-12-11 Thread Tom Burkart

This patch implements the pps echo functionality for pps-gpio, which
sysfs claims is available already.

Configuration is done via device tree bindings.

This patch was originally written by Lukas Senger as part of a masters
thesis project and modified for inclusion into the linux kernel by Tom
Burkart.

Signed-off-by: Lukas Senger 
Signed-off-by: Tom Burkart 
---
 drivers/pps/clients/pps-gpio.c | 94 --
 include/linux/pps-gpio.h   |  2 +
 2 files changed, 93 insertions(+), 3 deletions(-)

diff --git a/drivers/pps/clients/pps-gpio.c b/drivers/pps/clients/pps-gpio.c
index 5d764dceabc5..b239c4b9ff51 100644
--- a/drivers/pps/clients/pps-gpio.c
+++ b/drivers/pps/clients/pps-gpio.c
@@ -35,6 +35,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 /* Info for each registered platform device */
 struct pps_gpio_device_data {
@@ -42,8 +44,13 @@ struct pps_gpio_device_data {
struct pps_device *pps; /* PPS source device */
struct pps_source_info info;/* PPS source information */
struct gpio_desc *gpio_pin; /* GPIO port descriptors */
+   struct gpio_desc *echo_pin;
+   struct timer_list echo_timer;   /* timer to reset echo active state */
bool assert_falling_edge;
bool capture_clear;
+   bool enable_pps_echo;
+   unsigned int echo_active_ms;/* PPS echo active duration */
+   unsigned long echo_timeout; /* timer timeout value in jiffies */
 };
 
 /*
@@ -64,19 +71,57 @@ static irqreturn_t pps_gpio_irq_handler(int irq, void *data)
rising_edge = gpiod_get_value(info->gpio_pin);
if ((rising_edge && !info->assert_falling_edge) ||
(!rising_edge && info->assert_falling_edge))
-   pps_event(info->pps, &ts, PPS_CAPTUREASSERT, NULL);
+   pps_event(info->pps, &ts, PPS_CAPTUREASSERT, data);
else if (info->capture_clear &&
((rising_edge && info->assert_falling_edge) ||
(!rising_edge && !info->assert_falling_edge)))
-   pps_event(info->pps, &ts, PPS_CAPTURECLEAR, NULL);
+   pps_event(info->pps, &ts, PPS_CAPTURECLEAR, data);
 
return IRQ_HANDLED;
 }
 
+static void pps_gpio_echo(struct pps_device *pps, int event, void *data)
+{
+   /* add_timer() needs to write into info->echo_timer */
+   struct pps_gpio_device_data *info;
+
+   info = data;
+
+   switch (event) {
+   case PPS_CAPTUREASSERT:
+   if (pps->params.mode & PPS_ECHOASSERT)
+   gpiod_set_value(info->echo_pin, 1);
+   break;
+
+   case PPS_CAPTURECLEAR:
+   if (pps->params.mode & PPS_ECHOCLEAR)
+   gpiod_set_value(info->echo_pin, 1);
+   break;
+   }
+
+   /* fire the timer */
+   if (info->pps->params.mode & (PPS_ECHOASSERT | PPS_ECHOCLEAR)) {
+   info->echo_timer.expires = jiffies + info->echo_timeout;
+   add_timer(&info->echo_timer);
+   }
+}
+
+/* Timer callback to reset the echo pin to the inactive state */
+static void pps_gpio_echo_timer_callback(struct timer_list *t)
+{
+   const struct pps_gpio_device_data *info;
+
+   info = from_timer(info, t, echo_timer);
+
+   gpiod_set_value(info->echo_pin, 0);
+}
+
 static int pps_gpio_setup(struct platform_device *pdev)
 {
struct pps_gpio_device_data *data = platform_get_drvdata(pdev);
struct device_node *np = pdev->dev.of_node;
+   int ret;
+   u32 value;
 
data->gpio_pin = devm_gpiod_get(&pdev->dev,
NULL,   /* request "gpios" */
@@ -87,6 +132,35 @@ static int pps_gpio_setup(struct platform_device *pdev)
return PTR_ERR(data->gpio_pin);
}
 
+   if (of_property_read_bool(np, "echo-gpios")) {
+   data->enable_pps_echo = true;
+
+   data->echo_pin = devm_gpiod_get(&pdev->dev,
+   "echo",
+   GPIOD_OUT_LOW);
+   if (IS_ERR(data->echo_pin)) {
+   dev_err(&pdev->dev, "failed to request ECHO GPIO\n");
+   return PTR_ERR(data->echo_pin);
+   }
+
+   ret = of_property_read_u32(np,
+   "echo-active-ms",
+   &value);
+   if (ret) {
+   dev_err(&pdev->dev,
+   "failed to get echo-active-ms from OF\n");
+   return ret;
+   }
+   data->echo_active_ms = value;
+   /* sanity check on echo_active_ms */
+   if (!data->echo_active_ms || data->echo_active_ms > 999) {
+   dev_err(&pdev->dev,
+   "echo-active-ms: %u - bad value from OF\n",
+   data->echo_active_ms);
+   return -EINVAL;
+   }
+   }
+
if (of_

[PATCH v12 2/3] dt-bindings: pps: pps-gpio PPS ECHO implementation

2018-12-11 Thread Tom Burkart

This patch implements the device tree changes required for the pps
echo functionality for pps-gpio, which sysfs claims is available
already.

This patch was originally written by Lukas Senger as part of a masters
thesis project and modified for inclusion into the linux kernel by Tom
Burkart.

Signed-off-by: Lukas Senger 
Signed-off-by: Tom Burkart 
---
 Documentation/devicetree/bindings/pps/pps-gpio.txt | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/devicetree/bindings/pps/pps-gpio.txt b/Documentation/devicetree/bindings/pps/pps-gpio.txt
index 3683874832ae..9012a2a02e14 100644
--- a/Documentation/devicetree/bindings/pps/pps-gpio.txt
+++ b/Documentation/devicetree/bindings/pps/pps-gpio.txt
@@ -7,6 +7,10 @@ Required properties:
 - compatible: should be "pps-gpio"
 - gpios: one PPS GPIO in the format described by ../gpio/gpio.txt
 
+Additional required properties for the PPS ECHO functionality:
+- echo-gpios: one PPS ECHO GPIO in the format described by ../gpio/gpio.txt
+- echo-active-ms: duration in ms of the active portion of the echo pulse
+
 Optional properties:
 - assert-falling-edge: when present, assert is indicated by a falling edge
(instead of by a rising edge)
@@ -19,5 +23,8 @@ Example:
gpios = <&gpio1 26 GPIO_ACTIVE_HIGH>;
assert-falling-edge;
 
+   echo-gpios = <&gpio1 27 GPIO_ACTIVE_HIGH>;
+   echo-active-ms = <100>;
+
compatible = "pps-gpio";
};
-- 
2.12.3

[PATCH v12 0/3] PPS: pps-gpio PPS ECHO implementation

2018-12-11 Thread Tom Burkart

Hi all,
please find attached the PPS-GPIO PPS ECHO implementation patch. The
driver claims to have echo functionality in the sysfs interface but this
functionality is not present.  This patch provides this functionality.

Part 1 of the patch changes the original driver from the number
based GPIO ABI to the descriptor based ABI.

Parts 2 and 3 then add the PPS ECHO functionality.  This is enabled if an
"echo-gpios" entry is found in the devicetree.
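
As a rough illustration of how the "echo-active-ms" value is handled, the
sketch below repeats the range check from patch 3/3 (zero and anything above
999 ms are rejected) and converts the value to timer ticks. It is a userspace
approximation only: `SKETCH_HZ`, `echo_active_ms_check()` and `ms_to_ticks()`
are hypothetical names, and the round-up conversion is an assumption standing
in for a kernel helper such as msecs_to_jiffies().

```c
#include <assert.h>
#include <errno.h>

/* Assumed tick rate for this illustration only; the kernel's HZ is configurable. */
#define SKETCH_HZ 250u

/*
 * Validate an echo-active-ms value with the same bounds as the posted
 * setup code: zero and anything above 999 ms are rejected.
 */
static int echo_active_ms_check(unsigned int ms)
{
	if (!ms || ms > 999)
		return -EINVAL;
	return 0;
}

/* Convert milliseconds to timer ticks, rounding up so the pulse is never shortened. */
static unsigned long ms_to_ticks(unsigned int ms)
{
	assert(ms <= 999);	/* callers are expected to validate first */
	return ((unsigned long)ms * SKETCH_HZ + 999u) / 1000u;
}
```

With the binding example value of 100 ms and the assumed 250 Hz tick rate,
the echo pin would be reset 25 ticks after the PPS event.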

Changes in v8:
Changes requested by Rob Herring and Philipp Zabel:
DT explanation and don't change the DT entry for the PPS gpio.

Changes in v9:
Simplify "if" expression by doing echo_active_ms validation earlier.

Changes in v10:
Changes requested by Philipp Zabel:
Mostly cosmetic changes: PATCH 2/4 now reviewed.  Thanks a lot, Philipp!
(Please note that as of v11 PATCH 1 is gone so this has become PATCH 1)

Change in v11:
Change requested by Rob Herring:
All changes in regard to the capture-clear DT entry are gone.

Change in v12:
Change requested by Rob Herring:
Deleted superfluous use of invert-pps-echo

On the linuxpps mailing list it was suggested to use a hrtimer for
resetting the GPIO ECHO active state to the inactive state.
Please also comment on whether a hrtimer is necessary/desirable for the
purpose of resetting the echo pin active state.  I am happy to implement
it if this is useful/desirable.

Please install, test and comment as it is now quite a major change to
the driver.
Suggestions for improvement are welcome.

Tom Burkart

Tom Burkart (3):
  pps: descriptor-based gpio
  dt-bindings: pps: pps-gpio PPS ECHO implementation
  pps: pps-gpio pps-echo implementation

 Documentation/devicetree/bindings/pps/pps-gpio.txt |   7 +
 drivers/pps/clients/pps-gpio.c | 159 -
 include/linux/pps-gpio.h   |   5 +-
 3 files changed, 131 insertions(+), 40 deletions(-)

-- 
2.12.3

[PATCH v12 1/3] pps: descriptor-based gpio

2018-12-11 Thread Tom Burkart

This patch changes the GPIO access for the pps-gpio driver from the
integer based ABI to the descriptor based ABI.

Reviewed-by: Philipp Zabel 
Signed-off-by: Tom Burkart 
---
 drivers/pps/clients/pps-gpio.c | 67 +++---
 include/linux/pps-gpio.h   |  3 +-
 2 files changed, 32 insertions(+), 38 deletions(-)

diff --git a/drivers/pps/clients/pps-gpio.c b/drivers/pps/clients/pps-gpio.c
index 333ad7d5b45b..5d764dceabc5 100644
--- a/drivers/pps/clients/pps-gpio.c
+++ b/drivers/pps/clients/pps-gpio.c
@@ -31,7 +31,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -41,9 +41,9 @@ struct pps_gpio_device_data {
int irq;/* IRQ used as PPS source */
struct pps_device *pps; /* PPS source device */
struct pps_source_info info;/* PPS source information */
+   struct gpio_desc *gpio_pin; /* GPIO port descriptors */
bool assert_falling_edge;
bool capture_clear;
-   unsigned int gpio_pin;
 };
 
 /*
@@ -61,18 +61,37 @@ static irqreturn_t pps_gpio_irq_handler(int irq, void *data)
 
info = data;
 
-   rising_edge = gpio_get_value(info->gpio_pin);
+   rising_edge = gpiod_get_value(info->gpio_pin);
if ((rising_edge && !info->assert_falling_edge) ||
(!rising_edge && info->assert_falling_edge))
pps_event(info->pps, &ts, PPS_CAPTUREASSERT, NULL);
else if (info->capture_clear &&
((rising_edge && info->assert_falling_edge) ||
-(!rising_edge && !info->assert_falling_edge)))
+   (!rising_edge && !info->assert_falling_edge)))
pps_event(info->pps, &ts, PPS_CAPTURECLEAR, NULL);
 
return IRQ_HANDLED;
 }
 
+static int pps_gpio_setup(struct platform_device *pdev)
+{
+   struct pps_gpio_device_data *data = platform_get_drvdata(pdev);
+   struct device_node *np = pdev->dev.of_node;
+
+   data->gpio_pin = devm_gpiod_get(&pdev->dev,
+   NULL,   /* request "gpios" */
+   GPIOD_IN);
+   if (IS_ERR(data->gpio_pin)) {
+   dev_err(&pdev->dev,
+   "failed to request PPS GPIO\n");
+   return PTR_ERR(data->gpio_pin);
+   }
+
+   if (of_property_read_bool(np, "assert-falling-edge"))
+   data->assert_falling_edge = true;
+   return 0;
+}
+
 static unsigned long
 get_irqf_trigger_flags(const struct pps_gpio_device_data *data)
 {
@@ -90,53 +109,30 @@ get_irqf_trigger_flags(const struct pps_gpio_device_data *data)
 static int pps_gpio_probe(struct platform_device *pdev)
 {
struct pps_gpio_device_data *data;
-   const char *gpio_label;
int ret;
int pps_default_params;
const struct pps_gpio_platform_data *pdata = pdev->dev.platform_data;
-   struct device_node *np = pdev->dev.of_node;
 
/* allocate space for device info */
-   data = devm_kzalloc(&pdev->dev, sizeof(struct pps_gpio_device_data),
-   GFP_KERNEL);
+   data = devm_kzalloc(&pdev->dev, sizeof(*data), GFP_KERNEL);
if (!data)
return -ENOMEM;
+   platform_set_drvdata(pdev, data);
 
+   /* GPIO setup */
if (pdata) {
data->gpio_pin = pdata->gpio_pin;
-   gpio_label = pdata->gpio_label;
 
data->assert_falling_edge = pdata->assert_falling_edge;
data->capture_clear = pdata->capture_clear;
} else {
-   ret = of_get_gpio(np, 0);
-   if (ret < 0) {
-   dev_err(&pdev->dev, "failed to get GPIO from device tree\n");
-   return ret;
-   }
-   data->gpio_pin = ret;
-   gpio_label = PPS_GPIO_NAME;
-
-   if (of_get_property(np, "assert-falling-edge", NULL))
-   data->assert_falling_edge = true;
-   }
-
-   /* GPIO setup */
-   ret = devm_gpio_request(&pdev->dev, data->gpio_pin, gpio_label);
-   if (ret) {
-   dev_err(&pdev->dev, "failed to request GPIO %u\n",
-   data->gpio_pin);
-   return ret;
-   }
-
-   ret = gpio_direction_input(data->gpio_pin);
-   if (ret) {
-   dev_err(&pdev->dev, "failed to set pin direction\n");
-   return -EINVAL;
+   ret = pps_gpio_setup(pdev);
+   if (ret)
+   return -EINVAL;
}
 
/* IRQ setup */
-   ret = gpio_to_irq(data->gpio_pin);
+   ret = gpiod_to_irq(data->gpio_pin);
if (ret < 0) {
dev_err(&pdev->dev, "failed to map GPIO to IRQ: %d\n", ret);
return -EINVAL;
@@ -173,7 +169,6 @@ static int pps_gpio_probe(struct platform_device *pdev)
return -EINVAL;
}
 
-   platform_set_drvdata(pdev, data);
dev_info(

Re: [BUG BISECT next] Files cannot be opened after "fsverity: Move verity status check to fsverity_file_open"

2018-12-11 Thread Theodore Y. Ts'o

The fscrypt.git tree has been updated with a fix for the problem.  Apologies
for not testing the !CONFIG_FS_VERITY case.

- Ted

Re: [PATCH net 2/4] vhost_net: rework on the lock ordering for busy polling

2018-12-11 Thread Jason Wang




On 2018/12/11 下午12:04, Michael S. Tsirkin wrote:

On Tue, Dec 11, 2018 at 11:06:43AM +0800, Jason Wang wrote:

On 2018/12/11 上午9:34, Michael S. Tsirkin wrote:

On Mon, Dec 10, 2018 at 05:44:52PM +0800, Jason Wang wrote:

When we try to do rx busy polling in tx path in commit 441abde4cd84
("net: vhost: add rx busy polling in tx path"), we lock rx vq mutex
after tx vq mutex is held. This may lead to deadlock so we try to lock vq
one by one in commit 78139c94dc8c ("net: vhost: lock the vqs one by
one"). With this commit, we avoid the deadlock with the assumption
that handle_rx() and handle_tx() run in the same process. But this
commit removes the protection for IOTLB updating which requires the
mutex of each vq to be held.

To solve this issue, the first step is to have exactly the same lock
ordering for vhost_net. This is done through:

- For handle_rx(), if busy polling is enabled, lock tx vq immediately.
- For handle_tx(), always lock rx vq before tx vq, and unlock it if
busy polling is not enabled.
- Remove the tricky locking codes in busy polling.

With this, we can have exactly the same lock ordering for vhost_net, which
allows us to safely revert commit 78139c94dc8c ("net: vhost: lock the
vqs one by one") in the next patch.
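
The ordering rule listed above can be sketched in userspace with two POSIX
mutexes standing in for the rx and tx virtqueue mutexes. This is an
illustration under assumptions, not the vhost code: `rx_lock`, `tx_lock` and
the four helpers are hypothetical names, and the rx-path helpers follow the
description in this message rather than the posted diff, which only shows
handle_tx().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the rx and tx virtqueue mutexes. */
static pthread_mutex_t rx_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t tx_lock = PTHREAD_MUTEX_INITIALIZER;

/* tx path: always take rx first, then tx; keep rx only while busy polling. */
static void tx_path_lock(bool busy_poll)
{
	pthread_mutex_lock(&rx_lock);
	pthread_mutex_lock(&tx_lock);
	if (!busy_poll)
		pthread_mutex_unlock(&rx_lock);
}

static void tx_path_unlock(bool busy_poll)
{
	if (busy_poll)
		pthread_mutex_unlock(&rx_lock);
	pthread_mutex_unlock(&tx_lock);
}

/* rx path: take rx first, then tx only when busy polling is enabled. */
static void rx_path_lock(bool busy_poll)
{
	pthread_mutex_lock(&rx_lock);
	if (busy_poll)
		pthread_mutex_lock(&tx_lock);
}

static void rx_path_unlock(bool busy_poll)
{
	if (busy_poll)
		pthread_mutex_unlock(&tx_lock);
	pthread_mutex_unlock(&rx_lock);
}
```

Because both paths acquire rx before tx, neither can hold one mutex while
waiting for the other in the opposite order, which is the property the
message relies on to avoid the deadlock.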

The patch will add two more atomic operations on the tx path during
each round of handle_tx(). 1 byte TCP_RR does not notice such
overhead.

Fixes: commit 78139c94dc8c ("net: vhost: lock the vqs one by one")
Cc: Tonghao Zhang
Signed-off-by: Jason Wang
---
   drivers/vhost/net.c | 18 +++---
   1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index ab11b2bee273..5f272ab4d5b4 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -513,7 +513,6 @@ static void vhost_net_busy_poll(struct vhost_net *net,
struct socket *sock;
struct vhost_virtqueue *vq = poll_rx ? tvq : rvq;
-   mutex_lock_nested(&vq->mutex, poll_rx ? VHOST_NET_VQ_TX: VHOST_NET_VQ_RX);
vhost_disable_notify(&net->dev, vq);
sock = rvq->private_data;
@@ -543,8 +542,6 @@ static void vhost_net_busy_poll(struct vhost_net *net,
vhost_net_busy_poll_try_queue(net, vq);
else if (!poll_rx) /* On tx here, sock has no rx data. */
vhost_enable_notify(&net->dev, rvq);
-
-   mutex_unlock(&vq->mutex);
   }
   static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
@@ -913,10 +910,16 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
   static void handle_tx(struct vhost_net *net)
   {
struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
+   struct vhost_net_virtqueue *nvq_rx = &net->vqs[VHOST_NET_VQ_RX];
struct vhost_virtqueue *vq = &nvq->vq;
+   struct vhost_virtqueue *vq_rx = &nvq_rx->vq;
struct socket *sock;
+   mutex_lock_nested(&vq_rx->mutex, VHOST_NET_VQ_RX);
mutex_lock_nested(&vq->mutex, VHOST_NET_VQ_TX);
+   if (!vq->busyloop_timeout)
+   mutex_unlock(&vq_rx->mutex);
+
sock = vq->private_data;
if (!sock)
goto out;
@@ -933,6 +936,8 @@ static void handle_tx(struct vhost_net *net)
handle_tx_copy(net, sock);
   out:
+   if (vq->busyloop_timeout)
+   mutex_unlock(&vq_rx->mutex);
mutex_unlock(&vq->mutex);
   }

So rx mutex taken on tx path now.  And tx mutex is on rx path ...  This
is just messed up. Why can't tx polling drop rx lock before
getting the tx lock and vice versa?


Because we want to poll both tx and rx virtqueue at the same time
(vhost_net_busy_poll()).

     while (vhost_can_busy_poll(endtime)) {
         if (vhost_has_work(&net->dev)) {
             *busyloop_intr = true;
             break;
         }

         if ((sock_has_rx_data(sock) &&
          !vhost_vq_avail_empty(&net->dev, rvq)) ||
         !vhost_vq_avail_empty(&net->dev, tvq))
             break;

         cpu_relax();

     }


And we disable kicks and notification for better performance.

Right but it's all slow path - it happens when queue is
otherwise empty. So this is what I am saying: let's drop the locks
we hold around this.



Is this really safe? It looks to me like it can race with SET_VRING_ADDR. And
the code does more:


- access sock object

- access device IOTLB

- enable and disable notification

None of the above is safe without the protection of vq mutex.






Or if we really wanted to force everything to be locked at
all times, let's just use a single mutex.




We could, but it might require more changes which could be done for -next I
believe.


Thanks

I'd rather we kept the fine grained locking. E.g. people are
looking at splitting the tx and rx threads. But if not possible
let's fix it cleanly with a coarse-grained one. A mess here will
just create more trouble later.



I believe we won't go back to coarse one. Looks like we can solve this 
by using mutex_trylock() for rxq during TX. And don't do polling for rxq 
is a IOTLB

Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]

2018-12-11 Thread Bjorn Helgaas

On Tue, Dec 11, 2018 at 6:38 PM David Gibson
 wrote:
>
> On Tue, Dec 11, 2018 at 08:01:43AM -0600, Bjorn Helgaas wrote:
> > Hi David,
> >
> > I see you're still working on this, but if you do end up going this
> > direction eventually, would you mind splitting this into two patches:
> > 1) rename the quirk to make it more generic (but not changing any
> > behavior), and 2) add the ConnectX devices to the quirk.  That way
> > the ConnectX change is smaller and more easily
> > understood/reverted/etc.
>
> Sure.  Would it make sense to send (1) as an independent cleanup,
> while I'm still working out exactly what (if anything) we need for
> (2)?

You could, but I don't think there's really much benefit in doing the
first without the second, and I think there is some value in handling
both patches at the same time.

Re: [PATCH v3] kbuild: Add support for DT binding schema checks

2018-12-11 Thread Masahiro Yamada

On Wed, Dec 12, 2018 at 3:36 AM Rob Herring  wrote:
>
> On Tue, Dec 11, 2018 at 10:03 AM Masahiro Yamada
>  wrote:
> >
> > On Wed, Dec 12, 2018 at 12:13 AM Rob Herring  wrote:
> >
> > >
> > > > > +$(obj)/%.example.dts: $(src)/%.yaml FORCE
> > > > > +   $(call if_changed,chk_binding)
> > > > > +
> > > > > +DT_TMP_SCHEMA := .schema.yaml.tmp
> > > >
> > > >
> > > > BTW, why does this file start with a period?
> > > > What is the meaning of '.tmp' extension?
> > >
> > > Nothing really. Just named it something so it gets cleaned and ignored
> > > by git.
> >
> >
> > It is cleaned whatever file name you use.
> >
> >
> > See scripts/Makefile.clean
> >
> > __clean-files   := $(extra-y) $(extra-m) $(extra-)   \
> >$(always) $(targets) $(clean-files)   \
> >$(hostprogs-y) $(hostprogs-m) $(hostprogs-) \
> >$(hostlibs-y) $(hostlibs-m) $(hostlibs-) \
> >$(hostcxxlibs-y) $(hostcxxlibs-m)
> >
> >
> > $(extra-y) is cleaned.
>
> True.
>
> >
> >
> > You are adding *.example.dts to .gitignore
> >
> > Why not "schema.yaml" ?
>
> Okay. I'll do "processed-schema.yaml" to give a bit better name of
> what it contains.
>
> >
> > > > > +extra-y += $(DT_TMP_SCHEMA)
> > > > > +
> > > > > +quiet_cmd_mk_schema = SCHEMA  $@
> > > > > +  cmd_mk_schema = mkdir -p $(obj); \
> > > > > +  rm -f $@; \
> > > > > +  $(DT_MK_SCHEMA) $(DT_MK_SCHEMA_FLAGS) -o $@ $(filter-out FORCE, $^)
> > > >
> > > >
> > > > "mkdir -p $(obj)" is redundant.
> > > >
> > > >
> > > > Why is 'rm -f $@' necessary ?
> > > > Can't dt-mk-schema overwrite the output file?
> > >
> > > It is for error case when the output file is not generated. I can
> > > handle this within dt-mk-schema instead.
> > > > > +DT_DOCS = $(shell cd $(srctree)/$(src) && find * -name '*.yaml')
> > > > > +DT_SCHEMA_FILES ?= $(addprefix $(src)/,$(DT_DOCS))
> > > > > +
> > > > > +extra-y += $(patsubst $(src)/%.yaml,%.example.dts, $(DT_SCHEMA_FILES))
> > > > > +extra-y += $(patsubst $(src)/%.yaml,%.example.dtb, $(DT_SCHEMA_FILES))
> > > >
> > > >
> > > >
> > > > I assume you intentionally did not do like this:
> > > >
> > > > extra-y += $(patsubst %.yaml,%.example.dtb, $(DT_DOCS))
> > > >
> > > > From the commit description, DT_SCHEMA_FILES might be overridden by a
> > > > user.
> > > > So, I think this is OK.
> > > >
> > > >
> > > >
> > > >
> > > > > +$(obj)/$(DT_TMP_SCHEMA): | $(addprefix $(obj)/,$(patsubst $(src)/%.yaml,%.example.dtb, $(DT_SCHEMA_FILES)))
> > > >
> > > > I do not understand this line.
> > > > Why is it necessary?
> > > >
> > > > *.example.dtb files are generated anyway
> > > > since they are listed in extra-y.
> > >
> > > It is enforcing the ordering. Without it, the binding checks and
> > > building .schema.yaml.tmp happen in parallel because both only have
> > > the source files as dependencies. The '|' keeps the dependencies out
> > > of the dependency list($^).
> >
> >
> > What kind of problem would you see if
> > the binding checks and building .schema.yaml.tmp
> > happen in parallel?
>
> In case of no errors in the binding docs, it doesn't matter. If there
> are errors, I don't want the dtbs validation to run if any schema
> doesn't validate. However, I played around with this a bit more and it
> seems like having the examples' dts/dtb in extra-y prevents that from
> happening. Does that match your expectations?

Exactly.

If any error occurs in Documentation/devicetree/bindings/Makefile,
Make terminates before proceeding to the dtbs_check stage.



-- 
Best Regards
Masahiro Yamada

Re: [PATCH v4] kbuild: Add support for DT binding schema checks

2018-12-11 Thread Masahiro Yamada

On Wed, Dec 12, 2018 at 5:24 AM Rob Herring  wrote:
>
> This adds the build infrastructure for checking DT binding schema
> documents and validating dts files using the binding schema.
>
> Check DT binding schema documents:
> make dt_binding_check
>
> Build dts files and check using DT binding schema:
> make dtbs_check
>
> Optionally, DT_SCHEMA_FILES can be passed in with a schema file(s) to
> use for validation. This makes it easier to find and fix errors
> generated by a specific schema.
>
> Currently, the validation targets are separate from a normal build to
> avoid a hard dependency on the external DT schema project and because
> there are lots of warnings generated.
>
> Cc: Jonathan Corbet 
> Cc: Mark Rutland 
> Cc: Masahiro Yamada 
> Cc: Michal Marek 
> Cc: linux-...@vger.kernel.org
> Cc: devicet...@vger.kernel.org
> Cc: linux-kbu...@vger.kernel.org
> Signed-off-by: Rob Herring 
> ---
> v4:
> - Rework libyaml check and error message with Masahiro's version
> - Simplify build rules and dependencies
>


Acked-by: Masahiro Yamada 


Thanks.


-- 
Best Regards
Masahiro Yamada

[PATCH v13 0/6] x86/boot/KASLR: Parse ACPI table and limit KASLR to choosing immovable memory

2018-12-11 Thread Chao Fan

***Background:
People reported that KASLR may randomly choose positions that are
located in movable memory regions. This breaks the memory hotplug
feature, because movable memory chosen by KASLR can't be removed.

***Solutions:
If KASLR has the memory hot-remove information, it will know the
right regions. That information is in the ACPI tables, which are
parsed after start_kernel(), so KASLR can't get it from the
existing code.

Somebody suggested adding a kernel parameter to specify the
immovable memory so as to limit KASLR to these regions. I then made
a patchset along those lines. After several versions, Ingo gave a suggestion:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1634024.html
Following Ingo's suggestion, this series imitates the ACPI code to
parse the ACPI tables, so that KASLR can get the necessary memory
information from them.
I think the ACPI code is an independent part, so the code and
functions are imitated in the 'compressed/' directory, so that KASLR
won't influence the initialization of ACPI.

PATCH 1/6 Copy kstrtoull() to boot/string.c
PATCH 2/6 Add get_acpi_rsdp() to parse RSDP in cmdline from KEXEC
PATCH 3/6 Add efi_get_rsdp_addr() to find RSDP from EFI table when
  booting from EFI.
PATCH 4/6 Add bios_get_rsdp_addr() to search RSDP in memory when EFI
  table not found.
PATCH 5/6 Compute SRAT from RSDP and walk SRAT to store the immovable
  memory regions.
PATCH 6/6 Calculate the intersection between memory regions from e820/efi
  memory table and immovable memory regions. Limit KASLR to
  choosing these regions for randomization.
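
The intersection step described for PATCH 6/6 can be illustrated with a small
userspace sketch. It is not the code from this series: the `region` type and
`clip_to_immovable()` are hypothetical names, and regions are assumed to be
half-open ranges [start, start + size) that do not wrap around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical region type for illustration: [start, start + size). */
struct region {
	uint64_t start;
	uint64_t size;
};

/*
 * Clip a usable memory region (e.g. from e820/EFI) to an immovable
 * region. Returns false when the two do not intersect, in which case
 * the candidate would have to be skipped.
 */
static bool clip_to_immovable(struct region usable, struct region immovable,
			      struct region *out)
{
	uint64_t usable_end = usable.start + usable.size;
	uint64_t immovable_end = immovable.start + immovable.size;
	uint64_t start = usable.start > immovable.start ?
			 usable.start : immovable.start;
	uint64_t end = usable_end < immovable_end ? usable_end : immovable_end;

	assert(out);

	if (start >= end)
		return false;

	out->start = start;
	out->size = end - start;
	return true;
}
```

For example, a usable region of 4 MiB starting at 1 MiB clipped against an
immovable region of 8 MiB starting at 2 MiB leaves 3 MiB starting at 2 MiB as
a candidate for randomization.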

v1->v2:
 -  Simplify some code.
Follow Baoquan He's suggestion:
 - Reuse the head file of acpi code.

v2->v3:
 - Test in more conditions, so remove the 'RFC' tag.
 - Change some comments.

v3->v4:
Follow Thomas Gleixner's suggestion:
 - Put the whole efi related function into #define CONFIG_EFI and return
   false in the other stub.

v4->v5:
Follow Dou Liyang's suggestion:
 - Add more comments about some functions based on kernel code.
 - Fix some typos in comments.
 - Clean useless variable.
 - Add check for the boundary of array.
 - Add check for 'movable_node' parameter

v5->v6:
Follow Baoquan He's suggestion:
 - Change some log.
 - Add the check for acpi_rsdp
 - Change some code logic to make the code clear

v6->v7:
Follow Rafael's suggestion:
 - Add more comments and patch log.
Follow test robot's suggestion:
 - Add "static" tag for function

v7->v8:
Follow Kees Cook's suggestion:
 - Use mem_overlaps() to check memory region.
 - Use #ifdef in the definition of function.

v8->v9:
Follow Boris' suggestion:
 - Change code style.
 - Split PATCH 1/3 into more patches.
 - Introduce some new functions
 - Use existing functions to rework some code
Follow Masayoshi's suggestion:
 - Make code more readable

v9->v10:
Follow Baoquan's suggestion:
 - Change some log
 - Merge the last two patches together.

v10->v11:
Follow Boris' suggestion:
 - Link kstrtoull() instead of copying it.
 - Drop the useless wrapped function.

v11->v12:
Follow Boris' suggestion:
 - Change patch log and code comments.
 - Add 'CONFIG_EARLY_PARSE_RSDP' to make code easy to read
 - Put strtoull() to misc.c
Follow Masa's suggestion:
 - Remove the detection for 'movable_node'
 - Change the code logic around cmdline_find_option()

v12->v13:
Follow Boris' suggestion:
 - Copy kstrtoull() to boot/string.c 
Follow Masa's suggestion:
 - Change some code logic
Follow Baoquan's suggestion:
 - Add tag to disable export symbol

Any comments will be welcome.

Chao Fan (6):
  x86/boot: Introduce kstrtoull() to boot directory instead of
simple_strtoull()
  x86/boot: Introduce get_acpi_rsdp() to parse RSDP in cmdline from
KEXEC
  x86/boot: Introduce efi_get_rsdp_addr() to find RSDP from EFI table
  x86/boot: Introduce bios_get_rsdp_addr() to search RSDP in memory
  x86/boot: Parse SRAT from RSDP and store immovable memory
  x86/boot/KASLR: Limit KASLR to extracting kernel in immovable memory

 arch/x86/Kconfig  |  10 +
 arch/x86/boot/compressed/Makefile |   2 +
 arch/x86/boot/compressed/acpi.c   | 322 ++
 arch/x86/boot/compressed/kaslr.c  |  79 ++--
 arch/x86/boot/compressed/misc.h   |  21 ++
 arch/x86/boot/string.c| 137 +
 arch/x86/boot/string.h|   2 +
 7 files changed, 558 insertions(+), 15 deletions(-)
 create mode 100644 arch/x86/boot/compressed/acpi.c

-- 
2.19.2

[PATCH v13 2/6] x86/boot: Introduce get_acpi_rsdp() to parse RSDP in cmdline from KEXEC

2018-12-11 Thread Chao Fan

Memory information in SRAT is necessary to fix the conflict between
KASLR and memory-hotremove.

The ACPI SRAT (System/Static Resource Affinity Table) describes
memory ranges, including ranges of memory provided by hot-added
memory devices. SRAT is located through the Root System Description
Pointer (RSDP), so RSDP must be found first.

When booting from KEXEC/EFI/BIOS, the methods to find RSDP
are different. When booting from KEXEC, 'acpi_rsdp' may have been
added to cmdline, so parse cmdline to find RSDP.
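
As an illustration only (not the code in this patch), the conversion of
an 'acpi_rsdp=' value into an address could look like the userspace
sketch below. The helper name parse_rsdp_value() is hypothetical and
strtoull() stands in for the boot-time kstrtoull(). The point it shows
is that the converted number is what gets returned as the address, while
the conversion status is only used to decide between success and failure.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: convert the value of "acpi_rsdp=" (a hex string,
 * with or without a "0x" prefix) into a physical address.
 * Returns 0 if the string is empty, malformed, or out of range.
 */
static uint64_t parse_rsdp_value(const char *val)
{
	char *end;
	unsigned long long res;

	/* Reject empty input, leading whitespace and signs up front. */
	if (!val || !isxdigit((unsigned char)val[0]))
		return 0;

	errno = 0;
	res = strtoull(val, &end, 16);

	/* Overflow, no digits consumed, or trailing garbage: treat as invalid. */
	if (errno == ERANGE || end == val || *end != '\0')
		return 0;

	/* Return the parsed value itself, not the status of the conversion. */
	return (uint64_t)res;
}
```

Returning 0 on any error lets the caller fall through to the next
lookup method.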

Since this is needed only when both 'RANDOMIZE_BASE' and
'MEMORY_HOTREMOVE' are enabled, introduce 'CONFIG_EARLY_PARSE_RSDP'
to keep the ifdeffery clear.

Signed-off-by: Chao Fan 
---
 arch/x86/Kconfig| 10 ++
 arch/x86/boot/compressed/acpi.c | 30 ++
 arch/x86/boot/compressed/misc.h |  6 ++
 3 files changed, 46 insertions(+)
 create mode 100644 arch/x86/boot/compressed/acpi.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ba7e3464ee92..455da382fa9e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2149,6 +2149,16 @@ config X86_NEED_RELOCS
def_bool y
depends on RANDOMIZE_BASE || (X86_32 && RELOCATABLE)
 
+config EARLY_PARSE_RSDP
+   bool "Parse RSDP pointer on compressed period for KASLR"
+   def_bool y
+   depends on RANDOMIZE_BASE && MEMORY_HOTREMOVE
+   help
+ This option parses RSDP in compressed period. Works
+ for KASLR to get memory information from SRAT table and choose
+ immovable memory to extract kernel.
+ Say Y if you want to use both KASLR and memory-hotremove.
+
 config PHYSICAL_ALIGN
hex "Alignment value to which kernel should be aligned"
default "0x20"
diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
new file mode 100644
index ..cad15686f82c
--- /dev/null
+++ b/arch/x86/boot/compressed/acpi.c
@@ -0,0 +1,30 @@
+// SPDX-License-Identifier: GPL-2.0
+#define BOOT_CTYPE_H
+#include "misc.h"
+#include "error.h"
+
+#include 
+#include 
+#include 
+#include 
+
+#define STATIC
+#include 
+
+#include "../string.h"
+
+static acpi_physical_address get_acpi_rsdp(void)
+{
+#ifdef CONFIG_KEXEC
+   unsigned long long res;
+   int len = 0;
+   char val[MAX_ADDRESS_LENGTH+1];
+
+   len = cmdline_find_option("acpi_rsdp", val, MAX_ADDRESS_LENGTH+1);
+   if (len > 0) {
+   val[len] = 0;
+   return (acpi_physical_address)kstrtoull(val, 16, &res);
+   }
+   return 0;
+#endif
+}
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index a1d5918765f3..72fcfbfec3c6 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -116,3 +116,9 @@ static inline void console_init(void)
 void set_sev_encryption_mask(void);
 
 #endif
+
+/* acpi.c */
+#ifdef CONFIG_EARLY_PARSE_RSDP
+/* Max length of 64-bit hex address string is 18, prefix "0x" + 16 hex digit. */
+#define MAX_ADDRESS_LENGTH 18
+#endif
-- 
2.19.2

[PATCH v13 3/6] x86/boot: Introduce efi_get_rsdp_addr() to find RSDP from EFI table

2018-12-11 Thread Chao Fan

Memory information in SRAT is necessary to fix the conflict between
KASLR and memory-hotremove. So RSDP and SRAT should be parsed.

When booting from KEXEC/EFI/BIOS, the methods to locate RSDP
are different. When booting from EFI, the EFI table points to RSDP,
so parse the EFI table to find the RSDP.
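
The lookup described above can be sketched in plain C as follows. This
is a simplified userspace model, not the patch itself: struct guid,
struct config_entry and find_rsdp_in_config_tables() are hypothetical
stand-ins for efi_guid_t, efi_config_table_64_t and the loop in
efi_get_rsdp_addr(). The GUID bytes are written out as I understand
ACPI_TABLE_GUID and ACPI_20_TABLE_GUID and should be checked against
include/linux/efi.h. An ACPI 2.0 entry wins as soon as it is seen; an
ACPI 1.0 entry is only remembered as a fallback.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the kernel's GUID and config table entry. */
struct guid { uint8_t b[16]; };
struct config_entry { struct guid guid; uint64_t table; };

/* Assumed byte layout of the ACPI 1.0 table GUID. */
static const struct guid ACPI_10_GUID = {{
	0x30, 0x2d, 0x9d, 0xeb, 0x88, 0x2d, 0xd3, 0x11,
	0x9a, 0x16, 0x00, 0x90, 0x27, 0x3f, 0xc1, 0x4d }};
/* Assumed byte layout of the ACPI 2.0 table GUID. */
static const struct guid ACPI_20_GUID = {{
	0x71, 0xe8, 0x68, 0x88, 0xf1, 0xe4, 0xd3, 0x11,
	0xbc, 0x22, 0x00, 0x80, 0xc7, 0x3c, 0x88, 0x81 }};

/* Return the RSDP address found in the table array, or 0 if none. */
static uint64_t find_rsdp_in_config_tables(const struct config_entry *tables,
					    size_t nr)
{
	uint64_t acpi10 = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		/* Prefer the ACPI 2.0 entry: return it immediately. */
		if (!memcmp(&tables[i].guid, &ACPI_20_GUID, sizeof(struct guid)))
			return tables[i].table;
		/* Remember an ACPI 1.0 entry in case no 2.0 entry exists. */
		if (!memcmp(&tables[i].guid, &ACPI_10_GUID, sizeof(struct guid)))
			acpi10 = tables[i].table;
	}
	return acpi10;
}
```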

Signed-off-by: Chao Fan 
---
 arch/x86/boot/compressed/acpi.c | 79 +
 1 file changed, 79 insertions(+)

diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
index cad15686f82c..c96008712ec9 100644
--- a/arch/x86/boot/compressed/acpi.c
+++ b/arch/x86/boot/compressed/acpi.c
@@ -28,3 +28,82 @@ static acpi_physical_address get_acpi_rsdp(void)
return 0;
 #endif
 }
+
+/* Search EFI table for RSDP. */
+static acpi_physical_address efi_get_rsdp_addr(void)
+{
+#ifdef CONFIG_EFI
+   acpi_physical_address rsdp_addr = 0;
+   efi_system_table_t *systab;
+   struct efi_info *e;
+   bool efi_64;
+   char *sig;
+   int size;
+   int i;
+
+   e = &boot_params->efi_info;
+   sig = (char *)&e->efi_loader_signature;
+
+   if (!strncmp(sig, EFI64_LOADER_SIGNATURE, 4))
+   efi_64 = true;
+   else if (!strncmp(sig, EFI32_LOADER_SIGNATURE, 4))
+   efi_64 = false;
+   else {
+   debug_putstr("Wrong EFI loader signature.\n");
+   return 0;
+   }
+
+   /* Get systab from boot params. Based on efi_init(). */
+#ifdef CONFIG_X86_64
+   systab = (efi_system_table_t *)(e->efi_systab | ((__u64)e->efi_systab_hi<<32));
+#else
+   if (e->efi_systab_hi || e->efi_memmap_hi) {
+   debug_putstr("Error getting RSDP address: EFI system table located above 4GB.\n");
+   return 0;
+   }
+   systab = (efi_system_table_t *)e->efi_systab;
+#endif
+
+   if (!systab)
+   return 0;
+
+   /*
+* Get EFI tables from systab. Based on efi_config_init() and
+* efi_config_parse_tables().
+*/
+   size = efi_64 ? sizeof(efi_config_table_64_t) :
+   sizeof(efi_config_table_32_t);
+
+   for (i = 0; i < systab->nr_tables; i++) {
+   void *config_tables;
+   unsigned long table;
+   efi_guid_t guid;
+
+   config_tables = (void *)(systab->tables + size * i);
+   if (efi_64) {
+   efi_config_table_64_t *tmp_table;
+
+   tmp_table = (efi_config_table_64_t *)config_tables;
+   guid = tmp_table->guid;
+   table = tmp_table->table;
+
+   if (!IS_ENABLED(CONFIG_X86_64) && table >> 32) {
+   debug_putstr("Error getting RSDP address: EFI system table located above 4GB.\n");
+   return 0;
+   }
+   } else {
+   efi_config_table_32_t *tmp_table;
+
+   tmp_table = (efi_config_table_32_t *)config_tables;
+   guid = tmp_table->guid;
+   table = tmp_table->table;
+   }
+
+   if (!(efi_guidcmp(guid, ACPI_TABLE_GUID)))
+   rsdp_addr = (acpi_physical_address)table;
+   else if (!(efi_guidcmp(guid, ACPI_20_TABLE_GUID)))
+   return (acpi_physical_address)table;
+   }
+   return rsdp_addr;
+#endif
+}
-- 
2.19.2

[PATCH v13 4/6] x86/boot: Introduce bios_get_rsdp_addr() to search RSDP in memory

2018-12-11 Thread Chao Fan

Memory information in the SRAT table is necessary to fix the conflict
between KASLR and memory-hotremove. So RSDP and SRAT should be parsed.

When booting from KEXEC/EFI/BIOS, the methods to locate RSDP
are different. When booting from BIOS, no variable points to RSDP
directly, so scan memory for the RSDP and verify it by signature
and checksum.
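
For readers unfamiliar with the RSDP layout, the signature and checksum
test can be modelled over an ordinary byte buffer as below. This is a
sketch under stated assumptions, not the patch code: the names
byte_sum() and scan_for_rsdp() and the RSDP_* constants are
hypothetical, and the offsets (8-byte "RSD PTR " signature, revision at
byte 15, 20 bytes under the first checksum, 36 under the extended one,
16-byte scan step) follow my reading of the ACPI specification.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RSDP_SIG        "RSD PTR "  /* 8-byte signature, note the trailing space */
#define RSDP_SCAN_STEP  16          /* candidates sit on 16-byte boundaries */
#define RSDP_V1_LENGTH  20          /* bytes covered by the ACPI 1.0 checksum */
#define RSDP_V2_LENGTH  36          /* bytes covered by the extended checksum */
#define RSDP_REV_OFFSET 15          /* offset of the revision byte */

/* Sum all bytes modulo 256; a valid table sums to zero. */
static uint8_t byte_sum(const uint8_t *buf, size_t len)
{
	uint8_t sum = 0;

	while (len--)
		sum += *buf++;
	return sum;
}

/* Return a pointer to the first valid RSDP in the buffer, or NULL. */
static const uint8_t *scan_for_rsdp(const uint8_t *start, size_t length)
{
	size_t off;

	for (off = 0; off + RSDP_V1_LENGTH <= length; off += RSDP_SCAN_STEP) {
		const uint8_t *p = start + off;

		/* Wrong signature: not an RSDP candidate. */
		if (memcmp(p, RSDP_SIG, 8))
			continue;

		/* The first 20 bytes must sum to zero. */
		if (byte_sum(p, RSDP_V1_LENGTH))
			continue;

		/* Revision >= 2 also requires a valid extended checksum. */
		if (p[RSDP_REV_OFFSET] >= 2 &&
		    (off + RSDP_V2_LENGTH > length ||
		     byte_sum(p, RSDP_V2_LENGTH)))
			continue;

		return p;
	}
	return NULL;
}
```

Unlike the boot code, the sketch also bounds every read by the buffer
length, which is needed when scanning an ordinary array.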

Signed-off-by: Chao Fan 
---
 arch/x86/boot/compressed/acpi.c | 85 +
 1 file changed, 85 insertions(+)

diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
index c96008712ec9..c546a463b8ba 100644
--- a/arch/x86/boot/compressed/acpi.c
+++ b/arch/x86/boot/compressed/acpi.c
@@ -107,3 +107,88 @@ static acpi_physical_address efi_get_rsdp_addr(void)
return rsdp_addr;
 #endif
 }
+
+static u8 compute_checksum(u8 *buffer, u32 length)
+{
+   u8 *end = buffer + length;
+   u8 sum = 0;
+
+   while (buffer < end)
+   sum += *(buffer++);
+
+   return sum;
+}
+
+/* Search a block of memory for the RSDP signature. */
+static u8 *scan_mem_for_rsdp(u8 *start, u32 length)
+{
+   struct acpi_table_rsdp *rsdp;
+   u8 *address;
+   u8 *end;
+
+   end = start + length;
+
+   /* Search from given start address for the requested length */
+   for (address = start; address < end; address += ACPI_RSDP_SCAN_STEP) {
+   /*
+* Both RSDP signature and checksum must be correct.
+* Note: Sometimes there exists more than one RSDP in memory;
+* the valid RSDP has a valid checksum, all others have an
+* invalid checksum.
+*/
+   rsdp = (struct acpi_table_rsdp *)address;
+
+   /* BAD Signature */
+   if (!ACPI_VALIDATE_RSDP_SIG(rsdp->signature))
+   continue;
+
+   /* Check the standard checksum */
+   if (compute_checksum((u8 *) rsdp, ACPI_RSDP_CHECKSUM_LENGTH))
+   continue;
+
+   /* Check extended checksum if table version >= 2 */
+   if ((rsdp->revision >= 2) &&
+   (compute_checksum((u8 *) rsdp, ACPI_RSDP_XCHECKSUM_LENGTH)))
+   continue;
+
+   /* Signature and checksum valid, we have found a real RSDP */
+   return address;
+   }
+   return NULL;
+}
+
+/* Search RSDP address, based on acpi_find_root_pointer(). */
+static acpi_physical_address bios_get_rsdp_addr(void)
+{
+   u8 *table_ptr;
+   u32 address;
+   u8 *rsdp;
+
+   /* Get the location of the Extended BIOS Data Area (EBDA) */
+   table_ptr = (u8 *)ACPI_EBDA_PTR_LOCATION;
+   *(u32 *)(void *)&address = *(u16 *)(void *)table_ptr;
+   address <<= 4;
+   table_ptr = (u8 *)(long)address;
+
+   /*
+* Search EBDA paragraphs (EBDA is required to be a minimum of
+* 1K length)
+*/
+   if (address > 0x400) {
+   rsdp = scan_mem_for_rsdp(table_ptr, ACPI_EBDA_WINDOW_SIZE);
+   if (rsdp) {
+   address += (u32)ACPI_PTR_DIFF(rsdp, table_ptr);
+   return (acpi_physical_address)address;
+   }
+   }
+
+   table_ptr = (u8 *)ACPI_HI_RSDP_WINDOW_BASE;
+   rsdp = scan_mem_for_rsdp(table_ptr, ACPI_HI_RSDP_WINDOW_SIZE);
+
+   /* Search upper memory: 16-byte boundaries in Eh-Fh */
+   if (rsdp) {
+   address = (u32)(ACPI_HI_RSDP_WINDOW_BASE +
+   ACPI_PTR_DIFF(rsdp, table_ptr));
+   return (acpi_physical_address)address;
+   }
+}
-- 
2.19.2

[PATCH v13 6/6] x86/boot/KASLR: Limit KASLR to extracting kernel in immovable memory

2018-12-11 Thread Chao Fan

KASLR may randomly choose positions located in movable memory
regions. This breaks the memory hotplug feature and makes the
movable memory chosen by KASLR practically immovable.

The solution is to limit KASLR to choosing memory regions in immovable
nodes according to the SRAT tables.
When CONFIG_EARLY_PARSE_RSDP is enabled, walk through SRAT to get the
information about immovable memory so that KASLR knows which regions
may be chosen for randomization.
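
The intersection between an e820/EFI region and an immovable region can
be pictured with the small sketch below. It is a simplified userspace
model: struct mem_range, clamp_u64() and intersect() are hypothetical
names, not the kernel's mem_vector, clamp() and mem_overlaps(). Both
ends of the first range are clamped into the second, which yields the
overlapping part whenever the two ranges overlap at all.

```c
#include <stdbool.h>
#include <stdint.h>

/* Half-open range [start, start + size). */
struct mem_range { uint64_t start; uint64_t size; };

static uint64_t clamp_u64(uint64_t v, uint64_t lo, uint64_t hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Store the overlap of a and b in *out; return false if there is none. */
static bool intersect(const struct mem_range *a, const struct mem_range *b,
		      struct mem_range *out)
{
	uint64_t a_end = a->start + a->size;
	uint64_t b_end = b->start + b->size;
	uint64_t start, end;

	/* Disjoint ranges: nothing to hand on. */
	if (a->start >= b_end || b->start >= a_end)
		return false;

	/* Clamp both ends of a into b to get the common part. */
	start = clamp_u64(a->start, b->start, b_end);
	end = clamp_u64(a_end, b->start, b_end);

	out->start = start;
	out->size = end - start;
	return true;
}
```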

Rename process_mem_region() to __process_mem_region() and name the new
function process_mem_region().

Signed-off-by: Chao Fan 
---
 arch/x86/boot/compressed/kaslr.c | 75 +++-
 1 file changed, 64 insertions(+), 11 deletions(-)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index b251572e77af..fb2cdddaa81a 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -97,6 +97,11 @@ static bool memmap_too_large;
 /* Store memory limit specified by "mem=nn[KMG]" or "memmap=nn[KMG]" */
 static unsigned long long mem_limit = ULLONG_MAX;
 
+#ifdef CONFIG_EARLY_PARSE_RSDP
+/* The immovable memory regions */
+extern struct mem_vector immovable_mem[MAX_NUMNODES*2];
+#endif
+
 
 enum mem_avoid_index {
MEM_AVOID_ZO_RANGE = 0,
@@ -413,6 +418,9 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
/* Mark the memmap regions we need to avoid */
handle_mem_options();
 
+   /* Mark the immovable regions we need to choose */
+   get_immovable_mem();
+
 #ifdef CONFIG_X86_VERBOSE_BOOTUP
/* Make sure video RAM can be used. */
add_identity_map(0, PMD_SIZE);
@@ -568,9 +576,9 @@ static unsigned long slots_fetch_random(void)
return 0;
 }
 
-static void process_mem_region(struct mem_vector *entry,
-  unsigned long minimum,
-  unsigned long image_size)
+static void __process_mem_region(struct mem_vector *entry,
+unsigned long minimum,
+unsigned long image_size)
 {
struct mem_vector region, overlap;
unsigned long start_orig, end;
@@ -646,6 +654,57 @@ static void process_mem_region(struct mem_vector *entry,
}
 }
 
+static bool process_mem_region(struct mem_vector *region,
+  unsigned long long minimum,
+  unsigned long long image_size)
+{
+   int i;
+   /*
+* If no immovable memory found, or MEMORY_HOTREMOVE disabled,
+* walk all the regions, so use region directly.
+*/
+   if (!num_immovable_mem) {
+   __process_mem_region(region, minimum, image_size);
+
+   if (slot_area_index == MAX_SLOT_AREA) {
+   debug_putstr("Aborted e820/efi memmap scan (slot_areas full)!\n");
+   return 1;
+   }
+   return 0;
+   }
+
+#ifdef CONFIG_EARLY_PARSE_RSDP
+   /*
+* If immovable memory found, filter the intersection between
+* immovable memory and region to __process_mem_region().
+* Otherwise, go on old code.
+*/
+   for (i = 0; i < num_immovable_mem; i++) {
+   struct mem_vector entry;
+   unsigned long long start, end, entry_end, region_end;
+
+   if (!mem_overlaps(region, &immovable_mem[i]))
+   continue;
+
+   start = immovable_mem[i].start;
+   end = start + immovable_mem[i].size;
+   region_end = region->start + region->size;
+
+   entry.start = clamp(region->start, start, end);
+   entry_end = clamp(region_end, start, end);
+   entry.size = entry_end - entry.start;
+
+   __process_mem_region(&entry, minimum, image_size);
+
+   if (slot_area_index == MAX_SLOT_AREA) {
+   debug_putstr("Aborted e820/efi memmap scan (slot_areas full)!\n");
+   return 1;
+   }
+   }
+   return 0;
+#endif
+}
+
 #ifdef CONFIG_EFI
 /*
  * Returns true if mirror region found (and must have been processed
@@ -711,11 +770,8 @@ process_efi_entries(unsigned long minimum, unsigned long image_size)
 
region.start = md->phys_addr;
region.size = md->num_pages << EFI_PAGE_SHIFT;
-   process_mem_region(&region, minimum, image_size);
-   if (slot_area_index == MAX_SLOT_AREA) {
-   debug_putstr("Aborted EFI scan (slot_areas full)!\n");
+   if (process_mem_region(&region, minimum, image_size))
break;
-   }
}
return true;
 }
@@ -742,11 +798,8 @@ static void process_e820_entries(unsigned long minimum,
continue;
region.start = entry->addr;
region.size = entry->size;
-   process_mem_region(&region, minimum,

[PATCH v13 5/6] x86/boot: Parse SRAT from RSDP and store immovable memory

2018-12-11 Thread Chao Fan

To fix the conflict between KASLR and memory-hotremove, locate SRAT
through RSDP and parse it, then find the immovable memory regions and
store them in an array called immovable_mem[]. With immovable_mem[],
KASLR can avoid extracting the kernel to movable regions.
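
How the immovable regions might be collected from the SRAT subtables is
sketched below over a plain byte buffer. This is only an illustration
under stated assumptions, not the code in this patch: the helper
collect_immovable(), struct mem_range and the SRAT_* constants are
hypothetical names, the field offsets (base at byte 8, length at 16,
flags at 28 of a 40-byte memory affinity entry) follow my reading of the
ACPI specification, and requiring the "enabled" flag is a choice made
for the sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SRAT_TYPE_MEMORY_AFFINITY 1
#define SRAT_MEM_AFFINITY_LENGTH  40
#define SRAT_MEM_ENABLED          (1u << 0)
#define SRAT_MEM_HOT_PLUGGABLE    (1u << 1)

struct mem_range { uint64_t start; uint64_t size; };

/*
 * Walk the subtables that follow the SRAT header and record every
 * enabled, non-hot-pluggable memory range. Returns the number stored.
 */
static size_t collect_immovable(const uint8_t *sub, size_t len,
				struct mem_range *out, size_t max)
{
	size_t n = 0;
	size_t off = 0;

	while (off + 2 <= len && n < max) {
		uint8_t type = sub[off];
		uint8_t sub_len = sub[off + 1];

		/* Malformed subtable: stop rather than loop forever. */
		if (sub_len < 2 || off + sub_len > len)
			break;

		if (type == SRAT_TYPE_MEMORY_AFFINITY &&
		    sub_len >= SRAT_MEM_AFFINITY_LENGTH) {
			uint64_t base, size;
			uint32_t flags;

			/* memcpy avoids unaligned access; host byte order assumed. */
			memcpy(&base, sub + off + 8, sizeof(base));
			memcpy(&size, sub + off + 16, sizeof(size));
			memcpy(&flags, sub + off + 28, sizeof(flags));

			if ((flags & SRAT_MEM_ENABLED) &&
			    !(flags & SRAT_MEM_HOT_PLUGGABLE)) {
				out[n].start = base;
				out[n].size = size;
				n++;
			}
		}
		off += sub_len;
	}
	return n;
}
```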

Signed-off-by: Chao Fan 
---
 arch/x86/boot/compressed/Makefile |   2 +
 arch/x86/boot/compressed/acpi.c   | 128 ++
 arch/x86/boot/compressed/kaslr.c  |   4 -
 arch/x86/boot/compressed/misc.h   |  15 
 4 files changed, 145 insertions(+), 4 deletions(-)

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 466f66c8a7f8..bcc6560fb2ec 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -84,6 +84,8 @@ ifdef CONFIG_X86_64
vmlinux-objs-y += $(obj)/pgtable_64.o
 endif
 
+vmlinux-objs-$(CONFIG_EARLY_PARSE_RSDP) += $(obj)/acpi.o
+
 $(obj)/eboot.o: KBUILD_CFLAGS += -fshort-wchar -mno-red-zone
 
 vmlinux-objs-$(CONFIG_EFI_STUB) += $(obj)/eboot.o $(obj)/efi_stub_$(BITS).o \
diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
index c546a463b8ba..6fa0a1ff5ba0 100644
--- a/arch/x86/boot/compressed/acpi.c
+++ b/arch/x86/boot/compressed/acpi.c
@@ -13,6 +13,9 @@
 
 #include "../string.h"
 
+/* Store the immovable memory regions. */
+struct mem_vector immovable_mem[MAX_NUMNODES*2];
+
 static acpi_physical_address get_acpi_rsdp(void)
 {
 #ifdef CONFIG_KEXEC
@@ -192,3 +195,128 @@ static acpi_physical_address bios_get_rsdp_addr(void)
return (acpi_physical_address)address;
}
 }
+
+/* Determine RSDP, based on acpi_os_get_root_pointer(). */
+static acpi_physical_address get_rsdp_addr(void)
+{
+   acpi_physical_address pa = 0;
+
+   pa = get_acpi_rsdp();
+
+   if (!pa)
+   pa = efi_get_rsdp_addr();
+
+   if (!pa)
+   pa = bios_get_rsdp_addr();
+
+   return pa;
+}
+
+/* Compute SRAT address from RSDP. */
+static struct acpi_table_header *get_acpi_srat_table(void)
+{
+   acpi_physical_address acpi_table;
+   acpi_physical_address root_table;
+   struct acpi_table_header *header;
+   struct acpi_table_rsdp *rsdp;
+   u32 num_entries;
+   char arg[10];
+   u8 *entry;
+   u32 size;
+   u32 len;
+
+   rsdp = (struct acpi_table_rsdp *)get_rsdp_addr();
+   if (!rsdp)
+   return NULL;
+
+   /* Get RSDT or XSDT from RSDP. */
+   if (!(cmdline_find_option("acpi", arg, sizeof(arg)) == 4 &&
+   !strncmp(arg, "rsdt", 4)) &&
+   rsdp->xsdt_physical_address &&
+   rsdp->revision > 1) {
+   root_table = rsdp->xsdt_physical_address;
+   size = ACPI_XSDT_ENTRY_SIZE;
+   } else {
+   root_table = rsdp->rsdt_physical_address;
+   size = ACPI_RSDT_ENTRY_SIZE;
+   }
+
+   /* Get ACPI root table from RSDT or XSDT.*/
+   header = (struct acpi_table_header *)root_table;
+   if (!header)
+   return NULL;
+
+   len = header->length;
+   if (len < sizeof(struct acpi_table_header) + size)
+   return NULL;
+
+   num_entries = (u32)((len - sizeof(struct acpi_table_header)) / size);
+   entry = ACPI_ADD_PTR(u8, header, sizeof(struct acpi_table_header));
+
+   while (num_entries--) {
+   u64 address64;
+
+   if (size == ACPI_RSDT_ENTRY_SIZE)
+   acpi_table = ((acpi_physical_address)
+ (*ACPI_CAST_PTR(u32, entry)));
+   else {
+   *(u64 *)(void *)&address64 = *(u64 *)(void *)entry;
+   acpi_table = (acpi_physical_address) address64;
+   }
+
+   if (acpi_table) {
+   header = (struct acpi_table_header *)acpi_table;
+
+   if (ACPI_COMPARE_NAME(header->signature, ACPI_SIG_SRAT))
+   return header;
+   }
+   entry += size;
+   }
+   return NULL;
+}
+
+/*
+ * According to ACPI table, filter the immovable memory regions
+ * and store them in immovable_mem[].
+ */
+void get_immovable_mem(void)
+{
+   struct acpi_table_header *table_header;
+   struct acpi_subtable_header *table;
+   struct acpi_srat_mem_affinity *ma;
+   unsigned long table_end;
+   char arg[10];
+   int i = 0;
+
+   if (cmdline_find_option("acpi", arg, sizeof(arg)) == 3 &&
+   !strncmp(arg, "off", 3))
+   return;
+
+   table_header = get_acpi_srat_table();
+   if (!table_header)
+   return;
+
+   table_end = (unsigned long)table_header + table_header->length;
+   table = (struct acpi_subtable_header *)
+   ((unsigned long)table_header + sizeof(struct acpi_table_srat));
+
+   while (((unsigned long)table) +
+  sizeof(struct acpi_subtable_header) < table_end) {
+   if (table->type ==

[PATCH v13 1/6] x86/boot: Introduce kstrtoull() to boot directory instead of simple_strtoull()

2018-12-11 Thread Chao Fan

Copy kstrtoull() from lib/kstrtox.c into the boot directory so that code
in boot/ can use kstrtoull() and the old simple_strtoull() can be
replaced.
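
The calling convention that distinguishes kstrtoull() from
simple_strtoull() can be shown with a reduced userspace sketch. It is
not the code added by this patch: parse_ull() is a hypothetical name,
base 0 auto-detection is left out, and the overflow test uses a plain
division instead of div_u64(). The status comes back as the return
value (0, -EINVAL or -ERANGE) and the number is stored through the
pointer only on success.

```c
#include <errno.h>
#include <limits.h>

/* Reduced model of kstrtoull(): explicit base 2..16 only. */
static int parse_ull(const char *s, unsigned int base, unsigned long long *res)
{
	unsigned long long acc = 0;
	unsigned int digits = 0;

	if (base < 2 || base > 16)
		return -EINVAL;
	if (*s == '+')
		s++;
	/* Skip an optional "0x"/"0X" prefix for hexadecimal input. */
	if (base == 16 && s[0] == '0' && (s[1] | 0x20) == 'x')
		s += 2;

	for (; *s; s++, digits++) {
		unsigned int c = (unsigned char)*s;
		unsigned int lc = c | 0x20;	/* fold 'A'-'F' to 'a'-'f' */
		unsigned int val;

		if (c >= '0' && c <= '9')
			val = c - '0';
		else if (lc >= 'a' && lc <= 'f')
			val = lc - 'a' + 10;
		else
			break;
		if (val >= base)
			break;
		/* Would acc * base + val exceed ULLONG_MAX? */
		if (acc > (ULLONG_MAX - val) / base)
			return -ERANGE;
		acc = acc * base + val;
	}

	/* A single trailing newline is tolerated, anything else is not. */
	if (*s == '\n')
		s++;
	if (digits == 0 || *s)
		return -EINVAL;

	*res = acc;
	return 0;
}
```

A caller therefore checks the return value first and reads *res only
when it is 0.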

Signed-off-by: Chao Fan 
---
 arch/x86/boot/string.c | 137 +
 arch/x86/boot/string.h |   2 +
 2 files changed, 139 insertions(+)

diff --git a/arch/x86/boot/string.c b/arch/x86/boot/string.c
index c4428a176973..e02405f20f98 100644
--- a/arch/x86/boot/string.c
+++ b/arch/x86/boot/string.c
@@ -12,7 +12,10 @@
  * Very basic string functions
  */
 
+#define _LINUX_CTYPE_H
 #include 
+#include 
+#include 
 #include 
 #include "ctype.h"
 #include "string.h"
@@ -187,3 +190,137 @@ char *strchr(const char *s, int c)
return NULL;
return (char *)s;
 }
+
+static inline u64 div_u64_rem(u64 dividend, u32 divisor, u32 *remainder)
+{
+   union {
+   u64 v64;
+   u32 v32[2];
+   } d = { dividend };
+   u32 upper;
+
+   upper = d.v32[1];
+   d.v32[1] = 0;
+   if (upper >= divisor) {
+   d.v32[1] = upper / divisor;
+   upper %= divisor;
+   }
+   asm ("divl %2" : "=a" (d.v32[0]), "=d" (*remainder) :
+   "rm" (divisor), "0" (d.v32[0]), "1" (upper));
+   return d.v64;
+}
+
+static inline u64 div_u64(u64 dividend, u32 divisor)
+{
+   u32 remainder;
+   return div_u64_rem(dividend, divisor, &remainder);
+}
+
+static inline char _tolower(const char c)
+{
+   return c | 0x20;
+}
+
+const char *_parse_integer_fixup_radix(const char *s, unsigned int *base)
+{
+   if (*base == 0) {
+   if (s[0] == '0') {
+   if (_tolower(s[1]) == 'x' && isxdigit(s[2]))
+   *base = 16;
+   else
+   *base = 8;
+   } else
+   *base = 10;
+   }
+   if (*base == 16 && s[0] == '0' && _tolower(s[1]) == 'x')
+   s += 2;
+   return s;
+}
+
+/*
+ * Convert non-negative integer string representation in explicitly given radix
+ * to an integer.
+ * Return number of characters consumed maybe or-ed with overflow bit.
+ * If overflow occurs, result integer (incorrect) is still returned.
+ *
+ * Don't you dare use this function.
+ */
+unsigned int _parse_integer(const char *s, unsigned int base, unsigned long long *p)
+{
+   unsigned long long res;
+   unsigned int rv;
+
+   res = 0;
+   rv = 0;
+   while (1) {
+   unsigned int c = *s;
+   unsigned int lc = c | 0x20; /* don't tolower() this line */
+   unsigned int val;
+
+   if ('0' <= c && c <= '9')
+   val = c - '0';
+   else if ('a' <= lc && lc <= 'f')
+   val = lc - 'a' + 10;
+   else
+   break;
+
+   if (val >= base)
+   break;
+   /*
+* Check for overflow only if we are within range of
+* it in the max base we support (16)
+*/
+   if (unlikely(res & (~0ull << 60))) {
+   if (res > div_u64(ULLONG_MAX - val, base))
+   rv |= KSTRTOX_OVERFLOW;
+   }
+   res = res * base + val;
+   rv++;
+   s++;
+   }
+   *p = res;
+   return rv;
+}
+
+static int _kstrtoull(const char *s, unsigned int base, unsigned long long *res)
+{
+   unsigned long long _res;
+   unsigned int rv;
+
+   s = _parse_integer_fixup_radix(s, &base);
+   rv = _parse_integer(s, base, &_res);
+   if (rv & KSTRTOX_OVERFLOW)
+   return -ERANGE;
+   if (rv == 0)
+   return -EINVAL;
+   s += rv;
+   if (*s == '\n')
+   s++;
+   if (*s)
+   return -EINVAL;
+   *res = _res;
+   return 0;
+}
+
+/**
+ * kstrtoull - convert a string to an unsigned long long
+ * @s: The start of the string. The string must be null-terminated, and may also
+ *  include a single newline before its terminating null. The first character
+ *  may also be a plus sign, but not a minus sign.
+ * @base: The number base to use. The maximum supported base is 16. If base is
+ *  given as 0, then the base of the string is automatically detected with the
+ *  conventional semantics - If it begins with 0x the number will be parsed as a
+ *  hexadecimal (case insensitive), if it otherwise begins with 0, it will be
+ *  parsed as an octal number. Otherwise it will be parsed as a decimal.
+ * @res: Where to write the result of the conversion on success.
+ *
+ * Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error.
+ * Used as a replacement for the obsolete simple_strtoull. Return code must
+ * be checked.
+ */
+int kstrtoull(const char *s, unsigned int base, unsigned long long *res)
+{
+   if (s[0] == '+')
+   s++;
+

Re: [PATCH v2] f2fs: fix sbi->extent_list corruption issue

2018-12-11 Thread Sahitya Tummala

On Fri, Dec 07, 2018 at 05:47:31PM +0800, Chao Yu wrote:
> On 2018/12/1 4:33, Jaegeuk Kim wrote:
> > On 11/29, Sahitya Tummala wrote:
> >>
> >> On Tue, Nov 27, 2018 at 09:42:39AM +0800, Chao Yu wrote:
> >>> On 2018/11/27 8:30, Jaegeuk Kim wrote:
>  On 11/26, Sahitya Tummala wrote:
> > When there is a failure in f2fs_fill_super() after/during
> > the recovery of fsync'd nodes, it frees the current sbi and
> > retries again. This time the mount is successful, but the files
> > > that got recovered before the retry still hold the extent tree,
> > > whose extent node list is corrupted since sbi and sbi->extent_list
> > > are freed up. The list_del corruption issue is observed when the
> > > file system is getting unmounted and the extent nodes of those
> > > recovered files are being freed up in the below context.
> >
> > list_del corruption. prev->next should be fff1e1ef5480, but was 
> > (null)
> > <...>
> > kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
> > task: fff1f46f2280 task.stack: ff8008068000
> > lr : __list_del_entry_valid+0x94/0xb4
> > pc : __list_del_entry_valid+0x94/0xb4
> > <...>
> > Call trace:
> > __list_del_entry_valid+0x94/0xb4
> > __release_extent_node+0xb0/0x114
> > __free_extent_tree+0x58/0x7c
> > f2fs_shrink_extent_tree+0xdc/0x3b0
> > f2fs_leave_shrinker+0x28/0x7c
> > f2fs_put_super+0xfc/0x1e0
> > generic_shutdown_super+0x70/0xf4
> > kill_block_super+0x2c/0x5c
> > kill_f2fs_super+0x44/0x50
> > deactivate_locked_super+0x60/0x8c
> > deactivate_super+0x68/0x74
> > cleanup_mnt+0x40/0x78
> > __cleanup_mnt+0x1c/0x28
> > task_work_run+0x48/0xd0
> > do_notify_resume+0x678/0xe98
> > work_pending+0x8/0x14
> >
> > Fix this by cleaning up inodes, extent tree and nodes of those
> > recovered files before freeing up sbi and before next retry.
> >
> > Signed-off-by: Sahitya Tummala 
> > ---
> > v2:
> > -call evict_inodes() and f2fs_shrink_extent_tree() to cleanup inodes
> >
> >  fs/f2fs/f2fs.h |  1 +
> >  fs/f2fs/shrinker.c |  2 +-
> >  fs/f2fs/super.c| 13 -
> >  3 files changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index 1e03197..aaee63b 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -3407,6 +3407,7 @@ struct rb_entry *f2fs_lookup_rb_tree_ret(struct 
> > rb_root_cached *root,
> >  bool f2fs_check_rb_tree_consistence(struct f2fs_sb_info *sbi,
> > struct rb_root_cached 
> > *root);
> >  unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int 
> > nr_shrink);
> > +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi);
> >  bool f2fs_init_extent_tree(struct inode *inode, struct f2fs_extent 
> > *i_ext);
> >  void f2fs_drop_extent_tree(struct inode *inode);
> >  unsigned int f2fs_destroy_extent_node(struct inode *inode);
> > diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c
> > index 9e13db9..7e3c13b 100644
> > --- a/fs/f2fs/shrinker.c
> > +++ b/fs/f2fs/shrinker.c
> > @@ -30,7 +30,7 @@ static unsigned long __count_free_nids(struct 
> > f2fs_sb_info *sbi)
> > return count > 0 ? count : 0;
> >  }
> >  
> > -static unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
> > +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
> >  {
> > return atomic_read(&sbi->total_zombie_tree) +
> > atomic_read(&sbi->total_ext_node);
> > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> > index af58b2c..769e7b1 100644
> > --- a/fs/f2fs/super.c
> > +++ b/fs/f2fs/super.c
> > @@ -3016,6 +3016,16 @@ static void f2fs_tuning_parameters(struct 
> > f2fs_sb_info *sbi)
> > sbi->readdir_ra = 1;
> >  }
> >  
> > +static void f2fs_cleanup_inodes(struct f2fs_sb_info *sbi)
> > +{
> > +   struct super_block *sb = sbi->sb;
> > +
> > +   sync_filesystem(sb);
> 
>  This writes another checkpoint, which would not be what this retrial 
>  intended.
> >>>
> >>> Actually, checkpoint will not be triggered due to SBI_POR_DOING flag check
> >>> as below:
> >>>
> >>> int f2fs_sync_fs(struct super_block *sb, int sync)
> >>> {
> >>> ...
> >>>   if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
> >>>   return -EAGAIN;
> >>> ...
> >>> }
> >>>
> >>> And also all dirty data/node won't be persisted due to SBI_POR_DOING flag,
> >>> IIUC.
> >>>
> >>
> >> Thanks Chao for the clarification.
> >>
> >> Hi Jaegeuk,
> >>
> >> Do you still have any further concerns or comments on this patch?
> > 
> > Could you try the below first?
> > 
> > --  How about adding a condition in f2fs_may_extent_tree() when adding 
> > extents?
> > -- Likewise, if (shr

Re: [LKP] [f2fs] 089842de57: aim7.jobs-per-min 15.4% improvement

2018-12-11 Thread Chao Yu

On 2018/12/12 11:01, Rong Chen wrote:
> 
> 
> On 12/11/2018 06:12 PM, Chao Yu wrote:
>> Hi all,
>>
>> The commit only cleans up code that is currently unused, so why would
>> it improve performance? Could you retest to make sure?
> 
> Hi Chao,
> 
> the improvement does exist in the 0day environment.

Hi Rong,

Logically, the deleted code is dead, and removing it shouldn't impact
any flows.

Instead, I expect the patch below can improve performance in some cases:

https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev-test&id=1e771e83ce26d0ba2ce6c7df7effb7822f032c4a

So I suspect there is a problem in the current test result; I'm not sure
whether the problem is in the test suite or the test environment.

Any thoughts?

Thanks,

> 
> ➜  job cat 
> /result/aim7/4BRD_12G-RAID1-f2fs-disk_rw-3000-performance/lkp-ivb-ep01/debian-x86_64-2018-04-03.cgz/x86_64-rhel-7.2/gcc-7/089842de5750f434aa016eb23f3d3a3a151083bd/*/aim7.json
>  
> | grep -A1 min\"
>    "aim7.jobs-per-min": [
>      111406.82
> --
>    "aim7.jobs-per-min": [
>      110851.09
> --
>    "aim7.jobs-per-min": [
>      111399.93
> --
>    "aim7.jobs-per-min": [
>      110327.92
> --
>    "aim7.jobs-per-min": [
>      110321.16
> 
> ➜  job cat 
> /result/aim7/4BRD_12G-RAID1-f2fs-disk_rw-3000-performance/lkp-ivb-ep01/debian-x86_64-2018-04-03.cgz/x86_64-rhel-7.2/gcc-7/d6c66cd19ef322fe0d51ba09ce1b7f386acab04a/*/aim7.json
>  
> | grep -A1 min\"
>    "aim7.jobs-per-min": [
>      97082.14
> --
>    "aim7.jobs-per-min": [
>      95959.06
> --
>    "aim7.jobs-per-min": [
>      95959.06
> --
>    "aim7.jobs-per-min": [
>      95851.75
> --
>    "aim7.jobs-per-min": [
>      96946.19
> 
> Best Regards,
> Rong Chen
> 
>>
>> Thanks,
>>
>> On 2018/12/11 17:59, kernel test robot wrote:
>>> Greeting,
>>>
>>> FYI, we noticed a 15.4% improvement of aim7.jobs-per-min due to commit:
>>>
>>>
>>> commit: 089842de5750f434aa016eb23f3d3a3a151083bd ("f2fs: remove codes of unused wio_mutex")
>>> https://git.kernel.org/cgit/linux/kernel/git/jaegeuk/f2fs.git dev-test
>>>
>>> in testcase: aim7
>>> on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory
>>> with following parameters:
>>>
>>> disk: 4BRD_12G
>>> md: RAID1
>>> fs: f2fs
>>> test: disk_rw
>>> load: 3000
>>> cpufreq_governor: performance
>>>
>>> test-description: AIM7 is a traditional UNIX system level benchmark suite which is used to test and measure the performance of multiuser system.
>>> test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/
>>>
>>> In addition to that, the commit also has significant impact on the following tests:
>>>
>>> +------------------+-----------------------------------------------------------------------+
>>> | testcase: change | aim7: aim7.jobs-per-min 8.8% improvement                              |
>>> | test machine     | 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory |
>>> | test parameters  | cpufreq_governor=performance                                          |
>>> |                  | disk=4BRD_12G                                                         |
>>> |                  | fs=f2fs                                                               |
>>> |                  | load=3000                                                             |
>>> |                  | md=RAID1                                                              |
>>> |                  | test=disk_rr                                                          |
>>> +------------------+-----------------------------------------------------------------------+
>>>
>>>
>>> Details are as below:
>>> -->
>>>
>>>
>>> To reproduce:
>>>
>>>  git clone https://github.com/intel/lkp-tests.git
>>>  cd lkp-tests
>>>  bin/lkp install job.yaml  # job file is attached in this email
>>>  bin/lkp run job.yaml
>>>
>>> =
>>> compiler/cpufreq_governor/disk/fs/kconfig/load/md/rootfs/tbox_group/test/testcase:
>>>
>>> gcc-7/performance/4BRD_12G/f2fs/x86_64-rhel-7.2/3000/RAID1/debian-x86_64-2018-04-03.cgz/lkp-ivb-ep01/disk_rw/aim7
>>>
>>> commit:
>>>d6c66cd19e ("f2fs: fix count of seg_freed to make sec_freed correct")
>>>089842de57 ("f2fs: remove codes of unused wio_mutex")
>>>
>>> d6c66cd19ef322fe 089842de5750f434aa016eb23f
>>> ---------------- --------------------------
>>>          %stddev     %change         %stddev
>>>              \          |                \
>>>      96213           +15.4%     110996        aim7.jobs-per-min
>>>     191.50 ±  3%     -15.1%     162.52        aim7.time.elapsed_time
>>>     191.50 ±  3%     -15.1%     162.52        aim7.time.elapsed_time.max
>>>    1090253 ±  2%     -17.5%     899165        aim7.time.involun

RE: [Intel-wired-lan] [PATCH] igb: Fix an issue that PME is not enabled during runtime suspend

2018-12-11 Thread Brown, Aaron F

> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org] On
> Behalf Of Kai-Heng Feng
> Sent: Sunday, December 2, 2018 9:55 PM
> To: Kirsher, Jeffrey T 
> Cc: net...@vger.kernel.org; Kai-Heng Feng
> ; intel-wired-...@lists.osuosl.org;
> da...@davemloft.net; linux-kernel@vger.kernel.org
> Subject: [Intel-wired-lan] [PATCH] igb: Fix an issue that PME is not enabled
> during runtime suspend
> 
> I210 ethernet card doesn't wake up when a cable gets plugged in. It's
> because its PME is not set.
> 
> Since commit 42eca2302146 ("PCI: Don't touch card regs after runtime
> suspend D3"), if the PCI state is saved, pci_pm_runtime_suspend() stops
> calling pci_finish_runtime_suspend(), which enables the PCI PME.
> 
> To fix the issue, let's not save PCI state during runtime
> suspend, so that the PCI subsystem enables PME.
> 
> Fixes: 42eca2302146 ("PCI: Don't touch card regs after runtime suspend D3")
> Signed-off-by: Kai-Heng Feng 
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)

Tested-by: Aaron Brown
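
The interaction described in the commit message can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: struct fake_pci_dev, driver_suspend(), fake_runtime_suspend() and pme_after_suspend() are hypothetical stand-ins, not the kernel's PCI core. The only behaviour taken from the message is that the core skips the step that enables PME when the driver has already saved PCI state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the pieces of a PCI device that matter here. */
struct fake_pci_dev {
	bool state_saved;   /* driver already saved PCI state */
	bool pme_enabled;   /* wake-up signalling is armed */
};

/* Models the driver's runtime-suspend callback. */
static void driver_suspend(struct fake_pci_dev *dev, bool save_state)
{
	if (save_state)
		dev->state_saved = true;
}

/*
 * Models the core side: if the driver saved state itself, the core
 * returns early and never reaches the step that enables PME.
 */
static void fake_runtime_suspend(struct fake_pci_dev *dev, bool driver_saves_state)
{
	dev->state_saved = false;
	dev->pme_enabled = false;

	driver_suspend(dev, driver_saves_state);
	if (dev->state_saved)
		return;

	dev->pme_enabled = true;   /* the "finish runtime suspend" step */
}

/* Convenience wrapper: is PME armed after a runtime suspend? */
static bool pme_after_suspend(bool driver_saves_state)
{
	struct fake_pci_dev dev = { false, false };

	fake_runtime_suspend(&dev, driver_saves_state);
	return dev.pme_enabled;
}
```

In this model, pme_after_suspend(true) corresponds to the reported failure (no wake-up on cable plug) and pme_after_suspend(false) to the behaviour the patch aims for.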

Re: [PATCH v2] mm, memory_hotplug: Don't bail out in do_migrate_range prematurely

2018-12-11 Thread Wei Yang

On Tue, Dec 11, 2018 at 02:53:12PM +0100, Oscar Salvador wrote:
>v1 -> v2:
>- Keep branch to decrease refcount and print out
>  the failed pfn/page
>- Modified changelog per Michal's feedback
>- move put_page() out of the if/else branch
>
>---
>From f81da873be9a5b7845249d1e62a423f054c487d5 Mon Sep 17 00:00:00 2001
>From: Oscar Salvador 
>Date: Tue, 11 Dec 2018 11:45:19 +0100
>Subject: [PATCH] mm, memory_hotplug: Don't bail out in do_migrate_range
> prematurely
>
>do_migrate_range() takes a memory range and tries to isolate the
>pages to put them into a list.
>This list will be later on used in migrate_pages() to know
>the pages we need to migrate.
>
>Currently, if we fail to isolate a single page, we put all already
>isolated pages back to their LRU and we bail out from the function.
>This is quite suboptimal, as this will force us to start over again
>because scan_movable_pages will give us the same range.
>If there is no chance that we can isolate that page, we will loop here
>forever.
>
>Issue debugged in [1] has proved that.
>During the debugging of that issue, it was noticed that if
>do_migrate_range() fails to isolate a single page, we will
>just discard the work we have done so far and bail out, which means
>that scan_movable_pages() will find again the same set of pages.
>
>Instead, we can just skip the error, keep isolating as many pages
>as possible and then proceed with the call to migrate_pages().
>
>This will allow us to do as much work as possible at once.
>
>[1] https://lkml.org/lkml/2018/12/6/324
>
>Signed-off-by: Oscar Salvador 
>---
> mm/memory_hotplug.c | 18 ++----------------
> 1 file changed, 2 insertions(+), 16 deletions(-)
>
>diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>index 86ab673fc4e3..68e740b1768e 100644
>--- a/mm/memory_hotplug.c
>+++ b/mm/memory_hotplug.c
>@@ -1339,7 +1339,6 @@ do_migrate_range(unsigned long start_pfn, unsigned long 
>end_pfn)
>   unsigned long pfn;
>   struct page *page;
>   int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
>-  int not_managed = 0;
>   int ret = 0;
>   LIST_HEAD(source);
> 
>@@ -1388,7 +1387,6 @@ do_migrate_range(unsigned long start_pfn, unsigned long 
>end_pfn)
>   else
>   ret = isolate_movable_page(page, ISOLATE_UNEVICTABLE);
>   if (!ret) { /* Success */
>-  put_page(page);
>   list_add_tail(&page->lru, &source);
>   move_pages--;
>   if (!__PageMovable(page))
>@@ -1398,22 +1396,10 @@ do_migrate_range(unsigned long start_pfn, unsigned 
>long end_pfn)
>   } else {
>   pr_warn("failed to isolate pfn %lx\n", pfn);
>   dump_page(page, "isolation failed");

I see the above code is wrapped in CONFIG_DEBUG_VM in Linus' current tree.

Has that been removed by someone else?

>-  put_page(page);
>-  /* Because we don't have big zone->lock. we should
>- check this again here. */
>-  if (page_count(page)) {
>-  not_managed++;
>-  ret = -EBUSY;
>-  break;
>-  }
>   }
>+  put_page(page);
>   }
>   if (!list_empty(&source)) {
>-  if (not_managed) {
>-  putback_movable_pages(&source);
>-  goto out;
>-  }
>-
>   /* Allocate a new page from the nearest neighbor node */
>   ret = migrate_pages(&source, new_node_page, NULL, 0,
>   MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
>@@ -1426,7 +1412,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long 
>end_pfn)
>   putback_movable_pages(&source);
>   }
>   }
>-out:
>+
>   return ret;
> }
> 
>-- 
>2.13.7

-- 
Wei Yang
Help you, Help me
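
The behavioural change in the quoted changelog (bail out on the first isolation failure versus skip it and keep going) can be illustrated with a small user-space model. This is a sketch, not kernel code: isolate_fn, every_fourth_busy(), collect_bail_out() and collect_skip_failures() are hypothetical names, and the "pages" are plain indices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical isolation predicate: true when the page can be isolated. */
typedef bool (*isolate_fn)(size_t pfn);

/* Example predicate: every fourth page (pfn 3, 7, ...) cannot be isolated. */
static bool every_fourth_busy(size_t pfn)
{
	return pfn % 4 != 3;
}

/* Old behaviour: the first failure discards all work done so far. */
static size_t collect_bail_out(size_t start, size_t end, isolate_fn isolate)
{
	size_t isolated = 0;

	for (size_t pfn = start; pfn < end; pfn++) {
		if (!isolate(pfn))
			return 0;   /* everything isolated so far is put back */
		isolated++;
	}
	return isolated;
}

/* New behaviour: skip the failing page and keep isolating the rest. */
static size_t collect_skip_failures(size_t start, size_t end, isolate_fn isolate)
{
	size_t isolated = 0;

	for (size_t pfn = start; pfn < end; pfn++) {
		if (!isolate(pfn))
			continue;   /* in the patch, this is where the failure is logged */
		isolated++;
	}
	return isolated;
}
```

With the example predicate over pfns 0..7, the first variant hands nothing to migration while the second hands over six pages, which is the "as much work as possible at once" point made in the changelog.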

Re: [PATCH v2] f2fs: fix sbi->extent_list corruption issue

2018-12-11 Thread Chao Yu

On 2018/12/12 11:17, Sahitya Tummala wrote:
> On Fri, Dec 07, 2018 at 05:47:31PM +0800, Chao Yu wrote:
>> On 2018/12/1 4:33, Jaegeuk Kim wrote:
>>> On 11/29, Sahitya Tummala wrote:

>>>> On Tue, Nov 27, 2018 at 09:42:39AM +0800, Chao Yu wrote:
>>>>> On 2018/11/27 8:30, Jaegeuk Kim wrote:
>>>>>> On 11/26, Sahitya Tummala wrote:
>>>>>>> When there is a failure in f2fs_fill_super() after/during
>>>>>>> the recovery of fsync'd nodes, it frees the current sbi and
>>>>>>> retries again. This time the mount is successful, but the files
>>>>>>> that got recovered before retry, still holds the extent tree,
>>>>>>> whose extent nodes list is corrupted since sbi and sbi->extent_list
>>>>>>> is freed up. The list_del corruption issue is observed when the
>>>>>>> file system is getting unmounted and when those recovered files extent
>>>>>>> node is being freed up in the below context.
>>>>>>>
>>>>>>> list_del corruption. prev->next should be fff1e1ef5480, but was
>>>>>>> (null)
>>>>>>> <...>
>>>>>>> kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
>>>>>>> task: fff1f46f2280 task.stack: ff8008068000
>>>>>>> lr : __list_del_entry_valid+0x94/0xb4
>>>>>>> pc : __list_del_entry_valid+0x94/0xb4
>>>>>>> <...>
>>>>>>> Call trace:
>>>>>>> __list_del_entry_valid+0x94/0xb4
>>>>>>> __release_extent_node+0xb0/0x114
>>>>>>> __free_extent_tree+0x58/0x7c
>>>>>>> f2fs_shrink_extent_tree+0xdc/0x3b0
>>>>>>> f2fs_leave_shrinker+0x28/0x7c
>>>>>>> f2fs_put_super+0xfc/0x1e0
>>>>>>> generic_shutdown_super+0x70/0xf4
>>>>>>> kill_block_super+0x2c/0x5c
>>>>>>> kill_f2fs_super+0x44/0x50
>>>>>>> deactivate_locked_super+0x60/0x8c
>>>>>>> deactivate_super+0x68/0x74
>>>>>>> cleanup_mnt+0x40/0x78
>>>>>>> __cleanup_mnt+0x1c/0x28
>>>>>>> task_work_run+0x48/0xd0
>>>>>>> do_notify_resume+0x678/0xe98
>>>>>>> work_pending+0x8/0x14
>>>>>>>
>>>>>>> Fix this by cleaning up inodes, extent tree and nodes of those
>>>>>>> recovered files before freeing up sbi and before next retry.
>>>>>>>
>>>>>>> Signed-off-by: Sahitya Tummala 
>>>>>>> ---
>>>>>>> v2:
>>>>>>> -call evict_inodes() and f2fs_shrink_extent_tree() to cleanup inodes
>>>>>>>
>>>>>>>  fs/f2fs/f2fs.h     |  1 +
>>>>>>>  fs/f2fs/shrinker.c |  2 +-
>>>>>>>  fs/f2fs/super.c    | 13 ++++++++++++-
>>>>>>>  3 files changed, 14 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>>>>>> index 1e03197..aaee63b 100644
>>>>>>> --- a/fs/f2fs/f2fs.h
>>>>>>> +++ b/fs/f2fs/f2fs.h
>>>>>>> @@ -3407,6 +3407,7 @@ struct rb_entry *f2fs_lookup_rb_tree_ret(struct 
>>>>>>> rb_root_cached *root,
>>>>>>>  bool f2fs_check_rb_tree_consistence(struct f2fs_sb_info *sbi,
>>>>>>> struct rb_root_cached 
>>>>>>> *root);
>>>>>>>  unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int 
>>>>>>> nr_shrink);
>>>>>>> +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi);
>>>>>>>  bool f2fs_init_extent_tree(struct inode *inode, struct f2fs_extent 
>>>>>>> *i_ext);
>>>>>>>  void f2fs_drop_extent_tree(struct inode *inode);
>>>>>>>  unsigned int f2fs_destroy_extent_node(struct inode *inode);
>>>>>>> diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c
>>>>>>> index 9e13db9..7e3c13b 100644
>>>>>>> --- a/fs/f2fs/shrinker.c
>>>>>>> +++ b/fs/f2fs/shrinker.c
>>>>>>> @@ -30,7 +30,7 @@ static unsigned long __count_free_nids(struct 
>>>>>>> f2fs_sb_info *sbi)
>>>>>>> return count > 0 ? count : 0;
>>>>>>>  }
>>>>>>>  
>>>>>>> -static unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
>>>>>>> +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
>>>>>>>  {
>>>>>>> return atomic_read(&sbi->total_zombie_tree) +
>>>>>>> atomic_read(&sbi->total_ext_node);
>>>>>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>>>>>>> index af58b2c..769e7b1 100644
>>>>>>> --- a/fs/f2fs/super.c
>>>>>>> +++ b/fs/f2fs/super.c
>>>>>>> @@ -3016,6 +3016,16 @@ static void f2fs_tuning_parameters(struct 
>>>>>>> f2fs_sb_info *sbi)
>>>>>>> sbi->readdir_ra = 1;
>>>>>>>  }
>>>>>>>  
>>>>>>> +static void f2fs_cleanup_inodes(struct f2fs_sb_info *sbi)
>>>>>>> +{
>>>>>>> +   struct super_block *sb = sbi->sb;
>>>>>>> +
>>>>>>> +   sync_filesystem(sb);
>>>>>>
>>>>>> This writes another checkpoint, which would not be what this retrial 
>>>>>> intended.
>>>>>
>>>>> Actually, checkpoint will not be triggered due to SBI_POR_DOING flag check
>>>>> as below:
>>>>>
>>>>> int f2fs_sync_fs(struct super_block *sb, int sync)
>>>>> {
>>>>> ...
>>>>>   if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
>>>>>   return -EAGAIN;
>>>>> ...
>>>>> }
>>>>>
>>>>> And also all dirty data/node won't be persisted due to SBI_POR_DOING flag,
>>>>> IIUC.
>>>>>
>>>>
>>>> Thanks Chao for the clarification.
>>>>
>>>> Hi Jaegeuk,
>>>>
>>>> Do you still have any further concerns or comments on this patch?
>>>
>>> Could you try the below first?
>>>
>>> --  How about adding a condition in f2fs_may_extent_tree() when

Re: [PATCH v2] x86/speculation: Add support for STIBP always-on preferred mode

2018-12-11 Thread Thomas Gleixner

On Wed, 12 Dec 2018, Borislav Petkov wrote:
> On Tue, Dec 11, 2018 at 10:46:16PM +, Lendacky, Thomas wrote:
> > +   /*
> > +* At this point, an STIBP mode other than "off" has been set.
> > +* If STIBP support is not being forced, check if STIBP always-on
> > +* is preferred.
> > +*/
> > +   if (mode != SPECTRE_V2_USER_STRICT &&
> > +   boot_cpu_has(X86_FEATURE_AMD_STIBP_ALWAYS_ON)) {
> > +   stibp_always_on = true;
> > +   mode = SPECTRE_V2_USER_STRICT;
> > +   pr_info("mitigation: STIBP always-on is preferred\n");
> > +   }
> > +
> > /* Initialize Indirect Branch Prediction Barrier */
> > if (boot_cpu_has(X86_FEATURE_IBPB)) {
> > setup_force_cpu_cap(X86_FEATURE_USE_IBPB);
> > @@ -1088,7 +1102,8 @@ static char *stibp_state(void)
> > case SPECTRE_V2_USER_NONE:
> > return ", STIBP: disabled";
> > case SPECTRE_V2_USER_STRICT:
> > -   return ", STIBP: forced";
> > +   return stibp_always_on ? ", STIBP: always-on"
> > +  : ", STIBP: forced";
> 
> I still don't like that separate stibp_always_on variable when we can do
> all the querying just by using mode and X86_FEATURE_AMD_STIBP_ALWAYS_ON.

Hmmm. I've not seen the V1 of this (it's not in my inbox) but the v1->v2
changes contain:

> > - Removed explicit SPECTRE_V2_USER_STRICT_PREFERRED mode

Now I really have to ask why?

Neither the extra variable nor the cpu feature check are pretty. An
explicit mode is way better in terms of code clarity and you get the proper
printout via spectre_v2_user_strings.

Hmm?

tglx
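
The "explicit mode" idea can be sketched in user-space C as an enum plus a string table, so that the printout follows directly from the mode. This is only an illustration under stated assumptions: enum user_mode, select_mode() and stibp_state_for() are hypothetical names, the USER_PRCTL entry and its ", STIBP: conditional" string are placeholders, and only the "disabled", "forced" and "always-on" strings are taken from the quoted patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical modes; USER_STRICT_PREFERRED is the explicit "always-on" mode. */
enum user_mode {
	USER_NONE,
	USER_PRCTL,
	USER_STRICT,
	USER_STRICT_PREFERRED,
};

/* One string per mode, so no extra flag or CPU feature check is needed to print. */
static const char *const stibp_strings[] = {
	[USER_NONE]             = ", STIBP: disabled",
	[USER_PRCTL]            = ", STIBP: conditional",   /* placeholder text */
	[USER_STRICT]           = ", STIBP: forced",
	[USER_STRICT_PREFERRED] = ", STIBP: always-on",
};

/*
 * Mode selection as described in the quoted comment: if STIBP is not off
 * and not being forced, and the CPU prefers always-on, pick the explicit mode.
 */
static enum user_mode select_mode(enum user_mode requested, int cpu_prefers_always_on)
{
	if (requested == USER_NONE)
		return USER_NONE;
	if (requested != USER_STRICT && cpu_prefers_always_on)
		return USER_STRICT_PREFERRED;
	return requested;
}

static const char *stibp_state_for(enum user_mode mode)
{
	return stibp_strings[mode];
}
```

The design point is that the reporting code only ever looks at the mode, which is what makes the separate stibp_always_on variable unnecessary in this model.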

Re: [PATCH net 2/4] vhost_net: rework on the lock ordering for busy polling

2018-12-11 Thread Michael S. Tsirkin

On Wed, Dec 12, 2018 at 11:03:57AM +0800, Jason Wang wrote:
> 
> On 2018/12/11 12:04 PM, Michael S. Tsirkin wrote:
> > On Tue, Dec 11, 2018 at 11:06:43AM +0800, Jason Wang wrote:
> > > On 2018/12/11 9:34 AM, Michael S. Tsirkin wrote:
> > > > On Mon, Dec 10, 2018 at 05:44:52PM +0800, Jason Wang wrote:
> > > > > When we try to do rx busy polling in tx path in commit 441abde4cd84
> > > > > ("net: vhost: add rx busy polling in tx path"), we lock rx vq mutex
> > > > > after tx vq mutex is held. This may lead to deadlock, so we try to lock vq
> > > > > one by one in commit 78139c94dc8c ("net: vhost: lock the vqs one by
> > > > > one"). With this commit, we avoid the deadlock with the assumption
> > > > > that handle_rx() and handle_tx() run in the same process. But this
> > > > > commit removes the protection for IOTLB updating which requires the
> > > > > mutex of each vq to be held.
> > > > > 
> > > > > To solve this issue, the first step is to have exactly the same lock
> > > > > ordering for vhost_net. This is done through:
> > > > > 
> > > > > - For handle_rx(), if busy polling is enabled, lock tx vq immediately.
> > > > > - For handle_tx(), always lock rx vq before tx vq, and unlock it if
> > > > > busy polling is not enabled.
> > > > > - Remove the tricky locking codes in busy polling.
> > > > > 
> > > > > With this, we have exactly the same lock ordering for vhost_net; this
> > > > > allows us to safely revert commit 78139c94dc8c ("net: vhost: lock the
> > > > > vqs one by one") in the next patch.
> > > > > 
> > > > > The patch will add two more atomic operations on the tx path during
> > > > > each round of handle_tx(). 1 byte TCP_RR does not notice such
> > > > > overhead.
> > > > > 
> > > > > Fixes: commit 78139c94dc8c ("net: vhost: lock the vqs one by one")
> > > > > Cc: Tonghao Zhang
> > > > > Signed-off-by: Jason Wang
> > > > > ---
> > > > >   drivers/vhost/net.c | 18 +++++++++++++++---
> > > > >1 file changed, 15 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > > > > index ab11b2bee273..5f272ab4d5b4 100644
> > > > > --- a/drivers/vhost/net.c
> > > > > +++ b/drivers/vhost/net.c
> > > > > @@ -513,7 +513,6 @@ static void vhost_net_busy_poll(struct vhost_net 
> > > > > *net,
> > > > >   struct socket *sock;
> > > > >   struct vhost_virtqueue *vq = poll_rx ? tvq : rvq;
> > > > > - mutex_lock_nested(&vq->mutex, poll_rx ? VHOST_NET_VQ_TX: 
> > > > > VHOST_NET_VQ_RX);
> > > > >   vhost_disable_notify(&net->dev, vq);
> > > > >   sock = rvq->private_data;
> > > > > @@ -543,8 +542,6 @@ static void vhost_net_busy_poll(struct vhost_net 
> > > > > *net,
> > > > >   vhost_net_busy_poll_try_queue(net, vq);
> > > > >   else if (!poll_rx) /* On tx here, sock has no rx data. */
> > > > >   vhost_enable_notify(&net->dev, rvq);
> > > > > -
> > > > > - mutex_unlock(&vq->mutex);
> > > > >}
> > > > >static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
> > > > > @@ -913,10 +910,16 @@ static void handle_tx_zerocopy(struct vhost_net 
> > > > > *net, struct socket *sock)
> > > > >static void handle_tx(struct vhost_net *net)
> > > > >{
> > > > >   struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
> > > > > + struct vhost_net_virtqueue *nvq_rx = &net->vqs[VHOST_NET_VQ_RX];
> > > > >   struct vhost_virtqueue *vq = &nvq->vq;
> > > > > + struct vhost_virtqueue *vq_rx = &nvq_rx->vq;
> > > > >   struct socket *sock;
> > > > > + mutex_lock_nested(&vq_rx->mutex, VHOST_NET_VQ_RX);
> > > > >   mutex_lock_nested(&vq->mutex, VHOST_NET_VQ_TX);
> > > > > + if (!vq->busyloop_timeout)
> > > > > + mutex_unlock(&vq_rx->mutex);
> > > > > +
> > > > >   sock = vq->private_data;
> > > > >   if (!sock)
> > > > >   goto out;
> > > > > @@ -933,6 +936,8 @@ static void handle_tx(struct vhost_net *net)
> > > > >   handle_tx_copy(net, sock);
> > > > >out:
> > > > > + if (vq->busyloop_timeout)
> > > > > + mutex_unlock(&vq_rx->mutex);
> > > > >   mutex_unlock(&vq->mutex);
> > > > >}
> > > > So rx mutex taken on tx path now.  And tx mutex is on rx path ...  This
> > > > is just messed up. Why can't tx polling drop rx lock before
> > > > getting the tx lock and vice versa?
> > > 
> > > Because we want to poll both tx and rx virtqueue at the same time
> > > (vhost_net_busy_poll()).
> > > 
> > >      while (vhost_can_busy_poll(endtime)) {
> > >          if (vhost_has_work(&net->dev)) {
> > >              *busyloop_intr = true;
> > >              break;
> > >          }
> > > 
> > >          if ((sock_has_rx_data(sock) &&
> > >           !vhost_vq_avail_empty(&net->dev, rvq)) ||
> > >          !vhost_vq_avail_empty(&net->dev, tvq))
> > >              break;
> > > 
> > >          cpu_relax();
> > > 
> > >      }
> > > 
> > > 
> > > And we disable kicks and notification for better performance.
> > Right but it's all
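
The lock ordering under discussion (rx vq before tx vq on both paths) can be checked with a tiny single-threaded model. This is a sketch under stated assumptions: struct lock_model, model_lock(), model_unlock() and the *_model() functions are hypothetical and only record acquisition order; they are not vhost code and take no real mutexes.

```c
#include <assert.h>
#include <stdbool.h>

/* Lock ranks: a lower rank must always be taken before a higher one. */
enum { VQ_RX = 0, VQ_TX = 1, VQ_MAX = 2 };

struct lock_model {
	bool held[VQ_MAX];
	int violations;   /* number of out-of-order acquisitions seen */
};

static void model_lock(struct lock_model *m, int vq)
{
	/* Taking a lock while a higher-ranked one is held breaks the order. */
	for (int i = vq + 1; i < VQ_MAX; i++)
		if (m->held[i])
			m->violations++;
	m->held[vq] = true;
}

static void model_unlock(struct lock_model *m, int vq)
{
	m->held[vq] = false;
}

/* tx path as in the quoted patch: rx first, then tx; rx dropped early if not polling. */
static int handle_tx_model(bool busy_poll)
{
	struct lock_model m = { { false, false }, 0 };

	model_lock(&m, VQ_RX);
	model_lock(&m, VQ_TX);
	if (!busy_poll)
		model_unlock(&m, VQ_RX);
	/* ... transmit work would happen here ... */
	if (busy_poll)
		model_unlock(&m, VQ_RX);
	model_unlock(&m, VQ_TX);
	return m.violations;
}

/* rx path as described in the changelog: rx first, tx only when busy polling. */
static int handle_rx_model(bool busy_poll)
{
	struct lock_model m = { { false, false }, 0 };

	model_lock(&m, VQ_RX);
	if (busy_poll)
		model_lock(&m, VQ_TX);
	/* ... receive work would happen here ... */
	if (busy_poll)
		model_unlock(&m, VQ_TX);
	model_unlock(&m, VQ_RX);
	return m.violations;
}

/* The ordering being removed: tx already held when busy polling takes rx. */
static int old_tx_busy_poll_model(void)
{
	struct lock_model m = { { false, false }, 0 };

	model_lock(&m, VQ_TX);
	model_lock(&m, VQ_RX);   /* rx after tx: inverted order */
	model_unlock(&m, VQ_RX);
	model_unlock(&m, VQ_TX);
	return m.violations;
}
```

The model only shows that both paths agree on one order; it says nothing about the cost of taking the rx mutex on the tx path, which is the objection raised in the reply.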

Re: [LKP] [tty] c96cf923a9: WARNING:possible_circular_locking_dependency_detected

2018-12-11 Thread Sergey Senozhatsky

Hi,

Cc-ing Peter, Waiman


Hmm, so here is how it looks to me:

On (12/11/18 20:59), Dmitry Safonov wrote:
> >> [   87.218483] -> #2 (&port_lock_key){-.-.}:
> >> [   87.219282]lock_acquire+0x28c/0x2e7
> >> [   87.219901]_raw_spin_lock_irqsave+0x35/0x49
> >> [   87.220601]serial8250_console_write+0x110/0x5b5
> >> [   87.221354]univ8250_console_write+0x5f/0x64
> >> [   87.222056]console_unlock+0x61c/0x7cf
> >> [   87.222680]register_console+0x63a/0x7b0
> >> [   87.223345]univ8250_console_init+0x1e/0x28
> >> [   87.224041]console_init+0x3be/0x57e
> >> [   87.224641]start_kernel+0x441/0x6c6
> >> [   87.225246]x86_64_start_reservations+0x29/0x2b
> >> [   87.225979]x86_64_start_kernel+0x6f/0x72
> >> [   87.226637]secondary_startup_64+0xa4/0xb0

console_sem -> uart_port->lock

> >> [   87.227314] -> #1 (console_owner){-...}:
> >> [   87.228127]lock_acquire+0x28c/0x2e7
> >> [   87.228728]console_unlock+0x424/0x7cf
> >> [   87.229363]vprintk_emit+0x22d/0x252
> >> [   87.229969]vprintk_default+0x18/0x1a
> >> [   87.230576]vprintk_func+0xa9/0xab
> >> [   87.231156]printk+0x97/0xbe
> >> [   87.231659]__debug_object_init+0x8db/0x92d
> >> [   87.232349]debug_object_init+0x14/0x17
> >> [   87.232987]__init_work+0x1b/0x1d
> >> [   87.233551]rhashtable_init+0x53b/0x602
> >> [   87.234192]rhltable_init+0xe/0x41
> >> [   87.234772]test_insert_dup+0xac/0xa94
> >> [   87.235467]test_rht_init+0x387/0x79c
> >> [   87.236222]do_one_initcall+0x23c/0x4af
> >> [   87.236869]kernel_init_freeable+0x5ec/0x69f
> >> [   87.237855]kernel_init+0xc/0x100
> >> [   87.238470]ret_from_fork+0x3a/0x50

db->lock -> console_sem -> uart_port->lock

   obj_hash[i].lock
   /* db->lock */
__debug_object_init()
  debug_print_object()
   printk()
spin_lock_irqsave(uart->port_lock)

BTW, there is a patch from Waiman which moves debug_print_object()
out of db->lock scope [1].

> >> [   87.239071] -> #0 (&obj_hash[i].lock){-.-.}:
> >> [   87.239904]__lock_acquire+0x1f78/0x22d1
> >> [   87.240556]lock_acquire+0x28c/0x2e7
> >> [   87.241173]_raw_spin_lock_irqsave+0x35/0x49
> >> [   87.241882]debug_check_no_obj_freed+0xb4/0x302
> >> [   87.242620]free_unref_page_prepare+0x33a/0x483
> >> [   87.243368]free_unref_page+0x48/0x80
> >> [   87.243991]__free_pages+0x2e/0x40
> >> [   87.244611]free_pages+0x54/0x5a
> >> [   87.245188]uart_shutdown+0x3df/0x4e2
> >> [   87.245817]uart_hangup+0x123/0x280
> >> [   87.246406]__tty_hangup+0x4da/0x50f
> >> [   87.247025]tty_vhangup_session+0xe/0x10
> >> [   87.247680]disassociate_ctty+0xeb/0x5c5
> >> [   87.248349]do_exit+0xc97/0x1daf
> >> [   87.248920]__x64_sys_exit_group+0x0/0x3e
> >> [   87.249587]__wake_up_parent+0x0/0x52
> >> [   87.250211]do_syscall_64+0x5e8/0x881
> >> [   87.250839]entry_SYSCALL_64_after_hwframe+0x49/0xbe

But I think what really makes lockdep nervous is this thing:

uart_shutdown()
 uart_port_lock()  //  spin_lock_irqsave(uart_port->lock)
  free_page()
   debug_check_no_obj_freed()
db->lock
 debug_print_object()
  printk()
   spin_lock_irqsave(uart_port->lock)


Lockdep complains about:   uart_port->lock -> db->lock

It knows that we already have the reverse chain, db->lock -> uart_port->lock,
from
db->lock -> debug_print_object() -> printk() -> console_sem -> uart_port->lock


> >> [   87.255156]        CPU0                    CPU1
> >> [   87.255813]        ----                    ----
> >> [   87.256460]   lock(&port_lock_key);
> >> [   87.256973]                                lock(console_owner);
> >> [   87.257829]                                lock(&port_lock_key);
> >> [   87.258680]   lock(&obj_hash[i].lock);

So it's like

CPU0                              CPU1

uart_shutdown()                   db->lock
 uart_port->lock                   debug_print_object()
  free_page()                       printk
   debug_check_no_obj_freed          uart_port->lock
    db->lock


In this particular case we probably can just move free_page()
out of uart_port lock scope. Note that free_page()->MM can printk()
on its own.


Something like this (not tested):

---

 drivers/tty/serial/serial_core.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
index c439a5a1e6c0..64050f506348 100644
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -268,6 +268,7 @@ static void uart_shutdown(struct tty_struct *tty, struct 
uart_state *state)
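
Since the diff above is cut off, here is a user-space sketch of the general pattern being proposed: detach the buffer while the lock is held, release the lock, then free. It is an illustration only; struct port_model, model_free(), shutdown_free_inside(), shutdown_free_outside() and frees_under_lock() are hypothetical names, and the "lock" is a plain flag rather than a spinlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a port: a lock flag and one buffer to release. */
struct port_model {
	bool locked;
	char *buf;
	int frees_under_lock;   /* counts frees issued while the lock is held */
};

static void model_free(struct port_model *p, char *buf)
{
	if (p->locked)
		p->frees_under_lock++;   /* this is the pattern lockdep complains about */
	free(buf);
}

/* Problematic shape: the free happens inside the locked region. */
static void shutdown_free_inside(struct port_model *p)
{
	p->locked = true;
	model_free(p, p->buf);
	p->buf = NULL;
	p->locked = false;
}

/* Proposed shape: only detach the pointer under the lock, free afterwards. */
static void shutdown_free_outside(struct port_model *p)
{
	char *buf;

	p->locked = true;
	buf = p->buf;
	p->buf = NULL;
	p->locked = false;

	model_free(p, buf);
}

/* Runs one shutdown and reports how many frees happened under the lock. */
static int frees_under_lock(bool defer)
{
	struct port_model p = { false, malloc(16), 0 };

	if (defer)
		shutdown_free_outside(&p);
	else
		shutdown_free_inside(&p);
	return p.frees_under_lock;
}
```

Detaching the pointer under the lock keeps other users from seeing a buffer that is about to go away, while the free itself (and any printk() it may trigger) runs with the lock released.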

[PATCH v2] f2fs: remove codes of unused wio_mutex

2018-12-11 Thread Yunlong Song

v1->v2: delete comments in f2fs.h: "/* bio ordering for NODE/DATA */"

Signed-off-by: Yunlong Song 
Reviewed-by: Chao Yu 
Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/f2fs.h  | 2 --
 fs/f2fs/super.c | 5 +----
 2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 1e03197..195850e 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1170,8 +1170,6 @@ struct f2fs_sb_info {
 
/* for bio operations */
struct f2fs_bio_info *write_io[NR_PAGE_TYPE];   /* for write bios */
-   struct mutex wio_mutex[NR_PAGE_TYPE - 1][NR_TEMP_TYPE];
-   /* bio ordering for NODE/DATA */
/* keep migration IO order for LFS mode */
struct rw_semaphore io_order_lock;
mempool_t *write_io_dummy;  /* Dummy pages */
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index af58b2c..2d18de5 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -2674,7 +2674,7 @@ int f2fs_sanity_check_ckpt(struct f2fs_sb_info *sbi)
 static void init_sb_info(struct f2fs_sb_info *sbi)
 {
struct f2fs_super_block *raw_super = sbi->raw_super;
-   int i, j;
+   int i;
 
sbi->log_sectors_per_block =
le32_to_cpu(raw_super->log_sectors_per_block);
@@ -2710,9 +2710,6 @@ static void init_sb_info(struct f2fs_sb_info *sbi)
 
INIT_LIST_HEAD(&sbi->s_list);
mutex_init(&sbi->umount_mutex);
-   for (i = 0; i < NR_PAGE_TYPE - 1; i++)
-   for (j = HOT; j < NR_TEMP_TYPE; j++)
-   mutex_init(&sbi->wio_mutex[i][j]);
init_rwsem(&sbi->io_order_lock);
spin_lock_init(&sbi->cp_lock);
 
-- 
1.8.5.2
