from:"anton"

Re: RFC: Changes for PCI

2001-06-27 Thread anton



Hi,

> The following 3 functions are added. Their purpose is a little
> different than to add support for more than 256 buses but they are
> important. Skip ahead and I'll explain what they are for
> 
> int (*pci_read_bases)(struct pci_dev *, int cnt,int rom);  /* These optional
> hooks provide */
> int (*pci_read_irq)(struct pci_dev *); /* the arch dependant
> code a way*/
> int (*pci_fixup_registers)(struct pci_dev *);  /* to manage the
> registers. */

I do not think these functions are required at all.

> The 3 additional functions are hooks so that an architecture has a
> chance to make sure things are in order beforehand. pci_read_bases is
> for the management and fixup of the BARs. pci_read_irq is the same but
> for IRQs.  pci_fixup_registers again same idea but for bridge
> resources.
> 
> The essense of the design here is to allow you to get in and make sure
> everything is ok with the card, bridge etc, beforehand so as to avoid
> multiple bus walks. 

Please look at how other architectures solve this problem. For example
on ppc32 we already fix up the BARs if required. There are also hooks
to fix the irq, you seem to be duplicating all of this.

Anton
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: RFC: Changes for PCI

2001-06-27 Thread anton


 
> Why not use sysdata like the other arches?
> 
> Changing the meaning of dev->bus->number globally seems pointless.  If
> you are going to do that, just do it the right way and introduce another
> struct member, pci_domain or somesuch.

Thats 2.5 material. For 2.4 we should do as davem suggested and make
the bus number unique. I do this by just adding 256 to each overlapping
host bridge.

Anton
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] more penguins

2001-07-03 Thread anton



> What is the point of displaying penguins in framebuffer mode if it is going
> to change the video mode set by the vga= command line parameter?
> I like to set my display to 50 lines. This won't stay when the penguin 
> comes up.
> In standard character mode, this isn't a problem. So, how do we fix this?
> Is there a command line parameter that prevents the penguin logo from 
> coming up?

Just change VTs or "echo ^[c" to the screen to clear the boot logo.

Anton
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 0/2] Removing aio_write from NTFS

2015-03-11 Thread anton

From: Anton Altaparmakov 

Hi Al,

As you requested at LSF, these two patches change NTFS from aio_write to 
write_iter.

I know you were sceptical about the addition of a multi page fault in function 
but please take a look at my patches and see if it is acceptable.  If not, 
let me know and I will do it inside NTFS itself without introducing a generic 
function...

If you are happy with the patches, please feel free to send them onto Linus 
or let me know to send them myself.

Note you can also pull the patches from:

git://git.kernel.org/pub/scm/linux/kernel/git/aia21/ntfs.git

Best regards,

Anton

Anton Altaparmakov (2):
  VFS: Add iov_iter_fault_in_multipages_readable() which is simillar to 
   iov_iter_fault_in_readable() but differs in that it is not
limited to faulting in the first iovec and instead faults in
"bytes" bytes iterating over the iovecs as necessary.
  NTFS: Version 2.1.32 - Update file write from aio_write to write_iter.

 fs/ntfs/Makefile|   2 +-
 fs/ntfs/file.c  | 783 
 include/linux/uio.h |   1 +
 mm/iov_iter.c   |  26 ++
 4 files changed, 335 insertions(+), 477 deletions(-)

-- 
Anton Altaparmakov  (replace at with @)
Lead in File System Development, Tuxera Inc., http://www.tuxera.com/
Linux NTFS maintainer
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/2] VFS: Add iov_iter_fault_in_multipages_readable() which is simillar to iov_iter_fault_in_readable() but differs in that it is not limited to faulting in the first iovec and instead faults i

2015-03-11 Thread anton

From: Anton Altaparmakov 

Also, instead of only faulting in the first and last page of the
range, all pages are faulted in.

This function is needed by NTFS when it does multi page file
writes.

Signed-off-by: Anton Altaparmakov 
---
 include/linux/uio.h |  1 +
 mm/iov_iter.c   | 26 ++
 2 files changed, 27 insertions(+)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 07a0226..2986d5d 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -76,6 +76,7 @@ size_t iov_iter_copy_from_user_atomic(struct page *page,
struct iov_iter *i, unsigned long offset, size_t bytes);
 void iov_iter_advance(struct iov_iter *i, size_t bytes);
 int iov_iter_fault_in_readable(struct iov_iter *i, size_t bytes);
+int iov_iter_fault_in_multipages_readable(struct iov_iter *i, size_t bytes);
 size_t iov_iter_single_seg_count(const struct iov_iter *i);
 size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
 struct iov_iter *i);
diff --git a/mm/iov_iter.c b/mm/iov_iter.c
index 8277320..040ad9e 100644
--- a/mm/iov_iter.c
+++ b/mm/iov_iter.c
@@ -317,6 +317,32 @@ int iov_iter_fault_in_readable(struct iov_iter *i, size_t 
bytes)
 }
 EXPORT_SYMBOL(iov_iter_fault_in_readable);
 
+/*
+ * Fault in one or more iovecs of the given iov_iter, to a maximum length of
+ * bytes.  For each iovec, fault in each page that constitutes the iovec.
+ *
+ * Return 0 on success, or non-zero if the memory could not be accessed (i.e.
+ * because it is an invalid address).
+ */
+int iov_iter_fault_in_multipages_readable(struct iov_iter *i, size_t bytes)
+{
+   size_t skip = i->iov_offset;
+   const struct iovec *iov;
+   int err;
+   struct iovec v;
+
+   if (!(i->type & (ITER_BVEC|ITER_KVEC))) {
+   iterate_iovec(i, bytes, v, iov, skip, ({
+   err = fault_in_multipages_readable(v.iov_base,
+   v.iov_len);
+   if (unlikely(err))
+   return err;
+   0;}))
+   }
+   return 0;
+}
+EXPORT_SYMBOL(iov_iter_fault_in_multipages_readable);
+
 void iov_iter_init(struct iov_iter *i, int direction,
const struct iovec *iov, unsigned long nr_segs,
size_t count)
-- 
1.9.3 (Apple Git-50)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/2] NTFS: Version 2.1.32 - Update file write from aio_write to write_iter.

2015-03-11 Thread anton

From: Anton Altaparmakov 

Signed-off-by: Anton Altaparmakov 
---
 fs/ntfs/Makefile |   2 +-
 fs/ntfs/file.c   | 783 ++-
 2 files changed, 308 insertions(+), 477 deletions(-)

diff --git a/fs/ntfs/Makefile b/fs/ntfs/Makefile
index 36ae529..2ff263e 100644
--- a/fs/ntfs/Makefile
+++ b/fs/ntfs/Makefile
@@ -8,7 +8,7 @@ ntfs-y := aops.o attrib.o collate.o compress.o debug.o dir.o 
file.o \
 
 ntfs-$(CONFIG_NTFS_RW) += bitmap.o lcnalloc.o logfile.o quota.o usnjrnl.o
 
-ccflags-y := -DNTFS_VERSION=\"2.1.31\"
+ccflags-y := -DNTFS_VERSION=\"2.1.32\"
 ccflags-$(CONFIG_NTFS_DEBUG)   += -DDEBUG
 ccflags-$(CONFIG_NTFS_RW)  += -DNTFS_RW
 
diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
index 1da9b2d..29139ff 100644
--- a/fs/ntfs/file.c
+++ b/fs/ntfs/file.c
@@ -1,7 +1,7 @@
 /*
  * file.c - NTFS kernel file operations.  Part of the Linux-NTFS project.
  *
- * Copyright (c) 2001-2014 Anton Altaparmakov and Tuxera Inc.
+ * Copyright (c) 2001-2015 Anton Altaparmakov and Tuxera Inc.
  *
  * This program/include file is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License as published
@@ -329,62 +329,168 @@ err_out:
return err;
 }
 
-/**
- * ntfs_fault_in_pages_readable -
- *
- * Fault a number of userspace pages into pagetables.
- *
- * Unlike include/linux/pagemap.h::fault_in_pages_readable(), this one copes
- * with more than two userspace pages as well as handling the single page case
- * elegantly.
- *
- * If you find this difficult to understand, then think of the while loop being
- * the following code, except that we do without the integer variable ret:
- *
- * do {
- * ret = __get_user(c, uaddr);
- * uaddr += PAGE_SIZE;
- * } while (!ret && uaddr < end);
- *
- * Note, the final __get_user() may well run out-of-bounds of the user buffer,
- * but _not_ out-of-bounds of the page the user buffer belongs to, and since
- * this is only a read and not a write, and since it is still in the same page,
- * it should not matter and this makes the code much simpler.
- */
-static inline void ntfs_fault_in_pages_readable(const char __user *uaddr,
-   int bytes)
+static ssize_t ntfs_prepare_file_for_write(struct file *file, loff_t *ppos,
+   size_t *count)
 {
-   const char __user *end;
-   volatile char c;
-
-   /* Set @end to the first byte outside the last page we care about. */
-   end = (const char __user*)PAGE_ALIGN((unsigned long)uaddr + bytes);
-
-   while (!__get_user(c, uaddr) && (uaddr += PAGE_SIZE, uaddr < end))
-   ;
-}
-
-/**
- * ntfs_fault_in_pages_readable_iovec -
- *
- * Same as ntfs_fault_in_pages_readable() but operates on an array of iovecs.
- */
-static inline void ntfs_fault_in_pages_readable_iovec(const struct iovec *iov,
-   size_t iov_ofs, int bytes)
-{
-   do {
-   const char __user *buf;
-   unsigned len;
+   loff_t pos;
+   s64 end, ll;
+   ssize_t err;
+   unsigned long flags;
+   struct inode *vi = file_inode(file);
+   ntfs_inode *base_ni, *ni = NTFS_I(vi);
+   ntfs_volume *vol = ni->vol;
 
-   buf = iov->iov_base + iov_ofs;
-   len = iov->iov_len - iov_ofs;
-   if (len > bytes)
-   len = bytes;
-   ntfs_fault_in_pages_readable(buf, len);
-   bytes -= len;
-   iov++;
-   iov_ofs = 0;
-   } while (bytes);
+   ntfs_debug("Entering for i_ino 0x%lx, attribute type 0x%x, pos "
+   "0x%llx, count 0x%lx.", vi->i_ino,
+   (unsigned)le32_to_cpu(ni->type),
+   (unsigned long long)*ppos, (unsigned long)*count);
+   /* We can write back this queue in page reclaim. */
+   current->backing_dev_info = inode_to_bdi(vi);
+   err = generic_write_checks(file, ppos, count, S_ISBLK(vi->i_mode));
+   if (unlikely(err))
+   goto out;
+   /*
+* All checks have passed.  Before we start doing any writing we want
+* to abort any totally illegal writes.
+*/
+   BUG_ON(NInoMstProtected(ni));
+   BUG_ON(ni->type != AT_DATA);
+   /* If file is encrypted, deny access, just like NT4. */
+   if (NInoEncrypted(ni)) {
+   /* Only $DATA attributes can be encrypted. */
+   /*
+* Reminder for later: Encrypted files are _always_
+* non-resident so that the content can always be encrypted.
+*/
+   ntfs_debug("Denying write access to encrypted file.");
+   err = -EACCES;
+   goto out;
+   }
+   if (NInoCompressed(ni)) {
+   /* Only unnamed $DATA attribute can be compressed. */
+   BUG_ON(ni->name

Re: [PATCH 10/10] ARM: FIQ: Get rid of init_FIQ()

2012-11-27 Thread Anton Vorontsov

On Thu, Nov 22, 2012 at 11:51:38PM -0800, Anton Vorontsov wrote:
> On Fri, Nov 23, 2012 at 11:36:01AM +0400, Alexander Shiyan wrote:
> [...]
> > got_no_fiq_insn also not initialized.
> 
> It is static, so by definition it is initialized to 0.
> 
> > I think as a whole on this issue requires additional comments from Russell
> > as the author of implementation.
> > 
> > PS: In any case, I can reproduce it and check it out if you send me a 
> > patchset
> > to my email as an attachment.
> 
> Sent privately.

Andrew, Arnd, just FYI, it works for Alexander.

- Forwarded message from Alexander Shiyan  -
Date: Tue, 27 Nov 2012 12:42:50 +0400
From: Alexander Shiyan 
To: Anton Vorontsov 
Subject: Re[2]: FIQ

> Hello Alexander,
> 
> On Tue, Nov 27, 2012 at 12:14:28PM +0400, Alexander Shiyan wrote:
> > Sorry, I lost the original message to the correct answer.
> > Today I had a test of removal init_FIQ.
> > The test took place on a custom board with a custom ARM CLPS711X OSS audio
> > driver, which uses the FIQ interrupts. Everything works fine.
> 
> Nice! Thanks a lot for testing!
> 
> Can you please reply to the original thread, with the
> 
>   Tested-by: Alexander Shiyan 
> ?
> 
> This will help pushing the series forward, plus will give you a credit for
> your efforts.

I am lost thread in my mailbox, you can just copy/paste my reply and ping for 
Arnd.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC] Add mempressure cgroup

2012-11-28 Thread Anton Vorontsov

This is an attempt to implement David Rientjes' idea of mempressure
cgroup.

The main characteristics are the same to what I've tried to add to vmevent
API:

  Internally, it uses Mel Gorman's idea of scanned/reclaimed ratio for
  pressure index calculation. But we don't expose the index to the
  userland. Instead, there are three levels of the pressure:

  o low (just reclaiming, e.g. caches are draining);
  o medium (allocation cost becomes high, e.g. swapping);
  o oom (about to oom very soon).

  The rationale behind exposing levels and not the raw pressure index
  described here: http://lkml.org/lkml/2012/11/16/675

The API uses standard cgroups eventfd notifications:

  $ gcc Documentation/cgroups/cgroup_event_listener.c -o \
cgroup_event_listener
  $ cd /sys/fs/cgroup/
  $ mkdir mempressure
  $ mount -t cgroup cgroup ./mempressure -o mempressure
  $ cd mempressure
  $ cgroup_event_listener ./mempressure.level low
  ("low", "medium", "oom" are permitted values.)

  Upon hitting the threshold, you should see "/sys/fs/cgroup/mempressure
  low: crossed" messages.

To test that it actually works on per-cgroup basis, I did a small trick: I
moved all kswapd into a separate cgroup, and hooked the listener onto
another (non-root) cgroup. The listener no longer received global reclaim
pressure, which is expected.

For a task it is possible to be in both cpusets, memcg and mempressure
cgroups, so by rearranging the tasks it should be possible to watch a
specific pressure.

Note that while this adds the cgroups support, the code is well separated
and eventually we might add a lightweight, non-cgroups API, i.e. vmevent.
But this is another story.

Signed-off-by: Anton Vorontsov 
---
 include/linux/cgroup_subsys.h |   6 +
 include/linux/vmstat.h|   8 ++
 init/Kconfig  |   5 +
 mm/Makefile   |   1 +
 mm/mempressure.c  | 287 ++
 mm/vmscan.c   |   3 +
 6 files changed, 310 insertions(+)
 create mode 100644 mm/mempressure.c

diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index f204a7a..b9802e2 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -37,6 +37,12 @@ SUBSYS(mem_cgroup)
 
 /* */
 
+#if IS_SUBSYS_ENABLED(CONFIG_CGROUP_MEMPRESSURE)
+SUBSYS(mpc_cgroup)
+#endif
+
+/* */
+
 #if IS_SUBSYS_ENABLED(CONFIG_CGROUP_DEVICE)
 SUBSYS(devices)
 #endif
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 92a86b2..7698341 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -10,6 +10,14 @@
 
 extern int sysctl_stat_interval;
 
+#ifdef CONFIG_CGROUP_MEMPRESSURE
+extern void vmpressure(ulong scanned, ulong reclaimed);
+extern void vmpressure_prio(int prio);
+#else
+static inline void vmpressure(ulong scanned, ulong reclaimed) {}
+static inline void vmpressure_prio(int prio) {}
+#endif
+
 #ifdef CONFIG_VM_EVENT_COUNTERS
 /*
  * Light weight per cpu counter implementation.
diff --git a/init/Kconfig b/init/Kconfig
index 6fdd6e3..7065e44 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -826,6 +826,11 @@ config MEMCG_KMEM
  the kmem extension can use it to guarantee that no group of processes
  will ever exhaust kernel resources alone.
 
+config CGROUP_MEMPRESSURE
+   bool "Memory pressure monitor for Control Groups"
+   help
+ TODO
+
 config CGROUP_HUGETLB
bool "HugeTLB Resource Controller for Control Groups"
depends on RESOURCE_COUNTERS && HUGETLB_PAGE && EXPERIMENTAL
diff --git a/mm/Makefile b/mm/Makefile
index 6b025f8..40cee19 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -50,6 +50,7 @@ obj-$(CONFIG_MIGRATION) += migrate.o
 obj-$(CONFIG_QUICKLIST) += quicklist.o
 obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o
 obj-$(CONFIG_MEMCG) += memcontrol.o page_cgroup.o
+obj-$(CONFIG_CGROUP_MEMPRESSURE) += mempressure.o
 obj-$(CONFIG_CGROUP_HUGETLB) += hugetlb_cgroup.o
 obj-$(CONFIG_MEMORY_FAILURE) += memory-failure.o
 obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
diff --git a/mm/mempressure.c b/mm/mempressure.c
new file mode 100644
index 000..5c85bbe
--- /dev/null
+++ b/mm/mempressure.c
@@ -0,0 +1,287 @@
+/*
+ * Linux VM pressure notifications
+ *
+ * Copyright 2012 Linaro Ltd.
+ *   Anton Vorontsov 
+ *
+ * Based on ideas from David Rientjes, KOSAKI Motohiro, Leonid Moiseichuk,
+ * Mel Gorman, Minchan Kim and Pekka Enberg.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static void mpc_vmpressure(ulong scanned, ulong reclaimed);
+
+/*
+ * Generic VM Pressure routines (no cgroups or any other API details)
+ */
+
+/

Re: [RFC] Add mempressure cgroup

2012-11-28 Thread Anton Vorontsov

On Wed, Nov 28, 2012 at 03:14:32PM -0800, Andrew Morton wrote:
[...]
> Compare this with the shrink_slab() shrinkers.  With these, the VM can
> query and then control the clients.  If something goes wrong or is out
> of balance, it's the VM's problem to solve.
> 
> So I'm thinking that a better design would be one which puts the kernel
> VM in control of userspace scanning and freeing.  Presumably with a
> query-and-control interface similar to the slab shrinkers.

Thanks for the ideas, Andrew.

Query-and-control scheme looks very attractive, and that's actually
resembles my "balance" level idea, when userland tells the kernel how much
reclaimable memory it has. Except the your scheme works in the reverse
direction, i.e. the kernel becomes in charge.

But there is one, rather major issue: we're crossing kernel-userspace
boundary. And with the scheme we'll have to cross the boundary four times:
query / reply-available / control / reply-shrunk / (and repeat if
necessary, every SHRINK_BATCH pages). Plus, it has to be done somewhat
synchronously (all the four stages), and/or we have to make a "userspace
shrinker" thread working in parallel with the normal shrinker, and here,
I'm afraid, we'll see more strange interactions. :)

But there is a good news: for these kind of fine-grained control we have a
better interface, where we don't have to communicate [very often] w/ the
kernel. These are "volatile ranges", where userland itself marks chunks of
data as "I might need it, but I won't cry if you recycle it; but when I
access it next time, let me know if you actually recycled it". Yes,
userland no longer able to decide which exact page it permits to recycle,
but we don't have use-cases when we actually care that much. And if we do,
we'd rather introduce volatile LRUs with different priorities, or
something alike.

So, we really don't need the full-fledged userland shrinker, since we can
just let the in-kernel shrinker do its job. If we work with the
bytes/pages granularity it is just easier (and more efficient in terms of
communication) to do the volatile ranges.

For the pressure notifications use-cases, we don't even know bytes/pages
information: "activity managers" are separate processes looking after
overall system performance.

So, we're not trying to make userland too smart, quite the contrary: we
realized that for this interface we don't want to mess with the bytes and
pages, and that's why we cut this stuff down to only three levels. Before
this, we were actually trying to count bytes, we did not like it and we
ran away screaming.

OTOH, your scheme makes volatile ranges unneeded, since a thread might
register a shrinker hook and free stuff by itself. But again, I believe
this involves more communication with the kernel.

Thanks,
Anton.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC] Add mempressure cgroup

2012-11-28 Thread Anton Vorontsov

On Wed, Nov 28, 2012 at 05:27:51PM -0800, Anton Vorontsov wrote:
> On Wed, Nov 28, 2012 at 03:14:32PM -0800, Andrew Morton wrote:
> [...]
> > Compare this with the shrink_slab() shrinkers.  With these, the VM can
> > query and then control the clients.  If something goes wrong or is out
> > of balance, it's the VM's problem to solve.
> > 
> > So I'm thinking that a better design would be one which puts the kernel
> > VM in control of userspace scanning and freeing.  Presumably with a
> > query-and-control interface similar to the slab shrinkers.
> 
> Thanks for the ideas, Andrew.
> 
> Query-and-control scheme looks very attractive, and that's actually
> resembles my "balance" level idea, when userland tells the kernel how much
> reclaimable memory it has. Except the your scheme works in the reverse
> direction, i.e. the kernel becomes in charge.
> 
> But there is one, rather major issue: we're crossing kernel-userspace
> boundary. And with the scheme we'll have to cross the boundary four times:
> query / reply-available / control / reply-shrunk / (and repeat if
> necessary, every SHRINK_BATCH pages). Plus, it has to be done somewhat
> synchronously (all the four stages), and/or we have to make a "userspace
> shrinker" thread working in parallel with the normal shrinker, and here,
> I'm afraid, we'll see more strange interactions. :)
> 
> But there is a good news: for these kind of fine-grained control we have a
> better interface, where we don't have to communicate [very often] w/ the
> kernel. These are "volatile ranges", where userland itself marks chunks of
> data as "I might need it, but I won't cry if you recycle it; but when I
> access it next time, let me know if you actually recycled it". Yes,
> userland no longer able to decide which exact page it permits to recycle,
> but we don't have use-cases when we actually care that much. And if we do,
> we'd rather introduce volatile LRUs with different priorities, or
> something alike.
> 
> So, we really don't need the full-fledged userland shrinker, since we can
> just let the in-kernel shrinker do its job. If we work with the
> bytes/pages granularity it is just easier (and more efficient in terms of
> communication) to do the volatile ranges.
> 
> For the pressure notifications use-cases, we don't even know bytes/pages
> information: "activity managers" are separate processes looking after
> overall system performance.
> 
> So, we're not trying to make userland too smart, quite the contrary: we
> realized that for this interface we don't want to mess with the bytes and
> pages, and that's why we cut this stuff down to only three levels. Before
> this, we were actually trying to count bytes, we did not like it and we
> ran away screaming.
> 
> OTOH, your scheme makes volatile ranges unneeded, since a thread might
> register a shrinker hook and free stuff by itself. But again, I believe
> this involves more communication with the kernel.

Btw, I believe your idea is something completely new, and I surely cannot
fully evaluate it on my own -- I might be wrong here. So I invite folks to
express their opinions too.

Guys, it's about Andrew's idea of exposing shrinker-alike logic to the
userland (and I made it 'vs. volatile ranges'):

http://lkml.org/lkml/2012/11/28/607

Thanks,
Anton.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC] Add mempressure cgroup

2012-11-28 Thread Anton Vorontsov

Hello Michal,

Thanks a lot for taking a look into this!

On Wed, Nov 28, 2012 at 05:29:24PM +0100, Michal Hocko wrote:
> On Wed 28-11-12 02:29:08, Anton Vorontsov wrote:
> > This is an attempt to implement David Rientjes' idea of mempressure
> > cgroup.
> > 
> > The main characteristics are the same to what I've tried to add to vmevent
> > API:
> > 
> >   Internally, it uses Mel Gorman's idea of scanned/reclaimed ratio for
> >   pressure index calculation. But we don't expose the index to the
> >   userland. Instead, there are three levels of the pressure:
> > 
> >   o low (just reclaiming, e.g. caches are draining);
> >   o medium (allocation cost becomes high, e.g. swapping);
> >   o oom (about to oom very soon).
> > 
> >   The rationale behind exposing levels and not the raw pressure index
> >   described here: http://lkml.org/lkml/2012/11/16/675
> > 
> > The API uses standard cgroups eventfd notifications:
> > 
> >   $ gcc Documentation/cgroups/cgroup_event_listener.c -o \
> > cgroup_event_listener
> >   $ cd /sys/fs/cgroup/
> >   $ mkdir mempressure
> >   $ mount -t cgroup cgroup ./mempressure -o mempressure
> >   $ cd mempressure
> >   $ cgroup_event_listener ./mempressure.level low
> >   ("low", "medium", "oom" are permitted values.)
> > 
> >   Upon hitting the threshold, you should see "/sys/fs/cgroup/mempressure
> >   low: crossed" messages.
> > 
> > To test that it actually works on per-cgroup basis, I did a small trick: I
> > moved all kswapd into a separate cgroup, and hooked the listener onto
> > another (non-root) cgroup. The listener no longer received global reclaim
> > pressure, which is expected.
> 
> Is this really expected? So you want to be notified only about the
> direct reclaim?

I didn't try to put much meaning into assinging a task to a non-global
reclaim watchers, I just mentioned this as an easiest way to test that we
actually can account things on per-thread basis. :)

> I am not sure how much useful is that. If you co-mount with e.g. memcg then
> the picture is different because even global memory pressure is spread
> among groups so it would be just a matter of the proper accounting
> (which can be handled similar to lruvec when your code doesn't have to
> care about memcg internally).
> Co-mounting with cpusets makes sense as well because then you get a
> pressure notification based on the placement policy.
> 
> So does it make much sense to mount mempressure on its own without
> co-mounting with other controllers?

Android does not actually need any of these (memcg or cpusets), but we
still want to get notifications (for a root cgroup would be enough for us
-- but I'm trying to make things generic, of course).

> > For a task it is possible to be in both cpusets, memcg and mempressure
> > cgroups, so by rearranging the tasks it should be possible to watch a
> > specific pressure.
> 
> Could you be more specific what you mean by rearranging? Creating a same
> hierarchy? Co-mounting?
> 
> > Note that while this adds the cgroups support, the code is well separated
> > and eventually we might add a lightweight, non-cgroups API, i.e. vmevent.
> > But this is another story.
> 
> I think it would be nice to follow freezer and split this into 2 files.
> Generic and cgroup spefici.

Yeah, this is surely an option, but so far it's only a few hundrends lines
of code, plus we don't have any other users for the "internals". So, for
the time being, I'd rather keep it in one file.

> > Signed-off-by: Anton Vorontsov 
> > ---
> [...]
> > +/* These are defaults. Might make them configurable one day. */
> > +static const uint vmpressure_win = SWAP_CLUSTER_MAX * 16;
> 
> I realize this is just an RFC but could you be more specific what is the
> meaning of vmpressure_win?

Sure, let me just copy the text from the previous RFC, to which you were
not Cc'ed:

 When the system is short on idle pages, the new memory is allocated by
 reclaiming least recently used resources: kernel scans pages to be
 reclaimed (e.g. from file caches, mmap(2) volatile ranges, etc.; and
 potentially swapping some pages out). The index shows the relative time
 spent by the kernel uselessly scanning pages, or, in other words, the
 percentage of scans of pages (vmpressure_window) that were not reclaimed.
 ...
 Window size is used as a rate-limit tunable for VMPRESSURE_LOW
 notifications and for averaging for VMPRESSURE_{MEDIUM,OOM} levels. So,
 using small window sizes can cause lot of false positives for _MEDIUM and
 _OOM levels, but too big window size m

Re: [RFC] Add mempressure cgroup

2012-11-28 Thread Anton Vorontsov

On Thu, Nov 29, 2012 at 08:14:13AM +0200, Kirill A. Shutemov wrote:
> On Wed, Nov 28, 2012 at 02:29:08AM -0800, Anton Vorontsov wrote:
> > +static int mpc_pre_destroy(struct cgroup *cg)
> > +{
> > +   struct mpc_state *mpc = cg2mpc(cg);
> > +   int ret = 0;
> > +
> > +   mutex_lock(&mpc->lock);
> > +
> > +   if (mpc->eventfd)
> > +   ret = -EBUSY;
> 
> cgroup_rmdir() will unregister all events for you. No need to handle it
> here.

Okie, thanks!

[...]
> > +static int mpc_register_level_event(struct cgroup *cg, struct cftype *cft,
> > +   struct eventfd_ctx *eventfd,
> > +   const char *args)
> > +{
> > +   struct mpc_state *mpc = cg2mpc(cg);
> > +   int i;
> > +   int ret;
> > +
> > +   mutex_lock(&mpc->lock);
> > +
> > +   /*
> > +* It's easy to implement multiple thresholds, but so far we don't
> > +* need it.
> > +*/
> > +   if (mpc->eventfd) {
> > +   ret = -EBUSY;
> > +   goto out_unlock;
> > +   }
> 
> One user which listen for one threashold per cgroup?
> I think it's wrong. It's essensial for API to serve multiple users.

Yea, if we'll consider merging this, I'll definitely fix this. Just didn't
want to bring the complexity into the code.

Thanks,
Anton.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v5 1/3] gpio: gpio-generic: Add 16 and 32 bit big endian byte order support

2013-03-18 Thread Anton Vorontsov

On Fri, Mar 15, 2013 at 02:45:38PM +0100, Andreas Larsson wrote:
> There is no general support for 64-bit big endian accesses, so that is
> left unsupported.
> 
> Signed-off-by: Andreas Larsson 

This looks perfect, thanks a lot!

Acked-by: Anton Vorontsov 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [GIT PULL] AB8500 Power and MFD related updates

2013-03-18 Thread Anton Vorontsov

On Thu, Mar 07, 2013 at 12:38:59PM +0800, Lee Jones wrote:
> The following changes since commit 6dbe51c251a327e012439c4772097a13df43c5b8:
> 
>   Linux 3.9-rc1 (2013-03-03 15:11:05 -0800)
> 
> are available in the git repository at:
> 
>   git://git.linaro.org/people/ljones/linux-3.0-ux500.git for-mfd-and-power

Pulled, thanks a lot!

Anton
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Patch v3 0/4] Add support for tps65090-charger

2013-03-18 Thread Anton Vorontsov

On Tue, Mar 12, 2013 at 06:08:05PM -0400, Rhyland Klein wrote:
> This patchset adds support for the TPS65090-charger. This charger is
> registered as a subdevice of the TPS65090 PMIC.
> 
> This series includes a fix for the tps65090 header file where all
> the interrupts were shifted by 1.

Thanks for the patches, Rhyland! 3/4 and 4/4 applied to the
battery-2.6.git tree.

Anton
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: A patch to test_power module

2013-03-18 Thread Anton Vorontsov

On Sun, Feb 03, 2013 at 01:17:16PM +0200, Andrey Gelman wrote:
[...]
> >The patch looks good, but it is missing Signed-off-by tag. Plus, you
> >should also Cc: linux-kernel@vger.kernel.org on submissions (I've added it
> >now).
> >
> >Plus, please don't use html in emails. :)
[...]
> Fixed ... I think.

Thanks a lot for the patch, Andrey! It is now in battery-2.6.git tree.

Anton
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] fs: pstore: Replaced calls to kmalloc and memcpy with kmemdup

2013-03-18 Thread Anton Vorontsov

On Sun, Mar 10, 2013 at 04:23:01PM -0700, Kees Cook wrote:
> On Sat, Mar 9, 2013 at 3:57 AM, Alexandru Gheorghiu
>  wrote:
> > Replaced calls to kmalloc and memcpy with a single call to kmemdup.
> > This patch was found using coccicheck.
> >
> > Signed-off-by: Alexandru Gheorghiu 
[...]
> Looks fine to me. Thanks!
> 
> Acked-by: Kees Cook 

Applied to linux-pstore.git, thanks a lot!

Anton
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] s3c-adc-battery: Fix possible NULL pointer dereference

2013-03-18 Thread Anton Vorontsov

On Mon, Feb 25, 2013 at 04:33:25AM +0530, Syam Sidhardhan wrote:
> Check for (bat == NULL) has to be done before accessing bat
> 
> Signed-off-by: Syam Sidhardhan 
> ---

Applied, thanks!

>  drivers/power/s3c_adc_battery.c |7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/power/s3c_adc_battery.c b/drivers/power/s3c_adc_battery.c
> index d2ca989..5948ce0 100644
> --- a/drivers/power/s3c_adc_battery.c
> +++ b/drivers/power/s3c_adc_battery.c
> @@ -145,14 +145,17 @@ static int s3c_adc_bat_get_property(struct power_supply 
> *psy,
>  
>   int new_level;
>   int full_volt;
> - const struct s3c_adc_bat_thresh *lut = bat->pdata->lut_noac;
> - unsigned int lut_size = bat->pdata->lut_noac_cnt;
> + const struct s3c_adc_bat_thresh *lut;
> + unsigned int lut_size;
>  
>   if (!bat) {
>   dev_err(psy->dev, "no battery infos ?!\n");
>   return -EINVAL;
>   }
>  
> + lut = bat->pdata->lut_noac;
> + lut_size = bat->pdata->lut_noac_cnt;
> +
>   if (bat->volt_value < 0 || bat->cur_value < 0 ||
>   jiffies_to_msecs(jiffies - bat->timestamp) >
>   BAT_POLL_INTERVAL) {
> -- 
> 1.7.9.5
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Patch v3 0/4] Add support for tps65090-charger

2013-03-19 Thread Anton Vorontsov

On Tue, Mar 19, 2013 at 11:55:33AM -0400, Rhyland Klein wrote:
> On 3/18/2013 10:24 PM, Anton Vorontsov wrote:
> > On Tue, Mar 12, 2013 at 06:08:05PM -0400, Rhyland Klein wrote:
> >> This patchset adds support for the TPS65090-charger. This charger is
> >> registered as a subdevice of the TPS65090 PMIC.
> >>
> >> This series includes a fix for the tps65090 header file where all
> >> the interrupts were shifted by 1.
> > 
> > Thanks for the patches, Rhyland! 3/4 and 4/4 applied to the
> > battery-2.6.git tree.
> > 
> I was planning on posting a slightly changed v4 sometime soon which took
> care of some concerns on the Documentation and added a compatibility
> entry for the charger. Do you want me to send that still and you can
> replace v3 with v4?

I would rather avoid replacing, i.e. I'd prefer if you send an update on
top of the v3 (incremental patch). If the changes were inspired by
someone, feel free to add a Suggested-by: tag. ;-)

Thanks!

Anton
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RESEND] goldfish: power device

2013-02-02 Thread Anton Vorontsov

On Fri, Jan 25, 2013 at 03:30:00PM +, Alan Cox wrote:
> From: Mike Lockwood 
> 
> Add the emulated power driver for the Goldfish platform.
> 
> This folds together the code from the Google tree, Jun Nakajima's cleanups
> and x86 porting work, and then a tidy up to pass checkpatch.
> 
> Signed-off-by: Mike A. Chan 
> [cleanup and x86 support]
> Signed-off-by: Sheng Yang 
> Signed-off-by: Yunhong Jiang 
> Signed-off-by: Xiaohui Xin 
> Signed-off-by: Jun Nakajima 
> Signed-off-by: Bruce Beare 
> [ported to 3.4]
> Signed-off-by: Tom Keel 
> [ported to 3.7 and final tidy]
> Signed-off-by: Alan Cox 

Applied, thank you!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] ds2782_battery: Add power_supply_changed() calls for proper uevent support

2013-02-02 Thread Anton Vorontsov

From: Evgeny Romanov 

This patch affects on Android battery indicator. Battery driver should
send uevent message when battery status changes in order to get Android
battery level dynamically updated. Delayed work was added to periodically
check battery status and capacity.

Signed-off-by: Evgeny Romanov 
Signed-off-by: Anton Vorontsov 
---

Sending it to the community on Evgeny's behalf.

 drivers/power/ds2782_battery.c | 69 ++
 1 file changed, 69 insertions(+)

diff --git a/drivers/power/ds2782_battery.c b/drivers/power/ds2782_battery.c
index 2fa9b6b..e7301b3 100644
--- a/drivers/power/ds2782_battery.c
+++ b/drivers/power/ds2782_battery.c
@@ -7,6 +7,8 @@
  *
  * DS2786 added by Yulia Vilensky 
  *
+ * UEvent sending added by Evgeny Romanov 
+ *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
@@ -19,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -40,6 +43,8 @@
 
 #define DS2786_CURRENT_UNITS   25
 
+#define DS278x_DELAY   1000
+
 struct ds278x_info;
 
 struct ds278x_battery_ops {
@@ -54,8 +59,11 @@ struct ds278x_info {
struct i2c_client   *client;
struct power_supply battery;
struct ds278x_battery_ops  *ops;
+   struct delayed_work bat_work;
int id;
int rsns;
+   int capacity;
+   int status; /* State Of Charge */
 };
 
 static DEFINE_IDR(battery_id);
@@ -220,6 +228,8 @@ static int ds278x_get_status(struct ds278x_info *info, int 
*status)
if (err)
return err;
 
+   info->capacity = capacity;
+
if (capacity == 100)
*status = POWER_SUPPLY_STATUS_FULL;
else if (current_uA == 0)
@@ -267,6 +277,27 @@ static int ds278x_battery_get_property(struct power_supply 
*psy,
return ret;
 }
 
+static void ds278x_bat_update(struct ds278x_info *info)
+{
+   int old_status = info->status;
+   int old_capacity = info->capacity;
+
+   ds278x_get_status(info, &info->status);
+
+   if ((old_status != info->status) || (old_capacity != info->capacity))
+   power_supply_changed(&info->battery);
+}
+
+static void ds278x_bat_work(struct work_struct *work)
+{
+   struct ds278x_info *info;
+
+   info = container_of(work, struct ds278x_info, bat_work.work);
+   ds278x_bat_update(info);
+
+   schedule_delayed_work(&info->bat_work, DS278x_DELAY);
+}
+
 static enum power_supply_property ds278x_battery_props[] = {
POWER_SUPPLY_PROP_STATUS,
POWER_SUPPLY_PROP_CAPACITY,
@@ -295,10 +326,39 @@ static int ds278x_battery_remove(struct i2c_client 
*client)
idr_remove(&battery_id, info->id);
mutex_unlock(&battery_lock);
 
+   cancel_delayed_work(&info->bat_work);
+
kfree(info);
return 0;
 }
 
+#ifdef CONFIG_PM
+
+static int ds278x_suspend(struct i2c_client *client,
+   pm_message_t state)
+{
+   struct ds278x_info *info = i2c_get_clientdata(client);
+
+   cancel_delayed_work(&info->bat_work);
+   return 0;
+}
+
+static int ds278x_resume(struct i2c_client *client)
+{
+   struct ds278x_info *info = i2c_get_clientdata(client);
+
+   schedule_delayed_work(&info->bat_work, DS278x_DELAY);
+   return 0;
+}
+
+#else
+
+#define ds278x_suspend NULL
+#define ds278x_resume NULL
+
+#endif /* CONFIG_PM */
+
+
 enum ds278x_num_id {
DS2782 = 0,
DS2786,
@@ -368,10 +428,17 @@ static int ds278x_battery_probe(struct i2c_client *client,
info->ops  = &ds278x_ops[id->driver_data];
ds278x_power_supply_init(&info->battery);
 
+   info->capacity = 100;
+   info->status = POWER_SUPPLY_STATUS_FULL;
+
+   INIT_DELAYED_WORK(&info->bat_work, ds278x_bat_work);
+
ret = power_supply_register(&client->dev, &info->battery);
if (ret) {
dev_err(&client->dev, "failed to register battery\n");
goto fail_register;
+   } else {
+   schedule_delayed_work(&info->bat_work, DS278x_DELAY);
}
 
return 0;
@@ -401,6 +468,8 @@ static struct i2c_driver ds278x_battery_driver = {
},
.probe  = ds278x_battery_probe,
.remove = ds278x_battery_remove,
+   .suspend= ds278x_suspend,
+   .resume = ds278x_resume,
.id_table   = ds278x_id,
 };
 module_i2c_driver(ds278x_battery_driver);
-- 
1.8.1.1
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] ab8500: btemp: demote initcall sequence

2013-02-02 Thread Anton Vorontsov

On Wed, Jan 23, 2013 at 09:56:45AM +0530, Rajanikanth H.V wrote:
> From: "Rajanikanth H.V" 
> 
> Power supply subsystem creates thermal zone device for
> the property 'POWER_SUPPLY_PROP_TEMP' which requires
> thermal subsystem to be ready before 'ab8500 battery temperature monitor'
> driver is initialized.
> ab8500 btemp driver is initialized with subsys_initcall whereas thermal
> subsystem is initialized with fs_initcall which causes
> thermal_zone_device_register(...) to crash since the required structure
> 'thermal_class' is not initialized yet.
> 
> crash log:
[...]
> Signed-off-by: Rajanikanth H.V 
> ---

Applied and added Cc: stable.

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [GIT PULL] power: ab8500-bm: Latest Mainline<->STE delta reduction patch-set

2013-02-02 Thread Anton Vorontsov

On Thu, Jan 31, 2013 at 12:08:33PM +, Lee Jones wrote:
> > Okay, after Linus Walleij's diligent observaions, I have removed all
> > new functionallity related to Deep Debug and will continue to
> > investigate what I need to do in that regard.
> > 
> > In the mean-time, here's the fresh deep-debug:less pull-request:
> > 
> > 
> > 
> > The following changes since commit 8fd526fd18233887ba652079a369f4eee0de9d9d:
> > 
> >   qnap-poweroff: Fix license string (2013-01-19 18:04:04 -0800)
> > 
> > are available in the git repository at:
> > 
> >   git://git.linaro.org/people/ljones/linux-3.0-ux500.git tb-power-2
> > 
> > for you to fetch changes up to 34c11a709e928090cf34ecd706f7d3170f4e5026:
> > 
> >   u8500-charger: Delay for USB enumeration (2013-01-23 14:39:22 +)
> 
> Did you pull this in the end?

I was buried under my homework, sorry. :-) The changes look good, thanks a
lot! Pulled now.

Anton
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: A patch to test_power module

2013-02-02 Thread Anton Vorontsov

On Sun, Jan 27, 2013 at 10:45:43PM +0200, Andrey Gelman wrote:
> Hello !
> This patch fixes 2 issues in test_power module:
> 1. the mapping of ac_online parameter to its string value "on / off"
> was swapped.
> 2. more important ...
> You could not insmod test_power with module parameters, without
> causing kernel crash.
> The reason is that in test_power, setting module parameter value has
> side effect - power_supply_changed event.
> As the module parameters are processed prior to calling module_init,
> power_supply_changed event is generated on behalf module that is not
> yet initialized, causing some NULL pointer dereference.
> 
> My solution is to test whether the module has been initialized,
> prior to calling power_supply_changed.

Hi!

The patch looks good, but it is missing Signed-off-by tag. Plus, you
should also Cc: linux-kernel@vger.kernel.org on submissions (I've added it
now).

Plus, please don't use html in emails. :)

Thanks a lot!

Anton

> From ccd5b3e916477a3c2a51bbf10a3be8e01f3c85e4 Mon Sep 17 00:00:00 2001
> From: Andrey Gelman 
> Date: Sun, 27 Jan 2013 22:16:33 +0200
> Subject: [PATCH] test_power: fix a bug in setting module parameter values
> 
> When the kernel loads a module, module params are processed
> prior to calling module_init. As a result, setting module param
> value should not have side effects, at least as long as the
> module has not been initialized.
> ---
>  drivers/power/test_power.c |   31 +--
>  1 file changed, 21 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/power/test_power.c b/drivers/power/test_power.c
> index b99a452..753a74f 100644
> --- a/drivers/power/test_power.c
> +++ b/drivers/power/test_power.c
> @@ -30,6 +30,8 @@ static int battery_technology   = 
> POWER_SUPPLY_TECHNOLOGY_LION;
>  static int battery_capacity  = 50;
>  static int battery_voltage   = 3300;
>  
> +static bool module_initialized   = false;
> +
>  static int test_power_get_ac_property(struct power_supply *psy,
> enum power_supply_property psp,
> union power_supply_propval *val)
> @@ -185,6 +187,7 @@ static int __init test_power_init(void)
>   }
>   }
>  
> + module_initialized = true;
>   return 0;
>  failed:
>   while (--i >= 0)
> @@ -209,6 +212,8 @@ static void __exit test_power_exit(void)
>  
>   for (i = 0; i < ARRAY_SIZE(test_power_supplies); i++)
>   power_supply_unregister(&test_power_supplies[i]);
> +
> + module_initialized = false;
>  }
>  module_exit(test_power_exit);
>  
> @@ -221,8 +226,8 @@ struct battery_property_map {
>  };
>  
>  static struct battery_property_map map_ac_online[] = {
> - { 0,  "on"  },
> - { 1,  "off" },
> + { 0,  "off"  },
> + { 1,  "on" },
>   { -1, NULL  },
>  };
>  
> @@ -295,10 +300,16 @@ static const char *map_get_key(struct 
> battery_property_map *map, int value,
>   return def_key;
>  }
>  
> +static inline void signal_power_supply_changed(struct power_supply *psy)
> +{
> + if (module_initialized)
> + power_supply_changed(psy);
> +}
> +
>  static int param_set_ac_online(const char *key, const struct kernel_param 
> *kp)
>  {
>   ac_online = map_get_value(map_ac_online, key, ac_online);
> - power_supply_changed(&test_power_supplies[0]);
> + signal_power_supply_changed(&test_power_supplies[0]);
>   return 0;
>  }
>  
> @@ -311,7 +322,7 @@ static int param_get_ac_online(char *buffer, const struct 
> kernel_param *kp)
>  static int param_set_usb_online(const char *key, const struct kernel_param 
> *kp)
>  {
>   usb_online = map_get_value(map_ac_online, key, usb_online);
> - power_supply_changed(&test_power_supplies[2]);
> + signal_power_supply_changed(&test_power_supplies[2]);
>   return 0;
>  }
>  
> @@ -325,7 +336,7 @@ static int param_set_battery_status(const char *key,
>   const struct kernel_param *kp)
>  {
>   battery_status = map_get_value(map_status, key, battery_status);
> - power_supply_changed(&test_power_supplies[1]);
> + signal_power_supply_changed(&test_power_supplies[1]);
>   return 0;
>  }
>  
> @@ -339,7 +350,7 @@ static int param_set_battery_health(const char *key,
>   const struct kernel_param *kp)
>  {
>   battery_health = map_get_value(map_health, key, battery_health);
> - power_supply_ch

Re: [PATCH] power/reset: Remove newly introduced __dev* annotations

2013-02-02 Thread Anton Vorontsov

On Wed, Jan 30, 2013 at 12:39:16PM +0100, Andrew Lunn wrote:
> On Wed, Jan 30, 2013 at 12:28:13PM +0100, Thierry Reding wrote:
> > __devinit, __devexit and __devexit_p have recently been removed and
> > should no longer be used.
> > 
> > Signed-off-by: Thierry Reding 
> > ---
[...]
> Acked-by: Andrew Lunn 

Applied, thanks!

Anton
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] bq27x00_battery: Fix reporting battery temperature

2013-02-02 Thread Anton Vorontsov

On Sat, Feb 02, 2013 at 11:06:09AM +0100, Pali Rohár wrote:
> Reported temperature can be also negative, so cache value in non negative 
> Kelvin degree.
> 
> Signed-off-by: Pali Rohár 
> ---

Applied, thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 40/62] power: convert to idr_alloc()

2013-02-02 Thread Anton Vorontsov

On Sat, Feb 02, 2013 at 05:20:41PM -0800, Tejun Heo wrote:
> Convert to the much saner new idr interface.
> 
> Only compile tested.
> 
> Signed-off-by: Tejun Heo 
> Cc: Anton Vorontsov 
> Cc: David Woodhouse 
> ---
> This patch depends on an earlier idr changes and I think it would be
> best to route these together through -mm.  Please holler if there's
> any objection.  Thanks.

Sure thing, Ack for these:

>  drivers/power/bq2415x_charger.c | 11 +++
>  drivers/power/bq27x00_battery.c |  9 +++--
>  drivers/power/ds2782_battery.c  |  9 ++---
>  3 files changed, 8 insertions(+), 21 deletions(-)

Thanks!

Anton
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] bq27x00_battery: Fix reporting battery temperature

2013-02-03 Thread Anton Vorontsov

On Sun, Feb 03, 2013 at 09:01:54PM +0100, Pali Rohár wrote:
> On Sunday 03 February 2013 04:44:51 Anton Vorontsov wrote:
> > On Sat, Feb 02, 2013 at 11:06:09AM +0100, Pali Rohár wrote:
> > > Reported temperature can be also negative, so cache value in
> > > non negative Kelvin degree.
> > > 
> > > Signed-off-by: Pali Rohár 
> > > ---
> > 
> > Applied, thanks!
> 
> Now I looked at bq27x00_battery and rx51_battery drivers and I 
> see that both drivers reporting temperature in different units. 
> bq27x00_battery in 1/10 °C and rx51_battery in 1/100 °C. What is 
> correct degree for kernel power power supply API? Maybe other 
> kernel drivers have different units too... Note that my above 
> patch did not changed anything units, only fixed reporting 
> (possible) negative temperature.

Per Documentation/power/power_supply_class.txt and power_supply.h:

/*
 * All voltages, currents, charges, energies, time and temperatures in uV,
 * µA, µAh, µWh, seconds and tenths of degree Celsius unless otherwise
 * stated. It's driver's job to convert its raw values to units in which
 * this class operates.
 */

Feel free to fix the offending drivers.

Thanks,

Anton
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/4] audit: Syscall rules are not applied to existing processes on non-x86

2013-02-06 Thread Anton Blanchard


Hi,

Just following up on this. I've had a few people complaining about
audit being broken on ppc64 and it would be nice to fix.

Anton
--

On Wed, 9 Jan 2013 10:46:17 +1100
Anton Blanchard  wrote:

> 
> Commit b05d8447e782 (audit: inline audit_syscall_entry to reduce
> burden on archs) changed audit_syscall_entry to check for a dummy
> context before calling __audit_syscall_entry. Unfortunately the dummy
> context state is maintained in __audit_syscall_entry so once set it
> never gets cleared, even if the audit rules change.
> 
> As a result, if there are no auditing rules when a process starts
> then it will never be subject to any rules added later. x86 doesn't
> see this because it has an assembly fast path that calls directly into
> __audit_syscall_entry.
> 
> I noticed this issue when working on audit performance optimisations.
> I wrote a set of simple test cases available at:
> 
> http://ozlabs.org/~anton/junkcode/audit_tests.tar.gz
> 
> 02_new_rule.py fails without the patch and passes with it. The
> test case clears all rules, starts a process, adds a rule then
> verifies the process produces a syscall audit record.
> 
> Signed-off-by: Anton Blanchard 
> Cc:  # 3.3+
> ---
> 
> Index: b/include/linux/audit.h
> ===
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -119,7 +119,7 @@ static inline void audit_syscall_entry(i
>  unsigned long a1, unsigned
> long a2, unsigned long a3)
>  {
> - if (unlikely(!audit_dummy_context()))
> + if (unlikely(current->audit_context))
>   __audit_syscall_entry(arch, major, a0, a1, a2, a3);
>  }
>  static inline void audit_syscall_exit(void *pt_regs)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 7/7] spi: spi-fsl-spi: Add support for gpio chipselects for GRLIB type cores

2013-02-07 Thread Anton Vorontsov

On Thu, Feb 07, 2013 at 02:12:11PM +0100, Andreas Larsson wrote:
> This relies upon of_spi_register_master to find out which gpios to use.
> 
> Signed-off-by: Andreas Larsson 
> ---
>  drivers/spi/spi-fsl-lib.h |1 +
>  drivers/spi/spi-fsl-spi.c |   53 +++-
>  2 files changed, 48 insertions(+), 6 deletions(-)

Just a couple of minor nits...

> diff --git a/drivers/spi/spi-fsl-lib.h b/drivers/spi/spi-fsl-lib.h
> index d5c788b..52db693 100644
> --- a/drivers/spi/spi-fsl-lib.h
> +++ b/drivers/spi/spi-fsl-lib.h
> @@ -71,6 +71,7 @@ struct mpc8xxx_spi {
>  
>  #ifdef CONFIG_SPI_FSL_SPI
>   int type;
> + int native_chipselects;
>   u8 max_bits_per_word;
>  
>   void (*set_shifts)(u32 *rx_shift, u32 *tx_shift,
> diff --git a/drivers/spi/spi-fsl-spi.c b/drivers/spi/spi-fsl-spi.c
> index 55652e6..2befe16 100644
> --- a/drivers/spi/spi-fsl-spi.c
> +++ b/drivers/spi/spi-fsl-spi.c
> @@ -418,7 +418,7 @@ static int fsl_spi_setup(struct spi_device *spi)
>  {
>   struct mpc8xxx_spi *mpc8xxx_spi;
>   struct fsl_spi_reg *reg_base;
> - int retval;
> + int retval, desel;

We don't usually place variable declarations on the same line, unless the
variables are closely related.

>   u32 hw_mode;
>   struct spi_mpc8xxx_cs   *cs = spi->controller_state;
>  
> @@ -456,12 +456,45 @@ static int fsl_spi_setup(struct spi_device *spi)
>   return retval;
>   }
>  
> + if (mpc8xxx_spi->type == TYPE_GRLIB) {
> + if (gpio_is_valid(spi->cs_gpio)) {

<- You can place the 'int desel;' here, limiting the visibility for it.

> + retval = gpio_request(spi->cs_gpio,
> +   dev_name(&spi->dev));
> + if (retval)
> + return retval;
> +
> + desel = !(spi->mode & SPI_CS_HIGH);
> + desel ^= !!(spi->cs_gpio_flags & OF_GPIO_ACTIVE_LOW);
> + retval = gpio_direction_output(spi->cs_gpio, desel);
> + if (retval) {
> + gpio_free(spi->cs_gpio);
> + return retval;
> + }

Thanks,
Anton
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 0/7] spi: spi-fsl-spi: Make spi-fsl-spi usable in cpu mode outside of FSL SOC environments and add a grlib variant normally running on sparc

2013-02-07 Thread Anton Vorontsov

On Thu, Feb 07, 2013 at 02:12:04PM +0100, Andreas Larsson wrote:
> This makes the cpu mode of the driver available outside of an FSL SOC
> and even powerpc environment. Furthermore, this adds support for the
> mostly register-compatible SPICTRL core from the GRLIB VHDL IP core
> library normally running on SPARC.

The patches look clean and neat, I don't see anything obviously wrong with
them... so,

Acked-by: Anton Vorontsov 

Thanks!

Anton
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] da9030_battery: include notifier.h

2013-02-08 Thread Anton Vorontsov

On Wed, Feb 06, 2013 at 03:09:59PM +0100, Michal Hocko wrote:
> Ohh, I have just noticed that this could be introduced by "mm: break
> circular include from linux/mmzone.h" in mm tree.
> Adding Andrew to CC.

The patch is legit, actually. So, I've applied it, thank you!

Anton

> On Wed 06-02-13 10:14:58, Michal Hocko wrote:
> > randconfig complains about:
> > drivers/power/da9030_battery.c:113: error: field ‘nb’ has incomplete type
> > 
> > because there is no direct include for notifier.h which defines
> > struct notifier_block.
> > 
> > Signed-off-by: Michal Hocko 
> > ---
> >  drivers/power/da9030_battery.c |1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/power/da9030_battery.c b/drivers/power/da9030_battery.c
> > index 94762e6..e8c5a39 100644
> > --- a/drivers/power/da9030_battery.c
> > +++ b/drivers/power/da9030_battery.c
> > @@ -22,6 +22,7 @@
> >  
> >  #include 
> >  #include 
> > +#include 
> >  
> >  #define DA9030_FAULT_LOG   0x0a
> >  #define DA9030_FAULT_LOG_OVER_TEMP (1 << 7)
> > -- 
> > 1.7.10.4
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> 
> -- 
> Michal Hocko
> SUSE Labs
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] bq27x00_battery: Fix reporting battery temperature

2013-02-08 Thread Anton Vorontsov

On Wed, Feb 06, 2013 at 06:56:34PM +0100, Pali Rohár wrote:
[...]
> >  * All voltages, currents, charges, energies, time and
> > temperatures in uV, * µA, µAh, µWh, seconds and tenths of
> > degree Celsius unless otherwise * stated. It's driver's job
[...]
> bq27x00_battery reporting "POWER_SUPPLY_TEMP=2700"
> and rx51_battery reporting "POWER_SUPPLY_TEMP=3590"
[...]
> Subject: [PATCH] bq27x00_battery: Report temperature in 1/100 degree Celsius

Hm. The documentation says tenth (1/10) degrees, and you even restate it
in the commit message. But the subject, and your example seem to prove
that you still report it in 1/100 of Celsius.

Unless your phone was on fire during the time you took the values, I tend
to think the patch needs to be fixed. :-)

Thanks,
Anton

> * Documentation/power/power_supply_class.txt say that temperature must be 
> reported in tenths of degree 
> Celsius
> 
> Signed-off-by: Pali Rohár 
> ---
>  drivers/power/bq27x00_battery.c |8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/power/bq27x00_battery.c b/drivers/power/bq27x00_battery.c
> index 5b077af..482755f 100644
> --- a/drivers/power/bq27x00_battery.c
> +++ b/drivers/power/bq27x00_battery.c
> @@ -312,8 +312,10 @@ static int bq27x00_battery_read_temperature(struct 
> bq27x00_device_info *di)
>   return temp;
>   }
>  
> - if (!bq27xxx_is_chip_version_higher(di))
> - temp = 5 * temp / 2;
> + if (bq27xxx_is_chip_version_higher(di))
> + temp *= 10;
> + else
> + temp *= 25;
>  
>   return temp;
>  }
> @@ -640,7 +642,7 @@ static int bq27x00_battery_get_property(struct 
> power_supply *psy,
>   case POWER_SUPPLY_PROP_TEMP:
>   ret = bq27x00_simple_value(di->cache.temperature, val);
>   if (ret == 0)
> - val->intval -= 2731;
> + val->intval -= 27310;
>   break;
>   case POWER_SUPPLY_PROP_TIME_TO_EMPTY_NOW:
>   ret = bq27x00_simple_value(di->cache.time_to_empty, val);
> -- 
> 1.7.10.4
> 
> -- 
> Pali Rohár
> pali.ro...@gmail.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] gpio: Add device driver for GRGPIO cores

2013-02-09 Thread Anton Vorontsov

On Mon, Feb 04, 2013 at 11:27:33AM +0100, Andreas Larsson wrote:
[...]
> >>2) The grgpio_to_irq function is very hardware specific, and there is of
> >>course no gpio_to_irq support in gpio-generic.
> >
> >Well, the idea about gpio-generic is to use the pieces you need
> >IIRC. You may override.
> 
> Ah, I see, bgpio_init is exported, so I might be able use that from
> my driver to get rid of some functions. There is support for BE I
> saw now. It seems broken to me (flips bits, not bytes), but that can
> be fixed.

No, it is not broken. :-) There is a BE-notation of GPIO numbering. BE/LE
issue is also valid for bits. Please don't "fix" this. :)

If you need to introduce byte endianness support in addition, that's fine.

Thanks,

Anton
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] memcg: Add memory.pressure_level events

2013-02-10 Thread Anton Vorontsov

With this patch userland applications that want to maintain the
interactivity/memory allocation cost can use the new pressure level
notifications. The levels are defined like this:

The "low" level means that the system is reclaiming memory for new
allocations. Monitoring reclaiming activity might be useful for
maintaining overall system's cache level. Upon notification, the program
(typically "Activity Manager") might analyze vmstat and act in advance
(i.e. prematurely shutdown unimportant services).

The "medium" level means that the system is experiencing medium memory
pressure, there is some mild swapping activity. Upon this event
applications may decide to analyze vmstat/zoneinfo/memcg or internal
memory usage statistics and free any resources that can be easily
reconstructed or re-read from a disk.

The "critical" level means that the system is actively thrashing, it is
about to out of memory (OOM) or even the in-kernel OOM killer is on its
way to trigger. Applications should do whatever they can to help the
system. It might be too late to consult with vmstat or any other
statistics, so it's advisable to take an immediate action.

The events are propagated upward until the event is handled, i.e. the
events are not pass-through. Here is what this means: for example you have
three cgroups: A->B->C. Now you set up an event listener on cgroup A and
cgroup B, and suppose group C experiences some pressure. In this
situation, only group B will receive the notification, i.e. group A will
not receive it. This is done to avoid excessive "broadcasting" of
messages, which disturbs the system and which is especially bad if we are
low on memory or thrashing. So, organize the cgroups wisely, or propagate
the events manually (or, ask us to implement the pass-through events,
explaining why would you need them.)

The file mempressure.level is used to show the current memory pressure
level, and cgroups event control file can be used to setup an eventfd
notification with a specific memory pressure level threshold.

Signed-off-by: Anton Vorontsov 
Acked-by: Kirill A. Shutemov 
---

Hi all,

Here comes another iteration of the memory pressure saga. The previous
version of the patch (and discussion) can be found here:

http://lkml.org/lkml/2013/1/4/55

And here are changes in this revision:

- Andrew Morton was concerned that the mempressure stuff was tied to
  memcg, which was non-issue since mempressure wasn't actually bolted into
  memcg at that time. But now it is. :) So now you need memcg to use
  mempressure. Why? It makes things easier, simpler (e.g. this ends any
  questions on how two different cgroups would interact, which can be
  complex when two are distinct entities). Plus, as I understood it,
  that's how cgroup folks want to see it eventually;

- Only cgroups API implemented. Let's start with making memcg people
  happy, i.e. handling the most complex cases, and then we can start with
  any niche solutions;

- Implemented Minchan Kim's idea of checking gfp mask. Unfortunately, it
  is not as simple as checking '__GFP_HIGHMEM | __GFP_MOVABLE', since we
  also need to account files caches and kswapd reclaim. But even so we can
  filter out DMA or atomic allocations, which are not interesting for
  userland. Plus it opens doors for other gfp tuning, so definitely a good
  stuff;

- Per Leonid Moiseichuk's comments decreased vmpressure_level_critical to
  95. I didn't look close enough, but it seems that we the minimum step is
  indeed ~3%, and 99% makes it actually 100%. 95% should be fine;

- Per Kamezawa Hiroyuki added some words into documentation about that
  it's always a good idea to consult with vmstat/zoneinfo/memcg statistics
  before taking any action (with the exception of critical level). Also
  added 'TODO' wrt. automatic window adjustment;

- Documented events propagation strategy;

- Removed ulong/uint usage, per Andrew's comments;

- Glauber Costa didn't like too short and non-descriptive mpc_ naming,
  suggesting mempressure_ instead. And Andrew suggested mpcg_. I went with
  something completely different: vmpressure_/vmpr_. :) Also renamed
  xxx2yyy() to xxx_to_yyy() per Glauber Costa suggestion.

- _OOM level renamed to _CRITICAL. Andrew wanted _HIGH affix, but by using
  'critical' I want to denote that this level is the last one (e.g. we
  might want to introduce _HIGH some time later, if we can find a good
  definition for it);

- This patch does not include shrinker interface. In the last series I
  showed that implementing shrinker is possible, and that it actually can
  be useful. At the same time I explained that shrinker is not a
  substitution for the pressure levels. So, once we settle on the simple
  thing, I might continue my shrinker efforts (which, btw, QEMU guys found
  interesting and potentionally useful).

  For those who

Re: [PATCH 3/4] uprobes/tracing: Kill the pointless local_save_flags/preempt_count calls

2013-04-02 Thread Anton Arapov

On Fri, Mar 29, 2013 at 07:15:45PM +0100, Oleg Nesterov wrote:
> uprobe_trace_func() is never called with irqs or preemption
> disabled, no need to ask preempt_count() or local_save_flags().
> 
> Signed-off-by: Oleg Nesterov 
> ---
>  kernel/trace/trace_uprobe.c |   10 +++---
>  1 files changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index 8e00901..43d258d 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -492,17 +492,13 @@ static int uprobe_trace_func(struct trace_uprobe *tu, 
> struct pt_regs *regs)
>   struct ring_buffer_event *event;
>   struct ring_buffer *buffer;
>   u8 *data;
> - int size, i, pc;
> - unsigned long irq_flags;
> + int size, i;
>   struct ftrace_event_call *call = &tu->call;
>  
> - local_save_flags(irq_flags);
> - pc = preempt_count();
> -
>   size = sizeof(*entry) + tu->size;
>  
>   event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
> -   size, irq_flags, pc);
> +   size, 0, 0);
>   if (!event)
>   return 0;
>  
> @@ -513,7 +509,7 @@ static int uprobe_trace_func(struct trace_uprobe *tu, 
> struct pt_regs *regs)
>   call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
>  
>   if (!filter_current_check_discard(buffer, call, entry, event))
> - trace_buffer_unlock_commit(buffer, event, irq_flags, pc);
> + trace_buffer_unlock_commit(buffer, event, 0, 0);
>  
>   return 0;
>  }
> -- 
> 1.5.5.1
> 
Acked-by: Anton Arapov 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/4] uprobes/tracing: Kill the pointless seq_print_ip_sym() call

2013-04-02 Thread Anton Arapov

On Fri, Mar 29, 2013 at 07:15:43PM +0100, Oleg Nesterov wrote:
> seq_print_ip_sym(ip) in print_uprobe_event() is pointless,
> kallsyms_lookup(ip) can not resolve a user-space address.
> 
> Signed-off-by: Oleg Nesterov 
> ---
>  kernel/trace/trace_uprobe.c |8 +---
>  1 files changed, 1 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index af5b773..8e00901 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -531,13 +531,7 @@ print_uprobe_event(struct trace_iterator *iter, int 
> flags, struct trace_event *e
>   field = (struct uprobe_trace_entry_head *)iter->ent;
>   tu = container_of(event, struct trace_uprobe, call.event);
>  
> - if (!trace_seq_printf(s, "%s: (", tu->call.name))
> - goto partial;
> -
> - if (!seq_print_ip_sym(s, field->ip, flags | TRACE_ITER_SYM_OFFSET))
> - goto partial;
> -
> - if (!trace_seq_puts(s, ")"))
> + if (!trace_seq_printf(s, "%s: (0x%lx)", tu->call.name, field->ip))
>   goto partial;
>  
>   data = (u8 *)&field[1];
> -- 
> 1.5.5.1

Acked-by: Anton Arapov  
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/4] uprobes/tracing: Kill the pointless task_pt_regs() calls

2013-04-02 Thread Anton Arapov

On Fri, Mar 29, 2013 at 07:15:40PM +0100, Oleg Nesterov wrote:
> uprobe_trace_func() and uprobe_perf_func() do not need task_pt_regs(),
> we already have "struct pt_regs *regs".
> 
> Signed-off-by: Oleg Nesterov 
> ---
>  kernel/trace/trace_uprobe.c |4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index 8dad2a9..af5b773 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -507,7 +507,7 @@ static int uprobe_trace_func(struct trace_uprobe *tu, 
> struct pt_regs *regs)
>   return 0;
>  
>   entry = ring_buffer_event_data(event);
> - entry->ip = instruction_pointer(task_pt_regs(current));
> + entry->ip = instruction_pointer(regs);
>   data = (u8 *)&entry[1];
>   for (i = 0; i < tu->nr_args; i++)
>   call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
> @@ -777,7 +777,7 @@ static int uprobe_perf_func(struct trace_uprobe *tu, 
> struct pt_regs *regs)
>   if (!entry)
>   goto out;
>  
> - entry->ip = instruction_pointer(task_pt_regs(current));
> + entry->ip = instruction_pointer(regs);
>   data = (u8 *)&entry[1];
>   for (i = 0; i < tu->nr_args; i++)
>   call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
> -- 
> 1.5.5.1
>
Acked-by: Anton Arapov 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 4/4] uprobes/tracing: generalize struct uprobe_trace_entry_head

2013-04-02 Thread Anton Arapov

;
> - struct trace_uprobe *tu = (struct trace_uprobe *)event_call->data;
> + struct trace_uprobe *tu = event_call->data;
>  
> - DEFINE_FIELD(unsigned long, ip, FIELD_STRING_IP, 0);
> + DEFINE_FIELD(unsigned long, vaddr[0], FIELD_STRING_IP, 0);
> + size = SIZEOF_TRACE_ENTRY(1);
>   /* Set argument names as fields */
>   for (i = 0; i < tu->nr_args; i++) {
>   ret = trace_define_field(event_call, tu->args[i].type->fmttype,
>tu->args[i].name,
> -  sizeof(field) + tu->args[i].offset,
> +  size + tu->args[i].offset,
>tu->args[i].type->size,
>tu->args[i].type->is_signed,
>FILTER_OTHER);
> @@ -748,33 +759,31 @@ static int uprobe_perf_func(struct trace_uprobe *tu, 
> struct pt_regs *regs)
>   struct ftrace_event_call *call = &tu->call;
>   struct uprobe_trace_entry_head *entry;
>   struct hlist_head *head;
> - u8 *data;
> - int size, __size, i;
> - int rctx;
> + unsigned long ip;
> + void *data;
> + int size, rctx, i;
>  
>   if (!uprobe_perf_filter(&tu->consumer, 0, current->mm))
>   return UPROBE_HANDLER_REMOVE;
>  
> - __size = sizeof(*entry) + tu->size;
> - size = ALIGN(__size + sizeof(u32), sizeof(u64));
> - size -= sizeof(u32);
> + size = SIZEOF_TRACE_ENTRY(1);
> + size = ALIGN(size + tu->size + sizeof(u32), sizeof(u64)) - sizeof(u32);
>   if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE, "profile buffer not large 
> enough"))
>   return 0;
>  
>   preempt_disable();
> -
>   entry = perf_trace_buf_prepare(size, call->event.type, regs, &rctx);
>   if (!entry)
>   goto out;
>  
> - entry->ip = instruction_pointer(regs);
> - data = (u8 *)&entry[1];
> + ip = instruction_pointer(regs);
> + entry->vaddr[0] = ip;
> + data = DATAOF_TRACE_ENTRY(entry, 1);
>   for (i = 0; i < tu->nr_args; i++)
>   call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
>  
>   head = this_cpu_ptr(call->perf_events);
> - perf_trace_buf_submit(entry, size, rctx, entry->ip, 1, regs, head, 
> NULL);
> -
> + perf_trace_buf_submit(entry, size, rctx, ip, 1, regs, head, NULL);
>   out:
>   preempt_enable();
>   return 0;
> @@ -784,7 +793,7 @@ static int uprobe_perf_func(struct trace_uprobe *tu, 
> struct pt_regs *regs)
>  static
>  int trace_uprobe_register(struct ftrace_event_call *event, enum trace_reg 
> type, void *data)
>  {
> - struct trace_uprobe *tu = (struct trace_uprobe *)event->data;
> + struct trace_uprobe *tu = event->data;
>  
>   switch (type) {
>   case TRACE_REG_REGISTER:
> -- 
> 1.5.5.1
> 
Acked-by: Anton Arapov 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/6] uprobes/tracing: uretprobes

2013-04-02 Thread Anton Arapov

On Mon, Apr 01, 2013 at 06:08:27PM +0200, Oleg Nesterov wrote:
> On 03/29, Oleg Nesterov wrote:
> >
> > uretprobes code is almost ready, we need to teach trace_uprobe.c
> > to support them.
> >
> > This looks simple, but there is a nasty complication. We do not
> > want to copy-and-paste the code like trace_kprobe.c does. Just look
> > at kprobe_event_define_fields() and kretprobe_event_define_fields().
> > They are non-trivial but almost identical. And there are a lot more
> > examples.
> >
> > So I'd like to send 4/4 for review before I'll do other changes.
> > The patch itself doesn't make sense and complicates the source code a
> > bit. But note how easy we can change, say, uprobe_event_define_fields(),
> >
> > -   DEFINE_FIELD(vaddr[0], FIELD_STRING_IP);
> > -   size = SIZEOF_TRACE_ENTRY(1);
> > +   if (!trace_probe_is_return(tu)) {
> > +   DEFINE_FIELD(vaddr[0], FIELD_STRING_IP);
> > +   size = SIZEOF_TRACE_ENTRY(1);
> > +   } else {
> > +   DEFINE_FIELD(vaddr[0], FIELD_STRING_FUNC);
> > +   DEFINE_FIELD(vaddr[1], FIELD_STRING_RETIP);
> > +   size = SIZEOF_TRACE_ENTRY(2);
> > +   }
> >
> > without copy-and-paste. The same simple change is possible for other
> > helpers playing with uprobe_trace_entry_head.
> 
> And the rest of the necessary changes to support uretprobes.
> 
> To remind, this is on top of Anton's uretprobes implementation which is
> not finished yet. This series can't be even compiled, but I think it can
> be already reviewed. All it needs to compile is the new
> uprobe_consumer->ret_handler().
> 
> Oleg.
> 
>  kernel/trace/trace_uprobe.c |  152 
> +++
>  1 files changed, 124 insertions(+), 28 deletions(-)
>

I've played with this patch-set and it works.

Tested-by: Anton Arapov 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v4] memcg: Add memory.pressure_level events

2013-04-02 Thread Anton Vorontsov

With this patch userland applications that want to maintain the
interactivity/memory allocation cost can use the pressure level
notifications. The levels are defined like this:

The "low" level means that the system is reclaiming memory for new
allocations. Monitoring this reclaiming activity might be useful for
maintaining cache level. Upon notification, the program (typically
"Activity Manager") might analyze vmstat and act in advance (i.e.
prematurely shutdown unimportant services).

The "medium" level means that the system is experiencing medium memory
pressure, the system might be making swap, paging out active file caches,
etc. Upon this event applications may decide to further analyze
vmstat/zoneinfo/memcg or internal memory usage statistics and free any
resources that can be easily reconstructed or re-read from a disk.

The "critical" level means that the system is actively thrashing, it is
about to out of memory (OOM) or even the in-kernel OOM killer is on its
way to trigger. Applications should do whatever they can to help the
system. It might be too late to consult with vmstat or any other
statistics, so it's advisable to take an immediate action.

The events are propagated upward until the event is handled, i.e. the
events are not pass-through. Here is what this means: for example you have
three cgroups: A->B->C. Now you set up an event listener on cgroups A, B
and C, and suppose group C experiences some pressure. In this situation,
only group C will receive the notification, i.e. groups A and B will not
receive it. This is done to avoid excessive "broadcasting" of messages,
which disturbs the system and which is especially bad if we are low on
memory or thrashing. So, organize the cgroups wisely, or propagate the
events manually (or, ask us to implement the pass-through events,
explaining why would you need them.)

Performance wise, the memory pressure notifications feature itself is
lightweight and does not require much of bookkeeping, in contrast to the
rest of memcg features. Unfortunately, as of current memcg implementation,
pages accounting is an inseparable part and cannot be turned off. The good
news is that there are some efforts[1] to improve the situation; plus,
implementing the same, fully API-compatible[2] interface for
CONFIG_MEMCG=n case (e.g. embedded) is also a viable option, so it will
not require any changes on the userland side.

[1] http://permalink.gmane.org/gmane.linux.kernel.cgroups/6291
[2] http://lkml.org/lkml/2013/2/21/454

Signed-off-by: Anton Vorontsov 
Acked-by: Kirill A. Shutemov 
Acked-by: KAMEZAWA Hiroyuki 
---

Hi all,

Thanks for the previous reviews!

In v4 I addressed Andrew's and Kamezawa's comments:

- Documented public interfaces and tunables;

- Added documentation for eventfd interface;

- Some cosmetic changes: code rearrangements and variables renames
  (wk->work, lvl->level, etc.);

- Changed types for page counters from 'unsigned int' to 'unsigned long',
  this avoids possible overflows;

- Added Kamezawa's Ack, and rebased onto 3.9.0-rc5-next-20130402+.

In v3:

- No changes in the code, just updated commit message to incorporate the
  answer to Minchan Kim's comment regarding applicability to embedded use
  cases in the light of memcg performance overhead, plus gave some
  references to Glauber Costa's memcg work.

- Rebased onto 3.9.0-rc3-next-20130321.

Old changelogs/submissions:

  v3: http://lkml.org/lkml/2013/3/22/31
  v2: http://lkml.org/lkml/2013/2/18/577
  v1: http://lkml.org/lkml/2013/2/10/140
  mempressure cgroup: http://lkml.org/lkml/2013/1/4/55

 Documentation/cgroups/memory.txt |  70 +++-
 include/linux/vmpressure.h   |  48 +
 mm/Makefile  |   2 +-
 mm/memcontrol.c  |  29 +++
 mm/vmpressure.c  | 374 +++
 mm/vmscan.c  |   8 +
 6 files changed, 529 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/vmpressure.h
 create mode 100644 mm/vmpressure.c

diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 3aaa984..1178e23 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -40,6 +40,7 @@ Features:
  - soft limit
  - moving (recharging) account at moving a task is selectable.
  - usage threshold notifier
+ - memory pressure notifier
  - oom-killer disable knob and oom-notifier
  - Root cgroup has no limit controls.
 
@@ -65,6 +66,7 @@ Brief summary of control files.
  memory.stat# show various statistics
  memory.use_hierarchy   # set/show hierarchical account enabled
  memory.force_empty # trigger forced move charge to parent
+ memory.pressure_level  # set memory pressure notifications
  memory.swappiness  # set/show swappiness parameter of vmscan

Re: [PATCH v3] memcg: Add memory.pressure_level events

2013-04-02 Thread Anton Vorontsov

On Wed, Mar 27, 2013 at 10:53:30AM +0900, Kamezawa Hiroyuki wrote:
[...]
> >+++ b/mm/memcontrol.c
> >@@ -49,6 +49,7 @@
> >  #include 
> >  #include 
> >  #include 
> >+#include 
> >  #include 
> >  #include 
> >  #include 
> >@@ -376,6 +377,9 @@ struct mem_cgroup {
> > atomic_tnumainfo_events;
> > atomic_tnumainfo_updating;
> >  #endif
> >+
> >+struct vmpressure vmpr;
> >+
> 
> How about placing this just below "memsw_threshold" ?
> memory objects around there is not performance critical.

Yup, done.

[...]
> >+static const unsigned int vmpressure_win = SWAP_CLUSTER_MAX * 16;
> >+static const unsigned int vmpressure_level_med = 60;
> >+static const unsigned int vmpressure_level_critical = 95;
> >+static const unsigned int vmpressure_level_critical_prio = 3;
> >+
> more comments are welcomed...
> 
> I'm not against the numbers themselves but I'm not sure how these numbers are
> selected...I'm glad if you show some reasons in changelog or somewhere.

Sure, in v4 the numbers are described in the comments.

[...]
> >+static enum vmpressure_levels vmpressure_calc_level(unsigned int scanned,
> >+unsigned int reclaimed)
> >+{
> >+unsigned long scale = scanned + reclaimed;
> >+unsigned long pressure;
> >+
> >+if (!scanned)
> >+return VMPRESSURE_LOW;
> 
> Can you add comment here ? When !scanned happens ?

Yeah, the comment is needed. in v4 I added explanation for this case.

[...]
> >+mutex_lock(&vmpr->sr_lock);
> >+vmpr->scanned += scanned;
> >+vmpr->reclaimed += reclaimed;
> >+mutex_unlock(&vmpr->sr_lock);
> >+
> >+if (scanned < vmpressure_win || work_pending(&vmpr->work))
> >+return;
> >+schedule_work(&vmpr->work);
> >+}
> 
> I'm not sure how other guys thinks butcould you place the definition
> of work_fn above calling it ? you call vmpressure_wk_fn(), right ?

Yup. OK, I rearranged the code a bit.

[...]
> > do {
> >+vmpressure_prio(sc->gfp_mask, sc->target_mem_cgroup,
> >+sc->priority);
> > sc->nr_scanned = 0;
> > aborted_reclaim = shrink_zones(zonelist, sc);
> >
> >
> 
> When you answers Andrew's comment and fix problems, feel free to add
> 
> Acked-by: KAMEZAWA Hiroyuki 

Thanks a lot for the reviews, Kamezawa!

Anton
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3] memcg: Add memory.pressure_level events

2013-04-02 Thread Anton Vorontsov

 +
> > +   /*
> > +* OK, the prio is below the threshold, updating vmpressure
> 
> But you never told me what that threshold is for!  And I have no means
> of working out why you chose "3", nor the effects of altering it, etc.

True. This is explained it in a comment now.

[...]
> > +int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
> > + struct eventfd_ctx *eventfd, const char *args)
> 
> Document the interface, please.

Done.

> > +{
> > +   struct vmpressure *vmpr = cg_to_vmpr(cg);
> > +   struct vmpressure_event *ev;
> > +   int lvl;
> 
> These abbreviations are rather unlinuxy.  wk->work, vmpr->vmpressure,
> lvl->level, etc.

Yeah, I agree. Although, 'vmpressure' as a function-scope variable is
kinda too long, the code becomes really hard to read. But in memcg struct
and global namespace I now use the full 'vmpressure' name.

> > +   for (lvl = 0; lvl < VMPRESSURE_NUM_LEVELS; lvl++) {
> > +   if (!strcmp(vmpressure_str_levels[lvl], args))
> > +   break;
> > +   }
> > +
> > +   if (lvl >= VMPRESSURE_NUM_LEVELS)
> > +   return -EINVAL;
> > +
> > +   ev = kzalloc(sizeof(*ev), GFP_KERNEL);
> > +   if (!ev)
> > +   return -ENOMEM;
> > +
> > +   ev->efd = eventfd;
> > +   ev->level = lvl;
> > +
> > +   mutex_lock(&vmpr->events_lock);
> > +   list_add(&ev->node, &vmpr->events);
> 
> What's the upper bound on the length of this list?

As of now, it is controlled by the cgroup core, so I would say the number
of opened FDs, and if that is a problem, it should be fixed for everyone.
The good thing is that the list is per-cgroup, it is not global.


Thanks for the review, Andrew!

Anton
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v1 3/9] uretprobes/x86: Hijack return address

2013-04-03 Thread Anton Arapov

Hijack the return address and replace it with a trampoline address.

v1 changes:
* use force_sig_info()
* rework and simplify logic

RFCv5 changes:
* change the fail return code, because orig_ret_vaddr=0 is possible
* style fixup
RFCv2 changes:
* remove ->doomed flag, kill task immediately

Signed-off-by: Anton Arapov 
---
 arch/x86/include/asm/uprobes.h |  1 +
 arch/x86/kernel/uprobes.c  | 29 +
 2 files changed, 30 insertions(+)

diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
index 8ff8be7..6e51979 100644
--- a/arch/x86/include/asm/uprobes.h
+++ b/arch/x86/include/asm/uprobes.h
@@ -55,4 +55,5 @@ extern int  arch_uprobe_post_xol(struct arch_uprobe *aup, 
struct pt_regs *regs);
 extern bool arch_uprobe_xol_was_trapped(struct task_struct *tsk);
 extern int  arch_uprobe_exception_notify(struct notifier_block *self, unsigned 
long val, void *data);
 extern void arch_uprobe_abort_xol(struct arch_uprobe *aup, struct pt_regs 
*regs);
+extern unsigned long arch_uretprobe_hijack_return_addr(unsigned long 
trampoline_vaddr, struct pt_regs *regs);
 #endif /* _ASM_UPROBES_H */
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 0ba4cfb..2ed8459 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -697,3 +697,32 @@ bool arch_uprobe_skip_sstep(struct arch_uprobe *auprobe, 
struct pt_regs *regs)
send_sig(SIGTRAP, current, 0);
return ret;
 }
+
+unsigned long
+arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr, struct 
pt_regs *regs)
+{
+   int rasize, ncopied;
+   unsigned long orig_ret_vaddr = 0; /* clear high bits for 32-bit apps */
+
+   rasize = is_ia32_task() ? 4 : 8;
+   ncopied = copy_from_user(&orig_ret_vaddr, (void __user *)regs->sp, 
rasize);
+   if (unlikely(ncopied))
+   return -1;
+
+   /* check whether address has been already hijacked */
+   if (orig_ret_vaddr == trampoline_vaddr)
+   return orig_ret_vaddr;
+
+   ncopied = copy_to_user((void __user *)regs->sp, &trampoline_vaddr, 
rasize);
+   if (likely(!ncopied))
+   return orig_ret_vaddr;
+
+   if (ncopied != rasize) {
+   pr_err("uprobe: return address clobbered: pid=%d, %%sp=%#lx, "
+   "%%ip=%#lx\n", current->pid, regs->sp, regs->ip);
+
+   force_sig_info(SIGSEGV, SEND_SIG_FORCED, current);
+   }
+
+   return -1;
+}
-- 
1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v1 5/9] uretprobes: Return probe entry, prepare_uretprobe()

2013-04-03 Thread Anton Arapov

When a uprobe with return probe consumer is hit, prepare_uretprobe()
function is invoked. It creates return_instance, hijacks return address
and replaces it with the trampoline.

* Return instances are kept as stack per uprobed task.
* Return instance is chained, when the original return address is
  trampoline's page vaddr (e.g. recursive call of the probed function).

v1 changes:
* preserve address of the breakpoint in return_instance.
* don't forget NULLify return_instances on free_utask.
* simplify prepare_uretprobe().

RFCv6 changes:
* rework prepare_uretprobe() logic in order to make further unwinding
  in handler_uretprobe() simplier.
* introduce the 'dirty' field.

RFCv5 changes:
* switch from hlist to simply linked list for tracking ->*return_uprobes.
* preallocate first slot xol_area for return probes, see xol_get_area()
  changes.
* add get_trampoline_vaddr() helper, to emphasize area->vaddr overload.

RFCv4 changes:
* get rid of area->rp_trampoline_vaddr as it always the same as ->vaddr.
* cleanup ->return_uprobes list in uprobe_free_utask(), because the
  task can exit from inside the ret-probe'd function(s).
* in find_active_uprobe(): Once we inserted "int3" we must ensure that
  handle_swbp() will be called even if this uprobe goes away. We have
  the reference but it only protects uprobe itself, it can't protect
  agains delete_uprobe().
  IOW, we must ensure that uprobe_pre_sstep_notifier() can't return 0.

RFCv3 changes:
* protected uprobe with refcounter. See atomic_inc in prepare_uretprobe()
  and put_uprobe() in a following patch in handle_uretprobe().

RFCv2 changes:
* get rid of ->return_consumers member from struct uprobe, introduce
  ret_handler() in consumer.

Signed-off-by: Anton Arapov 
---
 include/linux/uprobes.h |  1 +
 kernel/events/uprobes.c | 92 -
 2 files changed, 92 insertions(+), 1 deletion(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 4042cad..5f8960e 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -71,6 +71,7 @@ struct uprobe_task {
enum uprobe_task_state  state;
struct arch_uprobe_task autask;
 
+   struct return_instance  *return_instances;
struct uprobe   *active_uprobe;
 
unsigned long   xol_vaddr;
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index d3c8201..08ecfff 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -75,6 +75,15 @@ struct uprobe {
struct arch_uprobe  arch;
 };
 
+struct return_instance {
+   struct uprobe   *uprobe;
+   unsigned long   func;
+   unsigned long   orig_ret_vaddr; /* original return address */
+   boolchained;/* true, if instance is nested 
*/
+
+   struct return_instance  *next;  /* keep as stack */
+};
+
 /*
  * valid_vma: Verify if the specified vma is an executable vma
  * Relax restrictions while unregistering: vm_flags might have
@@ -1294,6 +1303,7 @@ unsigned long __weak uprobe_get_swbp_addr(struct pt_regs 
*regs)
 void uprobe_free_utask(struct task_struct *t)
 {
struct uprobe_task *utask = t->utask;
+   struct return_instance *ri, *tmp;
 
if (!utask)
return;
@@ -1301,6 +1311,15 @@ void uprobe_free_utask(struct task_struct *t)
if (utask->active_uprobe)
put_uprobe(utask->active_uprobe);
 
+   ri = utask->return_instances;
+   while (ri) {
+   tmp = ri;
+   ri = ri->next;
+
+   put_uprobe(tmp->uprobe);
+   kfree(tmp);
+   }
+
xol_free_insn_slot(t);
kfree(utask);
t->utask = NULL;
@@ -1348,6 +1367,65 @@ static unsigned long get_trampoline_vaddr(void)
return trampoline_vaddr;
 }
 
+static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
+{
+   struct return_instance *ri;
+   struct uprobe_task *utask;
+   unsigned long orig_ret_vaddr, trampoline_vaddr;
+   bool chained = false;
+
+   if (!get_xol_area())
+   return;
+
+   utask = get_utask();
+   if (!utask)
+   return;
+
+   ri = kzalloc(sizeof(struct return_instance), GFP_KERNEL);
+   if (!ri)
+   goto fail;
+
+   trampoline_vaddr = get_trampoline_vaddr();
+   orig_ret_vaddr = arch_uretprobe_hijack_return_addr(trampoline_vaddr, 
regs);
+   if (orig_ret_vaddr == -1)
+   goto fail;
+
+   /*
+* We don't want to keep trampoline address in stack, rather keep the
+* original return address of first caller thru all the consequent
+* instances. This also makes breakpoint unwrapping easier.
+*/
+   if (orig_ret_vaddr == trampoline_vaddr) {
+   if (!utask->return_insta

[PATCH v1 4/9] uretprobes/ppc: Hijack return address

2013-04-03 Thread Anton Arapov

Hijack the return address and replace it with a trampoline address.
PowerPC implementation.

Signed-off-by: Anton Arapov 
---
 arch/powerpc/include/asm/uprobes.h |  1 +
 arch/powerpc/kernel/uprobes.c  | 13 +
 2 files changed, 14 insertions(+)

diff --git a/arch/powerpc/include/asm/uprobes.h 
b/arch/powerpc/include/asm/uprobes.h
index b532060..2301602 100644
--- a/arch/powerpc/include/asm/uprobes.h
+++ b/arch/powerpc/include/asm/uprobes.h
@@ -51,4 +51,5 @@ extern int  arch_uprobe_post_xol(struct arch_uprobe *aup, 
struct pt_regs *regs);
 extern bool arch_uprobe_xol_was_trapped(struct task_struct *tsk);
 extern int  arch_uprobe_exception_notify(struct notifier_block *self, unsigned 
long val, void *data);
 extern void arch_uprobe_abort_xol(struct arch_uprobe *aup, struct pt_regs 
*regs);
+extern unsigned long arch_uretprobe_hijack_return_addr(unsigned long 
trampoline_vaddr, struct pt_regs *regs);
 #endif /* _ASM_UPROBES_H */
diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index bc77834..567b975 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -188,3 +188,16 @@ bool arch_uprobe_skip_sstep(struct arch_uprobe *auprobe, 
struct pt_regs *regs)
 
return false;
 }
+
+unsigned long
+arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr, struct 
pt_regs *regs)
+{
+   unsigned long orig_ret_vaddr;
+
+   orig_ret_vaddr = regs->link;
+
+   /* Replace the return addr with trampoline addr */
+   regs->link = trampoline_vaddr;
+
+   return orig_ret_vaddr;
+}
-- 
1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v1 8/9] uretprobes: Remove -ENOSYS as return probes implemented

2013-04-03 Thread Anton Arapov

Enclose return probes implementation.

Signed-off-by: Anton Arapov 
---
 kernel/events/uprobes.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 489f5e3..9af52f7 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -828,10 +828,6 @@ int uprobe_register(struct inode *inode, loff_t offset, 
struct uprobe_consumer *
if (!uc->handler && !uc->ret_handler)
return -EINVAL;
 
-   /* TODO: Implement return probes */
-   if (uc->ret_handler)
-   return -ENOSYS;
-
/* Racy, just to catch the obvious mistakes */
if (offset > i_size_read(inode))
return -EINVAL;
-- 
1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v1 7/9] uretprobes: Limit the depth of return probe nestedness

2013-04-03 Thread Anton Arapov

Unlike the kretprobes we can't trust userspace, thus must have
protection from user space attacks. User-space have  "unlimited"
stack, and this patch limits the return probes nestedness as a
simple remedy for it.

Note that this implementation leaks return_instance on siglongjmp
until exit()/exec().

The intention is to have KISS and bare minimum solution for the
initial implementation in order to not complicate the uretprobes
code.

In the future we may come up with more sophisticated solution that
remove this depth limitation. It is not easy task and lays beyond
this patchset.

Signed-off-by: Anton Arapov 
---
 include/linux/uprobes.h |  3 +++
 kernel/events/uprobes.c | 11 +++
 2 files changed, 14 insertions(+)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 5f8960e..d7bcf10 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -38,6 +38,8 @@ struct inode;
 #define UPROBE_HANDLER_REMOVE  1
 #define UPROBE_HANDLER_MASK1
 
+#define MAX_URETPROBE_DEPTH64
+
 enum uprobe_filter_ctx {
UPROBE_FILTER_REGISTER,
UPROBE_FILTER_UNREGISTER,
@@ -72,6 +74,7 @@ struct uprobe_task {
struct arch_uprobe_task autask;
 
struct return_instance  *return_instances;
+   unsigned intdepth;
struct uprobe   *active_uprobe;
 
unsigned long   xol_vaddr;
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index d129c1d..489f5e3 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1381,6 +1381,13 @@ static void prepare_uretprobe(struct uprobe *uprobe, 
struct pt_regs *regs)
if (!utask)
return;
 
+   if (utask->depth >= MAX_URETPROBE_DEPTH) {
+   printk_ratelimited(KERN_INFO "uprobe: omit uretprobe due to"
+   " nestedness limit pid/tgid=%d/%d\n",
+   current->pid, current->tgid);
+   return;
+   }
+
ri = kzalloc(sizeof(struct return_instance), GFP_KERNEL);
if (!ri)
goto fail;
@@ -1416,6 +1423,8 @@ static void prepare_uretprobe(struct uprobe *uprobe, 
struct pt_regs *regs)
ri->orig_ret_vaddr = orig_ret_vaddr;
ri->chained = chained;
 
+   utask->depth++;
+
/* add instance to the stack */
ri->next = utask->return_instances;
utask->return_instances = ri;
@@ -1652,6 +1661,8 @@ static bool handler_uretprobe(struct pt_regs *regs)
if (!chained)
break;
 
+   utask->depth--;
+
BUG_ON(!ri);
}
 
-- 
1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v1 6/9] uretprobes: Return probe exit, invoke handlers

2013-04-03 Thread Anton Arapov

Uretprobe handlers are invoked when the trampoline is hit, on completion the
trampoline is replaced with the saved return address and the uretprobe instance
deleted.

v1 changes:
* pass bp_vaddr to ret_handler()
* simplify handle_uretprobe()

RFCv6 changes:
* rework handle_uretprobe()

RFCv5 changes:
* switch to simply linked list ->return_uprobes
* rework handle_uretprobe()

RFCv4 changes:
* check, whether utask is not NULL in handle_uretprobe()
* get rid of area->rp_trampoline_vaddr
* minor handle_uretprobe() fixups

RFCv3 changes:
* protected uprobe with refcounter. See put_uprobe() in handle_uretprobe()
  that reflects increment in prepare_uretprobe()

RFCv2 changes:
* get rid of ->return_consumers member from struct uprobe, introduce
  ret_handler() in consumer instead

Signed-off-by: Anton Arapov 
---
 kernel/events/uprobes.c | 60 -
 1 file changed, 59 insertions(+), 1 deletion(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 08ecfff..d129c1d 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1609,6 +1609,57 @@ static void handler_chain(struct uprobe *uprobe, struct 
pt_regs *regs)
up_read(&uprobe->register_rwsem);
 }
 
+static void
+handler_uretprobe_chain(struct return_instance *ri, struct pt_regs *regs)
+{
+   struct uprobe *uprobe = ri->uprobe;
+   struct uprobe_consumer *uc;
+
+   down_read(&uprobe->register_rwsem);
+   for (uc = uprobe->consumers; uc; uc = uc->next) {
+   if (uc->ret_handler)
+   uc->ret_handler(uc, ri->func, regs);
+   }
+   up_read(&uprobe->register_rwsem);
+}
+
+static bool handler_uretprobe(struct pt_regs *regs)
+{
+   struct uprobe_task *utask;
+   struct return_instance *ri, *tmp;
+   bool chained;
+
+   utask = current->utask;
+   if (!utask)
+   return false;
+
+   ri = utask->return_instances;
+   if (!ri)
+   return false;
+
+   instruction_pointer_set(regs, ri->orig_ret_vaddr);
+
+   for (;;) {
+   handler_uretprobe_chain(ri, regs);
+
+   chained = ri->chained;
+   put_uprobe(ri->uprobe);
+
+   tmp = ri;
+   ri = ri->next;
+   kfree(tmp);
+
+   if (!chained)
+   break;
+
+   BUG_ON(!ri);
+   }
+
+   utask->return_instances = ri;
+
+   return true;
+}
+
 /*
  * Run handler and ask thread to singlestep.
  * Ensure all non-fatal signals cannot interrupt thread while it singlesteps.
@@ -1620,8 +1671,15 @@ static void handle_swbp(struct pt_regs *regs)
int uninitialized_var(is_swbp);
 
bp_vaddr = uprobe_get_swbp_addr(regs);
-   uprobe = find_active_uprobe(bp_vaddr, &is_swbp);
+   if (bp_vaddr == get_trampoline_vaddr()) {
+   if (handler_uretprobe(regs))
+   return;
 
+   pr_warn("uprobe: unable to handle uretprobe pid/tgid=%d/%d\n",
+   current->pid, current->tgid);
+   }
+
+   uprobe = find_active_uprobe(bp_vaddr, &is_swbp);
if (!uprobe) {
if (is_swbp > 0) {
/* No matching uprobe; signal SIGTRAP. */
-- 
1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v1 0/9] uretprobes: Return uprobes implementation

2013-04-03 Thread Anton Arapov

Hello All!

Uretprobes' core implementation. Enables a function's return probes in uprobe-
based event tracing.

Patchset introduce additional handler (ret_handler) in uprobe consumer that
defines uretprobe.

There is a regular uprobe with return probe handler behind every uretprobe.
Once hit the uprobe that has ret_handler set, we hijack the return address of
the probed function and replace it with the address of trampoline. Trampoline
is a preallocated page in probed task's xol area that filled with breakpoint
opcode. In turn, when the return breakpoint is hit, we invoke the ret_handler.

The patchset shouldn't be difficult to read and hopefully the comments to
commits will help. Please, review.

patchset in git:
  http://github.com/arapov/linux-aa/commits/uretprobes_v1

previous versions:
  v0: https://lkml.org/lkml/2013/3/22/218

RFC reviews:
  RFCv4: https://lkml.org/lkml/2013/3/4/246
  RFCv3: https://lkml.org/lkml/2013/2/28/148
  RFCv2: https://lkml.org/lkml/2013/1/9/157
  RFCv1: https://lkml.org/lkml/2012/12/21/133

thanks,
Anton.

Anton Arapov (9):
  uretprobes: Introduce uprobe_consumer->ret_handler()
  uretprobes: Reserve the first slot in xol_vma for trampoline
  uretprobes/x86: Hijack return address
  uretprobes/ppc: Hijack return address
  uretprobes: Return probe entry, prepare_uretprobe()
  uretprobes: Return probe exit, invoke handlers
  uretprobes: Limit the depth of return probe nestedness
  uretprobes: Remove -ENOSYS as return probes implemented
  uretprobes: Documentation update
 
 Documentation/trace/uprobetracer.txt | 126 +-
 arch/powerpc/include/asm/uprobes.h   |   1 +
 arch/powerpc/kernel/uprobes.c|  13 +++
 arch/x86/include/asm/uprobes.h   |   1 +
 arch/x86/kernel/uprobes.c|  29 +
 include/linux/uprobes.h  |   7 ++
 kernel/events/uprobes.c  | 202 +--
 7 files changed, 320 insertions(+), 59 deletions(-)

-- 
1.8.1.4
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v1 2/9] uretprobes: Reserve the first slot in xol_vma for trampoline

2013-04-03 Thread Anton Arapov

Allocate trampoline page, as the very first one in uprobed
task xol area, and fill it with breakpoint opcode.

Also introduce get_trampoline_vaddr() helper, to wrap the
trampoline address extraction from area->vaddr. That removes
confusion and eases the debug experience in case ->vaddr
notion will be changed.

v1 changes:
* rework get_trampoline_vaddr() helper.
* init xol_area->slot_count.

Signed-off-by: Anton Arapov 
---
 kernel/events/uprobes.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 27c964b..d3c8201 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1109,6 +1109,7 @@ static struct xol_area *get_xol_area(void)
 {
struct mm_struct *mm = current->mm;
struct xol_area *area;
+   uprobe_opcode_t insn = UPROBE_SWBP_INSN;
 
area = mm->uprobes_state.xol_area;
if (area)
@@ -1126,7 +1127,12 @@ static struct xol_area *get_xol_area(void)
if (!area->page)
goto free_bitmap;
 
+   /* allocate first slot of task's xol_area for the return probes */
+   set_bit(0, area->bitmap);
+   copy_to_page(area->page, 0, &insn, UPROBE_SWBP_INSN_SIZE);
+   atomic_set(&area->slot_count, 1);
init_waitqueue_head(&area->wq);
+
if (!xol_add_vma(area))
return area;
 
@@ -1323,6 +1329,25 @@ static struct uprobe_task *get_utask(void)
return current->utask;
 }
 
+/*
+ * Current area->vaddr notion assume the trampoline address is always
+ * equal area->vaddr.
+ *
+ * Returns -1 in case the xol_area is not allocated.
+ */
+static unsigned long get_trampoline_vaddr(void)
+{
+   struct xol_area *area;
+   unsigned long trampoline_vaddr = -1;
+
+   area = current->mm->uprobes_state.xol_area;
+   smp_read_barrier_depends();
+   if (area)
+   trampoline_vaddr = area->vaddr;
+
+   return trampoline_vaddr;
+}
+
 /* Prepare to single-step probed instruction out of line. */
 static int
 pre_ssout(struct uprobe *uprobe, struct pt_regs *regs, unsigned long bp_vaddr)
-- 
1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v1 1/9] uretprobes: Introduce uprobe_consumer->ret_handler()

2013-04-03 Thread Anton Arapov

Enclose return probes implementation, introduce ->ret_handler() and update
existing code to rely on ->handler() *and* ->ret_handler() for uprobe and
uretprobe respectively.

v1 changes:
* add bp_vaddr argument to ->ret_handler()

RFCv5 changes:
* don't remove uprobe in case there are no uprobe consumer(handler),
  see handler_chain() changes.

RFCv3 changes: (the patch is introduced in v3)
* check whether at least one of the consumer's handlers were set.
* a 'TODO' cap that will be removed once return probes be implemented.
* introduce ->ret_handler().

Signed-off-by: Anton Arapov 
---
 include/linux/uprobes.h |  3 +++
 kernel/events/uprobes.c | 17 ++---
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 02b83db..4042cad 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -46,6 +46,9 @@ enum uprobe_filter_ctx {
 
 struct uprobe_consumer {
int (*handler)(struct uprobe_consumer *self, struct pt_regs *regs);
+   int (*ret_handler)(struct uprobe_consumer *self,
+   unsigned long func,
+   struct pt_regs *regs);
bool (*filter)(struct uprobe_consumer *self,
enum uprobe_filter_ctx ctx,
struct mm_struct *mm);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 21d8a65..27c964b 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -815,6 +815,14 @@ int uprobe_register(struct inode *inode, loff_t offset, 
struct uprobe_consumer *
struct uprobe *uprobe;
int ret;
 
+   /* Uprobe must have at least one set consumer */
+   if (!uc->handler && !uc->ret_handler)
+   return -EINVAL;
+
+   /* TODO: Implement return probes */
+   if (uc->ret_handler)
+   return -ENOSYS;
+
/* Racy, just to catch the obvious mistakes */
if (offset > i_size_read(inode))
return -EINVAL;
@@ -1473,10 +1481,13 @@ static void handler_chain(struct uprobe *uprobe, struct 
pt_regs *regs)
 
down_read(&uprobe->register_rwsem);
for (uc = uprobe->consumers; uc; uc = uc->next) {
-   int rc = uc->handler(uc, regs);
+   int rc = 0;
 
-   WARN(rc & ~UPROBE_HANDLER_MASK,
-   "bad rc=0x%x from %pf()\n", rc, uc->handler);
+   if (uc->handler) {
+   rc = uc->handler(uc, regs);
+   WARN(rc & ~UPROBE_HANDLER_MASK,
+   "bad rc=0x%x from %pf()\n", rc, uc->handler);
+   }
remove &= rc;
}
 
-- 
1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v1 9/9] uretprobes: Documentation update

2013-04-03 Thread Anton Arapov

add the uretprobe syntax and update an example

Signed-off-by: Anton Arapov 
---
 Documentation/trace/uprobetracer.txt | 114 ---
 1 file changed, 67 insertions(+), 47 deletions(-)

diff --git a/Documentation/trace/uprobetracer.txt 
b/Documentation/trace/uprobetracer.txt
index 24ce682..d9c3e68 100644
--- a/Documentation/trace/uprobetracer.txt
+++ b/Documentation/trace/uprobetracer.txt
@@ -1,6 +1,8 @@
-   Uprobe-tracer: Uprobe-based Event Tracing
-   =
- Documentation written by Srikar Dronamraju
+Uprobe-tracer: Uprobe-based Event Tracing
+=
+
+   Documentation written by Srikar Dronamraju
+
 
 Overview
 
@@ -13,78 +15,94 @@ current_tracer. Instead of that, add probe points via
 /sys/kernel/debug/tracing/events/uprobes//enabled.
 
 However unlike kprobe-event tracer, the uprobe event interface expects the
-user to calculate the offset of the probepoint in the object
+user to calculate the offset of the probepoint in the object.
 
 Synopsis of uprobe_tracer
 -
-  p[:[GRP/]EVENT] PATH:SYMBOL[+offs] [FETCHARGS]   : Set a probe
+  p[:[GRP/]EVENT] PATH:SYMBOL[+offs] [FETCHARGS] : Set a uprobe
+  r[:[GRP/]EVENT] PATH:SYMBOL[+offs] [FETCHARGS] : Set a return uprobe 
(uretprobe)
+  -:[GRP/]EVENT  : Clear uprobe or uretprobe 
event
 
- GRP   : Group name. If omitted, use "uprobes" for it.
- EVENT : Event name. If omitted, the event name is generated
- based on SYMBOL+offs.
- PATH  : path to an executable or a library.
- SYMBOL[+offs] : Symbol+offset where the probe is inserted.
+  GRP   : Group name. If omitted, "uprobes" is the default value.
+  EVENT : Event name. If omitted, the event name is generated based
+  on SYMBOL+offs.
+  PATH  : Path to an executable or a library.
+  SYMBOL[+offs] : Symbol+offset where the probe is inserted.
 
- FETCHARGS : Arguments. Each probe can have up to 128 args.
-  %REG : Fetch register REG
+  FETCHARGS : Arguments. Each probe can have up to 128 args.
+   %REG : Fetch register REG
 
 Event Profiling
 ---
- You can check the total number of probe hits and probe miss-hits via
+You can check the total number of probe hits and probe miss-hits via
 /sys/kernel/debug/tracing/uprobe_profile.
- The first column is event name, the second is the number of probe hits,
+The first column is event name, the second is the number of probe hits,
 the third is the number of probe miss-hits.
 
 Usage examples
 --
-To add a probe as a new event, write a new definition to uprobe_events
-as below.
+ * Add a probe as a new uprobe event, write a new definition to uprobe_events
+as below: (sets a uprobe at an offset of 0x4245c0 in the executable /bin/bash)
+
+echo 'p: /bin/bash:0x4245c0' > /sys/kernel/debug/tracing/uprobe_events
+
+ * Add a probe as a new uretprobe event:
+
+echo 'r: /bin/bash:0x4245c0' > /sys/kernel/debug/tracing/uprobe_events
+
+ * Unset registered event:
 
-  echo 'p: /bin/bash:0x4245c0' > /sys/kernel/debug/tracing/uprobe_events
+echo '-:bash_0x4245c0' >> /sys/kernel/debug/tracing/uprobe_events
 
- This sets a uprobe at an offset of 0x4245c0 in the executable /bin/bash
+ * Print out the events that are registered:
 
-  echo > /sys/kernel/debug/tracing/uprobe_events
+cat /sys/kernel/debug/tracing/uprobe_events
 
- This clears all probe points.
+ * Clear all events:
 
-The following example shows how to dump the instruction pointer and %ax
-a register at the probed text address.  Here we are trying to probe
-function zfree in /bin/zsh
+echo > /sys/kernel/debug/tracing/uprobe_events
+
+Following example shows how to dump the instruction pointer and %ax register
+at the probed text address. Probe zfree function in /bin/zsh:
 
 # cd /sys/kernel/debug/tracing/
-# cat /proc/`pgrep  zsh`/maps | grep /bin/zsh | grep r-xp
+# cat /proc/`pgrep zsh`/maps | grep /bin/zsh | grep r-xp
 0040-0048a000 r-xp  08:03 130904 /bin/zsh
 # objdump -T /bin/zsh | grep -w zfree
 00446420 gDF .text  0012  Basezfree
 
-0x46420 is the offset of zfree in object /bin/zsh that is loaded at
-0x0040. Hence the command to probe would be :
+  0x46420 is the offset of zfree in object /bin/zsh that is loaded at
+  0x0040. Hence the command to uprobe would be:
+
+# echo 'p:zfree_entry /bin/zsh:0x46420 %ip %ax' > uprobe_events
+
+  And the same for the uretprobe would be:
 
-# echo 'p /bin/zsh:0x46420 %ip %ax' > uprobe_events
+# echo 'r:zfree_exit /bin/zsh:0x46420 %ip %ax' >> uprobe_events
 
-Please note: User has to explici

Re: [PATCH 0/3] Sync Android pstore updates

2013-04-03 Thread Anton Vorontsov

On Mon, Apr 01, 2013 at 12:16:57PM -0700, Kees Cook wrote:
> On Sun, Mar 31, 2013 at 8:22 PM, Anton Vorontsov  wrote:
> > Hi all,
> >
> > Here are a few updates from the Android dev tree. Thanks to Arve Hjønnevåg
> > for the code, and John Stultz for actually preparing commits for
> > submission.
> >
> > Unless there are objections, I'll push these updates to linux-pstore.git.
> 
> These look good; thanks. Feel free to add my ack if you want:
> 
> Acked-by: Kees Cook 

Added and applied to linux-pstore. Thanks a lot!

Anton
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [GIT PULL] uprobes: pre-filtering

2013-01-24 Thread Anton Arapov

Hello Ingo,

On Thu, Jan 24, 2013 at 11:17 PM, Ingo Molnar  wrote:
>
> * Oleg Nesterov  wrote:
>
>> Ingo, please pull from
>>
>>   git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc uprobes/core
>>
>> Mostly pre-filtering. This needs more work and perhaps more functionality.
>> In particular, perhaps dup_mmap() should remove the unwanted breakpoints.
>> And we can add more ->filter() hooks to, say, speedup uprobe_register().
>> Plus we can do some optimizations to avoid register_for_each_vma() in
>> case when we know that all mm's were previously acked/nacked.
>>
>> Srikar, the only patch you did not ack explicitely is 1fecb96d
>> "Do not allocate current->utask unnecessary", but afaics you do not
>> object.
>>
>> And the patch from Josh which exports uprobe_register/unregister for modules.
>> Christoph (cc'ed) doesn't like this change, but I disagree. Whatever you
>> think about systemtap it is the widely used tool, and uprobes can have other
>> out-of-tree users. This is like kprobes, kprobe_register() is exported but
>> it doesn't have a modular in-kernel user too. I do not see why should we
>> limit the usage of uprobes.
>>
>>
>>
>> Josh Stone (1):
>>   uprobes: Add exports for module use
>>
>> Oleg Nesterov (26):
>>   uprobes: Move __set_bit(UPROBE_SKIP_SSTEP) into alloc_uprobe()
>>   uprobes: Kill the "uprobe != NULL" check in uprobe_unregister()
>>   uprobes: Kill the pointless inode/uc checks in register/unregister
>>   uprobes: Kill uprobe_consumer->filter()
>>   uprobes: Introduce filter_chain()
>>   uprobes: _unregister() should always do register_for_each_vma(false)
>>   uprobes: _register() should always do register_for_each_vma(true)
>>   uprobes: Introduce uprobe->register_rwsem
>>   uprobes: Change filter_chain() to iterate ->consumers list
>>   uprobes: Kill UPROBE_RUN_HANDLER flag
>>   uprobes: Kill uprobe->copy_mutex
>>   uprobes: Kill uprobe_events, use RB_EMPTY_ROOT() instead
>>   uprobes: Introduce uprobe_is_active()
>>   uprobes: Kill uprobes_mutex[], separate alloc_uprobe() and 
>> __uprobe_register()
>>   uprobes: Rationalize the usage of filter_chain()
>>   uprobes: Reintroduce uprobe_consumer->filter()
>>   uprobes: Teach handler_chain() to filter out the probed task
>>   uprobes/x86: Change __skip_sstep() to actually skip the whole insn
>>   uprobes: Change handle_swbp() to expose bp_vaddr to handler_chain()
>>   uprobes: Move alloc_page() from xol_add_vma() to xol_alloc_area()
>>   uprobes: Fold xol_alloc_area() into get_xol_area()
>>   uprobes: Turn add_utask() into get_utask()
>>   uprobes: Do not play with utask in xol_get_insn_slot()
>>   uprobes: Fix utask->xol_vaddr leak in pre_ssout()
>>   uprobes: Do not allocate current->utask unnecessary
>>   uprobes: Kill the bogus IS_ERR_VALUE(xol_vaddr) check
>>
>>  arch/x86/kernel/uprobes.c   |4 +-
>>  include/linux/uprobes.h |   17 ++-
>>  kernel/events/uprobes.c |  433 
>> ++-
>>  kernel/ptrace.c |6 +
>>  kernel/trace/trace_uprobe.c |5 +-
>>  5 files changed, 243 insertions(+), 222 deletions(-)
>
> The kernel side looks good to me - but how does 'perf uprobe'
> make use of it in practice, how can I test it?

I'm not sure whether you looking into testing specific changes in this
pull, but in general, syntax is:
  perf probe -x /lib64/libc.so.6 malloc
  perf record -e probe_libc:p_malloc -aR sleep 30

hope this is what you was looking for,
Anton.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] power: make goldfish option have a dependency on goldfish

2013-02-27 Thread Anton Vorontsov

Hello Paul,

On Wed, Feb 27, 2013 at 01:27:46PM -0500, Paul Gortmaker wrote:
> Nearly all the other goldfish peripherals (mtd, keyboard, etc)
> have a dependency on the main platform's GOLDFISH Kconfig item,
> but this one got skipped, so add it.
> 
> Cc: Anton Vorontsov 
> Cc: David Woodhouse 
> Signed-off-by: Paul Gortmaker 
> ---

Our policy, IIRC, has been quite the opposite: we try to put as little
dependencies as possible, letting drivers build on all different kinds of
configs/arches/machines. Why? It helps us keep the code in a good shape.

So, unless there are any unresolable build issues with the driver, I
believe we should keep it as is.

Thanks,

Anton

>  drivers/power/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/power/Kconfig b/drivers/power/Kconfig
> index 9e00c38..d314528 100644
> --- a/drivers/power/Kconfig
> +++ b/drivers/power/Kconfig
> @@ -348,7 +348,7 @@ config AB8500_BM
>  
>  config BATTERY_GOLDFISH
>   tristate "Goldfish battery driver"
> - depends on GENERIC_HARDIRQS
> + depends on GENERIC_HARDIRQS && GOLDFISH
>   help
> Say Y to enable support for the battery and AC power in the
> Goldfish emulator.
> -- 
> 1.8.1.2
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] power: make goldfish option have a dependency on goldfish

2013-02-27 Thread Anton Vorontsov

On Wed, Feb 27, 2013 at 07:48:38PM -0500, Paul Gortmaker wrote:
[...]
> If you don't want to take the commit, I won't argue it any further, but
> I genuinely do hope the above logical arguments perhaps might cause
> you to  change your mind.

As a maintainer of drivers/power/, I have to keep things in a sane state,
which means, that I want to compile-test the patches that I am merging.

Testing patches takes time, and if I have to do it for all bunch of
different machines and architectures, it becomes mess and unmanageable. In
another words, it is easier for me to use as little different
setups/.configs/cross-compilers/trees/etc as possible.

For example, here is the scenario:

- someone sends me a patch for goldfish driver;
- I quickly compile it on my [underpowered] x86 laptop without need for
  cross-compilation and special configs;
- It compiles fine, produces no warnings, so I happily send out the
  'Applied, thanks!' email;

Now, with your patch:

- Someone sends me a patch for goldfish driver;
- I starting to look for my cross-compilers collections, making all the
  environment setups, setting up a new build tree, looking for the
  outdated goldfish config, the ARM thing fails to build somewhere in
  arch/arm/plat-foo and drivers/mfd/bar. I become grumpy... and, you might
  not believe me, I open and edit drivers/power/Kconfig and remove
  needless 'depends on' (that is, I actually do these kind of things when
  I feel lazy enough).

Do you agree that without the additional deps life is easier for me? :)

If so, please do help me and the rest of the maintainers: instead of
adding the unneeded deps, for consistency just remove those which are not
needed.

Thanks,

Anton

p.s. Quick answers for the rest of your arguments:

> The linux-next compile queue as it is today barely gets completed within
>  a 24h window.

So, it is large enough already, and nothing we can do about it, things are
growing. But the good news is that no human attention is needed to compile
things. That is what machines are for. :)

> --why shouldn't we restrict the maintenance overhead of CONFIG_FOO
> to people who really do care about supporting and testing and updating
> features conditional on CONFIG_FOO?  Given the size of the kernel
> today, this seems to make sense in terms of developer "load balancing".

You don't have to enable/fix everything. Some things become broken, so
just disable them for the time being. But when people stumble upon broken
drivers, the drivers are either get fixed, or removed (if no one wants to
fix it). Sweeping drivers under 'depends on' doesn't do anything good in
this regard, IMO.

If we notice that goldfish is broken for long time and nobody cares to fix
it, then it is a good time to think about its removal. Since goldfish has
nothing goldfish-specific, it can be broeken only in a generic way, so if
it is broken on x86, it is as well broken for ARM.

> If you don't want to take the commit, I won't argue it any further

I don't want to take it for a reason, but I do care whether my reasons
make sense to you.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] power: make goldfish option have a dependency on goldfish

2013-02-27 Thread Anton Vorontsov

On Wed, Feb 27, 2013 at 09:17:06PM -0500, Paul Gortmaker wrote:
[...]
> > Testing patches takes time, and if I have to do it for all bunch of
> > different machines and architectures, it becomes mess and unmanageable. In
> 
> So, you actually want my change then -- you do not want to test for
> goldfish power issues unless goldfish is selected.  This is how I see
> the situation.

No, I want to cover as much as possible code in one go. This is not 'all
or nothing' thing, plus I can't test whether the driver actually works,
but testing that it compiles is a great deal, because quite often patches
just don't compile. :-)

And part of my task is to catch the trivial errors (but not all! 99% of
them). Putting unneeded(!) 'depends on' makes my life harder.

> > another words, it is easier for me to use as little different
> > setups/.configs/cross-compilers/trees/etc as possible.
> >
> > For example, here is the scenario:
> >
> > - someone sends me a patch for goldfish driver;
> > - I quickly compile it on my [underpowered] x86 laptop without need for
> >   cross-compilation and special configs;
> > - It compiles fine, produces no warnings, so I happily send out the
> >   'Applied, thanks!' email;
> 
> I can not see how the above is at all relevant.  If someone sends you a
> patch for ARCH != x86, then your workflow will fail, regardless of what
> actually is within the patch.

Again, it is not 'all or nothing'. If the patch, supposedly, for Alpha, I
won't even bother compile-testing it. Instead, I will spend more brain
power to look into the patch, and see if it is actually sane. Doing so
will take 5 seconds more of my time for this single patch for the rare
architecture, instead of *hours* setting up cross tools and compile the
patch for this weird arch.

> As a maintainer, you need to be able to
> handle cross compiles -- you can't escape the slightly more detailed
> testing requirements we need to get from subsystem maintainers.  There
> are cross compile capable toolchains on kernel.org, and I use them
> regularly before sending out changes that can impact multiple arch.
> 
> ftp://ftp.kernel.org/pub/tools/crosstool/files/bin/
> 
> We can forgive individual contributors for not being arch aware, but
> at the subsystem maintainer level, folks should be arch aware, and
> doing regular x-compiles.

Seriously? You mean, as a maintainer, I must compile it for every weird
architecture out there? No way. I care not to break 99% users -- x86, and
maybe ARM. If MIPS guys (or automated linux-next builds) see an error, I
will fix it -- but by no chance I will spend my time compiling myself for
all the crazy architectures and configs for every patch that I apply.

But don't get me wrong, I am perfectly fine with actually testing the
patches on any weird arches -- if you are ready to lend me time on a ready
to use cloud-based automated build farm with all kinds of cross tools, I
will set up a 'testing' branch and will use it for compile-testing.

> > Now, with your patch:
> >
> > - Someone sends me a patch for goldfish driver;
> > - I starting to look for my cross-compilers collections, making all the
> >   environment setups, setting up a new build tree, looking for the
> >   outdated goldfish config, the ARM thing fails to build somewhere in
> >   arch/arm/plat-foo and drivers/mfd/bar. I become grumpy... and, you might
> >   not believe me, I open and edit drivers/power/Kconfig and remove
> 
> I am confused here.  If there are correct depends lines here in the Kconfig,
> then it _protects_ you from suffering like this.

No, if it says 'depends on FLUFFY_ARCH', it says that all that
'1%-users-fluffy-arch' must be buildable so that I could test the single
patch. Which is so often not true, the fluffy arches are quite often
broken. (Unlike x86, since we have tons of users for x86.)

So for me it is easier to remove the 'depends on FLUFFY_ARCH' and
compile-test it on x86, if that is possible (and in case of goldfish it is
surely possible).

> Yet, you want me to think
> that it adds new work, new cross compile requirements and so on.  I do
> not see how that can possibly happen.

Without additional 'depends on' I cover 99% of drivers, that is enough.
Again, I am not trying to be perfect.

> 
> >   needless 'depends on' (that is, I actually do these kind of things when
> >   I feel lazy enough).
> >
> > Do you agree that without the additional deps life is easier for me? :)
> 
> No, I do not agree.  In fact I see it as totally the opposite.  But I already
> said that I would not pursue this further, based only on differing points
> of view, and so, after sending

[RFC PATCH v3 4/6] uretprobes: return probe entry, prepare uretprobe

2013-02-28 Thread Anton Arapov

  When a uprobe with return consumer is hit, prepare_uretprobe function is
invoked. It creates return_instance, hijacks return address and replaces
it with the trampoline.

 N.B. it might be a good idea to introduce get_uprobe() to reflect
put_uprobe() later, but it is not a subject of this patchset.

v3:
 - protected uprobe with refcounter. See atomic_inc in prepare_uretprobe()
and put_uprobe() in a following patch in handle_uretprobe()
v2:
 - get rid of ->return_consumers member from struct uprobe, introduce
rp_handler() in consumer

Signed-off-by: Anton Arapov 
---
 include/linux/uprobes.h |  5 +
 kernel/events/uprobes.c | 52 +++--
 2 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index a28bdee..6aaa1ce 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -69,6 +69,10 @@ struct uprobe_task {
enum uprobe_task_state  state;
struct arch_uprobe_task autask;
 
+   /*
+* list for tracking uprobes with return consumers
+*/
+   struct hlist_head   return_uprobes;
struct uprobe   *active_uprobe;
 
unsigned long   xol_vaddr;
@@ -92,6 +96,7 @@ struct xol_area {
 * the vma go away, and we must handle that reasonably gracefully.
 */
unsigned long   vaddr;  /* Page(s) of instruction slots 
*/
+   unsigned long   rp_trampoline_vaddr; /* trampoline address */
 };
 
 struct uprobes_state {
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 69bf060..57f70cd 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -75,6 +75,12 @@ struct uprobe {
struct arch_uprobe  arch;
 };
 
+struct return_uprobe_i {
+   struct uprobe   *uprobe;
+   struct hlist_node   hlist;  /* node in list */
+   unsigned long   orig_ret_vaddr; /* original return address */
+};
+
 /*
  * valid_vma: Verify if the specified vma is an executable vma
  * Relax restrictions while unregistering: vm_flags might have
@@ -1336,11 +1342,48 @@ void uprobe_copy_process(struct task_struct *t)
  */
 static struct uprobe_task *get_utask(void)
 {
-   if (!current->utask)
+   if (!current->utask) {
current->utask = kzalloc(sizeof(struct uprobe_task), 
GFP_KERNEL);
+   if (current->utask)
+   INIT_HLIST_HEAD(¤t->utask->return_uprobes);
+   }
return current->utask;
 }
 
+static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
+{
+   struct return_uprobe_i *ri;
+   struct uprobe_task *utask;
+   struct xol_area *area;
+   unsigned long rp_trampoline_vaddr = 0;
+   uprobe_opcode_t insn = UPROBE_SWBP_INSN;
+
+   area = get_xol_area();
+   if (area)
+   rp_trampoline_vaddr = area->rp_trampoline_vaddr;
+   if (!rp_trampoline_vaddr) {
+   rp_trampoline_vaddr = xol_get_insn_slot(&insn);
+   if (!rp_trampoline_vaddr)
+   return;
+   }
+   area->rp_trampoline_vaddr = rp_trampoline_vaddr;
+
+   ri = kzalloc(sizeof(struct return_uprobe_i), GFP_KERNEL);
+   if (!ri)
+   return;
+
+   utask = get_utask();
+   ri->orig_ret_vaddr = 
arch_uretprobe_hijack_return_addr(rp_trampoline_vaddr, regs);
+   if (likely(ri->orig_ret_vaddr)) {
+   /* TODO: uretprobe bypass logic */
+   atomic_inc(&uprobe->ref);
+   ri->uprobe = uprobe;
+   INIT_HLIST_NODE(&ri->hlist);
+   hlist_add_head(&ri->hlist, &utask->return_uprobes);
+   } else
+   kfree(ri);
+}
+
 /* Prepare to single-step probed instruction out of line. */
 static int
 pre_ssout(struct uprobe *uprobe, struct pt_regs *regs, unsigned long bp_vaddr)
@@ -1494,12 +1537,17 @@ static struct uprobe *find_active_uprobe(unsigned long 
bp_vaddr, int *is_swbp)
 
 static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
 {
+   int rc = 0;
struct uprobe_consumer *uc;
int remove = UPROBE_HANDLER_REMOVE;
 
down_read(&uprobe->register_rwsem);
for (uc = uprobe->consumers; uc; uc = uc->next) {
-   int rc = uc->handler(uc, regs);
+   if (uc->handler)
+   rc = uc->handler(uc, regs);
+
+   if (uc->rp_handler)
+   prepare_uretprobe(uprobe, regs); /* put bp at return */
 
WARN(rc & ~UPROBE_HANDLER_MASK,
"bad rc=0x%x from %pf()\n", rc, uc->handler);
-- 
1.8.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC PATCH v3 1/6] uretprobes: preparation patch

2013-02-28 Thread Anton Arapov

  1/6 and 6/6 patches are here to enclose return probes implementation as well
as prohibit uprobe_register() routine with no consumer set.

v3 changes: (the patch is introduced in v3)
  - check whether at least one of the consumer's handlers were set
  - a 'TODO' cap that will be removed once return probes be implemented

Signed-off-by: Anton Arapov 
---
 include/linux/uprobes.h | 1 +
 kernel/events/uprobes.c | 8 
 2 files changed, 9 insertions(+)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 02b83db..a28bdee 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -46,6 +46,7 @@ enum uprobe_filter_ctx {
 
 struct uprobe_consumer {
int (*handler)(struct uprobe_consumer *self, struct pt_regs *regs);
+   int (*rp_handler)(struct uprobe_consumer *self, struct pt_regs *regs);
bool (*filter)(struct uprobe_consumer *self,
enum uprobe_filter_ctx ctx,
struct mm_struct *mm);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index a567c8c..d904164 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -828,6 +828,14 @@ int uprobe_register(struct inode *inode, loff_t offset, 
struct uprobe_consumer *
struct uprobe *uprobe;
int ret;
 
+   /* Uprobe must have at least one set consumer */
+   if (!uc->handler && !uc->rp_handler)
+   return -EINVAL;
+
+   /* TODO: Implement return probes */
+   if (uc->rp_handler)
+   return -ENOSYS;
+
/* Racy, just to catch the obvious mistakes */
if (offset > i_size_read(inode))
return -EINVAL;
-- 
1.8.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC PATCH v3 0/6] uprobes: return probe implementation

2013-02-28 Thread Anton Arapov

Hello,

  RFC v3 uretprobes implementation. I'd be grateful for review.

  These patches extending uprobes by enabling tools, such as perf(trace_event),
set a breakpoint on probed function's return address. 

v3 changes:
  - removed uretprobe_bypass logic, it will be better to send it as
independent patch
  - unified xol_get_trampoline_slot() and xol_get_insn_slot()
  - protected uprobe with refcounter in prepare_uretprobe()
  - uprobe_register() routine fails now, if neither consumer is set
  - enclosed implementation into 1/6, 6/6 patches -ENOSYS bits

v2 changes:
  - introduced rp_handler(), get rid of return_consumers
  - get rid of uretprobe_[un]register()
  - introduced arch_uretprobe_get_sp()
  - removed uprobe_task->doomed, kill task immediately
  - fix arch_uretprobe_hijack_return_addr()'s returns
  - address the v1 minor issues

integrated patchset:
  http://github.com/arapov/linux-aa/commits/uretprobes_v3

previous implementations:
  RFCv2: https://lkml.org/lkml/2013/1/9/157
  RFCv1: https://lkml.org/lkml/2012/12/21/133

thanks,
Anton

Anton Arapov (6):
  uretprobes: preparation patch
  uretprobes/x86: hijack return address
  uretprobes: generalize xol_get_insn_slot()
  uretprobes: return probe entry, prepare uretprobe
  uretprobes: invoke return probe handlers
  uretprobes: implemented, thus remove -ENOSYS

 arch/x86/include/asm/uprobes.h |   6 +++
 arch/x86/kernel/uprobes.c  |  29 +++
 include/linux/uprobes.h|   6 +++
 kernel/events/uprobes.c| 112 ++---
 4 files changed, 146 insertions(+), 7 deletions(-)

-- 
1.8.1.2
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC PATCH v3 6/6] uretprobes: implemented, thus remove -ENOSYS

2013-02-28 Thread Anton Arapov

  1/6 and 6/6 patches are here to enclose return probes implementation
as well as prohibit uprobe_register() routine with no consumer set.

v3 changes: (the patch is introduced in v3)
  - remove 'TODO' as return probes implemented now

Signed-off-by: Anton Arapov 
---
 kernel/events/uprobes.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index deacb35..7ae338b 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -838,10 +838,6 @@ int uprobe_register(struct inode *inode, loff_t offset, 
struct uprobe_consumer *
if (!uc->handler && !uc->rp_handler)
return -EINVAL;
 
-   /* TODO: Implement return probes */
-   if (uc->rp_handler)
-   return -ENOSYS;
-
/* Racy, just to catch the obvious mistakes */
if (offset > i_size_read(inode))
return -EINVAL;
-- 
1.8.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC PATCH v3 5/6] uretprobes: invoke return probe handlers

2013-02-28 Thread Anton Arapov

  Uretprobe handlers are invoked when the trampoline is hit, on completion the
trampoline is replaced with the saved return address and the uretprobe instance
deleted.

v3:
 - protected uprobe with refcounter. See put_uprobe() in handle_uretprobe()
that reflects increment in prepare_uretprobe()
v2:
 - get rid of ->return_consumers member from struct uprobe, introduce
rp_handler() in consumer instead

Signed-off-by: Anton Arapov 
---
 arch/x86/include/asm/uprobes.h |  5 +
 kernel/events/uprobes.c| 50 --
 2 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
index c353555..fa9d9de 100644
--- a/arch/x86/include/asm/uprobes.h
+++ b/arch/x86/include/asm/uprobes.h
@@ -56,4 +56,9 @@ extern bool arch_uprobe_xol_was_trapped(struct task_struct 
*tsk);
 extern int  arch_uprobe_exception_notify(struct notifier_block *self, unsigned 
long val, void *data);
 extern void arch_uprobe_abort_xol(struct arch_uprobe *aup, struct pt_regs 
*regs);
 extern unsigned long arch_uretprobe_hijack_return_addr(unsigned long 
rp_trampoline_vaddr, struct pt_regs *regs);
+
+static inline unsigned long arch_uretprobe_get_sp(struct pt_regs *regs)
+{
+   return (unsigned long)regs->sp;
+}
 #endif /* _ASM_UPROBES_H */
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 57f70cd..deacb35 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1561,6 +1561,52 @@ static void handler_chain(struct uprobe *uprobe, struct 
pt_regs *regs)
up_read(&uprobe->register_rwsem);
 }
 
+static void handler_uretprobe_chain(struct uprobe *uprobe, struct pt_regs 
*regs)
+{
+   struct uprobe_consumer *uc;
+
+   down_read(&uprobe->register_rwsem);
+   for (uc = uprobe->consumers; uc; uc = uc->next) {
+   if (uc->rp_handler)
+   uc->rp_handler(uc, regs);
+   }
+   up_read(&uprobe->register_rwsem);
+}
+
+static void handle_uretprobe(struct pt_regs *regs)
+{
+   struct hlist_head *head;
+   struct hlist_node *tmp;
+   struct xol_area *area;
+   struct return_uprobe_i *ri;
+   struct uprobe_task *utask;
+
+   /* TODO: uretprobe bypass logic */
+
+   area = get_xol_area();
+   utask = get_utask();
+   head = &utask->return_uprobes;
+   hlist_for_each_entry_safe(ri, tmp, head, hlist) {
+   if (ri->uprobe->consumers) {
+   instruction_pointer_set(regs, ri->orig_ret_vaddr);
+   handler_uretprobe_chain(ri->uprobe, regs);
+   }
+
+   put_uprobe(ri->uprobe);
+   hlist_del(&ri->hlist);
+   kfree(ri);
+
+   if (ri->orig_ret_vaddr != area->rp_trampoline_vaddr)
+   return;
+   }
+
+   /* TODO: change the message */
+   printk(KERN_ERR "uprobe: no instance found! pid/tgid=%d/%d",
+   current->pid, current->tgid);
+   /* TODO: May be sigtrap? if possible to restore utask */
+   send_sig(SIGSEGV, current, 0);
+}
+
 /*
  * Run handler and ask thread to singlestep.
  * Ensure all non-fatal signals cannot interrupt thread while it singlesteps.
@@ -1576,8 +1622,8 @@ static void handle_swbp(struct pt_regs *regs)
 
if (!uprobe) {
if (is_swbp > 0) {
-   /* No matching uprobe; signal SIGTRAP. */
-   send_sig(SIGTRAP, current, 0);
+   /* No matching uprobe; Try with uretprobe */
+   handle_uretprobe(regs);
} else {
/*
 * Either we raced with uprobe_unregister() or we can't
-- 
1.8.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC PATCH v3 2/6] uretprobes/x86: hijack return address

2013-02-28 Thread Anton Arapov

  hijack the return address and replace it with a "trampoline"

v2:
  - remove ->doomed flag, kill task immediately

Signed-off-by: Anton Arapov 
---
 arch/x86/include/asm/uprobes.h |  1 +
 arch/x86/kernel/uprobes.c  | 29 +
 2 files changed, 30 insertions(+)

diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
index 8ff8be7..c353555 100644
--- a/arch/x86/include/asm/uprobes.h
+++ b/arch/x86/include/asm/uprobes.h
@@ -55,4 +55,5 @@ extern int  arch_uprobe_post_xol(struct arch_uprobe *aup, 
struct pt_regs *regs);
 extern bool arch_uprobe_xol_was_trapped(struct task_struct *tsk);
 extern int  arch_uprobe_exception_notify(struct notifier_block *self, unsigned 
long val, void *data);
 extern void arch_uprobe_abort_xol(struct arch_uprobe *aup, struct pt_regs 
*regs);
+extern unsigned long arch_uretprobe_hijack_return_addr(unsigned long 
rp_trampoline_vaddr, struct pt_regs *regs);
 #endif /* _ASM_UPROBES_H */
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 0ba4cfb..85e2153 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -697,3 +697,32 @@ bool arch_uprobe_skip_sstep(struct arch_uprobe *auprobe, 
struct pt_regs *regs)
send_sig(SIGTRAP, current, 0);
return ret;
 }
+
+extern unsigned long arch_uretprobe_hijack_return_addr(unsigned long
+   rp_trampoline_vaddr, struct pt_regs *regs)
+{
+   int rasize, ncopied;
+   unsigned long orig_ret_vaddr = 0; /* clear high bits for 32-bit apps */
+
+   rasize = is_ia32_task() ? 4 : 8;
+   ncopied = copy_from_user(&orig_ret_vaddr, (void __user *)regs->sp, 
rasize);
+   if (unlikely(ncopied))
+   return 0;
+
+   /* check whether address has been already hijacked */
+   if (orig_ret_vaddr == rp_trampoline_vaddr)
+   return orig_ret_vaddr;
+
+   ncopied = copy_to_user((void __user *)regs->sp, &rp_trampoline_vaddr, 
rasize);
+   if (unlikely(ncopied)) {
+   if (ncopied != rasize) {
+   printk(KERN_ERR "uretprobe: return address clobbered: "
+   "pid=%d, %%sp=%#lx, %%ip=%#lx\n",
+   current->pid, regs->sp, regs->ip);
+   /* kill task immediately */
+   send_sig(SIGSEGV, current, 0);
+   }
+   }
+
+   return orig_ret_vaddr;
+}
-- 
1.8.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC PATCH v3 3/6] uretprobes: generalize xol_get_insn_slot()

2013-02-28 Thread Anton Arapov

  Generalize xol_take_insn_slot() to enable more consumers of the
function, e.g. trampoline implementation for return probes.

  The first time a uprobe with return consumer is hit for a process, a
trampoline slot is obtained in the xol_area and initialized with a
breakpoint instruction. This slot is subsequently used by all
uretprobes.

v3:
 - unified xol_get_insn_slot(), thus get rid of xol_get_trampoline_slot()

Signed-off-by: Anton Arapov 
---
 kernel/events/uprobes.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index d904164..69bf060 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1221,7 +1221,7 @@ static unsigned long xol_take_insn_slot(struct xol_area 
*area)
  * xol_get_insn_slot - allocate a slot for xol.
  * Returns the allocated slot address or 0.
  */
-static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
+static unsigned long xol_get_insn_slot(unsigned char *insn)
 {
struct xol_area *area;
unsigned long offset;
@@ -1239,7 +1239,7 @@ static unsigned long xol_get_insn_slot(struct uprobe 
*uprobe)
/* Initialize the slot */
offset = xol_vaddr & ~PAGE_MASK;
vaddr = kmap_atomic(area->page);
-   memcpy(vaddr + offset, uprobe->arch.insn, MAX_UINSN_BYTES);
+   memcpy(vaddr + offset, insn, MAX_UINSN_BYTES);
kunmap_atomic(vaddr);
/*
 * We probably need flush_icache_user_range() but it needs vma.
@@ -1353,7 +1353,7 @@ pre_ssout(struct uprobe *uprobe, struct pt_regs *regs, 
unsigned long bp_vaddr)
if (!utask)
return -ENOMEM;
 
-   xol_vaddr = xol_get_insn_slot(uprobe);
+   xol_vaddr = xol_get_insn_slot(uprobe->arch.insn);
if (!xol_vaddr)
return -ENOMEM;
 
-- 
1.8.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v2] memcg: Add memory.pressure_level events

2013-02-18 Thread Anton Vorontsov

With this patch userland applications that want to maintain the
interactivity/memory allocation cost can use the pressure level
notifications. The levels are defined like this:

The "low" level means that the system is reclaiming memory for new
allocations. Monitoring this reclaiming activity might be useful for
maintaining cache level. Upon notification, the program (typically
"Activity Manager") might analyze vmstat and act in advance (i.e.
prematurely shutdown unimportant services).

The "medium" level means that the system is experiencing medium memory
pressure, the system might be making swap, paging out active file caches,
etc. Upon this event applications may decide to further analyze
vmstat/zoneinfo/memcg or internal memory usage statistics and free any
resources that can be easily reconstructed or re-read from a disk.

The "critical" level means that the system is actively thrashing, it is
about to out of memory (OOM) or even the in-kernel OOM killer is on its
way to trigger. Applications should do whatever they can to help the
system. It might be too late to consult with vmstat or any other
statistics, so it's advisable to take an immediate action.

The events are propagated upward until the event is handled, i.e. the
events are not pass-through. Here is what this means: for example you have
three cgroups: A->B->C. Now you set up an event listener on cgroups A, B
and C, and suppose group C experiences some pressure. In this situation,
only group C will receive the notification, i.e. groups A and B will not
receive it. This is done to avoid excessive "broadcasting" of messages,
which disturbs the system and which is especially bad if we are low on
memory or thrashing. So, organize the cgroups wisely, or propagate the
events manually (or, ask us to implement the pass-through events,
explaining why would you need them.)

Signed-off-by: Anton Vorontsov 
Acked-by: Kirill A. Shutemov 
---

Hi all,

Many thanks for the previous reviews! In this revision:

- Addressed Glauber Costa's comments:
  o Use parent_mem_cgroup() instead of own parent function (also suggested
by Kamezawa). This change also affected events distribution logic, so
it became more like memory thresholds notifications, i.e. we deliver
the event to the cgroup where the event originated, not to the parent
cgroup; (This also addreses Kamezawa's remark regarding which cgroup
receives which event.)
  o Register vmpressure cgroup file directly in memcontrol.c.

- Addressed Greg Thelen's comments:
  o Fixed bool/int inconsistency in the code;
  o Fixed nr_scanned accounting;
  o Don't use cryptic 's', 'r' abbreviations; get rid of confusing
'window' argument.

- Addressed Kamezawa Hiroyuki's comments:
  o Moved declarations from mm/internal.h into linux/vmpressue.h;
  o Removed Kconfig symbol. Vmpressure is pretty lightweight (especially
comparing to the memcg accounting). If it ever causes any measurable
performance effect, we want to fix it, not paper it over with a
Kconfig option. :-)
  o Removed read operation on pressure_level cgroup file. In apps, we only
use notifications, we don't need the content of the file, so let's
keep things simple for now. Plus this resolves questions like what
should we return there when the system is not reclaiming;
  o Reworded documentation;
  o Improved comments for vmpressure_prio().


Old changelogs/submissions:
  v1: http://lkml.org/lkml/2013/2/10/140
  mempressure cgroup: http://lkml.org/lkml/2013/1/4/55

Thanks!

Anton

 Documentation/cgroups/memory.txt |  61 +-
 include/linux/vmpressure.h   |  47 
 mm/Makefile  |   2 +-
 mm/memcontrol.c  |  28 +
 mm/vmpressure.c  | 252 +++
 mm/vmscan.c  |   8 ++
 6 files changed, 396 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/vmpressure.h
 create mode 100644 mm/vmpressure.c

diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index addb1f1..0c004de 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -40,6 +40,7 @@ Features:
  - soft limit
  - moving (recharging) account at moving a task is selectable.
  - usage threshold notifier
+ - memory pressure notifier
  - oom-killer disable knob and oom-notifier
  - Root cgroup has no limit controls.
 
@@ -65,6 +66,7 @@ Brief summary of control files.
  memory.stat# show various statistics
  memory.use_hierarchy   # set/show hierarchical account enabled
  memory.force_empty # trigger forced move charge to parent
+ memory.pressure_level  # set memory pressure notifications
  memory.swappiness  # set/show swappiness parameter of vmscan
 (See sysctl

[GIT PULL] battery-2.6.git

2013-02-18 Thread Anton Vorontsov

Hello Linus,

Here is a battery/power_supply queue for v3.9. There are two tags:
for-v3.9 and for-v3.9-merged. The second one is exactly the same as the
first one, except that in the -merged tag I merged v3.8 back into battery
tree to fix some conflicts so that you wouldn't need to do it.

Highlights (included in tag message):

- Four new drivers:

  o goldfish_battery:
This is Android Emulator battery driver. Originally from Google, but
Intel folks reshaped it for mainline;

  o pm2301_charger:
A new driver for ST-Ericsson 2301 Power Management chip, uses AB8500
battery management core;

  o qnap-poweroff:
The driver adds poweroff functionality for QNAP NAS boxes;

  o restart-poweroff:
A generic driver that implements 'power off by restarting'. The actual
poweroff functionality is implemented through a bootloader, so Linux'
task is just to restart the box. The driver is useful on Buffalo
Linkstation LS-XHL and LS-CHLv2 boards. Andrew Lunn worked on
submitting the driver (as well as qnap-poweroff above).

- A lot of fixes for ab8500 drivers. This is a part of efforts of syncing
  internal ST-Ericsson development tree with the mainline. Lee Jones @
  Linaro worked on compilation and reshaping these series;

- New health properties for the power supplies: "Watchdog timer expire"
  and "Safety timer expire";

- As usual, a bunch of fixes/cleanups here and there.

Please pull.

Thanks!


The following changes since commit 19f949f52599ba7c3f67a5897ac6be14bfcb1200:

  Linux 3.8 (2013-02-18 15:58:34 -0800)

are available in the git repository at:

  git://git.infradead.org/battery-2.6.git tags/for-v3.9-merged

for you to fetch changes up to 845e37e009281b63fe15348f37bd7f815ea65da0:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux 
(2013-02-18 22:55:13 -0800)


Andrew Lunn (3):
  power/reset: Add a new driver to turn QNAP board power off
  power/reset: Add a new driver implementing 'power off by restarting'
  qnap-poweroff: Fix license string

Anton Vorontsov (4):
  Merge branch 'for-anton' of 
git://git.linaro.org/people/ljones/linux-3.0-ux500
  Merge branch 'ab8500-from-Rajanikanth'
  Merge branch 'tb-power-2' of 
git://git.linaro.org/people/ljones/linux-3.0-ux500
  Merge git://git.kernel.org/.../torvalds/linux

Chanwoo Choi (1):
  charger-manager: Split _probe funtion to make the code more clean

Dan Carpenter (2):
  lp8727_charger: Small cleanup in naming
  generic-adc-battery: Fix forever loop in gab_remove()

Devendra Naga (1):
  max17040_battery: Use devm_kzalloc

Evgeny Romanov (1):
  ds2782_battery: Add power_supply_changed() calls for proper uevent support

Hakan Berg (5):
  ab8500_btemp: Ignore false btemp low interrupt
  ab8500_bm: Adds support for Car/Travel Adapters
  ab8500_btemp: Remove superfluous BTEMP thermal comp
  ab8500_fg: Added support for BATT_OVV
  ab8500-fg: Adjust for RF bursts voltage drops

Heiko Carstens (1):
  goldfish_battery: Add missing GENERIC_HARDIRQS dependency

Henrik Sölver (1):
  ab8500-charger: AB workaround for invalid charger

Johan Bjornstedt (2):
  ab8500_charger: Charger current step-up/down
  ab8500_bm: Skip first CCEOC irq for instant current

Jonas Aaberg (5):
  ab8500_btemp: Detect battery type in workqueue
  ab8500_fg: Replace msleep() with usleep_range() for greater accuracy
  ab8500_charger: Handle gpadc errors
  ab8500-bm: Flush all work queues before suspending
  ab8500-charger: Do not touch VBUSOVV bits

Julia Lawall (1):
  88pm860x_battery: Eliminate possible references to released resources

Kalle Komierowski (1):
  ab8500_fg: Don't clear the CCMuxOffset bit

Kim, Milo (4):
  MAINTAINERS: Add LP8727 charger driver entry
  MAINTAINERS: Add LP8788 MFD driver entry
  lp8788-charger: Fix a parent device in _probe()
  lp8788-charger: Fix a parent device in kernel messages

Lee Jones (19):
  ab8500_btemp: Fix crazy tabbing implementation
  ab8500-bmdata: Re-jiggle bmdevs_of_probe to be more succinct
  ab8500_bm: Rename battery management platform data to something more 
logical
  ab8500_bm: Always send platform specific battery information via pdata
  ab8500_btemp: Reorder obtainment of platform specific battery management 
data
  ab8500_charger: Reorder obtainment of platform specific battery 
management data
  ab8500_fg: Reorder obtainment of platform specific battery management data
  abx500_chargalg: Reorder obtainment of platform specific battery 
management data
  ab8500_bm: Make the battery Device Tree node reference less cryptic
  ab8500_charger: Detect charger removal
  ab8500_btemp: Allign battery temperature resolution with the framework
  ab8500_fg: Remove pointless r

Re: [PATCH v2] memcg: Add memory.pressure_level events

2013-02-21 Thread Anton Vorontsov

On Tue, Feb 19, 2013 at 04:21:28PM -0800, Tejun Heo wrote:
> On Tue, Feb 19, 2013 at 4:17 PM, Minchan Kim  wrote:
> > Should we really enable memcg for just pressure notificaion in embedded 
> > side?
> > I didn't check the size(cgroup + memcg) and performance penalty but I don't 
> > want
> > to add unnecessary overhead if it is possible.
> > Do you have a plan to support it via global knob(ie, /proc/mempressure), 
> > NOT memcg?
> 
> That should be handled by mempressure at the root cgroup. If that adds
> significant amount of overhead code or memory-wise, we just need to
> fix root cgroup handling in memcg. No reason to further complicate the
> interface which already is pretty complex.

For what it worth, I agree here. Even if we decide to make another
interface to vmpressure (which, say, would not require memcg), then it is
better to keep the API the same: eventfd + control file. That way,
API/ABI-wise there will be no differnce between memcg and non-memcg
kernels, which is cool.

Thanks,
Anton
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2] memcg: Add memory.pressure_level events

2013-02-21 Thread Anton Vorontsov

On Fri, Feb 22, 2013 at 08:56:08AM +0900, Minchan Kim wrote:
> [...] The my point is that you have a plan to support? Why I have a
> question is that you said your goal is to replace lowmemory killer

In short: yes, of course, if the non-memcg interface will be in demand.

> but android don't have enabled CONFIG_MEMCG as you know well
> so they should enable it for using just notifier? or they need another hack to
> connect notifier to global thing?

A hack is not an option for me. :-) My final goal is to switch Android to
use the notifier without need for hacks/external patches or
drivers/staging.

But my current goal is to make the most generic case work, and do this in
the most correct way. That is, vmpressure + MEMCG. Once I accomplish this,
I can then think of any niche needs (such as Android).

There will be two possibilities for Android:

1. Obviously, turn on CONFIG_MEMCG. We need to measure its effect on real
   devices, and see if it makes sense. (Plus, maybe there are other uses
   for MEMCG on Android?)

or

2. Implement /sys/fs/cgroups/memory/memory.pressure_level interface
   without MEMCG. Doing this will be really easy as we'll already have
   vmpressure() core, and Android has CROUPS=y. But I do expect some
   discussion like 'why don't you fix memcg instead?'. We'll have to
   answer this question by looking back at '1.'

Also note that cgroups vmpressure notifiers were tried by QEMU folks, and
it seemed to be useful:

   http://lists.gnu.org/archive/html/qemu-devel/2012-12/msg02821.html 

So, nowadays it is not only about Android. Some time ago I also got an
email from Orna Agmon Ben-Yehuda, who suggested to use vmpressure stuff
with 'memcached' (but I didn't find time to actually try it, so far. :(
Thanks for the email, btw!).

So it is useful with or without MEMCG, and if we will really need to
support vmpressure without MEMCG, I will have to implement the support in
addition to MEMCG case, yes.

Thanks,

Anton
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] lp8727_charger: use IRQF_ONESHOT

2012-08-30 Thread Anton Vorontsov

On Thu, Aug 30, 2012 at 11:41:54AM +, Kim, Milo wrote:
> > > ERROR: Threaded IRQ with no primary handler requested without
> > > IRQF_ONESHOT
> > >
> > > Make sure threaded IRQs without a primary handler are always request
> > > with IRQF_ONESHOT
> > >
> > > Signed-off-by: Fengguang Wu 
> > > ---
> > >
> > > Note: I don't really know much about the situation, feel free to
> > > ignore it if it's an false warning.
> > >
> > >  cocci-output-25411-2ae2e0-lp8727_charger.c |2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > --- a/drivers/power/lp8727_charger.c
> > > +++ b/drivers/power/lp8727_charger.c
> > > @@ -255,7 +255,7 @@ static int lp8727_intr_config(struct lp8
> > >   return request_threaded_irq(pchg->client->irq,
> > >   NULL,
> > >   lp8727_isr_func,
> > > - IRQF_TRIGGER_FALLING,
> > > + IRQF_TRIGGER_FALLING | IRQF_ONESHOT,
> > >   "lp8727_irq",
> > >   pchg);
> > >  }
> > Thanks for the update.
> > 
> > Acked-by: Milo(Woogyom) Kim 
> > 
> 
> Please refer to the patch below.
> [PATCH 4/8] lp8727_charger: cleanup the interrupt handler code
> This includes previous Fengguang' patch.

Nope, I'll include Fengguang's patch (he posted it first), and then
will try to apply the rest of your cleanups on top of it. If it helps,
your can incorporate the fix in your series, preserving the authorship.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 4/8] lp8727_charger: cleanup the interrupt handler code

2012-08-30 Thread Anton Vorontsov

On Thu, Aug 30, 2012 at 11:39:57AM +, Kim, Milo wrote:
> (a) add configurable debounce timer in the platform data
> : if it is not defined, default time(270ms) is set.
> (b) use schedule_delay_work() and remove unnecessary workqueue resource
> : for delayed interrupt handling, use the schedule_delay_work()
> (c) add lp8727_release_irq() for clearing the irq
> (c) clear interrupts while loading the driver

It seems to me that all of the issues desire their own separate
patches. A few benefits for this:

- It helps reviewing changes;
- In case of regressions, that would help bisecting a lot, plus we will
  not need to revert the whole thing, if we'd have to fall back to
  revert.

Thanks,
Anton.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 8/8] lp8727_charger: make cosmetic code

2012-08-30 Thread Anton Vorontsov

On Thu, Aug 30, 2012 at 11:41:01AM +, Kim, Milo wrote:
> (a) change the return type of lp8727_is_charger_attached()
> (b) remove unnecessary name comparison in lp8727_is_charger_attached()

This seems not too 'cosmetic'. Maybe a separate patch for this?
It looks trivial enough, but still, it's usually a good idea to
do one thing per patch.

> (c) return as the result of function in lp8727_init_device()
> (d) make inline function for lp8727_ctrl_switch()
> (e) use specific definition rather than magic number in lp8727_delayed_func()
> (f) make simple code in power_supply_get_properties
> (g) make simple code in lp8727_charger_changed()

Ditto for these two.

> (h) fix checkpatch warning on MODULE_AUTHOR
> (i) fit spaces for description in header
> 
> Signed-off-by: Milo(Woogyom) Kim 
> ---
>  drivers/power/lp8727_charger.c   |   77 
> ++
>  include/linux/platform_data/lp8727.h |2 +-
>  2 files changed, 41 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/power/lp8727_charger.c b/drivers/power/lp8727_charger.c
> index 610d286..41bc0f3 100644
> --- a/drivers/power/lp8727_charger.c
> +++ b/drivers/power/lp8727_charger.c
> @@ -121,14 +121,12 @@ static inline int lp8727_write_byte(struct lp8727_chg 
> *pchg, u8 reg, u8 data)
>   return i2c_smbus_write_byte_data(pchg->client, reg, data);
>  }
>  
> -static int lp8727_is_charger_attached(const char *name, int id)
> +static bool lp8727_is_charger_attached(const char *name, int id)
>  {
> - if (name) {
> - if (!strcmp(name, "ac"))
> - return id == LP8727_CHG_TA || id == 
> LP8727_CHG_DEDICATED;
> - else if (!strcmp(name, "usb"))
> - return id == LP8727_CHG_USB;
> - }
> + if (!strcmp(name, "ac"))
> + return id == LP8727_CHG_TA || id == LP8727_CHG_DEDICATED;
> + else if (!strcmp(name, "usb"))
> + return id == LP8727_CHG_USB;
>  
>   return id >= LP8727_CHG_TA && id <= LP8727_CHG_USB;
>  }
> @@ -150,16 +148,13 @@ static int lp8727_init_device(struct lp8727_chg *pchg)
>   return ret;
>  
>   val = LP8727_INT_EN | LP8727_CHGDET_EN;
> - ret = lp8727_write_byte(pchg, LP8727_CTRL2, val);
> - if (ret)
> - return ret;
> -
> - return 0;
> + return lp8727_write_byte(pchg, LP8727_CTRL2, val);
>  }
>  
>  static int lp8727_is_dedicated_charger(struct lp8727_chg *pchg)
>  {
>   u8 val;
> +
>   lp8727_read_byte(pchg, LP8727_STATUS1, &val);
>   return val & LP8727_DCPORT;
>  }
> @@ -167,11 +162,12 @@ static int lp8727_is_dedicated_charger(struct 
> lp8727_chg *pchg)
>  static int lp8727_is_usb_charger(struct lp8727_chg *pchg)
>  {
>   u8 val;
> +
>   lp8727_read_byte(pchg, LP8727_STATUS1, &val);
>   return val & LP8727_CHPORT;
>  }
>  
> -static void lp8727_ctrl_switch(struct lp8727_chg *pchg, u8 sw)
> +static inline void lp8727_ctrl_switch(struct lp8727_chg *pchg, u8 sw)
>  {
>   lp8727_write_byte(pchg, LP8727_SWCTRL, sw);
>  }
> @@ -221,11 +217,13 @@ static void lp8727_enable_chgdet(struct lp8727_chg 
> *pchg)
>  
>  static void lp8727_delayed_func(struct work_struct *_work)
>  {
> - u8 intstat[2], idno, vbus;
> - struct lp8727_chg *pchg =
> - container_of(_work, struct lp8727_chg, work.work);
> + struct lp8727_chg *pchg = container_of(_work, struct lp8727_chg,
> + work.work);
> + u8 intstat[LP8788_NUM_INTREGS];
> + u8 idno;
> + u8 vbus;
>  
> - if (lp8727_read_bytes(pchg, LP8727_INT1, intstat, 2)) {
> + if (lp8727_read_bytes(pchg, LP8727_INT1, intstat, LP8788_NUM_INTREGS)) {
>   dev_err(pchg->dev, "can not read INT registers\n");
>   return;
>   }
> @@ -308,9 +306,10 @@ static int lp8727_charger_get_property(struct 
> power_supply *psy,
>  {
>   struct lp8727_chg *pchg = dev_get_drvdata(psy->dev->parent);
>  
> - if (psp == POWER_SUPPLY_PROP_ONLINE)
> - val->intval = lp8727_is_charger_attached(psy->name,
> -  pchg->chg_type);
> + if (psp != POWER_SUPPLY_PROP_ONLINE)
> + return 0;
> +
> + val->intval = lp8727_is_charger_attached(psy->name, pchg->chg_type);
>  
>   return 0;
>  }
> @@ -338,15 +337,16 @@ static int lp8727_battery_get_property(struct 
> power_supply *psy,
>  
>   switch (psp) {
>   case POWER_SUPPLY_PROP_STATUS:
> - if (lp8727_is_charger_attached(psy->name, pchg->chg_type)) {
> - lp8727_read_byte(pchg, LP8727_STATUS1, &read);
> + if (!lp8727_is_charger_attached(psy->name, pchg->chg_type)) {
> + val->intval = POWER_SUPPLY_STATUS_DISCHARGING;
> + return 0;
> + }
> +
> + lp8727_read_byte(pchg, LP8727_STATUS1, &read);
>  
> - val->intval = (read & LP8727_CHGSTAT) == 
> LP8727_STAT_EOC ?
> +

Re: [PATCH 0/8] lp8727_charger: cleanup code

2012-08-30 Thread Anton Vorontsov

On Thu, Aug 30, 2012 at 11:37:16AM +, Kim, Milo wrote:
> LP8727 driver should be patched for several reasons.
> 
> (a) Need to clean up _probe()/_remove()
> (b) Not secure code when the platform data is NULL
> (c) Interrupt handling
> Two threads are running for handling one IRQ.
> One is for the IRQ pin, the other is used for delayed processing.
> This is unusual and can be enhanced.
> (d) Misuse of mutex code
> (e) Lots of definitions should be fixed
> (f) Others..

Thanks a lot for the cleanups, this is much appreciated! The cleanups
themselves look great, but I'd really like to see them more separated.

Thanks,
Anton.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 19/21] ntfs: drop vmtruncate

2012-09-01 Thread Anton Altaparmakov

Hi,

Looks good though if you call ntfs_truncate() rather than ntfs_truncate_vfs() 
you can use it to return an error code in the case that ntfs_truncate() fails.  
That could be done as a separate patch of course.

You can add my Reviewed-by: or Signed-off-by: to this patch.

Best regards,

Anton

On 31 Aug 2012, at 15:02, Marco Stornelli  wrote:

> Removed vmtruncate.
> 
> Signed-off-by: Marco Stornelli 
> ---
> fs/ntfs/file.c  |8 +---
> fs/ntfs/inode.c |   11 +--
> fs/ntfs/inode.h |4 
> 3 files changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
> index 1ecf464..5332e90 100644
> --- a/fs/ntfs/file.c
> +++ b/fs/ntfs/file.c
> @@ -2022,8 +2022,11 @@ static ssize_t ntfs_file_buffered_write(struct kiocb 
> *iocb,
>* allocated space, which is not a disaster.
>*/
>   i_size = i_size_read(vi);
> - if (pos + bytes > i_size)
> - vmtruncate(vi, i_size);
> + if ((pos + bytes > i_size) &&
> + inode_newsize_ok(vi, i_size) == 0) {
> + truncate_setsize(vi, i_size);
> + ntfs_truncate_vfs(vi);
> + }
>   break;
>   }
>   }
> @@ -2227,7 +2230,6 @@ const struct file_operations ntfs_file_ops = {
> 
> const struct inode_operations ntfs_file_inode_ops = {
> #ifdef NTFS_RW
> - .truncate   = ntfs_truncate_vfs,
>   .setattr= ntfs_setattr,
> #endif /* NTFS_RW */
> };
> diff --git a/fs/ntfs/inode.c b/fs/ntfs/inode.c
> index c6dbd3d..fe05a92 100644
> --- a/fs/ntfs/inode.c
> +++ b/fs/ntfs/inode.c
> @@ -2865,9 +2865,11 @@ conv_err_out:
>  *
>  * See ntfs_truncate() description above for details.
>  */
> +#ifdef NTFS_RW
> void ntfs_truncate_vfs(struct inode *vi) {
>   ntfs_truncate(vi);
> }
> +#endif
> 
> /**
>  * ntfs_setattr - called from notify_change() when an attribute is being 
> changed
> @@ -2913,8 +2915,13 @@ int ntfs_setattr(struct dentry *dentry, struct iattr 
> *attr)
>   NInoCompressed(ni) ?
>   "compressed" : "encrypted");
>   err = -EOPNOTSUPP;
> - } else
> - err = vmtruncate(vi, attr->ia_size);
> + } else {
> + err = inode_newsize_ok(vi, attr->ia_size);
> + if (!err) {
> + truncate_setsize(vi, attr->ia_size);
> + ntfs_truncate_vfs(vi);
> + }
> + }
>   if (err || ia_valid == ATTR_SIZE)
>   goto out;
>   } else {
> diff --git a/fs/ntfs/inode.h b/fs/ntfs/inode.h
> index db29695..76b6cfb 100644
> --- a/fs/ntfs/inode.h
> +++ b/fs/ntfs/inode.h
> @@ -316,6 +316,10 @@ static inline void ntfs_commit_inode(struct inode *vi)
>   return;
> }
> 
> +#else
> +
> +static inline void ntfs_truncate_vfs(struct inode *vi) {}
> +
> #endif /* NTFS_RW */
> 
> #endif /* _LINUX_NTFS_INODE_H */
> -- 
> 1.7.3.4

Best regards,

Anton
-- 
Anton Altaparmakov  (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer, http://www.linux-ntfs.org/

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH/RFC 0/4] Deferrable timers support for timerfd API

2012-09-01 Thread Anton Vorontsov

Hi all,

This patch set implements a userland-side API for generic deferrable
timers, per linux/timer.h:

 * A deferrable timer will work normally when the system is busy, but
 * will not cause a CPU to come out of idle just to service it; instead,
 * the timer will be serviced when the CPU eventually wakes up with a
 * subsequent non-deferrable timer.

These timers are crucial for power saving, i.e. periodic tasks that want
to work in background when the system is under use, but don't want to
cause wakeups themselves.

The deferred timers are somewhat orthogonal to high-res external timers,
since the deferred timer is tied to the system load, not just to some
external decrementer source.

So, currently, the implementation has a HZ precision, and the maximum
interval is jiffies resolution (i.e. with HZ=1000, on 32 bit that would
be around max 49 days).  Of course we can implement longer timeouts by
rearming the timer, although it probably  wouldn't make much sense in
real world, so we keep it simple and just return E2BIG if we don't like
the interval.

Note that the code is still using time calculation that is done by the
hrtimer routines, so we pretty much reuse everything except for the
timer events themselves (i.e. we use calculation results of
hrtimer_forward_now() and hrtimer_expires_remaining(), but never start
the hrtimer). So the code path is pretty much the same for both hrtimers
and deferrable timers.

We will use the timers to periodically read /proc/vmstat without
forcibly waking up the system; but let's see, maybe there are other use
cases that might be interesting for PM folks.

Thanks!

Anton.

--
 fs/timerfd.c| 87 +++--
 include/linux/jiffies.h |  3 ++
 include/linux/ktime.h   |  3 +-
 include/linux/timerfd.h |  4 ++-
 kernel/time.c   | 23 +
 5 files changed, 108 insertions(+), 12 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/4] kernel/time: Add new helpers to convert ktime to/from jiffies

2012-09-01 Thread Anton Vorontsov

Two new functions: jiffies_to_ktime() and ktime_to_jiffies(), we'll use
them for timerfd deferred timers handling.

We fully reuse the logic from timespec implementations, so the functions
are pretty straightforward.

The only tricky part is in headers: we have to include jiffies.h after
we defined ktime_t, this is because ktime.h needs some declarations from
jiffies.h (e.g. TICK_NSEC).

Signed-off-by: Anton Vorontsov 
---
 include/linux/jiffies.h |  3 +++
 include/linux/ktime.h   |  3 ++-
 kernel/time.c   | 23 +++
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h
index 265e2c3..4451241 100644
--- a/include/linux/jiffies.h
+++ b/include/linux/jiffies.h
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include  /* for HZ */
 
 /*
@@ -303,6 +304,8 @@ extern void jiffies_to_timespec(const unsigned long jiffies,
 extern unsigned long timeval_to_jiffies(const struct timeval *value);
 extern void jiffies_to_timeval(const unsigned long jiffies,
   struct timeval *value);
+extern unsigned long ktime_to_jiffies(ktime_t *value);
+extern void jiffies_to_ktime(const unsigned long jiffies, ktime_t *value);
 extern clock_t jiffies_to_clock_t(unsigned long x);
 extern unsigned long clock_t_to_jiffies(unsigned long x);
 extern u64 jiffies_64_to_clock_t(u64 x);
diff --git a/include/linux/ktime.h b/include/linux/ktime.h
index 603bec2..9551856 100644
--- a/include/linux/ktime.h
+++ b/include/linux/ktime.h
@@ -22,7 +22,6 @@
 #define _LINUX_KTIME_H
 
 #include 
-#include 
 
 /*
  * ktime_t:
@@ -58,6 +57,8 @@ union ktime {
 
 typedef union ktime ktime_t;   /* Kill this */
 
+#include 
+
 #define KTIME_MAX  ((s64)~((u64)1 << 63))
 #if (BITS_PER_LONG == 64)
 # define KTIME_SEC_MAX (KTIME_MAX / NSEC_PER_SEC)
diff --git a/kernel/time.c b/kernel/time.c
index ba744cf..82c06c5 100644
--- a/kernel/time.c
+++ b/kernel/time.c
@@ -29,6 +29,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -566,6 +567,28 @@ void jiffies_to_timeval(const unsigned long jiffies, 
struct timeval *value)
 }
 EXPORT_SYMBOL(jiffies_to_timeval);
 
+unsigned long ktime_to_jiffies(ktime_t *value)
+{
+   struct timespec ts = ktime_to_timespec(*value);
+
+   /*
+* nsecs_to_jiffies(ktime_to_ns(*ktime)) is unsafe as nsecs_to_jiffies
+* doesn't handle MAX_JIFFY_OFFSET. So we reuse the logic from the
+* timespec to jiffies conversion function.
+*/
+   return timespec_to_jiffies(&ts);
+}
+EXPORT_SYMBOL(ktime_to_jiffies);
+
+void jiffies_to_ktime(const unsigned long jiffies, ktime_t *value)
+{
+   struct timespec ts;
+
+   jiffies_to_timespec(jiffies, &ts);
+   *value = timespec_to_ktime(ts);
+}
+EXPORT_SYMBOL(jiffies_to_ktime);
+
 /*
  * Convert jiffies/jiffies_64 to clock_t and back.
  */
-- 
1.7.11.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/4] timerfd: Move repeated logic into timefd_rearm()

2012-09-01 Thread Anton Vorontsov

This patch introduces timerfd_rearm(), this small helper is used to
forward and restart the hrtimer.

This small refactoring would be also useful if/when we'll add other
backend for timerfd (like deferrable timers), so we won't need to
duplicate the code more.

Signed-off-by: Anton Vorontsov 
---
 fs/timerfd.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/timerfd.c b/fs/timerfd.c
index dffeb37..ecfb3f3 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -178,6 +178,14 @@ static unsigned int timerfd_poll(struct file *file, 
poll_table *wait)
return events;
 }
 
+static u64 timerfd_rearm(struct timerfd_ctx *ctx)
+{
+   u64 orun = hrtimer_forward_now(&ctx->tmr, ctx->tintv) - 1;
+
+   hrtimer_restart(&ctx->tmr);
+   return orun;
+}
+
 static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count,
loff_t *ppos)
 {
@@ -214,9 +222,7 @@ static ssize_t timerfd_read(struct file *file, char __user 
*buf, size_t count,
 * callback to avoid DoS attacks specifying a very
 * short timer period.
 */
-   ticks += hrtimer_forward_now(&ctx->tmr,
-ctx->tintv) - 1;
-   hrtimer_restart(&ctx->tmr);
+   ticks += timerfd_rearm(ctx);
}
ctx->expired = 0;
ctx->ticks = 0;
@@ -355,9 +361,7 @@ SYSCALL_DEFINE2(timerfd_gettime, int, ufd, struct 
itimerspec __user *, otmr)
spin_lock_irq(&ctx->wqh.lock);
if (ctx->expired && ctx->tintv.tv64) {
ctx->expired = 0;
-   ctx->ticks +=
-   hrtimer_forward_now(&ctx->tmr, ctx->tintv) - 1;
-   hrtimer_restart(&ctx->tmr);
+   ctx->ticks += timerfd_rearm(ctx);
}
kotmr.it_value = ktime_to_timespec(timerfd_get_remaining(ctx));
kotmr.it_interval = ktime_to_timespec(ctx->tintv);
-- 
1.7.11.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 3/4] timerfd: Factor out timer-type unspecific timerfd_expire()

2012-09-01 Thread Anton Vorontsov

There is nothing hrtimer-specific inside the timerfd_tmrproc(), except
the function prototype. We're about to add other timer types, so factor
out generic timerfd_expire() helper from timerfd_tmrproc().

Signed-off-by: Anton Vorontsov 
---
 fs/timerfd.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/timerfd.c b/fs/timerfd.c
index ecfb3f3..222db32 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -45,9 +45,8 @@ static DEFINE_SPINLOCK(cancel_lock);
  * flag, but we do not re-arm the timer (in case it's necessary,
  * tintv.tv64 != 0) until the timer is accessed.
  */
-static enum hrtimer_restart timerfd_tmrproc(struct hrtimer *htmr)
+static void timerfd_expire(struct timerfd_ctx *ctx)
 {
-   struct timerfd_ctx *ctx = container_of(htmr, struct timerfd_ctx, tmr);
unsigned long flags;
 
spin_lock_irqsave(&ctx->wqh.lock, flags);
@@ -56,6 +55,11 @@ static enum hrtimer_restart timerfd_tmrproc(struct hrtimer 
*htmr)
wake_up_locked(&ctx->wqh);
spin_unlock_irqrestore(&ctx->wqh.lock, flags);
 
+}
+
+static enum hrtimer_restart timerfd_tmrproc(struct hrtimer *htmr)
+{
+   timerfd_expire(container_of(htmr, struct timerfd_ctx, tmr));
return HRTIMER_NORESTART;
 }
 
-- 
1.7.11.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 4/4] timerfd: Add support for deferrable timers

2012-09-01 Thread Anton Vorontsov

This patch implements a userland-side API for generic deferrable timers,
per linux/timer.h:

 * A deferrable timer will work normally when the system is busy, but
 * will not cause a CPU to come out of idle just to service it; instead,
 * the timer will be serviced when the CPU eventually wakes up with a
 * subsequent non-deferrable timer.

These timers are crucial for power saving, i.e. periodic tasks that want
to work in background when the system is under use, but don't want to
cause wakeups themselves.

The deferred timers are somewhat orthogonal to high-res external timers,
since the deferred timer is tied to the system load, not just to some
external decrementer source.

So, currently, the implementation has a HZ precision, and the maximum
interval is jiffies resolution (i.e. with HZ=1000, on 32 bit that would
be around max 49 days).  Of course we can implement longer timeouts by
rearming the timer, although it probably  wouldn't make much sense in
real world, so we keep it simple and just return E2BIG if we don't like
the interval.

Note that the code is still using time calculation that is done by the
hrtimer routines, so we pretty much reuse everything except for the
timer events themselves (i.e. we use calculation results of
hrtimer_forward_now() and hrtimer_expires_remaining(), but never start
the hrtimer). So the code path is pretty much the same for both hrtimers
and deferrable timers.

Signed-off-by: Anton Vorontsov 
---
 fs/timerfd.c| 65 ++---
 include/linux/timerfd.h |  4 ++-
 2 files changed, 65 insertions(+), 4 deletions(-)

diff --git a/fs/timerfd.c b/fs/timerfd.c
index 222db32..d608a57 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -26,6 +27,8 @@
 
 struct timerfd_ctx {
struct hrtimer tmr;
+   struct timer_list dtmr;
+   bool deferrable;
ktime_t tintv;
ktime_t moffs;
wait_queue_head_t wqh;
@@ -63,6 +66,11 @@ static enum hrtimer_restart timerfd_tmrproc(struct hrtimer 
*htmr)
return HRTIMER_NORESTART;
 }
 
+static void timerfd_dtmrproc(unsigned long data)
+{
+   timerfd_expire((struct timerfd_ctx *)data);
+}
+
 /*
  * Called when the clock was set to cancel the timers in the cancel
  * list. This will wake up processes waiting on these timers. The
@@ -131,6 +139,30 @@ static ktime_t timerfd_get_remaining(struct timerfd_ctx 
*ctx)
return remaining.tv64 < 0 ? ktime_set(0, 0): remaining;
 }
 
+static bool timerfd_deferrable_valid(ktime_t intv)
+{
+   ktime_t max;
+
+   jiffies_to_ktime(MAX_JIFFY_OFFSET, &max);
+   if (intv.tv64 > max.tv64)
+   return 0;
+   return 1;
+}
+
+static int timerfd_setup_deferrable(struct timerfd_ctx *ctx)
+{
+   ktime_t rem = timerfd_get_remaining(ctx);
+
+   if (ctx->clockid != CLOCK_MONOTONIC)
+   return -EINVAL;
+   if (!timerfd_deferrable_valid(ctx->tintv) ||
+   !timerfd_deferrable_valid(rem))
+   return -E2BIG;
+
+   mod_timer(&ctx->dtmr, jiffies + ktime_to_jiffies(&rem) + 1);
+   return 0;
+}
+
 static int timerfd_setup(struct timerfd_ctx *ctx, int flags,
 const struct itimerspec *ktmr)
 {
@@ -148,8 +180,18 @@ static int timerfd_setup(struct timerfd_ctx *ctx, int 
flags,
hrtimer_init(&ctx->tmr, clockid, htmode);
hrtimer_set_expires(&ctx->tmr, texp);
ctx->tmr.function = timerfd_tmrproc;
+   ctx->dtmr.function = timerfd_dtmrproc;
+   ctx->dtmr.data = (unsigned long)ctx;
if (texp.tv64 != 0) {
-   hrtimer_start(&ctx->tmr, texp, htmode);
+   if (ctx->deferrable) {
+   int ret;
+
+   ret = timerfd_setup_deferrable(ctx);
+   if (ret)
+   return ret;
+   } else {
+   hrtimer_start(&ctx->tmr, texp, htmode);
+   }
if (timerfd_canceled(ctx))
return -ECANCELED;
}
@@ -162,6 +204,7 @@ static int timerfd_release(struct inode *inode, struct file 
*file)
 
timerfd_remove_cancel(ctx);
hrtimer_cancel(&ctx->tmr);
+   del_timer_sync(&ctx->dtmr);
kfree_rcu(ctx, rcu);
return 0;
 }
@@ -186,7 +229,12 @@ static u64 timerfd_rearm(struct timerfd_ctx *ctx)
 {
u64 orun = hrtimer_forward_now(&ctx->tmr, ctx->tintv) - 1;
 
-   hrtimer_restart(&ctx->tmr);
+   if (ctx->deferrable)
+   mod_timer(&ctx->dtmr, jiffies +
+ ktime_to_jiffies(&ctx->tintv) + 1);
+   else
+   hrtimer_restart(&ctx->tmr);
+
return orun;
 }
 
@@ -280,6 +328,7 @@ SYSCALL_DEFINE2(timerfd_create, int, clockid, int,

Re: [PATCH] [RFC]power: battery: Generic battery driver using IIO

2012-09-02 Thread Anton Vorontsov

On Sun, Sep 02, 2012 at 09:44:04PM +0530, Sannu K wrote: [...]
> Just Curious what is the use of this module? Is there any user space
> program uses this? When ACPI drivers for battery is available how
> useful will this be?

That's mostly for embedded devices. They often have batteries directly
connected to ADC (analog to digital converters) channels, so that's the
only way to read e.g. voltage.

As for userspace, since it's battery class driver, pretty much every
userspace stack can use it.

Thanks,
Anton.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v6 0/12] KGDB/KDB FIQ (NMI) debugger

2012-09-11 Thread Anton Vorontsov

Hi all,

Here is a new revision, mostly tty reworks. The new tty_port stuff is a
bliss: no more per-driver mutex, no more counting for open(),
well-separated initialization callbacks (I hope I got them right :-).

But since I now use a lot of new tty_port stuff, I had to rebase the
patch set on top of tty-next, so there's no point in cherry-picking
anymore.

So, in v6:

- Converted the NMI tty driver to use tty_port helpers, per Alan Cox's
  suggestions;

- In uart's poll_init callback fixed a race, spotted by Alan;

- Use test_bit instead of touching port->flags directly;

These patches can be found in the following repo (based on tty-next):

git://git.infradead.org/users/cbou/linux-nmi-kdb.git master

Old changelogs and rationale for these patches can be found here:

v1-v5: http://lkml.org/lkml/2012/9/10/2

Thanks,

--
 arch/arm/Kconfig|  19 ++
 arch/arm/common/vic.c   |  28 +++
 arch/arm/include/asm/hardware/vic.h |   2 +
 arch/arm/include/asm/kgdb.h |   8 +
 arch/arm/kernel/Makefile|   1 +
 arch/arm/kernel/entry-armv.S| 167 +
 arch/arm/kernel/entry-header.S  | 170 +
 arch/arm/kernel/kgdb_fiq.c  |  99 
 arch/arm/kernel/kgdb_fiq_entry.S|  87 +++
 arch/arm/mach-versatile/Makefile|   1 +
 arch/arm/mach-versatile/kgdb_fiq.c  |  31 +++
 drivers/tty/serial/Kconfig  |  19 ++
 drivers/tty/serial/Makefile |   1 +
 drivers/tty/serial/amba-pl011.c |  66 -
 drivers/tty/serial/kgdb_nmi.c   | 391 ++
 drivers/tty/serial/kgdboc.c |  16 ++
 drivers/tty/serial/serial_core.c|  32 +++
 include/linux/kdb.h |  29 ++-
 include/linux/kgdb.h|  34 +++
 include/linux/serial_core.h |   2 +
 include/linux/tty_driver.h  |   1 +
 kernel/debug/debug_core.c   |  36 ++-
 kernel/debug/kdb/kdb_main.c |  29 +++
 23 files changed, 1076 insertions(+), 193 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 01/12] kernel/debug: Mask KGDB NMI upon entry

2012-09-11 Thread Anton Vorontsov

The new arch callback should manage NMIs that usually cause KGDB to
enter. That is, not all NMIs should be enabled/disabled, but only
those that issue kgdb_handle_exception().

We must mask it as serial-line interrupt can be used as an NMI, so
if the original KGDB-entry cause was say a breakpoint, then every
input to KDB console will cause KGDB to reenter, which we don't want.

Signed-off-by: Anton Vorontsov 
---
 include/linux/kgdb.h  | 23 +++
 kernel/debug/debug_core.c | 36 +---
 2 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/include/linux/kgdb.h b/include/linux/kgdb.h
index c4d2fc1..3b111a6 100644
--- a/include/linux/kgdb.h
+++ b/include/linux/kgdb.h
@@ -221,6 +221,29 @@ extern int kgdb_arch_remove_breakpoint(struct kgdb_bkpt 
*bpt);
  */
 extern void kgdb_arch_late(void);
 
+/**
+ * kgdb_arch_enable_nmi - Enable or disable KGDB-entry NMI
+ * @on: Flag to either enable or disable an NMI
+ *
+ * This is an architecture-specific "back-end" for kgdb_enable_nmi(). The
+ * call does not count disable/enable requests, do not use it directly.
+ */
+extern void kgdb_arch_enable_nmi(bool on);
+
+/**
+ * kgdb_enable_nmi - Enable or disable KGDB-entry NMI
+ * @on: Flag to either enable or disable an NMI
+ *
+ * This function manages NMIs that usually cause KGDB to enter. That is,
+ * not all NMIs should be enabled or disabled, but only those that issue
+ * kgdb_handle_exception().
+ *
+ * The call counts disable requests, and thus allows to nest disables.
+ * But trying to enable already enabled NMI is an error. The call returns
+ * 1 if NMI has been actually enabled after the call, and a value <= 0 if
+ * it is still disabled.
+ */
+extern int kgdb_enable_nmi(bool on);
 
 /**
  * struct kgdb_arch - Describe architecture specific values.
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index 0557f24..b621d1e 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -214,6 +214,30 @@ int __weak kgdb_skipexception(int exception, struct 
pt_regs *regs)
return 0;
 }
 
+void __weak kgdb_arch_enable_nmi(bool on)
+{
+}
+
+int kgdb_enable_nmi(bool on)
+{
+   static atomic_t cnt;
+   int ret;
+
+   ret = atomic_add_return(on ? 1 : -1, &cnt);
+   if (ret > 1 && on) {
+   /*
+* There should be only one instance that calls this function
+* in "enable, disable" order. All other users must call
+* disable first, then enable. If not, something is wrong.
+*/
+   WARN_ON(1);
+   return 1;
+   }
+
+   kgdb_arch_enable_nmi(ret > 0);
+   return ret;
+}
+
 /*
  * Some architectures need cache flushes when we set/clear a
  * breakpoint:
@@ -672,6 +696,9 @@ kgdb_handle_exception(int evector, int signo, int ecode, 
struct pt_regs *regs)
 {
struct kgdb_state kgdb_var;
struct kgdb_state *ks = &kgdb_var;
+   int ret = 0;
+
+   kgdb_enable_nmi(0);
 
ks->cpu = raw_smp_processor_id();
ks->ex_vector   = evector;
@@ -681,11 +708,14 @@ kgdb_handle_exception(int evector, int signo, int ecode, 
struct pt_regs *regs)
ks->linux_regs  = regs;
 
if (kgdb_reenter_check(ks))
-   return 0; /* Ouch, double exception ! */
+   goto out; /* Ouch, double exception ! */
if (kgdb_info[ks->cpu].enter_kgdb != 0)
-   return 0;
+   goto out;
 
-   return kgdb_cpu_enter(ks, regs, DCPU_WANT_MASTER);
+   ret = kgdb_cpu_enter(ks, regs, DCPU_WANT_MASTER);
+out:
+   kgdb_enable_nmi(1);
+   return ret;
 }
 
 int kgdb_nmicallback(int cpu, void *regs)
-- 
1.7.11.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 04/12] tty/serial/core: Introduce poll_init callback

2012-09-11 Thread Anton Vorontsov

It was noticed that polling drivers (like KGDB) are not able to use
serial ports if the ports were not previously initialized via console.
I.e.  when booting with console=ttyAMA0 kgdboc=ttyAMA0, everything works
fine, but with console=ttyFOO kgdboc=ttyAMA0, the kgdboc doesn't work.

This is because we don't initialize the hardware. Calling ->startup() is
not an option, because drivers request interrupts there, and drivers
fail to handle situations when tty isn't opened with interrupts enabled.

So, we have to implement a new callback (actually, tty_ops already have
a similar callback), which does everything needed to initialize just the
hardware.

Signed-off-by: Anton Vorontsov 
---
 drivers/tty/serial/serial_core.c | 17 +
 include/linux/serial_core.h  |  1 +
 2 files changed, 18 insertions(+)

diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
index 046279c..dcb2d5a 100644
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -2129,6 +2129,7 @@ static int uart_poll_init(struct tty_driver *driver, int 
line, char *options)
int bits = 8;
int parity = 'n';
int flow = 'n';
+   int ret;
 
if (!state || !state->uart_port)
return -1;
@@ -2137,6 +2138,22 @@ static int uart_poll_init(struct tty_driver *driver, int 
line, char *options)
if (!(port->ops->poll_get_char && port->ops->poll_put_char))
return -1;
 
+   if (port->ops->poll_init) {
+   struct tty_port *tport = &state->port;
+
+   ret = 0;
+   mutex_lock(&tport->mutex);
+   /*
+* We don't set ASYNCB_INITIALIZED as we only initialized the
+* hw, e.g. state->xmit is still uninitialized.
+*/
+   if (!test_bit(ASYNCB_INITIALIZED, &tport->flags))
+   ret = port->ops->poll_init(port);
+   mutex_unlock(&tport->mutex);
+   if (ret)
+   return ret;
+   }
+
if (options) {
uart_parse_options(options, &baud, &parity, &bits, &flow);
return uart_set_options(port, NULL, baud, parity, bits, flow);
diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
index 7cf0b68..822c887 100644
--- a/include/linux/serial_core.h
+++ b/include/linux/serial_core.h
@@ -274,6 +274,7 @@ struct uart_ops {
int (*verify_port)(struct uart_port *, struct serial_struct 
*);
int (*ioctl)(struct uart_port *, unsigned int, unsigned 
long);
 #ifdef CONFIG_CONSOLE_POLL
+   int (*poll_init)(struct uart_port *);
void(*poll_put_char)(struct uart_port *, unsigned char);
int (*poll_get_char)(struct uart_port *);
 #endif
-- 
1.7.11.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 07/12] tty/serial/amba-pl011: Implement clear_irqs callback

2012-09-11 Thread Anton Vorontsov

It's all pretty straightforward, except for TXIM interrupt. The interrupt
has meaning "ready to transmit", so it's almost always raised, and the
only way to silence it is to mask it. But that's OK, ops->start_tx will
unmask it.

Signed-off-by: Anton Vorontsov 
---
 drivers/tty/serial/amba-pl011.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c
index 45137e4..ec15312 100644
--- a/drivers/tty/serial/amba-pl011.c
+++ b/drivers/tty/serial/amba-pl011.c
@@ -1307,6 +1307,27 @@ static void pl011_put_poll_char(struct uart_port *port,
writew(ch, uap->port.membase + UART01x_DR);
 }
 
+static void pl011_clear_irqs(struct uart_port *port)
+{
+   struct uart_amba_port *uap = (struct uart_amba_port *)port;
+   unsigned char __iomem *regs = uap->port.membase;
+
+   writew(readw(regs + UART011_MIS), regs + UART011_ICR);
+   /*
+* There is no way to clear TXIM as this is "ready to transmit IRQ", so
+* we simply mask it. start_tx() will unmask it.
+*
+* Note we can race with start_tx(), and if the race happens, the
+* clear_irq() caller might get another interrupt just after we clear
+* it.  But it should be OK and can happen even w/o the race, e.g.
+* controller immediately got some new data and raised the IRQ.
+*
+* And whoever calls clear_irqs() assumes that the caller manages the
+* device (including tx queue), so we're also fine with start_tx()'s
+* caller side.
+*/
+   writew(readw(regs + UART011_IMSC) & ~UART011_TXIM, regs + UART011_IMSC);
+}
 #endif /* CONFIG_CONSOLE_POLL */
 
 static int pl011_hwinit(struct uart_port *port)
@@ -1712,6 +1733,7 @@ static struct uart_ops amba_pl011_pops = {
.poll_init = pl011_hwinit,
.poll_get_char = pl011_get_poll_char,
.poll_put_char = pl011_put_poll_char,
+   .clear_irqs= pl011_clear_irqs,
 #endif
 };
 
-- 
1.7.11.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 10/12] ARM: Add KGDB/KDB FIQ debugger generic code

2012-09-11 Thread Anton Vorontsov

The FIQ debugger may be used to debug situations when the kernel stuck
in uninterruptable sections, e.g. the kernel infinitely loops or
deadlocked in an interrupt or with interrupts disabled.

By default KGDB FIQ is disabled in runtime, but can be enabled with
kgdb_fiq.enable=1 kernel command line option.

Signed-off-by: Anton Vorontsov 
---
 arch/arm/Kconfig | 18 
 arch/arm/include/asm/kgdb.h  |  8 
 arch/arm/kernel/Makefile |  1 +
 arch/arm/kernel/kgdb_fiq.c   | 99 
 arch/arm/kernel/kgdb_fiq_entry.S | 87 +++
 5 files changed, 213 insertions(+)
 create mode 100644 arch/arm/kernel/kgdb_fiq.c
 create mode 100644 arch/arm/kernel/kgdb_fiq_entry.S

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 6d6e18f..c978c74 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -171,6 +171,24 @@ config GENERIC_ISA_DMA
 config FIQ
bool
 
+config ARCH_MIGHT_HAVE_KGDB_FIQ
+   bool
+
+config KGDB_FIQ
+   bool "KGDB/KDB FIQ debugger"
+   depends on KGDB_KDB && ARCH_MIGHT_HAVE_KGDB_FIQ && !THUMB2_KERNEL
+   select FIQ
+   help
+ The FIQ debugger may be used to debug situations when the
+ kernel stuck in uninterruptable sections, e.g. the kernel
+ infinitely loops or deadlocked in an interrupt or with
+ interrupts disabled.
+
+ By default KGDB FIQ is disabled in runtime, but can be
+ enabled with kgdb_fiq.enable=1 kernel command line option.
+
+ If unsure, say N.
+
 config NEED_RET_TO_USER
bool
 
diff --git a/arch/arm/include/asm/kgdb.h b/arch/arm/include/asm/kgdb.h
index 48066ce..807e547 100644
--- a/arch/arm/include/asm/kgdb.h
+++ b/arch/arm/include/asm/kgdb.h
@@ -11,6 +11,8 @@
 #define __ARM_KGDB_H__
 
 #include 
+#include 
+#include 
 
 /*
  * GDB assumes that we're a user process being debugged, so
@@ -47,6 +49,12 @@ static inline void arch_kgdb_breakpoint(void)
 extern void kgdb_handle_bus_error(void);
 extern int kgdb_fault_expected;
 
+extern char kgdb_fiq_handler;
+extern char kgdb_fiq_handler_end;
+asmlinkage void __exception_irq_entry kgdb_fiq_do_handle(struct pt_regs *regs);
+extern int __init kgdb_register_fiq(unsigned int mach_kgdb_fiq,
+void (*mach_kgdb_enable_fiq)(unsigned int irq, bool on),
+bool (*mach_is_kgdb_fiq)(unsigned int irq));
 #endif /* !__ASSEMBLY__ */
 
 /*
diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile
index 7ad2d5c..5aa079b 100644
--- a/arch/arm/kernel/Makefile
+++ b/arch/arm/kernel/Makefile
@@ -56,6 +56,7 @@ obj-$(CONFIG_ATAGS_PROC)  += atags.o
 obj-$(CONFIG_OABI_COMPAT)  += sys_oabi-compat.o
 obj-$(CONFIG_ARM_THUMBEE)  += thumbee.o
 obj-$(CONFIG_KGDB) += kgdb.o
+obj-$(CONFIG_KGDB_FIQ) += kgdb_fiq_entry.o kgdb_fiq.o
 obj-$(CONFIG_ARM_UNWIND)   += unwind.o
 obj-$(CONFIG_HAVE_TCM) += tcm.o
 obj-$(CONFIG_OF)   += devtree.o
diff --git a/arch/arm/kernel/kgdb_fiq.c b/arch/arm/kernel/kgdb_fiq.c
new file mode 100644
index 000..8443af1
--- /dev/null
+++ b/arch/arm/kernel/kgdb_fiq.c
@@ -0,0 +1,99 @@
+/*
+ * KGDB FIQ
+ *
+ * Copyright 2010 Google, Inc.
+ *   Arve Hjønnevåg 
+ *   Colin Cross 
+ * Copyright 2012 Linaro Ltd.
+ *   Anton Vorontsov 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static int kgdb_fiq_enabled;
+module_param_named(enable, kgdb_fiq_enabled, int, 0600);
+MODULE_PARM_DESC(enable, "set to 1 to enable FIQ KGDB");
+
+static unsigned int kgdb_fiq;
+static bool (*is_kgdb_fiq)(unsigned int irq);
+
+asmlinkage void __exception_irq_entry kgdb_fiq_do_handle(struct pt_regs *regs)
+{
+   if (!is_kgdb_fiq(kgdb_fiq))
+   return;
+   if (!kgdb_nmi_poll_knock())
+   return;
+
+   nmi_enter();
+   kgdb_handle_exception(1, 0, 0, regs);
+   nmi_exit();
+}
+
+static struct fiq_handler kgdb_fiq_desc = {
+   .name = "kgdb",
+};
+
+static long kgdb_fiq_setup_stack(void *info)
+{
+   struct pt_regs regs;
+
+   regs.ARM_sp = __get_free_pages(GFP_KERNEL, THREAD_SIZE_ORDER) +
+   THREAD_START_SP;
+   WARN_ON(!regs.ARM_sp);
+
+   set_fiq_regs(®s);
+   return 0;
+}
+
+static void (*kgdb_enable_fiq)(unsigned int irq, bool on);
+
+void kgdb_arch_enable_nmi(bool on)
+{
+   if (!kgdb_enable_fiq)
+   return;
+   kgdb_enable_fiq(kgdb_fiq, on);
+}
+
+int __init kgdb_register_fiq(unsigned int mach_kgdb_fiq,
+   void (*mach_kgdb_enable_fiq)(unsigned int irq, bool on),
+   bool (*mach_is_kgdb_fiq)(unsigned int irq)

[PATCH 12/12] ARM: versatile: Make able to use UART ports for KGDB FIQ debugger

2012-09-11 Thread Anton Vorontsov

If enabled, kernel will able to enter KGDB upon serial line activity on
UART ports.

Note that even with this patch and CONFIG_KGDB_FIQ is enabled, you still
need to pass kgdb_fiq.enable=1 kernel command line option, otherwise UART
will behave in a normal way.

By default UART0 is used, but this can be changed via kgdb_fiq.uart_num
kernel command line option.

Signed-off-by: Anton Vorontsov 
---
 arch/arm/Kconfig   |  1 +
 arch/arm/mach-versatile/Makefile   |  1 +
 arch/arm/mach-versatile/kgdb_fiq.c | 31 +++
 3 files changed, 33 insertions(+)
 create mode 100644 arch/arm/mach-versatile/kgdb_fiq.c

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index c978c74..1a9881a 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -336,6 +336,7 @@ config ARCH_VERSATILE
select PLAT_VERSATILE_CLCD
select PLAT_VERSATILE_FPGA_IRQ
select ARM_TIMER_SP804
+   select ARCH_MIGHT_HAVE_KGDB_FIQ
help
  This enables support for ARM Ltd Versatile board.
 
diff --git a/arch/arm/mach-versatile/Makefile b/arch/arm/mach-versatile/Makefile
index 81fa3fe..bfd761f 100644
--- a/arch/arm/mach-versatile/Makefile
+++ b/arch/arm/mach-versatile/Makefile
@@ -7,3 +7,4 @@ obj-$(CONFIG_ARCH_VERSATILE_PB) += versatile_pb.o
 obj-$(CONFIG_MACH_VERSATILE_AB)+= versatile_ab.o
 obj-$(CONFIG_MACH_VERSATILE_DT)+= versatile_dt.o
 obj-$(CONFIG_PCI)  += pci.o
+obj-$(CONFIG_KGDB_FIQ) += kgdb_fiq.o
diff --git a/arch/arm/mach-versatile/kgdb_fiq.c 
b/arch/arm/mach-versatile/kgdb_fiq.c
new file mode 100644
index 000..3cdf71d
--- /dev/null
+++ b/arch/arm/mach-versatile/kgdb_fiq.c
@@ -0,0 +1,31 @@
+/*
+ * KGDB FIQ board support
+ *
+ * Copyright 2012 Linaro Ltd.
+ *   Anton Vorontsov 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static int kgdb_fiq;
+module_param_named(uart_num, kgdb_fiq, int, 0600);
+MODULE_PARM_DESC(uart_num, "UART port to use for KGDB FIQ");
+
+static int __init kgdb_fiq_init(void)
+{
+   WARN_ON(kgdb_fiq > INT_UARTINT2 - INT_UARTINT0);
+
+   return kgdb_register_fiq(INT_UARTINT0 + kgdb_fiq,
+vic_fiq_select,
+vic_is_fiq_rised);
+}
+console_initcall(kgdb_fiq_init);
-- 
1.7.11.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 11/12] ARM: VIC: Add a couple of low-level FIQ management helpers

2012-09-11 Thread Anton Vorontsov

Just a couple of calls to manage VIC FIQ routing. We'll use them for
KGDB FIQ support on ARM Versatile machines.

Signed-off-by: Anton Vorontsov 
---
 arch/arm/common/vic.c   | 28 
 arch/arm/include/asm/hardware/vic.h |  2 ++
 2 files changed, 30 insertions(+)

diff --git a/arch/arm/common/vic.c b/arch/arm/common/vic.c
index e0d5388..df2fc82 100644
--- a/arch/arm/common/vic.c
+++ b/arch/arm/common/vic.c
@@ -66,6 +66,34 @@ static struct vic_device vic_devices[CONFIG_ARM_VIC_NR];
 
 static int vic_id;
 
+static void __iomem *vic_base(struct irq_data *d)
+{
+   return (void __iomem *)irq_data_get_irq_chip_data(d);
+}
+
+void vic_fiq_select(unsigned int irq, bool on)
+{
+   void __iomem *base = vic_base(&irq_to_desc(irq)->irq_data);
+   void __iomem *sel = base + VIC_INT_SELECT;
+   u32 msk = 1 << irq;
+   u32 val;
+
+   pr_debug("rerouting VIC vector %d to %s\n", irq, on ? "FIQ" : "IRQ");
+
+   val = readl(sel);
+   val &= ~msk;
+   if (on)
+   val |= msk;
+   writel(val, sel);
+}
+
+bool vic_is_fiq_rised(unsigned int irq)
+{
+   void __iomem *base = vic_base(&irq_to_desc(irq)->irq_data);
+
+   return readl(base + VIC_FIQ_STATUS) & (1 << irq);
+}
+
 /**
  * vic_init2 - common initialisation code
  * @base: Base of the VIC.
diff --git a/arch/arm/include/asm/hardware/vic.h 
b/arch/arm/include/asm/hardware/vic.h
index e14af1a..2728975 100644
--- a/arch/arm/include/asm/hardware/vic.h
+++ b/arch/arm/include/asm/hardware/vic.h
@@ -52,6 +52,8 @@ void __vic_init(void __iomem *base, unsigned int irq_start, 
u32 vic_sources,
 void vic_init(void __iomem *base, unsigned int irq_start, u32 vic_sources, u32 
resume_sources);
 int vic_of_init(struct device_node *node, struct device_node *parent);
 void vic_handle_irq(struct pt_regs *regs);
+void vic_fiq_select(unsigned int irq, bool on);
+bool vic_is_fiq_rised(unsigned int irq);
 
 #endif /* __ASSEMBLY__ */
 #endif
-- 
1.7.11.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 09/12] ARM: Move some macros from entry-armv to entry-header

2012-09-11 Thread Anton Vorontsov

Just move the macros into header file as we would want to use them for
KGDB FIQ entry code.

The following macros were moved:

 - svc_entry
 - usr_entry
 - kuser_cmpxchg_check
 - vector_stub

To make kuser_cmpxchg_check actually work across different files, we
also have to make kuser_cmpxchg64_fixup global.

Signed-off-by: Anton Vorontsov 
---
 arch/arm/kernel/entry-armv.S   | 167 +---
 arch/arm/kernel/entry-header.S | 170 +
 2 files changed, 171 insertions(+), 166 deletions(-)

diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index 0f82098..0f15368 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -136,57 +136,6 @@ common_invalid:
b   bad_mode
 ENDPROC(__und_invalid)
 
-/*
- * SVC mode handlers
- */
-
-#if defined(CONFIG_AEABI) && (__LINUX_ARM_ARCH__ >= 5)
-#define SPFIX(code...) code
-#else
-#define SPFIX(code...)
-#endif
-
-   .macro  svc_entry, stack_hole=0
- UNWIND(.fnstart   )
- UNWIND(.save {r0 - pc})
-   sub sp, sp, #(S_FRAME_SIZE + \stack_hole - 4)
-#ifdef CONFIG_THUMB2_KERNEL
- SPFIX(str r0, [sp])   @ temporarily saved
- SPFIX(mov r0, sp  )
- SPFIX(tst r0, #4  )   @ test original stack alignment
- SPFIX(ldr r0, [sp])   @ restored
-#else
- SPFIX(tst sp, #4  )
-#endif
- SPFIX(subeq   sp, sp, #4  )
-   stmia   sp, {r1 - r12}
-
-   ldmia   r0, {r3 - r5}
-   add r7, sp, #S_SP - 4   @ here for interlock avoidance
-   mov r6, #-1 @  ""  ""  ""   ""
-   add r2, sp, #(S_FRAME_SIZE + \stack_hole - 4)
- SPFIX(addeq   r2, r2, #4  )
-   str r3, [sp, #-4]!  @ save the "real" r0 copied
-   @ from the exception stack
-
-   mov r3, lr
-
-   @
-   @ We are now ready to fill in the remaining blanks on the stack:
-   @
-   @  r2 - sp_svc
-   @  r3 - lr_svc
-   @  r4 - lr_, already fixed up for correct return/restart
-   @  r5 - spsr_
-   @  r6 - orig_r0 (see pt_regs definition in ptrace.h)
-   @
-   stmia   r7, {r2 - r6}
-
-#ifdef CONFIG_TRACE_IRQFLAGS
-   bl  trace_hardirqs_off
-#endif
-   .endm
-
.align  5
 __dabt_svc:
svc_entry
@@ -348,71 +297,8 @@ ENDPROC(__pabt_svc)
 
 /*
  * User mode handlers
- *
- * EABI note: sp_svc is always 64-bit aligned here, so should S_FRAME_SIZE
  */
 
-#if defined(CONFIG_AEABI) && (__LINUX_ARM_ARCH__ >= 5) && (S_FRAME_SIZE & 7)
-#error "sizeof(struct pt_regs) must be a multiple of 8"
-#endif
-
-   .macro  usr_entry
- UNWIND(.fnstart   )
- UNWIND(.cantunwind)   @ don't unwind the user space
-   sub sp, sp, #S_FRAME_SIZE
- ARM(  stmib   sp, {r1 - r12}  )
- THUMB(stmia   sp, {r0 - r12}  )
-
-   ldmia   r0, {r3 - r5}
-   add r0, sp, #S_PC   @ here for interlock avoidance
-   mov r6, #-1 @  ""  "" """"
-
-   str r3, [sp]@ save the "real" r0 copied
-   @ from the exception stack
-
-   @
-   @ We are now ready to fill in the remaining blanks on the stack:
-   @
-   @  r4 - lr_, already fixed up for correct return/restart
-   @  r5 - spsr_
-   @  r6 - orig_r0 (see pt_regs definition in ptrace.h)
-   @
-   @ Also, separately save sp_usr and lr_usr
-   @
-   stmia   r0, {r4 - r6}
- ARM(  stmdb   r0, {sp, lr}^   )
- THUMB(store_user_sp_lr r0, r1, S_SP - S_PC)
-
-   @
-   @ Enable the alignment trap while in kernel mode
-   @
-   alignment_trap r0
-
-   @
-   @ Clear FP to mark the first stack frame
-   @
-   zero_fp
-
-#ifdef CONFIG_IRQSOFF_TRACER
-   bl  trace_hardirqs_off
-#endif
-   .endm
-
-   .macro  kuser_cmpxchg_check
-#if !defined(CONFIG_CPU_32v6K) && !defined(CONFIG_NEEDS_SYSCALL_FOR_CMPXCHG)
-#ifndef CONFIG_MMU
-#warning "NPTL on non MMU needs fixing"
-#else
-   @ Make sure our user space atomic helper is restarted
-   @ if it was interrupted in a critical region.  Here we
-   @ perform a quick test inline since it should be false
-   @ 99.% of the time.  The rest is done out of line.
-   cmp r4, #TASK_SIZE
-   blhskuser_cmpxchg64_fixup
-#endif
-#endif
-   .endm
-
.align  5
 __dabt_usr:
usr_entry
@@ -846,6 +732,7 @@ __kuser_cmpxchg64:  @ 0x0f60
ldmfd   sp!, {r4, r5, r6, pc}
 
.text
+   .global kuser_cmpxchg64_fixup
 kuser_cmpxchg64_fixup:

[PATCH 08/12] tty/serial: Add kgdb_nmi driver

2012-09-11 Thread Anton Vorontsov

This special driver makes it possible to temporary use NMI debugger port
as a normal console by issuing 'nmi_console' command (assuming that the
port is attached to KGDB).

Unlike KDB's disable_nmi command, with this driver you are always able
to go back to the debugger using KGDB escape sequence ($3#33).  This is
because this console driver processes the input in NMI context, and thus
is able to intercept the magic sequence.

Note that since the console interprets input and uses polling
communication methods, for things like PPP it is still better to fully
detach debugger port from the KGDB NMI (i.e. disable_nmi), and use raw
console.

Usually, to enter the debugger one have to type the magic sequence, so
initially the kernel will print the following prompt on the NMI debugger
console:

Type $3#33 to enter the debugger>

For convenience, there is a kgdb_fiq.knock kernel command line option,
when set to 0, this turns the special command to just a return key
press, so the kernel will be printing this:

Hit  to enter the debugger>

This is more convenient for long debugging sessions, although it makes
nmi_console feature somewhat useless.

And for the cases when NMI connected to a dedicated button, the knocking
can be disabled altogether by setting kgdb_fiq.knock to -1.

Suggested-by: Colin Cross 
Signed-off-by: Anton Vorontsov 
---
 drivers/tty/serial/Kconfig|  19 ++
 drivers/tty/serial/Makefile   |   1 +
 drivers/tty/serial/kgdb_nmi.c | 391 ++
 drivers/tty/serial/kgdboc.c   |   6 +
 include/linux/kgdb.h  |  10 ++
 5 files changed, 427 insertions(+)
 create mode 100644 drivers/tty/serial/kgdb_nmi.c

diff --git a/drivers/tty/serial/Kconfig b/drivers/tty/serial/Kconfig
index 26907cf..b22e45b 100644
--- a/drivers/tty/serial/Kconfig
+++ b/drivers/tty/serial/Kconfig
@@ -141,6 +141,25 @@ config SERIAL_ATMEL_TTYAT
 
  Say Y if you have an external 8250/16C550 UART.  If unsure, say N.
 
+config SERIAL_KGDB_NMI
+   bool "Serial console over KGDB NMI debugger port"
+   depends on KGDB_SERIAL_CONSOLE
+   help
+ This special driver allows you to temporary use NMI debugger port
+ as a normal console (assuming that the port is attached to KGDB).
+
+ Unlike KDB's disable_nmi command, with this driver you are always
+ able to go back to the debugger using KGDB escape sequence ($3#33).
+ This is because this console driver processes the input in NMI
+ context, and thus is able to intercept the magic sequence.
+
+ Note that since the console interprets input and uses polling
+ communication methods, for things like PPP you still must fully
+ detach debugger port from the KGDB NMI (i.e. disable_nmi), and
+ use raw console.
+
+ If unsure, say N.
+
 config SERIAL_KS8695
bool "Micrel KS8695 (Centaur) serial port support"
depends on ARCH_KS8695
diff --git a/drivers/tty/serial/Makefile b/drivers/tty/serial/Makefile
index ce88667..4f694da 100644
--- a/drivers/tty/serial/Makefile
+++ b/drivers/tty/serial/Makefile
@@ -61,6 +61,7 @@ obj-$(CONFIG_SERIAL_MSM_HS) += msm_serial_hs.o
 obj-$(CONFIG_SERIAL_NETX) += netx-serial.o
 obj-$(CONFIG_SERIAL_OF_PLATFORM) += of_serial.o
 obj-$(CONFIG_SERIAL_OF_PLATFORM_NWPSERIAL) += nwpserial.o
+obj-$(CONFIG_SERIAL_KGDB_NMI) += kgdb_nmi.o
 obj-$(CONFIG_SERIAL_KS8695) += serial_ks8695.o
 obj-$(CONFIG_SERIAL_OMAP) += omap-serial.o
 obj-$(CONFIG_SERIAL_ALTERA_UART) += altera_uart.o
diff --git a/drivers/tty/serial/kgdb_nmi.c b/drivers/tty/serial/kgdb_nmi.c
new file mode 100644
index 000..fdb7d26
--- /dev/null
+++ b/drivers/tty/serial/kgdb_nmi.c
@@ -0,0 +1,391 @@
+/*
+ * KGDB NMI serial console
+ *
+ * Copyright 2010 Google, Inc.
+ *   Arve Hjønnevåg 
+ *   Colin Cross 
+ * Copyright 2012 Linaro Ltd.
+ *   Anton Vorontsov 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static int kgdb_nmi_knock = 1;
+module_param_named(knock, kgdb_nmi_knock, int, 0600);
+MODULE_PARM_DESC(knock, "if set to 1 (default), the special '$3#33' command "
+   "must be used to enter the debugger; when set to 0, "
+   "hitting return key is enough to enter the debugger; "
+   "when set to -1, the debugger is entered immediately "
+   "upon NMI");
+
+static char *kgdb_nmi_magic = "$3#33";
+module_param_named(magic, kgdb_nmi_magic, charp, 0600);
+MODULE_PARM_DESC(

[PATCH 06/12] tty/serial/kgdboc: Add and wire up clear_irqs callback

2012-09-11 Thread Anton Vorontsov

This patch implements a new callback: clear_irqs. It is used for the
cases when KDB-entry (e.g. NMI) and KDB IO (e.g. serial port) shares the
same interrupt. To get the idea, let's take some real example (ARM
machine): we have a serial port which interrupt is routed to an NMI, and
the interrupt is used to enter KDB. Once there is some activity on the
serial port, the CPU receives NMI exception, and we fall into KDB shell.
So, it is our "debug console", and it is able to interrupt (and thus
debug) even IRQ handlers themselves.

When used that way, the interrupt never reaches serial driver's IRQ
handler routine, which means that serial driver will not silence the
interrupt. NMIs behaviour are quite arch-specific, and we can't assume
that we can use them as ordinary IRQs, e.g. on some arches (like ARM) we
can't handle data aborts, the behaviour is undefined then. So we can't
just handle execution to serial driver's IRQ handler from the NMI
context once we're done with KDB (plus this would defeat the debugger's
purpose: we want the NMI handler be as simple as possible, so it will
have less chances to hang).

So, given that have to deal with it somehow, we have two options:

1. Implement something that clears the interrupt; 2. Implement a whole
new concept of grabbing tty for exclusive KDB use, plus implement
mask/unmask callbacks, i.e.:
   - Since consoles might use ttys w/o opending them, we would have to
 make kdb respect CON_ENABLED flag (maybe a good idea to do it
 anyway);
   - Add 'bool exclusive' argument to tty_find_polling_driver(), if set
 to 1, the function will refuse to return an already opened tty; and
 will use the flag in tty_reopen() to not allow multiple users
 (there are already checks for pty masters, which are "open once"
 ttys);
   - Once we got the tty exclusively, we would need to call some new
 uart->mask_all_but_rx_interrupts call before we want to use the
 port for NMI/KDB, and unmask_all_but_rx_interrupts after we're done
 with it.

The second option is obviously more complex, needlessly so, and less
generic. So I went with the first one: we just consume all the
interrupts.  The tty becomes silently unusable for the rest of the world
when we use it with KDB; but once we reroute the serial IRQ source back
from NMI to an ordinary IRQ (in KDB this can be done with 'disable_nmi'
command), it will behave as normal.

p.s. Since the callback is so far used only by polling users, we place
it under the appropriate #ifdef.

Signed-off-by: Anton Vorontsov 
---
 drivers/tty/serial/kgdboc.c  | 10 ++
 drivers/tty/serial/serial_core.c | 15 +++
 include/linux/kgdb.h |  1 +
 include/linux/serial_core.h  |  1 +
 include/linux/tty_driver.h   |  1 +
 5 files changed, 28 insertions(+)

diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
index 2b42a01..0aa08c8 100644
--- a/drivers/tty/serial/kgdboc.c
+++ b/drivers/tty/serial/kgdboc.c
@@ -227,6 +227,15 @@ static int kgdboc_get_char(void)
kgdb_tty_line);
 }
 
+static void kgdboc_clear_irqs(void)
+{
+   if (!kgdb_tty_driver)
+   return;
+   if (kgdb_tty_driver->ops->clear_irqs)
+   kgdb_tty_driver->ops->clear_irqs(kgdb_tty_driver,
+kgdb_tty_line);
+}
+
 static void kgdboc_put_char(u8 chr)
 {
if (!kgdb_tty_driver)
@@ -298,6 +307,7 @@ static struct kgdb_io kgdboc_io_ops = {
.name   = "kgdboc",
.read_char  = kgdboc_get_char,
.write_char = kgdboc_put_char,
+   .clear_irqs = kgdboc_clear_irqs,
.pre_exception  = kgdboc_pre_exp_handler,
.post_exception = kgdboc_post_exp_handler,
 };
diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
index dcb2d5a..93c36cb 100644
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -2187,6 +2187,20 @@ static void uart_poll_put_char(struct tty_driver 
*driver, int line, char ch)
port = state->uart_port;
port->ops->poll_put_char(port, ch);
 }
+
+static void uart_clear_irqs(struct tty_driver *driver, int line)
+{
+   struct uart_driver *drv = driver->driver_state;
+   struct uart_state *state = drv->state + line;
+   struct uart_port *port;
+
+   if (!state || !state->uart_port)
+   return;
+
+   port = state->uart_port;
+   if (port->ops->clear_irqs)
+   port->ops->clear_irqs(port);
+}
 #endif
 
 static const struct tty_operations uart_ops = {
@@ -2219,6 +2233,7 @@ static const struct tty_operations uart_ops = {
.poll_init  = uart_poll_init,
.poll_get_char  = uart_poll_get_char,
.poll_put_char  = uart

[PATCH 05/12] tty/serial/amba-pl011: Implement poll_init callback

2012-09-11 Thread Anton Vorontsov

The callback is used to initialize the hardware, nothing else should be
done, i.e. we should not request interrupts (but we can and do unmask
some of them, as they might be useful for NMI entry).

As a side-effect, the patch also fixes a division by zero[1] when booting
with kgdboc options specified (e.g. kgdboc=ttyAMA0,115200n8). The issue
happens because serial core calls set_termios callback, but the driver
doesn't know clock frequency, and thus cannot calculate proper baud rate
values.

[1]
WARNING: at drivers/tty/serial/serial_core.c:400 uart_get_baud_rate+0xe8/0x14c()
Modules linked in:
[] (unwind_backtrace+0x0/0xf0) from [] 
(warn_slowpath_common+0x4c/0x64)
[] (warn_slowpath_common+0x4c/0x64) from [] 
(warn_slowpath_null+0x1c/0x24)
[] (warn_slowpath_null+0x1c/0x24) from [] 
(uart_get_baud_rate+0xe8/0x14c)
[] (uart_get_baud_rate+0xe8/0x14c) from [] 
(pl011_set_termios+0x48/0x278)
[] (pl011_set_termios+0x48/0x278) from [] 
(uart_set_options+0xe8/0x114)
[] (uart_set_options+0xe8/0x114) from [] 
(uart_poll_init+0xd4/0xe0)
[] (uart_poll_init+0xd4/0xe0) from [] 
(tty_find_polling_driver+0x100/0x17c)
[] (tty_find_polling_driver+0x100/0x17c) from [] 
(configure_kgdboc+0xc8/0x1b8)
[] (configure_kgdboc+0xc8/0x1b8) from [] 
(do_one_initcall+0x30/0x168)
[] (do_one_initcall+0x30/0x168) from [] 
(do_basic_setup+0x94/0xc8)
[] (do_basic_setup+0x94/0xc8) from [] 
(kernel_init+0x60/0xf4)
[] (kernel_init+0x60/0xf4) from [] 
(kernel_thread_exit+0x0/0x8)
---[ end trace 7d41c9186f342c40 ]---
Division by zero in kernel.
[] (unwind_backtrace+0x0/0xf0) from [] (Ldiv0+0x8/0x10)
[] (Ldiv0+0x8/0x10) from [] (pl011_set_termios+0x68/0x278)
[] (pl011_set_termios+0x68/0x278) from [] 
(uart_set_options+0xe8/0x114)
[] (uart_set_options+0xe8/0x114) from [] 
(uart_poll_init+0xd4/0xe0)
[] (uart_poll_init+0xd4/0xe0) from [] 
(tty_find_polling_driver+0x100/0x17c)
[] (tty_find_polling_driver+0x100/0x17c) from [] 
(configure_kgdboc+0xc8/0x1b8)
[] (configure_kgdboc+0xc8/0x1b8) from [] 
(do_one_initcall+0x30/0x168)
[] (do_one_initcall+0x30/0x168) from [] 
(do_basic_setup+0x94/0xc8)
[] (do_basic_setup+0x94/0xc8) from [] 
(kernel_init+0x60/0xf4)
[] (kernel_init+0x60/0xf4) from [] 
(kernel_thread_exit+0x0/0x8)
Division by zero in kernel.
[] (unwind_backtrace+0x0/0xf0) from [] (Ldiv0+0x8/0x10)
[] (Ldiv0+0x8/0x10) from [] (uart_update_timeout+0x4c/0x5c)
[] (uart_update_timeout+0x4c/0x5c) from [] 
(pl011_set_termios+0xc8/0x278)
[] (pl011_set_termios+0xc8/0x278) from [] 
(uart_set_options+0xe8/0x114)
[] (uart_set_options+0xe8/0x114) from [] 
(uart_poll_init+0xd4/0xe0)
[] (uart_poll_init+0xd4/0xe0) from [] 
(tty_find_polling_driver+0x100/0x17c)
[] (tty_find_polling_driver+0x100/0x17c) from [] 
(configure_kgdboc+0xc8/0x1b8)
[] (configure_kgdboc+0xc8/0x1b8) from [] 
(do_one_initcall+0x30/0x168)
[] (do_one_initcall+0x30/0x168) from [] 
(do_basic_setup+0x94/0xc8)
[] (do_basic_setup+0x94/0xc8) from [] 
(kernel_init+0x60/0xf4)
[] (kernel_init+0x60/0xf4) from [] 
(kernel_thread_exit+0x0/0x8)

Signed-off-by: Anton Vorontsov 
---
 drivers/tty/serial/amba-pl011.c | 44 ++---
 1 file changed, 33 insertions(+), 11 deletions(-)

diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c
index cede938..45137e4 100644
--- a/drivers/tty/serial/amba-pl011.c
+++ b/drivers/tty/serial/amba-pl011.c
@@ -1309,10 +1309,9 @@ static void pl011_put_poll_char(struct uart_port *port,
 
 #endif /* CONFIG_CONSOLE_POLL */
 
-static int pl011_startup(struct uart_port *port)
+static int pl011_hwinit(struct uart_port *port)
 {
struct uart_amba_port *uap = (struct uart_amba_port *)port;
-   unsigned int cr;
int retval;
 
/* Optionaly enable pins to be muxed in and configured */
@@ -1337,6 +1336,37 @@ static int pl011_startup(struct uart_port *port)
   UART011_RTIS | UART011_RXIS, uap->port.membase + UART011_ICR);
 
/*
+* Save interrupts enable mask, and enable RX interrupts in case if
+* the interrupt is used for NMI entry.
+*/
+   uap->im = readw(uap->port.membase + UART011_IMSC);
+   writew(UART011_RTIM | UART011_RXIM, uap->port.membase + UART011_IMSC);
+
+   if (uap->port.dev->platform_data) {
+   struct amba_pl011_data *plat;
+
+   plat = uap->port.dev->platform_data;
+   if (plat->init)
+   plat->init();
+   }
+   return 0;
+ out:
+   return retval;
+}
+
+static int pl011_startup(struct uart_port *port)
+{
+   struct uart_amba_port *uap = (struct uart_amba_port *)port;
+   unsigned int cr;
+   int retval;
+
+   retval = pl011_hwinit(port);
+   if (retval)
+   goto clk_dis;
+
+   writew(uap->im, uap->port.membase + UART011_IMSC);
+
+   /*
 * Allocate the IRQ
 */
retval = request_irq(uap->port.irq, pl011_int, 0, "uart-pl011", uap);
@@ -1395,19 +1425

[PATCH 03/12] kdb: Turn KGDB_KDB=n stubs into static inlines

2012-09-11 Thread Anton Vorontsov

This makes the stubs actually usable, since e.g. 'foo = kdb_register();'
leads to build errors in !KGDB_KDB case. Plus, with static inlines we
do type checking.

Signed-off-by: Anton Vorontsov 
---
 include/linux/kdb.h | 29 -
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/include/linux/kdb.h b/include/linux/kdb.h
index 42d9e86..7f6fe6e 100644
--- a/include/linux/kdb.h
+++ b/include/linux/kdb.h
@@ -13,6 +13,14 @@
  * Copyright (C) 2009 Jason Wessel 
  */
 
+typedef enum {
+   KDB_REPEAT_NONE = 0,/* Do not repeat this command */
+   KDB_REPEAT_NO_ARGS, /* Repeat the command without arguments */
+   KDB_REPEAT_WITH_ARGS,   /* Repeat the command including its arguments */
+} kdb_repeat_t;
+
+typedef int (*kdb_func_t)(int, const char **);
+
 #ifdef CONFIG_KGDB_KDB
 #include 
 #include 
@@ -32,14 +40,6 @@ extern atomic_t kdb_event;
 
 #define KDB_MAXARGS16 /* Maximum number of arguments to a function  */
 
-typedef enum {
-   KDB_REPEAT_NONE = 0,/* Do not repeat this command */
-   KDB_REPEAT_NO_ARGS, /* Repeat the command without arguments */
-   KDB_REPEAT_WITH_ARGS,   /* Repeat the command including its arguments */
-} kdb_repeat_t;
-
-typedef int (*kdb_func_t)(int, const char **);
-
 /* KDB return codes from a command or internal kdb function */
 #define KDB_NOTFOUND   (-1)
 #define KDB_ARGCOUNT   (-2)
@@ -149,11 +149,14 @@ extern int kdb_register_repeat(char *, kdb_func_t, char 
*, char *,
   short, kdb_repeat_t);
 extern int kdb_unregister(char *);
 #else /* ! CONFIG_KGDB_KDB */
-#define kdb_printf(...)
-#define kdb_init(x)
-#define kdb_register(...)
-#define kdb_register_repeat(...)
-#define kdb_uregister(x)
+static inline __printf(1, 2) int kdb_printf(const char *fmt, ...) { return 0; }
+static inline void kdb_init(int level) {}
+static inline int kdb_register(char *cmd, kdb_func_t func, char *usage,
+  char *help, short minlen) { return 0; }
+static inline int kdb_register_repeat(char *cmd, kdb_func_t func, char *usage,
+ char *help, short minlen,
+ kdb_repeat_t repeat) { return 0; }
+static inline int kdb_unregister(char *cmd) { return 0; }
 #endif /* CONFIG_KGDB_KDB */
 enum {
KDB_NOT_INITIALIZED,
-- 
1.7.11.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 02/12] kdb: Implement disable_nmi command

2012-09-11 Thread Anton Vorontsov

This command disables NMI-entry. If NMI source has been previously shared
with a serial console ("debug port"), this effectively releases the port
from KDB exclusive use, and makes the console available for normal use.

Of course, NMI can be reenabled, enable_nmi modparam is used for that:

echo 1 > /sys/module/kdb/parameters/enable_nmi

Signed-off-by: Anton Vorontsov 
---
 kernel/debug/kdb/kdb_main.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
index 31df170..9fadff1 100644
--- a/kernel/debug/kdb/kdb_main.c
+++ b/kernel/debug/kdb/kdb_main.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2107,6 +2108,32 @@ static int kdb_dmesg(int argc, const char **argv)
return 0;
 }
 #endif /* CONFIG_PRINTK */
+
+/* Make sure we balance enable/disable calls, must disable first. */
+static atomic_t kdb_nmi_disabled;
+
+static int kdb_disable_nmi(int argc, const char *argv[])
+{
+   if (atomic_read(&kdb_nmi_disabled))
+   return 0;
+   atomic_set(&kdb_nmi_disabled, 1);
+   kgdb_enable_nmi(0);
+   return 0;
+}
+
+static int kdb_param_enable_nmi(const char *val, const struct kernel_param *kp)
+{
+   if (!atomic_add_unless(&kdb_nmi_disabled, -1, 0))
+   return -EINVAL;
+   kgdb_enable_nmi(1);
+   return 0;
+}
+
+static const struct kernel_param_ops kdb_param_ops_enable_nmi = {
+   .set = kdb_param_enable_nmi,
+};
+module_param_cb(enable_nmi, &kdb_param_ops_enable_nmi, NULL, 0600);
+
 /*
  * kdb_cpu - This function implements the 'cpu' command.
  * cpu []
@@ -2851,6 +2878,8 @@ static void __init kdb_inittab(void)
kdb_register_repeat("dmesg", kdb_dmesg, "[lines]",
  "Display syslog buffer", 0, KDB_REPEAT_NONE);
 #endif
+   kdb_register_repeat("disable_nmi", kdb_disable_nmi, "",
+ "Disable NMI entry to KDB", 0, KDB_REPEAT_NONE);
kdb_register_repeat("defcmd", kdb_defcmd, "name \"usage\" \"help\"",
  "Define a set of commands, down to endefcmd", 0, KDB_REPEAT_NONE);
kdb_register_repeat("kill", kdb_kill, "<-signal> ",
-- 
1.7.11.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 08/12] tty/serial: Add kgdb_nmi driver

2012-09-11 Thread Anton Vorontsov

On Tue, Sep 11, 2012 at 03:14:20PM +0100, Alan Cox wrote:
> > +struct kgdb_nmi_tty_priv {
> > +   struct tty_port port;
> > +   int opened;
> > +   struct tasklet_struct tlet;
> > +   STRUCT_KFIFO(char, KGDB_NMI_FIFO_SIZE) fifo;
> 
> I don't see where "opened" is used.

Yup, a leftover, with tty_port no longer needed. Thanks for noticing.

> > +static const struct tty_operations kgdb_nmi_tty_ops = {
> > +   .open   = kgdb_nmi_tty_open,
> > +   .close  = kgdb_nmi_tty_close,
> > +   .install= kgdb_nmi_tty_install,
> > +   .cleanup= kgdb_nmi_tty_cleanup,
> > +   .write_room = kgdb_nmi_tty_write_room,
> > +   .write  = kgdb_nmi_tty_write,
> 
> And a hangup method (just using tty_port helpers will do the job - it's
> needed so vhangup() works as expected on a port)

Will add. Thanks a lot!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC] tty/serial/kgdboc: Add and wire up clear_irqs callback

2012-09-11 Thread Anton Vorontsov

On Tue, Sep 11, 2012 at 03:15:40PM +0100, Alan Cox wrote:
> Anton Vorontsov  wrote:
> > This patch implements a new callback: clear_irqs. It is used for the
> 
> This bit I still really don't like. I would like to know what the generic
> IRQ folks thing about it and if Thomas Gleixner has any brilliant ideas
> here. I don't think its a show stopper it would just be nice if there was
> a better solution first.

Yup, good idea, Cc'ing.

Hello Thomas,

We're dissussing a patch that adds a clear_irq callback into UART
drivers. For convenience, the particular patch is inlined at the end of
this email. The rationale and the background for the whole thing can be
found here: http://lkml.org/lkml/2012/9/10/2

So, just for visual clearness, and for the fun of it, here is some
glorious ascii art of what we have:

 ,---NMI-|`|
UART_IRQ---INT_controller| CPU |
 `---IRQ-|,|

Pretty much standard scheme. That is, on the interrupt controller level
we can reroute any IRQ to NMI, and back in 2008 folks at Google found
that rerouting the UART IRQ to NMI brings some really cool features: we
can have a very reliable and powerful debugger pretty much on every
embedded machine, and without loosing the UART/console port itself.

So, it works like this:

- At boot time, Linux arch/ code reroutes UART IRQ to NMI, we connect
  the port to the KGDBoC, and so forth...;
- User starts typing on the serial port;
- UART raises its IRQ line;
- Through the controller, one of CPUs gets an NMI;
- In NMI context, CPU reads a character from UART;
- Then it checks if the character resulted into the special 'enter
  KGDB' magic sequence:
- If yes, then the CPU invites other CPUs into the KGDB, and thus
  kernel enters the debugger;
- If the character wasn't part of the magic command (or the sequence is
  yet incomplete), then CPU exits NMI and continues as normal.

The "problem" is in the last step. If we exit NMI without making UART
know that we're done with the interrupt, we will reenter the NMI
immediately, even without any new characters from the UART.

The obvious solution would be to "mask/reroute NMI at INT_controller
level or queue serial port's IRQ routine from somewhere, e.g. a tasklet
or software raised IRQ". Unfortunately, this means that we have to keep
NMI disabled for this long:

1. We exit the NMI context with NMI source disabled/rerouted;
2. CPU continues to execute the kernel;
3. Kernel receives a timer interrupt, or software-raised interrupt, or
   UART IRQ, which was temporary rerouted back to normal interrupts;
4. It executes normal IRQ-entry, tracing, lots of other stuff,
   interrupt handlers, softirq handlers, and thus we clear the UART
   interrupt;
5. Once the UART is cleared, we reenable NMI (in the arch-code, we can
   do that in our do_IRQ());

As you can see, with this solution the NMI debugger will be effectively
disabled from 1. to 5., thus shall the hang happen in this sensitive
code, we would no longer able to debug it.

And this defeats the main purpose of the NMI debugger: we must keep NMI
enabled all the time when we're not in the debugger, the NMI debugger
is always available (unless the debugger crashed :-)

That's why I came with the clear_irq callback in the serial drivers
that we call from the NMI context, it's much simpler and keeps the
debugger robust. So, personally I too can't think of any other
plausible solution that would keep all the features intact.

Thanks,

Anton.

- - - -
[PATCH] tty/serial/kgdboc: Add and wire up clear_irqs callback

This patch implements a new callback: clear_irqs. It is used for the
cases when KDB-entry (e.g. NMI) and KDB IO (e.g. serial port) shares the
same interrupt. To get the idea, let's take some real example (ARM
machine): we have a serial port which interrupt is routed to an NMI, and
the interrupt is used to enter KDB. Once there is some activity on the
serial port, the CPU receives NMI exception, and we fall into KDB shell.
So, it is our "debug console", and it is able to interrupt (and thus
debug) even IRQ handlers themselves.

When used that way, the interrupt never reaches serial driver's IRQ
handler routine, which means that serial driver will not silence the
interrupt. NMIs behaviour are quite arch-specific, and we can't assume
that we can use them as ordinary IRQs, e.g. on some arches (like ARM) we
can't handle data aborts, the behaviour is undefined then. So we can't
just handle execution to serial driver's IRQ handler from the NMI
context once we're done with KDB (plus this would defeat the debugger's
purpose: we want the NMI handler be as simple as possible, so it will
have less chances to hang).

So, given that have to deal with it somehow, we have two options:

1. Implement something that clears the interrupt; 2. Implement a whol

Re: [RFC] tty/serial/kgdboc: Add and wire up clear_irqs callback

2012-09-11 Thread Anton Vorontsov

On Tue, Sep 11, 2012 at 08:42:46PM -0700, Colin Cross wrote:
[...]
> > The "problem" is in the last step. If we exit NMI without making UART
> > know that we're done with the interrupt, we will reenter the NMI
> > immediately, even without any new characters from the UART.
> 
> The UART irq line should go low when you read the character out of the

Probably some controllers may lower the line by themselves, but not
all, and probably most of them need an explicit clear.

> receive buffer, or the polling rx function should clear the interrupt
> for you.

Yes, that's an option. But that way we add a new semantic for the
polling routines, and effecitvely we just merge the two callbacks.

Of course, if Alan is OK with this, I'm more than OK too. :-)

(But the polling routines would need to clear all interrupts, not
just rx/tx. For example, if the controller indicated some error, and
nobody clears it, then we'll start reentering infinitely.)

> If you use a clear_irqs callback, you can drop characters if
> one arrives between the last character buffer read and calling
> clear_irqs.

Only if we call clear_irqs() after reading the characters, but we do
it before. So if new characters are available, we will reenter NMI,
which is OK.

But if used incorrectly, it truly can cause dropping (or staling) of
characters, so I'd better add some comments about this.

Thanks!

Anton.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC] tty/serial/kgdboc: Add and wire up clear_irqs callback

2012-09-12 Thread Anton Vorontsov

On Tue, Sep 11, 2012 at 09:40:35PM -0700, Colin Cross wrote:
[..]
> >> the polling rx function should clear the interrupt
> >> for you.
> >
> > Yes, that's an option. But that way we add a new semantic for the
> > polling routines, and effecitvely we just merge the two callbacks.
> >
> > Of course, if Alan is OK with this, I'm more than OK too. :-)
> >
> > (But the polling routines would need to clear all interrupts, not
> > just rx/tx. For example, if the controller indicated some error, and
> > nobody clears it, then we'll start reentering infinitely.)
> 
> For exynos5, the only non-8250 based serial port I've come across, we
> clear all interrupts in the rx poll function (see
> https://android.googlesource.com/kernel/exynos/+/ef427aafffb7153dde59745e440fd7ec41ea969d/arch/arm/mach-exynos/exynos_fiq_debugger.c).

Yes, but if you'd like to merge your code, some might ask you: why?

You'd answer that you need to clear the interrupts, otherwise you'll
keep reentering NMI. The next that you might get is this: "this does
not belong to the getc callback, it's better to factor it out". :-) And
here comes clear_irq() (or alike, see below).

> >> If you use a clear_irqs callback, you can drop characters if
> >> one arrives between the last character buffer read and calling
> >> clear_irqs.
> >
> > Only if we call clear_irqs() after reading the characters, but we do
> > it before. So if new characters are available, we will reenter NMI,
> > which is OK.
> >
> > But if used incorrectly, it truly can cause dropping (or staling) of
> > characters, so I'd better add some comments about this.
> 
> What does clear_irqs() mean for a status or tx interrupt?  The tx
> interrupt will generally re-assert as long as the tx fifo is empty,
> which would require disabling it.

Yup, and that's exactly what we do: http://lkml.org/lkml/2012/9/11/119

Your words made me think that clear_irq() might be indeed a somewhat
inappropriate name. We have to be even stricter on its definition and
behaviour.

So, returning to your question "What does clear_irqs() mean", I'd
answer that: the function must do whatever needed to lower the IRQ
line, plus the function must leave the port in the state that it's
still able to throw RX interrupts after the call.

So, the 100% proper name for this function would be this:

quiesce_irqs_but_rx()

It's a bit long, but does exactly what the name states.

Thanks!
Anton.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC] tty/serial/kgdboc: Add and wire up clear_irqs callback

2012-09-12 Thread Anton Vorontsov

On Wed, Sep 12, 2012 at 10:44:20AM +0100, Alan Cox wrote:
> > Of course, if Alan is OK with this, I'm more than OK too. :-)
> 
> It may well be better.
> 
> > (But the polling routines would need to clear all interrupts, not
> > just rx/tx. For example, if the controller indicated some error, and
> > nobody clears it, then we'll start reentering infinitely.)
> 
> For a lot of devices and platforms you'd probably mask them instead ?

If there is no way to clear them, yes, we obviously would want to
mask them before using the port for NMI debugger. Then we'd need
three callbacks:

- mask_all_irqs_but_rx() -- used before we want to start using the port
  for the NMI debugger;
- clear_rx_irq() -- (optional) clears rx IRQ for controllers that need
  it;
- restore_irqs() -- unmasks interrupts that were previously masked.

If we ever encounter a case when just clearing interrupts doesn't work,
we can surely implement the above scheme... It's just so far I don't
see any need to over-design this, but again, it's your call, I told my
opinion on this, but I'll do whatever you guys like more. :-)

> > > If you use a clear_irqs callback, you can drop characters if
> > > one arrives between the last character buffer read and calling
> > > clear_irqs.
> > 
> > Only if we call clear_irqs() after reading the characters, but we do
> > it before. So if new characters are available, we will reenter NMI,
> > which is OK.
> 
> Recursively or not... again you get platform specific magic in places
> we don't want.

I really really don't see how this is platform-specific. All we ask the
serial driver is to quiesce its interrupt. Whether we can handle
NMIs/IRQs recursively or not is not serial driver's worry, since its
IRQ handler is not going to fire anyway. The polling routines already
gave us the power to steal/inject the data, so now we're stealing the
interrupt too.

How we use the callback is indeed platform-specific, but so is the
whole KGDB, and that knowledge is hidden there.

For serial driver it's all pretty much clear: lower the interrupt, but
don't turn off rx detection.

Thanks!

Anton.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v6 0/4] Runtime Interpreted Power Sequences

2012-09-12 Thread Anton Vorontsov

On Wed, Sep 12, 2012 at 03:27:04PM -0600, Stephen Warren wrote:
> On 09/12/2012 03:57 AM, Alexandre Courbot wrote:
> > New revision of the power sequences, taking as usual the feedback that was
> > kindly provided about the last version.
> > 
> > I think now is a good time to discuss integrating this and to start looking 
> > for
> > a maintainer who would be willing to merge this into his/her tree (I am
> > especially thinking about the power framework maintainers, since this is 
> > where
> > the code is right now.
> 
> The other alternative is for you to maintain this going forward; I
> believe that would be as simple as:
> 
> * Create a patch to add yourself to MAINTAINERS for the
> drivers/power/power_seq/ directory.
> 
> * Get a kernel.org account, push this patch to a branch there, and add
> the branch into linux-next.
> 
> * Send a pull request to Linus at the appropriate time.
> 
> * Ongoing: Accept any patches, perform any maintenance required, etc.
> 
> Does anyone see any issue with Alexandre doing this? Nobody else has
> volunteered yet:-)

Yup, looks like the best way.

Anton.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v7 0/11] KGDB/KDB FIQ (NMI) debugger

2012-09-13 Thread Anton Vorontsov

Hi all,

Here comes the lucky v7:

- Per Alan Cox's suggestion added hangup method and removed a small
  leftover;
- Per Colin Cross' suggestion moved IRQ quiescing logic into
  poll_get_char routine. IIUC, Alan is less unhappy about it.  As a
  result, clear_irq() callback dropped.

These patches can be found in the following repo (based on tty-next):

git://git.infradead.org/users/cbou/linux-nmi-kdb.git master

Old changelogs and rationale for these patches can be found here:

v1-v5, rationale: http://lkml.org/lkml/2012/9/10/2
v6: http://lkml.org/lkml/2012/9/10/2

Thanks,

--
 arch/arm/Kconfig|  19 ++
 arch/arm/common/vic.c   |  28 ++
 arch/arm/include/asm/hardware/vic.h |   2 +
 arch/arm/include/asm/kgdb.h |   8 +
 arch/arm/kernel/Makefile|   1 +
 arch/arm/kernel/entry-armv.S| 167 +---
 arch/arm/kernel/entry-header.S  | 170 +
 arch/arm/kernel/kgdb_fiq.c  |  99 
 arch/arm/kernel/kgdb_fiq_entry.S|  87 +++
 arch/arm/mach-versatile/Makefile|   1 +
 arch/arm/mach-versatile/kgdb_fiq.c  |  31 +++
 drivers/tty/serial/Kconfig  |  19 ++
 drivers/tty/serial/Makefile |   1 +
 drivers/tty/serial/amba-pl011.c |  73 +-
 drivers/tty/serial/kgdb_nmi.c   | 396 +
 drivers/tty/serial/kgdboc.c |   6 +
 drivers/tty/serial/serial_core.c|  17 ++
 include/linux/kdb.h |  29 ++-
 include/linux/kgdb.h|  33 +++
 include/linux/serial_core.h |   1 +
 kernel/debug/debug_core.c   |  36 ++-
 kernel/debug/kdb/kdb_main.c |  29 +++
 22 files changed, 1060 insertions(+), 193 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 2132 matches

Mail list logo