date:20130414

Re: mm: BUG in do_huge_pmd_wp_page

2013-04-14 Thread Minchan Kim

On Thu, Apr 11, 2013 at 04:18:13PM +0300, Kirill A. Shutemov wrote:
> Minchan Kim wrote:
> > On Fri, Mar 29, 2013 at 09:04:16AM -0400, Sasha Levin wrote:
> > > Hi all,
> > > 
> > > While fuzzing with trinity inside a KVM tools guest running latest -next 
> > > kernel,
> > > I've stumbled on the following.
> > > 
> > > It seems that the code in do_huge_pmd_wp_page() was recently modified in
> > > "thp: do_huge_pmd_wp_page(): handle huge zero page".
> > > 
> > > Here's the trace:
> > > 
> > > [  246.244708] BUG: unable to handle kernel paging request at 
> > > 88009c422000
> > > [  246.245743] IP: [] copy_page_rep+0x5/0x10
> > > [  246.250569] PGD 7232067 PUD 7235067 PMD bfefe067 PTE 80009c422060
> > > [  246.251529] Oops:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > > [  246.252325] Dumping ftrace buffer:
> > > [  246.252791](ftrace buffer empty)
> > > [  246.252869] Modules linked in:
> > > [  246.252869] CPU 3
> > > [  246.252869] Pid: 11985, comm: trinity-child12 Tainted: GW
> > > 3.9.0-rc4-next-20130328-sasha-00014-g91a3267 #319
> > > [  246.252869] RIP: 0010:[]  [] 
> > > copy_page_rep+0x5/0x10
> > > [  246.252869] RSP: 0018:8815bc40  EFLAGS: 00010286
> > > [  246.252869] RAX: 8815bfd8 RBX: 02710880 RCX: 
> > > 0200
> > > [  246.252869] RDX:  RSI: 88009c422000 RDI: 
> > > 88009a422000
> > > [  246.252869] RBP: 8815bc98 R08: 02718000 R09: 
> > > 0001
> > > [  246.252869] R10: 0001 R11:  R12: 
> > > 8800
> > > [  246.252869] R13: 8815bfd8 R14: 8815bfd8 R15: 
> > > fff8
> > > [  246.252869] FS:  7f53db93f700() GS:8800bba0() 
> > > knlGS:
> > > [  246.252869] CS:  0010 DS:  ES:  CR0: 80050033
> > > [  246.252869] CR2: 88009c422000 CR3: 00159000 CR4: 
> > > 000406e0
> > > [  246.252869] DR0:  DR1:  DR2: 
> > > 
> > > [  246.252869] DR3:  DR6: 0ff0 DR7: 
> > > 0400
> > > [  246.252869] Process trinity-child12 (pid: 11985, threadinfo 
> > > 8815a000, task 88009c60b000)
> > > [  246.252869] Stack:
> > > [  246.252869]  81234aae 8815bc88 81273639 
> > > 00a0
> > > [  246.252869]  02718000 8800ab36d050 88153800 
> > > ea000269
> > > [  246.252869]  00a0 8800ab36d000 ea000271 
> > > 8815bd48
> > > [  246.252869] Call Trace:
> > > [  246.252869]  [] ? copy_user_huge_page+0x1de/0x240
> > > [  246.252869]  [] ? mem_cgroup_charge_common+0xa9/0xc0
> > > [  246.252869]  [] do_huge_pmd_wp_page+0x9f7/0xc60
> > > [  246.252869]  [] ? __const_udelay+0x29/0x30
> > > [  246.252869]  [] handle_mm_fault+0x26e/0x650
> > > [  246.252869]  [] ? __lock_is_held+0x5a/0x80
> > > [  246.252869]  [] ? __do_page_fault+0x514/0x5e0
> > > [  246.252869]  [] __do_page_fault+0x570/0x5e0
> > > [  246.252869]  [] ? rcu_eqs_exit_common+0x60/0x260
> > > [  246.252869]  [] ? rcu_eqs_enter_common+0x33e/0x3b0
> > > [  246.252869]  [] ? rcu_eqs_exit+0x9c/0xb0
> > > [  246.252869]  [] do_page_fault+0x32/0x50
> > > [  246.252869]  [] do_async_page_fault+0x30/0xc0
> > > [  246.252869]  [] async_page_fault+0x28/0x30
> > > [  246.252869] Code: 90 90 90 90 90 90 9c fa 65 48 3b 06 75 14 65 48 3b 
> > > 56 08 75 0d 65 48 89 1e 65 48 89 4e 08 9d b0 01 c3 9d 30
> > > c0 c3 b9 00 02 00 00  48 a5 c3 0f 1f 80 00 00 00 00 eb ee 66 66 66 90 
> > > 66 66 66 90
> > > [  246.252869] RIP  [] copy_page_rep+0x5/0x10
> > > [  246.252869]  RSP 
> > > [  246.252869] CR2: 88009c422000
> > > [  246.252869] ---[ end trace 09fbe37b108d5766 ]---
> > > 
> > > And this is the code:
> > > 
> > >         if (is_huge_zero_pmd(orig_pmd))
> > >                 clear_huge_page(new_page, haddr, HPAGE_PMD_NR);
> > >         else
> > >                 copy_user_huge_page(new_page, page, haddr, vma, HPAGE_PMD_NR); <--- this
> > > 
> > > 
> > > Thanks,
> > > Sasha
> > 
> > I don't know whether this issue has already been resolved. If so, my reply
> > becomes just a question to Kirill, regardless of this BUG.
> > 
> > While looking at the code, I wondered about the logic of GHZP (aka
> > get_huge_zero_page) reference handling. The logic depends on the page
> > allocator never allocating PFN 0.
> > 
> > Who guarantees that? What happens if the allocator allocates PFN 0?
> > I don't know whether every architecture guarantees it.
> > Did you check it for all arches?
> > 
> > If not, 
> >         CPU 1                       CPU 2                           CPU 3
> > 
> > shrink_huge_zero_page
> > huge_zero_refcount = 0;
> >                             GHZP
> >                             pfn_0_zero_page = alloc_pages
> >                                                             GHZP
> >                                                             pfn_some_zero_page = alloc_page
>

[GIT PULL REQUEST] watchdog - v3.9-rc6 Fixes

2013-04-14 Thread Wim Van Sebroeck

Hi Linus,

Please pull from 'master' branch of
git://www.linux-watchdog.org/linux-watchdog.git

It will fix compile errors for the at91rm9200_wdt driver.

This will update the following files:

 Kconfig |2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

with these Changes:

commit 09549cd01726a7ff8b102a93e46b059531583ab6
Author: Nicolas Ferre 
Date:   Wed Apr 10 14:36:22 2013 +0200

watchdog: Revert the AT91RM9200_WATCHDOG dependency

Compiling the at91rm9200_wdt.c driver without at91rm9200
support was leading to several errors:

drivers/built-in.o: In function `at91_wdt_close':
at91_adc.c:(.text+0xc9fe4): undefined reference to `at91_st_base'
drivers/built-in.o: In function `at91_wdt_write':
at91_adc.c:(.text+0xca004): undefined reference to `at91_st_base'
drivers/built-in.o: In function `at91wdt_shutdown':
at91_adc.c:(.text+0xca01c): undefined reference to `at91_st_base'
drivers/built-in.o: In function `at91wdt_suspend':
at91_adc.c:(.text+0xca038): undefined reference to `at91_st_base'
drivers/built-in.o: In function `at91_wdt_open':
at91_adc.c:(.text+0xca0cc): undefined reference to `at91_st_base'
drivers/built-in.o:at91_adc.c:(.text+0xca2c8): more undefined references to
`at91_st_base' follow

So, reverting the modification of the "depends" Kconfig line
introduced by patch a6a1bcd37 (watchdog: at91rm9200: add DT support)
seems to be the good solution.

Signed-off-by: Nicolas Ferre 
Acked-by: Guenter Roeck 
Signed-off-by: Wim Van Sebroeck 

For completeness, I added the overall diff below.

Greetings,
Wim.


diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index 9fcc70c..e89fc31 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -117,7 +117,7 @@ config ARM_SP805_WATCHDOG
 
 config AT91RM9200_WATCHDOG
 	tristate "AT91RM9200 watchdog"
-	depends on ARCH_AT91
+	depends on ARCH_AT91RM9200
 	help
 	  Watchdog timer embedded into AT91RM9200 chips. This will reboot your
 	  system when the timeout is reached.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Return value of __mm_populate

2013-04-14 Thread Marco Stornelli


Hi,

On 14/04/2013 02:18, KOSAKI Motohiro wrote:
> (4/13/13 5:14 AM), Marco Stornelli wrote:
>> Hi,
>>
>> I was looking at the code of __mm_populate (in -next) and I have a doubt
>> about the return value. The function __mlock_posix_error_return should
>> return a proper error for mlock, converting the return value from
>> __get_user_pages. It checks for EFAULT and ENOMEM. Actually
>> __get_user_pages could return, in addition, ERESTARTSYS and EHWPOISON.
>
> __get_user_pages doesn't return EHWPOISON if FOLL_HWPOISON is not specified.
> I'm not an expert on ERESTARTSYS. If I understand correctly, ERESTARTSYS is
> only returned when a signal is received, and the signal handling routine
> (e.g. do_signal) modifies EIP and hides ERESTARTSYS from userland generically.

Yep, you're right, the "magic" is inside the signal management. Thanks!!

Marco
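
For readers following the thread, the conversion being discussed can be sketched in plain C. This is a user-space illustration rather than the kernel source: the name mlock_posix_error is hypothetical, and the mapping shown (EFAULT to ENOMEM, ENOMEM to EAGAIN, everything else passed through) is an assumption about what __mlock_posix_error_return does, chosen to match the error codes POSIX documents for mlock.

```c
#include <errno.h>

/*
 * Hypothetical user-space sketch of the error translation discussed above.
 * Assumed mapping: EFAULT -> ENOMEM, ENOMEM -> EAGAIN; any other value,
 * including success (0), is passed through unchanged.
 */
static long mlock_posix_error(long retval)
{
	if (retval == -EFAULT)
		return -ENOMEM;	/* part of the range is not mapped */
	if (retval == -ENOMEM)
		return -EAGAIN;	/* the pages could not be locked right now */
	return retval;
}
```

Under this sketch an ERESTARTSYS value would pass through untouched, which is consistent with the point made above that the signal handling code, not this helper, hides it from userland.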

Re: [RFC v7 00/11] Support vrange for anonymous page

2013-04-14 Thread Minchan Kim

Hi KOSAKI,

On Thu, Apr 11, 2013 at 11:01:11AM -0400, KOSAKI Motohiro wrote:
>  and adding new syscall invocation is unwelcome.
> >>>
> >>> Sure. But one more system call could be cheaper than page-granularity
> >>> operation on purged range.
> >>
> >> I don't think vrange(VOLATILE) cost is related to this discussion.
> >> Whether sending SIGBUS or just nuking the pte, purging should be done in
> >> vmscan, not in the vrange() syscall.
> > 
> > Again, please see the MADV_FREE. http://lwn.net/Articles/230799/
> > It changes the pte and page flags on all pages of the range through
> > zap_pte_range. So it would make vrange(VOLATILE) expensive, and
> > the bigger the range is, the bigger the cost.
> 
> This hadn't crossed my mind. Now try_to_discard_one() inserts a vrange
> for generating SIGBUS. Then we can insert pte_none() at the same cost too. Am
> I missing something?

For your requirement, we need some tracking model to detect whether a page is
currently in use by the process before the VM discards it, *if* we don't
provide a paired vrange(NOVOLATILE) system call (see below). So the tracking
model would have to be built in the vrange(VOLATILE) system call context.

> 
> I can't imagine why the pte should be zapped on vrange(VOLATILE).

Sorry, my explanation was too poor to understand.
I will try again.

First of all, what you want is almost like MADV_FREE.
So let's look at that first.

If you call madvise(range, MADV_FREE), the VM has to examine every page
mapped in the page table for range(start, start + len), so we need a page
table lookup for the range and must set a flag on each page descriptor
(e.g. PG_lazyfree) to hint to the kernel that the page should be discarded
instead of swapped out when reclaim happens. We also need to clear the dirty
bit in the PTE so that we can detect whether the page has been dirtied since
madvise(range, MADV_FREE) was called, because we can't discard pages that
some process has used since it called madvise. So if the VM finds a page with
PG_lazyfree set but sees, by peeking at the PTE, that it was dirtied recently,
the VM can't discard the page.
So the overhead of the madvise(MADV_FREE) system call is as follows:

1. look up all pages from page table for the range.
2. mark some bit(PG_lazyfree) for page descriptors of pages mapped at range
3. clear dirty bit and TLB flush

So madvise(MADV_FREE) would be better than madvise(MADV_DONTNEED) because it
can avoid page faults if memory pressure doesn't occur, but the system call
overhead could still be huge, and in particular the overhead grows in
proportion to the range size.

Let's talk about vrange(range, VOLATILE).
Its overhead is very small: it just sets a flag in the structure that
represents the range (i.e. struct vrange). When the VM wants to reclaim pages
and finds that a page is mapped in a VOLATILE area, it can discard the page
instead of swapping it out. This moves the overhead from the system call
itself to the VM reclaim path, which is a very slow path in the system, and I
think that's a desirable design (and that's why we have rmap).
But a problem remains. The VM can't detect that a page is being used by the
process after it calls vrange(range, VOLATILE), because we didn't do anything
in vrange(VOLATILE), so the VM might discard the page out from under the
process. That doesn't happen with madvise(MADV_FREE) because it clears the
dirty bit of the PTE to detect whether the page has been used since madvise
was called.

The solution in vrange is a new vrange(range, NOVOLATILE) system call, which
hints to the kernel to stop discarding pages in the range.
The cost of vrange(range, NOVOLATILE) is very small, too.
It just clears the flag in the struct vrange that represents the range.

So I think calling the pair of volatile system calls would be cheaper than a
single madvise(MADV_FREE).
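
As a rough illustration of the cost argument above, the following toy model counts the units of work each interface does at system call time. It is not kernel code: the function names, the 4096-byte page size and the "one unit per step per page" accounting are assumptions made only for the comparison.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL	/* assumed page size for the model */

/* Pages covered by a range of len bytes, rounding up. */
static unsigned long toy_pages_in_range(size_t len)
{
	return (len + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;
}

/*
 * madvise(MADV_FREE) model: one page table lookup, one page flag update
 * and one dirty-bit clear per page, plus a single TLB flush for the range.
 */
static unsigned long toy_madv_free_cost(size_t len)
{
	return 3 * toy_pages_in_range(len) + 1;
}

/*
 * vrange(VOLATILE) + vrange(NOVOLATILE) model: each call only flips a
 * flag in the structure describing the range, independent of its size.
 */
static unsigned long toy_vrange_pair_cost(size_t len)
{
	(void)len;	/* the range size does not matter in this model */
	return 2;
}
```

With these assumptions a 1 MiB range costs 769 units for MADV_FREE against 2 for the vrange pair. The discard work has not disappeared in the vrange case; it has moved to the reclaim path.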

I hope this helps your understanding, but I'm not sure, because I am writing
this in an airport, where it is very hard to focus on my work. :(

> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: em...@kvack.org

-- 
Kind regards,
Minchan Kim

Re: [PATCH] Documentation: cfq-iosched: update documentation help for cfq tunnables

2013-04-14 Thread Jens Axboe

On Sat, Apr 13 2013, Rob Landley wrote:
> Cleaning out "look at this" directory, I don't see this applied upstream but
> it may already be in Jens' tree. (That's the tree it should go in
> through...)

It's already included, see:

http://git.kernel.dk/?p=linux-block.git;a=commit;h=fdc6fdc52e4630f5020281ce5450be7cc1887de2

I changed some of the wording. But since this is your forte, please do
send any incremental patches against what is already in there.

-- 
Jens Axboe


Re: [PATCH 2/2] mtip32xx: mtip32xx: Disable TRIM support

2013-04-14 Thread Jens Axboe

On Fri, Apr 12 2013, Asai Thambi S P wrote:
> 
> Temporarily disabling TRIM support until TRIM related issues
> are addressed in the firmware.

How serious is this? We do have released kernels out there with the
driver, you might want to consider a stable backport too.

Anyway, applied for 3.10.

-- 
Jens Axboe


Re: [PATCH 1/2] mtip32xx: fix a smatch warning

2013-04-14 Thread Jens Axboe

On Fri, Apr 12 2013, Asai Thambi S P wrote:
> 
> Reported smatch warning:
> drivers/block/mtip32xx/mtip32xx.c:4163 mtip_block_shutdown() warn: variable 
> dereferenced before check 'dd->disk' (see line 4159)
> 
> dd->disk->disk_name accessed before the check if dd->disk is NULL. Fixed this
> and access of dd->queue/dd->disk->queue.

Applied for 3.10, thanks.
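
The pattern smatch complains about can be shown with a small stand-alone example. The struct and function names here are hypothetical stand-ins, not the mtip32xx driver's code; the point is only the ordering of the NULL check and the dereference.

```c
#include <stddef.h>

struct toy_disk {
	const char *disk_name;
};

struct toy_dev {
	struct toy_disk *disk;	/* may be NULL if setup failed */
};

/*
 * Flagged pattern (shown only as a comment, since running it with a NULL
 * disk would crash):
 *
 *	const char *name = dd->disk->disk_name;   // dereference first...
 *	if (!dd->disk)                            // ...check too late
 *		return NULL;
 */

/* Fixed ordering: test the pointer before anything reads through it. */
static const char *toy_shutdown_name(const struct toy_dev *dd)
{
	if (!dd || !dd->disk)
		return NULL;
	return dd->disk->disk_name;
}
```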

-- 
Jens Axboe


[git pull] drm fixes

2013-04-14 Thread Dave Airlie


Hi Linus

one fix for a hotplug locking regression, and one fix for an oops if you 
unplug the monitor at an inopportune moment on the udl device.

Dave.

The following changes since commit cfb63bafdb87bbcdc5d6dbbca623d3f69475f118:

  Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma 
(2013-04-11 20:35:11 -0700)

are available in the git repository at:


  git://people.freedesktop.org/~airlied/linux drm-fixes

for you to fetch changes up to 89ced125472b8551c65526934b7f6c733a6864fa:

  drm/fb-helper: Fix locking in drm_fb_helper_hotplug_event (2013-04-12 
14:21:12 +1000)


Daniel Vetter (1):
  drm/fb-helper: Fix locking in drm_fb_helper_hotplug_event

Dave Airlie (1):
  udl: handle EDID failure properly.

 drivers/gpu/drm/drm_fb_helper.c | 8 +---
 drivers/gpu/drm/udl/udl_connector.c | 4 
 2 files changed, 9 insertions(+), 3 deletions(-)

Re: [PATCH] x86: Add a Kconfig shortcut for a kvm-bootable kernel

2013-04-14 Thread Pekka Enberg


Hello,

On 4/12/13 9:19 PM, Borislav Petkov wrote:

> so I'm currently experimenting with my randconfig build scripts and
> thought that maybe it would be a cool thing to not only do the random
> builds only but also boot-test them in kvm. Which reminded me that we
> have that KVMTOOL_TEST_ENABLE config option in the kvmtool with which we
> can select all the stuff needed to boot the kernel in kvm.
>
> So I copied it. I now have an all.config in the repo with
> CONFIG_KVM_TEST_ENABLE=y in it so that the random builds can have the
> required support.
>
> So what do people think?
>
> It is pretty helpful for such testing; AFAICT Fengguang is doing his
> testing with kvm so he probably could use it too. And regardless, there
> are more and more reasons to boot the kernel in kvm so having a single
> option which selects the needed support makes more sense with time.
>
> And I haven't picked up the 'make kvmconfig' functionality because it
> is not strictly needed (yet) but it wouldn't hurt if we took it because
> someone has a good reason for needing it.


I obviously support having something like this in mainline. I wonder 
though if we could just call this "default standalone KVM guest config" 
instead of emphasizing testing angle.


Pekka

[PATCH v6] mmc: core: Add support for idle time BKOPS

2013-04-14 Thread Maya Erez

Devices have various maintenance operations they need to perform internally.
In order to reduce latencies during time-critical operations like read
and write, it is better to execute maintenance operations at other
times, when the host is not being serviced. Such operations are called
Background operations (BKOPS).
The device notifies the status of the BKOPS need by updating BKOPS_STATUS
(EXT_CSD byte [246]).

According to the standard a host that supports BKOPS shall check the
status periodically and start background operations as needed, so that
the device has enough time for its maintenance operations.

This patch adds support for this periodic check of the BKOPS status.
Since foreground operations are of higher priority than background
operations the host will check the need for BKOPS when it is idle
(in runtime suspend), and in case of an incoming request the BKOPS
operation will be interrupted.

If the card raised an exception with need for urgent BKOPS (level 2/3)
a flag will be set to indicate MMC to start the BKOPS activity when it
becomes idle.

Since running BKOPS too often can impact the eMMC endurance, the card's
need for BKOPS is not checked on every runtime suspend. To estimate the
best time to check for BKOPS need, the host takes into account the card
capacity and the percentage of changed sectors in the card.
A future enhancement could be to check the card's need for BKOPS only in
case of random activity.

Signed-off-by: Maya Erez 
---
This patch depends on the following patches:
[PATCH V2 1/2] mmc: core: Add bus_ops for runtime pm callbacks
[PATCH V2 2/2] mmc: block: Enable runtime pm for mmc blkdevice
---
diff --git a/Documentation/mmc/mmc-dev-attrs.txt 
b/Documentation/mmc/mmc-dev-attrs.txt
index 189bab0..8257aa6 100644
--- a/Documentation/mmc/mmc-dev-attrs.txt
+++ b/Documentation/mmc/mmc-dev-attrs.txt
@@ -8,6 +8,15 @@ The following attributes are read/write.
 
        force_ro                Enforce read-only access even if write protect switch is off.
 
+   bkops_check_threshold   This attribute is used to determine whether
+   the status bit that indicates the need for BKOPS should be checked.
+   The value should be given in percentages of the card size.
+   This value is used to calculate the minimum number of sectors that
+   needs to be changed in the device (written or discarded) in order to
+   require the status-bit of BKOPS to be checked.
+   The value can modified via sysfs by writing the required value to:
+   /sys/block//bkops_check_threshold
+
 SD and MMC Device Attributes
 
 
diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index 536331a..ef42117 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -116,6 +116,7 @@ struct mmc_blk_data {
unsigned int part_curr;
struct device_attribute force_ro;
struct device_attribute power_ro_lock;
+   struct device_attribute bkops_check_threshold;
int area_type;
 };
 
@@ -287,6 +288,65 @@ out:
return ret;
 }
 
+static ssize_t
+bkops_check_threshold_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+   struct mmc_blk_data *md = mmc_blk_get(dev_to_disk(dev));
+   struct mmc_card *card = md->queue.card;
+   int ret;
+
+   if (!card)
+   ret = -EINVAL;
+   else
+   ret = snprintf(buf, PAGE_SIZE, "%d\n",
+   card->bkops_info.size_percentage_to_start_bkops);
+
+   mmc_blk_put(md);
+   return ret;
+}
+
+static ssize_t
+bkops_check_threshold_store(struct device *dev,
+struct device_attribute *attr,
+const char *buf, size_t count)
+{
+   int value;
+   struct mmc_blk_data *md = mmc_blk_get(dev_to_disk(dev));
+   struct mmc_card *card = md->queue.card;
+   unsigned int card_size;
+   int ret = count;
+
+   if (!card) {
+   ret = -EINVAL;
+   goto exit;
+   }
+
+   sscanf(buf, "%d", &value);
+   if ((value <= 0) || (value >= 100)) {
+   ret = -EINVAL;
+   goto exit;
+   }
+
+   card_size = (unsigned int)get_capacity(md->disk);
+   if (card_size <= 0) {
+   ret = -EINVAL;
+   goto exit;
+   }
+   card->bkops_info.size_percentage_to_start_bkops = value;
+   card->bkops_info.min_sectors_to_start_bkops =
+   (card_size * value) / 100;
+
+   pr_debug("%s: size_percentage = %d, min_sectors = %d",
+   mmc_hostname(card->host),
+   card->bkops_info.size_percentage_to_start_bkops,
+   card->bkops_info.min_sectors_to_start_bkops);
+
+exit:
+   mmc_blk_put(md);
+   return count;
+}
+
 static int mmc_blk_open(struct block_device *bdev, fmode_t mode)
 {
struct mmc_blk_data *md = mmc_blk_get(bdev->bd_disk);
@@ -
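
The threshold arithmetic in bkops_check_threshold_store() can be sketched in user space as follows. This is an illustrative variant, not the patch itself: the helper name is hypothetical, the 64-bit split multiplication is an added assumption to keep the product of card size and percentage from overflowing on large cards, and the sketch reports the error code to its caller rather than a byte count.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a percentage of the card size into a
 * minimum number of changed sectors. Returns 0 on success or -EINVAL
 * if the percentage is outside 1..99, the card reports no capacity,
 * or no output pointer is given.
 */
static int toy_bkops_min_sectors(uint64_t card_sectors, int percent,
				 uint64_t *min_sectors)
{
	if (percent <= 0 || percent >= 100)
		return -EINVAL;
	if (card_sectors == 0 || !min_sectors)
		return -EINVAL;

	/*
	 * Equivalent to (card_sectors * percent) / 100, but split into
	 * quotient and remainder parts so the product cannot overflow.
	 */
	*min_sectors = card_sectors / 100 * (uint64_t)percent +
		       card_sectors % 100 * (uint64_t)percent / 100;
	return 0;
}
```

For example, a card of 30535680 sectors with a 5 percent threshold gives 1526784 sectors that must change before the BKOPS status is checked.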

Re: [PATCH] x86: Add a Kconfig shortcut for a kvm-bootable kernel

2013-04-14 Thread Borislav Petkov

On Sun, Apr 14, 2013 at 12:31:12PM +0300, Pekka Enberg wrote:
> I obviously support having something like this in mainline. I wonder
> though if we could just call this "default standalone KVM guest
> config" instead of emphasizing testing angle.

/me nods agreeingly...

And it should be under HYPERVISOR_GUEST where the rest of this stuff
resides. Good point.

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [GIT PULL REQUEST] watchdog - v3.9-rc6 Fixes

2013-04-14 Thread Guenter Roeck

On Sun, Apr 14, 2013 at 09:17:03AM +0200, Wim Van Sebroeck wrote:
> Hi Linus,
> 
> Please pull from 'master' branch of
>   git://www.linux-watchdog.org/linux-watchdog.git
> 
> It will fix compile errors for the at91rm9200_wdt driver.
> 
> This will update the following files:
> 
>  Kconfig |2 +-
>  1 files changed, 1 insertion(+), 1 deletion(-)
> 
> with these Changes:
> 
> commit 09549cd01726a7ff8b102a93e46b059531583ab6
> Author: Nicolas Ferre 
> Date:   Wed Apr 10 14:36:22 2013 +0200
> 
Hi Wim,

What is your take on "watchdog: Fix race condition in registration code" [1] ?

Thanks,
Guenter

[1] http://www.spinics.net/lists/linux-watchdog/msg02291.html,
https://patchwork.kernel.org/patch/2400801/

Re: [PATCH 08/30] i2c: s3c2410: make header file local

2013-04-14 Thread Wolfram Sang

On Thu, Apr 11, 2013 at 02:04:50AM +0200, Arnd Bergmann wrote:

> No other file in the kernel besides i2c-s3c2410.c uses the current
> plat/regs-iic.h, so we can simply move the header file to live in the
> same directory as the driver, as a preparation to multiplatform builds.

What about putting the regs in the driver itself?


Re: [PATCH] doc: Hold down best practices for pull requests

2013-04-14 Thread Randy Dunlap

On 04/13/13 23:46, Rob Landley wrote:
> On 04/06/2013 03:55:26 PM, Randy Dunlap wrote:
>> On 03/03/13 04:43, Borislav Petkov wrote:
>> > From: Borislav Petkov 
>> >
>> >  Documentation/SubmittingPullRequests | 148 
>> > +++
>> >  1 file changed, 148 insertions(+)
>> >  create mode 100644 Documentation/SubmittingPullRequests
>> >
>> > diff --git a/Documentation/SubmittingPullRequests 
>> > b/Documentation/SubmittingPullRequests
>> > new file mode 100644
>> > index ..d123745e0cf5
>> > --- /dev/null
>> > +++ b/Documentation/SubmittingPullRequests
>> > @@ -0,0 +1,148 @@
> 
>> > +1.) The patchset going to an upper level maintainer should NOT be based
>> > +on some random, potentially completely broken commit in the middle of a
>> > +merge window, or some other random point in the tree history.
>> > +
>> > +Tangential to that, it shouldn't contain back-merges - not to "next"
>> > +trees, and not to a "random commit of the day" in Linus' tree.
> 
> Could you do positive advice first instead of negative advice? "Base your 
> tree on a release version, and never re-pull between releases without a damn 
> good reason."
> 
> Not "don't do this, don't do this, don't do this" and make them figure out 
> what they _should_ do by process of elimination.

agreed.

>> > +Here's Linus counting the ways why you shouldn't make merges yourself:
>> > +
>> > +" - I'm usually a day or two behind in my merge queue anyway, partly
>> > +because I get tons of pull requests in a short while and I just want
>> > +to get a feel for what's going on, and partly because I tend to do
>> > +pulls in waves of "ok, I'm going filesystems now, then I'll look at
>>
>>   doing ?
>>
>> > +drivers".
> 
> Given that he's quoting linus, it would be "[doing]".

ack.

>> > +8.) After the maintainer has pulled, it is always a good idea to take a
>> > +look at the merge and verify it has happened as you've expected it to,
>> > +maybe even run your tests on it to double-check everything went fine.
>> > +
>> > +Further reading: Documentation/development-process/*
>> >
>>
>> Looks good and useful overall.
> 
> Looks longer than necessary to me, and if we have a 
> Documentation/development-process why isn't this going in there instead of at 
> the top level? (Although really why isn't it just another couple bullet 
> points under submittingpatches?)

Well, yes, my first thought was actually why not update SubmittingPatches
instead of adding this new file.

-- 
~Randy

Re: [PATCH v3 2/2] pps: new client driver using GPIO

2013-04-14 Thread Yeung

Hi James,

I am new to Linux kernel device drivers and would like to use this
client driver on my uClinux system running on the NIOS2. Could you kindly
point me in the right direction? I am using a device tree, and I believe this
driver doesn't support device tree, right? What do I need to add or modify so
that I can use an input GPIO as a source? I saw a Google post saying to add
the code below to (?? an unknown file), and that you then need to call
pps_init in the configuration routine (?? not sure what that means).

add
/* PPS-GPIO platform data */
static struct pps_gpio_platform_data pps_gpio_info = {
	.assert_falling_edge = false,
	.capture_clear = false,
	.gpio_pin = 63,
	.gpio_label = "PPS",
};

static struct platform_device pps_gpio_device = {
	.name = "pps-gpio",
	.id = -1,
	.dev = {
		.platform_data = &pps_gpio_info,
	},
};

static void pps_init(int evm_id, int profile)
{
	int err;

	err = platform_device_register(&pps_gpio_device);
	if (err)
		pr_warning("Could not register PPS_GPIO device\n");
}

Thanks in advance for any help,

Yeung




Re: [PATCH v6] pstore/ram: Add ramoops support for the Flattened Device Tree.

2013-04-14 Thread Anton Vorontsov

On Mon, Apr 08, 2013 at 12:54:01PM -0700, Bryan Freed wrote:
[...]
> And as a more general question, why should we try not to put
> configuration in the device tree?  It seems like a great (and
> portable) place to put this stuff.
> It certainly seems better to have it there than hardwired in the
> kernel or tacked onto the kernel command line.

But then we have two in-kernel APIs to pass kernel parameters? So we'll
have to maintain two ways of passing the options for each driver. That is
hardly a good solution.

If you would like to see a convenient way to pass kernel/module options
via the device tree, I would suggest implementing something like this:

	chosen {
		kernel-options {
			linux,pstore.record-size = 123;
			linux,foo = "bar";
		};
	};

And then let the kernel translate all these to module_param_*().

I am still not sure about placing the options along with devices layout,
but if we go this route, then that is also viable:

	pstore-node {
		linux,pstore.record-size = 123;
	};

And translate these "linux,*" properties to module_param_*().
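
The name-splitting part of such a translation could look roughly like this in user-space C. Everything here is hypothetical: the function name, the buffer-based interface and the "linux,<module>.<param>" naming rule are assumptions drawn from the example properties above, and no such helper exists in the kernel.

```c
#include <stddef.h>
#include <string.h>

/*
 * Split a property name such as "linux,pstore.record-size" into a module
 * name ("pstore") and a parameter name ("record-size"). Returns 0 on
 * success, -1 if the name lacks the "linux," prefix or a "." separator,
 * or if either output buffer is too small.
 */
static int toy_split_option(const char *prop,
			    char *module, size_t module_len,
			    char *param, size_t param_len)
{
	static const char prefix[] = "linux,";
	const char *name, *dot;
	size_t mlen, plen;

	if (strncmp(prop, prefix, sizeof(prefix) - 1) != 0)
		return -1;
	name = prop + sizeof(prefix) - 1;

	dot = strchr(name, '.');
	if (!dot || dot == name || dot[1] == '\0')
		return -1;

	mlen = (size_t)(dot - name);
	plen = strlen(dot + 1);
	if (mlen >= module_len || plen >= param_len)
		return -1;

	memcpy(module, name, mlen);
	module[mlen] = '\0';
	memcpy(param, dot + 1, plen + 1);	/* copies the NUL as well */
	return 0;
}
```

A property without a module part, such as linux,foo, is rejected by this sketch; how core (non-module) options should be handled is left open.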

How does that sound?

Thanks,
Anton

[GIT PULL] perf fixes

2013-04-14 Thread Ingo Molnar

Linus,

Please pull the latest perf-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
perf-urgent-for-linus

   HEAD: c481420248c6730246d2a1b1773d5d7007ae0835 perf: Fix error return code

Misc fixlets.

 Thanks,

Ingo

-->
Chen Gang (3):
  perf: Fix strncpy() use, always make sure it's NUL terminated
  perf: Fix strncpy() use, use strlcpy() instead of strncpy()
  ftrace: Fix strncpy() use, use strlcpy() instead of strncpy()

Stephane Eranian (2):
  perf/x86: Fix uninitialized pt_regs in intel_pmu_drain_bts_buffer()
  perf: Fix ring_buffer perf_output_space() boundary calculation

Wei Yongjun (1):
  perf: Fix error return code
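
The strncpy() fixes listed above address a well-known pitfall: strncpy() does not write a terminating NUL when the source is at least as long as the destination. A minimal user-space sketch of the two repair patterns follows; the helper names are hypothetical, and the second helper imitates strlcpy()-style behaviour in portable C rather than calling the kernel's strlcpy().

```c
#include <stddef.h>
#include <string.h>

/* Pattern 1: copy at most size - 1 bytes, then terminate by hand. */
static void toy_copy_terminated(char *dst, const char *src, size_t size)
{
	if (size == 0)
		return;
	strncpy(dst, src, size - 1);
	dst[size - 1] = '\0';
}

/*
 * Pattern 2: strlcpy()-like copy. Always terminates when size > 0 and
 * returns strlen(src) so the caller can detect truncation.
 */
static size_t toy_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size > 0) {
		size_t n = len < size - 1 ? len : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}
```

A return value from toy_strlcpy() that is greater than or equal to the buffer size tells the caller the string was truncated.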


 arch/x86/kernel/cpu/perf_event_intel_ds.c |  3 ++-
 kernel/events/core.c  |  4 +++-
 kernel/events/internal.h  |  2 +-
 kernel/events/ring_buffer.c   | 22 ++
 kernel/trace/ftrace.c |  4 ++--
 kernel/trace/trace.c  |  4 ++--
 6 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c 
b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 826054a..f71c9f0 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -314,10 +314,11 @@ int intel_pmu_drain_bts_buffer(void)
if (top <= at)
return 0;
 
+   memset(&regs, 0, sizeof(regs));
+
ds->bts_index = ds->bts_buffer_base;
 
perf_sample_data_init(&data, 0, event->hw.last_period);
-   regs.ip = 0;
 
/*
 * Prepare a generic sample, i.e. fill in the invariant fields.
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 59412d0..7e0962e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4737,7 +4737,8 @@ static void perf_event_mmap_event(struct perf_mmap_event 
*mmap_event)
} else {
if (arch_vma_name(mmap_event->vma)) {
name = strncpy(tmp, arch_vma_name(mmap_event->vma),
-  sizeof(tmp));
+  sizeof(tmp) - 1);
+   tmp[sizeof(tmp) - 1] = '\0';
goto got_name;
}
 
@@ -5986,6 +5987,7 @@ skip_type:
if (pmu->pmu_cpu_context)
goto got_cpu_context;
 
+   ret = -ENOMEM;
pmu->pmu_cpu_context = alloc_percpu(struct perf_cpu_context);
if (!pmu->pmu_cpu_context)
goto free_dev;
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index d56a64c..eb675c4 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -16,7 +16,7 @@ struct ring_buffer {
int page_order; /* allocation order  */
 #endif
int nr_pages;   /* nr of data pages  */
-   int writable;   /* are we writable   */
+   int overwrite;  /* can overwrite itself */
 
atomic_t poll;  /* POLL_ for wakeups */
 
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 23cb34f..97fddb0 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -18,12 +18,24 @@
 static bool perf_output_space(struct ring_buffer *rb, unsigned long tail,
  unsigned long offset, unsigned long head)
 {
-   unsigned long mask;
+   unsigned long sz = perf_data_size(rb);
+   unsigned long mask = sz - 1;
 
-   if (!rb->writable)
+   /*
+* check if user-writable
+* overwrite : over-write its own tail
+* !overwrite: buffer possibly drops events.
+*/
+   if (rb->overwrite)
return true;
 
-   mask = perf_data_size(rb) - 1;
+   /*
+* verify that payload is not bigger than buffer
+* otherwise masking logic may fail to detect
+* the "not enough space" condition
+*/
+   if ((head - offset) > sz)
+   return false;
 
offset = (offset - tail) & mask;
head   = (head   - tail) & mask;
@@ -212,7 +224,9 @@ ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
rb->watermark = max_size / 2;
 
if (flags & RING_BUFFER_WRITABLE)
-   rb->writable = 1;
+   rb->overwrite = 0;
+   else
+   rb->overwrite = 1;
 
atomic_set(&rb->refcount, 1);
 
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 6893d5a..db14374 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3441,14 +3441,14 @@ static char ftrace_filter_buf[FTRACE_FILTER_SIZE] __initdata;
 
 static int __init set_ftrace_notrace(char *str)
 {
-   strncpy(ftrace_notrace_buf, str, FTRACE_FILTER_SIZE);
+   strlcpy(ftrace_notrace_buf, str, FTRACE_FILTER_SIZE);
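
The strncpy() fixes in this pull all address the same hazard: strncpy() does not write a terminating NUL when the source is at least as long as the destination. A minimal userspace sketch of the "copy size - 1, then terminate" pattern used in perf_event_mmap_event() follows. The helper name copy_terminated() is hypothetical and is not part of the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): copy src into a buffer of
 * dst_size bytes and always NUL-terminate it, mirroring the
 * "strncpy(tmp, ..., sizeof(tmp) - 1); tmp[sizeof(tmp) - 1] = '\0';"
 * pattern in the diff above.
 */
static char *copy_terminated(char *dst, const char *src, size_t dst_size)
{
	if (dst_size == 0)
		return dst;

	/* Copies at most dst_size - 1 bytes; may leave dst unterminated. */
	strncpy(dst, src, dst_size - 1);
	/* Force termination in the last byte, whatever strncpy() did. */
	dst[dst_size - 1] = '\0';
	return dst;
}
```

With a 4-byte buffer, a longer source is truncated to 3 characters plus the NUL. The kernel's strlcpy(), used in the ftrace hunks, gives the same termination guarantee in one call.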

[GIT PULL] scheduler fixes

2013-04-14 Thread Ingo Molnar

Linus,

Please pull the latest sched-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched-urgent-for-linus

   HEAD: e614b3332a4f3f264a26da28e5a1f4cc3aea3974 sched/cputime: Fix accounting on multi-threaded processes

Misc fixlets.

 Thanks,

Ingo

-->
Stanislaw Gruszka (1):
  sched/cputime: Fix accounting on multi-threaded processes

Tejun Heo (1):
  sched: Convert BUG_ON()s in try_to_wake_up_local() to WARN_ON_ONCE()s

Thomas Gleixner (1):
  sched_clock: Prevent 64bit inatomicity on 32bit systems

libin (1):
  sched/debug: Fix sd->*_idx limit range avoiding overflow


 kernel/sched/clock.c   | 26 ++
 kernel/sched/core.c|  8 +---
 kernel/sched/cputime.c |  2 +-
 3 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index c685e31..c3ae144 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -176,10 +176,36 @@ static u64 sched_clock_remote(struct sched_clock_data *scd)
u64 this_clock, remote_clock;
u64 *ptr, old_val, val;
 
+#if BITS_PER_LONG != 64
+again:
+   /*
+* Careful here: The local and the remote clock values need to
+* be read out atomic as we need to compare the values and
+* then update either the local or the remote side. So the
+* cmpxchg64 below only protects one readout.
+*
+* We must reread via sched_clock_local() in the retry case on
+* 32bit as an NMI could use sched_clock_local() via the
+* tracer and hit between the readout of
+* the low32bit and the high 32bit portion.
+*/
+   this_clock = sched_clock_local(my_scd);
+   /*
+* We must enforce atomic readout on 32bit, otherwise the
+* update on the remote cpu can hit inbetween the readout of
+* the low32bit and the high 32bit portion.
+*/
+   remote_clock = cmpxchg64(&scd->clock, 0, 0);
+#else
+   /*
+* On 64bit the read of [my]scd->clock is atomic versus the
+* update, so we can avoid the above 32bit dance.
+*/
sched_clock_local(my_scd);
 again:
this_clock = my_scd->clock;
remote_clock = scd->clock;
+#endif
 
/*
 * Use the opportunity that we have both locks
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b7b03cd..fa07792 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1488,8 +1488,10 @@ static void try_to_wake_up_local(struct task_struct *p)
 {
struct rq *rq = task_rq(p);
 
-   BUG_ON(rq != this_rq());
-   BUG_ON(p == current);
+   if (WARN_ON_ONCE(rq != this_rq()) ||
+   WARN_ON_ONCE(p == current))
+   return;
+
lockdep_assert_held(&rq->lock);
 
if (!raw_spin_trylock(&p->pi_lock)) {
@@ -4931,7 +4933,7 @@ static void sd_free_ctl_entry(struct ctl_table **tablep)
 }
 
 static int min_load_idx = 0;
-static int max_load_idx = CPU_LOAD_IDX_MAX;
+static int max_load_idx = CPU_LOAD_IDX_MAX-1;
 
 static void
 set_table_entry(struct ctl_table *entry,
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index ed12cbb..e93cca9 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -310,7 +310,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 
t = tsk;
do {
-   task_cputime(tsk, &utime, &stime);
+   task_cputime(t, &utime, &stime);
times->utime += utime;
times->stime += stime;
times->sum_exec_runtime += task_sched_runtime(t);
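
The sched_clock_remote() hunk reads the remote clock with cmpxchg64(&scd->clock, 0, 0). A compare-exchange of 0 with 0 never changes the stored value, but it returns the current 64-bit value as a single atomic operation, which avoids a torn read on 32-bit. A rough userspace analogue using C11 atomics is sketched below. The function name read_u64_atomic() is hypothetical, and C11 atomics stand in for the kernel primitive.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical userspace analogue of cmpxchg64(&scd->clock, 0, 0).
 * If the value is 0, the exchange stores 0 again (no visible change).
 * If it is non-zero, the exchange fails and reports the current value.
 * Either way the caller gets an untorn 64-bit snapshot.
 */
static uint64_t read_u64_atomic(_Atomic uint64_t *p)
{
	uint64_t expected = 0;

	/* On mismatch, 'expected' is overwritten with the current value. */
	atomic_compare_exchange_strong(p, &expected, 0);
	return expected;
}
```

On a 64-bit target a plain load is already atomic, which is why the patch keeps the simpler path under the #else branch.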
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[GIT PULL] x86 fixes

2013-04-14 Thread Ingo Molnar

Linus,

Please pull the latest x86-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-urgent-for-linus

   HEAD: 26564600c9e88c6572a5e6ef5ae9121907edfb7f x86/mm: Flush lazy MMU when DEBUG_PAGEALLOC is set

Misc fixes.

 Thanks,

Ingo

-->
Andrea Arcangeli (2):
  x86/mm/cpa: Convert noop to functional fix
  x86/mm/cpa/selftest: Fix false positive in CPA self test

Boris Ostrovsky (2):
  x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal
  x86/mm: Flush lazy MMU when DEBUG_PAGEALLOC is set

Samu Kallio (1):
  x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates


 arch/x86/include/asm/paravirt.h   |  5 -
 arch/x86/include/asm/paravirt_types.h |  2 ++
 arch/x86/kernel/paravirt.c| 25 +
 arch/x86/lguest/boot.c|  1 +
 arch/x86/mm/fault.c   |  6 --
 arch/x86/mm/pageattr-test.c   |  2 +-
 arch/x86/mm/pageattr.c| 12 +++-
 arch/x86/xen/mmu.c|  1 +
 8 files changed, 33 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 5edd174..7361e47 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -703,7 +703,10 @@ static inline void arch_leave_lazy_mmu_mode(void)
PVOP_VCALL0(pv_mmu_ops.lazy_mode.leave);
 }
 
-void arch_flush_lazy_mmu_mode(void);
+static inline void arch_flush_lazy_mmu_mode(void)
+{
+   PVOP_VCALL0(pv_mmu_ops.lazy_mode.flush);
+}
 
 static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx,
phys_addr_t phys, pgprot_t flags)
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 142236e..b3b0ec1 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -91,6 +91,7 @@ struct pv_lazy_ops {
/* Set deferred update mode, used for batching operations. */
void (*enter)(void);
void (*leave)(void);
+   void (*flush)(void);
 };
 
 struct pv_time_ops {
@@ -679,6 +680,7 @@ void paravirt_end_context_switch(struct task_struct *next);
 
 void paravirt_enter_lazy_mmu(void);
 void paravirt_leave_lazy_mmu(void);
+void paravirt_flush_lazy_mmu(void);
 
 void _paravirt_nop(void);
 u32 _paravirt_ident_32(u32);
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 17fff18..8bfb335 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -263,6 +263,18 @@ void paravirt_leave_lazy_mmu(void)
leave_lazy(PARAVIRT_LAZY_MMU);
 }
 
+void paravirt_flush_lazy_mmu(void)
+{
+   preempt_disable();
+
+   if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_MMU) {
+   arch_leave_lazy_mmu_mode();
+   arch_enter_lazy_mmu_mode();
+   }
+
+   preempt_enable();
+}
+
 void paravirt_start_context_switch(struct task_struct *prev)
 {
BUG_ON(preemptible());
@@ -292,18 +304,6 @@ enum paravirt_lazy_mode paravirt_get_lazy_mode(void)
return this_cpu_read(paravirt_lazy_mode);
 }
 
-void arch_flush_lazy_mmu_mode(void)
-{
-   preempt_disable();
-
-   if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_MMU) {
-   arch_leave_lazy_mmu_mode();
-   arch_enter_lazy_mmu_mode();
-   }
-
-   preempt_enable();
-}
-
 struct pv_info pv_info = {
.name = "bare hardware",
.paravirt_enabled = 0,
@@ -475,6 +475,7 @@ struct pv_mmu_ops pv_mmu_ops = {
.lazy_mode = {
.enter = paravirt_nop,
.leave = paravirt_nop,
+   .flush = paravirt_nop,
},
 
.set_fixmap = native_set_fixmap,
diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c
index 1cbd89c..7114c63 100644
--- a/arch/x86/lguest/boot.c
+++ b/arch/x86/lguest/boot.c
@@ -1334,6 +1334,7 @@ __init void lguest_init(void)
pv_mmu_ops.read_cr3 = lguest_read_cr3;
pv_mmu_ops.lazy_mode.enter = paravirt_enter_lazy_mmu;
pv_mmu_ops.lazy_mode.leave = lguest_leave_lazy_mmu_mode;
+   pv_mmu_ops.lazy_mode.flush = paravirt_flush_lazy_mmu;
pv_mmu_ops.pte_update = lguest_pte_update;
pv_mmu_ops.pte_update_defer = lguest_pte_update;
 
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2b97525..0e88336 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -378,10 +378,12 @@ static noinline __kprobes int vmalloc_fault(unsigned long address)
if (pgd_none(*pgd_ref))
return -1;
 
-   if (pgd_none(*pgd))
+   if (pgd_none(*pgd)) {
set_pgd(pgd, *pgd_ref);
-   else
+   arch_flush_lazy_mmu_mode();
+   } else {
BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref));
+   }
 
/*
 * Below here mismatches are bugs because these lower tables
diff --git a/arch/x86/mm/pageat

Re: [patch v7 0/21] sched: power aware scheduling

2013-04-14 Thread Borislav Petkov

On Sun, Apr 14, 2013 at 09:28:50AM +0800, Alex Shi wrote:
> Even if in some scenarios the total energy cost is higher, at least the
> avg watts dropped in those scenarios.

Ok, what's wrong with x = 32 then? So basically if you're looking at
avg watts, you don't want to have more than 16 threads, otherwise
powersaving sucks on that particular uarch and platform. Can you say
that for all platforms out there?

Also, I've added in the columns below the Energy = Power * Time thing.

And the funny thing is, exactly there where avg watts is better in
powersaving, energy for workload retire is worse. And the other way
around. Basically, avg watts vs retire energy is reciprocal. Great :-\.

> Len said he has a low p-state which can work there, but that is
> different. I had sent some data in another email list to show the
> difference:
> 
> The following is the result of 2 kbuild test runs under 3 kinds of
> conditions on an SNB EP box. The middle column is the lowest p-state
> testing result; we can see it has the lowest power consumption, and also
> the lowest performance/watts value.
> At least for the kbuild benchmark, the powersaving policy has the best
> compromise between power saving and power efficiency. Furthermore, due to
> the cpu boost feature, it has better performance in some scenarios.
> 
>          powersaving + ondemand   userspace + fixed 1.2GHz   performance + ondemand
> x = 8    231.318 /75 57           165.063 /166 36            253.552 /63 62
> x = 16   280.357 /49 72           174.408 /106 54            296.776 /41 82
> x = 32   325.206 /34 90           178.675 /90 62             314.153 /37 86
> 
> x = 8    233.623 /74 57           164.507 /168 36            254.775 /65 60
> x = 16   272.54  /38 96           174.364 /106 54            297.731 /42 79
> x = 32   320.758 /34 91           177.917 /91 61             317.875 /35 89
> x = 64   326.837 /33 92           179.037 /90 62             320.615 /36 86

17348.850   27400.458   15973.776
13737.493   18487.248   12167.816
11057.004   16080.750   11623.661

17288.102   27637.176   16560.375
10356.52    18482.584   12504.702
10905.772   16190.447   11125.625
10785.621   16113.330   11542.140
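
The energy figures above can be reproduced from the quoted table, assuming the first number in each cell is average watts and the number after the slash is run time in seconds (an assumption, but one that matches every value listed). A small sketch, with the hypothetical helper name energy_joules():

```c
/*
 * Energy = Power * Time.
 * avg_watts: average power draw in watts (first number of a table cell).
 * seconds:   run time in seconds (number after the slash).
 * Returns energy in joules (watt-seconds).
 */
static double energy_joules(double avg_watts, double seconds)
{
	return avg_watts * seconds;
}
```

For the first x = 8 row this gives 231.318 * 75 = 17348.85 for powersaving against 253.552 * 63 = 15973.776 for performance, so the policy with the lower average power ends up spending more energy on the same job.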

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

[tip:perf/urgent] perf: Fix error return code

2013-04-14 Thread tip-bot for Wei Yongjun

Commit-ID:  c481420248c6730246d2a1b1773d5d7007ae0835
Gitweb: http://git.kernel.org/tip/c481420248c6730246d2a1b1773d5d7007ae0835
Author: Wei Yongjun 
AuthorDate: Fri, 12 Apr 2013 11:05:54 +0800
Committer:  Ingo Molnar 
CommitDate: Fri, 12 Apr 2013 06:33:56 +0200

perf: Fix error return code

Fix to return -ENOMEM in the allocation error case instead of 0
(if pmu_bus_running == 1), as done elsewhere in this function.

Signed-off-by: Wei Yongjun 
Cc: a.p.zijls...@chello.nl
Cc: pau...@samba.org
Cc: a...@ghostprotocols.net
Link: http://lkml.kernel.org/r/capglhd8j_fwcgqe%3dklwjpbj%2b%3do0pw6z-seq%3dntpu08c2w1t...@mail.gmail.com
[ Tweaked the error code setting placement and the changelog. ]
Signed-off-by: Ingo Molnar 
---
 kernel/events/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7f0d67e..7e0962e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5987,6 +5987,7 @@ skip_type:
if (pmu->pmu_cpu_context)
goto got_cpu_context;
 
+   ret = -ENOMEM;
pmu->pmu_cpu_context = alloc_percpu(struct perf_cpu_context);
if (!pmu->pmu_cpu_context)
goto free_dev;
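
The bug class fixed here is a shared error label that returns whatever "ret" happens to hold. If an earlier step succeeded, "ret" is still 0 and an allocation failure is reported as success. A minimal userspace sketch of the pattern follows. The function register_thing() and its fail_alloc parameter are hypothetical and exist only to make the failure path testable.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical registration routine (names are illustrative, not the
 * kernel's). 'ret' still holds 0 from an earlier successful step, so it
 * is set to -ENOMEM before the allocation whose failure jumps to the
 * shared cleanup label.
 */
static int register_thing(int fail_alloc, void **out)
{
	int ret = 0;	/* earlier steps succeeded */
	void *ctx;

	ret = -ENOMEM;	/* the one-line fix: set the code before allocating */
	ctx = fail_alloc ? NULL : malloc(64);
	if (!ctx)
		goto out_free;

	*out = ctx;
	return 0;

out_free:
	/* Anything set up by earlier steps would be released here. */
	*out = NULL;
	return ret;
}
```

Without the assignment before the allocation, the failure path would return 0 and the caller would carry on with a NULL context.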

[tip:x86/urgent] x86/mm: Flush lazy MMU when DEBUG_PAGEALLOC is set

2013-04-14 Thread tip-bot for Boris Ostrovsky

Commit-ID:  26564600c9e88c6572a5e6ef5ae9121907edfb7f
Gitweb: http://git.kernel.org/tip/26564600c9e88c6572a5e6ef5ae9121907edfb7f
Author: Boris Ostrovsky 
AuthorDate: Thu, 11 Apr 2013 13:59:52 -0400
Committer:  Ingo Molnar 
CommitDate: Fri, 12 Apr 2013 07:19:19 +0200

x86/mm: Flush lazy MMU when DEBUG_PAGEALLOC is set

When CONFIG_DEBUG_PAGEALLOC is set page table updates made by
kernel_map_pages() are not made visible (via TLB flush)
immediately if lazy MMU is on. In environments that support lazy
MMU (e.g. Xen) this may lead to fatal page faults, for example,
when zap_pte_range() needs to allocate pages in
__tlb_remove_page() -> tlb_next_batch().

Signed-off-by: Boris Ostrovsky 
Cc: konrad.w...@oracle.com
Link: http://lkml.kernel.org/r/1365703192-2089-1-git-send-email-boris.ostrov...@oracle.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/mm/pageattr.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 7896f71..fb4e73e 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1413,6 +1413,8 @@ void kernel_map_pages(struct page *page, int numpages, int enable)
 * but that can deadlock->flush only current cpu:
 */
__flush_tlb_all();
+
+   arch_flush_lazy_mmu_mode();
 }
 
 #ifdef CONFIG_HIBERNATION

[tip:x86/urgent] x86/mm/cpa/selftest: Fix false positive in CPA self test

2013-04-14 Thread tip-bot for Andrea Arcangeli

Commit-ID:  18699739b60cb60230153ff5475b2ba92be185f9
Gitweb: http://git.kernel.org/tip/18699739b60cb60230153ff5475b2ba92be185f9
Author: Andrea Arcangeli 
AuthorDate: Thu, 11 Apr 2013 15:36:09 +0200
Committer:  Ingo Molnar 
CommitDate: Fri, 12 Apr 2013 06:39:20 +0200

x86/mm/cpa/selftest: Fix false positive in CPA self test

If the pmd is not present, _PAGE_PSE will not be set anymore.
Fix the false positive.

Reported-by: Ingo Molnar 
Signed-off-by: Andrea Arcangeli 
Cc: Stefan Bader 
Cc: Andy Whitcroft 
Cc: Mel Gorman 
Cc: Borislav Petkov 
Link: http://lkml.kernel.org/r/1365687369-30802-1-git-send-email-aarca...@redhat.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/mm/pageattr-test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/pageattr-test.c b/arch/x86/mm/pageattr-test.c
index b008656..0e38951 100644
--- a/arch/x86/mm/pageattr-test.c
+++ b/arch/x86/mm/pageattr-test.c
@@ -68,7 +68,7 @@ static int print_split(struct split_state *s)
s->gpg++;
i += GPS/PAGE_SIZE;
} else if (level == PG_LEVEL_2M) {
-   if (!(pte_val(*pte) & _PAGE_PSE)) {
+   if ((pte_val(*pte) & _PAGE_PRESENT) && !(pte_val(*pte) & _PAGE_PSE)) {
printk(KERN_ERR
"%lx level %d but not PSE %Lx\n",
addr, level, (u64)pte_val(*pte));
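
The changed condition can be read as "only complain about a missing PSE bit when the entry is actually present". A standalone sketch of that predicate follows. The macro names and the function bad_2m_entry() are hypothetical. The bit values are illustrative stand-ins for _PAGE_PRESENT and _PAGE_PSE (bit 0 and bit 7 on x86).

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for _PAGE_PRESENT and _PAGE_PSE. */
#define DEMO_PAGE_PRESENT 0x001ULL
#define DEMO_PAGE_PSE     0x080ULL

/*
 * Returns true only when a present 2M-level entry lacks the PSE bit,
 * which is the case the patched self test still reports as an error.
 * A non-present entry is no longer flagged.
 */
static bool bad_2m_entry(uint64_t val)
{
	return (val & DEMO_PAGE_PRESENT) && !(val & DEMO_PAGE_PSE);
}
```

The old check corresponds to !(val & DEMO_PAGE_PSE) alone, which is true for a non-present entry and so produced the false positive.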

[PATCH 1/5] ptrace/x86: Revert "hw_breakpoints: Fix racy access to ptrace breakpoints"

2013-04-14 Thread Oleg Nesterov

This reverts commit 87dc669ba25777b67796d7262c569429e58b1ed4.

The patch was fine but we can no longer race with SIGKILL after
9899d11f "ptrace: ensure arch_ptrace/ptrace_request can never race
with SIGKILL", the __TASK_TRACED tracee can't be woken up and
->ptrace_bps[] can't go away.

The patch only removes ptrace_get_breakpoints/ptrace_put_breakpoints
and does a couple of "while at it" cleanups, it doesn't remove other
changes from the reverted commit.

Signed-off-by: Oleg Nesterov 
---
 arch/x86/kernel/ptrace.c |   28 +---
 1 files changed, 5 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 29a8120..7a98b21 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -641,9 +641,6 @@ static int ptrace_write_dr7(struct task_struct *tsk, unsigned long data)
unsigned len, type;
struct perf_event *bp;
 
-   if (ptrace_get_breakpoints(tsk) < 0)
-   return -ESRCH;
-
data &= ~DR_CONTROL_RESERVED;
old_dr7 = ptrace_get_dr7(thread->ptrace_bps);
 restore:
@@ -692,9 +689,7 @@ restore:
goto restore;
}
 
-   ptrace_put_breakpoints(tsk);
-
-   return ((orig_ret < 0) ? orig_ret : rc);
+   return orig_ret < 0 ? orig_ret : rc;
 }
 
 /*
@@ -706,18 +701,10 @@ static unsigned long ptrace_get_debugreg(struct task_struct *tsk, int n)
unsigned long val = 0;
 
if (n < HBP_NUM) {
-   struct perf_event *bp;
+   struct perf_event *bp = thread->ptrace_bps[n];
 
-   if (ptrace_get_breakpoints(tsk) < 0)
-   return -ESRCH;
-
-   bp = thread->ptrace_bps[n];
-   if (!bp)
-   val = 0;
-   else
+   if (bp)
val = bp->hw.info.address;
-
-   ptrace_put_breakpoints(tsk);
} else if (n == 6) {
val = thread->debugreg6;
 } else if (n == 7) {
@@ -734,9 +721,6 @@ static int ptrace_set_breakpoint_addr(struct task_struct *tsk, int nr,
struct perf_event_attr attr;
int err = 0;
 
-   if (ptrace_get_breakpoints(tsk) < 0)
-   return -ESRCH;
-
if (!t->ptrace_bps[nr]) {
ptrace_breakpoint_init(&attr);
/*
@@ -762,7 +746,7 @@ static int ptrace_set_breakpoint_addr(struct task_struct *tsk, int nr,
 */
if (IS_ERR(bp)) {
err = PTR_ERR(bp);
-   goto put;
+   goto out;
}
 
t->ptrace_bps[nr] = bp;
@@ -773,9 +757,7 @@ static int ptrace_set_breakpoint_addr(struct task_struct *tsk, int nr,
attr.bp_addr = addr;
err = modify_user_hw_breakpoint(bp, &attr);
}
-
-put:
-   ptrace_put_breakpoints(tsk);
+out:
return err;
 }
 
-- 
1.5.5.1


[PATCH 3/5] ptrace/arm: Revert "hw_breakpoints: Fix racy access to ptrace breakpoints"

2013-04-14 Thread Oleg Nesterov

This reverts commit bf0b8f4b55e591ba417c2dbaff42769e1fc773b0.

The patch was fine but we can no longer race with SIGKILL after
9899d11f "ptrace: ensure arch_ptrace/ptrace_request can never race
with SIGKILL", the __TASK_TRACED tracee can't be woken up and
->ptrace_bps[] can't go away.

Signed-off-by: Oleg Nesterov 
Cc: Russell King 
Cc: Will Deacon 
---
 arch/arm/kernel/ptrace.c |8 
 1 files changed, 0 insertions(+), 8 deletions(-)

diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
index 03deeff..41668e5 100644
--- a/arch/arm/kernel/ptrace.c
+++ b/arch/arm/kernel/ptrace.c
@@ -886,20 +886,12 @@ long arch_ptrace(struct task_struct *child, long request,
 
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
case PTRACE_GETHBPREGS:
-   if (ptrace_get_breakpoints(child) < 0)
-   return -ESRCH;
-
ret = ptrace_gethbpregs(child, addr,
(unsigned long __user *)data);
-   ptrace_put_breakpoints(child);
break;
case PTRACE_SETHBPREGS:
-   if (ptrace_get_breakpoints(child) < 0)
-   return -ESRCH;
-
ret = ptrace_sethbpregs(child, addr,
(unsigned long __user *)data);
-   ptrace_put_breakpoints(child);
break;
 #endif
 
-- 
1.5.5.1


[PATCH 4/5] ptrace/sh: Revert "hw_breakpoints: Fix racy access to ptrace breakpoints"

2013-04-14 Thread Oleg Nesterov

This reverts commit e0ac8457d020c0289ea566917267da9e5e6d9865.

The patch was fine but we can no longer race with SIGKILL after
9899d11f "ptrace: ensure arch_ptrace/ptrace_request can never race
with SIGKILL", the __TASK_TRACED tracee can't be woken up and
->ptrace_bps[] can't go away.

Signed-off-by: Oleg Nesterov 
Cc: Paul Mundt 
---
 arch/sh/kernel/ptrace_32.c |4 
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/arch/sh/kernel/ptrace_32.c b/arch/sh/kernel/ptrace_32.c
index 81f999a..668c816 100644
--- a/arch/sh/kernel/ptrace_32.c
+++ b/arch/sh/kernel/ptrace_32.c
@@ -117,11 +117,7 @@ void user_enable_single_step(struct task_struct *child)
 
set_tsk_thread_flag(child, TIF_SINGLESTEP);
 
-   if (ptrace_get_breakpoints(child) < 0)
-   return;
-
set_single_step(child, pc);
-   ptrace_put_breakpoints(child);
 }
 
 void user_disable_single_step(struct task_struct *child)
-- 
1.5.5.1


[PATCH 5/5] ptrace: Revert "Prepare to fix racy accesses on task breakpoints"

2013-04-14 Thread Oleg Nesterov

This reverts commit bf26c018490c2fce7fe9b629083b96ce0e6ad019.

The patch was fine but we can no longer race with SIGKILL after
9899d11f "ptrace: ensure arch_ptrace/ptrace_request can never race
with SIGKILL", the __TASK_TRACED tracee can't be woken up and
->ptrace_bps[] can't go away.

Now that ptrace_get_breakpoints/ptrace_put_breakpoints have no
callers, we can kill them and remove task->ptrace_bp_refcnt.

Signed-off-by: Oleg Nesterov 
---
 include/linux/ptrace.h |   10 --
 include/linux/sched.h  |3 ---
 kernel/exit.c  |2 +-
 kernel/ptrace.c|   16 
 4 files changed, 1 insertions(+), 30 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 89573a3..07d0df6 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -142,9 +142,6 @@ static inline void ptrace_init_task(struct task_struct *child, bool ptrace)
 {
INIT_LIST_HEAD(&child->ptrace_entry);
INIT_LIST_HEAD(&child->ptraced);
-#ifdef CONFIG_HAVE_HW_BREAKPOINT
-   atomic_set(&child->ptrace_bp_refcnt, 1);
-#endif
child->jobctl = 0;
child->ptrace = 0;
child->parent = child->real_parent;
@@ -351,11 +348,4 @@ extern int task_current_syscall(struct task_struct *target, long *callno,
unsigned long args[6], unsigned int maxargs,
unsigned long *sp, unsigned long *pc);
 
-#ifdef CONFIG_HAVE_HW_BREAKPOINT
-extern int ptrace_get_breakpoints(struct task_struct *tsk);
-extern void ptrace_put_breakpoints(struct task_struct *tsk);
-#else
-static inline void ptrace_put_breakpoints(struct task_struct *tsk) { }
-#endif /* CONFIG_HAVE_HW_BREAKPOINT */
-
 #endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index d35d2b6..89dc3e4 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1570,9 +1570,6 @@ struct task_struct {
} memcg_batch;
unsigned int memcg_kmem_skip_account;
 #endif
-#ifdef CONFIG_HAVE_HW_BREAKPOINT
-   atomic_t ptrace_bp_refcnt;
-#endif
 #ifdef CONFIG_UPROBES
struct uprobe_task *utask;
 #endif
diff --git a/kernel/exit.c b/kernel/exit.c
index 60bc027..0a66f6d 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -819,7 +819,7 @@ void do_exit(long code)
/*
 * FIXME: do that only when needed, using sched_exit tracepoint
 */
-   ptrace_put_breakpoints(tsk);
+   flush_ptrace_hw_breakpoint(tsk);
 
exit_notify(tsk, group_dead);
 #ifdef CONFIG_NUMA
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index acbd284..776ab3b 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1098,19 +1098,3 @@ asmlinkage long compat_sys_ptrace(compat_long_t request, compat_long_t pid,
return ret;
 }
 #endif /* CONFIG_COMPAT */
-
-#ifdef CONFIG_HAVE_HW_BREAKPOINT
-int ptrace_get_breakpoints(struct task_struct *tsk)
-{
-   if (atomic_inc_not_zero(&tsk->ptrace_bp_refcnt))
-   return 0;
-
-   return -1;
-}
-
-void ptrace_put_breakpoints(struct task_struct *tsk)
-{
-   if (atomic_dec_and_test(&tsk->ptrace_bp_refcnt))
-   flush_ptrace_hw_breakpoint(tsk);
-}
-#endif /* CONFIG_HAVE_HW_BREAKPOINT */
-- 
1.5.5.1
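
For readers unfamiliar with the helpers being removed: ptrace_get_breakpoints() took a reference only if the count had not already reached zero, and ptrace_put_breakpoints() ran the flush when the last reference was dropped. A rough userspace sketch of that get/put scheme with C11 atomics is below. The names ref_get_not_zero() and ref_put() are hypothetical, and the loop is only an approximation of what atomic_inc_not_zero() does.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical analogue of atomic_inc_not_zero(): take a reference only
 * if the count has not already dropped to zero.
 */
static bool ref_get_not_zero(atomic_int *cnt)
{
	int old = atomic_load(cnt);

	while (old != 0) {
		/* On failure 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(cnt, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Hypothetical analogue of atomic_dec_and_test(): drop a reference and
 * return true when the caller released the last one.
 */
static bool ref_put(atomic_int *cnt)
{
	return atomic_fetch_sub(cnt, 1) == 1;
}
```

Once the count hits zero, later get attempts fail. In the reverted scheme that is how a tracer racing with the tracee's exit got -ESRCH. With 9899d11f that race can no longer happen, so the refcount is dead weight.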


[PATCH 0/5] kill ptrace_{get,put}_breakpoints()

2013-04-14 Thread Oleg Nesterov

Hello.

Kill ptrace_{get,put}_breakpoints and task_struct->ptrace_bp_refcnt,
9899d11f "ptrace: ensure arch_ptrace/ptrace_request can never race
with SIGKILL" made this all unneeded.

Benjamin, Paul, arch_dup_task_struct()->flush_ptrace_hw_breakpoint(src)
on powerpc looks "obviously wrong". Don't we need

- flush_ptrace_hw_breakpoint(src);
+ dst->thread->ptrace_bps[0] = NULL;

?

Oleg.


[tip:sched/core] sched: Document task_struct::personality field

2013-04-14 Thread tip-bot for Andrei Epure

Commit-ID:  9b89f6ba2ab56e4d9c00e7e591d6bc333137895e
Gitweb: http://git.kernel.org/tip/9b89f6ba2ab56e4d9c00e7e591d6bc333137895e
Author: Andrei Epure 
AuthorDate: Thu, 11 Apr 2013 20:30:29 +0300
Committer:  Ingo Molnar 
CommitDate: Fri, 12 Apr 2013 07:20:27 +0200

sched: Document task_struct::personality field

Signed-off-by: Andrei Epure 
Cc: Linus Torvalds 
Cc: Andrew Morton 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/1365701429-4721-1-git-send-email-epure.and...@gmail.com
Signed-off-by: Ingo Molnar 
---
 include/linux/sched.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9004f6e..6bdaa73 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1105,8 +1105,10 @@ struct task_struct {
int exit_code, exit_signal;
int pdeath_signal;  /*  The signal sent when the parent dies  */
unsigned int jobctl;/* JOBCTL_*, siglock protected */
-   /* ??? */
+
+   /* Used for emulating ABI behavior of previous Linux versions */
unsigned int personality;
+
unsigned did_exec:1;
unsigned in_execve:1;   /* Tell the LSMs that the process is doing an
 * execve */

[PATCH 2/5] ptrace/powerpc: Revert "hw_breakpoints: Fix racy access to ptrace breakpoints"

2013-04-14 Thread Oleg Nesterov

This reverts commit 07fa7a0a8a586c01a8b416358c7012dcb9dc688d and
removes ptrace_get/put_breakpoints() added by other commits.

The patch was fine but we can no longer race with SIGKILL after
9899d11f "ptrace: ensure arch_ptrace/ptrace_request can never race
with SIGKILL", the __TASK_TRACED tracee can't be woken up and
->ptrace_bps[] can't go away.

Signed-off-by: Oleg Nesterov 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
---
 arch/powerpc/kernel/ptrace.c |   20 
 1 files changed, 0 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index f9b30c6..d278e43 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -969,16 +969,12 @@ int ptrace_set_debugreg(struct task_struct *task, unsigned long addr,
hw_brk.type = (data & HW_BRK_TYPE_DABR) | HW_BRK_TYPE_PRIV_ALL;
hw_brk.len = 8;
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
-   if (ptrace_get_breakpoints(task) < 0)
-   return -ESRCH;
-
bp = thread->ptrace_bps[0];
if ((!data) || !(hw_brk.type & HW_BRK_TYPE_RDWR)) {
if (bp) {
unregister_hw_breakpoint(bp);
thread->ptrace_bps[0] = NULL;
}
-   ptrace_put_breakpoints(task);
return 0;
}
if (bp) {
@@ -991,11 +987,9 @@ int ptrace_set_debugreg(struct task_struct *task, unsigned long addr,
 
ret =  modify_user_hw_breakpoint(bp, &attr);
if (ret) {
-   ptrace_put_breakpoints(task);
return ret;
}
thread->ptrace_bps[0] = bp;
-   ptrace_put_breakpoints(task);
thread->hw_brk = hw_brk;
return 0;
}
@@ -1010,12 +1004,9 @@ int ptrace_set_debugreg(struct task_struct *task, unsigned long addr,
   ptrace_triggered, NULL, task);
if (IS_ERR(bp)) {
thread->ptrace_bps[0] = NULL;
-   ptrace_put_breakpoints(task);
return PTR_ERR(bp);
}
 
-   ptrace_put_breakpoints(task);
-
 #endif /* CONFIG_HAVE_HW_BREAKPOINT */
task->thread.hw_brk = hw_brk;
 #else /* CONFIG_PPC_ADV_DEBUG_REGS */
@@ -1434,9 +1425,6 @@ static long ppc_set_hwdebug(struct task_struct *child,
if (bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_WRITE)
brk.type |= HW_BRK_TYPE_WRITE;
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
-   if (ptrace_get_breakpoints(child) < 0)
-   return -ESRCH;
-
/*
 * Check if the request is for 'range' breakpoints. We can
 * support it if range < 8 bytes.
@@ -1444,12 +1432,10 @@ static long ppc_set_hwdebug(struct task_struct *child,
if (bp_info->addr_mode == PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE) {
len = bp_info->addr2 - bp_info->addr;
} else if (bp_info->addr_mode != PPC_BREAKPOINT_MODE_EXACT) {
-   ptrace_put_breakpoints(child);
return -EINVAL;
}
bp = thread->ptrace_bps[0];
if (bp) {
-   ptrace_put_breakpoints(child);
return -ENOSPC;
}
 
@@ -1463,11 +1449,9 @@ static long ppc_set_hwdebug(struct task_struct *child,
   ptrace_triggered, NULL, child);
if (IS_ERR(bp)) {
thread->ptrace_bps[0] = NULL;
-   ptrace_put_breakpoints(child);
return PTR_ERR(bp);
}
 
-   ptrace_put_breakpoints(child);
return 1;
 #endif /* CONFIG_HAVE_HW_BREAKPOINT */
 
@@ -1511,16 +1495,12 @@ static long ppc_del_hwdebug(struct task_struct *child, long data)
return -EINVAL;
 
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
-   if (ptrace_get_breakpoints(child) < 0)
-   return -ESRCH;
-
bp = thread->ptrace_bps[0];
if (bp) {
unregister_hw_breakpoint(bp);
thread->ptrace_bps[0] = NULL;
} else
ret = -ENOENT;
-   ptrace_put_breakpoints(child);
return ret;
 #else /* CONFIG_HAVE_HW_BREAKPOINT */
if (child->thread.hw_brk.address == 0)
-- 
1.5.5.1


[tip:x86/mm] x86/mm/fixmap: Remove unused FIX_CYCLONE_TIMER

2013-04-14 Thread tip-bot for Paul Bolle

Commit-ID:  a7e6567585e513cb4e44387831cb75eb5b562cbb
Gitweb: http://git.kernel.org/tip/a7e6567585e513cb4e44387831cb75eb5b562cbb
Author: Paul Bolle 
AuthorDate: Thu, 11 Apr 2013 18:49:42 +0200
Committer:  Ingo Molnar 
CommitDate: Fri, 12 Apr 2013 07:21:18 +0200

x86/mm/fixmap: Remove unused FIX_CYCLONE_TIMER

The last users of FIX_CYCLONE_TIMER were removed in v2.6.18. We
can remove this unneeded constant.

Signed-off-by: Paul Bolle 
Link: http://lkml.kernel.org/r/1365698982.1427.3.camel@x61.thuisdomein
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/fixmap.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index a09c285..83ea2fb 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -107,9 +107,6 @@ enum fixed_addresses {
 #ifdef CONFIG_X86_F00F_BUG
FIX_F00F_IDT,   /* Virtual mapping for IDT */
 #endif
-#ifdef CONFIG_X86_CYCLONE_TIMER
-   FIX_CYCLONE_TIMER, /*cyclone timer register*/
-#endif
 #ifdef CONFIG_X86_32
FIX_KMAP_BEGIN, /* reserved pte's for temporary kernel mappings */
FIX_KMAP_END = FIX_KMAP_BEGIN+(KM_TYPE_NR*NR_CPUS)-1,

[PATCH] algif_skcipher: Avoid crash if buffer is not multiple of cipher block size

2013-04-14 Thread Milan Broz

When a user requests encryption (or decryption) of a block that
is not aligned to the cipher block size through the userspace crypto
interface, an oops like this can happen:

[  112.738285] BUG: unable to handle kernel paging request at e1c44840
[  112.738407] IP: [] scatterwalk_done+0x53/0x70
...
[  112.740515] Call Trace:
[  112.740588]  [] blkcipher_walk_done+0x160/0x1e0
[  112.740663]  [] blkcipher_walk_next+0x318/0x3c0
[  112.740737]  [] blkcipher_walk_first+0x70/0x160
[  112.740811]  [] blkcipher_walk_virt+0x17/0x20
[  112.740886]  [] cbc_encrypt+0x29/0x100 [aesni_intel]
[  112.740968]  [] ? get_user_pages_fast+0x123/0x150
[  112.741046]  [] ? trace_hardirqs_on+0xb/0x10
[  112.741119]  [] __ablk_encrypt+0x39/0x40 [ablk_helper]
[  112.741198]  [] ablk_encrypt+0x1a/0x70 [ablk_helper]
[  112.741275]  [] skcipher_recvmsg+0x20c/0x400 [algif_skcipher]
[  112.741359]  [] ? sched_clock_cpu+0x11d/0x1a0
[  112.741435]  [] ? find_get_page+0x79/0xc0
[  112.741509]  [] sock_aio_read+0x104/0x140
[  112.741580]  [] ? __do_fault+0x248/0x420
[  112.741650]  [] do_sync_read+0x97/0xd0
[  112.741719]  [] vfs_read+0x11d/0x140
[  112.741789]  [] ? sys_socketcall+0x2a3/0x320
[  112.741861]  [] sys_read+0x42/0x90
[  112.742578]  [] sysenter_do_call+0x12/0x32

This patch fixes it by simply rejecting a buffer whose length is not a
multiple of the cipher block size.

(Bug is present in all stable kernels as well.)

Signed-off-by: Milan Broz 
---
 crypto/algif_skcipher.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
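
For reference, the length check can be sketched in plain userspace C. This
is only an illustration under stated assumptions: usable_len() and its trim
parameter are hypothetical names, not kernel symbols, and the sketch assumes
(as the fix implies) that the rounding in the context line of the hunk is
not applied on every path.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper, not a kernel function: decide how many bytes of a
 * request may be handed to a block cipher with block size bs.
 *
 * trim != 0 models a path that rounds the length down to a whole number
 * of blocks; trim == 0 models a path where no rounding happened, which is
 * the case the patched condition starts rejecting.
 *
 * Returns the usable length, or -EINVAL if nothing valid remains.
 */
static long usable_len(size_t used, size_t bs, int trim)
{
	if (bs == 0)
		return -EINVAL;

	if (trim)
		used -= used % bs;

	/* Same condition as the patched check: empty or partial block. */
	if (!used || used % bs)
		return -EINVAL;

	return (long)used;
}
```

With a 16-byte block size, a 1-byte request that was not rounded down is
rejected with -EINVAL instead of being passed on to the cipher.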

diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
index 6a6dfc0..5f7713b 100644
--- a/crypto/algif_skcipher.c
+++ b/crypto/algif_skcipher.c
@@ -463,7 +463,7 @@ static int skcipher_recvmsg(struct kiocb *unused, struct socket *sock,
used -= used % bs;
 
err = -EINVAL;
-   if (!used)
+   if (!used || used % bs)
goto free;
 
ablkcipher_request_set_crypt(&ctx->req, sg,
-- 
1.7.10.4


Re: [PATCH] algif_skcipher: Avoid crash if buffer is not multiple of cipher block size

2013-04-14 Thread Milan Broz

On 04/14/2013 06:12 PM, Milan Broz wrote:
> When user requests encryption (or decryption) of block which
> is not aligned to cipher block size through userspace crypto
> interface, an OOps like this can happen

And this is a reproducer for the problem above...

Milan

/* 
 * Check for unaligned buffer to block cipher size in kernel crypto API
 * fixed by patch: http://article.gmane.org/gmane.linux.kernel.cryptoapi/7980
 * 
 * Compile with gcc test.c -o tst
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <errno.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/if_alg.h>

#ifndef AF_ALG
#define AF_ALG 38
#endif
#ifndef SOL_ALG
#define SOL_ALG 279
#endif

static int kernel_crypt(int opfd, const char *in, char *out, size_t length,
			const char *iv, size_t iv_length, uint32_t direction)
{
	int r = 0;
	ssize_t len;
	struct af_alg_iv *alg_iv;
	struct cmsghdr *header;
	uint32_t *type;
	struct iovec iov = {
		.iov_base = (void*)(uintptr_t)in,
		.iov_len = length,
	};
	int iv_msg_size = iv ? CMSG_SPACE(sizeof(*alg_iv) + iv_length) : 0;
	char buffer[CMSG_SPACE(sizeof(type)) + iv_msg_size];
	struct msghdr msg = {
		.msg_control = buffer,
		.msg_controllen = sizeof(buffer),
		.msg_iov = &iov,
		.msg_iovlen = 1,
	};

	if (!in || !out || !length)
		return -EINVAL;

	if ((!iv && iv_length) || (iv && !iv_length))
		return -EINVAL;

	memset(buffer, 0, sizeof(buffer));

	/* Set encrypt/decrypt operation */
	header = CMSG_FIRSTHDR(&msg);
	header->cmsg_level = SOL_ALG;
	header->cmsg_type = ALG_SET_OP;
	header->cmsg_len = CMSG_LEN(sizeof(type));
	type = (void*)CMSG_DATA(header);
	*type = direction;

	/* Set IV */
	if (iv) {
		header = CMSG_NXTHDR(&msg, header);
		header->cmsg_level = SOL_ALG;
		header->cmsg_type = ALG_SET_IV;
		header->cmsg_len = iv_msg_size;
		alg_iv = (void*)CMSG_DATA(header);
		alg_iv->ivlen = iv_length;
		memcpy(alg_iv->iv, iv, iv_length);
	}

	len = sendmsg(opfd, &msg, 0);
	if (len != (ssize_t)length) {
		r = -EIO;
		goto bad;
	}

	len = read(opfd, out, length);
	if (len != (ssize_t)length)
		r = -EIO;
bad:
	memset(buffer, 0, sizeof(buffer));
	return r;
}

int main (int argc, char *argv[])
{
	const char key[32] = "0123456789abcdef0123456789abcdef";
	const char iv[16] =  "0001";
	struct sockaddr_alg sa = {
		.salg_family = AF_ALG,
		.salg_type = "skcipher",
		.salg_name = "cbc(aes)"
	};
	int tfmfd, opfd;
	char *data;

	if (posix_memalign((void*)&data, 4096, 32)) {
		printf("Cannot alloc memory.\n");
		return 1;
	}

	tfmfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
	if (tfmfd == -1)
		goto bad;

	if (bind(tfmfd, (struct sockaddr *)&sa, sizeof(sa)) == -1)
		goto bad;

	opfd = accept(tfmfd, NULL, 0);
	if (opfd == -1)
		goto bad;

	if (setsockopt(tfmfd, SOL_ALG, ALG_SET_KEY, key, sizeof(key)) == -1)
		goto bad;

	if (kernel_crypt(opfd, data, data, 1, iv, sizeof(iv), ALG_OP_ENCRYPT) < 0)
		printf("Cannot encrypt data.\n");

	close(tfmfd);
	close(opfd);
	free(data);
	return 0;
bad:
	printf("Cannot initialise cipher.\n");
	return 1;
}

Re: [PATCH 4/8] dmaengine: ste_dma40: Do not configure channels during an channel allocation

2013-04-14 Thread Rabin Vincent

2013/4/12 Lee Jones :
> So I need to devise another way, as this function cannot be called
> here either. Using the dmaengine API, allocating a channel and
> configuring it are to be completed using different calls. Using the
> API correctly, there is no way the driver can setup the channel
> with all of the relevant information during allocation time.
>
> The steps are as follows:
>
> dma_request_channel() - here we only allot a channel number and
> allocate the appropriate resources for the
> channel.
>
> dma_slave_config()- this is where we're meant to configure the
> channel, so d40_config_write() needs to be
> called here, as only dma_slave_config() will
> carry the information required so as
> d40_*_cfg() can make the correct decisions.

Whether a physical or a logical channel is used is not something that
is configurable via dma_slave_config().  d40_config_write() needs only
that information, and that information is already available in
dma_request_channel().  Therefore no more information relevant to
d40_config_write() will be obtained in dma_slave_config(), and
d40_config_write() can be called in dma_request_channel().

[PATCH 0/8] workqueue: advance concurrency management

2013-04-14 Thread Lai Jiangshan

I found that the early increment of nr_running in wq_worker_waking_up() is
useless in many cases. It tries to avoid waking up idle workers for pending
work items, but delaying the increment of nr_running does not cause more idle
workers to be woken up.

So we delay the increment, remove wq_worker_waking_up(), and ...

enjoy simpler concurrency management.

Lai Jiangshan (8):
  workqueue: remove @cpu from wq_worker_sleeping()
  workqueue: use create_and_start_worker() in manage_workers()
  workqueue: remove cpu_intensive from process_one_work()
  workqueue: quit cm mode when sleeping
  workqueue: remove disabled wq_worker_waking_up()
  workqueue: make nr_running non-atomic
  workqueue: move worker->flags up
  workqueue: rename ->nr_running to ->nr_cm_workers

 kernel/sched/core.c |6 +-
 kernel/workqueue.c  |  234 +++---
 kernel/workqueue_internal.h |9 +-
 3 files changed, 89 insertions(+), 160 deletions(-)

-- 
1.7.7.6


[PATCH 1/8] workqueue: remove @cpu from wq_worker_sleeping()

2013-04-14 Thread Lai Jiangshan

The WARN_ON_ONCE(cpu != raw_smp_processor_id()) check in
wq_worker_sleeping() is useless: the caller already ensures that
cpu == raw_smp_processor_id().

We should use WARN_ON_ONCE(pool->cpu != raw_smp_processor_id())
to do the intended test.

As a result, @cpu is removed from wq_worker_sleeping().

Signed-off-by: Lai Jiangshan 
---
 kernel/sched/core.c |2 +-
 kernel/workqueue.c  |7 +++
 kernel/workqueue_internal.h |2 +-
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 23606ee..ffc06ad 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2907,7 +2907,7 @@ need_resched:
if (prev->flags & PF_WQ_WORKER) {
struct task_struct *to_wakeup;
 
-   to_wakeup = wq_worker_sleeping(prev, cpu);
+   to_wakeup = wq_worker_sleeping(prev);
if (to_wakeup)
try_to_wake_up_local(to_wakeup);
}
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c273376..b3095ad 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -807,7 +807,6 @@ void wq_worker_waking_up(struct task_struct *task, int cpu)
 /**
  * wq_worker_sleeping - a worker is going to sleep
  * @task: task going to sleep
- * @cpu: CPU in question, must be the current CPU number
  *
  * This function is called during schedule() when a busy worker is
  * going to sleep.  Worker on the same cpu can be woken up by
@@ -817,9 +816,9 @@ void wq_worker_waking_up(struct task_struct *task, int cpu)
  * spin_lock_irq(rq->lock)
  *
  * RETURNS:
- * Worker task on @cpu to wake up, %NULL if none.
+ * Worker task on the same pool to wake up, %NULL if none.
  */
-struct task_struct *wq_worker_sleeping(struct task_struct *task, int cpu)
+struct task_struct *wq_worker_sleeping(struct task_struct *task)
 {
struct worker *worker = kthread_data(task), *to_wakeup = NULL;
struct worker_pool *pool;
@@ -835,7 +834,7 @@ struct task_struct *wq_worker_sleeping(struct task_struct *task, int cpu)
pool = worker->pool;
 
/* this can only happen on the local cpu */
-   if (WARN_ON_ONCE(cpu != raw_smp_processor_id()))
+   if (WARN_ON_ONCE(pool->cpu != raw_smp_processor_id()))
return NULL;
 
/*
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index 84ab6e1..aec8df4 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -57,6 +57,6 @@ static inline struct worker *current_wq_worker(void)
  * sched.c and workqueue.c.
  */
 void wq_worker_waking_up(struct task_struct *task, int cpu);
-struct task_struct *wq_worker_sleeping(struct task_struct *task, int cpu);
+struct task_struct *wq_worker_sleeping(struct task_struct *task);
 
 #endif /* _KERNEL_WORKQUEUE_INTERNAL_H */
-- 
1.7.7.6


[PATCH 2/8] workqueue: use create_and_start_worker() in manage_workers()

2013-04-14 Thread Lai Jiangshan

After we have allocated a worker, we are free to access it without any
protection until it is visible/published.

In the old code, a worker was published by start_worker() and was visible
only after start_worker(). In the current code, it is visible to
for_each_pool_worker() after
"idr_replace(&pool->worker_idr, worker, worker->id);"

This means that publishing a worker is not atomic, which is very fragile
(although I did not find any bug caused by it in the current code). It
should be fixed.

It can be fixed by moving "idr_replace(&pool->worker_idr, worker, worker->id);"
to start_worker() or by folding start_worker() into create_worker().

I chose the second option. It makes the code much simpler.

Signed-off-by: Lai Jiangshan 
---
 kernel/workqueue.c |   62 +++
 1 files changed, 18 insertions(+), 44 deletions(-)
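
The idea of publishing and starting a worker in one critical section can be
sketched as a small userspace model. Every name below (toy_pool, toy_worker,
toy_publish_and_start() and so on) is hypothetical, the slot array only
stands in for worker_idr, and the C11 atomic_flag spinlock only stands in
for pool->lock. This is not the kernel code.

```c
#include <stdatomic.h>
#include <stddef.h>

#define TOY_MAX_WORKERS    8
#define TOY_WORKER_STARTED (1u << 0)

struct toy_worker {
	unsigned int flags;
	int id;
};

struct toy_pool {
	atomic_flag lock;                          /* stand-in for pool->lock */
	struct toy_worker *slots[TOY_MAX_WORKERS]; /* stand-in for worker_idr */
	int nr_workers;
};

static void toy_lock(struct toy_pool *pool)
{
	while (atomic_flag_test_and_set_explicit(&pool->lock, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void toy_unlock(struct toy_pool *pool)
{
	atomic_flag_clear_explicit(&pool->lock, memory_order_release);
}

/*
 * Publish the worker and mark it started inside ONE critical section, so
 * no lock holder can observe a published worker that is not yet started.
 * Returns 0 on success, -1 if the id does not fit the slot array.
 */
static int toy_publish_and_start(struct toy_pool *pool, struct toy_worker *w)
{
	if (w->id < 0 || w->id >= TOY_MAX_WORKERS)
		return -1;

	toy_lock(pool);
	pool->slots[w->id] = w;          /* worker becomes visible ... */
	w->flags |= TOY_WORKER_STARTED;  /* ... and is started before unlock */
	pool->nr_workers++;
	toy_unlock(pool);
	return 0;
}

/* Count workers that are visible but not started; expected to be 0. */
static int toy_count_half_published(struct toy_pool *pool)
{
	int i, n = 0;

	toy_lock(pool);
	for (i = 0; i < TOY_MAX_WORKERS; i++)
		if (pool->slots[i] &&
		    !(pool->slots[i]->flags & TOY_WORKER_STARTED))
			n++;
	toy_unlock(pool);
	return n;
}
```

Splitting the two assignments across two separate lock/unlock pairs would
reintroduce the window in which toy_count_half_published() can return a
non-zero value.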

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b3095ad..d1e10c5 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -64,7 +64,7 @@ enum {
 *
 * Note that DISASSOCIATED should be flipped only while holding
 * manager_mutex to avoid changing binding state while
-* create_worker() is in progress.
+* create_and_start_worker_locked() is in progress.
 */
POOL_MANAGE_WORKERS = 1 << 0,   /* need to manage workers */
POOL_DISASSOCIATED  = 1 << 2,   /* cpu can't serve workers */
@@ -1542,7 +1542,10 @@ static void worker_enter_idle(struct worker *worker)
 (worker->hentry.next || worker->hentry.pprev)))
return;
 
-   /* can't use worker_set_flags(), also called from start_worker() */
+   /*
+* can't use worker_set_flags(), also called from
+* create_and_start_worker_locked().
+*/
worker->flags |= WORKER_IDLE;
pool->nr_idle++;
worker->last_active = jiffies;
@@ -1663,12 +1666,10 @@ static struct worker *alloc_worker(void)
 }
 
 /**
- * create_worker - create a new workqueue worker
+ * create_and_start_worker_locked - create and start a worker for a pool
  * @pool: pool the new worker will belong to
  *
- * Create a new worker which is bound to @pool.  The returned worker
- * can be started by calling start_worker() or destroyed using
- * destroy_worker().
+ * Create a new worker which is bound to @pool and start it.
  *
  * CONTEXT:
  * Might sleep.  Does GFP_KERNEL allocations.
@@ -1676,7 +1677,7 @@ static struct worker *alloc_worker(void)
  * RETURNS:
  * Pointer to the newly created worker.
  */
-static struct worker *create_worker(struct worker_pool *pool)
+static struct worker *create_and_start_worker_locked(struct worker_pool *pool)
 {
struct worker *worker = NULL;
int id = -1;
@@ -1734,9 +1735,15 @@ static struct worker *create_worker(struct worker_pool *pool)
if (pool->flags & POOL_DISASSOCIATED)
worker->flags |= WORKER_UNBOUND;
 
-   /* successful, commit the pointer to idr */
spin_lock_irq(&pool->lock);
+   /* successful, commit the pointer to idr */
idr_replace(&pool->worker_idr, worker, worker->id);
+
+   /* start worker */
+   worker->flags |= WORKER_STARTED;
+   worker->pool->nr_workers++;
+   worker_enter_idle(worker);
+   wake_up_process(worker->task);
spin_unlock_irq(&pool->lock);
 
return worker;
@@ -1752,23 +1759,6 @@ fail:
 }
 
 /**
- * start_worker - start a newly created worker
- * @worker: worker to start
- *
- * Make the pool aware of @worker and start it.
- *
- * CONTEXT:
- * spin_lock_irq(pool->lock).
- */
-static void start_worker(struct worker *worker)
-{
-   worker->flags |= WORKER_STARTED;
-   worker->pool->nr_workers++;
-   worker_enter_idle(worker);
-   wake_up_process(worker->task);
-}
-
-/**
  * create_and_start_worker - create and start a worker for a pool
  * @pool: the target pool
  *
@@ -1779,14 +1769,7 @@ static int create_and_start_worker(struct worker_pool *pool)
struct worker *worker;
 
mutex_lock(&pool->manager_mutex);
-
-   worker = create_worker(pool);
-   if (worker) {
-   spin_lock_irq(&pool->lock);
-   start_worker(worker);
-   spin_unlock_irq(&pool->lock);
-   }
-
+   worker = create_and_start_worker_locked(pool);
mutex_unlock(&pool->manager_mutex);
 
return worker ? 0 : -ENOMEM;
@@ -1934,17 +1917,8 @@ restart:
mod_timer(&pool->mayday_timer, jiffies + MAYDAY_INITIAL_TIMEOUT);
 
while (true) {
-   struct worker *worker;
-
-   worker = create_worker(pool);
-   if (worker) {
-   del_timer_sync(&pool->mayday_timer);
-   spin_lock_irq(&pool->lock);
-   start_worker(worker);
-   if (WARN_ON_ONCE(need_to_create_worker(pool)))
-   goto restart;
-   return true;
-

[PATCH 3/8] workqueue: remove cpu_intensive from process_one_work()

2013-04-14 Thread Lai Jiangshan

In process_one_work(), we can test "worker->flags & WORKER_CPU_INTENSIVE"
instead of the local "cpu_intensive". Because worker->flags is a hot field
(accessed when processing each work item), this change will not cause
any performance regression.

It also prepares for clearing WORKER_QUIT_CM in the same place.

Signed-off-by: Lai Jiangshan 
---
 kernel/workqueue.c |7 +++
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index d1e10c5..a4bc589 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2068,7 +2068,6 @@ __acquires(&pool->lock)
 {
struct pool_workqueue *pwq = get_work_pwq(work);
struct worker_pool *pool = worker->pool;
-   bool cpu_intensive = pwq->wq->flags & WQ_CPU_INTENSIVE;
int work_color;
struct worker *collision;
 #ifdef CONFIG_LOCKDEP
@@ -2118,7 +2117,7 @@ __acquires(&pool->lock)
 * CPU intensive works don't participate in concurrency
 * management.  They're the scheduler's responsibility.
 */
-   if (unlikely(cpu_intensive))
+   if (unlikely(pwq->wq->flags & WQ_CPU_INTENSIVE))
worker_set_flags(worker, WORKER_CPU_INTENSIVE, true);
 
/*
@@ -2161,8 +2160,8 @@ __acquires(&pool->lock)
 
spin_lock_irq(&pool->lock);
 
-   /* clear cpu intensive status */
-   if (unlikely(cpu_intensive))
+   /* clear cpu intensive status if it is set */
+   if (unlikely(worker->flags & WORKER_CPU_INTENSIVE))
worker_clr_flags(worker, WORKER_CPU_INTENSIVE);
 
/* we're done with it, release */
-- 
1.7.7.6


[PATCH 4/8] workqueue: quit cm mode when sleeping

2013-04-14 Thread Lai Jiangshan

When a worker is woken up from sleeping, it makes very little sense to
still consider it RUNNING (from the point of view of concurrency management):
o   if the work goes to sleep again, the worker is not RUNNING again.
o   if the work runs for a long time without sleeping, the worker should
    be considered CPU_INTENSIVE.
o   if the work runs for a short time without sleeping, we can still
    consider this worker not RUNNING for this harmless short time,
    and fix it up before the next work.

o   In almost all cases, the increment does not raise nr_running
    from 0: there are other RUNNING workers, and in many cases they
    will very probably not go to sleep before this worker finishes the
    work. So this early increment makes little sense.

So we don't need to consider this worker RUNNING so early, and
we can delay the increment of nr_running a little: we increase it after
the work has finished.

This is done by adding a new worker flag: WORKER_QUIT_CM.
It is used to disable the increment of nr_running in wq_worker_waking_up(),
and to increase nr_running after the work has finished.

This change may cause us to wake up (or create) more workers in rare cases,
but this is not incorrect.

It makes the concurrency management much simpler.

Signed-off-by: Lai Jiangshan 
---
 kernel/workqueue.c  |   20 ++--
 kernel/workqueue_internal.h |2 +-
 2 files changed, 15 insertions(+), 7 deletions(-)
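
The accounting described above can be modelled in a few lines of userspace
C. This is a simplified sketch, not the kernel implementation: the toy_cm_*
names and the pending counter (a stand-in for a non-empty worklist) are
hypothetical, and locking is left out entirely.

```c
#define TOY_QUIT_CM (1u << 4)	/* mirrors the bit chosen for WORKER_QUIT_CM */

struct toy_cm_pool {
	int nr_running;		/* concurrency-managed workers */
	int pending;		/* stand-in for a non-empty worklist */
};

struct toy_cm_worker {
	unsigned int flags;
	struct toy_cm_pool *pool;
};

/*
 * The worker blocks inside a work item: it leaves concurrency management.
 * Returns 1 if another worker should be woken up, 0 otherwise.
 */
static int toy_worker_sleeping(struct toy_cm_worker *w)
{
	if (w->flags & TOY_QUIT_CM)
		return 0;	/* already outside concurrency management */

	w->flags |= TOY_QUIT_CM;
	return --w->pool->nr_running == 0 && w->pool->pending > 0;
}

/* The worker wakes up: in this model nothing is counted here. */
static void toy_worker_waking_up(struct toy_cm_worker *w)
{
	(void)w;
}

/* The work item has finished: rejoin concurrency management. */
static void toy_work_done(struct toy_cm_worker *w)
{
	if (w->flags & TOY_QUIT_CM) {
		w->flags &= ~TOY_QUIT_CM;
		w->pool->nr_running++;
	}
}
```

In this model a worker that sleeps several times within one work item
changes nr_running exactly twice: once on the first sleep and once when the
work item completes.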

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index a4bc589..668e9b7 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -75,11 +75,13 @@ enum {
WORKER_DIE  = 1 << 1,   /* die die die */
WORKER_IDLE = 1 << 2,   /* is idle */
WORKER_PREP = 1 << 3,   /* preparing to run works */
+   WORKER_QUIT_CM  = 1 << 4,   /* quit concurrency managed */
WORKER_CPU_INTENSIVE= 1 << 6,   /* cpu intensive */
WORKER_UNBOUND  = 1 << 7,   /* worker is unbound */
WORKER_REBOUND  = 1 << 8,   /* worker was rebound */
 
-   WORKER_NOT_RUNNING  = WORKER_PREP | WORKER_CPU_INTENSIVE |
+   WORKER_NOT_RUNNING  = WORKER_PREP | WORKER_QUIT_CM |
+ WORKER_CPU_INTENSIVE |
  WORKER_UNBOUND | WORKER_REBOUND,
 
NR_STD_WORKER_POOLS = 2,/* # standard pools per cpu */
@@ -122,6 +124,10 @@ enum {
  *cpu or grabbing pool->lock is enough for read access.  If
  *POOL_DISASSOCIATED is set, it's identical to L.
  *
+ * LI: If POOL_DISASSOCIATED is NOT set, read/modification access should be
+ * done with local IRQ-disabled and only from local cpu.
+ * If POOL_DISASSOCIATED is set, it's identical to L.
+ *
  * MG: pool->manager_mutex and pool->lock protected.  Writes require both
  * locks.  Reads can happen under either lock.
  *
@@ -843,11 +849,13 @@ struct task_struct *wq_worker_sleeping(struct task_struct *task)
 * Please read comment there.
 *
 * NOT_RUNNING is clear.  This means that we're bound to and
-* running on the local cpu w/ rq lock held and preemption
+* running on the local cpu w/ rq lock held and preemption/irq
 * disabled, which in turn means that none else could be
 * manipulating idle_list, so dereferencing idle_list without pool
-* lock is safe.
+* lock is safe. And which in turn also means that we can
+* manipulating worker->flags.
 */
+   worker->flags |= WORKER_QUIT_CM;
if (atomic_dec_and_test(&pool->nr_running) &&
!list_empty(&pool->worklist))
to_wakeup = first_worker(pool);
@@ -2160,9 +2168,9 @@ __acquires(&pool->lock)
 
spin_lock_irq(&pool->lock);
 
-   /* clear cpu intensive status if it is set */
-   if (unlikely(worker->flags & WORKER_CPU_INTENSIVE))
-   worker_clr_flags(worker, WORKER_CPU_INTENSIVE);
+   /* clear cpu intensive status or WORKER_QUIT_CM if they are set */
+   if (unlikely(worker->flags & (WORKER_CPU_INTENSIVE | WORKER_QUIT_CM)))
+   worker_clr_flags(worker, WORKER_CPU_INTENSIVE | WORKER_QUIT_CM);
 
/* we're done with it, release */
hash_del(&worker->hentry);
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index aec8df4..1713ae7 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -35,7 +35,7 @@ struct worker {
/* L: for rescuers */
/* 64 bytes boundary on 64bit, 32 on 32bit */
unsigned long   last_active;/* L: last active timestamp */
-   unsigned intflags;  /* X: flags */
+   unsigned intflags;  /* LI: flags */
int id; /* I: worker id */
 
/* used only by rescuers to point to the target workqueue */
--

[PATCH 5/8] workqueue: remove disabled wq_worker_waking_up()

2013-04-14 Thread Lai Jiangshan

When a worker is sleeping, its flags contain WORKER_QUIT_CM, which means
worker->flags & WORKER_NOT_RUNNING is always non-zero, which in turn means
wq_worker_waking_up() is disabled.

So we remove wq_worker_waking_up(). (The access to worker->flags
in wq_worker_waking_up() was not protected by "LI". After this removal, it is
always protected by "LI".)

The patch also makes these changes after the removal:
1) Because wq_worker_waking_up() is removed, we don't need schedule()
   before zapping nr_running in wq_unbind_fn(), and don't need to
   release/regain pool->lock.
2) The sanity check in worker_enter_idle() is changed to also check
   unbound/disassociated pools (because of the above change, nr_running
   is now expected to always be reliable in worker_enter_idle()).

Signed-off-by: Lai Jiangshan 
---
 kernel/sched/core.c |4 ---
 kernel/workqueue.c  |   58 +++---
 kernel/workqueue_internal.h |3 +-
 3 files changed, 11 insertions(+), 54 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ffc06ad..18f95884 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1276,10 +1276,6 @@ static void ttwu_activate(struct rq *rq, struct task_struct *p, int en_flags)
 {
activate_task(rq, p, en_flags);
p->on_rq = 1;
-
-   /* if a worker is waking up, notify workqueue */
-   if (p->flags & PF_WQ_WORKER)
-   wq_worker_waking_up(p, cpu_of(rq));
 }
 
 /*
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 668e9b7..9f1ebdf 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -790,27 +790,6 @@ static void wake_up_worker(struct worker_pool *pool)
 }
 
 /**
- * wq_worker_waking_up - a worker is waking up
- * @task: task waking up
- * @cpu: CPU @task is waking up to
- *
- * This function is called during try_to_wake_up() when a worker is
- * being awoken.
- *
- * CONTEXT:
- * spin_lock_irq(rq->lock)
- */
-void wq_worker_waking_up(struct task_struct *task, int cpu)
-{
-   struct worker *worker = kthread_data(task);
-
-   if (!(worker->flags & WORKER_NOT_RUNNING)) {
-   WARN_ON_ONCE(worker->pool->cpu != cpu);
-   atomic_inc(&worker->pool->nr_running);
-   }
-}
-
-/**
  * wq_worker_sleeping - a worker is going to sleep
  * @task: task going to sleep
  *
@@ -1564,14 +1543,8 @@ static void worker_enter_idle(struct worker *worker)
if (too_many_workers(pool) && !timer_pending(&pool->idle_timer))
mod_timer(&pool->idle_timer, jiffies + IDLE_WORKER_TIMEOUT);
 
-   /*
-* Sanity check nr_running.  Because wq_unbind_fn() releases
-* pool->lock between setting %WORKER_UNBOUND and zapping
-* nr_running, the warning may trigger spuriously.  Check iff
-* unbind is not in progress.
-*/
-   WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
-pool->nr_workers == pool->nr_idle &&
+   /* Sanity check nr_running. */
+   WARN_ON_ONCE(pool->nr_workers == pool->nr_idle &&
 atomic_read(&pool->nr_running));
 }
 
@@ -4385,24 +4358,12 @@ static void wq_unbind_fn(struct work_struct *work)
 
pool->flags |= POOL_DISASSOCIATED;
 
-   spin_unlock_irq(&pool->lock);
-   mutex_unlock(&pool->manager_mutex);
-
/*
-* Call schedule() so that we cross rq->lock and thus can
-* guarantee sched callbacks see the %WORKER_UNBOUND flag.
-* This is necessary as scheduler callbacks may be invoked
-* from other cpus.
-*/
-   schedule();
-
-   /*
-* Sched callbacks are disabled now.  Zap nr_running.
-* After this, nr_running stays zero and need_more_worker()
-* and keep_working() are always true as long as the
-* worklist is not empty.  This pool now behaves as an
-* unbound (in terms of concurrency management) pool which
-* are served by workers tied to the pool.
+* Zap nr_running. After this, nr_running stays zero
+* and need_more_worker() and keep_working() are always true
+* as long as the worklist is not empty.  This pool now
+* behaves as an unbound (in terms of concurrency management)
+* pool which are served by workers tied to the pool.
 */
atomic_set(&pool->nr_running, 0);
 
@@ -4411,9 +4372,9 @@ static void wq_unbind_fn(struct work_struct *work)
 * worker blocking could lead to lengthy stalls.  Kick off
 * unbound chain execution of currently pending work items.
 */
-   spin_lock_irq(&pool->lock);
wake_up_worker(pool);
spin_unlock_irq(&pool->lock);
+   mutex_unlock(&pool->manager_mutex);
}
 }
 
@@ -4466,9 +4427,10 @@ static void

[PATCH 6/8] workqueue: make nr_running non-atomic

2013-04-14 Thread Lai Jiangshan

Now, nr_running is accessed only with local IRQs disabled and only from the
local cpu if the pool is associated (except for the read access in
insert_work()).

So we convert it to a non-atomic counter to reduce the overhead of atomic
operations. It is protected by "LI".

Signed-off-by: Lai Jiangshan 
---
 kernel/workqueue.c |   49 +
 1 files changed, 21 insertions(+), 28 deletions(-)
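
The pairing of a decrement plus full barrier on the sleeping side with a
full barrier on the queueing side can be sketched with C11 atomics. This is
a userspace model under stated assumptions: the toy_* names are
hypothetical, nr_pending stands in for the worklist, relaxed atomic accesses
model the kernel's plain accesses (C11 needs atomics here to stay
well-defined), and atomic_thread_fence(memory_order_seq_cst) models smp_mb().

```c
#include <stdatomic.h>

struct toy_fence_pool {
	atomic_int nr_running;	/* relaxed accesses model the plain int */
	atomic_int nr_pending;	/* stand-in for !list_empty(&pool->worklist) */
};

/*
 * Sleeping side: decrement, full fence, then look at the pending work.
 * Returns 1 if this side must wake up another worker.
 */
static int toy_sleeping_needs_wakeup(struct toy_fence_pool *pool)
{
	int left = atomic_fetch_sub_explicit(&pool->nr_running, 1,
					     memory_order_relaxed) - 1;

	if (left != 0)
		return 0;

	atomic_thread_fence(memory_order_seq_cst);	/* models smp_mb() */
	return atomic_load_explicit(&pool->nr_pending,
				    memory_order_relaxed) > 0;
}

/*
 * Queueing side: publish the work, full fence, then look at nr_running.
 * Returns 1 if the queuer must wake up a worker itself.
 */
static int toy_insert_needs_wakeup(struct toy_fence_pool *pool)
{
	atomic_fetch_add_explicit(&pool->nr_pending, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* models smp_mb() */
	return atomic_load_explicit(&pool->nr_running,
				    memory_order_relaxed) == 0;
}
```

With a full fence on both sides, at least one of the two sides observes the
other's update, so a queued item cannot be missed by both the sleeping
worker and the queuer.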

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9f1ebdf..25e2e5a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -150,6 +150,7 @@ struct worker_pool {
int node;   /* I: the associated node ID */
int id; /* I: pool ID */
unsigned intflags;  /* X: flags */
+   int nr_running; /* LI: count for running */
 
struct list_headworklist;   /* L: list of pending works */
int nr_workers; /* L: total number of workers */
@@ -175,13 +176,6 @@ struct worker_pool {
int refcnt; /* PL: refcnt for unbound pools */
 
/*
-* The current concurrency level.  As it's likely to be accessed
-* from other CPUs during try_to_wake_up(), put it in a separate
-* cacheline.
-*/
-   atomic_tnr_running cacheline_aligned_in_smp;
-
-   /*
 * Destruction of pool is sched-RCU protected to allow dereferences
 * from get_work_pool().
 */
@@ -700,7 +694,7 @@ static bool work_is_canceling(struct work_struct *work)
 
 static bool __need_more_worker(struct worker_pool *pool)
 {
-   return !atomic_read(&pool->nr_running);
+   return !pool->nr_running;
 }
 
 /*
@@ -725,8 +719,7 @@ static bool may_start_working(struct worker_pool *pool)
 /* Do I need to keep working?  Called from currently running workers. */
 static bool keep_working(struct worker_pool *pool)
 {
-   return !list_empty(&pool->worklist) &&
-   atomic_read(&pool->nr_running) <= 1;
+   return !list_empty(&pool->worklist) && pool->nr_running <= 1;
 }
 
 /* Do we need a new worker?  Called from manager. */
@@ -823,21 +816,24 @@ struct task_struct *wq_worker_sleeping(struct task_struct *task)
return NULL;
 
/*
-* The counterpart of the following dec_and_test, implied mb,
-* worklist not empty test sequence is in insert_work().
-* Please read comment there.
-*
 * NOT_RUNNING is clear.  This means that we're bound to and
 * running on the local cpu w/ rq lock held and preemption/irq
 * disabled, which in turn means that none else could be
 * manipulating idle_list, so dereferencing idle_list without pool
 * lock is safe. And which in turn also means that we can
-* manipulating worker->flags.
+* manipulating worker->flags and pool->nr_running.
 */
worker->flags |= WORKER_QUIT_CM;
-   if (atomic_dec_and_test(&pool->nr_running) &&
-   !list_empty(&pool->worklist))
-   to_wakeup = first_worker(pool);
+   if (--pool->nr_running == 0) {
+   /*
+* This smp_mb() forces a mb between decreasing nr_running
+* and reading worklist. It paires with the smp_mb() in
+* insert_work(). Please read comment there.
+*/
+   smp_mb();
+   if (!list_empty(&pool->worklist))
+   to_wakeup = first_worker(pool);
+   }
return to_wakeup ? to_wakeup->task : NULL;
 }
 
@@ -868,12 +864,10 @@ static inline void worker_set_flags(struct worker *worker, unsigned int flags,
 */
if ((flags & WORKER_NOT_RUNNING) &&
!(worker->flags & WORKER_NOT_RUNNING)) {
-   if (wakeup) {
-   if (atomic_dec_and_test(&pool->nr_running) &&
-   !list_empty(&pool->worklist))
-   wake_up_worker(pool);
-   } else
-   atomic_dec(&pool->nr_running);
+   pool->nr_running--;
+   if (wakeup && !pool->nr_running &&
+   !list_empty(&pool->worklist))
+   wake_up_worker(pool);
}
 
worker->flags |= flags;
@@ -905,7 +899,7 @@ static inline void worker_clr_flags(struct worker *worker, unsigned int flags)
 */
if ((flags & WORKER_NOT_RUNNING) && (oflags & WORKER_NOT_RUNNING))
if (!(worker->flags & WORKER_NOT_RUNNING))
-   atomic_inc(&pool->nr_running);
+   pool->nr_running++;
 }
 
 /**
@@ -1544,8 +1538,7 @@ static void worker_enter_idle(struct worker *worker)
mod_timer(&pool->idle_timer, jiffies + IDLE_WORKER_TIMEOUT);
 
/* Sanity check nr_running. */
-   WARN_ON_ONCE(pool->nr_workers == pool->nr_idle &&
-

[PATCH 7/8] workqueue: move worker->flags up

2013-04-14 Thread Lai Jiangshan

worker->flags is a hot field (accessed when processing each work item).
Move it up into the first 64 bytes (32 bytes on 32-bit), which hold the
hot fields.

Also move the colder field worker->task down to ensure worker->pool is
still in the first 64 bytes.

Signed-off-by: Lai Jiangshan 
---
 kernel/workqueue_internal.h |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index e9fd05f..63cfac7 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -20,6 +20,7 @@ struct worker_pool;
  * Only to be used in workqueue and async.
  */
 struct worker {
+   unsigned intflags;  /* LI: flags */
/* on idle list while idle, on busy hash table while busy */
union {
struct list_headentry;  /* L: while idle */
@@ -30,12 +31,11 @@ struct worker {
work_func_t current_func;   /* L: current_work's fn */
struct pool_workqueue   *current_pwq; /* L: current_work's pwq */
struct list_headscheduled;  /* L: scheduled works */
-   struct task_struct  *task;  /* I: worker task */
struct worker_pool  *pool;  /* I: the associated pool */
/* L: for rescuers */
/* 64 bytes boundary on 64bit, 32 on 32bit */
+   struct task_struct  *task;  /* I: worker task */
unsigned long   last_active;/* L: last active timestamp */
-   unsigned intflags;  /* LI: flags */
int id; /* I: worker id */
 
/* used only by rescuers to point to the target workqueue */
-- 
1.7.7.6
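
For readers who want to check a layout like this one, here is a minimal
userspace sketch. It assumes a hypothetical struct demo_worker (not the
kernel's struct worker, whose real field sizes depend on the configuration)
and uses offsetof() to verify that the hot fields end within the first
64 bytes. All demo_* names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's list_head; illustration only. */
struct demo_list_head { struct demo_list_head *next, *prev; };

struct demo_worker {
	/* hot fields, touched for every work item */
	unsigned int		flags;
	struct demo_list_head	entry;
	void			*current_work;
	void			(*current_func)(void *);
	void			*pool;
	/* colder fields below */
	void			*task;
	unsigned long		last_active;
	int			id;
};

#define DEMO_HOT_BYTES 64

/* Compile-time check: the last hot field must end inside the hot region. */
_Static_assert(offsetof(struct demo_worker, pool) + sizeof(void *) <= DEMO_HOT_BYTES,
	       "hot fields spill out of the first 64 bytes");

/* Runtime variant of the same check: returns 1 if @end_offset fits. */
static int demo_fits_hot_region(size_t end_offset)
{
	return end_offset <= DEMO_HOT_BYTES;
}
```

A compile-time assertion of this kind fails the build as soon as someone
adds a field that pushes a hot member past the boundary, which a comment
in the struct alone cannot do.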

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 8/8] workqueue: rename ->nr_running to ->nr_cm_workers

2013-04-14 Thread Lai Jiangshan

nr_running is not a good name: reviewers may think it counts non-sleeping
busy workers. It is actually a counter of concurrency-managed workers, so
renaming it to nr_cm_workers is better.

s/nr_running/nr_cm_workers/
s/NOT_RUNNING/NOT_CM/
Manually tuned a little (indentation and the comment for nr_cm_workers).

Signed-off-by: Lai Jiangshan 
---
 kernel/workqueue.c |   69 +--
 1 files changed, 34 insertions(+), 35 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 25e2e5a..25e028c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -80,7 +80,7 @@ enum {
WORKER_UNBOUND  = 1 << 7,   /* worker is unbound */
WORKER_REBOUND  = 1 << 8,   /* worker was rebound */
 
-   WORKER_NOT_RUNNING  = WORKER_PREP | WORKER_QUIT_CM |
+   WORKER_NOT_CM   = WORKER_PREP | WORKER_QUIT_CM |
  WORKER_CPU_INTENSIVE |
  WORKER_UNBOUND | WORKER_REBOUND,
 
@@ -150,7 +150,7 @@ struct worker_pool {
int node;   /* I: the associated node ID */
int id; /* I: pool ID */
unsigned intflags;  /* X: flags */
-   int nr_running; /* LI: count for running */
+   int nr_cm_workers;  /* LI: count for cm workers */
 
struct list_headworklist;   /* L: list of pending works */
int nr_workers; /* L: total number of workers */
@@ -694,14 +694,14 @@ static bool work_is_canceling(struct work_struct *work)
 
 static bool __need_more_worker(struct worker_pool *pool)
 {
-   return !pool->nr_running;
+   return !pool->nr_cm_workers;
 }
 
 /*
  * Need to wake up a worker?  Called from anything but currently
  * running workers.
  *
- * Note that, because unbound workers never contribute to nr_running, this
+ * Note that, because unbound workers never contribute to nr_cm_workers, this
  * function will always return %true for unbound pools as long as the
  * worklist isn't empty.
  */
@@ -719,7 +719,7 @@ static bool may_start_working(struct worker_pool *pool)
 /* Do I need to keep working?  Called from currently running workers. */
 static bool keep_working(struct worker_pool *pool)
 {
-   return !list_empty(&pool->worklist) && pool->nr_running <= 1;
+   return !list_empty(&pool->worklist) && pool->nr_cm_workers <= 1;
 }
 
 /* Do we need a new worker?  Called from manager. */
@@ -804,9 +804,9 @@ struct task_struct *wq_worker_sleeping(struct task_struct 
*task)
/*
 * Rescuers, which may not have all the fields set up like normal
 * workers, also reach here, let's not access anything before
-* checking NOT_RUNNING.
+* checking NOT_CM.
 */
-   if (worker->flags & WORKER_NOT_RUNNING)
+   if (worker->flags & WORKER_NOT_CM)
return NULL;
 
pool = worker->pool;
@@ -816,17 +816,17 @@ struct task_struct *wq_worker_sleeping(struct task_struct 
*task)
return NULL;
 
/*
-* NOT_RUNNING is clear.  This means that we're bound to and
+* NOT_CM is clear.  This means that we're bound to and
 * running on the local cpu w/ rq lock held and preemption/irq
 * disabled, which in turn means that none else could be
 * manipulating idle_list, so dereferencing idle_list without pool
 * lock is safe. And which in turn also means that we can
-* manipulating worker->flags and pool->nr_running.
+* manipulating worker->flags and pool->nr_cm_workers.
 */
worker->flags |= WORKER_QUIT_CM;
-   if (--pool->nr_running == 0) {
+   if (--pool->nr_cm_workers == 0) {
/*
-* This smp_mb() forces a mb between decreasing nr_running
+* This smp_mb() forces a mb between decreasing nr_cm_workers
 * and reading worklist. It pairs with the smp_mb() in
 * insert_work(). Please read comment there.
 */
@@ -838,13 +838,13 @@ struct task_struct *wq_worker_sleeping(struct task_struct 
*task)
 }
 
 /**
- * worker_set_flags - set worker flags and adjust nr_running accordingly
+ * worker_set_flags - set worker flags and adjust nr_cm_workers accordingly
  * @worker: self
  * @flags: flags to set
  * @wakeup: wakeup an idle worker if necessary
  *
- * Set @flags in @worker->flags and adjust nr_running accordingly.  If
- * nr_running becomes zero and @wakeup is %true, an idle worker is
+ * Set @flags in @worker->flags and adjust nr_cm_workers accordingly.  If
+ * nr_cm_workers becomes zero and @wakeup is %true, an idle worker is
  * woken up.
  *
  * CONTEXT:
@@ -858,14 +858,13 @@ static inline void worker_set_flags(struct worker 
*worker, unsigned int flags,
WARN_ON_ONCE(worker->task != current);
 
/*

Re: [PATCH 08/30] i2c: s3c2410: make header file local

2013-04-14 Thread Heiko Stübner

On Sunday, 14 April 2013, 14:20:35, Wolfram Sang wrote:
> On Thu, Apr 11, 2013 at 02:04:50AM +0200, Arnd Bergmann wrote:
> > No other file in the kernel besides i2c-s3c2410.c uses the current
> > plat/regs-iic.h, so we can simply move the header file to live in the
> > same directory as the driver, as a preparation to multiplatform builds.
> 
> What about putting the regs in the driver itself?

They already are :-) [0], and Arnd will drop this patch in his next
iteration [1].


Heiko


[0] https://git.kernel.org/cgit/linux/kernel/git/wsa/linux.git/commit/?h=i2c/for-next&id=e636602ac2613da8c1777cb42443223994be4107

[1] Message-Id: <201304121011.13028.a...@arndb.de>

Re: [PATCH] ARM: KVM: Add missing break;

2013-04-14 Thread Marc Zyngier

Hi Joe,

On Sat, 13 Apr 2013 22:55:45 -0700, Joe Perches  wrote:
> commit 3401d54696f ("KVM: ARM: Introduce KVM_ARM_SET_DEVICE_ADDR ioctl")
> added the case, but omitted adding break;

[...]

Already reported here:
http://lists.infradead.org/pipermail/linux-arm-kernel/2013-April/160127.html

> $ grep -rP --include=*.[ch]
> "\b(\w+)\s*=[^;]+;\s*(?:case\s+\w+:|default:)\s*\1\s*="

Cool regexp! :-)
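
For context, the class of bug that regexp looks for is an assignment that
falls through into the next case label and is immediately overwritten.
A minimal sketch, with made-up names rather than the actual KVM code:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical device types; not taken from the KVM sources. */
enum demo_dev { DEMO_DEV_KNOWN, DEMO_DEV_OTHER };

/* Buggy: the DEMO_DEV_KNOWN result is overwritten by the default case. */
static int demo_set_addr_buggy(enum demo_dev dev)
{
	int r;

	switch (dev) {
	case DEMO_DEV_KNOWN:
		r = 0;
		/* falls through - the break is missing here */
	default:
		r = -ENODEV;
	}
	return r;
}

/* Fixed: the added break keeps the result of the matching case. */
static int demo_set_addr_fixed(enum demo_dev dev)
{
	int r;

	switch (dev) {
	case DEMO_DEV_KNOWN:
		r = 0;
		break;
	default:
		r = -ENODEV;
	}
	return r;
}
```

With the buggy variant a known device is rejected with -ENODEV even though
its case ran; the fixed variant returns 0 for it.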
 
Cheers,

M.
-- 
Fast, cheap, reliable. Pick two.

[PATCH v2] UCB1400: Pass ucb1400-gpio data through ac97 bus

2013-04-14 Thread Marek Vasut

Cc: Linus Walleij 
Cc: Jean Delvare 
Cc: Samuel Ortiz 
Cc: Mark Brown 
Cc: Guenter Roeck 
Cc: linux-kernel 
Cc: Grant Likely 
Signed-off-by: Marek Vasut 
---
 drivers/gpio/gpio-ucb1400.c |   19 ++-
 drivers/mfd/ucb1400_core.c  |5 +
 include/linux/ucb1400.h |   18 ++
 3 files changed, 17 insertions(+), 25 deletions(-)

v2: Rebase patch from:
http://lists.infradead.org/pipermail/linux-arm-kernel/2010-October/028656.html

NOTE: I didn't even compile-test this, but the fix was plenty straightforward.

diff --git a/drivers/gpio/gpio-ucb1400.c b/drivers/gpio/gpio-ucb1400.c
index 26405ef..6d0feb2 100644
--- a/drivers/gpio/gpio-ucb1400.c
+++ b/drivers/gpio/gpio-ucb1400.c
@@ -12,8 +12,6 @@
 #include 
 #include 
 
-struct ucb1400_gpio_data *ucbdata;
-
 static int ucb1400_gpio_dir_in(struct gpio_chip *gc, unsigned off)
 {
struct ucb1400_gpio *gpio;
@@ -50,7 +48,7 @@ static int ucb1400_gpio_probe(struct platform_device *dev)
struct ucb1400_gpio *ucb = dev->dev.platform_data;
int err = 0;
 
-   if (!(ucbdata && ucbdata->gpio_offset)) {
+   if (!(ucb && ucb->gpio_offset)) {
err = -EINVAL;
goto err;
}
@@ -58,7 +56,7 @@ static int ucb1400_gpio_probe(struct platform_device *dev)
platform_set_drvdata(dev, ucb);
 
ucb->gc.label = "ucb1400_gpio";
-   ucb->gc.base = ucbdata->gpio_offset;
+   ucb->gc.base = ucb->gpio_offset;
ucb->gc.ngpio = 10;
ucb->gc.owner = THIS_MODULE;
 
@@ -72,8 +70,8 @@ static int ucb1400_gpio_probe(struct platform_device *dev)
if (err)
goto err;
 
-   if (ucbdata && ucbdata->gpio_setup)
-   err = ucbdata->gpio_setup(&dev->dev, ucb->gc.ngpio);
+   if (ucb && ucb->gpio_setup)
+   err = ucb->gpio_setup(&dev->dev, ucb->gc.ngpio);
 
 err:
return err;
@@ -85,8 +83,8 @@ static int ucb1400_gpio_remove(struct platform_device *dev)
int err = 0;
struct ucb1400_gpio *ucb = platform_get_drvdata(dev);
 
-   if (ucbdata && ucbdata->gpio_teardown) {
-   err = ucbdata->gpio_teardown(&dev->dev, ucb->gc.ngpio);
+   if (ucb && ucb->gpio_teardown) {
+   err = ucb->gpio_teardown(&dev->dev, ucb->gc.ngpio);
if (err)
return err;
}
@@ -103,11 +101,6 @@ static struct platform_driver ucb1400_gpio_driver = {
},
 };
 
-void __init ucb1400_gpio_set_data(struct ucb1400_gpio_data *data)
-{
-   ucbdata = data;
-}
-
 module_platform_driver(ucb1400_gpio_driver);
 
 MODULE_DESCRIPTION("Philips UCB1400 GPIO driver");
diff --git a/drivers/mfd/ucb1400_core.c b/drivers/mfd/ucb1400_core.c
index daf6952..e9031fa 100644
--- a/drivers/mfd/ucb1400_core.c
+++ b/drivers/mfd/ucb1400_core.c
@@ -75,6 +75,11 @@ static int ucb1400_core_probe(struct device *dev)
 
/* GPIO */
ucb_gpio.ac97 = ac97;
+   if (pdata) {
+   ucb_gpio.gpio_setup = pdata->gpio_setup;
+   ucb_gpio.gpio_teardown = pdata->gpio_teardown;
+   ucb_gpio.gpio_offset = pdata->gpio_offset;
+   }
ucb->ucb1400_gpio = platform_device_alloc("ucb1400_gpio", -1);
if (!ucb->ucb1400_gpio) {
err = -ENOMEM;
diff --git a/include/linux/ucb1400.h b/include/linux/ucb1400.h
index d21b33c..2e9ee4d 100644
--- a/include/linux/ucb1400.h
+++ b/include/linux/ucb1400.h
@@ -83,15 +83,12 @@
 #define UCB_ID 0x7e
 #define UCB_ID_1400 0x4304
 
-struct ucb1400_gpio_data {
-   int gpio_offset;
-   int (*gpio_setup)(struct device *dev, int ngpio);
-   int (*gpio_teardown)(struct device *dev, int ngpio);
-};
-
 struct ucb1400_gpio {
struct gpio_chipgc;
struct snd_ac97 *ac97;
+   int gpio_offset;
+   int (*gpio_setup)(struct device *dev, int ngpio);
+   int (*gpio_teardown)(struct device *dev, int ngpio);
 };
 
 struct ucb1400_ts {
@@ -110,6 +107,9 @@ struct ucb1400 {
 
 struct ucb1400_pdata {
int irq;
+   int gpio_offset;
+   int (*gpio_setup)(struct device *dev, int ngpio);
+   int (*gpio_teardown)(struct device *dev, int ngpio);
 };
 
 static inline u16 ucb1400_reg_read(struct snd_ac97 *ac97, u16 reg)
@@ -162,10 +162,4 @@ static inline void ucb1400_adc_disable(struct snd_ac97 
*ac97)
 unsigned int ucb1400_adc_read(struct snd_ac97 *ac97, u16 adc_channel,
  int adcsync);
 
-#ifdef CONFIG_GPIO_UCB1400
-void __init ucb1400_gpio_set_data(struct ucb1400_gpio_data *data);
-#else
-static inline void ucb1400_gpio_set_data(struct ucb1400_gpio_data *data) {}
-#endif
-
 #endif
-- 
1.7.10.4
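
The idea of the patch, handing the GPIO settings to the child device through
its own data instead of through a file-scope global, can be modelled in
userspace roughly like this. It is only a sketch: the demo_* types and
functions are hypothetical simplifications, not the driver's real structures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical platform data supplied by board code. */
struct demo_pdata {
	int irq;
	int gpio_offset;
	int (*gpio_setup)(int ngpio);
};

/* Hypothetical per-device GPIO state; carries its own copy of the settings. */
struct demo_gpio {
	int base;
	int ngpio;
	int gpio_offset;
	int (*gpio_setup)(int ngpio);
};

/* Example board callback: accepts exactly 10 GPIOs. */
static int demo_board_setup(int ngpio)
{
	return ngpio == 10 ? 0 : -EINVAL;
}

/* Core driver: copy the GPIO settings from pdata into the child's data. */
static void demo_core_fill(struct demo_gpio *gpio, const struct demo_pdata *pdata)
{
	if (pdata) {
		gpio->gpio_setup = pdata->gpio_setup;
		gpio->gpio_offset = pdata->gpio_offset;
	}
}

/* GPIO driver: reads only the data it was handed, no global involved. */
static int demo_gpio_probe(struct demo_gpio *gpio)
{
	if (!(gpio && gpio->gpio_offset))
		return -EINVAL;

	gpio->base = gpio->gpio_offset;
	gpio->ngpio = 10;

	if (gpio->gpio_setup)
		return gpio->gpio_setup(gpio->ngpio);
	return 0;
}
```

Because the probe function depends only on its argument, two devices with
different offsets no longer share one global pointer.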


[PATCH 2/4] cgroup: drop hierarchy_id_lock

2013-04-14 Thread Tejun Heo

Now that hierarchy_id alloc / free are protected by the cgroup
mutexes, there's no need for this separate lock.  Drop it.

Signed-off-by: Tejun Heo 
---
 kernel/cgroup.c | 23 +--
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 823cb56..e15bdb7 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -237,9 +237,13 @@ struct cgroup_event {
 static LIST_HEAD(roots);
 static int root_count;
 
+/*
+ * Hierarchy ID allocation and mapping.  It follows the same exclusion
+ * rules as other root ops - both cgroup_mutex and cgroup_root_mutex for
+ * writes, either for reads.
+ */
 static DEFINE_IDA(hierarchy_ida);
 static int next_hierarchy_id;
-static DEFINE_SPINLOCK(hierarchy_id_lock);
 
 /* dummytop is a shorthand for the dummy hierarchy's top cgroup */
 #define dummytop (&rootnode.top_cgroup)
@@ -1456,10 +1460,12 @@ static int cgroup_init_root_id(struct cgroupfs_root 
*root)
 {
int ret;
 
+   lockdep_assert_held(&cgroup_mutex);
+   lockdep_assert_held(&cgroup_root_mutex);
+
do {
if (!ida_pre_get(&hierarchy_ida, GFP_KERNEL))
return -ENOMEM;
-   spin_lock(&hierarchy_id_lock);
/* Try to allocate the next unused ID */
ret = ida_get_new_above(&hierarchy_ida, next_hierarchy_id,
&root->hierarchy_id);
@@ -1472,18 +1478,17 @@ static int cgroup_init_root_id(struct cgroupfs_root 
*root)
/* Can only get here if the 31-bit IDR is full ... */
BUG_ON(ret);
}
-   spin_unlock(&hierarchy_id_lock);
} while (ret);
return 0;
 }
 
 static void cgroup_exit_root_id(struct cgroupfs_root *root)
 {
+   lockdep_assert_held(&cgroup_mutex);
+   lockdep_assert_held(&cgroup_root_mutex);
+
if (root->hierarchy_id) {
-   spin_lock(&hierarchy_id_lock);
ida_remove(&hierarchy_ida, root->hierarchy_id);
-   spin_unlock(&hierarchy_id_lock);
-
root->hierarchy_id = 0;
}
 }
@@ -4656,8 +4661,14 @@ int __init cgroup_init(void)
hash_add(css_set_table, &init_css_set.hlist, key);
 
/* allocate id for the dummy hierarchy */
+   mutex_lock(&cgroup_mutex);
+   mutex_lock(&cgroup_root_mutex);
+
BUG_ON(cgroup_init_root_id(&rootnode));
 
+   mutex_unlock(&cgroup_root_mutex);
+   mutex_unlock(&cgroup_mutex);
+
cgroup_kobj = kobject_create_and_add("cgroup", fs_kobj);
if (!cgroup_kobj) {
err = -ENOMEM;
-- 
1.8.1.4


[PATCH 4/4] cgroup: implement task_cgroup_path_from_hierarchy()

2013-04-14 Thread Tejun Heo

kdbus folks want a sane way to determine the cgroup path that a given
task belongs to on a given hierarchy, which is a reasonable thing to
expect from cgroup core.

Implement task_cgroup_path_from_hierarchy().

Signed-off-by: Tejun Heo 
Cc: Kay Sievers 
Cc: Greg Kroah-Hartman 
Cc: Lennart Poettering 
Cc: Daniel Mack 
---
 include/linux/cgroup.h |  2 ++
 kernel/cgroup.c| 33 +
 2 files changed, 35 insertions(+)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 17ed818..ee83af2 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -443,6 +443,8 @@ int cgroup_is_removed(const struct cgroup *cgrp);
 bool cgroup_is_descendant(struct cgroup *cgrp, struct cgroup *ancestor);
 
 int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen);
+int task_cgroup_path_from_hierarchy(struct task_struct *task, int hierarchy_id,
+   char *buf, size_t buflen);
 
 int cgroup_task_count(const struct cgroup *cgrp);
 
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 75d85e8..5184fcd 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1842,6 +1842,39 @@ out:
 }
 EXPORT_SYMBOL_GPL(cgroup_path);
 
+/**
+ * task_cgroup_path_from_hierarchy - cgroup path of a task on a hierarchy
+ * @task: target task
+ * @hierarchy_id: the hierarchy to look up @task's cgroup from
+ * @buf: the buffer to write the path into
+ * @buflen: the length of the buffer
+ *
+ * Determine @task's cgroup on the hierarchy specified by @hierarchy_id and
+ * copy its path into @buf.  This function grabs cgroup_mutex and shouldn't
+ * be used inside locks used by cgroup controller callbacks.
+ */
+int task_cgroup_path_from_hierarchy(struct task_struct *task, int hierarchy_id,
+   char *buf, size_t buflen)
+{
+   struct cgroupfs_root *root;
+   struct cgroup *cgrp = NULL;
+   int ret = -ENOENT;
+
+   mutex_lock(&cgroup_mutex);
+
+   root = idr_find(&cgroup_hierarchy_idr, hierarchy_id);
+   if (root) {
+   cgrp = task_cgroup_from_root(task, root);
+   if (cgrp)
+   ret = cgroup_path(cgrp, buf, buflen);
+   }
+
+   mutex_unlock(&cgroup_mutex);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(task_cgroup_path_from_hierarchy);
+
 /*
  * Control Group taskset
  */
-- 
1.8.1.4
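
The lookup flow of the new function (find the root by hierarchy ID, then copy
the task's cgroup path into the caller's buffer, returning -ENOENT when the
hierarchy does not exist) can be modelled in userspace as below. This is a
sketch under loud assumptions: struct demo_root stores one precomputed path
per hierarchy, locking is omitted, and the -ENAMETOOLONG return for a short
buffer is this sketch's choice. None of the demo_* names exist in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: one root per hierarchy, with the task's path on it. */
struct demo_root {
	int hierarchy_id;
	const char *task_path;
};

/* Stand-in for the ID lookup: linear search by hierarchy ID. */
static const struct demo_root *demo_find_root(const struct demo_root *roots,
					      size_t nr, int hierarchy_id)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (roots[i].hierarchy_id == hierarchy_id)
			return &roots[i];
	return NULL;
}

/* Copy the task's path on @hierarchy_id into @buf; 0 or -errno. */
static int demo_task_path_from_hierarchy(const struct demo_root *roots, size_t nr,
					 int hierarchy_id, char *buf, size_t buflen)
{
	const struct demo_root *root = demo_find_root(roots, nr, hierarchy_id);
	size_t len;

	if (!root)
		return -ENOENT;

	len = strlen(root->task_path);
	if (len + 1 > buflen)
		return -ENAMETOOLONG;

	memcpy(buf, root->task_path, len + 1);	/* includes the trailing NUL */
	return 0;
}
```

A caller would pass the hierarchy ID it is interested in and a buffer large
enough for the longest path it expects, and check the return value before
using the buffer.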


[PATCH 1/4] cgroup: refactor hierarchy_id handling

2013-04-14 Thread Tejun Heo

We're planning to convert hierarchy_ida to an idr and use it to
look up a hierarchy from its id.  As we want the mapping to happen
atomically with cgroupfs_root registration, this patch refactors
hierarchy_id init / exit so that ida operations happen inside
cgroup_[root_]mutex.

* s/init_root_id()/cgroup_init_root_id()/ and make it return 0 or
  -errno like a normal function.

* Move hierarchy_id initialization from cgroup_root_from_opts() into
  cgroup_mount() block where the root is confirmed to be used and
  being registered while holding both mutexes.

* Split cgroup_drop_id() into cgroup_exit_root_id() and
  cgroup_free_root(), so that ID release can happen before dropping
  the mutexes in cgroup_kill_sb().  The latter expects hierarchy_id to
  be exited before being invoked.

Signed-off-by: Tejun Heo 
---
 kernel/cgroup.c | 56 +++-
 1 file changed, 35 insertions(+), 21 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index a790409..823cb56 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1452,13 +1452,13 @@ static void init_cgroup_root(struct cgroupfs_root *root)
list_add_tail(&cgrp->allcg_node, &root->allcg_list);
 }
 
-static bool init_root_id(struct cgroupfs_root *root)
+static int cgroup_init_root_id(struct cgroupfs_root *root)
 {
-   int ret = 0;
+   int ret;
 
do {
if (!ida_pre_get(&hierarchy_ida, GFP_KERNEL))
-   return false;
+   return -ENOMEM;
spin_lock(&hierarchy_id_lock);
/* Try to allocate the next unused ID */
ret = ida_get_new_above(&hierarchy_ida, next_hierarchy_id,
@@ -1474,7 +1474,18 @@ static bool init_root_id(struct cgroupfs_root *root)
}
spin_unlock(&hierarchy_id_lock);
} while (ret);
-   return true;
+   return 0;
+}
+
+static void cgroup_exit_root_id(struct cgroupfs_root *root)
+{
+   if (root->hierarchy_id) {
+   spin_lock(&hierarchy_id_lock);
+   ida_remove(&hierarchy_ida, root->hierarchy_id);
+   spin_unlock(&hierarchy_id_lock);
+
+   root->hierarchy_id = 0;
+   }
 }
 
 static int cgroup_test_super(struct super_block *sb, void *data)
@@ -1508,10 +1519,6 @@ static struct cgroupfs_root 
*cgroup_root_from_opts(struct cgroup_sb_opts *opts)
if (!root)
return ERR_PTR(-ENOMEM);
 
-   if (!init_root_id(root)) {
-   kfree(root);
-   return ERR_PTR(-ENOMEM);
-   }
init_cgroup_root(root);
 
root->subsys_mask = opts->subsys_mask;
@@ -1526,17 +1533,15 @@ static struct cgroupfs_root 
*cgroup_root_from_opts(struct cgroup_sb_opts *opts)
return root;
 }
 
-static void cgroup_drop_root(struct cgroupfs_root *root)
+static void cgroup_free_root(struct cgroupfs_root *root)
 {
-   if (!root)
-   return;
+   if (root) {
+   /* hierarchy ID should already have been released */
+   WARN_ON_ONCE(root->hierarchy_id);
 
-   BUG_ON(!root->hierarchy_id);
-   spin_lock(&hierarchy_id_lock);
-   ida_remove(&hierarchy_ida, root->hierarchy_id);
-   spin_unlock(&hierarchy_id_lock);
-   ida_destroy(&root->cgroup_ida);
-   kfree(root);
+   ida_destroy(&root->cgroup_ida);
+   kfree(root);
+   }
 }
 
 static int cgroup_set_super(struct super_block *sb, void *data)
@@ -1623,7 +1628,7 @@ static struct dentry *cgroup_mount(struct 
file_system_type *fs_type,
sb = sget(fs_type, cgroup_test_super, cgroup_set_super, 0, &opts);
if (IS_ERR(sb)) {
ret = PTR_ERR(sb);
-   cgroup_drop_root(opts.new_root);
+   cgroup_free_root(opts.new_root);
goto drop_modules;
}
 
@@ -1667,6 +1672,10 @@ static struct dentry *cgroup_mount(struct 
file_system_type *fs_type,
if (ret)
goto unlock_drop;
 
+   ret = cgroup_init_root_id(root);
+   if (ret)
+   goto unlock_drop;
+
ret = rebind_subsystems(root, root->subsys_mask);
if (ret == -EBUSY) {
free_cg_links(&tmp_cg_links);
@@ -1710,7 +1719,7 @@ static struct dentry *cgroup_mount(struct 
file_system_type *fs_type,
 * We re-used an existing hierarchy - the new root (if
 * any) is not needed
 */
-   cgroup_drop_root(opts.new_root);
+   cgroup_free_root(opts.new_root);
/* no subsys rebinding, so refcounts don't change */
drop_parsed_module_refcounts(opts.subsys_mask);
}
@@ -1720,6 +1729,7 @@ static struct dentry *cgroup_mount(struct 
file_system_type *fs_type,
return dget(sb->s_root);
 
  unlock_drop:
+   cgroup_exit_root_id(root);
mutex_unlock(&cgroup_root_mutex);
mutex_unl

[PATCH 3/4] cgroup: make hierarchy_id use cyclic idr

2013-04-14 Thread Tejun Heo

We want to be able to look up a hierarchy from its id, and cyclic
allocation is a whole lot simpler with idr.  Convert to idr and use
idr_alloc_cyclic().

Signed-off-by: Tejun Heo 
---
 kernel/cgroup.c | 28 
 1 file changed, 8 insertions(+), 20 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index e15bdb7..75d85e8 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -242,8 +242,7 @@ static int root_count;
  * rules as other root ops - both cgroup_mutex and cgroup_root_mutex for
  * writes, either for reads.
  */
-static DEFINE_IDA(hierarchy_ida);
-static int next_hierarchy_id;
+static DEFINE_IDR(cgroup_hierarchy_idr);
 
 /* dummytop is a shorthand for the dummy hierarchy's top cgroup */
 #define dummytop (&rootnode.top_cgroup)
@@ -1458,27 +1457,16 @@ static void init_cgroup_root(struct cgroupfs_root *root)
 
 static int cgroup_init_root_id(struct cgroupfs_root *root)
 {
-   int ret;
+   int id;
 
lockdep_assert_held(&cgroup_mutex);
lockdep_assert_held(&cgroup_root_mutex);
 
-   do {
-   if (!ida_pre_get(&hierarchy_ida, GFP_KERNEL))
-   return -ENOMEM;
-   /* Try to allocate the next unused ID */
-   ret = ida_get_new_above(&hierarchy_ida, next_hierarchy_id,
-   &root->hierarchy_id);
-   if (ret == -ENOSPC)
-   /* Try again starting from 0 */
-   ret = ida_get_new(&hierarchy_ida, &root->hierarchy_id);
-   if (!ret) {
-   next_hierarchy_id = root->hierarchy_id + 1;
-   } else if (ret != -EAGAIN) {
-   /* Can only get here if the 31-bit IDR is full ... */
-   BUG_ON(ret);
-   }
-   } while (ret);
+   id = idr_alloc_cyclic(&cgroup_hierarchy_idr, root, 2, 0, GFP_KERNEL);
+   if (id < 0)
+   return id;
+
+   root->hierarchy_id = id;
return 0;
 }
 
@@ -1488,7 +1476,7 @@ static void cgroup_exit_root_id(struct cgroupfs_root 
*root)
lockdep_assert_held(&cgroup_root_mutex);
 
if (root->hierarchy_id) {
-   ida_remove(&hierarchy_ida, root->hierarchy_id);
+   idr_remove(&cgroup_hierarchy_idr, root->hierarchy_id);
root->hierarchy_id = 0;
}
 }
-- 
1.8.1.4
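
To illustrate what "cyclic" allocation means here, the following userspace
sketch hands out the next free ID after the most recently allocated one and
wraps around to the start only when it runs off the end, so a just-freed ID
is not reused immediately. It is a toy model over a fixed-size table, not
the kernel's idr implementation; the demo_* names and the capacity of 8 are
assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define DEMO_MAX_IDS 8		/* hypothetical capacity: IDs 0..7 */

struct demo_cyclic {
	bool used[DEMO_MAX_IDS];
	int next;		/* where the next search begins */
};

/*
 * Allocate a free ID >= @start, searching from the position after the
 * last allocation and wrapping back to @start once.  Returns the ID or
 * a negative errno.
 */
static int demo_alloc_cyclic(struct demo_cyclic *a, int start)
{
	int span, first, i, id;

	if (start < 0 || start >= DEMO_MAX_IDS)
		return -EINVAL;

	span = DEMO_MAX_IDS - start;
	first = a->next > start ? a->next : start;

	for (i = 0; i < span; i++) {
		id = start + (first - start + i) % span;
		if (!a->used[id]) {
			a->used[id] = true;
			a->next = id + 1;
			return id;
		}
	}
	return -ENOSPC;
}

/* Release @id so that a later wrap-around can hand it out again. */
static void demo_free_id(struct demo_cyclic *a, int id)
{
	if (id >= 0 && id < DEMO_MAX_IDS)
		a->used[id] = false;
}
```

Starting at 2 and allocating three times yields 2, 3 and 4; after freeing 3
the next allocation returns 5 rather than 3, and 3 only comes back once the
search has wrapped past the end of the table.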


[PATCHSET] cgroup: implement task_cgroup_path_from_hierarchy()

2013-04-14 Thread Tejun Heo

kdbus folks want a sane way to determine the cgroup path that a given
task belongs to on a given hierarchy, which is a reasonable thing to
expect from cgroup core.

This patchset makes hierarchy_id allocation use idr instead of ida and
implements task_cgroup_path_from_hierarchy().  In the process, the
yucky ida cyclic allocation is replaced with idr_alloc_cyclic().

 0001-cgroup-refactor-hierarchy_id-handling.patch
 0002-cgroup-drop-hierarchy_id_lock.patch
 0003-cgroup-make-hierarchy_id-use-cyclic-idr.patch
 0004-cgroup-implement-task_cgroup_path_from_hierarchy.patch

0001-0002 prepare for conversion to idr, which 0003 does.

0004 implements the new function.

This patchset is on top of next-20130412 as the idr_alloc_cyclic() patch
is currently in -mm.  Given that this isn't an urgent thing and the
merge window is just around the corner, it'd probably be best to route
these through cgroup/for-3.11 once v3.10-rc1 drops.

These patches are also available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-task_cgroup_path_from_hierarchy

And it actually reduces LOC.  Woot Woot.

 include/linux/cgroup.h |2
 kernel/cgroup.c|  128 +
 2 files changed, 89 insertions(+), 41 deletions(-)

--
tejun

Re: [PATCH v2] UCB1400: Pass ucb1400-gpio data through ac97 bus

2013-04-14 Thread Marek Vasut

Dear Marek Vasut,

> Cc: Linus Walleij 
> Cc: Jean Delvare 
> Cc: Samuel Ortiz 
> Cc: Mark Brown 
> Cc: Guenter Roeck 
> Cc: linux-kernel 
> Cc: Grant Likely 
> Signed-off-by: Marek Vasut 
> ---
>  drivers/gpio/gpio-ucb1400.c |   19 ++-
>  drivers/mfd/ucb1400_core.c  |5 +
>  include/linux/ucb1400.h |   18 ++
>  3 files changed, 17 insertions(+), 25 deletions(-)
> 
> v2: Rebase patch from:
> http://lists.infradead.org/pipermail/linux-arm-kernel/2010-October/028656.html
> 
> NOTE: I didn't even compile-test this, but the fix was plenty
> straightforward.

But damn, this code is ugly. I'm retrospectively-ashamed.

Best regards,
Marek Vasut

[PATCH 1/2] ptrace/x86: simplify the "disable" logic in ptrace_write_dr7()

2013-04-14 Thread Oleg Nesterov

ptrace_write_dr7() looks unnecessarily complicated. We can factor
out the ptrace_modify_breakpoint() call and avoid doing "continue"
twice; we just need to pass the proper "disabled" argument to
ptrace_modify_breakpoint().

Signed-off-by: Oleg Nesterov 
---
 arch/x86/kernel/ptrace.c |   40 +++-
 1 files changed, 15 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 7a98b21..0649f16 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -637,9 +637,7 @@ static int ptrace_write_dr7(struct task_struct *tsk, 
unsigned long data)
struct thread_struct *thread = &(tsk->thread);
unsigned long old_dr7;
int i, orig_ret = 0, rc = 0;
-   int enabled, second_pass = 0;
-   unsigned len, type;
-   struct perf_event *bp;
+   int second_pass = 0;
 
data &= ~DR_CONTROL_RESERVED;
old_dr7 = ptrace_get_dr7(thread->ptrace_bps);
@@ -649,30 +647,22 @@ restore:
 * appropriate changes to each.
 */
for (i = 0; i < HBP_NUM; i++) {
-   enabled = decode_dr7(data, i, &len, &type);
-   bp = thread->ptrace_bps[i];
-
-   if (!enabled) {
-   if (bp) {
-   /*
-* Don't unregister the breakpoints right-away,
-* unless all register_user_hw_breakpoint()
-* requests have succeeded. This prevents
-* any window of opportunity for debug
-* register grabbing by other users.
-*/
-   if (!second_pass)
-   continue;
-
-   rc = ptrace_modify_breakpoint(bp, len, type,
- tsk, 1);
-   if (rc)
-   break;
-   }
-   continue;
+   unsigned len, type;
+   bool disabled = !decode_dr7(data, i, &len, &type);
+   struct perf_event *bp = thread->ptrace_bps[i];
+
+   if (disabled) {
+   /*
+* Don't unregister the breakpoints right-away, unless
+* all register_user_hw_breakpoint() requests have
+* succeeded. This prevents any window of opportunity
+* for debug register grabbing by other users.
+*/
+   if (!bp || !second_pass)
+   continue;
}
 
-   rc = ptrace_modify_breakpoint(bp, len, type, tsk, 0);
+   rc = ptrace_modify_breakpoint(bp, len, type, tsk, disabled);
if (rc)
break;
}
-- 
1.5.5.1


Re: [PATCH 4/4] cgroup: implement task_cgroup_path_from_hierarchy()

2013-04-14 Thread Greg KH

On Sun, Apr 14, 2013 at 11:36:59AM -0700, Tejun Heo wrote:
> kdbus folks want a sane way to determine the cgroup path that a given
> task belongs to on a given hierarchy, which is a reasonable thing to
> expect from cgroup core.
> 
> Implement task_cgroup_path_from_hierarchy().
> 
> Signed-off-by: Tejun Heo 
> Cc: Kay Sievers 
> Cc: Greg Kroah-Hartman 
> Cc: Lennart Poettering 
> Cc: Daniel Mack 

Thanks so much for doing this.

Acked-by: Greg Kroah-Hartman 

[PATCH 2/2] ptrace/x86: dont delay perf_event_disable() till second pass in ptrace_write_dr7()

2013-04-14 Thread Oleg Nesterov

ptrace_write_dr7() skips ptrace_modify_breakpoint(disabled => true)
unless second_pass is set. This buys nothing, complicates the code, and
means that we always run the main loop twice even if "disabled" was
never true.

The comment says:

Don't unregister the breakpoints right-away,
unless all register_user_hw_breakpoint()
requests have succeeded.

I think this logic was always wrong: hw_breakpoint_del() does not
free the slot, so perf_event_disable() can't hurt.

But in any case this looks unneeded nowadays, and contrary to what
the comment says we do not do register_user_hw_breakpoint(), this
was removed by 24f1e32c "hw-breakpoints: Rewrite the hw-breakpoints
layer on top of perf events".

Remove the "second_pass" check from the main loop and simplify the
code. Since we have to check "bp != NULL" anyway, the patch also
removes the same check in ptrace_modify_breakpoint() and moves the
comment into ptrace_write_dr7().

Signed-off-by: Oleg Nesterov 
---
 arch/x86/kernel/ptrace.c |   46 +-
 1 files changed, 17 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 0649f16..6814f27 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -609,14 +609,6 @@ ptrace_modify_breakpoint(struct perf_event *bp, int len, 
int type,
int gen_len, gen_type;
struct perf_event_attr attr;
 
-   /*
-* We should have at least an inactive breakpoint at this
-* slot. It means the user is writing dr7 without having
-* written the address register first
-*/
-   if (!bp)
-   return -EINVAL;
-
err = arch_bp_generic_fields(len, type, &gen_len, &gen_type);
if (err)
return err;
@@ -634,10 +626,10 @@ ptrace_modify_breakpoint(struct perf_event *bp, int len, 
int type,
  */
 static int ptrace_write_dr7(struct task_struct *tsk, unsigned long data)
 {
-   struct thread_struct *thread = &(tsk->thread);
+   struct thread_struct *thread = &tsk->thread;
unsigned long old_dr7;
-   int i, orig_ret = 0, rc = 0;
-   int second_pass = 0;
+   int i, ret = 0, rc = 0;
+   bool second_pass = false;
 
data &= ~DR_CONTROL_RESERVED;
old_dr7 = ptrace_get_dr7(thread->ptrace_bps);
@@ -651,35 +643,31 @@ restore:
bool disabled = !decode_dr7(data, i, &len, &type);
struct perf_event *bp = thread->ptrace_bps[i];
 
-   if (disabled) {
+   if (!bp) {
+   if (disabled)
+   continue;
/*
-* Don't unregister the breakpoints right-away, unless
-* all register_user_hw_breakpoint() requests have
-* succeeded. This prevents any window of opportunity
-* for debug register grabbing by other users.
+* We should have at least an inactive breakpoint at
+* this slot. It means the user is writing dr7 without
+* having written the address register first.
 */
-   if (!bp || !second_pass)
-   continue;
+   rc = -EINVAL;
+   break;
}
 
rc = ptrace_modify_breakpoint(bp, len, type, tsk, disabled);
if (rc)
break;
}
-   /*
-* Make a second pass to free the remaining unused breakpoints
-* or to restore the original breakpoints if an error occurred.
-*/
-   if (!second_pass) {
-   second_pass = 1;
-   if (rc < 0) {
-   orig_ret = rc;
-   data = old_dr7;
-   }
+   /* Make a second pass to restore the original breakpoints if failed */
+   if (!second_pass && rc) {
+   second_pass = true;
+   ret = rc;
+   data = old_dr7;
goto restore;
}
 
-   return orig_ret < 0 ? orig_ret : rc;
+   return ret;
 }
 
 /*
-- 
1.5.5.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
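
The control flow that remains after this patch can be modelled in plain
userspace C. This is only a sketch, not kernel code: struct slot,
model_get_dr7() and model_write_dr7() are hypothetical stand-ins for
thread->ptrace_bps[], ptrace_get_dr7() and ptrace_write_dr7(), only the
enable bits of dr7 are modelled, and the stand-in for
ptrace_modify_breakpoint() is assumed never to fail.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_SLOTS 4	/* DR0..DR3 */

/* Hypothetical stand-in for one thread->ptrace_bps[] entry. */
struct slot {
	bool registered;	/* address register was written, bp exists */
	bool enabled;		/* bp is currently active */
};

/* Like decode_dr7(): a slot counts as enabled if its L or G bit is set. */
static bool dr7_slot_enabled(unsigned long dr7, int i)
{
	return (dr7 >> (i * 2)) & 0x3;
}

/* Rebuild the enable bits from the model state (local-enable bits only). */
unsigned long model_get_dr7(const struct slot *slots)
{
	unsigned long dr7 = 0;
	int i;

	for (i = 0; i < NR_SLOTS; i++)
		if (slots[i].registered && slots[i].enabled)
			dr7 |= 1UL << (i * 2);
	return dr7;
}

int model_write_dr7(struct slot *slots, unsigned long data)
{
	unsigned long old_dr7;
	bool second_pass = false;
	int i, ret = 0, rc = 0;

	assert(slots != NULL);
	old_dr7 = model_get_dr7(slots);
restore:
	for (i = 0; i < NR_SLOTS; i++) {
		bool disabled = !dr7_slot_enabled(data, i);

		if (!slots[i].registered) {
			if (disabled)
				continue;
			/* enabling a slot whose address was never written */
			rc = -EINVAL;
			break;
		}
		/* stands in for ptrace_modify_breakpoint() */
		slots[i].enabled = !disabled;
		rc = 0;
	}
	/* Make a second pass to restore the original state on failure. */
	if (!second_pass && rc) {
		second_pass = true;
		ret = rc;
		data = old_dr7;
		goto restore;
	}
	return ret;
}
```

In this model, writing a dr7 value that enables an unregistered slot
returns -EINVAL and leaves the previously enabled slots as they were.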

[PATCH 0/2] ptrace/x86: simplify ptrace_write_dr7()

2013-04-14 Thread Oleg Nesterov

Hello.

On top of "[PATCH 0/5] kill ptrace_{get,put}_breakpoints()".
Cleanup and preparation for the potential fix, see below.

--
Now the question. Initially I was going to make more patches
and fix the regression introduced by 24f1e32c (although I am
not 100% sure exactly which patch should be blamed).

See https://bugzilla.redhat.com/show_bug.cgi?id=660204 for
details.

ptrace_write_dr7() does not create a bp if the slot is still NULL;
the comment says:

/*
 * We should have at least an inactive breakpoint at
 * this slot. It means the user is writing dr7 without
 * having written the address register first.
 */


and this looks logical. However, at least until 72f674d2
ptrace_set_debugreg(n => 7) worked even if addr wasn't set
by ptrace_set_debugreg(n => 0|1|2|3) before.

And note that ptrace_get_debugreg() does not fail if !ptrace_bps[n],
it just returns zero as if the address register had been written.
So there is no way to know whether the address was actually set,
which is neither good nor consistent.

Jan, Frederic, et al. What do you think we should do?

1. Change ptrace_write_dr7() to do register_user_hw_breakpoint()
   if necessary.

   This is what I was going to do, but I am no longer sure
   we want this. What for? It is unlikely to be very useful to
   debug with the "default" addr == 0.

2. Change ptrace_get_debugreg(0-3) to return -ESOMETHING if
   ptrace_bps[n] == NULL.

   This will match ptrace_set_debugreg(), but this can break
   something else...

3. Do nothing.

I am inclined to do "1", but please comment.

Oleg.


[PATCH 1/3] iommu: Move swap_pci_ref function to pci.h.

2013-04-14 Thread Varun Sethi

The swap_pci_ref function is used by the IOMMU API code to swap PCI device
pointers while determining the IOMMU group for a device.
Currently the AMD and Intel IOMMU drivers each carry their own copy of it.
This patch moves the function to pci.h so that the implementation can be
shared across IOMMU drivers.

Signed-off-by: Varun Sethi 
---
 drivers/iommu/amd_iommu.c   |6 --
 drivers/iommu/intel-iommu.c |6 --
 include/linux/pci.h |8 
 3 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index a7f6b04..c36c046 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -263,12 +263,6 @@ static bool check_device(struct device *dev)
return true;
 }
 
-static void swap_pci_ref(struct pci_dev **from, struct pci_dev *to)
-{
-   pci_dev_put(*from);
-   *from = to;
-}
-
 static struct pci_bus *find_hosted_bus(struct pci_bus *bus)
 {
while (!bus->self) {
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 6e0b9ff..8d7c979 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4137,12 +4137,6 @@ static int intel_iommu_domain_has_cap(struct iommu_domain *domain,
return 0;
 }
 
-static void swap_pci_ref(struct pci_dev **from, struct pci_dev *to)
-{
-   pci_dev_put(*from);
-   *from = to;
-}
-
 #define REQ_ACS_FLAGS  (PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF)
 
 static int intel_iommu_add_device(struct device *dev)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 2461033a..41511de 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1850,6 +1850,14 @@ static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev)
 }
 #endif
 
+#ifdef CONFIG_IOMMU_API
+static inline void swap_pci_ref(struct pci_dev **from, struct pci_dev *to)
+{
+   pci_dev_put(*from);
+   *from = to;
+}
+#endif
+
 /**
  * pci_find_upstream_pcie_bridge - find upstream PCIe-to-PCI bridge of a device
  * @pdev: the PCI device
-- 
1.7.4.1
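
The reference handling of swap_pci_ref() can be illustrated with a toy
refcount in userspace. This is a sketch under assumptions: struct toy_dev,
toy_dev_get() and toy_dev_put() are hypothetical stand-ins for struct
pci_dev, pci_dev_get() and pci_dev_put(). The point it shows is that the
helper drops the reference held through *from and adopts `to` without
taking a new reference, so the caller must already hold one on `to`.

```c
#include <assert.h>
#include <stddef.h>

/* Toy refcounted object; stands in for struct pci_dev. */
struct toy_dev {
	int refcount;
};

struct toy_dev *toy_dev_get(struct toy_dev *dev)
{
	if (dev)
		dev->refcount++;
	return dev;
}

void toy_dev_put(struct toy_dev *dev)
{
	if (dev) {
		assert(dev->refcount > 0);
		dev->refcount--;
	}
}

/* Same shape as swap_pci_ref(): drop the old reference, adopt the new one. */
void toy_swap_ref(struct toy_dev **from, struct toy_dev *to)
{
	toy_dev_put(*from);
	*from = to;
}
```

After toy_swap_ref(&cur, parent) the old device has lost one reference and
cur owns the reference that the caller took on parent.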



[PATCH 2/3 v12] iommu/fsl: Add additional iommu attributes required by the PAMU driver.

2013-04-14 Thread Varun Sethi

Added the following domain attributes for the FSL PAMU driver:
1. Added new iommu stash attribute, which allows setting of the
   LIODN specific stash id parameter through IOMMU API.
2. Added an attribute for enabling/disabling DMA to a particular
   memory window.
3. Added domain attribute to check for PAMUV1 specific constraints.

Signed-off-by: Varun Sethi 
---
-v12 changes:
- Moved PAMU specific stash ids and structures to PAMU header file.
- no change in v11.
- no change in v10.
 include/linux/iommu.h |   16 
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2727810..c5dc2b9 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -57,10 +57,26 @@ struct iommu_domain {
 #define IOMMU_CAP_CACHE_COHERENCY  0x1
 #define IOMMU_CAP_INTR_REMAP   0x2 /* isolates device intrs */
 
+/*
+ * Following constraints are specific to PAMUV1:
+ *  -aperture must be power of 2, and naturally aligned
+ *  -number of windows must be power of 2, and address space size
+ *   of each window is determined by aperture size / # of windows
+ *  -the actual size of the mapped region of a window must be power
+ *   of 2 starting with 4KB and physical address must be naturally
+ *   aligned.
+ * DOMAIN_ATTR_FSL_PAMUV1 corresponds to the above mentioned constraints.
+ * The caller can invoke iommu_domain_get_attr to check if the underlying
+ * iommu implementation supports these constraints.
+ */
+
 enum iommu_attr {
DOMAIN_ATTR_GEOMETRY,
DOMAIN_ATTR_PAGING,
DOMAIN_ATTR_WINDOWS,
+   DOMAIN_ATTR_PAMU_STASH,
+   DOMAIN_ATTR_PAMU_ENABLE,
+   DOMAIN_ATTR_FSL_PAMUV1,
DOMAIN_ATTR_MAX,
 };
 
-- 
1.7.4.1
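
The PAMUV1 constraints listed in the comment above can be written down as
simple predicates. This is a userspace sketch, not driver code: the
function names are hypothetical, and the rule that a mapping may not
exceed its window is an assumption that the comment only implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool is_pow2(uint64_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* Aperture: power-of-2 size, start aligned to size; window count power of 2. */
bool pamuv1_geometry_ok(uint64_t start, uint64_t size, uint32_t nr_windows)
{
	if (!is_pow2(size) || (start & (size - 1)))
		return false;
	return is_pow2(nr_windows);
}

/* Address space covered by each window: aperture size / number of windows. */
uint64_t pamuv1_window_size(uint64_t aperture_size, uint32_t nr_windows)
{
	assert(is_pow2(nr_windows));
	return aperture_size / nr_windows;
}

/* Mapping inside one window: power-of-2 size >= 4KB, paddr aligned to size. */
bool pamuv1_mapping_ok(uint64_t paddr, uint64_t map_size, uint64_t window_size)
{
	if (!is_pow2(map_size) || map_size < 4096)
		return false;
	if (map_size > window_size)	/* assumption: cannot exceed the window */
		return false;
	return (paddr & (map_size - 1)) == 0;
}
```

For example, a 1GB aperture at 0x40000000 split into 4 windows gives
256MB per window, and an 8KB mapping must start on an 8KB boundary.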



Re: [PATCH 0/2] ptrace/x86: simplify ptrace_write_dr7()

2013-04-14 Thread Jan Kratochvil

On Sun, 14 Apr 2013 21:12:05 +0200, Oleg Nesterov wrote:
> Jan, Frederic, et all. What do you think we should do?
> 
>   1. Change ptrace_write_dr7() to do register_user_hw_breakpoint()
>  if necessary.
> 
>  This is what I was going to do, but I am no longer sure
>  we want this. For what? Unlikely it is very useful to use
>  the "default" addr == 0 for debugging.

I do not understand how these functions map to the PTRACE_* syscall.

But this was a regression from the application point of view, as some
applications did/do:
* waitpid - get the process to: t (tracing stop)
* PTRACE_POKEUSER DR7, enableDR0
* PTRACE_POKEUSER DR0, address
* PTRACE_CONT

This was perfectly valid before; there is no "default" addr == 0 used for any
debugging.  The applications simply did not care about PTRACE_POKEUSER ordering.
This is also how the bug was found.


Thanks,
Jan

[PATCH 0/2] [RFC] blkdev: flush generation optimization

2013-04-14 Thread Dmitry Monakhov


Some filesystems try to optimize barrier flushes by maintaining
fs-specific generation counters. If we introduce a generic flush
generation counter for the block device, filesystems may use it for
fdatasync(2) optimization. The optimization should work when userspace
performs multi-threaded IO with a lot of fdatasync() calls.
Here are graphs for a test where each task performs random buffered writes
to a dedicated file and calls fdatasync(2) after each operation.

Axis: x=nr_tasks, y=write_iops
# Chunk server simulation workload
# Files 'chunk.$NUM_JOB.0' should be precreated before the test
# 
[global]
bs=4k
ioengine=psync
filesize=64M
size=8G
direct=0
runtime=30
directory=/mnt
fdatasync=1
group_reporting=1

[chunk]
overwrite=1
new_group=1
write_bw_log=bw.log
rw=randwrite
numjobs=${NUM_JOBS}
fsync=1
stonewall

TOC:
0001 blkdev: add flush generation counter
0002 ext4: Add fdatasync scalability optimization

[PATCH 2/2] ext4: Add fdatasync scalability optimization

2013-04-14 Thread Dmitry Monakhov

Track the blkdev's flush generation counter on a per-inode basis and update
it inside end_io. If the inode's flush generation counter is older than the
blkdev's current flush counter, the inode's data has already been flushed to
stable media, so we can skip the explicit barrier. The optimization is safe
only when the inode's end_io was called before the flush request was QUEUED
and COMPLETED.

With this optimization we no longer need the jbd2 flush optimization.

Signed-off-by: Dmitry Monakhov 
---
 fs/ext4/ext4.h  |1 +
 fs/ext4/ext4_jbd2.h |   10 +-
 fs/ext4/fsync.c |   16 +++-
 fs/ext4/inode.c |3 ++-
 fs/ext4/page-io.c   |2 +-
 5 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 75b2326..e2ec980 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -932,6 +932,7 @@ struct ext4_inode_info {
 */
tid_t i_sync_tid;
tid_t i_datasync_tid;
+   atomic_t i_flush_tag;
 
/* Precomputed uuid+inum+igen checksum for seeding inode checksums */
__u32 i_csum_seed;
diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index c8c6885..46943ed 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -365,7 +365,15 @@ static inline void ext4_update_inode_fsync_trans(handle_t *handle,
ei->i_sync_tid = handle->h_transaction->t_tid;
if (datasync)
ei->i_datasync_tid = handle->h_transaction->t_tid;
-   }
+   } else {
+   struct request_queue *q = bdev_get_queue(inode->i_sb->s_bdev);
+   if (q)
+   atomic_set(&EXT4_I(inode)->i_flush_tag,
+  atomic_read(&q->flush_tag));
+   else
+   atomic_set(&EXT4_I(inode)->i_flush_tag, UINT_MAX);
+   }
+
 }
 
 /* super.c */
diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
index 8a0dee8..b02d1ec 100644
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -116,10 +116,10 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
struct inode *inode = file->f_mapping->host;
struct ext4_inode_info *ei = EXT4_I(inode);
journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
+   bool needs_barrier = journal->j_flags & JBD2_BARRIER;
+   struct request_queue *q = bdev_get_queue(inode->i_sb->s_bdev);
int ret, err;
tid_t commit_tid;
-   bool needs_barrier = false;
-
J_ASSERT(ext4_journal_current_handle() == NULL);
 
trace_ext4_sync_file_enter(file, datasync);
@@ -163,10 +163,16 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
}
 
commit_tid = datasync ? ei->i_datasync_tid : ei->i_sync_tid;
-   if (journal->j_flags & JBD2_BARRIER &&
-   !jbd2_trans_will_send_data_barrier(journal, &commit_tid))
-   needs_barrier = true;
ret = jbd2_complete_transaction(journal, commit_tid);
+   /*
+* We must send a barrier unless we can guarantee that:
+* Latest io-request for given inode was completed before
+* new flush request was QUEUED and COMPLETED by blkdev.
+*/
+   if (q && ((unsigned int)atomic_read(&q->flush_tag) & ~1U)
+   > (((unsigned int)atomic_read(&ei->i_flush_tag) + 1U) & (~1U)))
+   needs_barrier = 0;
+
if (needs_barrier) {
err = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
if (!ret)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 1be5827..761513c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3073,11 +3073,12 @@ static void ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
  size);
 
iocb->private = NULL;
-
/* if not aio dio with unwritten extents, just free io and return */
if (!(io_end->flag & EXT4_IO_END_UNWRITTEN)) {
ext4_free_io_end(io_end);
 out:
+   if (size)
+   ext4_update_inode_fsync_trans(NULL, inode, 1);
inode_dio_done(inode);
if (is_async)
aio_complete(iocb, ret, 0);
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 047a6de..8a2a09b 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -282,7 +282,7 @@ static void ext4_end_bio(struct bio *bio, int error)
}
io_end->num_io_pages = 0;
inode = io_end->inode;
-
+   ext4_update_inode_fsync_trans(NULL, inode, 1);
if (error) {
io_end->flag |= EXT4_IO_END_ERROR;
ext4_warning(inode->i_sb, "I/O error writing to inode %lu "
-- 
1.7.1
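
The comparison added to ext4_sync_file() can be studied in isolation. The
sketch below copies only the arithmetic; can_skip_barrier() is a
hypothetical name. It assumes, as patch 1/2 suggests, that flush_tag is
incremented once when a flush is queued and once when it completes, and
that only one flush is in flight at a time, so an odd tag means "flush in
flight" and an even tag means "idle".

```c
#include <assert.h>
#include <stdbool.h>

/*
 * inode_tag is the queue tag sampled when the inode's last IO completed.
 * Rounding it up to the next even value skips a flush that was already in
 * flight at that moment; rounding queue_tag down ignores a flush that has
 * been queued but has not completed yet.
 */
bool can_skip_barrier(unsigned int queue_tag, unsigned int inode_tag)
{
	return (queue_tag & ~1U) > ((inode_tag + 1U) & ~1U);
}
```

With inode_tag == 4 (idle at end_io time) the barrier can be skipped once
the queue tag reaches 6. With inode_tag == 5 (a flush was already running)
that flush does not count, and the tag has to reach 8.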


[PATCH 1/2] blkdev: add flush generation counter

2013-04-14 Thread Dmitry Monakhov

Callers may use this counter to optimize flushes

Signed-off-by: Dmitry Monakhov 
---
 block/blk-core.c   |1 +
 block/blk-flush.c  |3 ++-
 include/linux/blkdev.h |1 +
 3 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 074b758..afb5a4b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -537,6 +537,7 @@ void blk_cleanup_queue(struct request_queue *q)
spin_unlock_irq(lock);
mutex_unlock(&q->sysfs_lock);
 
+   atomic_set(&q->flush_tag, 0);
/*
 * Drain all requests queued before DYING marking. Set DEAD flag to
 * prevent that q->request_fn() gets invoked after draining finished.
diff --git a/block/blk-flush.c b/block/blk-flush.c
index cc2b827..b1adc75 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -203,7 +203,7 @@ static void flush_end_io(struct request *flush_rq, int error)
/* account completion of the flush request */
q->flush_running_idx ^= 1;
elv_completed_request(q, flush_rq);
-
+   atomic_inc(&q->flush_tag);
/* and push the waiting requests to the next stage */
list_for_each_entry_safe(rq, n, running, flush.list) {
unsigned int seq = blk_flush_cur_seq(rq);
@@ -268,6 +268,7 @@ static bool blk_kick_flush(struct request_queue *q)
q->flush_rq.end_io = flush_end_io;
 
q->flush_pending_idx ^= 1;
+   atomic_inc(&q->flush_tag);
list_add_tail(&q->flush_rq.queuelist, &q->queue_head);
return true;
 }
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 78feda9..e079fbd 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -416,6 +416,7 @@ struct request_queue {
unsigned int            flush_queue_delayed:1;
unsigned int            flush_pending_idx:1;
unsigned int            flush_running_idx:1;
+   atomic_t                flush_tag;
unsigned long           flush_pending_since;
struct list_head        flush_queue[2];
struct list_head        flush_data_in_flight;
-- 
1.7.1
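
How the counter evolves can be seen in a small state machine. This is a
model only: struct toy_queue, toy_kick_flush() and toy_flush_end_io() are
hypothetical stand-ins for struct request_queue, blk_kick_flush() and
flush_end_io(), and the model assumes that a new flush is not started
while another one is still running.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_queue {
	unsigned int flush_tag;
	bool flush_in_flight;
};

/* Models blk_kick_flush(): refuse while a flush is already running. */
bool toy_kick_flush(struct toy_queue *q)
{
	if (q->flush_in_flight)
		return false;
	q->flush_in_flight = true;
	q->flush_tag++;		/* becomes odd: flush queued */
	return true;
}

/* Models flush_end_io(): the running flush has completed. */
void toy_flush_end_io(struct toy_queue *q)
{
	assert(q->flush_in_flight);
	q->flush_in_flight = false;
	q->flush_tag++;		/* becomes even: flush completed */
}
```

Under these assumptions every complete flush cycle advances the tag by
two, which is what lets a caller tell a finished flush from a queued one.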


Re: [PATCH 0/3] freescale: Update logging style

2013-04-14 Thread David Miller

From: Joe Perches 
Date: Sat, 13 Apr 2013 22:03:16 -0700

> Convert various printk logging styles to current styles.
> 
> Uncompiled, untested.
> 
> Joe Perches (3):
>   fec: Convert printks to netdev_
>   gianfar: Use netdev_ when possible
>   ucc_geth: Convert ugeth_ to pr_

All applied.

Re: [PATCH 0/2] ptrace/x86: simplify ptrace_write_dr7()

2013-04-14 Thread Oleg Nesterov

On 04/14, Jan Kratochvil wrote:
>
> On Sun, 14 Apr 2013 21:12:05 +0200, Oleg Nesterov wrote:
> > Jan, Frederic, et all. What do you think we should do?
> >
> > 1. Change ptrace_write_dr7() to do register_user_hw_breakpoint()
> >if necessary.
> >
> >This is what I was going to do, but I am no longer sure
> >we want this. For what? Unlikely it is very useful to use
> >the "default" addr == 0 for debugging.
>
> I do not understand how these functions map to the PTRACE_* syscall.
>
> But this was a regression from the application point of view as some
> application did/do:
>   * waitpid - get the process to: t (tracing stop)
>   * PTRACE_POKEUSER DR7, enableDR0
>   * PTRACE_POKEUSER DR0, address
>   * PTRACE_CONT
>
> This was perfectly valid before, there is no "default" addr == 0 used for any
> debugging.  Just the applications did not care about PTRACE_POKEUSER ordering.
> This is also how the bug was found.

Yes, exactly.

Except 'there is no "default" addr == 0', the first
"PTRACE_POKEUSER DR7, enableDR0" used addr == 0 and then it was
changed by "PTRACE_POKEUSER DR0".

And once again, I am ready to make the patch, it should be simple.
I am just not sure it is worth the trouble, so I decided to ask first.
Nobody noticed this problem(?) except you, and this was broken
long ago.

PTRACE_POKEUSER DR0, address
PTRACE_POKEUSER DR7, enableDR0

should work and this looks better: we do not enable the bp until it
has the correct address set.  Of course this doesn't really matter
if the tracee does not run in between, but still...

Oleg.
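
For reference, the address-first order could look like this from the
debugger side. This is a sketch, not taken from any application in this
thread: dr7_local_enable(), poke_debugreg() and set_write_watchpoint() are
hypothetical helpers, the BP_* names are made up for the standard x86 DR7
field encodings, and the ptrace part is only built on x86 Linux.

```c
#include <assert.h>

/* x86 DR7 access types (R/W field) */
#define BP_RW_EXEC	0x0UL
#define BP_RW_WRITE	0x1UL
#define BP_RW_RDWR	0x3UL
/* x86 DR7 lengths (LEN field) */
#define BP_LEN_1	0x0UL
#define BP_LEN_2	0x1UL
#define BP_LEN_4	0x3UL
#define BP_LEN_8	0x2UL

/* DR7 bits that locally enable breakpoint `slot` with the given type/length. */
unsigned long dr7_local_enable(int slot, unsigned long rw, unsigned long len)
{
	assert(slot >= 0 && slot < 4);
	assert(rw <= 3 && len <= 3);
	return (1UL << (slot * 2)) | ((rw | (len << 2)) << (16 + slot * 4));
}

#if defined(__linux__) && (defined(__x86_64__) || defined(__i386__))
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>

/* Write debug register `n` of a stopped tracee via PTRACE_POKEUSER. */
static long poke_debugreg(pid_t pid, int n, unsigned long val)
{
	size_t off = offsetof(struct user, u_debugreg) +
		     n * sizeof(((struct user *)0)->u_debugreg[0]);

	return ptrace(PTRACE_POKEUSER, pid, (void *)off, (void *)val);
}

/* Address first, then enable: the order described above as working. */
long set_write_watchpoint(pid_t pid, unsigned long addr)
{
	if (poke_debugreg(pid, 0, addr) < 0)
		return -1;
	return poke_debugreg(pid, 7,
			     dr7_local_enable(0, BP_RW_WRITE, BP_LEN_4));
}
#endif
```

The reverse order (DR7 first, then DR0) is the sequence Jan describes as
failing since the regression.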


Re: [PATCH V4 1/6] clk: OMAP: introduce device tree binding to kernel clock data

2013-04-14 Thread Nishanth Menon

On 10:22-20130413, Tony Lindgren wrote:
> * Nishanth Menon  [130412 16:43]:
> > Thanks for checking up. Fixed all of them below, will post part of
> > series again, only if I need to address further comments in other
> > patches..
> 
> Thanks it seems that the other ones are ready to go, just one
> more comment below.
> 
[...]
> > +static struct clk *omap_clk_src_get(struct of_phandle_args *clkspec, void *data)
> > +{
> > +   struct clk *clk;
> > +   char clk_name[32];
> > +   struct device_node *np = clkspec->np;
> > +
> > +   snprintf(clk_name, 32, "%s_ck", np->name);
> > +   clk = clk_get(NULL, clk_name);
> > +   if (IS_ERR(clk)) {
> > +   pr_err("%s: could not get clock %s(%ld)\n", __func__,
> > +  clk_name, PTR_ERR(clk));
> > +   goto out;
> > +   }
> > +   clk_put(clk);
> 
> It seems that clk_put() is actually wrong here. That's because
> of_clk_get() should boil down to just the look up of the clock 
> and then clk_get() on it, so no double clk_get() is done in this
> case. Once the consumer driver is done, it will just call clk_put()
> on it.
Yep - updated version below.
From d0bf3fce235cff46feac7f5ef1a40e2fa0f2aa12 Mon Sep 17 00:00:00 2001
From: Nishanth Menon 
Date: Tue, 9 Apr 2013 19:26:40 -0500
Subject: [PATCH V5.1 1/6] clk: OMAP: introduce device tree binding to kernel clock data

OMAP clock data is located in arch/arm/mach-omap2/cclockXYZ_data.c.
However, this presents an obstacle for using these clock nodes in
Device Tree definitions. This is especially true for board specific
clocks initially. The fixed clocks are currently found via clock
aliases table. There are many possible approaches to this problem as
discussed in the following thread:
http://marc.info/?t=13637032569&r=1&w=2.
Highlights of the options:
a) device specific clk_add_alias:
   cons: driver handling required
b) using a generic clk node and indexing to reach the clock required.
   This is similar to the approach taken by tegra and a few other platforms.
   Example usage: clock = <&clk 5>;
   cons: potential to have mismatches in indexed table and associated
   dtb data. In addition, managing continued documentation in bindings
   as clock indexing increases. Even though readability angle could be
   improved by using preprocessing of DT using macros, indexed
   approach is inherently risky from cases like the following:
   clk indexes in kernel:
   1 - mpu_dpll
   2 - aux_clk1
   3 - core_clk
   DT entry for peripheral X uses <&clk 2> to reach aux_clk1. Now, let's
   say kernel updates indices to:
   1 - mpu_dpll
   2 - per_dpll
   3 - aux_clk1
   4 - core_clk
   Using the old dtb (or a dts missing an update) on a new kernel which
   has updated indices will result in per_dpll now being controlled for
   peripheral X, without warning or any potential error detection.

   Even though we could claim this is user error, such errors are hard
   to track down and fix.

An alternate approach introduced here is to introduce device tree
bindings corresponding to the clock nodes required in DT definition
for SoC which automatically maps back to the definitions in
cclockXYZ_data.c.

The driver introduced here to do this mapping will eventually be the
place where the clock handling will migrate to. We need to consider
this angle as well so that the solution will be a valid transition
point for moving the clock data out of kernel image (into device tree
or firmware load etc.).

Overall strategy introduced here is simple: a clock node described in
device tree blob is used to identify the exact clock provided in the
SoC specific data. This is then linked back using of_clk_add_provider
to the device node to be accessible by of_clk_get.

Based on discussion contributions from Roger Quadros, Grygorii Strashko
and others.

Cc: Kevin Hilman 
Cc: Mike Turquette 
Cc: Paul Walmsley 
[t...@atomide.com: co-developed]
Signed-off-by: Tony Lindgren 
Signed-off-by: Nishanth Menon 
---
 .../devicetree/bindings/clock/omap-clock.txt   |   40 +
 drivers/clk/Makefile   |1 +
 drivers/clk/omap/Makefile  |1 +
 drivers/clk/omap/clk.c |   91 
 4 files changed, 133 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/clock/omap-clock.txt
 create mode 100644 drivers/clk/omap/Makefile
 create mode 100644 drivers/clk/omap/clk.c

diff --git a/Documentation/devicetree/bindings/clock/omap-clock.txt b/Documentation/devicetree/bindings/clock/omap-clock.txt
new file mode 100644
index 0000000..047c1e7
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/omap-clock.txt
@@ -0,0 +1,40 @@
+Device Tree Clock bindings for Texas Instruments' OMAP compatible platforms
+
+This binding is an initial minimal binding that may be enhanced as part of
+transitioning OMAP clock data out of kernel image.
+
+This binding uses the common clock binding[1].
+
+[1] Documentation/devicetree/bindings/clock/clock-bindings.txt
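
The index-mismatch hazard from option (b) of the changelog, and the
name-based lookup this patch uses instead, can be reproduced in a few
lines of userspace C. This is an illustration only: clk_by_index() and
clk_by_node_name() are hypothetical helpers, the tables are plain string
arrays using the example clock names from the changelog, and only the
"%s_ck" naming is taken from omap_clk_src_get().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* 1-based lookup, as in "clock = <&clk 2>": depends on table order. */
const char *clk_by_index(const char *const *table, size_t n, size_t idx)
{
	return (idx >= 1 && idx <= n) ? table[idx - 1] : NULL;
}

/* Name-based lookup: build "<node>_ck" the way omap_clk_src_get() does. */
const char *clk_by_node_name(const char *const *table, size_t n,
			     const char *node_name)
{
	char clk_name[32];
	size_t i;

	snprintf(clk_name, sizeof(clk_name), "%s_ck", node_name);
	for (i = 0; i < n; i++)
		if (strcmp(table[i], clk_name) == 0)
			return table[i];
	return NULL;
}
```

When per_dpll is inserted at position 2, index 2 silently resolves to a
different clock, while the lookup by node name still finds aux_clk1 or
fails visibly with NULL.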

[PATCH] drivers: dma: Use devm_request_and_ioremap

2013-04-14 Thread Alexandru Gheorghiu

Use devm_request_and_ioremap function which provides more consistent error
handling.

Signed-off-by: Alexandru Gheorghiu 
---
 drivers/dma/txx9dmac.c |6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/dma/txx9dmac.c b/drivers/dma/txx9dmac.c
index 913f55c..471f9f1 100644
--- a/drivers/dma/txx9dmac.c
+++ b/drivers/dma/txx9dmac.c
@@ -1217,11 +1217,7 @@ static int __init txx9dmac_probe(struct platform_device *pdev)
if (!ddev)
return -ENOMEM;
 
-   if (!devm_request_mem_region(&pdev->dev, io->start, resource_size(io),
-dev_name(&pdev->dev)))
-   return -EBUSY;
-
-   ddev->regs = devm_ioremap(&pdev->dev, io->start, resource_size(io));
+   ddev->regs = devm_request_and_ioremap(&pdev->dev, io);
if (!ddev->regs)
return -ENOMEM;
ddev->have_64bit_regs = pdata->have_64bit_regs;
-- 
1.7.9.5


Re: [ANNOUNCE] 3.6.11.1-rt32

2013-04-14 Thread Carsten Emde


Hi Steven,


> > > I'm pleased to announce the 3.6.11.1-rt32 stable release.
> >
> > Unfortunately, there is another compile error:
> > drivers/gpu/drm/i915/i915_gem.c: In function ‘i915_gem_wait_for_error’:
> > drivers/gpu/drm/i915/i915_gem.c:118:3: warning: passing argument 1 of
> > ‘rt_spin_lock’ from incompatible pointer type [enabled by default]
> > In file included from include/linux/spinlock.h:273:0,
> >    from include/linux/wait.h:24,
> >    from include/linux/fs.h:396,
> >    from include/drm/drmP.h:47,
> >    from drivers/gpu/drm/i915/i915_gem.c:28:
> > [..]
> > I would propose to adopt the mechanism that Sebastian introduced in
> > 3.8.4-rt2 (https://lkml.org/lkml/2013/3/26/600). The kernel compiles
> > and runs without any problem with the below patch on a system that
> > requires the i915 driver module.
>
> Thanks Carsten, I'll be updating this later today.

Thank you.

> BTW, did you get any core dumps from the work queue race that we've
> been seeing?

No, not yet. Originally, the farm systems did not use crashkernels by
default. I understood that it does no harm but could help in cases like
this one. Therefore, I've started to reconfigure all farm systems with
crashkernels - starting with the two systems that had the work queue
race crashes. The kernel messages here (one is a 12-core, the other one
a 32-core box) look exactly like the one
(https://lkml.org/lkml/2013/3/18/325) you saw on your 40-core machine
(https://lkml.org/lkml/2013/3/18/430).
We'll need to wait for the next crash that will give us a core dump we
may then dissect.

-Carsten.

[PATCH] lib: digsig: Use ERR_CAST function

2013-04-14 Thread Alexandru Gheorghiu

Use ERR_CAST function instead of ERR_PTR and PTR_ERR.
Patch found using coccinelle.

Signed-off-by: Alexandru Gheorghiu 
---
 lib/digsig.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/digsig.c b/lib/digsig.c
index 2f31e6a..8793aed 100644
--- a/lib/digsig.c
+++ b/lib/digsig.c
@@ -209,7 +209,7 @@ int digsig_verify(struct key *keyring, const char *sig, int siglen,
kref = keyring_search(make_key_ref(keyring, 1UL),
&key_type_user, name);
if (IS_ERR(kref))
-   key = ERR_PTR(PTR_ERR(kref));
+   key = ERR_CAST(kref);
else
key = key_ref_to_ptr(kref);
} else {
-- 
1.7.9.5
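
Why ERR_CAST(x) is equivalent to ERR_PTR(PTR_ERR(x)) is easy to see from a
userspace re-creation of the helpers. The definitions below are a
simplified sketch modelled on include/linux/err.h (without the kernel's
annotations), not the kernel header itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_ERRNO	4095

/* Encode a negative errno in a pointer value. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

/* Decode the errno from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* Error pointers occupy the top MAX_ERRNO values of the address space. */
static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Re-type an error pointer without decoding and re-encoding the errno. */
static inline void *ERR_CAST(const void *ptr)
{
	return (void *)ptr;
}
```

Both forms yield the same pointer value; ERR_CAST() just states the intent
(same error, different pointer type) in a single step.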


Cannot add new efi boot entry

2013-04-14 Thread Jiri Slaby

Hi,

after updating to 3.8, every kernel update ends up in an unbootable
machine. It is due to the following commit:
commit 68d929862e29a8b52a7f2f2f86a0600423b093cd
Author: Matthew Garrett 
Date:   Sat Mar 2 19:40:17 2013 -0500

efi: be more paranoid about available space when creating variables

efibootmgr tries to add an entry and silently fails when writing
to /sys/firmware/efi/vars/new_var returns -ENOSPC.

There are many entries in there:
# efibootmgr
BootCurrent: 000D
Timeout: 0 seconds
BootOrder:
0018,,0001,0002,0003,0007,0008,0009,000A,000B,000C,000D,000E,000F,0010,0011,0012
Boot  Setup
Boot0001  Boot Menu
Boot0002  Diagnostic Splash Screen
Boot0003  Lenovo Diagnostics
Boot0004  Startup Interrupt Menu
Boot0005  ME Configuration Menu
Boot0006  Rescue and Recovery
Boot0007* USB CD
Boot0008* USB FDD
Boot0009* ATAPI CD0
Boot000A* ATA HDD0
Boot000B* ATA HDD1
Boot000C* ATA HDD2
Boot000D* USB HDD
Boot000E* PCI LAN
Boot000F* ATAPI CD1
Boot0010  Other CD
Boot0011* ATA HDD3
Boot0012  Other HDD
Boot0013* IDER BOOT CDROM
Boot0014* IDER BOOT Floppy
Boot0015* ATA HDD
Boot0016* ATAPI CD:
Boot0017* PCI LAN
Boot0018* Linux


Remaining size is about 20k, the added entry is hundreds of bytes, and
the store size is 64k.

Lowering the limit from 1/2 to 1/4 obviously fixes the problem for
me, because storing a new entry always worked on my setup before...

Any ideas how to overcome that? It would be better to blacklist bad
machines rather than whitelist good ones, right?

thanks,
-- 
js
suse labs
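
The arithmetic behind this report can be sketched as follows. The exact
kernel condition is not quoted in this message, so the code assumes the
check refuses a write when less than a fixed fraction of the store would
remain free afterwards ("1/2" versus "1/4" above). efi_var_space_ok() and
its parameters are hypothetical, and the numbers in the note below are the
approximate ones given above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Allow a new variable only if at least storage_size / reserve_divisor
 * bytes would still be free after it has been written.
 */
bool efi_var_space_ok(uint64_t storage_size, uint64_t remaining_size,
		      uint64_t var_size, unsigned int reserve_divisor)
{
	assert(reserve_divisor != 0);
	if (storage_size == 0 || var_size > remaining_size)
		return false;
	return remaining_size - var_size >= storage_size / reserve_divisor;
}
```

With a 64k store, about 20k remaining and a 512 byte entry, a divisor of 2
demands 32k free and rejects the write, while a divisor of 4 demands 16k
and accepts it.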

[PATCH] drivers: usb: gadget: Use ERR_CAST function

2013-04-14 Thread Alexandru Gheorghiu

Use ERR_CAST function instead of ERR_PTR and PTR_ERR.
Patch found using coccinelle.

Signed-off-by: Alexandru Gheorghiu 
---
 drivers/usb/gadget/composite.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/gadget/composite.c b/drivers/usb/gadget/composite.c
index 7c821de..5b73a74 100644
--- a/drivers/usb/gadget/composite.c
+++ b/drivers/usb/gadget/composite.c
@@ -1138,7 +1138,7 @@ struct usb_string *usb_gstrings_attach(struct usb_composite_dev *cdev,
 
uc = copy_gadget_strings(sp, n_gstrings, n_strings);
if (IS_ERR(uc))
-   return ERR_PTR(PTR_ERR(uc));
+   return ERR_CAST(uc);
 
n_gs = get_containers_gs(uc);
ret = usb_string_ids_tab(cdev, n_gs[0]->strings);
-- 
1.7.9.5


[PATCH] fs: reiserfs: Use kstrdup function

2013-04-14 Thread Alexandru Gheorghiu

Use kstrdup function instead of kmalloc and strcpy.
Patch found using coccinelle.

Signed-off-by: Alexandru Gheorghiu 
---
 fs/reiserfs/super.c |4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c
index 194113b..f8a23c3 100644
--- a/fs/reiserfs/super.c
+++ b/fs/reiserfs/super.c
@@ -1147,8 +1147,7 @@ static int reiserfs_parse_options(struct super_block *s, char *options,   /* strin
 "on filesystem root.");
return 0;
}
-   qf_names[qtype] =
-   kmalloc(strlen(arg) + 1, GFP_KERNEL);
+   qf_names[qtype] = kstrdup(arg, GFP_KERNEL);
if (!qf_names[qtype]) {
reiserfs_warning(s, "reiserfs-2502",
 "not enough memory "
@@ -1156,7 +1155,6 @@ static int reiserfs_parse_options(struct super_block *s, char *options,   /* strin
 "quotafile name.");
return 0;
}
-   strcpy(qf_names[qtype], arg);
if (qtype == USRQUOTA)
*mount_options |= 1 << REISERFS_USRQUOTA;
else
-- 
1.7.9.5
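
The transformation in this patch has a direct userspace analogue. The
sketch below is not kernel code: dup_two_step() mirrors the old
kmalloc() + strcpy() pattern with malloc(), and dup_one_step() is a
hypothetical kstrdup()-like helper without the gfp argument.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Old pattern: allocate, check, then copy in a separate step. */
char *dup_two_step(const char *s)
{
	char *p = malloc(strlen(s) + 1);

	if (!p)
		return NULL;
	strcpy(p, s);
	return p;
}

/* kstrdup()-like helper: one call, NULL on failure or NULL input. */
char *dup_one_step(const char *s)
{
	size_t len;
	char *p;

	if (!s)
		return NULL;
	len = strlen(s) + 1;
	p = malloc(len);
	if (p)
		memcpy(p, s, len);
	return p;
}
```

The one-step form keeps the allocation size and the copy together, so the
caller only has the NULL check left, as in the hunk above.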


[PATCH] staging: ramster: add how-to for ramster

2013-04-14 Thread Wanpeng Li

From: Dan Magenheimer 

Add how-to for ramster.

Signed-off-by: Dan Magenheimer
Signed-off-by: Wanpeng Li 
---
 drivers/staging/zcache/ramster/HOWTO.txt |  249 ++
 1 file changed, 249 insertions(+)
 create mode 100644 drivers/staging/zcache/ramster/HOWTO.txt

diff --git a/drivers/staging/zcache/ramster/HOWTO.txt b/drivers/staging/zcache/ramster/HOWTO.txt
new file mode 100644
index 0000000..e6387e8
--- /dev/null
+++ b/drivers/staging/zcache/ramster/HOWTO.txt
@@ -0,0 +1,249 @@
+Version: 130309
+ Dan Magenheimer 
+
+This is a how-to document for RAMster.  It applies to the March 9, 2013
+version of RAMster, re-merged with the new zcache codebase, built and tested
+on the 3.9 tree and submitted for the staging tree for 3.9.
+
+Note that this document was created from notes taken earlier.  I would
+appreciate any feedback from anyone who follows the process as described
+to confirm that it works and to clarify any possible misunderstandings,
+or to report problems.
+
+A. PRELIMINARY
+
+1) Install two or more Linux systems that are known to work when upgraded
+   to a recent upstream Linux kernel version (e.g. v3.9).  I used Oracle
+   Linux 6 ("OL6") on two Dell Optiplex 790s.  Note that it should be possible
+   to use ocfs2 as a filesystem on your systems but this hasn't been
+   tested thoroughly, so if you do use ocfs2 and run into problems, please
+   report them.  Up to eight nodes should work, but not much testing has
+   been done with more than three nodes.
+
+On each system:
+
+2) Configure, build and install then boot Linux (e.g. 3.9), just to ensure it
+   can be done with an unmodified upstream kernel.  Confirm you booted
+   the upstream kernel with "uname -a".
+
+3) Install ramster-tools.  The src.rpm and an OL6 rpm are available
+   in this directory.  I'm not very good at userspace stuff and
+   would welcome any help in turning ramster-tools into more
+   distributable rpms/debs for a wider range of distros.
+
+B. BUILDING RAMSTER INTO THE KERNEL
+
+Do the following on each system:
+
+1) Ensure you have the new codebase for drivers/staging/zcache in your source.
+
+2) Change your .config to have:
+
+   CONFIG_CLEANCACHE=y
+   CONFIG_FRONTSWAP=y
+   CONFIG_STAGING=y
+   CONFIG_ZCACHE=y
+   CONFIG_RAMSTER=y
+
+   You may have to reconfigure your kernel multiple times to ensure
+   all of these are set properly.  I use:
+
+   # yes "" | make oldconfig
+
+   and then manually check the .config file to ensure my selections
+   have "taken".
+
+   Do not bother to build the kernel until you are certain all of
+   the above config selections will stick for the build.
+
+3) Build this kernel and "make install" so that you have a new kernel
+   in /etc/grub.conf
+
+4) Add "ramster" to the kernel boot line in /etc/grub.conf.
+
+5) Reboot and check dmesg to ensure there are some messages from ramster
+   and that "ramster_enabled=1" appears.
+
+   # dmesg | grep ramster
+
+   You should also see a lot of files in:
+
+   # ls /sys/kernel/debug/zcache
+   # ls /sys/kernel/debug/ramster
+
+   and a few files in:
+
+   # ls /sys/kernel/mm/ramster
+
+   RAMster now will act as a single-system zcache but doesn't yet
+   know anything about the cluster so can't do anything remotely.
+
+C. BUILDING THE RAMSTER CLUSTER
+
+This is the error prone part unless you are a clustering expert.  We need
+to describe the cluster in /etc/ramster.conf file and the init scripts
+that parse it are extremely picky about the syntax.
+
+1) Create the /etc/ramster.conf file and ensure it is identical
+   on both systems.  There is a good amount of similar documentation
+   for ocfs2 /etc/cluster.conf that can be googled for this, but I use:
+
+   cluster:
+   name = ramster
+   node_count = 2
+   node:
+   name = system1
+   cluster = ramster
+   number = 0
+   ip_address = my.ip.ad.r1
+   ip_port = 
+   node:
+   name = system2
+   cluster = ramster
+   number = 0
+   ip_address = my.ip.ad.r2
+   ip_port = 
+
+   You must ensure that the "name" field in the file exactly matches
+   the output of "hostname" on each system.  The following assumes
+   you use "ramster" as the name of your cluster.
+
+2) Enable the ramster service and configure it:
+
+   # chkconfig --add ramster
+   # service ramster configure
+
+   Set "load on boot" to "y", cluster to start is "ramster" (or whatever
+   name you chose in ramster.conf), heartbeat dead threshold as "500",
+   network idle timeout as "100".  Leave the others as default.
+
+4) Reboot.  After reboot, try:
+
+   # service ramster status
+
+   You should see "Checking ramster cluster ramster: Online".  If you do
+   not, something is wrong and RAMster will not work.  Note that you
+   should also see that the driver for "configfs" is loaded a

Re: [PATCH] staging: ramster: add how-to for ramster

2013-04-14 Thread Greg Kroah-Hartman

On Mon, Apr 15, 2013 at 07:56:56AM +0800, Wanpeng Li wrote:
> +This is a how-to document for RAMster.  It applies to the March 9, 2013
> +version of RAMster, re-merged with the new zcache codebase, built and tested
> +on the 3.9 tree and submitted for the staging tree for 3.9.

This is not needed at all, given that it should just reflect the state
of the code in the kernel that this file is present in.  Please remove
it.

> +Note that this document was created from notes taken earlier.  I would
> +appreciate any feedback from anyone who follows the process as described
> +to confirm that it works and to clarify any possible misunderstandings,
> +or to report problems.

Is this needed?

> +A. PRELIMINARY
> +
> +1) Install two or more Linux systems that are known to work when upgraded
> +   to a recent upstream Linux kernel version (e.g. v3.9).  I used Oracle
> +   Linux 6 ("OL6") on two Dell Optiplex 790s.  Note that it should be possible
> +   to use ocfs2 as a filesystem on your systems but this hasn't been
> +   tested thoroughly, so if you do use ocfs2 and run into problems, please
> +   report them.  Up to eight nodes should work, but not much testing has
> +   been done with more than three nodes.
> +
> +On each system:
> +
> +2) Configure, build and install then boot Linux (e.g. 3.9), just to ensure it
> +   can be done with an unmodified upstream kernel.  Confirm you booted
> +   the upstream kernel with "uname -a".
> +
> +3) Install ramster-tools.  The src.rpm and an OL6 rpm are available
> +   in this directory.  I'm not very good at userspace stuff and
> +   would welcome any help in turning ramster-tools into more
> +   distributable rpms/debs for a wider range of distros.

This isn't true, the rpms are not here.

> +B. BUILDING RAMSTER INTO THE KERNEL
> +
> +Do the following on each system:
> +
> +1) Ensure you have the new codebase for drivers/staging/zcache in your source.
> +
> +2) Change your .config to have:
> +
> + CONFIG_CLEANCACHE=y
> + CONFIG_FRONTSWAP=y
> + CONFIG_STAGING=y
> + CONFIG_ZCACHE=y
> + CONFIG_RAMSTER=y
> +
> +   You may have to reconfigure your kernel multiple times to ensure
> +   all of these are set properly.  I use:
> +
> + # yes "" | make oldconfig
> +
> +   and then manually check the .config file to ensure my selections
> +   have "taken".

This last bit isn't needed at all.  Just stick to the "these are the
settings you need enabled."

> +   Do not bother to build the kernel until you are certain all of
> +   the above config selections will stick for the build.
> +
> +3) Build this kernel and "make install" so that you have a new kernel
> +   in /etc/grub.conf

Don't assume 'make install' works for all distros, nor that
/etc/grub.conf is a grub config file (hint, it usually isn't, and what
about all the people not even using grub for their bootloader?)

> +4) Add "ramster" to the kernel boot line in /etc/grub.conf.

Again, drop grub.conf reference

> +5) Reboot and check dmesg to ensure there are some messages from ramster
> +   and that "ramster_enabled=1" appears.
> +
> + # dmesg | grep ramster

Are you sure ramster still spits out messages?  If so, provide an
example of what it should look like.

> +   You should also see a lot of files in:
> +
> + # ls /sys/kernel/debug/zcache
> + # ls /sys/kernel/debug/ramster

You forgot to mention that debugfs needs to be mounted.

> +   and a few files in:
> +
> + # ls /sys/kernel/mm/ramster
> +
> +   RAMster now will act as a single-system zcache but doesn't yet
> +   know anything about the cluster so can't do anything remotely.
> +
> +C. BUILDING THE RAMSTER CLUSTER
> +
> +This is the error prone part unless you are a clustering expert.  We need
> +to describe the cluster in /etc/ramster.conf file and the init scripts
> +that parse it are extremely picky about the syntax.
> +
> +1) Create the /etc/ramster.conf file and ensure it is identical
> +   on both systems.  There is a good amount of similar documentation
> +   for ocfs2 /etc/cluster.conf that can be googled for this, but I use:
> +
> + cluster:
> + name = ramster
> + node_count = 2
> + node:
> + name = system1
> + cluster = ramster
> + number = 0
> + ip_address = my.ip.ad.r1
> + ip_port = 
> + node:
> + name = system2
> + cluster = ramster
> + number = 0
> + ip_address = my.ip.ad.r2
> + ip_port = 
> +
> +   You must ensure that the "name" field in the file exactly matches
> +   the output of "hostname" on each system.  The following assumes
> +   you use "ramster" as the name of your cluster.
> +
> +2) Enable the ramster service and configure it:
> +
> + # chkconfig --add ramster
> + # service ramster configure

That's a huge assumption as to how your system config/startup scripts
work, right?  Not all the world is using old-style system V init
any

RE: [PATCH 3.8-stable] gpio: fix wrong checking condition for gpio range

2013-04-14 Thread Jonghwan Choi

Dear Haojian Zhuang,

> > This patch looks like it should be in the 3.8-stable tree, should we apply
> > it?
> >
> 
> It could be merged into 3.8-stable tree.
>

Thanks~

Best Regards.

> -Original Message-
> From: Haojian Zhuang [mailto:haojian.zhu...@linaro.org]
> Sent: Sunday, April 14, 2013 12:27 AM
> To: Jonghwan Choi
> Cc: Linus Walleij; sta...@vger.kernel.org; linux-kernel@vger.kernel.org;
> Jonghwan Choi
> Subject: Re: [PATCH 3.8-stable] gpio: fix wrong checking condition for
> gpio range
> 
> On 13 April 2013 22:46, Jonghwan Choi  wrote:
> > From: Haojian Zhuang 
> >
> > This patch looks like it should be in the 3.8-stable tree, should we apply
> > it?
> >
> 
> It could be merged into 3.8-stable tree.
> 
> Regards
> Haojian


Re: [PATCH v5 3/4] x86, kdump: Change crashkernel_high/low= to crashkernel=,high/low

2013-04-14 Thread HATAYAMA Daisuke

(2013/04/12 15:54), Yinghai Lu wrote:

> Index: linux-2.6/kernel/kexec.c
> ===
> --- linux-2.6.orig/kernel/kexec.c
> +++ linux-2.6/kernel/kexec.c
> @@ -1368,35 +1368,114 @@ static int __init parse_crashkernel_simp
>   return 0;
>   }
>   
> +#define SUFFIX_HIGH 0
> +#define SUFFIX_LOW  1
> +#define SUFFIX_NULL 2
> +static __initdata char *suffix_tbl[] = {
> + [SUFFIX_HIGH] = ",high",
> + [SUFFIX_LOW]  = ",low",
> + [SUFFIX_NULL] = NULL,
> +};
> +
>   /*
> - * That function is the entry point for command line parsing and should be
> - * called from the arch-specific code.
> + * That function parses "suffix"  crashkernel command lines like
> + *
> + *   crashkernel=size,[high|low]
> + *
> + * It returns 0 on success and -EINVAL on failure.
>*/
> +static int __init parse_crashkernel_suffix(char *cmdline,
> +unsigned long long   *crash_size,
> +unsigned long long   *crash_base,
> +const char *suffix)
> +{
> + char *cur = cmdline;
> +
> + *crash_size = memparse(cmdline, &cur);
> + if (cmdline == cur) {
> + pr_warn("crashkernel: memory value expected\n");
> + return -EINVAL;
> + }
> +
> + /* check with suffix */
> + if (strncmp(cur, suffix, strlen(suffix))) {
> + pr_warn("crashkernel: unrecognized char\n");
> + return -EINVAL;
> + }
> + cur += strlen(suffix);
> + if (*cur != ' ' && *cur != '\0') {
> + pr_warn("crashkernel: unrecognized char\n");
> + return -EINVAL;
> + }
> +
> + return 0;
> +}

Thanks, looks good to me.

-- 
Thanks.
HATAYAMA, Daisuke
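
For readers following the quoted parse_crashkernel_suffix(), here is a
minimal userspace sketch of the same three checks: a size must be parsed,
the expected suffix must follow it, and only a space or the end of the
string may come after the suffix. The names parse_mem() and
parse_size_with_suffix() are hypothetical; parse_mem() is a simplified
stand-in for the kernel's memparse() that handles only K, M and G, and -1
is returned where the patch returns -EINVAL.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for memparse(): parse a number with an optional
 * K/M/G suffix and report where parsing stopped through *retptr. */
static unsigned long long parse_mem(const char *s, char **retptr)
{
	unsigned long long val = strtoull(s, retptr, 0);

	switch (**retptr) {
	case 'G': case 'g':
		val <<= 10;
		/* fall through */
	case 'M': case 'm':
		val <<= 10;
		/* fall through */
	case 'K': case 'k':
		val <<= 10;
		(*retptr)++;
		break;
	default:
		break;
	}
	return val;
}

/* Accept "<size><suffix>" followed by a space or the end of the string.
 * Returns 0 on success and -1 on a malformed value. */
static int parse_size_with_suffix(const char *arg, const char *suffix,
				  unsigned long long *size)
{
	char *cur;

	*size = parse_mem(arg, &cur);
	if (cur == arg)
		return -1;	/* no number was consumed */
	if (strncmp(cur, suffix, strlen(suffix)) != 0)
		return -1;	/* wrong or missing suffix */
	cur += strlen(suffix);
	if (*cur != ' ' && *cur != '\0')
		return -1;	/* trailing characters after the suffix */
	return 0;
}
```

With this sketch, "512M,high" checked against ",high" succeeds with a size
of 512 MiB, while "512M,highx" or a bare ",high" is rejected.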


RE: [PATCH 1/2] watchdog: introduce new watchdog AUTOSTART option

2013-04-14 Thread Kim, Milo

Hi Guenter

> I really don't like that idea. It defeats a significant part of the purpose
> for having a watchdog, which is to prevent user-space hangups.
> 
> To make this a driver option is even more odd - it forces every user of this
> driver to use it in-kernel only, and makes /dev/watchdog quite useless.
> 
> I mean, really, if you have such a watchdog, what is the point of using the
> watchdog infrastructure in the first place ? Just make it a kernel thread or
> timer-activated platform code which pings your watchdog once in a while. No
> need to get the watchdog infrastructure involved in the first place.
> 
> Am I missing something ?

I wanted to enable the watchdog timer without a watchdog application, to
make sure the system is alive.
However, I think I misunderstood the purpose of the watchdog driver.
The watchdog is for detecting user-space hangups rather than kernel stalls.
Is that correct? If so, this patch is totally wrong.

Thanks!

Milo


Sleeping process from kernel space: not doing schedule?

2013-04-14 Thread Jose Navas

Hi all,

First of all, I am not subscribed to the mailing list, so I'd like to get the
answers directly at my email address. Thank you!

I am working on a project where I am trying to detect the Out Of Memory machine
state and collect some data from the machine. I've created an LKM that hacks the
do_brk call and checks whether there is enough memory to perform the call. I
successfully detect the OOM state, and my next step is to send a signal to a
process in user space that writes info to a log file (this way, I avoid having
to open files in kernel space). After sending the signal, the LKM puts the
current process to sleep using schedule_timeout_uninterruptible(). What I
expect is to see the user-space process run for some time while the current
process is sleeping but, instead, I see that the user process does not run
until the sleeping process has finished...

I suspect that it is not scheduling, or something similar. Am I missing
something? Is this the correct way of putting a process to sleep from
kernel space?

Thank you!!!

Jose

Re: [PATCH 1/4] cgroup: convert cgroupfs_root flag bits to masks and add CGRP_ prefix

2013-04-14 Thread Serge Hallyn

Quoting Tejun Heo (t...@kernel.org):
> There's no reason to be using bitops, which tends to be more
> cumbersome, to handle root flags.  Convert them to masks.  Also, as
> they'll be moved to include/linux/cgroup.h and it's generally a good
> idea, add CGRP_ prefix.
> 
> Note that flags are assigned from (1 << 1).  The first bit will be
> used by a flag which will be added soon.
> 
> Signed-off-by: Tejun Heo 

This *is* much nicer to read, thanks.

Acked-by: Serge E. Hallyn 

> ---
>  kernel/cgroup.c | 21 ++---
>  1 file changed, 10 insertions(+), 11 deletions(-)
> 
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 678a22c..a372eaa 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -296,10 +296,10 @@ bool cgroup_is_descendant(struct cgroup *cgrp, struct cgroup *ancestor)
>  }
>  EXPORT_SYMBOL_GPL(cgroup_is_descendant);
>  
> -/* bits in struct cgroupfs_root flags field */
> +/* cgroupfs_root->flags */
>  enum {
> - ROOT_NOPREFIX,  /* mounted subsystems have no named prefix */
> - ROOT_XATTR, /* supports extended attributes */
> + CGRP_ROOT_NOPREFIX  = (1 << 1), /* mounted subsystems have no named prefix */
> + CGRP_ROOT_XATTR = (1 << 2), /* supports extended attributes */
>  };
>  
>  static int cgroup_is_releasable(const struct cgroup *cgrp)
> @@ -1137,9 +1137,9 @@ static int cgroup_show_options(struct seq_file *seq, struct dentry *dentry)
>   mutex_lock(&cgroup_root_mutex);
>   for_each_subsys(root, ss)
>   seq_printf(seq, ",%s", ss->name);
> - if (test_bit(ROOT_NOPREFIX, &root->flags))
> + if (root->flags & CGRP_ROOT_NOPREFIX)
>   seq_puts(seq, ",noprefix");
> - if (test_bit(ROOT_XATTR, &root->flags))
> + if (root->flags & CGRP_ROOT_XATTR)
>   seq_puts(seq, ",xattr");
>   if (strlen(root->release_agent_path))
>   seq_printf(seq, ",release_agent=%s", root->release_agent_path);
> @@ -1202,7 +1202,7 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
>   continue;
>   }
>   if (!strcmp(token, "noprefix")) {
> - set_bit(ROOT_NOPREFIX, &opts->flags);
> + opts->flags |= CGRP_ROOT_NOPREFIX;
>   continue;
>   }
>   if (!strcmp(token, "clone_children")) {
> @@ -1210,7 +1210,7 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
>   continue;
>   }
>   if (!strcmp(token, "xattr")) {
> - set_bit(ROOT_XATTR, &opts->flags);
> + opts->flags |= CGRP_ROOT_XATTR;
>   continue;
>   }
>   if (!strncmp(token, "release_agent=", 14)) {
> @@ -1293,8 +1293,7 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
>* with the old cpuset, so we allow noprefix only if mounting just
>* the cpuset subsystem.
>*/
> - if (test_bit(ROOT_NOPREFIX, &opts->flags) &&
> - (opts->subsys_mask & mask))
> + if ((opts->flags & CGRP_ROOT_NOPREFIX) && (opts->subsys_mask & mask))
>   return -EINVAL;
>  
>  
> @@ -2523,7 +2522,7 @@ static struct simple_xattrs *__d_xattrs(struct dentry *dentry)
>  static inline int xattr_enabled(struct dentry *dentry)
>  {
>   struct cgroupfs_root *root = dentry->d_sb->s_fs_info;
> - return test_bit(ROOT_XATTR, &root->flags);
> + return root->flags & CGRP_ROOT_XATTR;
>  }
>  
>  static bool is_valid_xattr(const char *name)
> @@ -2695,7 +2694,7 @@ static int cgroup_add_file(struct cgroup *cgrp, struct cgroup_subsys *subsys,
>  
>   simple_xattrs_init(&cft->xattrs);
>  
> - if (subsys && !test_bit(ROOT_NOPREFIX, &cgrp->root->flags)) {
> + if (subsys && !(cgrp->root->flags & CGRP_ROOT_NOPREFIX)) {
>   strcpy(name, subsys->name);
>   strcat(name, ".");
>   }
> -- 
> 1.8.1.4
> 
> ___
> Containers mailing list
> contain...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers
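
The difference between the two flag styles in the quoted patch can be
shown with a small userspace sketch. Everything here is an illustration
under assumptions: the OLD_ and NEW_ names are hypothetical, and
set_bit_idx() and test_bit_idx() are non-atomic stand-ins for the kernel's
set_bit() and test_bit(). In the old style the enum values are bit
indices; in the new style they are masks, so plain |= and & are enough.

```c
/* Old style: enum values are bit *indices* (0 and 1). */
enum { OLD_NOPREFIX, OLD_XATTR };

/* New style: enum values are *masks*.  Bit 0 is left unused here,
 * matching the patch, which reserves it for a flag added later. */
enum {
	NEW_NOPREFIX	= (1 << 1),
	NEW_XATTR	= (1 << 2),
};

/* Non-atomic userspace stand-ins for the kernel bitops. */
static void set_bit_idx(int nr, unsigned long *flags)
{
	*flags |= 1UL << nr;
}

static int test_bit_idx(int nr, const unsigned long *flags)
{
	return (int)((*flags >> nr) & 1UL);
}

/* With mask-valued flags, a test is a single AND. */
static int has_flag(unsigned long flags, unsigned long mask)
{
	return (flags & mask) != 0;
}
```

Note that the numeric values differ between the two styles: setting
OLD_XATTR yields 2, while setting NEW_XATTR yields 4, so the two cannot be
mixed on the same flags word.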

Re: [PATCH 1/2] hfs/hfsplus: Convert dprint to hfs_dbg

2013-04-14 Thread Hin-Tak Leung

--- On Mon, 8/4/13, Joe Perches  wrote:

> Use a more current logging style.
> 
> Rename macro and uses.
> Add do {} while (0) to macro.
> Add DBG_ to macro.
> Add and use hfs_dbg_cont variant where appropriate.
> 
> Signed-off-by: Joe Perches 

> +++ b/fs/hfs/hfs_fs.h
> @@ -34,8 +34,18 @@
>  //#define DBG_MASK	(DBG_CAT_MOD|DBG_BNODE_REFS|DBG_INODE|DBG_EXTENT)
>  #define DBG_MASK	(0)
>  
> -#define dprint(flg, fmt, args...) \
> -	if (flg & DBG_MASK) printk(fmt , ## args)
> +#define hfs_dbg(flg, fmt, ...)					\
> +do {								\
> +	if (DBG_##flg & DBG_MASK)				\
> +		printk(KERN_DEBUG fmt, ##__VA_ARGS__);		\
> +} while (0)
> +
> +#define hfs_dbg_cont(flg, fmt, ...)				\
> +do {								\
> +	if (DBG_##flg & DBG_MASK)				\
> +		printk(KERN_CONT fmt, ##__VA_ARGS__);		\
> +} while (0)
> +
> +
>  
>  /*
>   * struct hfs_inode_info

This set of changes seems somewhat zealous - it doesn't offer any benefit
other than possibly satisfying somebody's idea of code purity.

FWIW, I have been sitting on a patch which changes this part of the code to
dynamic debugging, and it is much simpler. Just:

=
diff --git a/fs/hfsplus/hfsplus_fs.h b/fs/hfsplus/hfsplus_fs.h
index e298b83..55d211d 100644
--- a/fs/hfsplus/hfsplus_fs.h
+++ b/fs/hfsplus/hfsplus_fs.h
@@ -45,8 +25,7 @@
 #define HFSPLUS_JOURNAL_SWAP   1

 #define dprint(flg, fmt, args...) \
-   if (flg & DBG_MASK) \
-   printk(fmt , ## args)
+   pr_debug(fmt , ## args)

 /* Runtime config options */
 #define HFSPLUS_DEF_CR_TYPE    0x3F3F3F3F  /* '' */
=

(and you can then remove all the DBG_* defines before that, since they no
longer have any effect).

The benefit of this alternative is that it does not break any out-of-tree
patches, while making it easier to debug, say, patches... and I am still
sitting on a rather substantial set of journal changes, plus all the other
issues that come out of it, like the folder count patch for case-sensitive
file systems.

I think one needs to think very carefully about making bulk changes like
this, which serve no real purpose other than satisfying somebody's idea of
code purity.

The problem with such bulk "stylistic" changes is that they force people who
are working on real functionality and bug fixes to rebase their work and
spend time doing so, at the risk of introducing new bugs while rebasing. I
know I am writing with a somewhat selfish purpose: if I need to rebase my
work due to others' bug fixes or enhancements, etc., then fair enough, but
I'd prefer not to rebase just because of others' preferences for, and
attempts at, re-arranging the style of the debug statements, when the
debugging output means little to them.


Re: [PATCH 2/4] move cgroupfs_root to include/linux/cgroup.h

2013-04-14 Thread Serge Hallyn

Quoting Tejun Heo (t...@kernel.org):
> While controllers shouldn't be accessing cgroupfs_root directly, it
> being hidden inside kern/cgroup.c makes some things pretty silly.  This
> makes routing hierarchy-wide settings which need to be visible to
> controllers cumbersome.
> 
> We're gonna add another hierarchy-wide setting which needs to be
> accessed from controllers.  Move cgroupfs_root and its flags to the
> header file so that we can access root settings with inline helpers.
> 
> Signed-off-by: Tejun Heo 

Acked-by: Serge E. Hallyn 

Re: [PATCH 3/4] cgroup: introduce sane_behavior mount option

2013-04-14 Thread Serge Hallyn

Quoting Tejun Heo (t...@kernel.org):
> It's a sad fact that at this point various cgroup controllers are
> carrying so many idiosyncrasies and pure insanities that it simply
> isn't possible to reach any sort of sane consistent behavior while
> staying fully compatible with what already has been
> exposed to userland.
> 
> As we can't break exposed userland interface, transitioning to sane
> behaviors can only be done in steps while maintaining backwards
> compatibility.  This patch introduces a new mount option -
> __DEVEL__sane_behavior - which disables crazy features and enforces
> consistent behaviors in cgroup core proper and various controllers.
> As exactly which behaviors it changes are still being determined, the
> mount option, at this point, is useful only for development of the new
> behaviors.  As such, the mount option is prefixed with __DEVEL__ and
> generates a warning message when used.
> 
> Eventually, once we get to the point where all controller's behaviors
> are consistent enough to implement unified hierarchy, the __DEVEL__
> prefix will be dropped, and more importantly, unified-hierarchy will
> enforce sane_behavior by default.  Maybe we'll be able to completely drop
> the crazy stuff after a while, maybe not, but we at least have a
> strategy to move on to saner behaviors.
> 
> This patch introduces the mount option and changes the following
> behaviors in cgroup core.
> 
> * Mount options "noprefix" and "clone_children" are disallowed.  Also,
>   cgroupfs file cgroup.clone_children is not created.
> 
> * When mounting an existing superblock, mount options should match.
>   This is currently pretty crazy.  If one mounts a cgroup, creates a
>   subdirectory, unmounts it and then mount it again with different
>   option, it looks like the new options are applied but they aren't.
> 
> * Remount is disallowed.
> 
> The behavior changes are documented in the comment above
> CGRP_ROOT_SANE_BEHAVIOR enum and will be expanded as different
> controllers are converted and planned improvements progress.
> 
> Signed-off-by: Tejun Heo 

Acked-by: Serge E. Hallyn 

> Cc: Li Zefan 
> Cc: Michal Hocko 
> Cc: Vivek Goyal 
> ---
>  include/linux/cgroup.h | 43 +++
>  kernel/cgroup.c| 49 +
>  2 files changed, 92 insertions(+)
> 
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index b21881e..9c300ad 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -156,6 +156,8 @@ enum {
>* specified at mount time and thus is implemented here.
>*/
>   CGRP_CPUSET_CLONE_CHILDREN,
> + /* see the comment above CGRP_ROOT_SANE_BEHAVIOR for details */
> + CGRP_SANE_BEHAVIOR,
>  };
>  
>  struct cgroup_name {
> @@ -243,6 +245,37 @@ struct cgroup {
>  
>  /* cgroupfs_root->flags */
>  enum {
> + /*
> +  * Unfortunately, cgroup core and various controllers are riddled
> +  * with idiosyncrasies and pointless options.  The following flag,
> +  * when set, will force sane behavior - some options are forced on,
> +  * others are disallowed, and some controllers will change their
> +  * hierarchical or other behaviors.
> +  *
> +  * The set of behaviors affected by this flag are still being
> +  * determined and developed and the mount option for this flag is
> +  * prefixed with __DEVEL__.  The prefix will be dropped once we
> +  * reach the point where all behaviors are compatible with the
> +  * planned unified hierarchy, which will automatically turn on this
> +  * flag.
> +  *
> +  * The followings are the behaviors currently affected this flag.
> +  *
> +  * - Mount options "noprefix" and "clone_children" are disallowed.
> +  *   Also, cgroupfs file cgroup.clone_children is not created.
> +  *
> +  * - When mounting an existing superblock, mount options should
> +  *   match.
> +  *
> +  * - Remount is disallowed.
> +  *
> +  * The followings are planned changes.
> +  *
> +  * - release_agent will be disallowed once replacement notification
> +  *   mechanism is implemented.
> +  */
> + CGRP_ROOT_SANE_BEHAVIOR = (1 << 0),
> +
>   CGRP_ROOT_NOPREFIX  = (1 << 1), /* mounted subsystems have no named prefix */
>   CGRP_ROOT_XATTR = (1 << 2), /* supports extended attributes */
>  };
> @@ -360,6 +393,7 @@ struct cgroup_map_cb {
>  /* cftype->flags */
>  #define CFTYPE_ONLY_ON_ROOT  (1U << 0)   /* only create on root cg */
>  #define CFTYPE_NOT_ON_ROOT   (1U << 1)   /* don't create on root cg */
> +#define CFTYPE_INSANE        (1U << 2)   /* don't create if sane_behavior */
>  
>  #define MAX_CFTYPE_NAME  64
>  
> @@ -486,6 +520,15 @@ struct cgroup_scanner {
>   void *data;
>  };
>  
> +/*
> + * See the comment above CGRP_ROOT_SANE_BEHAVIOR for details.  This
> + * function can be c

Re: [PATCH 4/4] memcg: force use_hierarchy if sane_behavior

2013-04-14 Thread Serge Hallyn

Quoting Tejun Heo (t...@kernel.org):
> Turn on use_hierarchy by default if sane_behavior is specified and
> don't create .use_hierarchy file.
> 
> It is debatable whether to remove .use_hierarchy file or make it ro as
> the former could make transition easier in certain cases; however, the
> behavior changes which will be gated by sane_behavior are intensive
> including changing basic meaning of certain control knobs in a few
> controllers and I don't really think keeping this piece would make
> things easier in any noticeable way, so let's remove it.
> 
> Signed-off-by: Tejun Heo 

Acked-by: Serge E. Hallyn 

> Cc: Michal Hocko 
> Cc: KAMEZAWA Hiroyuki 
> ---
>  include/linux/cgroup.h |  3 +++
>  mm/memcontrol.c| 13 +
>  2 files changed, 16 insertions(+)
> 
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 9c300ad..c562e33 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -269,6 +269,9 @@ enum {
>*
>* - Remount is disallowed.
>*
> +  * - memcg: use_hierarchy is on by default and the cgroup file for
> +  *   the flag is not created.
> +  *
>* The followings are planned changes.
>*
>* - release_agent will be disallowed once replacement notification
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 9715c0c..a651131 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5814,6 +5814,7 @@ static struct cftype mem_cgroup_files[] = {
>   },
>   {
>   .name = "use_hierarchy",
> + .flags = CFTYPE_INSANE,
>   .write_u64 = mem_cgroup_hierarchy_write,
>   .read_u64 = mem_cgroup_hierarchy_read,
>   },
> @@ -6784,6 +6785,17 @@ static void mem_cgroup_move_task(struct cgroup *cont,
>  }
>  #endif
>  
> +/*
> + * Cgroup retains root cgroups across [un]mount cycles making it necessary
> + * to verify sane_behavior flag on each mount attempt.
> + */
> +static void mem_cgroup_bind(struct cgroup *root)
> +{
> + /* use_hierarchy is forced with sane_behavior */
> + if (cgroup_sane_behavior(root))
> + mem_cgroup_from_cont(root)->use_hierarchy = true;
> +}
> +
>  struct cgroup_subsys mem_cgroup_subsys = {
>   .name = "memory",
>   .subsys_id = mem_cgroup_subsys_id,
> @@ -6794,6 +6806,7 @@ struct cgroup_subsys mem_cgroup_subsys = {
>   .can_attach = mem_cgroup_can_attach,
>   .cancel_attach = mem_cgroup_cancel_attach,
>   .attach = mem_cgroup_move_task,
> + .bind = mem_cgroup_bind,
>   .base_cftypes = mem_cgroup_files,
>   .early_init = 0,
>   .use_id = 1,
> -- 
> 1.8.1.4
> 
> ___
> Containers mailing list
> contain...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers
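
The effect of tagging use_hierarchy with CFTYPE_INSANE can be sketched in
userspace as a filter over a file table. This is an assumption-laden
illustration, not the cgroup core code: struct file_type, visible_files()
and the two macro names are hypothetical, and the real check lives in the
cgroup file creation path, which is not shown in the quoted hunks. The
idea is simply that a file carrying the "insane" flag is skipped when the
hierarchy was mounted with the sane_behavior flag.

```c
#include <stddef.h>

#define ROOT_SANE_BEHAVIOR	(1U << 0)	/* mirrors CGRP_ROOT_SANE_BEHAVIOR */
#define FILE_INSANE		(1U << 2)	/* mirrors CFTYPE_INSANE */

/* Hypothetical, reduced version of a control file description. */
struct file_type {
	const char *name;
	unsigned int flags;
};

/* Count the files that would be created for a hierarchy with the given
 * root flags; if out is non-NULL, also collect their names in order. */
static size_t visible_files(const struct file_type *files, size_t n,
			    unsigned int root_flags, const char **out)
{
	size_t i, kept = 0;

	for (i = 0; i < n; i++) {
		/* legacy-only file on a sane_behavior hierarchy: skip it */
		if ((files[i].flags & FILE_INSANE) &&
		    (root_flags & ROOT_SANE_BEHAVIOR))
			continue;
		if (out)
			out[kept] = files[i].name;
		kept++;
	}
	return kept;
}
```

On a hierarchy without the flag every file is kept, so existing users see
no change unless they opt in at mount time.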

Re: [PATCH 4/4] memcg: force use_hierarchy if sane_behavior

2013-04-14 Thread Serge Hallyn

Quoting Tejun Heo (t...@kernel.org):
> Turn on use_hierarchy by default if sane_behavior is specified and
> don't create .use_hierarchy file.
> 
> It is debatable whether to remove .use_hierarchy file or make it ro as
> the former could make transition easier in certain cases; however, the
> behavior changes which will be gated by sane_behavior are intensive
> including changing basic meaning of certain control knobs in a few
> controllers and I don't really think keeping this piece would make
> things easier in any noticeable way, so let's remove it.

Hi Tejun,

this actually reminds me of something that's been on my todo list to
report for some time, but I haven't had time to find the source of the
bug...  And maybe it's already been reported...  but

If I do

cd /sys/fs/cgroup/memory
mkdir b
cd b
echo 1 > memory.use_hierarchy
echo 5000 > memory.limit_in_bytes
cat memory.limit_in_bytes
8192
mkdir c
cd c
cat memory.use_hierarchy
1
cat memory.limit_in_bytes
9223372036854775807
echo $$ > tasks
bash


So it seems the hierarchy is being enforced, but not reported in
child limit_in_bytes files.

(Last tested tonight on 3.8.0-17-generic #27-Ubuntu fwiw)

-serge
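
A side note on the numbers in the transcript above, offered as an
assumption rather than something stated in the message: reading back 8192
after writing 5000 is what one would see if the limit were rounded up to a
whole number of pages on a system with 4096-byte pages. The sketch below
shows only that arithmetic; page_align_up() is a hypothetical helper and
the page size is assumed.

```c
/* Round x up to the next multiple of page_size.
 * page_size must be a power of two (4096 is assumed in the note above). */
static unsigned long long page_align_up(unsigned long long x,
					unsigned long long page_size)
{
	return (x + page_size - 1) & ~(page_size - 1);
}
```

The 9223372036854775807 shown for the child is 2^63 - 1, the largest
signed 64-bit value, which presumably just means that no limit has been
set on that cgroup itself.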

Linux 3.9-rc7

2013-04-14 Thread Linus Torvalds

Another week, another -rc.

This is mostly random one-liners, with a few slightly larger driver
fixes. The most interesting (to me, probably to nobody else) fix is a
fix for a rather subtle TLB invalidate bug that only hits 32-bit PAE
due to the weird way that works. Even then it only hits you if you
have some particularly insane mapping patterns, but we *suspect* that
this one might be the cause behind google chrome having triggered bugs
like

chrome: Corrupted page table at address 34a03000
*pdpt =  *pde = 
Bad pagetable: 000f [#1] PREEMPT SMP

however, the problem is so rare that we haven't been able to verify
that this really fixes it.

That said, this bug is much more common (and by "much more common" I
mean "still basically impossible to hit unless you were really
unlucky") on newer machines that have bigger TLB's and that could
happily have run in 64-bit mode without the disgusting abortion that
is x86 PAE, and a small part of me feels that anybody who hit this
problem on such a machine probably got whatever they deserved.

But if you've seen messages like this, and you still run PAE, give the
new -rc a try.

The rest of the fixes are probably more relevant to most people, but
hey, the PAE one tickles my fancy. Anyway, go out and test regardless
of the PAE issue,

Linus

---

Al Viro (3):
  ecryptfs: close rmmod race
  procfs: add proc_remove_subtree()
  palinfo fixes

Alban Bedel (1):
  ASoC: wm8903: Fix the bypass to HP/LINEOUT when no DAC or ADC is running

Alex Williamson (1):
  vfio-pci: Fix possible integer overflow

Alexandre Belloni (1):
  gpio: pca953x: fix irq_domain_add_simple usage

Alexey Khoroshilov (1):
  tty: mxser: fix cycle termination condition in mxser_probe() and mxser_module_init()

Alexey Pelykh (1):
  OMAP/serial: Revert bad fix of Rx FIFO threshold granularity

Andrea Arcangeli (2):
  x86/mm/cpa: Convert noop to functional fix
  x86/mm/cpa/selftest: Fix false positive in CPA self test

Andrey Vagin (1):
  mnt: release locks on error path in do_loopback

Arnd Bergmann (1):
  block: avoid using uninitialized value in from queue_var_store

Artem Savkov (1):
  cfg80211: sched_scan_mtx lock in cfg80211_conn_work()

Arun Easi (1):
  [SCSI] qla2xxx: Fix crash during firmware dump procedure.

Asai Thambi S P (3):
  mtip32xx: recovery from command timeout
  mtip32xx: return 0 from pci probe in case of rebuild
  mtip32xx: Add debugfs entry device_status

Asias He (7):
  tcm_vhost: Use ACCESS_ONCE for vs->vs_tpg[target] access
  tcm_vhost: Use vq->private_data to indicate if the endpoint is setup
  tcm_vhost: Initialize vq->last_used_idx when set endpoint
  tcm_vhost: Remove double check of response
  tcm_vhost: Fix tv_cmd leak in vhost_scsi_handle_vq
  tcm_vhost: Add vhost_scsi_send_bad_target() helper
  tcm_vhost: Send bad target to guest when cmd fails

Bing Zhao (1):
  mwifiex: complete last internal scan

Boris Ostrovsky (2):
  x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal
  x86/mm: Flush lazy MMU when DEBUG_PAGEALLOC is set

Brian King (1):
  [SCSI] ibmvscsi: Fix slave_configure deadlock

Calvin Owens (1):
  drm/nouveau: fix unconditional return waiting on memory

Charles Keepax (1):
  ASoC: compress: Cancel delayed power down if needed

Chen Gang (3):
  perf: Fix strncpy() use, always make sure it's NUL terminated
  perf: Fix strncpy() use, use strlcpy() instead of strncpy()
  ftrace: Fix strncpy() use, use strlcpy() instead of strncpy()

Chris Metcalf (1):
  tile: comment assumption about __insn_mtspr for 

Christian Ruppert (1):
  ARC: Add implicit compiler barrier to raw_local_irq* functions

Christoph Paasch (1):
  ipv6/tcp: Stop processing ICMPv6 redirect messages

Christopher Harvey (1):
  drm/mgag200: Index 24 in extended CRTC registers is 24 in hex, not decimal.

Chuck Lever (1):
  SUNRPC: Remove extra xprt_put()

Daniel Vetter (1):
  drm/fb-helper: Fix locking in drm_fb_helper_hotplug_event

Dave Airlie (1):
  udl: handle EDID failure properly.

Dave Hansen (1):
  x86-32: Fix possible incomplete TLB invalidate with PAE pagetables

David Woodhouse (1):
  libata: fix DMA to stack in reading devslp_timing parameters

Dirk Behme (1):
  ARM i.MX6: Fix ldb_di clock selection

Dirk Brandewie (1):
  cpufreq / intel_pstate: Set timer timeout correctly

Dmitry Tarnyagin (1):
  remoteproc/ste: fix memory leak on shutdown

Eldad Zack (1):
  ALSA: usb-audio: fix endianness bug in snd_nativeinstruments_*

Eric Dumazet (1):
  selinux: add a skb_owned_by() hook

Franky Lin (1):
  brcmfmac: do not proceed if fail to download nvram to dongle

Gabor Juhos (1):
  rt2x00: rt2x00pci: fix build error on Ralink RT3x5x SoCs

Greg Ungerer (1):
  m68k: define a local gpio_request_one() function

Re: [PATCHv3 1/3] thermal: introduce thermal_zone_get_zone_by_name helper function

2013-04-14 Thread Zhang Rui

On Fri, 2013-04-05 at 08:32 -0400, Eduardo Valentin wrote:
> This patch adds a helper function to get a reference of
> a thermal zone, based on the zone type name.
> 
> It will perform a zone name lookup and return a reference
> to a thermal zone device that matches the name requested.
> In case the zone is not found or when several zones match
> same name or if the required parameters are invalid, it will return
> the corresponding error code (ERR_PTR).
> 
> Cc: Durgadoss R 
> Signed-off-by: Eduardo Valentin 

refreshed the patch to modify drivers/thermal/thermal_core.c instead of
drivers/thermal/thermal_sys.c and applied to thermal -next.

thanks,
rui
> ---
>  drivers/thermal/thermal_sys.c |   38 ++
>  include/linux/thermal.h   |1 +
>  2 files changed, 39 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/thermal/thermal_sys.c b/drivers/thermal/thermal_sys.c
> index 5bd95d4..e9b636b 100644
> --- a/drivers/thermal/thermal_sys.c
> +++ b/drivers/thermal/thermal_sys.c
> @@ -1790,6 +1790,44 @@ void thermal_zone_device_unregister(struct thermal_zone_device *tz)
>  }
>  EXPORT_SYMBOL_GPL(thermal_zone_device_unregister);
>  
> +/**
> + * thermal_zone_get_zone_by_name() - search for a zone and returns its ref
> + * @name: thermal zone name to fetch the temperature
> + *
> + * When only one zone is found with the passed name, returns a reference to it.
> + *
> + * Return: On success returns a reference to an unique thermal zone with
> + * matching name equals to @name, an ERR_PTR otherwise (-EINVAL for invalid
> + * paramenters, -ENODEV for not found and -EEXIST for multiple matches).
> + */
> +struct thermal_zone_device *thermal_zone_get_zone_by_name(const char *name)
> +{
> + struct thermal_zone_device *pos = NULL, *ref = ERR_PTR(-EINVAL);
> + unsigned int found = 0;
> +
> + if (!name)
> + goto exit;
> +
> + mutex_lock(&thermal_list_lock);
> + list_for_each_entry(pos, &thermal_tz_list, node)
> + if (!strnicmp(name, pos->type, THERMAL_NAME_LENGTH)) {
> + found++;
> + ref = pos;
> + }
> + mutex_unlock(&thermal_list_lock);
> +
> + /* nothing has been found, thus an error code for it */
> + if (found == 0)
> + ref = ERR_PTR(-ENODEV);
> + else if (found > 1)
> + /* Success only when an unique zone is found */
> + ref = ERR_PTR(-EEXIST);
> +
> +exit:
> + return ref;
> +}
> +EXPORT_SYMBOL_GPL(thermal_zone_get_zone_by_name);
> +
>  #ifdef CONFIG_NET
>  static struct genl_family thermal_event_genl_family = {
>   .id = GENL_ID_GENERATE,
> diff --git a/include/linux/thermal.h b/include/linux/thermal.h
> index 542a39c..0cf9eb5 100644
> --- a/include/linux/thermal.h
> +++ b/include/linux/thermal.h
> @@ -237,6 +237,7 @@ void thermal_zone_device_update(struct thermal_zone_device *);
>  struct thermal_cooling_device *thermal_cooling_device_register(char *, void *,
>   const struct thermal_cooling_device_ops *);
>  void thermal_cooling_device_unregister(struct thermal_cooling_device *);
> +struct thermal_zone_device *thermal_zone_get_zone_by_name(const char *name);
>  
>  int thermal_zone_trend_get(struct thermal_zone_device *, int);
>  struct thermal_instance *thermal_instance_get(struct thermal_zone_device *,
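
The lookup rule in the patch above (exactly one match succeeds, none is -ENODEV, several is -EEXIST) can be sketched in userspace as follows. This is only an illustration under assumptions: struct zone and zone_get_by_name() are hypothetical names, the zones live in a plain array rather than a locked list, and strcasecmp() stands in for the length-bounded strnicmp() the patch uses.

```c
#include <errno.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical stand-in for struct thermal_zone_device. */
struct zone {
    const char *type;
};

/*
 * Find the single zone whose type matches name (case-insensitively).
 * On success returns 0 and stores the match in *out; otherwise returns
 * -EINVAL (bad arguments), -ENODEV (no match) or -EEXIST (several matches).
 */
static int zone_get_by_name(const struct zone *zones, size_t n,
                            const char *name, const struct zone **out)
{
    const struct zone *ref = NULL;
    size_t i, found = 0;

    if (!zones || !name || !out)
        return -EINVAL;

    for (i = 0; i < n; i++) {
        if (!strcasecmp(name, zones[i].type)) {
            found++;
            ref = &zones[i];
        }
    }

    if (found == 0)
        return -ENODEV;
    if (found > 1)
        return -EEXIST;   /* success only when the zone is unique */

    *out = ref;
    return 0;
}
```

Returning the error as an int with an out parameter is a userspace convenience; the kernel helper instead encodes the error in the returned pointer with ERR_PTR().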



Re: [PATCHv3 2/3] thermal: expose thermal_zone_get_temp API

2013-04-14 Thread Zhang Rui

On Fri, 2013-04-05 at 08:32 -0400, Eduardo Valentin wrote:
> This patch exports the thermal_zone_get_temp API so that driver
> writers can fetch temperature of thermal zones managed by other
> drivers.
> 
> Acked-by: Durgadoss R 
> Signed-off-by: Eduardo Valentin 

refreshed the patch to modify drivers/thermal/thermal_core.c instead of
drivers/thermal/thermal_sys.c and applied to thermal -next.

thanks,
rui

> ---
>  drivers/thermal/thermal_sys.c |   20 +---
>  include/linux/thermal.h   |1 +
>  2 files changed, 18 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/thermal/thermal_sys.c b/drivers/thermal/thermal_sys.c
> index e9b636b..83bfa0d 100644
> --- a/drivers/thermal/thermal_sys.c
> +++ b/drivers/thermal/thermal_sys.c
> @@ -371,16 +371,28 @@ static void handle_thermal_trip(struct thermal_zone_device *tz, int trip)
>   monitor_thermal_zone(tz);
>  }
>  
> -static int thermal_zone_get_temp(struct thermal_zone_device *tz,
> - unsigned long *temp)
> +/**
> + * thermal_zone_get_temp() - returns its the temperature of thermal zone
> + * @tz: a valid pointer to a struct thermal_zone_device
> + * @temp: a valid pointer to where to store the resulting temperature.
> + *
> + * When a valid thermal zone reference is passed, it will fetch its
> + * temperature and fill @temp.
> + *
> + * Return: On success returns 0, an error code otherwise
> + */
> +int thermal_zone_get_temp(struct thermal_zone_device *tz, unsigned long *temp)
>  {
> - int ret = 0;
> + int ret = -EINVAL;
>  #ifdef CONFIG_THERMAL_EMULATION
>   int count;
>   unsigned long crit_temp = -1UL;
>   enum thermal_trip_type type;
>  #endif
>  
> + if (IS_ERR_OR_NULL(tz))
> + goto exit;
> +
>   mutex_lock(&tz->lock);
>  
>   ret = tz->ops->get_temp(tz, temp);
> @@ -404,8 +416,10 @@ static int thermal_zone_get_temp(struct thermal_zone_device *tz,
>  skip_emul:
>  #endif
>   mutex_unlock(&tz->lock);
> +exit:
>   return ret;
>  }
> +EXPORT_SYMBOL_GPL(thermal_zone_get_temp);
>  
>  static void update_temperature(struct thermal_zone_device *tz)
>  {
> diff --git a/include/linux/thermal.h b/include/linux/thermal.h
> index 0cf9eb5..8eea86c 100644
> --- a/include/linux/thermal.h
> +++ b/include/linux/thermal.h
> @@ -238,6 +238,7 @@ struct thermal_cooling_device *thermal_cooling_device_register(char *, void *,
>   const struct thermal_cooling_device_ops *);
>  void thermal_cooling_device_unregister(struct thermal_cooling_device *);
>  struct thermal_zone_device *thermal_zone_get_zone_by_name(const char *name);
> +int thermal_zone_get_temp(struct thermal_zone_device *tz, unsigned long *temp);
>  
>  int thermal_zone_trend_get(struct thermal_zone_device *, int);
>  struct thermal_instance *thermal_instance_get(struct thermal_zone_device *,
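
Because the exported function can now be handed the result of a failed lookup, the patch guards it with IS_ERR_OR_NULL(). A rough userspace imitation of that guard is shown below. err_ptr(), is_err_or_null(), struct zone and fixed_temp() are all hypothetical stand-ins written for this sketch, and the locking done by the real function is left out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel's ERR_PTR() and IS_ERR_OR_NULL(). */
static void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

static int is_err_or_null(const void *ptr)
{
    /* Error codes occupy the top MAX_ERRNO values of the address space. */
    return !ptr || (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical zone with a get_temp callback, standing in for tz->ops. */
struct zone {
    int (*get_temp)(struct zone *z, unsigned long *temp);
};

/* Example callback that always reports 42000 millidegrees. */
static int fixed_temp(struct zone *z, unsigned long *temp)
{
    (void)z;
    *temp = 42000;
    return 0;
}

/* Reject NULL and error-encoded pointers before dereferencing. */
static int zone_get_temp(struct zone *z, unsigned long *temp)
{
    if (is_err_or_null(z))
        return -EINVAL;
    return z->get_temp(z, temp);
}
```

A caller can therefore pass a lookup result straight through: zone_get_temp(err_ptr(-ENODEV), &t) returns -EINVAL instead of dereferencing a bogus pointer.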



Re: [PATCHv3 3/3] staging: ti-soc-thermal: remove external heat while extrapolating hotspot

2013-04-14 Thread Zhang Rui

On Fri, 2013-04-05 at 08:32 -0400, Eduardo Valentin wrote:
> For boards that provide a PCB sensor close to SoC junction
> temperature, it is possible to remove the cumulative heat
> reported by the SoC temperature sensor.
> 
> This patch changes the extrapolation computation to consider
> an external sensor in the extrapolation equations.
> 
> Signed-off-by: Eduardo Valentin 

hmm, who should take this patch?

thanks,
rui
> ---
>  drivers/staging/ti-soc-thermal/ti-thermal-common.c |   30 +--
>  1 files changed, 20 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/staging/ti-soc-thermal/ti-thermal-common.c b/drivers/staging/ti-soc-thermal/ti-thermal-common.c
> index 231c549..780368b 100644
> --- a/drivers/staging/ti-soc-thermal/ti-thermal-common.c
> +++ b/drivers/staging/ti-soc-thermal/ti-thermal-common.c
> @@ -38,6 +38,7 @@
>  /* common data structures */
>  struct ti_thermal_data {
>   struct thermal_zone_device *ti_thermal;
> + struct thermal_zone_device *pcb_tz;
>   struct thermal_cooling_device *cool_dev;
>   struct ti_bandgap *bgp;
>   enum thermal_device_mode mode;
> @@ -77,10 +78,12 @@ static inline int ti_thermal_hotspot_temperature(int t, int s, int c)
>  static inline int ti_thermal_get_temp(struct thermal_zone_device *thermal,
> unsigned long *temp)
>  {
> + struct thermal_zone_device *pcb_tz = NULL;
>   struct ti_thermal_data *data = thermal->devdata;
>   struct ti_bandgap *bgp;
>   const struct ti_temp_sensor *s;
> - int ret, tmp, pcb_temp, slope, constant;
> + int ret, tmp, slope, constant;
> + unsigned long pcb_temp;
>  
>   if (!data)
>   return 0;
> @@ -92,16 +95,22 @@ static inline int ti_thermal_get_temp(struct thermal_zone_device *thermal,
>   if (ret)
>   return ret;
>  
> - pcb_temp = 0;
> - /* TODO: Introduce pcb temperature lookup */
> + /* Default constants */
> + slope = s->slope;
> + constant = s->constant;
> +
> + pcb_tz = data->pcb_tz;
>   /* In case pcb zone is available, use the extrapolation rule with it */
> - if (pcb_temp) {
> - tmp -= pcb_temp;
> - slope = s->slope_pcb;
> - constant = s->constant_pcb;
> - } else {
> - slope = s->slope;
> - constant = s->constant;
> + if (!IS_ERR_OR_NULL(pcb_tz)) {
> + ret = thermal_zone_get_temp(pcb_tz, &pcb_temp);
> + if (!ret) {
> + tmp -= pcb_temp; /* got a valid PCB temp */
> + slope = s->slope_pcb;
> + constant = s->constant_pcb;
> + } else {
> + dev_err(bgp->dev,
> + "Failed to read PCB state. Using defaults\n");
> + }
>   }
>   *temp = ti_thermal_hotspot_temperature(tmp, slope, constant);
>  
> @@ -248,6 +257,7 @@ static struct ti_thermal_data
>   data->sensor_id = id;
>   data->bgp = bgp;
>   data->mode = THERMAL_DEVICE_ENABLED;
> + data->pcb_tz = thermal_zone_get_zone_by_name("pcb");
>   INIT_WORK(&data->thermal_wq, ti_thermal_work);
>  
>   return data;
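
The selection logic in the patch above (start from the default coefficients, switch to the PCB ones only when a PCB reading was actually obtained) can be sketched as below. Note the assumptions: struct sensor_coeff and extrapolate() are hypothetical names, and hotspot_temperature() uses a simple linear model chosen for illustration, since the body of ti_thermal_hotspot_temperature() is not shown in the thread.

```c
/* Hypothetical per-sensor coefficients, mirroring the fields in the patch. */
struct sensor_coeff {
    int slope, constant;          /* used when no PCB reading is available */
    int slope_pcb, constant_pcb;  /* used together with a PCB reading */
};

/* ASSUMPTION: a linear hotspot model; the real helper is not shown above. */
static int hotspot_temperature(int t, int slope, int constant)
{
    int delta = t * slope / 1000 + constant;

    if (delta < 0)
        delta = 0;
    return t + delta;
}

/*
 * have_pcb says whether a PCB zone was found at all; pcb_ret is the
 * return code of reading it (0 on success).
 */
static int extrapolate(const struct sensor_coeff *s, int soc_temp,
                       int have_pcb, int pcb_ret, int pcb_temp)
{
    int slope = s->slope;        /* defaults first, as in the patch */
    int constant = s->constant;

    if (have_pcb && pcb_ret == 0) {
        soc_temp -= pcb_temp;    /* remove the external heat */
        slope = s->slope_pcb;
        constant = s->constant_pcb;
    }
    return hotspot_temperature(soc_temp, slope, constant);
}
```

Setting the defaults before the check means a failed PCB read falls back to the plain extrapolation without a separate else branch.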



Re: [PATCH v2] backlight: platform_lcd: introduce probe callback

2013-04-14 Thread Jingoo Han

On Friday, April 12, 2013 5:25 AM, Andrew Bresticker wrote:
> 
> Platform LCD devices may need to do some device-specific
> initialization before they can be used (regulator or GPIO setup,
> for example), but currently the driver does not support any way of
> doing this.  This patch adds a probe() callback to plat_lcd_data
> which platform LCD devices can set to indicate that device-specific
> initialization is needed.
> 
> Signed-off-by: Andrew Bresticker 

CC'ed Andrew Morton,

It looks good.
Acked-by: Jingoo Han 

Best regards,
Jingoo Han


> ---
>  drivers/video/backlight/platform_lcd.c | 6 ++
>  include/video/platform_lcd.h   | 1 +
>  2 files changed, 7 insertions(+)
> 
> diff --git a/drivers/video/backlight/platform_lcd.c b/drivers/video/backlight/platform_lcd.c
> index 17a6b83..f46180e 100644
> --- a/drivers/video/backlight/platform_lcd.c
> +++ b/drivers/video/backlight/platform_lcd.c
> @@ -86,6 +86,12 @@ static int platform_lcd_probe(struct platform_device *pdev)
>   return -EINVAL;
>   }
> 
> + if (pdata->probe) {
> + err = pdata->probe(pdata);
> + if (err)
> + return err;
> + }
> +
>   plcd = devm_kzalloc(&pdev->dev, sizeof(struct platform_lcd),
>   GFP_KERNEL);
>   if (!plcd) {
> diff --git a/include/video/platform_lcd.h b/include/video/platform_lcd.h
> index ad3bdfe..23864b2 100644
> --- a/include/video/platform_lcd.h
> +++ b/include/video/platform_lcd.h
> @@ -15,6 +15,7 @@ struct plat_lcd_data;
>  struct fb_info;
> 
>  struct plat_lcd_data {
> + int (*probe)(struct plat_lcd_data *);
>   void(*set_power)(struct plat_lcd_data *, unsigned int power);
>   int (*match_fb)(struct plat_lcd_data *, struct fb_info *);
>  };
> --
> 1.8.1.3
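
The optional-callback pattern this patch adds can be sketched in userspace as follows: the driver runs the board hook only if one was supplied and propagates its error before doing anything else. struct lcd_data, lcd_probe() and the two example hooks are hypothetical names invented for this illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct plat_lcd_data. */
struct lcd_data {
    int (*probe)(struct lcd_data *);   /* optional, may be NULL */
    int powered;
};

/* Example board hook: pretend a regulator could not be acquired. */
static int failing_probe(struct lcd_data *pd)
{
    (void)pd;
    return -ENODEV;
}

/* Example board hook: do the board-specific setup and succeed. */
static int ok_probe(struct lcd_data *pd)
{
    pd->powered = 1;
    return 0;
}

/* Driver-side probe: run the optional hook first and propagate its error. */
static int lcd_probe(struct lcd_data *pd)
{
    if (!pd)
        return -EINVAL;

    if (pd->probe) {
        int err = pd->probe(pd);

        if (err)
            return err;
    }
    /* ... allocation and registration would follow here ... */
    return 0;
}
```

Calling the hook before any allocation keeps the error path trivial, because nothing has to be undone when the board setup fails.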
> 



Re: [PATCH 1/2] hfs/hfsplus: Convert dprint to hfs_dbg

2013-04-14 Thread Joe Perches

On Mon, 2013-04-15 at 01:53 +0100, Hin-Tak Leung wrote:
> --- On Mon, 8/4/13, Joe Perches  wrote:
> > Use a more current logging style.
[]
> I have been sitting on a patch which changes this part of the code to dynamic
> debugging, and it is much simpler. Just:
> #define dprint(flg, fmt, args...) \
> -   if (flg & DBG_MASK) \
> -   printk(fmt , ## args)
> +   pr_debug(fmt , ## args)

This change wouldn't work well as it would make a mess
of output that uses no prefix (ie: emits at KERN_DEFAULT)
with output that uses KERN_DEBUG

That's the reason for _dbg and _dbg_cont.



[PATCH mainline] btrfs: fix minor typo in comment

2013-04-14 Thread Nathaniel Yazdani

In the comment describing the sync_writers field of the btrfs_inode
struct, "fsyncing" was misspelled "fsycing."

Signed-off-by: Nathaniel Yazdani 
---
diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index d9b97d4..08b286b 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -93,7 +93,7 @@ struct btrfs_inode {

  unsigned long runtime_flags;

- /* Keep track of who's O_SYNC/fsycing currently */
+ /* Keep track of who's O_SYNC/fsyncing currently */
  atomic_t sync_writers;

  /* full 64 bit generation number, struct vfs_inode doesn't have a big

Re: [PATCH] process cputimer is moving faster than its corresponding clock

2013-04-14 Thread Olivier Langlois

On Fri, 2013-04-12 at 11:16 +0200, Peter Zijlstra wrote:
> On Wed, 2013-04-10 at 11:48 -0400, Olivier Langlois wrote:
> > Please explain how expensive it is. All I am seeing is a couple of
> > additions.
> 
> Let me start with this, since your earlier argument also refers to
> this.
> 
> So yes it does look simple and straight fwd, only one addition. However
> its an atomic operation across all threads of the same process. Imagine
> a single process with 512 threads, all running on a separate cpu.
> 
Peter,

It now makes perfect sense. Thank you for your explanation. It is
showing me an aspect that I did overlook.

Greetings,



Re: [PATCH 1/2] hfs/hfsplus: Convert dprint to hfs_dbg

2013-04-14 Thread Hin-Tak Leung

--- On Mon, 15/4/13, Joe Perches  wrote:

> On Mon, 2013-04-15 at 01:53 +0100, Hin-Tak Leung wrote:
> > --- On Mon, 8/4/13, Joe Perches wrote:
> > > Use a more current logging style.
> []
> > I have been sitting on a patch which changes this part
> > of the code to dynamic debugging, and it is much simpler.
> > Just:
> > #define dprint(flg, fmt, args...) \
> > -   if (flg & DBG_MASK) \
> > -   printk(fmt , ## args)
> > +   pr_debug(fmt , ## args)
> 
> This change wouldn't work well as it would make a mess
> of output that uses no prefix (ie: emits at KERN_DEFAULT)
> with output that uses KERN_DEBUG
> 
> That's the reason for _dbg and _dbg_cont.

Hmm, I don't get it. Is there any *existing* use of dprint in the hfsplus
code which is affected by your comment? Or is this another general
stylistic comment, i.e. "this does not work in general"?



Re: [PATCH v5 1/4] x86, kdump: Set crashkernel_low automatically

2013-04-14 Thread H. Peter Anvin

On 04/11/2013 11:54 PM, Yinghai Lu wrote:

> + /*
> +  * two parts from lib/swiotlb.c:
> +  *  swiotlb size: user specified with swiotlb= or default.
> +  *  swiotlb overflow buffer: now is hardcoded to 32k,
> +  *  round to 8M to cover more others.
> +  */

This comment is incomprehensible.  "Cover more others"?


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


Re: [PATCH tip/core/rcu 6/7] rcu: Drive quiescent-state-forcing delay from HZ

2013-04-14 Thread Paul Mackerras

On Fri, Apr 12, 2013 at 11:38:04PM -0700, Paul E. McKenney wrote:
> On Fri, Apr 12, 2013 at 04:54:02PM -0700, Josh Triplett wrote:
> > On Fri, Apr 12, 2013 at 04:19:13PM -0700, Paul E. McKenney wrote:
> > > From: "Paul E. McKenney" 
> > > 
> > > Systems with HZ=100 can have slow bootup times due to the default
> > > three-jiffy delays between quiescent-state forcing attempts.  This
> > > commit therefore auto-tunes the RCU_JIFFIES_TILL_FORCE_QS value based
> > > on the value of HZ.  However, this would break very large systems that
> > > require more time between quiescent-state forcing attempts.  This
> > > commit therefore also ups the default delay by one jiffy for each
> > > 256 CPUs that might be on the system (based off of nr_cpu_ids at
> > > runtime, -not- NR_CPUS at build time).
> > > 
> > > Reported-by: Paul Mackerras 
> > > Signed-off-by: Paul E. McKenney 
> > 
> > Something seems very wrong if RCU regularly hits the fqs code during
> > boot; feels like there's some more straightforward solution we're
> > missing.  What causes these CPUs to fall under RCU's scrutiny during
> > boot yet not actually hit the RCU codepaths naturally?
> 
> The problem is that they are running HZ=100, so that RCU will often
> take 30-60 milliseconds per grace period.  At that point, you only
> need 16-30 grace periods to chew up a full second, so it is not all
> that hard to eat up the additional 8-12 seconds of boot time that
> they were seeing.  IIRC, UP boot was costing them 4 seconds.

I added some instrumentation, which counted 202 calls to
synchronize_sched() during boot (Fedora 17 minimal install +
development tools) with a 3.8.0 kernel on a 4-cpu KVM virtual machine
on a POWER7.  Without this patch, those 202 calls take up a total of
4.32 seconds; with it, they take up 3.6 seconds.  The kernel is
compiled with HZ=100 and NR_CPUS=1024, like the standard Fedora
kernel.

I suspect a lot of the calls are in udevd and related processes.
Interestingly there were no calls to synchronize_rcu_bh or
synchronize_sched_expedited.

Paul.
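
The figures in this exchange can be checked with a little arithmetic; the helpers below are a sketch for that purpose only and their names are hypothetical. The HZ-dependent base delay is taken as a parameter because the thread does not spell out the exact formula; only the "one extra jiffy per 256 CPUs" part comes from the commit message.

```c
/* Duration of a number of jiffies in milliseconds for a given HZ. */
static unsigned long jiffies_to_ms(unsigned long jiffies, unsigned long hz)
{
    return jiffies * 1000UL / hz;
}

/* Average cost of one call in microseconds, given a total in milliseconds. */
static unsigned long avg_us_per_call(unsigned long total_ms, unsigned long calls)
{
    return total_ms * 1000UL / calls;
}

/*
 * Delay between forcing attempts as described in the commit message:
 * a base that depends on HZ, plus one jiffy per 256 possible CPUs.
 * ASSUMPTION: base_jiffies is supplied by the caller.
 */
static unsigned long fqs_delay_jiffies(unsigned long base_jiffies,
                                       unsigned long nr_cpu_ids)
{
    return base_jiffies + nr_cpu_ids / 256;
}
```

With these, three jiffies at HZ=100 come to 30 ms, the 202 synchronize_sched() calls average about 21.4 ms each without the patch (4.32 s) and about 17.8 ms with it (3.6 s), and a 4-CPU guest adds no extra jiffies while 1024 CPUs would add four.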

Re: [PATCH 1/2] hfs/hfsplus: Convert dprint to hfs_dbg

2013-04-14 Thread Joe Perches

On Mon, 2013-04-15 at 02:56 +0100, Hin-Tak Leung wrote:
> --- On Mon, 15/4/13, Joe Perches  wrote:
> > On Mon, 2013-04-15 at 01:53 +0100, Hin-Tak Leung wrote:
> > > --- On Mon, 8/4/13, Joe Perches wrote:
> > > > Use a more current logging style.
> > []
> > > I have been sitting on a patch which changes this part
> > > of the code to dynamic debugging, and it is much simpler.
[]
> > This change wouldn't work well as it would make a mess
> > of output that uses no prefix (ie: emits at KERN_DEFAULT)
> > with output that uses KERN_DEBUG
> > 
> > That's the reason for _dbg and _dbg_cont.
> 
> Hmm, I don't get it. Is there any *existing* use of dprint
> in the hfsplus code which is affected by your comment?

Code like this prints out currently on a single line at
KERN_DEFAULT.

@@ -138,16 +138,16 @@ void hfs_bnode_dump(struct hfs_bnode *node)
[]
for (i = be16_to_cpu(desc.num_recs); i >= 0; off -= 2, i--) {
key_off = hfs_bnode_read_u16(node, off);
-   dprint(DBG_BNODE_MOD, " %d", key_off);
+   hfs_dbg_cont(BNODE_MOD, " %d", key_off);

By converting this dprint() to pr_debug(), it would
print out on multiple lines, one for each read.

That's why it should use a mechanism like dbg_cont.

btw: there is no current pr_debug_cont mechanism.
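
To make the difference concrete, here is a small userspace model of the two behaviours. It is only a sketch: log_buf stands in for the kernel log, dbg() and dbg_cont() are hypothetical helpers rather than the macros from the patch, and the "[dbg]" prefix is an arbitrary marker for "a new record at debug level".

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static char log_buf[256];   /* stands in for the kernel log */

/* Append formatted text to log_buf, truncating if it would overflow. */
static void log_append(const char *fmt, va_list ap)
{
    size_t used = strlen(log_buf);

    vsnprintf(log_buf + used, sizeof(log_buf) - used, fmt, ap);
}

/* Start a new record: close the previous line, then add a prefix. */
static void dbg(const char *fmt, ...)
{
    va_list ap;
    size_t used = strlen(log_buf);

    if (used && log_buf[used - 1] != '\n')
        strncat(log_buf, "\n", sizeof(log_buf) - used - 1);
    strncat(log_buf, "[dbg]", sizeof(log_buf) - strlen(log_buf) - 1);

    va_start(ap, fmt);
    log_append(fmt, ap);
    va_end(ap);
}

/* Continue the current record: no newline, no prefix. */
static void dbg_cont(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    log_append(fmt, ap);
    va_end(ap);
}
```

Emitting a header with dbg() and each offset with dbg_cont(" %d", ...) keeps everything on one line, whereas using dbg() for every offset starts a fresh prefixed line per value, which is the mess described above.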



Re: [PATCH 1/2] watchdog: introduce new watchdog AUTOSTART option

2013-04-14 Thread Guenter Roeck

On Mon, Apr 15, 2013 at 12:43:31AM +, Kim, Milo wrote:
> Hi Guenter
> 
> > I really don't like that idea. It defeats a significant part of the
> > purpose
> > for having a watchdog, which is to prevent user-space hangups.
> > 
> > To make this a driver option is even more odd - it forces every user of
> > this
> > driver to use it in-kernel only, and makes /dev/watchdog quite useless.
> > 
> > I mean, really, if you have such a watchdog, what is the point of using
> > the
> > watchdog infrastructure in the first place ? Just make it a kernel
> > thread or
> > timer-activated platform code which pings your watchdog once in a while.
> > No
> > need to get the watchdog infrastructure involved in the first place.
> > 
> > Am I missing something ?
> 
> I wanted to enable the watchdog timer without the watchdog application for
> making sure the system alive.
> However, I think I misunderstood the purpose of the watchdog driver.
> The watchdog is for detecting user-space hangups rather than kernel stall.
> Is it correct? If yes, this patch is totally wrong.
> 
Correct. After all, if the kernel stalls, user space will stall as well, so by
covering user space it covers both. Covering kernel alone doesn't help much,
since most of the stalls (at least in my experience) happen in user space.

Guenter
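
For reference, the user-space side of this arrangement is small. The sketch below assumes a daemon that has already opened /dev/watchdog and holds the descriptor; watchdog_ping() and watchdog_disarm() are hypothetical helper names. Writing to the device restarts the countdown, and drivers that support the "magic close" feature stop the timer only if the character 'V' was written just before close().

```c
#include <unistd.h>

/* Pet the watchdog: any write to the device restarts its countdown. */
static int watchdog_ping(int fd)
{
    return write(fd, "\0", 1) == 1 ? 0 : -1;
}

/*
 * Ask for a clean stop before close(); drivers that support the
 * "magic close" feature disarm on seeing the character 'V'.
 */
static int watchdog_disarm(int fd)
{
    return write(fd, "V", 1) == 1 ? 0 : -1;
}
```

A daemon would call watchdog_ping() in its main loop at an interval shorter than the timeout. If user space hangs, or the kernel stalls so that user space cannot run, the pings stop and the hardware resets the machine, which is why covering user space covers both.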

[PATCH v3] tracepoints: prevents null probe from being added

2013-04-14 Thread kpark3469

From: Sahara 

Somehow tracepoint_entry_add_probe() accepts a NULL probe function.
This may lead to unexpected results, since the number of probe
functions in an entry is counted by checking whether each probe is
NULL in a for-loop.
This patch prevents a NULL probe from being added.
In tracepoint_entry_remove_probe(), the check of the probe parameter
is moved out of the for-loop for efficiency, keeping the existing
behaviour where a NULL probe removes all probe functions in the entry.

Signed-off-by: Sahara 
Reviewed-by: Steven Rostedt 
Reviewed-by: Mathieu Desnoyers 
---
 kernel/tracepoint.c |   21 +
 1 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 0c05a45..29f2654 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -112,7 +112,8 @@ tracepoint_entry_add_probe(struct tracepoint_entry *entry,
int nr_probes = 0;
struct tracepoint_func *old, *new;
 
-   WARN_ON(!probe);
+   if (WARN_ON(!probe))
+   return ERR_PTR(-EINVAL);
 
debug_print_probes(entry);
old = entry->funcs;
@@ -152,13 +153,18 @@ tracepoint_entry_remove_probe(struct tracepoint_entry *entry,
 
debug_print_probes(entry);
/* (N -> M), (N > 1, M >= 0) probes */
-   for (nr_probes = 0; old[nr_probes].func; nr_probes++) {
-   if (!probe ||
-   (old[nr_probes].func == probe &&
-old[nr_probes].data == data))
-   nr_del++;
+   if (probe) {
+   for (nr_probes = 0; old[nr_probes].func; nr_probes++) {
+   if (old[nr_probes].func == probe &&
+old[nr_probes].data == data)
+   nr_del++;
+   }
}
 
+   /*
+* If probe is NULL, then nr_probes = nr_del = 0, and then the
+* entire entry will be removed.
+*/
if (nr_probes - nr_del == 0) {
/* N -> 0, (N > 1) */
entry->funcs = NULL;
@@ -173,8 +179,7 @@ tracepoint_entry_remove_probe(struct tracepoint_entry *entry,
if (new == NULL)
return ERR_PTR(-ENOMEM);
for (i = 0; old[i].func; i++)
-   if (probe &&
-   (old[i].func != probe || old[i].data != data))
+   if (old[i].func != probe || old[i].data != data)
new[j++] = old[i];
new[nr_probes - nr_del].func = NULL;
entry->refcount = nr_probes - nr_del;
-- 
1.7.1
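
The reasoning behind the patch can be illustrated with a userspace model of a NULL-terminated probe array. This is a sketch under assumptions: struct probe_slot, remove_probe() and add_probe_ok() are hypothetical names, and the model copies survivors into a caller-provided array instead of allocating as the kernel code does.

```c
#include <stddef.h>

typedef void (*probe_fn)(void);

/* Hypothetical stand-in for struct tracepoint_func; func == NULL ends the array. */
struct probe_slot {
    probe_fn func;
    void *data;
};

/* Two empty example probes. */
static void probe_a(void) {}
static void probe_b(void) {}

/*
 * Copy the survivors of a removal into out (which must have room for
 * every entry of old plus the terminator) and return how many remain.
 * A NULL probe means "remove everything".
 */
static size_t remove_probe(const struct probe_slot *old, probe_fn probe,
                           void *data, struct probe_slot *out)
{
    size_t i, kept = 0;

    if (probe) {
        for (i = 0; old[i].func; i++)
            if (old[i].func != probe || old[i].data != data)
                out[kept++] = old[i];
    }
    out[kept].func = NULL;
    out[kept].data = NULL;
    return kept;
}

/* Adding must reject a NULL probe, since NULL doubles as the terminator. */
static int add_probe_ok(probe_fn probe)
{
    return probe != NULL;
}
```

Because every loop stops at the first NULL func, a NULL probe stored in the middle of the array would hide all entries after it, which is why the add path has to refuse it.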


RE: [PATCH 1/2] watchdog: introduce new watchdog AUTOSTART option

2013-04-14 Thread Kim, Milo

> -Original Message-
> From: Guenter Roeck [mailto:li...@roeck-us.net]
> Sent: Monday, April 15, 2013 11:07 AM
> To: Kim, Milo
> Cc: w...@iguana.be; linux-watch...@vger.kernel.org; linux-
> ker...@vger.kernel.org
> Subject: Re: [PATCH 1/2] watchdog: introduce new watchdog AUTOSTART
> option
> 
> On Mon, Apr 15, 2013 at 12:43:31AM +, Kim, Milo wrote:
> > Hi Guenter
> >
> > > I really don't like that idea. It defeats a significant part of the
> > > purpose
> > > for having a watchdog, which is to prevent user-space hangups.
> > >
> > > To make this a driver option is even more odd - it forces every
> user of
> > > this
> > > driver to use it in-kernel only, and makes /dev/watchdog quite
> useless.
> > >
> > > I mean, really, if you have such a watchdog, what is the point of
> using
> > > the
> > > watchdog infrastructure in the first place ? Just make it a kernel
> > > thread or
> > > timer-activated platform code which pings your watchdog once in a
> while.
> > > No
> > > need to get the watchdog infrastructure involved in the first place.
> > >
> > > Am I missing something ?
> >
> > I wanted to enable the watchdog timer without the watchdog
> application for
> > making sure the system alive.
> > However, I think I misunderstood the purpose of the watchdog driver.
> > The watchdog is for detecting user-space hangups rather than kernel
> stall.
> > Is it correct? If yes, this patch is totally wrong.
> >
> Correct. After all, if the kernel stalls, user space will stall as well,
> so by
> covering user space it covers both. Covering kernel alone doesn't help
> much,
> since most of the stalls (at least in my experience) happen in user
> space.

Got it. I nearly spoiled it due to my misunderstanding ;)
Many thanks!

Milo

[Bug fix PATCH] resource: Reusing a resource structure allocated by bootmem

2013-04-14 Thread Yasuaki Ishimatsu

When hot-removing memory that was present at boot time, the following messages are shown:

[  296.867031] ------------[ cut here ]------------
[  296.922273] kernel BUG at mm/slub.c:3409!
[  296.970229] invalid opcode:  [#1] SMP
[  297.019453] Modules linked in: ebtable_nat ebtables xt_CHECKSUM 
iptable_mangle bridge stp llc ipmi_devintf ipmi_msghandler sunrpc ipt_REJECT 
nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT 
nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter 
ip6_tables binfmt_misc vfat fat dm_mirror dm_region_hash dm_log dm_mod 
vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp 
kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode pcspkr sg i2c_i801 
lpc_ich mfd_core igb i2c_algo_bit i2c_core e1000e ptp pps_core tpm_infineon 
ioatdma dca sr_mod cdrom sd_mod crc_t10dif usb_storage megaraid_sas lpfc 
scsi_transport_fc scsi_tgt scsi_mod
[  297.747808] CPU 0
[  297.769764] Pid: 5091, comm: kworker/0:2 Tainted: GW3.9.0-rc6+ 
#15
[  297.897917] RIP: 0010:[]  [] 
kfree+0x232/0x240
[  297.988634] RSP: 0018:88084678d968  EFLAGS: 00010246
[  298.052196] RAX: 00600400 RBX: 8987fea0 RCX: 
[  298.137595] RDX: 8107a5ae RSI: 0001 RDI: 8987fea0
[  298.222994] RBP: 88084678d998 R08: 8200 R09: 0001
[  298.308390] R10:  R11:  R12: 0300
[  298.393792] R13: ea061fc0 R14: 0303 R15: 0080
[  298.479190] FS:  () GS:88085aa0() 
knlGS:
[  298.576030] CS:  0010 DS:  ES:  CR0: 80050033
[  298.644791] CR2: 025d3f78 CR3: 01c0c000 CR4: 001407f0
[  298.730192] DR0:  DR1:  DR2: 
[  298.815590] DR3:  DR6: 0ff0 DR7: 0400
[  298.900997] Process kworker/0:2 (pid: 5091, threadinfo 88084678c000, 
task 88083928ca80)
[  299.005121] Stack:
[  299.029156]  0303 8987fea0 0300 
8987fe90
[  299.118116]  0303 0080 88084678d9c8 
8107a5d4
[  299.207084]  3000 8987fffb2680 0080 
3000
[  299.296045] Call Trace:
[  299.325288]  [] __release_region+0xd4/0xe0
[  299.393020]  [] __remove_pages+0x52/0x110
[  299.459707]  [] arch_remove_memory+0x89/0xd0
[  299.529505]  [] remove_memory+0xc4/0x100
[  299.595145]  [] acpi_memory_device_remove+0x6d/0xb1
[  299.672230]  [] acpi_device_remove+0x89/0xab
[  299.742033]  [] __device_release_driver+0x7c/0xf0
[  299.817048]  [] device_release_driver+0x2f/0x50
[  299.889972]  [] acpi_bus_device_detach+0x6c/0x70
[  299.963938]  [] acpi_ns_walk_namespace+0x11a/0x250
[  300.039982]  [] ? power_state_show+0x36/0x36
[  300.109800]  [] ? power_state_show+0x36/0x36
[  300.179612]  [] acpi_walk_namespace+0xee/0x137
[  300.251492]  [] acpi_bus_trim+0x33/0x7a
[  300.316089]  [] ? mutex_lock_nested+0x4a/0x60
[  300.386927]  [] acpi_bus_hot_remove_device+0xc4/0x1a1
[  300.466096]  [] acpi_os_execute_deferred+0x27/0x34
[  300.542137]  [] process_one_work+0x1f7/0x590
[  300.611940]  [] ? process_one_work+0x185/0x590
[  300.683823]  [] worker_thread+0x11a/0x370
[  300.750502]  [] ? manage_workers+0x180/0x180
[  300.820308]  [] kthread+0xee/0x100
[  300.879714]  [] ? __lock_release+0x12b/0x190
[  300.949512]  [] ? __init_kthread_worker+0x70/0x70
[  301.024517]  [] ret_from_fork+0x7c/0xb0
[  301.089135]  [] ? __init_kthread_worker+0x70/0x70
[  301.164138] Code: 89 ef e8 c2 2c fb ff e9 0b ff ff ff 4d 8b 6d 30 e9 5c fe 
ff ff 4c 89 f1 48 89 da 4c 89 ee 4c 89 e7 e8 03 f9 ff ff e9 ec fe ff ff <0f> 0b 
eb fe 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec
[  301.397214] RIP  [] kfree+0x232/0x240
[  301.459855]  RSP 
[  301.501675] ---[ end trace 8679967aa8606ed8 ]---

These messages appear because a resource structure that was allocated by
bootmem is released with kfree(). So when we release a resource structure,
we should check whether it was allocated by bootmem or not.

But even if we know a resource structure was allocated by bootmem, we cannot
release it, since SLxB cannot handle it. So, to reuse such a resource
structure, this patch keeps track of it in bootmem_resource as follows:

When free_resource() releases a resource structure, it checks whether the
structure was allocated by bootmem. If it was, free_resource() adds it to
bootmem_resource. If it was not, free_resource() releases it with kfree().

And when get_resource() is asked for a new resource structure, it checks
whether bootmem_resource holds any released resource structures. If there is
one, get_resource() returns it. If there is none, get_resource() returns a new
resource structure allocated by kzalloc().
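
The reuse scheme described above can be illustrated with a minimal user-space
sketch. This is not the patch itself: the from_bootmem flag is a hypothetical
stand-in for whatever test the kernel uses to tell that a structure came from
bootmem, calloc()/free() stand in for kzalloc()/kfree(), struct resource is
reduced to a few fields, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct resource. */
struct resource {
	unsigned long start;
	unsigned long end;
	bool from_bootmem;        /* hypothetical marker, set by whoever allocated it */
	struct resource *sibling; /* doubles as the free-list link in this sketch */
};

/* Free list of released boot-time structures (name taken from the description). */
static struct resource *bootmem_resource;

/* Release a structure: park boot-time ones on the list, free() the rest. */
static void free_resource(struct resource *res)
{
	if (!res)
		return;

	if (res->from_bootmem) {
		res->sibling = bootmem_resource;
		bootmem_resource = res;
	} else {
		free(res);
	}
}

/* Hand out a zeroed structure, preferring a parked boot-time one. */
static struct resource *get_resource(void)
{
	struct resource *res = bootmem_resource;

	if (res) {
		assert(res->from_bootmem); /* only boot-time structures are parked */
		bootmem_resource = res->sibling;
		memset(res, 0, sizeof(*res));
		res->from_bootmem = true;  /* keep the marker for the next release */
	} else {
		res = calloc(1, sizeof(*res));
	}
	return res;
}
```

A caller would pair the two helpers: a structure released by free_resource()
is either parked for a later get_resource() call or returned to the heap, so
the heap's free() never sees memory it did not allocate.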

[ 00/11] 3.0.74-stable review

2013-04-14 Thread Greg Kroah-Hartman

This is the start of the stable review cycle for the 3.0.74 release.
There are 11 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed Apr 17 02:05:34 UTC 2013.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.0.74-rc1.gz
and the diffstat can be found below.

thanks,

greg k-h

-
Pseudo-Shortlog of commits:

Greg Kroah-Hartman 
Linux 3.0.74-rc1

Hayes Wang 
r8169: fix auto speed down issue

Linus Torvalds 
mtdchar: fix offset overflow detection

Boris Ostrovsky 
x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal

Samu Kallio 
x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates

Thomas Gleixner 
sched_clock: Prevent 64bit inatomicity on 32bit systems

Nicholas Bellinger 
target: Fix incorrect fallthrough of ALUA Standby/Offline/Transition CDBs

Huacai Chen 
PM / reboot: call syscore_shutdown() after disable_nonboot_cpus()

Namhyung Kim 
tracing: Fix double free when function profile init failed

Alban Bedel 
ASoC: wm8903: Fix the bypass to HP/LINEOUT when no DAC or ADC is running

Dave Hansen 
x86-32, mm: Rip out x86_32 NUMA remapping code

Eldad Zack 
ALSA: usb-audio: fix endianness bug in snd_nativeinstruments_*


-

Diffstat:

 Makefile                              |   4 +-
 arch/x86/include/asm/paravirt.h       |   5 +-
 arch/x86/include/asm/paravirt_types.h |   2 +
 arch/x86/kernel/paravirt.c            |  25 +++---
 arch/x86/lguest/boot.c                |   1 +
 arch/x86/mm/fault.c                   |   6 +-
 arch/x86/mm/numa_32.c                 | 161 --
 arch/x86/xen/mmu.c                    |   1 +
 drivers/mtd/mtdchar.c                 |  48 --
 drivers/net/r8169.c                   |  30 ++-
 drivers/target/target_core_alua.c     |   3 +
 kernel/sched_clock.c                  |  26 ++
 kernel/sys.c                          |   3 +-
 kernel/trace/ftrace.c                 |   1 -
 sound/soc/codecs/wm8903.c             |   2 +
 sound/usb/mixer_quirks.c              |   4 +-
 sound/usb/quirks.c                    |   2 +-
 17 files changed, 131 insertions(+), 193 deletions(-)


