from:"Viresh Kumar"

Re: [PATCH V1 1/3] mfd: stmpe: Arrange #include in alphabetical order

2012-11-26 Thread Viresh Kumar

On 26 November 2012 18:55, Lee Jones  wrote:

> Why do you need to sort them? Is there _really_ a need?

Many people & maintainers like to have their header files ordered. The
reason for that is:

If they are not ordered, there is a possibility of adding an header file
multiple times in a file. This might not do something serious
when thinking about compilation time or Image size, but adding an header
file multiple times is simply wrong.

This can't be caught in patch reviews most of the time, as diff may not show
the earlier inclusion.

That's why i ordered them. And i don't see any harm in doing so.

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V2 Resend 0/4] Create sched_select_cpu() and use it for workqueues and timers

2012-11-26 Thread Viresh Kumar

On 6 November 2012 16:08, Viresh Kumar  wrote:
> This is V2 Resend of my sched_select_cpu() work. Resend because didn't got 
> much
> attention on V2. Including more guys now in cc :)
>
> In order to save power, it would be useful to schedule work onto non-IDLE cpus
> instead of waking up an IDLE one.
>
> To achieve this, we need scheduler to guide kernel frameworks (like: timers &
> workqueues) on which is the most preferred CPU that must be used for these
> tasks.
>
> This patchset is about implementing this concept.
>
> - The first patch adds sched_select_cpu() routine which returns the preferred
>   cpu which is non-idle.
> - Second patch removes idle_cpu() calls from timer & hrtimer.
> - Third patch is about adapting this change in workqueue framework.
> - Fourth patch add migration capability in running timer
>
> Earlier discussions over v1 can be found here:
> http://www.mail-archive.com/linaro-dev@lists.linaro.org/msg13342.html
>
> Earlier discussions over this concept were done at last LPC:
> http://summit.linuxplumbersconf.org/lpc-2012/meeting/90/lpc2012-sched-timer-workqueue/
>
> Module created for testing this behavior is present here:
> http://git.linaro.org/gitweb?p=people/vireshk/module.git;a=summary

Ping!!

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/2] mmc: sdhci-spear: Initialize sdhci clk to 50 MHz

2012-11-26 Thread Viresh Kumar

On 20 November 2012 12:11, Viresh Kumar  wrote:
> On 8 November 2012 20:39, Viresh Kumar  wrote:
>> From: Vipul Kumar Samar 
>>
>> SPEAr sdhci driver expects the clock to be set to 50 MHz for proper 
>> functioning.
>> This patch sets clk to 50 MHz in probe.
>>
>> Signed-off-by: Vipul Kumar Samar 
>> Signed-off-by: Viresh Kumar 
>
> Ping!!

Hi Chris,

Are you taking these for v3.8?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/2] mmc: sdhci-spear: Initialize sdhci clk to 50 MHz

2012-11-26 Thread Viresh Kumar

On 26 November 2012 20:39, Chris Ball  wrote:
> On Thu, Nov 08 2012, Viresh Kumar wrote:
>> + ret = clk_set_rate(sdhci->clk, 5000);
>> + if (ret)
>> + dev_dbg(&pdev->dev, "Error setting desired clk, clk=%lu\n",
>> + clk_get_rate(sdhci->clk));
>> +
>>   if (np) {
>>   sdhci->data = sdhci_probe_config_dt(pdev);
>>   if (IS_ERR(sdhci->data)) {
>
> Can I change this to a dev_err()?  No-one's going to see a dev_dbg().

Yes and thanks :)

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/3] Input: spear-keyboard: Add clk_{un}prepare() support

2012-11-26 Thread Viresh Kumar

On 20 November 2012 14:17, Dmitry Torokhov  wrote:
> No, not really, it just does not work well with devm_* patches that got
> applied: on removal you unprepare clock as the very first operation and
> then devm_* does the rest which is wrong order.
>
> I am looking at adding dem_* for clocks.

I haven't seen your clock patch in your tree. I am asking because i want to
push this patch :)

You want me to repost it over your patches.

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V3 3/3] mfd: stmpe: Update DT support in stmpe driver

2012-11-26 Thread Viresh Kumar

On 27 November 2012 00:10, Grant Likely  wrote:
> Ah, so it is configuring the way the device emits interrupts; not how
> the interrupt controller processes them. Fair enough.

Finally i am able to convince somebody that stmpe is different :)

> It would actually be good to ask the interrupt controller driver what
> kind of interrupt signal it expects for a given interrupt line. That
> should also solve the problem and I think it would be more useful to
> other devices. Can you investigate whether or not
> irqd_get_trigger_type() returns the information you need?

That's a pretty cool function to use. :)

Will check it out :)

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V3 3/3] mfd: stmpe: Update DT support in stmpe driver

2012-11-26 Thread Viresh Kumar

On 27 November 2012 08:10, Viresh Kumar  wrote:
> On 27 November 2012 00:10, Grant Likely  wrote:
>> It would actually be good to ask the interrupt controller driver what
>> kind of interrupt signal it expects for a given interrupt line. That
>> should also solve the problem and I think it would be more useful to
>> other devices. Can you investigate whether or not
>> irqd_get_trigger_type() returns the information you need?
>
> That's a pretty cool function to use. :)
>
> Will check it out :)

I was thinking about this logic in my earlier mail, don't know what stopped me
from thinking it is wrong. :(

Problem is with invert polarity, which the interrupt controller is not aware of.
For example, suppose interrupt controller needs Rising edge interrupt, but
the board has inverted the line between stmpe and IC. So, we will get
Rising high from the routine you mentioned, but we need to generate
opposite of that to make it rising high.

And so interrupt polarity field is still required.

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2012-11-26 Thread Viresh Kumar

Hi Tejun,

On 26 November 2012 22:45, Tejun Heo  wrote:
> On Tue, Nov 06, 2012 at 04:08:45PM +0530, Viresh Kumar wrote:

> I'm pretty skeptical about this.  queue_work() w/o explicit CPU
> assignment has always guaranteed that the work item will be executed
> on the same CPU.  I don't think there are too many users depending on
> that but am not sure at all that there are none.  I asked you last
> time that you would at the very least need to audit most users but it
> doesn't seem like there has been any effort there.

My bad. I completely missed/forgot that comment from your earlier mails.
Will do it.

> That said, if the obtained benefit is big enough, sure, we can
> definitely change the behavior, which isn't all that great to begin
> with, and try to shake out the bugs quicky by e.g. forcing all work
> items to execute on different CPUs, but you're presenting a few
> percent of work items being migrated to a different CPU from an
> already active CPU, which doesn't seem like such a big benefit to me
> even if the migration target CPU is somewhat more efficient.  How much
> powersaving are we talking about?

Hmm.. I actually implemented the problem discussed here:
(I know you have seen this earlier :) )

http://www.linuxplumbersconf.org/2012/wp-content/uploads/2012/08/lpc2012-sched-timer-workqueue.pdf

Specifically slides: 12 & 19.

I haven't done much power calculations with it and have tested it more from
functionality point of view.

@Vincent: Can you add some comments here?

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: manual merge of the arm-soc tree with the slave-dma tree

2012-11-26 Thread Viresh Kumar

On 27 November 2012 10:14, Stephen Rothwell  wrote:
> Today's linux-next merge of the arm-soc tree got a conflict in
> arch/arm/mach-spear13xx/spear1310.c between commit b47394911c26 ("ARM:
> SPEAr13xx: Pass DW DMAC platform data from DT") from the slave-dma tree
> and commit 300a6856324a ("ARM: SPEAr1310: Fix AUXDATA for compact flash
> controller") from the arm-soc tree.
>
> I have no idea how to fix this up, so I just effectively dropped the
> arm-doc tree patch.

Hi Stephen,

So sorry for that, Can you please take arm-soc version here? Patch 300a6856324a
is doing the correct thing. i.e. we need

+   OF_DEV_AUXDATA("arasan,cf-spear1340", MCIF_CF_BASE, NULL, &cf_pdata),

instead of

+   OF_DEV_AUXDATA("arasan,cf-spear1340", MCIF_CF_BASE, NULL, "cf"),

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V2 Resend 0/4] Create sched_select_cpu() and use it for workqueues and timers

2012-11-26 Thread Viresh Kumar

Hi Steven,

Thanks for sharing your opinion. :)

As, this went out to be a long thread of discussion (thanks Paul), i will try to
answer everything here.

On 26 November 2012 22:10, Steven Rostedt  wrote:
> This is a really bad time of year to post new patches :-/
> A lot of people are trying to get their own work done by year end and
> then there's holidays and such that are also distractions. Not to
> mention that a new merge window will be opening soon.

Patches are there since End of September and it was just a ping now (actually
with bad timing - merge window).

> As workqueues are set off by the CPU that queued it, what real benefit
> does this give? A CPU was active when it queued the work and the work
> should be done before it gets back to sleep.
>
> OK, an interrupt happens on an idle CPU and queues some work. That work
> should execute before the CPU gets back to sleep, right? I fail to see
> the benefit of trying to move that work elsewhere. The CPU had to wake
> up to execute the interrupt. It's no longer in a deep sleep (or any
> sleep for that matter).
>
> To me it seems best to avoid waking up an idle CPU in the first place.
>
> I'm still working off a turkey overdose, so maybe I'm missing something
> obvious.

Ok, here is the story behind these patches. The idea was first discussed
by Vincent in LPC this year:

http://www.linuxplumbersconf.org/2012/wp-content/uploads/2012/08/lpc2012-sched-timer-workqueue.pdf

Specifically slides: 12 & 19.

cpu idleness here means idleness from scheduler's perspective, i.e.
For the scheduler CPU is idle, if all below are true:
- current task is idle task
- nr_running == 0
- wake_list is empty

There are two use cases of workqueue patch 3/4:

- queue-work from timer interrupt, which re-arms itself (slide 12):
  Whether timer is deferred or not, it will queue the work on current
cpu once cpu
  wakes up. The cpu could have immediately gone back to idle state again, if the
  work is not queued on it. So, because this cpu is idle (from
schedulers point of
  view), we can move this work to other cpu.

- delayed-work, which rearm's itself (slide 19):
  Again the same thing, we could have kept the cpu in idle state for
some more time.

There might not be many users with this behavior, but a single user can actually
have significant impact.

For now, it doesn't take care of big LITTLE stuff that Paul pointed
out, but yes that
is in plan. Some work is going on in that direction too:

http://linux.kernel.narkive.com/mCyvFVUX/rfc-0-6-sched-packing-small-tasks

The very first reason of having this patchset was to have a single
preferred_cpu()
routine, which can be used by all frameworks. Timers being the first
user (already),
workqueues tried to be the second one.

I tested it more from functionality point of view rather than with
power figures :(
And i understood, that it is very much required.

Having said that, I believe all the questions raised are on PATCH 3/4
(workqueue).
And other 3 patches should be fine. Can you share your opinion on those patches,
I will then split this patchset and send workqueue stuff after doing some power
measurements later.

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V3 3/3] mfd: stmpe: Update DT support in stmpe driver

2012-11-27 Thread Viresh Kumar

On 27 November 2012 14:10, Lee Jones  wrote:
> On Tue, 27 Nov 2012, Viresh Kumar wrote:
>> Problem is with invert polarity, which the interrupt controller is not aware 
>> of.
>> For example, suppose interrupt controller needs Rising edge interrupt, but
>> the board has inverted the line between stmpe and IC. So, we will get
>> Rising high from the routine you mentioned, but we need to generate
>> opposite of that to make it rising high.
>
> Surely that would be a hardware design error/quirk?

Yes.

> Can you give an example where this has happened?

I haven't seen this in any of SPEAr boards i have worked on. Maybe Rabin
would have, that's why he added that part of code :)

@Rabin/Linus: Do you remember why have you added this in stmpe driver:

+   if (stmpe->pdata->irq_invert_polarity)
+   icr ^= STMPE_ICR_LSB_HIGH;
+

Does somebody actually need it?

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2012-11-27 Thread Viresh Kumar

On 27 November 2012 18:56, Steven Rostedt  wrote:
> A couple of things. The sched_select_cpu() is not cheap. It has a double
> loop of domains/cpus looking for a non idle cpu. If we have 1024 CPUs,
> and we are CPU 1023 and all other CPUs happen to be idle, we could be
> searching 1023 CPUs before we come up with our own.

Not sure if you missed the first check sched_select_cpu()

+int sched_select_cpu(unsigned int sd_flags)
+{
+   /* If Current cpu isn't idle, don't migrate anything */
+   if (!idle_cpu(cpu))
+   return cpu;

We aren't going to search if we aren't idle.

> Also, I really don't like this as a default behavior. It seems that this
> solution is for a very special case, and this can become very intrusive
> for the normal case.

We tried with an KCONFIG option for it, which Tejun rejected.

> To be honest, I'm uncomfortable with this approach. It seems to be
> fighting a symptom and not the disease. I'd rather find a way to keep
> work from being queued on wrong CPU. If it is a timer, find a way to
> move the timer. If it is something else, lets work to fix that. Doing
> searches of possibly all CPUs (unlikely, but it is there), just seems
> wrong to me.

As Vincent pointed out, on big LITTLE systems we just don't want to
serve works on big cores. That would be wasting too much of power.
Specially if we are going to wake up big cores.

It would be difficult to control the source driver (which queues work) to
little cores. We thought, if somebody wanted to queue work on current
cpu then they must use queue_work_on().

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V3 3/3] mfd: stmpe: Update DT support in stmpe driver

2012-11-27 Thread Viresh Kumar

On 28 November 2012 01:25, Rabin Vincent  wrote:
> 2012/11/27 Viresh Kumar :
>> On 27 November 2012 14:10, Lee Jones  wrote:
>> I haven't seen this in any of SPEAr boards i have worked on. Maybe Rabin
>> would have, that's why he added that part of code :)
>>
>> @Rabin/Linus: Do you remember why have you added this in stmpe driver:
>>
>> +   if (stmpe->pdata->irq_invert_polarity)
>> +   icr ^= STMPE_ICR_LSB_HIGH;
>> +
>>
>> Does somebody actually need it?
>
> It was (as irq_rev_pol) part of Luotao Fu's proposed STMPE811 patchset
> (https://patchwork.kernel.org/patch/106173/) which I integrated into my
> version of the STMPE driver, which didn't have it in its initial version
> (https://patchwork.kernel.org/patch/103273/).
>
> It's not something _I_ ever used.

I grep'd kernel and nobody is using it there, so lets get rid of it
completely :)
I will patch it.

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC 1/2] clk: use struct clk only for external API

2012-11-28 Thread viresh kumar

On Wed, Nov 28, 2012 at 5:22 PM, Rabin Vincent
 wrote:
> In order to provide per-user accounting, this separates the struct clk
> used in the common clock framework into two structures 'struct clk_core'
> and 'struct clk'.  struct clk_core will be used for internal
> manipulation and struct clk will be used in the clock API
> implementation.
>
> In this patch, struct clk is simply renamed to struct clk_core and a new
> struct clk is implemented which simply wraps it.  In the next patch, the
> new struct clk will be used to implement per-user clock enable
> accounting.
>
> There is a rather hacky #define of clk_core to clk for the non-common
> clk case in order to avoid a mass rename of all clk non-common clk
> implementations.
>
> Signed-off-by: Rabin Vincent 

Hi Rabin,

I reviewed it with help of git diff --word-diff and didn't found any silly
mistake in the replacement of clk with clk_core :)

> diff --git a/drivers/clk/clk-core.h b/drivers/clk/clk-core.h
> +#ifdef CONFIG_COMMON_CLK
> +#define clk_to_clk_core(clk)   ((struct clk_core *)(clk))
> +#define clk_core_to_clk(core)  ((struct clk *)(core))
> +#else
> +#define clk_to_clk_core(clk)   ((clk))
> +#define clk_core_to_clk(core)  ((core))
> +#endif

> diff --git a/include/linux/clk-private.h b/include/linux/clk-private.h
> +struct clk {
> +   struct clk_core clk;
> +};
> +

Isn't something wrong here? For common clk case shouldn't
this be:

> +#define clk_to_clk_core(clk)  (&clk->clk)
> +#define clk_core_to_clk(core) (container_of(clk, ...))  //not getting into 
> the exact format here

Sorry, if i am missing basics.

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC 1/2] clk: use struct clk only for external API

2012-11-28 Thread viresh kumar

On Wed, Nov 28, 2012 at 9:31 PM, viresh kumar  wrote:
> On Wed, Nov 28, 2012 at 5:22 PM, Rabin Vincent
> Isn't something wrong here? For common clk case shouldn't
> this be:
>
>> +#define clk_to_clk_core(clk)  (&clk->clk)
>> +#define clk_core_to_clk(core) (container_of(clk, ...))  //not getting into 
>> the exact format here
>
> Sorry, if i am missing basics.

Ok. I saw these getting updated in 2/2. But it means this individual patch
is broken and this is not allowed i believe.

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH V4 1/3] mfd: stmpe: Get rid of irq_invert_polarity

2012-11-28 Thread Viresh Kumar

Since the very first patch, stmpe core driver is using irq_invert_polarity as
part of platform data. But, nobody is actually using it in kernel till now.

Also, this is not something part of hardware specs, but is included to cater
some board mistakes or quirks.

So, better get rid of it. This is earlier discussed here:

https://lkml.org/lkml/2012/11/27/636

Signed-off-by: Viresh Kumar 
---

This is actually V1 of this patch.

 drivers/mfd/stmpe.c   | 7 ---
 include/linux/mfd/stmpe.h | 2 --
 2 files changed, 9 deletions(-)

diff --git a/drivers/mfd/stmpe.c b/drivers/mfd/stmpe.c
index 8b164f3..fece28b 100644
--- a/drivers/mfd/stmpe.c
+++ b/drivers/mfd/stmpe.c
@@ -960,13 +960,6 @@ static int __devinit stmpe_chip_init(struct stmpe *stmpe)
else
icr |= STMPE_ICR_LSB_HIGH;
}
-
-   if (stmpe->pdata->irq_invert_polarity) {
-   if (id == STMPE801_ID)
-   icr ^= STMPE801_REG_SYS_CTRL_INT_HI;
-   else
-   icr ^= STMPE_ICR_LSB_HIGH;
-   }
}
 
if (stmpe->pdata->autosleep) {
diff --git a/include/linux/mfd/stmpe.h b/include/linux/mfd/stmpe.h
index 15dac79..383ac15 100644
--- a/include/linux/mfd/stmpe.h
+++ b/include/linux/mfd/stmpe.h
@@ -190,7 +190,6 @@ struct stmpe_ts_platform_data {
  * @id: device id to distinguish between multiple STMPEs on the same board
  * @blocks: bitmask of blocks to enable (use STMPE_BLOCK_*)
  * @irq_trigger: IRQ trigger to use for the interrupt to the host
- * @irq_invert_polarity: IRQ line is connected with reversed polarity
  * @autosleep: bool to enable/disable stmpe autosleep
  * @autosleep_timeout: inactivity timeout in milliseconds for autosleep
  * @irq_base: base IRQ number.  %STMPE_NR_IRQS irqs will be used, or
@@ -207,7 +206,6 @@ struct stmpe_platform_data {
unsigned int blocks;
int irq_base;
unsigned int irq_trigger;
-   bool irq_invert_polarity;
bool autosleep;
bool irq_over_gpio;
int irq_gpio;
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH V4 2/3] mfd: stmpe: Remove irq_trigger from platform data

2012-11-28 Thread Viresh Kumar

STMPE can confige the way the device emits interrupts and till now this
information is passed as part of platform data.

It would actually be good to ask the interrupt controller driver what kind of
interrupt signal it expects for a given interrupt line. We can get the irq type
by calling: irqd_get_trigger_type() routine.

So, now we don't need to pass it via platform data. This is earlier discussed
here:

https://lkml.org/lkml/2012/11/26/711

Signed-off-by: Viresh Kumar 
---

Linus/Shiraz: Can you please test this patch? You don't really need to test 1/3
and 3/3, but would be good if you do that too..

This is actually V1 of this patch.

 arch/arm/mach-ux500/board-mop500-stuib.c | 1 -
 drivers/mfd/stmpe.c  | 8 +---
 include/linux/mfd/stmpe.h| 4 ++--
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/arm/mach-ux500/board-mop500-stuib.c 
b/arch/arm/mach-ux500/board-mop500-stuib.c
index 564f57d..0efcf97 100644
--- a/arch/arm/mach-ux500/board-mop500-stuib.c
+++ b/arch/arm/mach-ux500/board-mop500-stuib.c
@@ -57,7 +57,6 @@ static struct stmpe_keypad_platform_data 
stmpe1601_keypad_data = {
 static struct stmpe_platform_data stmpe1601_data = {
.id = 1,
.blocks = STMPE_BLOCK_KEYPAD,
-   .irq_trigger= IRQF_TRIGGER_FALLING,
.irq_base   = MOP500_STMPE1601_IRQ(0),
.keypad = &stmpe1601_keypad_data,
.autosleep  = true,
diff --git a/drivers/mfd/stmpe.c b/drivers/mfd/stmpe.c
index fece28b..fa22ef4 100644
--- a/drivers/mfd/stmpe.c
+++ b/drivers/mfd/stmpe.c
@@ -914,7 +914,7 @@ static int __devinit stmpe_irq_init(struct stmpe *stmpe,
 
 static int __devinit stmpe_chip_init(struct stmpe *stmpe)
 {
-   unsigned int irq_trigger = stmpe->pdata->irq_trigger;
+   unsigned int irq_trigger = stmpe->irq_trigger;
int autosleep_timeout = stmpe->pdata->autosleep_timeout;
struct stmpe_variant_info *variant = stmpe->variant;
u8 icr = 0;
@@ -1106,6 +1106,9 @@ int __devinit stmpe_probe(struct stmpe_client_info *ci, 
int partnum)
return -ENODEV;
}
stmpe->variant = stmpe_noirq_variant_info[stmpe->partnum];
+   } else {
+   stmpe->irq_trigger =
+   irqd_get_trigger_type(irq_get_irq_data(stmpe->irq));
}
 
ret = stmpe_chip_init(stmpe);
@@ -1118,8 +1121,7 @@ int __devinit stmpe_probe(struct stmpe_client_info *ci, 
int partnum)
return ret;
 
ret = devm_request_threaded_irq(ci->dev, stmpe->irq, NULL,
-   stmpe_irq, pdata->irq_trigger | IRQF_ONESHOT,
-   "stmpe", stmpe);
+   stmpe_irq, IRQF_ONESHOT, "stmpe", stmpe);
if (ret) {
dev_err(stmpe->dev, "failed to request IRQ: %d\n",
ret);
diff --git a/include/linux/mfd/stmpe.h b/include/linux/mfd/stmpe.h
index 383ac15..0f2ea36 100644
--- a/include/linux/mfd/stmpe.h
+++ b/include/linux/mfd/stmpe.h
@@ -71,6 +71,7 @@ struct stmpe_client_info;
  *   different variants.  Indexed by one of STMPE_IDX_*.
  * @irq: irq number for stmpe
  * @irq_base: starting IRQ number for internal IRQs
+ * @irq_trigger: IRQ trigger to use for the interrupt to the host
  * @num_gpios: number of gpios, differs for variants
  * @ier: cache of IER registers for bus_lock
  * @oldier: cache of IER registers for bus_lock
@@ -89,6 +90,7 @@ struct stmpe {
 
int irq;
int irq_base;
+   unsigned int irq_trigger;
int num_gpios;
u8 ier[2];
u8 oldier[2];
@@ -189,7 +191,6 @@ struct stmpe_ts_platform_data {
  * struct stmpe_platform_data - STMPE platform data
  * @id: device id to distinguish between multiple STMPEs on the same board
  * @blocks: bitmask of blocks to enable (use STMPE_BLOCK_*)
- * @irq_trigger: IRQ trigger to use for the interrupt to the host
  * @autosleep: bool to enable/disable stmpe autosleep
  * @autosleep_timeout: inactivity timeout in milliseconds for autosleep
  * @irq_base: base IRQ number.  %STMPE_NR_IRQS irqs will be used, or
@@ -205,7 +206,6 @@ struct stmpe_platform_data {
int id;
unsigned int blocks;
int irq_base;
-   unsigned int irq_trigger;
bool autosleep;
bool irq_over_gpio;
int irq_gpio;
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH V4 3/3] mfd: stmpe: Update DT support in stmpe driver

2012-11-28 Thread Viresh Kumar

From: Vipul Kumar Samar 

This patch extends existing DT support for stmpe devices. This updates:
- DT support from stmpe SPI and I2C drivers
- missing header files in stmpe.c
- stmpe_of_probe() with pwm, rotator and new bindings.
- Bindings are updated in binding document.

Signed-off-by: Vipul Kumar Samar 
Signed-off-by: Viresh Kumar 
---
V3->V4:
- Remove need to pass irq type and polarity from DT.
- Remove need to pass id from platform data.

 Documentation/devicetree/bindings/mfd/stmpe.txt |  9 ++---
 drivers/mfd/stmpe-i2c.c | 15 +++
 drivers/mfd/stmpe-spi.c | 15 +++
 drivers/mfd/stmpe.c | 22 --
 4 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/Documentation/devicetree/bindings/mfd/stmpe.txt 
b/Documentation/devicetree/bindings/mfd/stmpe.txt
index 8f0aeda..d2d88d9 100644
--- a/Documentation/devicetree/bindings/mfd/stmpe.txt
+++ b/Documentation/devicetree/bindings/mfd/stmpe.txt
@@ -1,8 +1,11 @@
-* STMPE Multi-Functional Device
+* ST Microelectronics STMPE Multi-Functional Device
+
+STMPE is an MFD device which may expose following inbuilt devices: gpio, 
keypad,
+touchscreen, adc, pwm, rotator.
 
 Required properties:
- - compatible   : "st,stmpe[811|1601|2401|2403]"
- - reg  : I2C address of the device
+ - compatible   : "st,stmpe[610|801|811|1601|2401|2403]"
+ - reg  : I2C/SPI address of the device
 
 Optional properties:
  - interrupts   : The interrupt outputs from the controller
diff --git a/drivers/mfd/stmpe-i2c.c b/drivers/mfd/stmpe-i2c.c
index c734dc3..003c49b 100644
--- a/drivers/mfd/stmpe-i2c.c
+++ b/drivers/mfd/stmpe-i2c.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "stmpe.h"
 
@@ -81,6 +82,19 @@ static const struct i2c_device_id stmpe_i2c_id[] = {
 };
 MODULE_DEVICE_TABLE(i2c, stmpe_id);
 
+#ifdef CONFIG_OF
+static const struct of_device_id stmpe_dt_ids[] = {
+   { .compatible = "st,stmpe610", .data = &stmpe_i2c_id[0], },
+   { .compatible = "st,stmpe801", .data = &stmpe_i2c_id[1], },
+   { .compatible = "st,stmpe811", .data = &stmpe_i2c_id[2], },
+   { .compatible = "st,stmpe1601", .data = &stmpe_i2c_id[3], },
+   { .compatible = "st,stmpe2401", .data = &stmpe_i2c_id[4], },
+   { .compatible = "st,stmpe2403", .data = &stmpe_i2c_id[5], },
+   { }
+};
+MODULE_DEVICE_TABLE(of, stmpe_dt_ids);
+#endif
+
 static struct i2c_driver stmpe_i2c_driver = {
.driver = {
.name = "stmpe-i2c",
@@ -88,6 +102,7 @@ static struct i2c_driver stmpe_i2c_driver = {
 #ifdef CONFIG_PM
.pm = &stmpe_dev_pm_ops,
 #endif
+   .of_match_table = of_match_ptr(stmpe_dt_ids),
},
.probe  = stmpe_i2c_probe,
.remove = __devexit_p(stmpe_i2c_remove),
diff --git a/drivers/mfd/stmpe-spi.c b/drivers/mfd/stmpe-spi.c
index f86e3fc..e482f5f 100644
--- a/drivers/mfd/stmpe-spi.c
+++ b/drivers/mfd/stmpe-spi.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include "stmpe.h"
@@ -119,6 +120,19 @@ static const struct spi_device_id stmpe_spi_id[] = {
 };
 MODULE_DEVICE_TABLE(spi, stmpe_id);
 
+#ifdef CONFIG_OF
+static const struct of_device_id stmpe_dt_ids[] = {
+   { .compatible = "st,stmpe610", .data = (void *)STMPE610, },
+   { .compatible = "st,stmpe801", .data = (void *)STMPE801, },
+   { .compatible = "st,stmpe811", .data = (void *)STMPE811, },
+   { .compatible = "st,stmpe1601", .data = (void *)STMPE1601, },
+   { .compatible = "st,stmpe2401", .data = (void *)STMPE2401, },
+   { .compatible = "st,stmpe2403", .data = (void *)STMPE2403, },
+   { }
+};
+MODULE_DEVICE_TABLE(of, stmpe_dt_ids);
+#endif
+
 static struct spi_driver stmpe_spi_driver = {
.driver = {
.name   = "stmpe-spi",
@@ -126,6 +140,7 @@ static struct spi_driver stmpe_spi_driver = {
 #ifdef CONFIG_PM
.pm = &stmpe_dev_pm_ops,
 #endif
+   .of_match_table = of_match_ptr(stmpe_dt_ids),
},
.probe  = stmpe_spi_probe,
.remove = __devexit_p(stmpe_spi_remove),
diff --git a/drivers/mfd/stmpe.c b/drivers/mfd/stmpe.c
index fa22ef4..d3771ba 100644
--- a/drivers/mfd/stmpe.c
+++ b/drivers/mfd/stmpe.c
@@ -7,6 +7,7 @@
  * Author: Rabin Vincent  for ST-Ericsson
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -14,6 +15,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include "stmpe.h"
@@ -1019,6 +1022,12 @@ void __devinit stmpe_of_probe(struct stmpe_platform_data 
*pdata,
 {
struc

Re: [PATCH V4 1/3] mfd: stmpe: Get rid of irq_invert_polarity

2012-11-28 Thread Viresh Kumar

On 29 November 2012 00:05, Viresh Kumar  wrote:
> Since the very first patch, stmpe core driver is using irq_invert_polarity as
> part of platform data. But, nobody is actually using it in kernel till now.
>
> Also, this is not something part of hardware specs, but is included to cater
> some board mistakes or quirks.
>
> So, better get rid of it. This is earlier discussed here:
>
> https://lkml.org/lkml/2012/11/27/636
>
> Signed-off-by: Viresh Kumar 

Please ignore this series.. Two issues:
- rebased over an un accepted patch which rearranges header file
- I can improve patch 2/3 a bit.

I will resend it shortlly.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH V4 Resend 1/3] mfd: stmpe: Get rid of irq_invert_polarity

2012-11-28 Thread Viresh Kumar

Since the very first patch, stmpe core driver is using irq_invert_polarity as
part of platform data. But, nobody is actually using it in kernel till now.

Also, this is not something part of hardware specs, but is included to cater
some board mistakes or quirks.

So, better get rid of it. This is earlier discussed here:

https://lkml.org/lkml/2012/11/27/636

Signed-off-by: Viresh Kumar 
---
This is actually V1 of this patch.

 drivers/mfd/stmpe.c   | 7 ---
 include/linux/mfd/stmpe.h | 2 --
 2 files changed, 9 deletions(-)

diff --git a/drivers/mfd/stmpe.c b/drivers/mfd/stmpe.c
index 6175aa4..34408b4 100644
--- a/drivers/mfd/stmpe.c
+++ b/drivers/mfd/stmpe.c
@@ -960,13 +960,6 @@ static int __devinit stmpe_chip_init(struct stmpe *stmpe)
else
icr |= STMPE_ICR_LSB_HIGH;
}
-
-   if (stmpe->pdata->irq_invert_polarity) {
-   if (id == STMPE801_ID)
-   icr ^= STMPE801_REG_SYS_CTRL_INT_HI;
-   else
-   icr ^= STMPE_ICR_LSB_HIGH;
-   }
}
 
if (stmpe->pdata->autosleep) {
diff --git a/include/linux/mfd/stmpe.h b/include/linux/mfd/stmpe.h
index 15dac79..383ac15 100644
--- a/include/linux/mfd/stmpe.h
+++ b/include/linux/mfd/stmpe.h
@@ -190,7 +190,6 @@ struct stmpe_ts_platform_data {
  * @id: device id to distinguish between multiple STMPEs on the same board
  * @blocks: bitmask of blocks to enable (use STMPE_BLOCK_*)
  * @irq_trigger: IRQ trigger to use for the interrupt to the host
- * @irq_invert_polarity: IRQ line is connected with reversed polarity
  * @autosleep: bool to enable/disable stmpe autosleep
  * @autosleep_timeout: inactivity timeout in milliseconds for autosleep
  * @irq_base: base IRQ number.  %STMPE_NR_IRQS irqs will be used, or
@@ -207,7 +206,6 @@ struct stmpe_platform_data {
unsigned int blocks;
int irq_base;
unsigned int irq_trigger;
-   bool irq_invert_polarity;
bool autosleep;
bool irq_over_gpio;
int irq_gpio;
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH V4 Resend 2/3] mfd: stmpe: Remove irq_trigger from platform data

2012-11-28 Thread Viresh Kumar

STMPE can confige the way the device emits interrupts and till now this
information is passed as part of platform data.

It would actually be good to ask the interrupt controller driver what kind of
interrupt signal it expects for a given interrupt line. We can get the irq type
by calling: irqd_get_trigger_type() routine.

So, now we don't need to pass it via platform data. This is earlier discussed
here:

https://lkml.org/lkml/2012/11/26/711

Signed-off-by: Viresh Kumar 
---

@Linus/Shiraz: Can you please test this patch? You don't really need to test 1/3
and 3/3, but would be good if you do that too..

This is actually V1 of this patch.

 arch/arm/mach-ux500/board-mop500-stuib.c | 1 -
 drivers/mfd/stmpe.c  | 7 ---
 include/linux/mfd/stmpe.h| 2 --
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/arm/mach-ux500/board-mop500-stuib.c 
b/arch/arm/mach-ux500/board-mop500-stuib.c
index 564f57d..0efcf97 100644
--- a/arch/arm/mach-ux500/board-mop500-stuib.c
+++ b/arch/arm/mach-ux500/board-mop500-stuib.c
@@ -57,7 +57,6 @@ static struct stmpe_keypad_platform_data 
stmpe1601_keypad_data = {
 static struct stmpe_platform_data stmpe1601_data = {
.id = 1,
.blocks = STMPE_BLOCK_KEYPAD,
-   .irq_trigger= IRQF_TRIGGER_FALLING,
.irq_base   = MOP500_STMPE1601_IRQ(0),
.keypad = &stmpe1601_keypad_data,
.autosleep  = true,
diff --git a/drivers/mfd/stmpe.c b/drivers/mfd/stmpe.c
index 34408b4..10819e6 100644
--- a/drivers/mfd/stmpe.c
+++ b/drivers/mfd/stmpe.c
@@ -914,7 +914,6 @@ static int __devinit stmpe_irq_init(struct stmpe *stmpe,
 
 static int __devinit stmpe_chip_init(struct stmpe *stmpe)
 {
-   unsigned int irq_trigger = stmpe->pdata->irq_trigger;
int autosleep_timeout = stmpe->pdata->autosleep_timeout;
struct stmpe_variant_info *variant = stmpe->variant;
u8 icr = 0;
@@ -941,6 +940,9 @@ static int __devinit stmpe_chip_init(struct stmpe *stmpe)
return ret;
 
if (stmpe->irq >= 0) {
+   unsigned int irq_trigger =
+   irqd_get_trigger_type(irq_get_irq_data(stmpe->irq));
+
if (id == STMPE801_ID)
icr = STMPE801_REG_SYS_CTRL_INT_EN;
else
@@ -1118,8 +1120,7 @@ int __devinit stmpe_probe(struct stmpe_client_info *ci, 
int partnum)
return ret;
 
ret = devm_request_threaded_irq(ci->dev, stmpe->irq, NULL,
-   stmpe_irq, pdata->irq_trigger | IRQF_ONESHOT,
-   "stmpe", stmpe);
+   stmpe_irq, IRQF_ONESHOT, "stmpe", stmpe);
if (ret) {
dev_err(stmpe->dev, "failed to request IRQ: %d\n",
ret);
diff --git a/include/linux/mfd/stmpe.h b/include/linux/mfd/stmpe.h
index 383ac15..2519326 100644
--- a/include/linux/mfd/stmpe.h
+++ b/include/linux/mfd/stmpe.h
@@ -189,7 +189,6 @@ struct stmpe_ts_platform_data {
  * struct stmpe_platform_data - STMPE platform data
  * @id: device id to distinguish between multiple STMPEs on the same board
  * @blocks: bitmask of blocks to enable (use STMPE_BLOCK_*)
- * @irq_trigger: IRQ trigger to use for the interrupt to the host
  * @autosleep: bool to enable/disable stmpe autosleep
  * @autosleep_timeout: inactivity timeout in milliseconds for autosleep
  * @irq_base: base IRQ number.  %STMPE_NR_IRQS irqs will be used, or
@@ -205,7 +204,6 @@ struct stmpe_platform_data {
int id;
unsigned int blocks;
int irq_base;
-   unsigned int irq_trigger;
bool autosleep;
bool irq_over_gpio;
int irq_gpio;
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH V4 Resend 3/3] mfd: stmpe: Update DT support in stmpe driver

2012-11-28 Thread Viresh Kumar

From: Vipul Kumar Samar 

This patch extends existing DT support for stmpe devices. This updates:
- DT support from stmpe SPI and I2C drivers
- missing header files in stmpe.c
- stmpe_of_probe() with pwm, rotator and new bindings.
- Bindings are updated in binding document.

Signed-off-by: Vipul Kumar Samar 
Signed-off-by: Viresh Kumar 
---
V3->V4:
- Remove need to pass irq type and polarity from DT.
- Remove need to pass id from platform data.

 Documentation/devicetree/bindings/mfd/stmpe.txt |  9 ++---
 drivers/mfd/stmpe-i2c.c | 15 +++
 drivers/mfd/stmpe-spi.c | 15 +++
 drivers/mfd/stmpe.c | 22 --
 4 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/Documentation/devicetree/bindings/mfd/stmpe.txt 
b/Documentation/devicetree/bindings/mfd/stmpe.txt
index 8f0aeda..d2d88d9 100644
--- a/Documentation/devicetree/bindings/mfd/stmpe.txt
+++ b/Documentation/devicetree/bindings/mfd/stmpe.txt
@@ -1,8 +1,11 @@
-* STMPE Multi-Functional Device
+* ST Microelectronics STMPE Multi-Functional Device
+
+STMPE is an MFD device which may expose following inbuilt devices: gpio, 
keypad,
+touchscreen, adc, pwm, rotator.
 
 Required properties:
- - compatible   : "st,stmpe[811|1601|2401|2403]"
- - reg  : I2C address of the device
+ - compatible   : "st,stmpe[610|801|811|1601|2401|2403]"
+ - reg  : I2C/SPI address of the device
 
 Optional properties:
  - interrupts   : The interrupt outputs from the controller
diff --git a/drivers/mfd/stmpe-i2c.c b/drivers/mfd/stmpe-i2c.c
index c734dc3..003c49b 100644
--- a/drivers/mfd/stmpe-i2c.c
+++ b/drivers/mfd/stmpe-i2c.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "stmpe.h"
 
@@ -81,6 +82,19 @@ static const struct i2c_device_id stmpe_i2c_id[] = {
 };
 MODULE_DEVICE_TABLE(i2c, stmpe_id);
 
+#ifdef CONFIG_OF
+static const struct of_device_id stmpe_dt_ids[] = {
+   { .compatible = "st,stmpe610", .data = &stmpe_i2c_id[0], },
+   { .compatible = "st,stmpe801", .data = &stmpe_i2c_id[1], },
+   { .compatible = "st,stmpe811", .data = &stmpe_i2c_id[2], },
+   { .compatible = "st,stmpe1601", .data = &stmpe_i2c_id[3], },
+   { .compatible = "st,stmpe2401", .data = &stmpe_i2c_id[4], },
+   { .compatible = "st,stmpe2403", .data = &stmpe_i2c_id[5], },
+   { }
+};
+MODULE_DEVICE_TABLE(of, stmpe_dt_ids);
+#endif
+
 static struct i2c_driver stmpe_i2c_driver = {
.driver = {
.name = "stmpe-i2c",
@@ -88,6 +102,7 @@ static struct i2c_driver stmpe_i2c_driver = {
 #ifdef CONFIG_PM
.pm = &stmpe_dev_pm_ops,
 #endif
+   .of_match_table = of_match_ptr(stmpe_dt_ids),
},
.probe  = stmpe_i2c_probe,
.remove = __devexit_p(stmpe_i2c_remove),
diff --git a/drivers/mfd/stmpe-spi.c b/drivers/mfd/stmpe-spi.c
index 9edfe86..1e2bff0 100644
--- a/drivers/mfd/stmpe-spi.c
+++ b/drivers/mfd/stmpe-spi.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "stmpe.h"
 
@@ -119,6 +120,19 @@ static const struct spi_device_id stmpe_spi_id[] = {
 };
 MODULE_DEVICE_TABLE(spi, stmpe_id);
 
+#ifdef CONFIG_OF
+static const struct of_device_id stmpe_dt_ids[] = {
+   { .compatible = "st,stmpe610", .data = (void *)STMPE610, },
+   { .compatible = "st,stmpe801", .data = (void *)STMPE801, },
+   { .compatible = "st,stmpe811", .data = (void *)STMPE811, },
+   { .compatible = "st,stmpe1601", .data = (void *)STMPE1601, },
+   { .compatible = "st,stmpe2401", .data = (void *)STMPE2401, },
+   { .compatible = "st,stmpe2403", .data = (void *)STMPE2403, },
+   { }
+};
+MODULE_DEVICE_TABLE(of, stmpe_dt_ids);
+#endif
+
 static struct spi_driver stmpe_spi_driver = {
.driver = {
.name   = "stmpe-spi",
@@ -126,6 +140,7 @@ static struct spi_driver stmpe_spi_driver = {
 #ifdef CONFIG_PM
.pm = &stmpe_dev_pm_ops,
 #endif
+   .of_match_table = of_match_ptr(stmpe_dt_ids),
},
.probe  = stmpe_spi_probe,
.remove = __devexit_p(stmpe_spi_remove),
diff --git a/drivers/mfd/stmpe.c b/drivers/mfd/stmpe.c
index 10819e6..88cd8e5 100644
--- a/drivers/mfd/stmpe.c
+++ b/drivers/mfd/stmpe.c
@@ -7,12 +7,15 @@
  * Author: Rabin Vincent  for ST-Ericsson
  */
 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -1021,6 +1024,12 @@ void __devinit stmpe_of_probe(struct stmpe_platform_data 
*pdata,
 {
struct device_node *child;
 
+

Re: linux-next: manual merge of the driver-core tree with the slave-dma tree

2012-11-28 Thread Viresh Kumar

On 29 November 2012 09:10, Stephen Rothwell  wrote:
> diff --cc drivers/dma/dw_dmac.c

>  +#ifdef CONFIG_OF
>  +static struct dw_dma_platform_data *
>  +__devinit dw_dma_parse_dt(struct platform_device *pdev)

Do we need a separate patch to fix this place? I believe yes.

> - static int __devinit dw_probe(struct platform_device *pdev)
> + static int dw_probe(struct platform_device *pdev)

Thanks for this.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V4 Resend 3/3] mfd: stmpe: Update DT support in stmpe driver

2012-11-29 Thread Viresh Kumar

On 29 November 2012 14:53, Lee Jones  wrote:
>> From: Vipul Kumar Samar 
>> + /*
>> +  * Distinct names of same cell-type within multiple instances of stmpe
>> +  * will be guaranteed by DT.
>> +  */
>> + pdata->id = -1;
>
> And what if we're not booting with DT?

Then it is responsibility of code creating platform data for stmpe to fill
it appropriately. i.e. If there is only one instance of stmpe on board, then
better pass it as -1, else start counting.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V4 Resend 2/3] mfd: stmpe: Remove irq_trigger from platform data

2012-11-29 Thread Viresh Kumar

On 29 November 2012 15:04, Lee Jones  wrote:
> On Thu, 29 Nov 2012, Viresh Kumar wrote:
>
>> STMPE can confige
>
> configure?
>
>> the way the device emits interrupts and till now this
>
>   until?

Ahh... Will fix them. This happens when you send patches at midnight. :)

>> diff --git a/drivers/mfd/stmpe.c b/drivers/mfd/stmpe.c

>>   ret = devm_request_threaded_irq(ci->dev, stmpe->irq, NULL,
>> - stmpe_irq, pdata->irq_trigger | IRQF_ONESHOT,
>> - "stmpe", stmpe);
>> + stmpe_irq, IRQF_ONESHOT, "stmpe", stmpe);
>
> Forgive my ignorance, but you're no longer passing irq_trigger.
>
> Is this intentional? If so, why was it needed before and not now?

Yes, it was intentional. I thought it wasn't required at all. But my mind
is changing a bit now. I feel it is not required for DT, as trigger prop
is already passed in the interrupts cell. But for non-DT user, this is
still required. As there is not other way by which IRQ controller will
come to know what irq trigger type to enable for this irq line.

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH V5 1/2] mfd: stmpe: Get rid of irq_invert_polarity

2012-11-29 Thread Viresh Kumar

Since the very first patch, stmpe core driver is using irq_invert_polarity as
part of platform data. But, nobody is actually using it in kernel till now.

Also, this is not something part of hardware specs, but is included to cater
some board mistakes or quirks.

So, better get rid of it. This is earlier discussed here:

https://lkml.org/lkml/2012/11/27/636

Acked-by: Lee Jones 
Signed-off-by: Viresh Kumar 
---
Nothing changed from last version.

 drivers/mfd/stmpe.c   | 7 ---
 include/linux/mfd/stmpe.h | 2 --
 2 files changed, 9 deletions(-)

diff --git a/drivers/mfd/stmpe.c b/drivers/mfd/stmpe.c
index 6175aa4..34408b4 100644
--- a/drivers/mfd/stmpe.c
+++ b/drivers/mfd/stmpe.c
@@ -960,13 +960,6 @@ static int __devinit stmpe_chip_init(struct stmpe *stmpe)
else
icr |= STMPE_ICR_LSB_HIGH;
}
-
-   if (stmpe->pdata->irq_invert_polarity) {
-   if (id == STMPE801_ID)
-   icr ^= STMPE801_REG_SYS_CTRL_INT_HI;
-   else
-   icr ^= STMPE_ICR_LSB_HIGH;
-   }
}
 
if (stmpe->pdata->autosleep) {
diff --git a/include/linux/mfd/stmpe.h b/include/linux/mfd/stmpe.h
index 15dac79..383ac15 100644
--- a/include/linux/mfd/stmpe.h
+++ b/include/linux/mfd/stmpe.h
@@ -190,7 +190,6 @@ struct stmpe_ts_platform_data {
  * @id: device id to distinguish between multiple STMPEs on the same board
  * @blocks: bitmask of blocks to enable (use STMPE_BLOCK_*)
  * @irq_trigger: IRQ trigger to use for the interrupt to the host
- * @irq_invert_polarity: IRQ line is connected with reversed polarity
  * @autosleep: bool to enable/disable stmpe autosleep
  * @autosleep_timeout: inactivity timeout in milliseconds for autosleep
  * @irq_base: base IRQ number.  %STMPE_NR_IRQS irqs will be used, or
@@ -207,7 +206,6 @@ struct stmpe_platform_data {
unsigned int blocks;
int irq_base;
unsigned int irq_trigger;
-   bool irq_invert_polarity;
bool autosleep;
bool irq_over_gpio;
int irq_gpio;
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH V5 2/2] mfd: stmpe: Update DT support in stmpe driver

2012-11-29 Thread Viresh Kumar

From: Vipul Kumar Samar 

This patch extends existing DT support for stmpe devices. This updates:
- DT support from stmpe SPI and I2C drivers
- missing header files in stmpe.c
- stmpe_of_probe() with pwm, rotator and new bindings.
- Bindings are updated in binding document.

Signed-off-by: Vipul Kumar Samar 
Signed-off-by: Viresh Kumar 
---
V4->V5:
--
- 2/3 and 3/3 merged.
- irq_trigger is kept same for non-DT booti.

@Lee: I haven't added your Acked-by, because this differs from your Acked
version.

 Documentation/devicetree/bindings/mfd/stmpe.txt |  9 ++---
 drivers/mfd/stmpe-i2c.c | 15 ++
 drivers/mfd/stmpe-spi.c | 15 ++
 drivers/mfd/stmpe.c | 27 +++--
 4 files changed, 57 insertions(+), 9 deletions(-)

diff --git a/Documentation/devicetree/bindings/mfd/stmpe.txt 
b/Documentation/devicetree/bindings/mfd/stmpe.txt
index 8f0aeda..d2d88d9 100644
--- a/Documentation/devicetree/bindings/mfd/stmpe.txt
+++ b/Documentation/devicetree/bindings/mfd/stmpe.txt
@@ -1,8 +1,11 @@
-* STMPE Multi-Functional Device
+* ST Microelectronics STMPE Multi-Functional Device
+
+STMPE is an MFD device which may expose following inbuilt devices: gpio, 
keypad,
+touchscreen, adc, pwm, rotator.
 
 Required properties:
- - compatible   : "st,stmpe[811|1601|2401|2403]"
- - reg  : I2C address of the device
+ - compatible   : "st,stmpe[610|801|811|1601|2401|2403]"
+ - reg  : I2C/SPI address of the device
 
 Optional properties:
  - interrupts   : The interrupt outputs from the controller
diff --git a/drivers/mfd/stmpe-i2c.c b/drivers/mfd/stmpe-i2c.c
index c734dc3..003c49b 100644
--- a/drivers/mfd/stmpe-i2c.c
+++ b/drivers/mfd/stmpe-i2c.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "stmpe.h"
 
@@ -81,6 +82,19 @@ static const struct i2c_device_id stmpe_i2c_id[] = {
 };
 MODULE_DEVICE_TABLE(i2c, stmpe_id);
 
+#ifdef CONFIG_OF
+static const struct of_device_id stmpe_dt_ids[] = {
+   { .compatible = "st,stmpe610", .data = &stmpe_i2c_id[0], },
+   { .compatible = "st,stmpe801", .data = &stmpe_i2c_id[1], },
+   { .compatible = "st,stmpe811", .data = &stmpe_i2c_id[2], },
+   { .compatible = "st,stmpe1601", .data = &stmpe_i2c_id[3], },
+   { .compatible = "st,stmpe2401", .data = &stmpe_i2c_id[4], },
+   { .compatible = "st,stmpe2403", .data = &stmpe_i2c_id[5], },
+   { }
+};
+MODULE_DEVICE_TABLE(of, stmpe_dt_ids);
+#endif
+
 static struct i2c_driver stmpe_i2c_driver = {
.driver = {
.name = "stmpe-i2c",
@@ -88,6 +102,7 @@ static struct i2c_driver stmpe_i2c_driver = {
 #ifdef CONFIG_PM
.pm = &stmpe_dev_pm_ops,
 #endif
+   .of_match_table = of_match_ptr(stmpe_dt_ids),
},
.probe  = stmpe_i2c_probe,
.remove = __devexit_p(stmpe_i2c_remove),
diff --git a/drivers/mfd/stmpe-spi.c b/drivers/mfd/stmpe-spi.c
index 9edfe86..1e2bff0 100644
--- a/drivers/mfd/stmpe-spi.c
+++ b/drivers/mfd/stmpe-spi.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "stmpe.h"
 
@@ -119,6 +120,19 @@ static const struct spi_device_id stmpe_spi_id[] = {
 };
 MODULE_DEVICE_TABLE(spi, stmpe_id);
 
+#ifdef CONFIG_OF
+static const struct of_device_id stmpe_dt_ids[] = {
+   { .compatible = "st,stmpe610", .data = (void *)STMPE610, },
+   { .compatible = "st,stmpe801", .data = (void *)STMPE801, },
+   { .compatible = "st,stmpe811", .data = (void *)STMPE811, },
+   { .compatible = "st,stmpe1601", .data = (void *)STMPE1601, },
+   { .compatible = "st,stmpe2401", .data = (void *)STMPE2401, },
+   { .compatible = "st,stmpe2403", .data = (void *)STMPE2403, },
+   { }
+};
+MODULE_DEVICE_TABLE(of, stmpe_dt_ids);
+#endif
+
 static struct spi_driver stmpe_spi_driver = {
.driver = {
.name   = "stmpe-spi",
@@ -126,6 +140,7 @@ static struct spi_driver stmpe_spi_driver = {
 #ifdef CONFIG_PM
.pm = &stmpe_dev_pm_ops,
 #endif
+   .of_match_table = of_match_ptr(stmpe_dt_ids),
},
.probe  = stmpe_spi_probe,
.remove = __devexit_p(stmpe_spi_remove),
diff --git a/drivers/mfd/stmpe.c b/drivers/mfd/stmpe.c
index 34408b4..bb2dfdb 100644
--- a/drivers/mfd/stmpe.c
+++ b/drivers/mfd/stmpe.c
@@ -7,12 +7,15 @@
  * Author: Rabin Vincent  for ST-Ericsson
  */
 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -1019,6 +1022,14 @@ void __devinit stmpe_of_probe(struct stmpe_plat

[PATCH V3 0/7] Create sched_select_cpu() and use it for workqueues

2013-03-18 Thread Viresh Kumar

In order to save power, it would be useful to schedule light weight work on cpus
that aren't IDLE instead of waking up an IDLE one.

By idle cpu (from scheduler's perspective) we mean:
- Current task is idle task
- nr_running == 0
- wake_list is empty

This is already implemented for timers as get_nohz_timer_target(). We can figure
out few more users of this feature, like workqueues.

This patchset converts get_nohz_timer_target() into a generic API
sched_select_cpu() so that other frameworks (like workqueue) can also use it.

This routine returns the cpu which is non-idle. It accepts a bitwise OR of SD_*
flags present in linux/sched.h. If the local CPU isn't idle OR all cpus are
idle, local cpu is returned back. If local cpu is idle, then we must look for
another CPU which have all the flags passed as argument as set and isn't idle.

This patchset in first two patches creates generic API sched_select_cpu(). In
the third patch we create a new set of APIs for workqueues to queue work on any
cpu. All other patches migrate some of the users of workqueues which showed up
significantly on my setup. Others can be migrated later.

Earlier discussions over v1 and v2 can be found here:
http://www.mail-archive.com/linaro-dev@lists.linaro.org/msg13342.html
http://lists.linaro.org/pipermail/linaro-dev/2012-November/014344.html

Earlier discussions over this concept were done at last LPC:
http://summit.linuxplumbersconf.org/lpc-2012/meeting/90/lpc2012-sched-timer-workqueue/

Setup:
-
- ARM Vexpress TC2 - big.LITTLE CPU
- Core 0-1: A15, 2-4: A7
- rootfs: linaro-ubuntu-devel

This patchset has been tested on a big LITTLE system (heterogeneous) but is
useful for all other homogeneous systems as well. During these tests audio was
played in background using aplay.

Results:
---

Cluster A15 Energy  Cluster A7 Energy   Total
--  -   -

Without this patchset (Energy in Joules):
-

0.1511622.1835452.334707
0.2237302.6870672.910797
0.2896872.7327023.022389
0.4541982.7459083.200106
0.4955522.7464653.242017

Average:
0.3228662.6191372.942003


With this patchset (Energy in Joules):
-

0.1333612.2678222.401183
0.2606262.8333893.094015
0.1423652.2779292.420294
0.2467932.5825502.829343
0.1304622.2570332.387495

Average:
0.1827212.4437452.626466


Above tests are repeated multiple times and events are tracked using trace-cmd
and analysed using kernelshark. And it was easily noticeable that idle time for
many cpus has increased considerably, which eventually saved some power.

These patches are applied here for others to test:
http://git.linaro.org/gitweb?p=people/vireshk/linux.git;a=shortlog;h=refs/heads/sched-wq-migration-v3

Viresh Kumar (7):
  sched: Create sched_select_cpu() to give preferred CPU for power
saving
  timer: hrtimer: Don't check idle_cpu() before calling
get_nohz_timer_target()
  workqueue: Add helpers to schedule work on any cpu
  PHYLIB: queue work on any cpu
  mmc: queue work on any cpu
  block: queue work on any cpu
  fbcon: queue work on any cpu

 block/blk-core.c  |   6 +-
 block/blk-ioc.c   |   2 +-
 block/genhd.c |   9 ++-
 drivers/mmc/core/core.c   |   2 +-
 drivers/net/phy/phy.c |   9 +--
 drivers/video/console/fbcon.c |   2 +-
 include/linux/sched.h |  21 +-
 include/linux/workqueue.h |   5 ++
 kernel/hrtimer.c  |   2 +-
 kernel/sched/core.c   |  63 +---
 kernel/timer.c|   2 +-
 kernel/workqueue.c| 163 +-
 12 files changed, 192 insertions(+), 94 deletions(-)

-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH V3 2/7] timer: hrtimer: Don't check idle_cpu() before calling get_nohz_timer_target()

2013-03-18 Thread Viresh Kumar

Check for current cpu's idleness is already done in implementation of
sched_select_cpu() which is called by get_nohz_timer_target(). So, no need to
call idle_cpu() twice, once from sched_select_cpu() and once from timer and
hrtimer before calling get_nohz_timer_target().

This patch removes calls to idle_cpu() from timer and hrtimer.

Signed-off-by: Viresh Kumar 
---
 kernel/hrtimer.c | 2 +-
 kernel/timer.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index cc47812..c56668d 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -161,7 +161,7 @@ struct hrtimer_clock_base *lock_hrtimer_base(const struct 
hrtimer *timer,
 static int hrtimer_get_target(int this_cpu, int pinned)
 {
 #ifdef CONFIG_NO_HZ
-   if (!pinned && get_sysctl_timer_migration() && idle_cpu(this_cpu))
+   if (!pinned && get_sysctl_timer_migration())
return get_nohz_timer_target();
 #endif
return this_cpu;
diff --git a/kernel/timer.c b/kernel/timer.c
index dbf7a78..83a0d3c 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -739,7 +739,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
cpu = smp_processor_id();
 
 #if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
-   if (!pinned && get_sysctl_timer_migration() && idle_cpu(cpu))
+   if (!pinned && get_sysctl_timer_migration())
cpu = get_nohz_timer_target();
 #endif
new_base = per_cpu(tvec_bases, cpu);
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu

2013-03-18 Thread Viresh Kumar

queue_work() queues work on current cpu. This may wake up an idle CPU, which is
actually not required.

Some of these works can be processed by any CPU and so we must select a non-idle
CPU here. The initial idea was to modify implementation of queue_work(), but
that may end up breaking lots of kernel code that would be a nightmare to debug.

So, we finalized to adding new workqueue interfaces, for works that don't depend
on a cpu to execute them.

This patch adds following new routines:
- queue_work_on_any_cpu
- queue_delayed_work_on_any_cpu

These routines would look for the closest (via scheduling domains) non-idle cpu
(non-idle from schedulers perspective). If the current cpu is not idle or all
cpus are idle, work will be scheduled on local cpu.

Signed-off-by: Viresh Kumar 
---
 include/linux/workqueue.h |   5 ++
 kernel/workqueue.c| 163 --
 2 files changed, 118 insertions(+), 50 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index df30763..f0f7068 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -114,6 +114,7 @@ struct delayed_work {
/* target workqueue and CPU ->timer uses to queue ->work */
struct workqueue_struct *wq;
int cpu;
+   bool on_any_cpu;
 };
 
 /*
@@ -418,10 +419,14 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,
 extern bool queue_work_on(int cpu, struct workqueue_struct *wq,
struct work_struct *work);
 extern bool queue_work(struct workqueue_struct *wq, struct work_struct *work);
+extern bool queue_work_on_any_cpu(struct workqueue_struct *wq,
+   struct work_struct *work);
 extern bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
struct delayed_work *work, unsigned long delay);
 extern bool queue_delayed_work(struct workqueue_struct *wq,
struct delayed_work *work, unsigned long delay);
+extern bool queue_delayed_work_on_any_cpu(struct workqueue_struct *wq,
+   struct delayed_work *dwork, unsigned long delay);
 extern bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq,
struct delayed_work *dwork, unsigned long delay);
 extern bool mod_delayed_work(struct workqueue_struct *wq,
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 0e4fa1d..cf9c570 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1215,7 +1215,7 @@ static bool is_chained_work(struct workqueue_struct *wq)
 }
 
 static void __queue_work(int cpu, struct workqueue_struct *wq,
-struct work_struct *work)
+struct work_struct *work, bool on_any_cpu)
 {
struct pool_workqueue *pwq;
struct worker_pool *last_pool;
@@ -1240,8 +1240,13 @@ static void __queue_work(int cpu, struct 
workqueue_struct *wq,
 retry:
/* pwq which will be used unless @work is executing elsewhere */
if (!(wq->flags & WQ_UNBOUND)) {
-   if (cpu == WORK_CPU_UNBOUND)
-   cpu = raw_smp_processor_id();
+   if (cpu == WORK_CPU_UNBOUND) {
+   if (on_any_cpu)
+   cpu = sched_select_cpu(0);
+   else
+   cpu = raw_smp_processor_id();
+   }
+
pwq = per_cpu_ptr(wq->cpu_pwqs, cpu);
} else {
pwq = first_pwq(wq);
@@ -1315,6 +1320,22 @@ retry:
spin_unlock(&pwq->pool->lock);
 }
 
+static bool __queue_work_on(int cpu, struct workqueue_struct *wq,
+  struct work_struct *work, bool on_any_cpu)
+{
+   bool ret = false;
+   unsigned long flags;
+
+   local_irq_save(flags);
+
+   if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
+   __queue_work(cpu, wq, work, on_any_cpu);
+   ret = true;
+   }
+
+   local_irq_restore(flags);
+   return ret;
+}
 /**
  * queue_work_on - queue work on specific cpu
  * @cpu: CPU number to execute work on
@@ -1329,18 +1350,7 @@ retry:
 bool queue_work_on(int cpu, struct workqueue_struct *wq,
   struct work_struct *work)
 {
-   bool ret = false;
-   unsigned long flags;
-
-   local_irq_save(flags);
-
-   if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
-   __queue_work(cpu, wq, work);
-   ret = true;
-   }
-
-   local_irq_restore(flags);
-   return ret;
+   return __queue_work_on(cpu, wq, work, false);
 }
 EXPORT_SYMBOL_GPL(queue_work_on);
 
@@ -1356,21 +1366,38 @@ EXPORT_SYMBOL_GPL(queue_work_on);
  */
 bool queue_work(struct workqueue_struct *wq, struct work_struct *work)
 {
-   return queue_work_on(WORK_CPU_UNBOUND, wq, work);
+   return __queue_work_on(WORK_CPU_UNBOUND, wq, work, false);
 }
 EXPORT_SYMBOL_GPL(queue_work);
 
+/**
+ * que

[PATCH V3 4/7] PHYLIB: queue work on any cpu

2013-03-18 Thread Viresh Kumar

Phylib uses workqueues for multiple purposes. There is no real dependency of
scheduling these on the cpu which scheduled them.

On a idle system, it is observed that and idle cpu wakes up many times just to
service this work. It would be better if we can schedule it on a cpu which isn't
idle to save on power.

By idle cpu (from scheduler's perspective) we mean:
- Current task is idle task
- nr_running == 0
- wake_list is empty

This patch replaces the schedule_work() and schedule_delayed_work() routines
with their queue_[delayed_]work_on_any_cpu() siblings with system_wq as
parameter.

These routines would look for the closest (via scheduling domains) non-idle cpu
(non-idle from schedulers perspective). If the current cpu is not idle or all
cpus are idle, work will be scheduled on local cpu.

Cc: "David S. Miller" 
Cc: net...@vger.kernel.org
Signed-off-by: Viresh Kumar 
---
 drivers/net/phy/phy.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 298b4c2..a517706 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -439,7 +439,7 @@ void phy_start_machine(struct phy_device *phydev,
 {
phydev->adjust_state = handler;
 
-   schedule_delayed_work(&phydev->state_queue, HZ);
+   queue_delayed_work_on_any_cpu(system_wq, &phydev->state_queue, HZ);
 }
 
 /**
@@ -527,7 +527,7 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
disable_irq_nosync(irq);
atomic_inc(&phydev->irq_disable);
 
-   schedule_work(&phydev->phy_queue);
+   queue_work_on_any_cpu(system_wq, &phydev->phy_queue);
 
return IRQ_HANDLED;
 }
@@ -682,7 +682,7 @@ static void phy_change(struct work_struct *work)
 
/* reschedule state queue work to run as soon as possible */
cancel_delayed_work_sync(&phydev->state_queue);
-   schedule_delayed_work(&phydev->state_queue, 0);
+   queue_delayed_work_on_any_cpu(system_wq, &phydev->state_queue, 0);
 
return;
 
@@ -966,7 +966,8 @@ void phy_state_machine(struct work_struct *work)
if (err < 0)
phy_error(phydev);
 
-   schedule_delayed_work(&phydev->state_queue, PHY_STATE_TIME * HZ);
+   queue_delayed_work_on_any_cpu(system_wq, &phydev->state_queue,
+   PHY_STATE_TIME * HZ);
 }
 
 static inline void mmd_phy_indirect(struct mii_bus *bus, int prtad, int devad,
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH V3 5/7] mmc: queue work on any cpu

2013-03-18 Thread Viresh Kumar

mmc uses workqueues for running mmc_rescan(). There is no real dependency of
scheduling these on the cpu which scheduled them.

On a idle system, it is observed that and idle cpu wakes up many times just to
service this work. It would be better if we can schedule it on a cpu which isn't
idle to save on power.

By idle cpu (from scheduler's perspective) we mean:
- Current task is idle task
- nr_running == 0
- wake_list is empty

This patch replaces the queue_delayed_work() with
queue_delayed_work_on_any_cpu() siblings.

This routine would look for the closest (via scheduling domains) non-idle cpu
(non-idle from schedulers perspective). If the current cpu is not idle or all
cpus are idle, work will be scheduled on local cpu.

Cc: Chris Ball 
Cc: linux-...@vger.kernel.org
Signed-off-by: Viresh Kumar 
---
 drivers/mmc/core/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 9290bb5..adf331a 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -85,7 +85,7 @@ MODULE_PARM_DESC(
 static int mmc_schedule_delayed_work(struct delayed_work *work,
 unsigned long delay)
 {
-   return queue_delayed_work(workqueue, work, delay);
+   return queue_delayed_work_on_any_cpu(workqueue, work, delay);
 }
 
 /*
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH V3 6/7] block: queue work on any cpu

2013-03-18 Thread Viresh Kumar

block layer uses workqueues for multiple purposes. There is no real dependency
of scheduling these on the cpu which scheduled them.

On a idle system, it is observed that and idle cpu wakes up many times just to
service this work. It would be better if we can schedule it on a cpu which isn't
idle to save on power.

By idle cpu (from scheduler's perspective) we mean:
- Current task is idle task
- nr_running == 0
- wake_list is empty

This patch replaces schedule_work() and queue_[delayed_]work() with
queue_[delayed_]work_on_any_cpu() siblings.

These routines would look for the closest (via scheduling domains) non-idle cpu
(non-idle from schedulers perspective). If the current cpu is not idle or all
cpus are idle, work will be scheduled on local cpu.

Cc: Jens Axboe 
Signed-off-by: Viresh Kumar 
---
 block/blk-core.c | 6 +++---
 block/blk-ioc.c  | 2 +-
 block/genhd.c| 9 ++---
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 186603b..14ae74f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -225,7 +225,7 @@ static void blk_delay_work(struct work_struct *work)
 void blk_delay_queue(struct request_queue *q, unsigned long msecs)
 {
if (likely(!blk_queue_dead(q)))
-   queue_delayed_work(kblockd_workqueue, &q->delay_work,
+   queue_delayed_work_on_any_cpu(kblockd_workqueue, &q->delay_work,
   msecs_to_jiffies(msecs));
 }
 EXPORT_SYMBOL(blk_delay_queue);
@@ -2852,14 +2852,14 @@ EXPORT_SYMBOL_GPL(blk_rq_prep_clone);
 
 int kblockd_schedule_work(struct request_queue *q, struct work_struct *work)
 {
-   return queue_work(kblockd_workqueue, work);
+   return queue_work_on_any_cpu(kblockd_workqueue, work);
 }
 EXPORT_SYMBOL(kblockd_schedule_work);
 
 int kblockd_schedule_delayed_work(struct request_queue *q,
struct delayed_work *dwork, unsigned long delay)
 {
-   return queue_delayed_work(kblockd_workqueue, dwork, delay);
+   return queue_delayed_work_on_any_cpu(kblockd_workqueue, dwork, delay);
 }
 EXPORT_SYMBOL(kblockd_schedule_delayed_work);
 
diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index 9c4bb82..2eefeb1 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -144,7 +144,7 @@ void put_io_context(struct io_context *ioc)
if (atomic_long_dec_and_test(&ioc->refcount)) {
spin_lock_irqsave(&ioc->lock, flags);
if (!hlist_empty(&ioc->icq_list))
-   schedule_work(&ioc->release_work);
+   queue_work_on_any_cpu(system_wq, &ioc->release_work);
else
free_ioc = true;
spin_unlock_irqrestore(&ioc->lock, flags);
diff --git a/block/genhd.c b/block/genhd.c
index a1ed52a..4bdb735 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1488,9 +1488,11 @@ static void __disk_unblock_events(struct gendisk *disk, 
bool check_now)
intv = disk_events_poll_jiffies(disk);
set_timer_slack(&ev->dwork.timer, intv / 4);
if (check_now)
-   queue_delayed_work(system_freezable_wq, &ev->dwork, 0);
+   queue_delayed_work_on_any_cpu(system_freezable_wq, &ev->dwork,
+   0);
else if (intv)
-   queue_delayed_work(system_freezable_wq, &ev->dwork, intv);
+   queue_delayed_work_on_any_cpu(system_freezable_wq, &ev->dwork,
+   intv);
 out_unlock:
spin_unlock_irqrestore(&ev->lock, flags);
 }
@@ -1626,7 +1628,8 @@ static void disk_check_events(struct disk_events *ev,
 
intv = disk_events_poll_jiffies(disk);
if (!ev->block && intv)
-   queue_delayed_work(system_freezable_wq, &ev->dwork, intv);
+   queue_delayed_work_on_any_cpu(system_freezable_wq, &ev->dwork,
+   intv);
 
spin_unlock_irq(&ev->lock);
 
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH V3 7/7] fbcon: queue work on any cpu

2013-03-18 Thread Viresh Kumar

fbcon uses workqueues and it has no real dependency of scheduling these on the
cpu which scheduled them.

On a idle system, it is observed that and idle cpu wakes up few times just to
service this work. It would be better if we can schedule it on a cpu which isn't
idle to save on power.

By idle cpu (from scheduler's perspective) we mean:
- Current task is idle task
- nr_running == 0
- wake_list is empty

This patch replaces schedule_work() routine with queue_work_on_any_cpu()
sibling with system_wq as workqueue.

This routine would look for the closest (via scheduling domains) non-idle cpu
(non-idle from schedulers perspective). If the current cpu is not idle or all
cpus are idle, work will be scheduled on local cpu.

Cc: Dave Airlie 
Cc: linux-fb...@vger.kernel.org
Signed-off-by: Viresh Kumar 
---
 drivers/video/console/fbcon.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/video/console/fbcon.c b/drivers/video/console/fbcon.c
index 3cd6759..a900997 100644
--- a/drivers/video/console/fbcon.c
+++ b/drivers/video/console/fbcon.c
@@ -404,7 +404,7 @@ static void cursor_timer_handler(unsigned long dev_addr)
struct fb_info *info = (struct fb_info *) dev_addr;
struct fbcon_ops *ops = info->fbcon_par;
 
-   schedule_work(&info->queue);
+   queue_work_on_any_cpu(system_wq, &info->queue);
mod_timer(&ops->cursor_timer, jiffies + HZ/5);
 }
 
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH V3 1/7] sched: Create sched_select_cpu() to give preferred CPU for power saving

2013-03-18 Thread Viresh Kumar

In order to save power, it would be useful to schedule light weight work on cpus
that aren't IDLE instead of waking up an IDLE one.

By idle cpu (from scheduler's perspective) we mean:
- Current task is idle task
- nr_running == 0
- wake_list is empty

This is already implemented for timers as get_nohz_timer_target(). We can figure
out few more users of this feature, like workqueues.

This patch converts get_nohz_timer_target() into a generic API
sched_select_cpu() so that other frameworks (like workqueue) can also use it.

This routine returns the cpu which is non-idle. It accepts a bitwise OR of SD_*
flags present in linux/sched.h. If the local CPU isn't idle OR all cpus are
idle, local cpu is returned back. If local cpu is idle, then we must look for
another CPU which have all the flags passed as argument as set and isn't idle.

This patch reuses the code from get_nohz_timer_target() routine, which had
similar implementation.

Signed-off-by: Viresh Kumar 
---
 include/linux/sched.h | 21 +++--
 kernel/sched/core.c   | 63 +--
 2 files changed, 55 insertions(+), 29 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index e20580d..216fa0d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -230,14 +230,31 @@ extern void init_idle_bootup_task(struct task_struct 
*idle);
 
 extern int runqueue_is_locked(int cpu);
 
-#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ)
+#ifdef CONFIG_SMP
+extern int sched_select_cpu(unsigned int sd_flags);
+
+#ifdef CONFIG_NO_HZ
 extern void nohz_balance_enter_idle(int cpu);
 extern void set_cpu_sd_state_idle(void);
-extern int get_nohz_timer_target(void);
+/*
+ * In the semi idle case, use the nearest busy cpu for migrating timers
+ * from an idle cpu.  This is good for power-savings.
+ *
+ * We don't do similar optimization for completely idle system, as
+ * selecting an idle cpu will add more delays to the timers than intended
+ * (as that cpu's timer base may not be uptodate wrt jiffies etc).
+ */
+#define get_nohz_timer_target() sched_select_cpu(0)
 #else
 static inline void nohz_balance_enter_idle(int cpu) { }
 static inline void set_cpu_sd_state_idle(void) { }
+
+static inline int sched_select_cpu(unsigned int sd_flags)
+{
+   return raw_smp_processor_id();
+}
 #endif
+#endif /* CONFIG_SMP */
 
 /*
  * Only dump TASK_* tasks. (0 for all tasks)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b36635e..ccd76d7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -551,33 +551,6 @@ void resched_cpu(int cpu)
 
 #ifdef CONFIG_NO_HZ
 /*
- * In the semi idle case, use the nearest busy cpu for migrating timers
- * from an idle cpu.  This is good for power-savings.
- *
- * We don't do similar optimization for completely idle system, as
- * selecting an idle cpu will add more delays to the timers than intended
- * (as that cpu's timer base may not be uptodate wrt jiffies etc).
- */
-int get_nohz_timer_target(void)
-{
-   int cpu = smp_processor_id();
-   int i;
-   struct sched_domain *sd;
-
-   rcu_read_lock();
-   for_each_domain(cpu, sd) {
-   for_each_cpu(i, sched_domain_span(sd)) {
-   if (!idle_cpu(i)) {
-   cpu = i;
-   goto unlock;
-   }
-   }
-   }
-unlock:
-   rcu_read_unlock();
-   return cpu;
-}
-/*
  * When add_timer_on() enqueues a timer into the timer wheel of an
  * idle CPU then this timer might expire before the next timer event
  * which is scheduled to wake up that CPU. In case of a completely
@@ -648,6 +621,42 @@ void sched_avg_update(struct rq *rq)
}
 }
 
+/*
+ * This routine returns the nearest non-idle cpu. It accepts a bitwise OR of
+ * SD_* flags present in linux/sched.h. If the local CPU isn't idle, it is
+ * returned back. If it is idle, then we must look for another CPU which have
+ * all the flags passed as argument as set.
+ */
+int sched_select_cpu(unsigned int sd_flags)
+{
+   struct sched_domain *sd;
+   int cpu = smp_processor_id();
+   int i;
+
+   /* If Current cpu isn't idle, don't migrate anything */
+   if (!idle_cpu(cpu))
+   return cpu;
+
+   rcu_read_lock();
+   for_each_domain(cpu, sd) {
+/* If sd doesnt' have sd_flags set skip sd. */
+   if ((sd->flags & sd_flags) != sd_flags)
+   continue;
+
+   for_each_cpu(i, sched_domain_span(sd)) {
+   if (i == cpu)
+   continue;
+   if (!idle_cpu(i)) {
+   cpu = i;
+   goto unlock;
+   }
+   }
+   }
+unlock:
+   rcu_read_unlock();
+   return cpu;
+}
+
 #else /* !CONFIG_SMP */
 vo

Re: [PATCH V3 1/7] sched: Create sched_select_cpu() to give preferred CPU for power saving

2013-03-18 Thread Viresh Kumar

On Mon, Mar 18, 2013 at 9:09 PM, Frederic Weisbecker  wrote:
> 2013/3/18 Viresh Kumar :

>> +static inline int sched_select_cpu(unsigned int sd_flags)
>> +{
>> +   return raw_smp_processor_id();
>
> I feel this should be symetric with the requirement of having
> preemption disabled as in the CONFIG_NO_HZ version. This should be
> smp_processor_id().

Yes, my fault.

>> +int sched_select_cpu(unsigned int sd_flags)
>
> It would be nice to have some more precise naming. sched_select_busy_cpu() ?

You are correct that name can be improved but busy may give it a different
meaning. Maybe sched_select_non_idle_cpu()?

Thanks for your instantaneous review :)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V3 0/7] Create sched_select_cpu() and use it for workqueues

2013-03-18 Thread Viresh Kumar

On 18 March 2013 20:53, Viresh Kumar  wrote:
> In order to save power, it would be useful to schedule light weight work on 
> cpus
> that aren't IDLE instead of waking up an IDLE one.
>
> By idle cpu (from scheduler's perspective) we mean:
> - Current task is idle task
> - nr_running == 0
> - wake_list is empty

Oops!! I forgot the change log:

V2->V3:
- Dropped changes into core queue_work() API, rather create *_on_any_cpu()
  APIs
- Dropped running timers migration patch as that was broken
- Migrated few users of workqueues to use *_on_any_cpu() APIs.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V3 1/7] sched: Create sched_select_cpu() to give preferred CPU for power saving

2013-03-18 Thread Viresh Kumar

On 18 March 2013 21:27, Frederic Weisbecker  wrote:
> 2013/3/18 Viresh Kumar :
>>> It would be nice to have some more precise naming. sched_select_busy_cpu() ?
>>
>> You are correct that name can be improved but busy may give it a different
>> meaning. Maybe sched_select_non_idle_cpu()?
>
> That works too yeah.

Here is the fixup that i applied (it fixes another stupid mistake in sched.h).
I have pushed this series here again:

http://git.linaro.org/gitweb?p=people/vireshk/linux.git;a=shortlog;h=refs/heads/sched-wq-migration-v3

commit e769ff0d78fd2fb504ae056a44b70bba7c259126
Author: Viresh Kumar 
Date:   Fri Sep 14 15:13:05 2012 +0530

fixup! sched: Create sched_select_cpu() to give preferred CPU for
power saving

Signed-off-by: Viresh Kumar 
---
 include/linux/sched.h | 19 ++-
 kernel/sched/core.c   |  2 +-
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 216fa0d..db6655a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -231,11 +231,18 @@ extern void init_idle_bootup_task(struct
task_struct *idle);
 extern int runqueue_is_locked(int cpu);

 #ifdef CONFIG_SMP
-extern int sched_select_cpu(unsigned int sd_flags);
+extern int sched_select_non_idle_cpu(unsigned int sd_flags);
+#else
+static inline int sched_select_non_idle_cpu(unsigned int sd_flags)
+{
+   return smp_processor_id();
+}
+#endif

-#ifdef CONFIG_NO_HZ
+#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ)
 extern void nohz_balance_enter_idle(int cpu);
 extern void set_cpu_sd_state_idle(void);
+
 /*
  * In the semi idle case, use the nearest busy cpu for migrating timers
  * from an idle cpu.  This is good for power-savings.
@@ -244,17 +251,11 @@ extern void set_cpu_sd_state_idle(void);
  * selecting an idle cpu will add more delays to the timers than intended
  * (as that cpu's timer base may not be uptodate wrt jiffies etc).
  */
-#define get_nohz_timer_target() sched_select_cpu(0)
+#define get_nohz_timer_target() sched_select_non_idle_cpu(0)
 #else
 static inline void nohz_balance_enter_idle(int cpu) { }
 static inline void set_cpu_sd_state_idle(void) { }
-
-static inline int sched_select_cpu(unsigned int sd_flags)
-{
-   return raw_smp_processor_id();
-}
 #endif
-#endif /* CONFIG_SMP */

 /*
  * Only dump TASK_* tasks. (0 for all tasks)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ccd76d7..0265a5e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -627,7 +627,7 @@ void sched_avg_update(struct rq *rq)
  * returned back. If it is idle, then we must look for another CPU which have
  * all the flags passed as argument as set.
  */
-int sched_select_cpu(unsigned int sd_flags)
+int sched_select_non_idle_cpu(unsigned int sd_flags)
 {
struct sched_domain *sd;
int cpu = smp_processor_id();
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu

2013-03-18 Thread Viresh Kumar

On 18 March 2013 20:53, Viresh Kumar  wrote:
> queue_work() queues work on current cpu. This may wake up an idle CPU, which 
> is
> actually not required.
>
> Some of these works can be processed by any CPU and so we must select a 
> non-idle
> CPU here. The initial idea was to modify implementation of queue_work(), but
> that may end up breaking lots of kernel code that would be a nightmare to 
> debug.
>
> So, we finalized to adding new workqueue interfaces, for works that don't 
> depend
> on a cpu to execute them.

Fixup:

commit 6e5b6fac3353f5e4e5490931b3eb6d3fd7864ca0
Author: Viresh Kumar 
Date:   Tue Mar 19 10:34:20 2013 +0530

fixup! workqueue: Add helpers to schedule work on any cpu
---
 kernel/workqueue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index cf9c570..68daf50 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1242,7 +1242,7 @@ retry:
if (!(wq->flags & WQ_UNBOUND)) {
if (cpu == WORK_CPU_UNBOUND) {
if (on_any_cpu)
-   cpu = sched_select_cpu(0);
+   cpu = sched_select_non_idle_cpu(0);
else
cpu = raw_smp_processor_id();
}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V3 1/7] sched: Create sched_select_cpu() to give preferred CPU for power saving

2013-03-19 Thread Viresh Kumar

On 19 March 2013 18:00, Peter Zijlstra  wrote:
> On Mon, 2013-03-18 at 20:53 +0530, Viresh Kumar wrote:
>> +/*
>> + * This routine returns the nearest non-idle cpu. It accepts a
>> bitwise OR of
>> + * SD_* flags present in linux/sched.h. If the local CPU isn't idle,
>> it is
>> + * returned back. If it is idle, then we must look for another CPU
>> which have
>> + * all the flags passed as argument as set.
>> + */
>> +int sched_select_cpu(unsigned int sd_flags)
>
> So the only problem I have is that you expose the SD_flags to !sched
> code. The only proposed user is in the workqueue code and passes 0.
>
> Why not leave out the sd_flags argument and introduce it once you start
> using it; at which point we can argue on the interface.

Yes, that can be done. Will fix it up.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V3 1/7] sched: Create sched_select_cpu() to give preferred CPU for power saving

2013-03-19 Thread Viresh Kumar

On 19 March 2013 18:22, Viresh Kumar  wrote:
> On 19 March 2013 18:00, Peter Zijlstra  wrote:
>> Why not leave out the sd_flags argument and introduce it once you start
>> using it; at which point we can argue on the interface.
>
> Yes, that can be done. Will fix it up.

Fixup:

commit 77927939224520cbb0ac47270d3458bedffe42c4
Author: Viresh Kumar 
Date:   Tue Mar 19 18:50:37 2013 +0530

fixup! sched: Create sched_select_non_idle_cpu() to give preferred
CPU for power saving
---
 include/linux/sched.h | 6 +++---
 kernel/sched/core.c   | 6 +-
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index db6655a..37eb1dd 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -231,9 +231,9 @@ extern void init_idle_bootup_task(struct task_struct *idle);
 extern int runqueue_is_locked(int cpu);

 #ifdef CONFIG_SMP
-extern int sched_select_non_idle_cpu(unsigned int sd_flags);
+extern int sched_select_non_idle_cpu(void);
 #else
-static inline int sched_select_non_idle_cpu(unsigned int sd_flags)
+static inline int sched_select_non_idle_cpu(void)
 {
return smp_processor_id();
 }
@@ -251,7 +251,7 @@ extern void set_cpu_sd_state_idle(void);
  * selecting an idle cpu will add more delays to the timers than intended
  * (as that cpu's timer base may not be uptodate wrt jiffies etc).
  */
-#define get_nohz_timer_target() sched_select_non_idle_cpu(0)
+#define get_nohz_timer_target() sched_select_non_idle_cpu()
 #else
 static inline void nohz_balance_enter_idle(int cpu) { }
 static inline void set_cpu_sd_state_idle(void) { }
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0265a5e..f597d2b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -627,7 +627,7 @@ void sched_avg_update(struct rq *rq)
  * returned back. If it is idle, then we must look for another CPU which have
  * all the flags passed as argument as set.
  */
-int sched_select_non_idle_cpu(unsigned int sd_flags)
+int sched_select_non_idle_cpu(void)
 {
struct sched_domain *sd;
int cpu = smp_processor_id();
@@ -639,10 +639,6 @@ int sched_select_non_idle_cpu(unsigned int sd_flags)

rcu_read_lock();
for_each_domain(cpu, sd) {
-/* If sd doesnt' have sd_flags set skip sd. */
-   if ((sd->flags & sd_flags) != sd_flags)
-   continue;
-
for_each_cpu(i, sched_domain_span(sd)) {
if (i == cpu)
continue;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu

2013-03-19 Thread Viresh Kumar

On 19 March 2013 10:45, Viresh Kumar  wrote:
> On 18 March 2013 20:53, Viresh Kumar  wrote:
>> queue_work() queues work on current cpu. This may wake up an idle CPU, which 
>> is
>> actually not required.
>>
>> Some of these works can be processed by any CPU and so we must select a 
>> non-idle
>> CPU here. The initial idea was to modify implementation of queue_work(), but
>> that may end up breaking lots of kernel code that would be a nightmare to 
>> debug.
>>
>> So, we finalized to adding new workqueue interfaces, for works that don't 
>> depend
>> on a cpu to execute them.

Another fixup:

commit 8753c6d936faa6e3233cbf44a55913d05de05683
Author: Viresh Kumar 
Date:   Tue Mar 19 18:50:59 2013 +0530

fixup! workqueue: Add helpers to schedule work on any cpu
---
 kernel/workqueue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 68daf50..4e023ab 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1242,7 +1242,7 @@ retry:
if (!(wq->flags & WQ_UNBOUND)) {
if (cpu == WORK_CPU_UNBOUND) {
if (on_any_cpu)
-   cpu = sched_select_non_idle_cpu(0);
+   cpu = sched_select_non_idle_cpu();
else
cpu = raw_smp_processor_id();
}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V3 4/4] cpufreq: Add Kconfig option to enable/disable have_multiple_policies

2013-03-19 Thread Viresh Kumar

On 20 March 2013 05:50, Rafael J. Wysocki  wrote:
> On Thursday, March 14, 2013 08:39:55 AM Viresh Kumar wrote:
>> On 14 March 2013 03:11, Rafael J. Wysocki  wrote:
>> > On Tuesday, March 12, 2013 08:55:12 AM Viresh Kumar wrote:
>> >> On 12 March 2013 07:38, Rafael J. Wysocki  wrote:
>> >> > One more question before I apply it.
>> >> >
>> >> > Is there any architecture/platform that will set
>> >> > CONFIG_CPU_FREQ_HAVE_MULTIPLE_POLICIES and keep have_multiple_policies 
>> >> > unset
>> >> > at the same time?
>> >>
>> >> No, they are redundant. That's why i have been forcing to drop this patch.
>> >
>> > I see.
>> >
>> > What about having the Kconfig option alone, however?
>>
>> Even that is not enough. We build multiplatform kernels and so need a
>> variable to be set by platform.
>
> Which means the Kconfig option and the field are not redundant in fact.

Yes. Redundant was the wrong word. Actually Kconfig option is just not required
as we can work efficiently without it.

> But do we need the field to reside in the policy structure?  It looks like
> it may just be a global bool variable

Yes. It is not per policy but per cpufreq driver. And this can be done
by sharing
a function from cpufreq core to driver. But when do you want me to
call this function
(which will set this global variable). If we do it from init, then we
will end up calling it
again and again. Then it has to be called before calling
cpufreq_register_driver(),
as init() gets called internally.

> (in which case the Kconfig option could be dropped IMO).

We are aligned now :)

> Is there any particular reason to put that thing into
> struct cpufreq_policy?

Just the problem i mentioned to you.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V3 4/4] cpufreq: Add Kconfig option to enable/disable have_multiple_policies

2013-03-19 Thread Viresh Kumar

On 20 March 2013 09:53, Viresh Kumar  wrote:

> But when do you want me to call this function

Guess what, i got answer to this question: struct cpufreq_driver :)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V3 2/4] cpufreq: governor: Implement per policy instances of governors

2013-03-19 Thread Viresh Kumar

On 4 March 2013 13:07, Viresh Kumar  wrote:
> Currently, there can't be multiple instances of single governor_type. If we 
> have
> a multi-package system, where we have multiple instances of struct policy (per
> package), we can't have multiple instances of same governor. i.e. We can't 
> have
> multiple instances of ondemand governor for multiple packages.
>
> Governors directory in sysfs is created at /sys/devices/system/cpu/cpufreq/
> governor-name/. Which again reflects that there can be only one instance of a
> governor_type in the system.
>
> This is a bottleneck for multicluster system, where we want different packages
> to use same governor type, but with different tunables.
>
> This patch uses the infrastructure provided by earlier patch and implements
> init/exit routines for ondemand and conservative governors.
>
> Signed-off-by: Viresh Kumar 

As discussed in other patch thread, i dropped "cpufreq: Add Kconfig option
to enable/disable have_multiple_policies" patch and following is the fixup
to this patch:

I have queued all patches i had for 3.10 here:

http://git.linaro.org/gitweb?p=people/vireshk/linux.git;a=shortlog;h=refs/heads/for-3.10

commit f02fca9a2478088c4f7dadf82d998ae007a56285
Author: Viresh Kumar 
Date:   Wed Mar 20 10:50:33 2013 +0530

fixup! cpufreq: governor: Implement per policy instances of governors
---
 drivers/cpufreq/cpufreq.c |  8 
 include/linux/cpufreq.h   | 22 --
 2 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index a843855..3d83b02 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -128,6 +128,14 @@ void disable_cpufreq(void)
 static LIST_HEAD(cpufreq_governor_list);
 static DEFINE_MUTEX(cpufreq_governor_mutex);

+struct kobject *get_governor_parent_kobj(struct cpufreq_policy *policy)
+{
+   if (cpufreq_driver->have_multiple_policies)
+   return &policy->kobj;
+   else
+   return cpufreq_global_kobject;
+}
+
 static struct cpufreq_policy *__cpufreq_cpu_get(unsigned int cpu, bool sysfs)
 {
struct cpufreq_policy *data;
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 6e1abd2..805c4d3 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -107,11 +107,6 @@ struct cpufreq_policy {
unsigned intpolicy; /* see above */
struct cpufreq_governor *governor; /* see below */
void*governor_data;
-   /* This should be set by init() of platforms having multiple
-* clock-domains, i.e.  supporting multiple policies. With this sysfs
-* directories of governor would be created in cpu/cpu/cpufreq/
-* directory */
-   boolhave_multiple_policies;

struct work_struct  update; /* if update_policy() needs to be
 * called, but you're in IRQ context */
@@ -139,15 +134,6 @@ static inline bool policy_is_shared(struct
cpufreq_policy *policy)
return cpumask_weight(policy->cpus) > 1;
 }

-static inline struct kobject *
-get_governor_parent_kobj(struct cpufreq_policy *policy)
-{
-   if (policy->have_multiple_policies)
-   return &policy->kobj;
-   else
-   return cpufreq_global_kobject;
-}
-
 / cpufreq transition notifiers ***/

 #define CPUFREQ_PRECHANGE  (0)
@@ -245,6 +231,13 @@ struct cpufreq_driver {
struct module   *owner;
charname[CPUFREQ_NAME_LEN];
u8  flags;
+   /*
+* This should be set by init() of platforms having multiple
+* clock-domains, i.e.  supporting multiple policies. With this sysfs
+* directories of governor would be created in cpu/cpu/cpufreq/
+* directory
+*/
+   boolhave_multiple_policies;

/* needed by all drivers */
int (*init) (struct cpufreq_policy *policy);
@@ -329,6 +322,7 @@ const char *cpufreq_get_current_driver(void);
  */
 int cpufreq_get_policy(struct cpufreq_policy *policy, unsigned int cpu);
 int cpufreq_update_policy(unsigned int cpu);
+struct kobject *get_governor_parent_kobj(struct cpufreq_policy *policy);

 #ifdef CONFIG_CPU_FREQ
 /* query the current CPU frequency (in kHz). If zero, cpufreq
couldn't detect it */
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] intel-pstate: Use #defines instead of hard-coded values.

2013-03-20 Thread Viresh Kumar

On Wed, Mar 20, 2013 at 7:51 PM, Konrad Rzeszutek Wilk
 wrote:
> They are defined in coreboot (MSR_PLATFORM) and the other
> one is already defined in msr-index.h.
>
> Lets use those.
>
> CC: rafael.j.wyso...@intel.com
> CC: dirk.j.brande...@intel.com
> Signed-off-by: Konrad Rzeszutek Wilk 
> ---
>  arch/x86/include/uapi/asm/msr-index.h | 1 +
>  drivers/cpufreq/intel_pstate.c| 6 +++---
>  2 files changed, 4 insertions(+), 3 deletions(-)

Acked-by: Viresh Kumar 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V2 Resend 4/4] timer: Migrate running timer

2013-03-20 Thread Viresh Kumar

On 27 November 2012 19:17, Steven Rostedt  wrote:
> On Tue, 2012-11-06 at 16:08 +0530, Viresh Kumar wrote:

>> diff --git a/kernel/timer.c b/kernel/timer.c
>> @@ -729,6 +730,12 @@ __mod_timer(struct timer_list *timer, unsigned long 
>> expires,
>>
>>   base = lock_timer_base(timer, &flags);
>>
>> + if (timer->sched_del) {
>> + /* Don't schedule it again, as it is getting deleted */
>> + ret = -EBUSY;
>> + goto out_unlock;
>> + }
>> +
>>   ret = detach_if_pending(timer, base, false);
>>   if (!ret && pending_only)
>>   goto out_unlock;
>> @@ -746,21 +753,12 @@ __mod_timer(struct timer_list *timer, unsigned long 
>> expires,
>>   new_base = per_cpu(tvec_bases, cpu);
>>
>>   if (base != new_base) {
>> - /*
>> -  * We are trying to schedule the timer on the local CPU.
>> -  * However we can't change timer's base while it is running,
>> -  * otherwise del_timer_sync() can't detect that the timer's
>> -  * handler yet has not finished. This also guarantees that
>> -  * the timer is serialized wrt itself.
>> -  */
>> - if (likely(base->running_timer != timer)) {
>> - /* See the comment in lock_timer_base() */
>> - timer_set_base(timer, NULL);
>> - spin_unlock(&base->lock);
>> - base = new_base;
>> - spin_lock(&base->lock);
>> - timer_set_base(timer, base);
>> - }
>> + /* See the comment in lock_timer_base() */
>> + timer_set_base(timer, NULL);
>> + spin_unlock(&base->lock);
>> + base = new_base;
>> + spin_lock(&base->lock);
>> + timer_set_base(timer, base);
>>   }

> I don't think this is good enough. For one thing, it doesn't handle
> try_to_del_timer_sync() or even del_timer_sync() for that matter. As
> that may return success when the timer happens to be running on another
> CPU.
>
> We have this:
>
> CPU0CPU1
> 
>timerA (running)
>mod_timer(timerA)
>[ migrate to CPU2 ]
>release timer base lock
>del_timer_sync(timerA)
>timer->sched_del = true
>try_to_del_timer_sync(timerA)
> base(CPU2)->timer != timerA
> [TRUE!]
>   timerA (finishes)
>
> Fail!

Hi Steven/Thomas,

I came back to this patch after completing some other stuff and posting
wq part of this patchset separately.

I got your point and understand how this would fail.

@Thomas: I need your opinion first. Do you like this concept of migrating
running timer or not? Or you see some basic problem with this concept?

If no (i.e. i can go ahead with another version), then i have some solution to
fix earlier problems reported by Steven:

The problem lies with del_timer_sync() which just checks
base->running_timer != timer to check if timer is currently running or not.

What if we add another variable in struct timer_list, that will store if we are
running timer callback or not. And so, before we call callback in timer core,
we will set this variable and will reset it after finishing callback.

del_timer_sync() will have something like:

if (base->running_timer != timer)
remove timer and return;
else if (timer->running_callback)
go back to its loop...

So, with my existing patch + this change, del_timer_sync() will not return
back unless the callback is completed on CPU0.

But what can happen now is base->running_timer == timer can be true
for two cpus simultaneously cpu0 (running callback) and cpu2 (running hardware
timer). Will that cause any issues?

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] cpufreq/intel_pstate: Add function to check that all MSR's are valid

2013-03-20 Thread Viresh Kumar

On Wed, Mar 20, 2013 at 9:47 PM,   wrote:
> From: Dirk Brandewie 
>
> Some VMs seem to try to implement some MSRs but not all the registers
> the driver needs.  Check to make sure all the MSR that we need are
> available. If any of the required MSRs are not available refuse to
> load.
>
> Signed-off-by: Dirk Brandewie 
> ---
>  drivers/cpufreq/intel_pstate.c |   26 ++
>  1 files changed, 26 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index f6dd1e7..cd9c5f4 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -752,6 +752,29 @@ static struct cpufreq_driver intel_pstate_driver = {
>
>  static int __initdata no_load;
>
> +static int intel_pstate_msrs_not_valid(void)
> +{
> +   /* Check that all the msr's we are using are valid. */
> +   u64 aperf, mperf, tmp;
> +
> +   rdmsrl(MSR_IA32_APERF, aperf);
> +   rdmsrl(MSR_IA32_MPERF, mperf);
> +
> +   if (!intel_pstate_min_pstate() ||
> +   !intel_pstate_max_pstate() ||
> +   !intel_pstate_turbo_pstate())
> +   return -ENODEV;
> +
> +   rdmsrl(MSR_IA32_APERF, tmp);
> +   if (!(tmp - aperf))
> +   return -ENODEV;
> +
> +   rdmsrl(MSR_IA32_MPERF, tmp);
> +   if (!(tmp - mperf))
> +   return -ENODEV;
> +
> +   return 0;
> +}

Add blank line here.

>  static int __init intel_pstate_init(void)
>  {
> int cpu, rc = 0;
> @@ -764,6 +787,9 @@ static int __init intel_pstate_init(void)
> if (!id)
> return -ENODEV;
>
> +   if (intel_pstate_msrs_not_valid())
> +   return -ENODEV;
> +
> pr_info("Intel P-state driver initializing.\n");
>
> all_cpu_data = vmalloc(sizeof(void *) * num_possible_cpus());

Acked-by: Viresh Kumar 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] DMA: OF: Check properties value before running be32_to_cpup() on it

2013-03-21 Thread Viresh Kumar

On 21 March 2013 15:16, Vinod Koul  wrote:
> On Fri, Mar 15, 2013 at 02:18:20PM +0530, Viresh Kumar wrote:
>> In of_dma_controller_register() routine we are calling of_get_property() as 
>> an
>> parameter to be32_to_cpup(). In case the property doesn't exist we will get a
>> crash.
>>
>> This patch changes this code to check if we got a valid property first and 
>> then
>> runs be32_to_cpup() on it.
>>
>> Signed-off-by: Viresh Kumar 
>> ---
>>
>> My mails are broken, i have pushed this patch here:
> I noticed you used git send-email. Usually it will send patch properly
> independent of whatever MUA you use.

Probably not!! Its the famous (Infamous) Microsoft exchange server working in
background and it breaks mails without treating mails coming from git send-email
specially :)

> So I have the patch and its applied now :)

I have seen this kind of discrimination on breaking patches based on the size of
patch. If its very small (like this one), you may get a unbroken patch
but if the size
is a bit large then nobody can save you :)

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] dw_dmac: don't wait for FIFO_EMPTY endlessly in dwc_chan_pause

2013-03-21 Thread Viresh Kumar

On 21 March 2013 15:19, Andy Shevchenko
 wrote:
> When we pause the channel after transfer is completed we might stuck in the
> dwc_chan_pause() because the FIFO_EMPTY flag will never be asserted. To avoid
> the endless loop we introduce a timeout here (*). The proper solution is to
> somehow get the residue in FIFO and avoid busyloop when transfer is done, but
> this task is not simple and fast.
>
> Unfortunately we can't use cpu_relax() in conjunction with jiffies checker, 
> due
> to we have interrupts disabled by spin_lock_irqsave() and there is a big 
> chance
> that no interrupts will come to update the jiffies..
>
> (*) The worst case is
> AHB write * FIFO size / hclk = 5.12 us,
> where
> AHB write = 2 cycles,
> hclk = 100 MHz,
> burst size = 1 byte,
> FIFO size = 256 bytes.
> The proposed 40us timeout might be considered as a big one, though we 
> enter
> to that state only when we have the transfer already completed.
>
> Signed-off-by: Andy Shevchenko 
> ---
>  drivers/dma/dw_dmac.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/dma/dw_dmac.c b/drivers/dma/dw_dmac.c
> index 43a5329..43e2e89 100644
> --- a/drivers/dma/dw_dmac.c
> +++ b/drivers/dma/dw_dmac.c
> @@ -1030,10 +1030,11 @@ set_runtime_config(struct dma_chan *chan, struct 
> dma_slave_config *sconfig)
>  static inline void dwc_chan_pause(struct dw_dma_chan *dwc)
>  {
> u32 cfglo = channel_readl(dwc, CFG_LO);
> +   unsigned int count = 20;/* timeout iterations */
>
> channel_writel(dwc, CFG_LO, cfglo | DWC_CFGL_CH_SUSP);
> -   while (!(channel_readl(dwc, CFG_LO) & DWC_CFGL_FIFO_EMPTY))
> -   cpu_relax();
> +   while (!(channel_readl(dwc, CFG_LO) & DWC_CFGL_FIFO_EMPTY) && count--)
> +   udelay(2);

Acked-by: Viresh Kumar 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu

2013-03-21 Thread Viresh Kumar

On 21 March 2013 05:42, Tejun Heo  wrote:
> On Mon, Mar 18, 2013 at 08:53:25PM +0530, Viresh Kumar wrote:

> For some reason, I've been always unsure about this patchset.  I've
> been thinking about it past few days and I think I now know why.

I knew this since the beginning and thought power numbers i showed might
change your view. My hard luck :)

> I can see that strategy like this being useful for timer, and
> impressive power saving numbers BTW - we definitely want that.  While
> workqueue shares a lot of attributes with timers, there's one
> important difference - scheduler is involved in work item executions.

Yes, scheduler is involved for queuing works queued on a UNBOUND wq.

> Work item scheduling for per-cpu work items artificially removes the
> choice of CPU from the scheduler with the assumption that such
> restrictions would lead to lower overall overhead.

Yes.. We aren't touching them here and we can't :)

> There is another
> variant of workqueue - WQ_UNBOUND - when such assumption wouldn't hold
> too well and it would be best to give the scheduler full latitude
> regarding scheduling work item execution including choice of CPU.

I liked this point of yours and what you said is correct too. But there is a
difference here with our approach.

I here see two parallel solutions:

1. What we proposed:
- Add another wq api and used sched_select_non_idle_cpu() to choose
the right cpu
- Fix user drivers we care about

2. What you proposed:
- Fix user drivers we care about by using UNBOUNDED workqueues rather than
system_wq or other private wq's. And let the scheduler do everything. Right?

Now there are two things i am worried about.

A.
About the difference in behavior of scheduler and sched_select_non_idle_cpu().
Scheduler will treat this work as any normal task and can schedule it anywhere
based on what it feels is the right place and may still wake up an idle one for
load balancing.

In our approach, We aren't switching CPU for a work if that cpu (or system) is
busy enough, like in case of high IOPS for block layer. We will find that cpu as
busy, most of the time on such situations and so we will stay where we were.

In case system is fairly idle, then also we stay on the same cpu.
The only time we migrate is when current cpu is idle but the whole system
isn't.

B.
It is sort of difficult to teach users about BOUND and UNBOUND workqueues
and people have used them without thinking too much about them since some
time. We need a straightforward API to make this more transparent. And
queue_work_on_any_cpu() was a good candidate here. I am not talking about
its implementation but about the interface.

Though in both suggestions we need to fix user drivers one by one to use
queue_work_on_any_cpu() or use WQ_UNBOUND..

> Now, this patchset, kinda sorta adds back CPU selection by scheduler
> in an ad-hoc way, right?

Yes.

> If we're gonna do that, why not just make it
> unbound workqueues?  Then, the scheduler would have full discretion
> over them and we don't need to implement this separate ad-hoc thing on
> the side.  It's the roundaboutness that bothers me.

I had my own reasons as explained earlier.

> I'm not sure about other workqueues but the block one can be very
> performance sensitive on machines with high IOPS and it would
> definitely be advisable to keep it CPU-affine unless power consumption
> is the overriding concern, so how about introducing another workqueue
> flag, say WQ_UNBOUND_FOR_POWER (it's terrible, please come up with
> something better), which is WQ_UNBOUND on configurations where this
> matters and becomes noop if not?

Maybe people wouldn't suffer much because of the reasons i told you earlier.
On a busy system we will never switch cpus.

This static configuration might not work well with multiplatform images.

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-04 Thread Viresh Kumar

Currently, there can't be multiple instances of single governor_type. If we have
a multi-package system, where we have multiple instances of struct policy (per
package), we can't have multiple instances of same governor. i.e. We can't have
multiple instances of ondemand governor for multiple packages.

Governors directory in sysfs is created at /sys/devices/system/cpu/cpufreq/
governor-name/. Which again reflects that there can be only one instance of a
governor_type in the system.

This is a bottleneck for multicluster system, where we want different packages
to use same governor type, but with different tunables.

This patchset is inclined towards fixing this issue.

Viresh Kumar (4):
  cpufreq: Don't check cpu_online(policy->cpu)
  cpufreq: stats: Get rid of CPUFREQ_STATDEVICE_ATTR
  cpufreq: Add per policy governor-init/exit infrastructure
  cpufreq: governor: Implement per policy instances of governors

 drivers/cpufreq/cpufreq.c  |  41 ---
 drivers/cpufreq/cpufreq_conservative.c | 142 +--
 drivers/cpufreq/cpufreq_governor.c | 140 +-
 drivers/cpufreq/cpufreq_governor.h |  42 ---
 drivers/cpufreq/cpufreq_ondemand.c | 205 +++--
 drivers/cpufreq/cpufreq_stats.c|  18 +--
 drivers/cpufreq/cpufreq_userspace.c|   2 -
 drivers/cpufreq/freq_table.c   |   6 -
 include/linux/cpufreq.h|  26 +
 9 files changed, 346 insertions(+), 276 deletions(-)

-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/4] cpufreq: Don't check cpu_online(policy->cpu)

2013-02-04 Thread Viresh Kumar

policy->cpu or cpus in policy->cpus can't be offline anymore. And so we don't
need to check if they are online or not.

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/cpufreq.c   | 17 +++--
 drivers/cpufreq/cpufreq_governor.c  |  2 +-
 drivers/cpufreq/cpufreq_userspace.c |  2 --
 drivers/cpufreq/freq_table.c|  6 --
 4 files changed, 4 insertions(+), 23 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 9656420..e619f4f 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -76,10 +76,6 @@ static int lock_policy_rwsem_##mode  
\
int policy_cpu = per_cpu(cpufreq_policy_cpu, cpu);  \
BUG_ON(policy_cpu == -1);   \
down_##mode(&per_cpu(cpu_policy_rwsem, policy_cpu));\
-   if (unlikely(!cpu_online(cpu))) {   \
-   up_##mode(&per_cpu(cpu_policy_rwsem, policy_cpu));  \
-   return -1;  \
-   }   \
\
return 0;   \
 }
@@ -719,8 +715,6 @@ static int cpufreq_add_dev_symlink(unsigned int cpu,
 
if (j == cpu)
continue;
-   if (!cpu_online(j))
-   continue;
 
pr_debug("CPU %u already managed, adding link\n", j);
managed_policy = cpufreq_cpu_get(cpu);
@@ -777,8 +771,6 @@ static int cpufreq_add_dev_interface(unsigned int cpu,
 
spin_lock_irqsave(&cpufreq_driver_lock, flags);
for_each_cpu(j, policy->cpus) {
-   if (!cpu_online(j))
-   continue;
per_cpu(cpufreq_cpu_data, j) = policy;
per_cpu(cpufreq_policy_cpu, j) = policy->cpu;
}
@@ -1005,11 +997,8 @@ static void update_policy_cpu(struct cpufreq_policy 
*policy, unsigned int cpu)
policy->last_cpu = policy->cpu;
policy->cpu = cpu;
 
-   for_each_cpu(j, policy->cpus) {
-   if (!cpu_online(j))
-   continue;
+   for_each_cpu(j, policy->cpus)
per_cpu(cpufreq_policy_cpu, j) = cpu;
-   }
 
 #ifdef CONFIG_CPU_FREQ_TABLE
cpufreq_frequency_table_update_policy_cpu(policy);
@@ -1468,7 +1457,7 @@ int __cpufreq_driver_target(struct cpufreq_policy *policy,
if (target_freq == policy->cur)
return 0;
 
-   if (cpu_online(policy->cpu) && cpufreq_driver->target)
+   if (cpufreq_driver->target)
retval = cpufreq_driver->target(policy, target_freq, relation);
 
return retval;
@@ -1506,7 +1495,7 @@ int __cpufreq_driver_getavg(struct cpufreq_policy 
*policy, unsigned int cpu)
if (cpufreq_disabled())
return ret;
 
-   if (!(cpu_online(cpu) && cpufreq_driver->getavg))
+   if (!cpufreq_driver->getavg)
return 0;
 
policy = cpufreq_cpu_get(policy->cpu);
diff --git a/drivers/cpufreq/cpufreq_governor.c 
b/drivers/cpufreq/cpufreq_governor.c
index 79795c4..e4a306c 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -225,7 +225,7 @@ int cpufreq_governor_dbs(struct dbs_data *dbs_data,
 
switch (event) {
case CPUFREQ_GOV_START:
-   if ((!cpu_online(cpu)) || (!policy->cur))
+   if (!policy->cur)
return -EINVAL;
 
mutex_lock(&dbs_data->mutex);
diff --git a/drivers/cpufreq/cpufreq_userspace.c 
b/drivers/cpufreq/cpufreq_userspace.c
index c8c3d29..bbeb9c0 100644
--- a/drivers/cpufreq/cpufreq_userspace.c
+++ b/drivers/cpufreq/cpufreq_userspace.c
@@ -118,8 +118,6 @@ static int cpufreq_governor_userspace(struct cpufreq_policy 
*policy,
 
switch (event) {
case CPUFREQ_GOV_START:
-   if (!cpu_online(cpu))
-   return -EINVAL;
BUG_ON(!policy->cur);
mutex_lock(&userspace_mutex);
 
diff --git a/drivers/cpufreq/freq_table.c b/drivers/cpufreq/freq_table.c
index aa5bd39..d7a7966 100644
--- a/drivers/cpufreq/freq_table.c
+++ b/drivers/cpufreq/freq_table.c
@@ -63,9 +63,6 @@ int cpufreq_frequency_table_verify(struct cpufreq_policy 
*policy,
pr_debug("request for verification of policy (%u - %u kHz) for cpu 
%u\n",
policy->min, policy->max, policy->cpu);
 
-   if (!cpu_online(policy->cpu))
-   return -EINVAL;
-
cpufreq_verify_within_limits(policy, policy->cpuinfo.min_freq,
 policy->cpuinfo.max_freq);
 
@@ -1

[PATCH 2/4] cpufreq: stats: Get rid of CPUFREQ_STATDEVICE_ATTR

2013-02-04 Thread Viresh Kumar

Macro "CPUFREQ_STATDEVICE_ATTR" is defined local to cpufreq_stats.c file and is
almost a copy of the generic version present in cpufreq.h file. Lets use the
generic version instead.

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/cpufreq_stats.c | 18 ++
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
index 572124c..a2dee4c 100644
--- a/drivers/cpufreq/cpufreq_stats.c
+++ b/drivers/cpufreq/cpufreq_stats.c
@@ -24,12 +24,6 @@
 
 static spinlock_t cpufreq_stats_lock;
 
-#define CPUFREQ_STATDEVICE_ATTR(_name, _mode, _show) \
-static struct freq_attr _attr_##_name = {\
-   .attr = {.name = __stringify(_name), .mode = _mode, }, \
-   .show = _show,\
-};
-
 struct cpufreq_stats {
unsigned int cpu;
unsigned int total_trans;
@@ -136,17 +130,17 @@ static ssize_t show_trans_table(struct cpufreq_policy 
*policy, char *buf)
return PAGE_SIZE;
return len;
 }
-CPUFREQ_STATDEVICE_ATTR(trans_table, 0444, show_trans_table);
+cpufreq_freq_attr_ro(trans_table);
 #endif
 
-CPUFREQ_STATDEVICE_ATTR(total_trans, 0444, show_total_trans);
-CPUFREQ_STATDEVICE_ATTR(time_in_state, 0444, show_time_in_state);
+cpufreq_freq_attr_ro(total_trans);
+cpufreq_freq_attr_ro(time_in_state);
 
 static struct attribute *default_attrs[] = {
-   &_attr_total_trans.attr,
-   &_attr_time_in_state.attr,
+   &total_trans.attr,
+   &time_in_state.attr,
 #ifdef CONFIG_CPU_FREQ_STAT_DETAILS
-   &_attr_trans_table.attr,
+   &trans_table.attr,
 #endif
NULL
 };
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 3/4] cpufreq: Add per policy governor-init/exit infrastructure

2013-02-04 Thread Viresh Kumar

Currently, there can't be multiple instances of single governor_type. If we have
a multi-package system, where we have multiple instances of struct policy (per
package), we can't have multiple instances of same governor. i.e. We can't have
multiple instances of ondemand governor for multiple packages.

Governors directory in sysfs is created at /sys/devices/system/cpu/cpufreq/
governor-name/. Which again reflects that there can be only one instance of a
governor_type in the system.

This is a bottleneck for multicluster system, where we want different packages
to use same governor type, but with different tunables.

This patch is inclined towards providing this infrastructure. Because we are
required to allocate governor's resources dynamically now, we must do it at
policy creation and end. And so got CPUFREQ_GOV_POLICY_INIT/EXIT.

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/cpufreq.c | 20 +---
 include/linux/cpufreq.h   |  9 ++---
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index e619f4f..1ae78d4 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1076,6 +1076,7 @@ static int __cpufreq_remove_dev(struct device *dev, 
struct subsys_interface *sif
 
/* If cpu is last user of policy, free policy */
if (cpus == 1) {
+   __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);
lock_policy_rwsem_write(cpu);
kobj = &data->kobj;
cmp = &data->kobj_unregister;
@@ -1655,7 +1656,7 @@ EXPORT_SYMBOL(cpufreq_get_policy);
 static int __cpufreq_set_policy(struct cpufreq_policy *data,
struct cpufreq_policy *policy)
 {
-   int ret = 0;
+   int ret = 0, failed = 1;
 
pr_debug("setting new policy for CPU %u: %u - %u kHz\n", policy->cpu,
policy->min, policy->max);
@@ -1709,18 +1710,31 @@ static int __cpufreq_set_policy(struct cpufreq_policy 
*data,
pr_debug("governor switch\n");
 
/* end old governor */
-   if (data->governor)
+   if (data->governor) {
__cpufreq_governor(data, CPUFREQ_GOV_STOP);
+   __cpufreq_governor(data,
+   CPUFREQ_GOV_POLICY_EXIT);
+   }
 
/* start new governor */
data->governor = policy->governor;
-   if (__cpufreq_governor(data, CPUFREQ_GOV_START)) {
+   if (!__cpufreq_governor(data, CPUFREQ_GOV_POLICY_INIT)) 
{
+   if (!__cpufreq_governor(data, 
CPUFREQ_GOV_START))
+   failed = 0;
+   else
+   __cpufreq_governor(data,
+   
CPUFREQ_GOV_POLICY_EXIT);
+   }
+
+   if (failed) {
/* new governor failed, so re-start old one */
pr_debug("starting governor %s failed\n",
data->governor->name);
if (old_gov) {
data->governor = old_gov;
__cpufreq_governor(data,
+   
CPUFREQ_GOV_POLICY_INIT);
+   __cpufreq_governor(data,
   CPUFREQ_GOV_START);
}
ret = -EINVAL;
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index a22944c..3b822ce 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -106,6 +106,7 @@ struct cpufreq_policy {
 * governors are used */
unsigned intpolicy; /* see above */
struct cpufreq_governor *governor; /* see below */
+   void*governor_data;
 
struct work_struct  update; /* if update_policy() needs to be
 * called, but you're in IRQ context */
@@ -178,9 +179,11 @@ static inline unsigned long cpufreq_scale(unsigned long 
old, u_int div, u_int mu
  *  CPUFREQ GOVERNORS*
  */
 
-#define CPUFREQ_GOV_START  1
-#define CPUFREQ_GOV_STOP   2
-#define CPUFREQ_GOV_LIMITS 3
+#define CPUFREQ_GOV_START  1
+#define CPUFREQ_GOV_STOP   2
+#define CPUFREQ_GOV_LIMITS 3
+#define CPUFREQ_GOV_POLICY_INIT4
+#define

[PATCH 4/4] cpufreq: governor: Implement per policy instances of governors

2013-02-04 Thread Viresh Kumar

Currently, there can't be multiple instances of single governor_type. If we have
a multi-package system, where we have multiple instances of struct policy (per
package), we can't have multiple instances of same governor. i.e. We can't have
multiple instances of ondemand governor for multiple packages.

Governors directory in sysfs is created at /sys/devices/system/cpu/cpufreq/
governor-name/. Which again reflects that there can be only one instance of a
governor_type in the system.

This is a bottleneck for multicluster system, where we want different packages
to use same governor type, but with different tunables.

This patch uses the infrastructure provided by earlier patch and implements
init/exit routines for ondemand and conservative governors.

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/cpufreq.c  |   4 -
 drivers/cpufreq/cpufreq_conservative.c | 142 +--
 drivers/cpufreq/cpufreq_governor.c | 138 +-
 drivers/cpufreq/cpufreq_governor.h |  42 ---
 drivers/cpufreq/cpufreq_ondemand.c | 205 +++--
 include/linux/cpufreq.h|  19 +--
 6 files changed, 314 insertions(+), 236 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 1ae78d4..7aacfbf 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1551,9 +1551,6 @@ static int __cpufreq_governor(struct cpufreq_policy 
*policy,
policy->cpu, event);
ret = policy->governor->governor(policy, event);
 
-   if (!policy->governor->initialized && (event == CPUFREQ_GOV_START))
-   policy->governor->initialized = 1;
-
/* we keep one module reference alive for
each CPU governed by this CPU */
if ((event != CPUFREQ_GOV_START) || ret)
@@ -1577,7 +1574,6 @@ int cpufreq_register_governor(struct cpufreq_governor 
*governor)
 
mutex_lock(&cpufreq_governor_mutex);
 
-   governor->initialized = 0;
err = -EBUSY;
if (__find_governor(governor->name) == NULL) {
err = 0;
diff --git a/drivers/cpufreq/cpufreq_conservative.c 
b/drivers/cpufreq/cpufreq_conservative.c
index e8bb915..6b13f9f 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -31,17 +32,8 @@
 #define DEF_SAMPLING_DOWN_FACTOR   (1)
 #define MAX_SAMPLING_DOWN_FACTOR   (10)
 
-static struct dbs_data cs_dbs_data;
 static DEFINE_PER_CPU(struct cs_cpu_dbs_info_s, cs_cpu_dbs_info);
 
-static struct cs_dbs_tuners cs_tuners = {
-   .up_threshold = DEF_FREQUENCY_UP_THRESHOLD,
-   .down_threshold = DEF_FREQUENCY_DOWN_THRESHOLD,
-   .sampling_down_factor = DEF_SAMPLING_DOWN_FACTOR,
-   .ignore_nice = 0,
-   .freq_step = 5,
-};
-
 /*
  * Every sampling_rate, we check, if current idle time is less than 20%
  * (default), then we try to increase frequency Every sampling_rate *
@@ -55,24 +47,26 @@ static void cs_check_cpu(int cpu, unsigned int load)
 {
struct cs_cpu_dbs_info_s *dbs_info = &per_cpu(cs_cpu_dbs_info, cpu);
struct cpufreq_policy *policy = dbs_info->cdbs.cur_policy;
+   struct dbs_data *dbs_data = policy->governor_data;
+   struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
unsigned int freq_target;
 
/*
 * break out if we 'cannot' reduce the speed as the user might
 * want freq_step to be zero
 */
-   if (cs_tuners.freq_step == 0)
+   if (cs_tuners->freq_step == 0)
return;
 
/* Check for frequency increase */
-   if (load > cs_tuners.up_threshold) {
+   if (load > cs_tuners->up_threshold) {
dbs_info->down_skip = 0;
 
/* if we are already at full speed then break out early */
if (dbs_info->requested_freq == policy->max)
return;
 
-   freq_target = (cs_tuners.freq_step * policy->max) / 100;
+   freq_target = (cs_tuners->freq_step * policy->max) / 100;
 
/* max freq cannot be less than 100. But who knows */
if (unlikely(freq_target == 0))
@@ -92,8 +86,8 @@ static void cs_check_cpu(int cpu, unsigned int load)
 * support the current CPU usage without triggering the up policy. To be
 * safe, we focus 10 points under the threshold.
 */
-   if (load < (cs_tuners.down_threshold - 10)) {
-   freq_target = (cs_tuners.freq_step * policy->max) / 100;
+   if (load < (cs_tuners->down_threshold - 10)) {
+   freq_target = (cs_tuners->freq_step * policy->max) / 100;
 
dbs_info->requested_freq -= freq_target;

Re: [PATCH] cpufreq: exynos: simplify .init() for setting policy->cpus

2013-02-04 Thread Viresh Kumar

On 31 January 2013 07:56, Viresh Kumar  wrote:
> With the recent changes in cpufreq core, we just need to set mask of all
> possible cpus into policy->cpus. Rest would be done by core.
>
> Signed-off-by: Viresh Kumar 
> ---
>  drivers/cpufreq/exynos-cpufreq.c | 14 +-
>  1 file changed, 1 insertion(+), 13 deletions(-)

Hi Rafael,

Are you picking up this patch ?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-04 Thread Viresh Kumar

On 4 February 2013 17:47, Rafael J. Wysocki  wrote:
> Well, [1-2/4] are things I can take for v3.9 no problem.  The other two I'd
> wait for the next cycle to be honest.  We already have 30+ cpufreq patches
> scheduled for v3.9 and some of them quite subtle for that matter.

To be honest, i wanted to get these in 3.9 :) .
And that's why hurried on them and got this full series working in a single
day of work :)

Anyway, i understand your point and i also believe last two patches require some
testing/review before these getting in.

One important point i would like to highlight is:
governors directory would be present in cpu/cpu*/cpufreq/ now instead
of cpu/cpufreq/.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-04 Thread Viresh Kumar

On 4 February 2013 18:02, Borislav Petkov  wrote:
> On Mon, Feb 04, 2013 at 05:54:18PM +0530, Viresh Kumar wrote:
>> One important point i would like to highlight is: governors directory
>> would be present in cpu/cpu*/cpufreq/ now instead of cpu/cpufreq/.
>
> Uh, hold on, isn't this breaking a bunch of userspace with this move?
> Also, on all those other systems which don't need per-policy governors,
> we probably don't need this. So maybe this should be made optional, to
> be enabled by a config option IMO...

That's why i am highlighting it again and again. :)
What i believe is, the place where this directory was present earlier
(cpu/cpufreq/) wasn't the right place. Everything else was in cpu/cpu*/cpufreq,
then why this in cpu/cpufreq/ ?

I don't know how much of a pain it would be to fix userspace for it, but i know
it wouldn't be that small.

I had another idea of doing this only for platforms where we have multiple
struct policy alive at the same time. But didn't wanted to implement it before
discussing this further.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-04 Thread Viresh Kumar

On 4 February 2013 18:34, Borislav Petkov  wrote:
> On Mon, Feb 04, 2013 at 06:24:19PM +0530, Viresh Kumar wrote:

>> What i believe is, the place where this directory was present earlier
>> (cpu/cpufreq/) wasn't the right place. Everything else was in 
>> cpu/cpu*/cpufreq,
>> then why this in cpu/cpufreq/ ?
>
> For the simple reason that the "cpu*" stuff is per-cpu - the
> "cpu/cpufreq" is per system, i.e. one governor for the whole system.

That's not completely true. There lies cpufreq directory in cpu/cpu*/ too,
where we have per policy stuff in cpu/cpu*/, like policy tunables and stats.
And the same is true for governor too.

>> I don't know how much of a pain it would be to fix userspace for it,
>> but i know it wouldn't be that small.
>
> I wouldn't fix userspace but simply not touch it. You can add your
> per-policy stuff in "cpu/cpu*" as new sysfs nodes and no need to
> change anything.

That was slightly confusing to me :(
The whole governor directory is per policy, i have to keep that in
cpu/cpu*/cpufreq
instead of cpu/cpufreq .

> And, also, as I suggested earlier, you should make it
> configurable since this code wouldn't make sense on x86, for example,
> where one system-wide governor should suffice.

Its not only for multicluster system, but a system where multiple cpus have
separate clock control and hence multiple policy structures.

>> I had another idea of doing this only for platforms where we have
>> multiple struct policy alive at the same time. But didn't wanted to
>> implement it before discussing this further.
>
> Simply put it behind a config option like
> CONFIG_CPU_IDLE_MULTIPLE_DRIVERS, call the whole menu
> "Multi-power-domain-policy" something and that should be modulary
> enough.

Problem with this is it would fail for single image solutions on which everybody
is working on. So, with multiple platforms compiled into a single
image, this wouldn't
work.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-04 Thread Viresh Kumar

On 4 February 2013 19:06, Borislav Petkov  wrote:
> On Mon, Feb 04, 2013 at 06:55:25PM +0530, Viresh Kumar wrote:
>> That's not completely true. There lies cpufreq directory in cpu/cpu*/
>> too, where we have per policy stuff in cpu/cpu*/, like policy tunables
>> and stats. And the same is true for governor too.
>
> $ tree /sys/devices/system/cpu/cpu0/cpufreq/
> /sys/devices/system/cpu/cpu0/cpufreq/
> ├── affected_cpus
> ├── bios_limit
> ├── cpb
> ├── cpuinfo_cur_freq
> ├── cpuinfo_max_freq
> ├── cpuinfo_min_freq
> ├── cpuinfo_transition_latency
> ├── related_cpus
> ├── scaling_available_frequencies
> ├── scaling_available_governors
> ├── scaling_cur_freq
> ├── scaling_driver
> ├── scaling_governor
> ├── scaling_max_freq
> ├── scaling_min_freq
> ├── scaling_setspeed
> └── stats
> ├── time_in_state
> ├── total_trans
> └── trans_table
>
> 1 directory, 19 files
>
> $ grep -r . /sys/devices/system/cpu/cpu0/cpufreq/*
> /sys/devices/system/cpu/cpu0/cpufreq/affected_cpus:0
> /sys/devices/system/cpu/cpu0/cpufreq/bios_limit:400
> /sys/devices/system/cpu/cpu0/cpufreq/cpb:1
> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq:140
> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq:400
> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq:140
> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_transition_latency:4000
> /sys/devices/system/cpu/cpu0/cpufreq/related_cpus:0 1
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies:400 
> 340 280 210 140
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors:powersave 
> userspace conservative ondemand perform
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:140
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver:acpi-cpufreq
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:ondemand
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq:400
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq:140
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed:
> /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state:400 3089328
> /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state:340 47448
> /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state:280 67185
> /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state:210 92731
> /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state:140 11416914
> /sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table:   From  :To
> /sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table: :   400   
> 340   280   210   14
> /sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table:  400: 0   
>   34756 46388 5317921824
> /sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table:  340: 12938   
>   0  3755  3555 1450
> /sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table:  280: 19940   
>   0 0  4547 2565
> /sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table:  210: 18523   
>   0 0 0 4275
> /sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table:  140:301168   
>   0 0 0
> /sys/devices/system/cpu/cpu0/cpufreq/stats/total_trans:799918
>
> Show me the policy tunables here.

All files which are directly present in cpu/cpu*/cpufreq/ folder. I am not
talking about governor tunables but policy tunables. Things like
scaling_[min]max_freq are policy tunables.

>> That was slightly confusing to me :( The whole governor directory
>> is per policy, i have to keep that in cpu/cpu*/cpufreq instead of
>> cpu/cpufreq.
>
> So make a /sys/devices/system/cpu/cpufreq/policies/ and add
> functionality to assign cpus to policies or whatever the design of this
> thing will be.

Policies don't have a name associated with them and so cpu/cpufreq/policies
doesn't make any sense. Rather one policy is related to multiple cpus and
its tunables are linked in all the cpus that belong to it, like
scaling_[min]max_freq.

>> Its not only for multicluster system, but a system where multiple cpus
>> have separate clock control and hence multiple policy structures.
>
> What are those systems? Examples?

Don't have examples of these, but there can be few. Over that it is
a must for multicluster systems as clusters normally have separate
clock control.

>> Problem with this is it would fail for single image solutions on which
>> everybody is working on. So, with multiple platforms compiled into a
>> single image, this wouldn't work.
>
> Single-image solutions will enable that config option and get built with
> it - no problem at all.

But then we will get governors tunables i

Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-04 Thread Viresh Kumar

On 4 February 2013 19:39, Borislav Petkov  wrote:
> On Mon, Feb 04, 2013 at 07:28:16PM +0530, Viresh Kumar wrote:
>> All files which are directly present in cpu/cpu*/cpufreq/ folder. I am
>> not talking about governor tunables but policy tunables. Things like
>> scaling_[min]max_freq are policy tunables.
>
> No, on x86 those are the P-states frequencies. They're defined by the
> hardware.

A question we need to answer: What is "policy"?

For me, "policy" is the hardware policy for a group of cpus, defined by the
hardware. We call them P-states. But these are strictly per policy (i.e. per
cpu group sharing clock line).

So, if we have two packages with two cpus each, we will have two copies
of these P-states and every other file/directory in cpu/cpu*/cpufreq.
One common copy would be shared between cpu/cpu[0-1]/cpufreq
directory and other one between cpu/cpu[2-3]/cpufreq.

>> Policies don't have a name associated with them and so
>> cpu/cpufreq/policies doesn't make any sense. Rather one policy is
>> related to multiple cpus and its tunables are linked in all the cpus
>> that belong to it, like scaling_[min]max_freq.
>
> Then do the following:
>
> cpu/cpufreq/policies/
> |-> policy0
> |-> min_freq
> |-> max_freq
> |-> affected_cpus
> ...

We correlate things with cpus rather than policies and so the current directory
structure of cpu/cpu*/cpufreq/*** is the best suited ones.

> or whatever needs to be a flexible interface for multi-policy cpufreq
> support.

The multi policy part is already well implemented, we are talking about governor
per policy here.

> Remember: once you do those, they're more or less cast in stone so take
> your time and do the design right, do not hurry those.

Yes, and that's why cpu/cpu*/cpufreq/ondemand/*** suits the best, with exactly
the same logic that went for P-states or cpufreq-stats.

>> But then we will get governors tunables in cpu/cpu*/cpufreq/ instead
>> of cpu/cpufreq/ . Will that not break userspace for other systems?
>
> What's wrong with having both? The cpu/cpufreq/ governor will set the
> system-wide governor and the cpu/cpu*/cpufreq/ will add the different
> policies.

Hmm.. confused..
Consider two systems:
- A dual core system, with cores sharing clocks.
- A dual cluster system (dual core per cluster), with separate clocks
per cluster.

Where will you keep governor directories for both of these configurations?

We need to select only one... cpu/cpufreq doesn't suit the second case at all
as we need to use ondemand governor for both the clusters but with separate
tunables. And so a single cpu/cpufreq/ondemand directory wouldn't solve the
issue.

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-04 Thread Viresh Kumar

On 4 February 2013 20:35, Borislav Petkov  wrote:
> On Mon, Feb 04, 2013 at 07:51:33PM +0530, Viresh Kumar wrote:
>> We correlate things with cpus rather than policies and so the current
>> directory structure of cpu/cpu*/cpufreq/*** is the best suited ones.
>
> Ok, show me the details of that layout. How is that going to look?

I don't have board right now to take the snapshot, but it would be
like:

$ tree /sys/devices/system/cpu/cpu0/cpufreq/
/sys/devices/system/cpu/cpu0/cpufreq/
├── affected_cpus
├── bios_limit
├── cpb
├── cpuinfo_cur_freq
├── cpuinfo_max_freq
├── cpuinfo_min_freq
├── cpuinfo_transition_latency
├── related_cpus
├── scaling_available_frequencies
├── scaling_available_governors
├── scaling_cur_freq
├── scaling_driver
├── scaling_governor
├── scaling_max_freq
├── scaling_min_freq
├── scaling_setspeed
└── stats
├── time_in_state
├── total_trans
└── trans_table
└── ondemand
├── sampling_rate
├── up_threshold
└── ignore_nice
etc..

> One thing I've come to realize with the current interface is that if
> you want to change stuff, you need to iterate over all cpus instead of
> writing to a system-wide node.

Not really. Following is the way by which cpu/cpu*/cpufreq directories
are created:

For policy->cpu:
ret = kobject_init_and_add(&policy->kobj, &ktype_cpufreq,
   &dev->kobj, "cpufreq");

This creates cpufreq directory for policy in policy->cpu...

For all other cpus in policy->cpus, we do:
ret = sysfs_create_link(&cpu_dev->kobj, &policy->kobj,
"cpufreq");

And so whatever gets added in cpu/cpu0/cpufreq directory is reflected in
all other policy->cpus.

> And, in this case, if you can and need to change the policy per
> clock-domain, I wouldn't make it needlessly too-granulary per-cpu.
>
> That's why I'm advocating the cpu/cpufreq/ path.

Its already like this, i.e. per policy or clock-domain. Other cpus just have a
link. And that's why in my code, i just add governor directory in policy->cpu's
cpufreq directory and it gets reflected in other cpus of policy->cpus.

That's why i said P-states as policy tunables.

>> Hmm.. confused..
>> Consider two systems:
>> - A dual core system, with cores sharing clocks.
>> - A dual cluster system (dual core per cluster), with separate clocks
>> per cluster.
>>
>> Where will you keep governor directories for both of these configurations?
>
> Easy: as said above, make the policy granularity per clock-domain. On
> systems which have only one set of P-states - like it is the case with
> the overwhelming majority of systems running linux now - nothing should
> change.

Currently its not per policy, but single instance of any governor is supported.
And it is present in cpu/cpufreq . That's why i said earlier, it isn't the right
place for governor's directory. It is very much related to a policy or
clock-domain.

>> We need to select only one... cpu/cpufreq doesn't suit the second case
>> at all as we need to use ondemand governor for both the clusters but
>> with separate tunables. And so a single cpu/cpufreq/ondemand directory
>> wouldn't solve the issue.
>
> Think of it this way: what is the highest granularity you need per
> clock-domain? If you want to control the policy per clock-domain, then
> cpu/cpufreq/ is what you want. If you want finer-grained control -
> and you need to think hard of what use cases are sensible for that
> finer-grained solution - then you're better off with cpu/cpu*/ layout.

I want to control it over clock-domain, but can't get that in cpu/cpufreq/.
Policies don't have numbers assigned to them.

> In both cases though, having clear examples of why you've come up with
> the layout you're advocating would help reviewers a lot. If you simply
> come and say we need this because there might be systems out there who
> could use it, then that probably is not going to get you that far.

So, i am working on ARM's big.LITTLE system where we have two clusters.
One of A15s and other of A7s. Because of their different power ratings or
performance figures, we need to have separate set of ondemand tunables
for them. And hence this patch. Though this patch is required for any
multi-cluster system.

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 5/6] cpufreq: balance out cpufreq_cpu_{get,put} for scaling drivers using setpolicy

2013-02-04 Thread Viresh Kumar

On 4 February 2013 23:55, Dirk Brandewie  wrote:
> Hi Viresh,
>
> I have rebased onto bleeding-edge when I reboot the system I get a kernel
> panic.
> Any idea what I could have done that would have broken the restructured
> cpufreq_{add/remove}_dev.

Dirk, there are two mail chains for similar issue. Lets discuss these
issues in the
other thread: "BUG in bleeding edge c560f3d".
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-04 Thread Viresh Kumar

On Mon, Feb 4, 2013 at 10:20 PM, Borislav Petkov  wrote:
> On Mon, Feb 04, 2013 at 09:07:11PM +0530, Viresh Kumar wrote:
>> └── ondemand
>> ├── sampling_rate
>> ├── up_threshold
>> └── ignore_nice
>
> So this is adding the current governor as a per-cpu thing.

Its per policy, but yes it is replicated into all cpus as all policy->cpus
share the same directory.

>> > One thing I've come to realize with the current interface is that if
>> > you want to change stuff, you need to iterate over all cpus instead of
>> > writing to a system-wide node.
>>
>> Not really. Following is the way by which cpu/cpu*/cpufreq directories
>> are created:
>
> That's not what I meant - I meant from userspace:
>
> for $i in $(grep processor /proc/cpuinfo | awk '{ print $3 }');
> do
> echo "performance" > 
> /sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor;
> done
>
> Instead of
>
> echo "performance" > /sys/devices/system/cpu/cpufreq/scaling_governor
>
> which is hypothetical but sets it for the whole system without fuss.

We actually need to do this only for policy->cpu, but yes the user might not
be well aware of policy cpu-groups and may do what you wrote.

But that is true even today, when we need to change any policy tunables or
P-states.

>> I want to control it over clock-domain, but can't get that in cpu/cpufreq/.
>> Policies don't have numbers assigned to them.
>
> So, give them names.

IMHO, names doesn't suit policies.

>> So, i am working on ARM's big.LITTLE system where we have two
>> clusters. One of A15s and other of A7s. Because of their different
>> power ratings or performance figures, we need to have separate set of
>> ondemand tunables for them. And hence this patch. Though this patch is
>> required for any multi-cluster system.
>
> So you want this (values after "="):
>
> cpu/cpufreq/
> |-> policy0
> |-> name= A15
> |-> min_freq= ...
> |-> max_freq= ...
> |-> affected_cpus   = 0,1,2,...
> |-> ondemand
> |-> sampling_rate
> |-> up_threshold
> |-> ignore_nice
> ...
> |-> policy1
> |-> name= A7
> |-> min_freq= ...
> |-> max_freq= ...
> |-> affected_cpus   = n,n+1,n+2,...
> |-> performance
> |-> sampling_rate
> |-> up_threshold
> |-> ignore_nice
> ...

We may have two clusters of A7's also, in a non-big little arch. Then
these names become very confusing.

> Other arches create other policies and that's it. If you need another
> policy added to the set, you simply add 'policyN++' and that's it.

For me, adding policy->names per arch is increasing complexity without
much gain. We already have an existing infrastructure where this info
is present inside cpus and that looks good to me.

> I think this is cleaner but whatever - I don't care that much. My
> only strong concern is that this thing should be a Kconfig option and
> optional for arches where it doesn't apply.

Your concern is: we don't want to fix userspace for existing platforms
where we have just a single cluster and so struct policy in the system.

so, a good solution instead of Kconfig stuff is, add another field in policy
structure that would be filled by platform specific cpufreq drivers init()
routine. Based on which we can decide to put governor directory in
cpu/cpufreq/gov-name or cpu/cpu*/cpufreq/gov-name.

But i am not sure if keeping both kind of directory locations for separate
platforms is a good idea. Userspace needs to adapt to these changes as
well for multi-cluster platforms.

@Rafael: you know about any other multi-cluster/multi-clock-domain
platform leaving big.LITTLE, where we have multiple structs policy?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/2] cpufreq: Convert the cpufreq_driver_lock to a rwlock

2013-02-05 Thread Viresh Kumar

On Tue, Feb 5, 2013 at 4:15 AM, Nathan Zimmer  wrote:
> This completely eliminates the contention I am seeing in __cpufreq_cpu_get.
> It also nicely stages the lock to be replaced by the rcu.
>
> CC: "Rafael J. Wysocki" 
> Signed-off-by: Nathan Zimmer 
> ---
>  drivers/cpufreq/cpufreq.c | 44 ++--
>  1 file changed, 22 insertions(+), 22 deletions(-)

You have rebased it against an old version of this file. Please rebase it on
latest Rafael's linux-next branch.

The patch looks good otherwise.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/2] cpufreq: cpufreq_driver_lock is hot on large systems

2013-02-05 Thread Viresh Kumar

On Tue, Feb 5, 2013 at 6:37 AM, Rafael J. Wysocki  wrote:
> On Monday, February 04, 2013 04:45:11 PM Nathan Zimmer wrote:
>> I am noticing the cpufreq_driver_lock is quite hot.
>> On an idle 512 system perf shows me most of the system time is spent on this
>> lock.  This is quite signifigant as top shows 5% of time in system time.
>> My solution was to first convert the lock to a rwlock and then to the rcu.
>>
>>
>> Nathan Zimmer (2):
>>   cpufreq: Convert the cpufreq_driver_lock to a rwlock
>>   cpufreq: Convert the cpufreq_driver_lock to use the rcu
>>
>>  drivers/cpufreq/cpufreq.c | 139 
>> ++
>>  1 file changed, 79 insertions(+), 60 deletions(-)
>
> I like these changes.
>
> Viresh, anyone, any comments?

Hi Nathan/Rafael,

Even i liked the basic idea behind the patchset, but didn't like the way it
is divided into patches. For me, it is highly discouraged to undo something
that you added in the same patchset. And you did exactly the same thing.

Patch 2 is revert of 1 + rcu stuff.

So, i would expect a single patch, i.e. merge of both patches + rebased
on latest stuff.

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-05 Thread Viresh Kumar

On 4 February 2013 19:06, Borislav Petkov  wrote:
> On Mon, Feb 04, 2013 at 06:55:25PM +0530, Viresh Kumar wrote:

>> Its not only for multicluster system, but a system where multiple cpus
>> have separate clock control and hence multiple policy structures.
>
> What are those systems? Examples?

Qualcomm's ARM based "krait". Currently shipping in millions of Android
phones.

http://en.wikipedia.org/wiki/Krait_(CPU)

Thanks Charles for pointing it out, I knew there is one :)

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-05 Thread Viresh Kumar

On 5 February 2013 14:45, Borislav Petkov  wrote:
> On Tue, Feb 05, 2013 at 12:50:31PM +0530, Viresh Kumar wrote:
>> > I think this is cleaner but whatever - I don't care that much. My
>> > only strong concern is that this thing should be a Kconfig option and
>> > optional for arches where it doesn't apply.
>>
>> Your concern is: we don't want to fix userspace for existing platforms
>> where we have just a single cluster and so struct policy in the system.
>
> No, as I said so many times already and you're unwilling to understand
> it:

I am willing to, but not able to :)

> multiple policies support in cpufreq should be optional and
> selectable in Kconfig so that systems which don't need that, don't
> have to see or use it. It is yet another feature which doesn't apply
> universally so we make such features optional. Like the rest of the
> gazillion things in the kernel already.

I understand what Kconfig options are for, but i am not able to understand
what's the benefit of this option here. For example: for single image solutions
we need to keep it enabled. And so, would need some sort of logic in cpufreq
core & platform driver to decide where to create the governors directory.

The code without Kconfig option would be as simple as:

platform_driver:
init(struct cpufreq_policy *policy)
{
..
policy->have_multiple_policies = true;
..
}

cpufreq-core:

add_dev()
{
if (policy->have_multiple_policies)
create-folder-in-cpu/cpu*/cpufreq;
else
create-folder-in-cpu/cpufreq;
}

And so, platforms like Krait or big.LITTLE can set it to true from their
cpufreq-drivers. And this wouldn't break any of the current platforms.

> The existing sysfs layout cannot be changed because you're breaking
> userspace and we don't do that. It is that simple.

That's fine. I understood it already. :)

The problem i see is:
- both governor tunables, cpufreq-stats & policy tunables (P-states) have the
same requirement. They are all per policy or clock-domain, instead of per cpu.
- I want to keep all of these at the same place, as they should be
present in the
same hierarchy.
- If we move everything to cpu/cpufreq/policy-names/ then also we would break
existing userspace stuff for stats and P-states.
- If we move everything to cpu/cpu*/cpufreq/ then also we would break
existing userspace stuff for governors.

> Concerning adding new sysfs entries, I told you to make it as easy as
> possible and as sensible as possible, dictated by the use cases. If you
> can't come up with some, then talk to the people who are going to use
> your design and ask them what makes sense the most.

For me cpu/cpu*/ is the most sensible as it is an very easy/convenient interface
for users. I am the first one who is going to use it :)

@Rafael: What's your view on this discussion we are having? We probably need few
more "minds" to jump in, as we are not moving towards a conclusion. :)

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/2] cpufreq: cpufreq_driver_lock is hot on large systems

2013-02-05 Thread Viresh Kumar

On Tue, Feb 5, 2013 at 3:33 PM, Rafael J. Wysocki  wrote:
> I actually don't agree with that, becuase the Nathan's apprach shows the
> reasoning that leads to the RCU introduction quite clearly.  So if you
> don't have technical problems with the patchset, I'm going to take it as is.

Great!!

Okay.. I don't have any technical problems with it, i reviewed most of it
carefully. The only pending thing is rebase on linux-next, after that i can
give my ack for it.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-05 Thread Viresh Kumar

On 5 February 2013 15:57, Borislav Petkov  wrote:
> Are you kidding me? You're simply not reading what I'm saying to you:
> "... should be optional and selectable in Kconfig so that systems which
> don't need that, don't have to see or use it." Because on those systems
> it doesn't apply.
>
> How about we add an x86-specific extension which is a big wad of code
> and is needlessly run on ARM just because it is easier?

There isn't lot of code that we have to keep inside the macro you suggest.
Its just an if else (with single line block), which would give the parent
kobject. Nothing else.

I didn't wanted to create a macro for just that. For me an if/else is not that
big code.

Anyway, if nobody else comes on my side i can create that macro for you.
But, personally i would prefer code without such macros.

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-05 Thread Viresh Kumar

On 5 February 2013 16:34, Borislav Petkov  wrote:
> Here's an even cleaner way:
>
> platform_driver:
> init(struct cpufreq_policy *policy)
> {
> ...
>
> add_additional_sysfs_entries(policy);
>
> ...
> }
>
> ...
>
> static void add_additional_sysfs_entries(struct cpufreq_policy *policy)
> {
> #ifdef CONFIG_CPUFREQ_MULTIPLE_POLICIES
> create-folder-in-cpu/cpu*/cpufreq;
> ...
> #endif
> }
>
> and the platform driver will have in its Kconfig section:
>
> config CPUFREQ_PLATFORM_DRIVER_X
> ...
> select CPUFREQ_MULTIPLE_POLICIES
>
>
> You don't need the policy->have_multiple_policies member even.

Tricky part is the name of this routine: add_additional_sysfs_entries().

I am not proposing additional set of directories inside cpu/cpu*/cpufreq/
but either of cpu/cpufreq/ or cpu/cpu*/cpufreq/ for governor.

And so, keeping that additional variable looks a better solution. Let me
get a patch with this change only and see how it looks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-05 Thread Viresh Kumar

On 5 February 2013 16:49, Borislav Petkov  wrote:
> On Tue, Feb 05, 2013 at 04:42:23PM +0530, Viresh Kumar wrote:
>> Tricky part is the name of this routine: add_additional_sysfs_entries().
>
> Now you're just being silly - this is just an example how to do it. If
> you want me to do it for ya, you need to send me your monthly salary.

Haha... I don't want you to do it. I don't want such routine to be exposed
to cpufreq drivers as this belongs to the core code.

Just some kind of indication from platform driver is required about
how/where it wants its governor directory to be present.
And the variable suits more here.

Lets discuss it more on the next patch i send :)

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-05 Thread Viresh Kumar

On 5 February 2013 17:02, Borislav Petkov  wrote:
> On Tue, Feb 05, 2013 at 04:56:03PM +0530, Viresh Kumar wrote:
>> Just some kind of indication from platform driver is required about
>> how/where it wants its governor directory to be present.
>
> The indication is this:
>
> config CPUFREQ_PLATFORM_DRIVER_X
> ...
> select CPUFREQ_MULTIPLE_POLICIES
>
> You really need to slow down and really look at what I'm proposing.

This indication isn't enough. On a single image solution, we need to
identify the system which needs support for multiple policies and
i still feel we need that variable type indication :)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-05 Thread Viresh Kumar

On 5 February 2013 18:52, Borislav Petkov  wrote:
> On Tue, Feb 05, 2013 at 05:54:57PM +0530, Viresh Kumar wrote:q
>> This indication isn't enough. On a single image solution, we need to
>> identify the system which needs support for multiple policies and i
>> still feel we need that variable type indication :)
>
> If the image is going to run also on systems which support only a
> single policy, then I guess you can make it a bool, stuff it in struct
> cpufreq_policy and ifdef around it.

That's what i proposed initially.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-05 Thread Viresh Kumar

On 4 February 2013 17:08, Viresh Kumar  wrote:
> Currently, there can't be multiple instances of single governor_type. If we 
> have
> a multi-package system, where we have multiple instances of struct policy (per
> package), we can't have multiple instances of same governor. i.e. We can't 
> have
> multiple instances of ondemand governor for multiple packages.
>
> Governors directory in sysfs is created at /sys/devices/system/cpu/cpufreq/
> governor-name/. Which again reflects that there can be only one instance of a
> governor_type in the system.
>
> This is a bottleneck for multicluster system, where we want different packages
> to use same governor type, but with different tunables.
>
> This patchset is inclined towards fixing this issue.

After a long & hot discussion over the changes made by this patchset, following
patches came out:

--x--x---

commit 15b5548c9ccfb8088270f7574710d9d67edfe33b
Author: Viresh Kumar 
Date:   Tue Feb 5 21:29:05 2013 +0530

cpufreq: Make governors directory sysfs location based on
have_multiple_policies

Until now directory for governors tunables was getting created in
cpu/cpufreq/. With the introduction of following patch:
"cpufreq: governor: Implement per policy instances of governors"

this directory would be created in
cpu/cpu/cpufreq/. This might
break userspace of existing platforms. Lets do this change only
for platforms
which need support for multiple policies and thus above mentioned patch.

From now on, such platforms would be require to do following from
their init()
routines:

    policy->have_multiple_policies = true;

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/cpufreq_governor.c |  2 +-
 include/linux/cpufreq.h| 14 ++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/cpufreq_governor.c
b/drivers/cpufreq/cpufreq_governor.c
index 545ae24..41ee86f 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -300,7 +300,7 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
 dbs_data->cdata->gov_dbs_timer);
}

-   rc = sysfs_create_group(&policy->kobj,
+   rc = sysfs_create_group(get_governor_parent_kobj(policy),
dbs_data->cdata->attr_group);
if (rc) {
mutex_unlock(&dbs_data->mutex);
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 5dae7e6..6e1abd2 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -107,6 +107,11 @@ struct cpufreq_policy {
unsigned intpolicy; /* see above */
struct cpufreq_governor *governor; /* see below */
void*governor_data;
+   /* This should be set by init() of platforms having multiple
+* clock-domains, i.e.  supporting multiple policies. With this sysfs
+* directories of governor would be created in cpu/cpu/cpufreq/
+* directory */
+   boolhave_multiple_policies;

struct work_struct  update; /* if update_policy() needs to be
 * called, but you're in IRQ context */
@@ -134,6 +139,15 @@ static inline bool policy_is_shared(struct
cpufreq_policy *policy)
return cpumask_weight(policy->cpus) > 1;
 }

+static inline struct kobject *
+get_governor_parent_kobj(struct cpufreq_policy *policy)
+{
+   if (policy->have_multiple_policies)
+   return &policy->kobj;
+   else
+   return cpufreq_global_kobject;
+}
+
 / cpufreq transition notifiers ***/

 #define CPUFREQ_PRECHANGE  (0)

--x--x---

Plus the following patch, though i am still not in favor of this patch.
@Rafael: Please share your opinion too on this one. :)

--x--x---
commit 1c7e9871fce7388136eda1c86ca75a520e4d3b9d
Author: Viresh Kumar 
Date:   Tue Feb 5 21:41:40 2013 +0530

cpufreq: Add Kconfig option to enable/disable have_multiple_policies

have_multiple_policies is required by platforms having multiple
clock-domains
for cpus, i.e. supporting multiple policies for cpus. This patch adds in a
Kconfig option for enabling execution of this code.

Requested-by: Borislav Petkov 
Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/Kconfig | 3 +++
 include/linux/cpufreq.h | 4 
 2 files changed, 7 insertions(+)

diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig
index cbcb21e..e6e6939 100644
--- a/drivers/cpufreq/Kconfig
+++ b/drivers/cpufreq/Kconfig
@@ -23,6 +23,9 @@ config CPU_FREQ_TABLE
 config CPU_FREQ_GOV_COMMON
bool

+config

Re: [PATCH 2/7] cpufreq: Retrieve current frequency from scaling drivers with internal governors

2013-02-05 Thread Viresh Kumar

On Tue, Feb 5, 2013 at 11:54 PM,   wrote:
> From: Dirk Brandewie 
>
> Scaling drivers that implement the cpufreq_driver.setpolicy() versus
> the cpufreq_driver.target() interface do not set policy->cur.

> Signed-off-by: Dirk Brandewie 

> From: Dirk Brandewie 
> Date: 2013-02-04 18:44:40 -0800
>
> cpufreq: balance out cpufreq_cpu_{get,put} for scaling drivers using setpolicy

> Signed-off-by: Dirk Brandewie 

Wow!! Something happened here. :)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/7] cpufreq: Don't remove sysfs link for policy->cpu

2013-02-05 Thread Viresh Kumar

On Tue, Feb 5, 2013 at 11:54 PM,   wrote:
> From: Viresh Kumar 
>
> "cpufreq" directory in policy->cpu is never created using sysfs_create_link(),
> but using kobject_init_and_add(). And so we shouldn't call sysfs_remove_link()
> for policy->cpu(). sysfs stuff for policy->cpu is automatically removed when 
> we
> call kobject_put() for dying policy.
>
> Signed-off-by: Viresh Kumar 
> Tested-by: Dirk Brandewie 

Its already included by Rafael, don't include it in your next version:

commit 73bf0fc2b03d1f4fdada0ec430dc20bfb089cfd5
Author: Viresh Kumar 
Date:   Tue Feb 5 22:21:14 2013 +0100

cpufreq: Don't remove sysfs link for policy->cpu

"cpufreq" directory in policy->cpu is never created using
sysfs_create_link(), but using kobject_init_and_add(). And so we
shouldn't call sysfs_remove_link() for policy->cpu().  sysfs stuff
for policy->cpu is automatically removed when we call kobject_put()
for dying policy.

Signed-off-by: Viresh Kumar 
Tested-by: Dirk Brandewie 
Signed-off-by: Rafael J. Wysocki 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/7] cpufreq: Retrieve current frequency from scaling drivers with internal governors

2013-02-05 Thread Viresh Kumar

On Tue, Feb 5, 2013 at 11:54 PM,   wrote:
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 2817c3c..96bc302 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1181,6 +1181,13 @@ unsigned int cpufreq_quick_get(unsigned int cpu)
> struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
> unsigned int ret_freq = 0;
>
> +   if (cpufreq_driver && cpufreq_driver->setpolicy &&
> +   cpufreq_driver->get) {
> +   ret_freq = cpufreq_driver->get(cpu);
> +   cpufreq_cpu_put(policy);
> +   return ret_freq;
> +   }
> +

This was my comment on the last version:

"You are required to do cpufreq_cpu_put() in this case too... Better do
cpufreq_cpu_get() after your check."

It still applies.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 3/7] cpufreq: Only query drivers that implement cpufreq_driver.target()

2013-02-05 Thread Viresh Kumar

On Tue, Feb 5, 2013 at 11:54 PM,   wrote:
> From: Dirk Brandewie 
>
> Scaling drivers that implement cpufreq_driver.setpolicy() have
> internal governors and may/will change the current operating frequency
> very frequently this will cause cpufreq_out_of_sync() to be called
> every time. Only call cpufreq_driver.get() for drivers that implement
> cpufreq_driver.target()
>
> Signed-off-by: Dirk Brandewie 
> ---
>  drivers/cpufreq/cpufreq.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 96bc302..d8daa4b 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1787,7 +1787,7 @@ int cpufreq_update_policy(unsigned int cpu)
>
> /* BIOS might change freq behind our back
>   -> ask driver for current freq and notify governors about a change 
> */
> -   if (cpufreq_driver->get) {
> +   if (cpufreq_driver->get && cpufreq_driver->target) {
> policy.cur = cpufreq_driver->get(cpu);
> if (!data->cur) {
> pr_debug("Driver did not initialize current freq");

I am really not liking copy-pasting my older comments here :(

"This would mean policy->cur has a garbage value. I don't really know
how would other routine behave on this. Would it make sense to make
policy->cur zero atleast?
"
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 4/7] cpufreq: Do not track governor name for scaling drivers with internal governors.

2013-02-05 Thread Viresh Kumar

On Tue, Feb 5, 2013 at 11:54 PM,   wrote:
> From: Dirk Brandewie 
>
> Scaling drivers that implement internal governors do not have governor
> structures assocaited with them.  Only track the name of the governor
> associated with the CPU if the driver does not implement
> cpufreq_driver.setpolicy()
>
> Signed-off-by: Dirk Brandewie 
> ---
>  drivers/cpufreq/cpufreq.c |4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index d8daa4b..622e282 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1050,7 +1050,9 @@ static int __cpufreq_remove_dev(struct device *dev, 
> struct subsys_interface *sif
> __cpufreq_governor(data, CPUFREQ_GOV_STOP);
>
>  #ifdef CONFIG_HOTPLUG_CPU
> -   strncpy(per_cpu(cpufreq_cpu_governor, cpu), data->governor->name,
> +   if (!cpufreq_driver->setpolicy)
> +   strncpy(per_cpu(cpufreq_cpu_governor, cpu),
> +   data->governor->name,
> CPUFREQ_NAME_LEN);

Acked-by: Viresh Kumar 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 5/7] cpufreq: balance out cpufreq_cpu_{get,put} for scaling drivers using setpolicy

2013-02-05 Thread Viresh Kumar

On Tue, Feb 5, 2013 at 11:54 PM,   wrote:
> From: Dirk Brandewie 
>
> There is an additional reference added to the driver in
> cpufreq_add_dev()  that is removed in__cpufreq_governor() if the
> driver implements target().  Remove the last reference when the
> driver implements setpolicy()
>
> Signed-off-by: Dirk Brandewie 
> ---
>  drivers/cpufreq/cpufreq.c |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 622e282..d17477b 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1049,6 +1049,9 @@ static int __cpufreq_remove_dev(struct device *dev, 
> struct subsys_interface *sif
> if (cpufreq_driver->target)
> __cpufreq_governor(data, CPUFREQ_GOV_STOP);
>
> +   if (cpufreq_driver->setpolicy)
> +   cpufreq_cpu_put(data);

I don't understand this patch at all.. I grepped both cpufreq_cpu_get() & put()
in bleeding-edge and found everything to be correct.

Can you please point me to the exact line numbers ?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 6/7] cpufreq_stats: do not remove sysfs files if frequency table is not present

2013-02-05 Thread Viresh Kumar

On Tue, Feb 5, 2013 at 11:54 PM,   wrote:
> From: Dirk Brandewie 
>
> The sysfs files for cpufreq_stats are created in cpufreq_stats_create_table()
> called from cpufreq_stat_notifier_policy() when a policy is added to
> the cpu. cpufreq_stats_create_table() will not be called if the
> scaling driver does not export a frequency table to cpufreq.  Use the
> same fence on tear down.
>
> Signed-off-by: Dirk Brandewie 
> ---
>  drivers/cpufreq/cpufreq_stats.c |5 +
>  1 files changed, 5 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
> index 572124c..fb001e2 100644
> --- a/drivers/cpufreq/cpufreq_stats.c
> +++ b/drivers/cpufreq/cpufreq_stats.c
> @@ -185,6 +185,11 @@ static void cpufreq_stats_free_table(unsigned int cpu)
>  static void cpufreq_stats_free_sysfs(unsigned int cpu)
>  {
> struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
> +   struct cpufreq_frequency_table *table;
> +
> +   table = cpufreq_frequency_get_table(cpu);
> +   if (!table)
> +   return;

Because we aren't using table later on, i would write it as:

> +   if (!cpufreq_frequency_get_table(cpu))
> +   return;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/7] cpufreq: Retrieve current frequency from scaling drivers with internal governors

2013-02-05 Thread Viresh Kumar

On Wed, Feb 6, 2013 at 7:45 AM, Dirk Brandewie  wrote:
> How about this?
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 2817c3c..9c0eac4 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1182,7 +1182,12 @@ unsigned int cpufreq_quick_get(unsigned int cpu)
>
> unsigned int ret_freq = 0;
>
> if (policy) {
> -   ret_freq = policy->cur;
>
> +   if (cpufreq_driver && cpufreq_driver->setpolicy &&
> +   cpufreq_driver->get) {
> +   ret_freq = cpufreq_driver->get(cpu);
> +   } else {
> +   ret_freq = policy->cur;
> +   }
> cpufreq_cpu_put(policy);

The problem is: You don't need to do get/put in your case at all.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 3/7] cpufreq: Only query drivers that implement cpufreq_driver.target()

2013-02-05 Thread Viresh Kumar

On 6 February 2013 07:36, Dirk Brandewie  wrote:
> On 02/05/2013 05:47 PM, Viresh Kumar wrote:
>>
>> On Tue, Feb 5, 2013 at 11:54 PM,   wrote:
>>>
>>> From: Dirk Brandewie 
>>>
>>> Scaling drivers that implement cpufreq_driver.setpolicy() have
>>> internal governors and may/will change the current operating frequency
>>> very frequently this will cause cpufreq_out_of_sync() to be called
>>> every time. Only call cpufreq_driver.get() for drivers that implement
>>> cpufreq_driver.target()
>>>
>>> Signed-off-by: Dirk Brandewie 
>>> ---
>>>   drivers/cpufreq/cpufreq.c |2 +-
>>>   1 files changed, 1 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>>> index 96bc302..d8daa4b 100644
>>> --- a/drivers/cpufreq/cpufreq.c
>>> +++ b/drivers/cpufreq/cpufreq.c
>>> @@ -1787,7 +1787,7 @@ int cpufreq_update_policy(unsigned int cpu)
>>>
>>>  /* BIOS might change freq behind our back
>>>-> ask driver for current freq and notify governors about a
>>> change */
>>> -   if (cpufreq_driver->get) {
>>> +   if (cpufreq_driver->get && cpufreq_driver->target) {
>>>  policy.cur = cpufreq_driver->get(cpu);
>>>  if (!data->cur) {
>>>  pr_debug("Driver did not initialize current
>>> freq");
>>
>>
>> I am really not liking copy-pasting my older comments here :(
>>
>> "This would mean policy->cur has a garbage value. I don't really know
>> how would other routine behave on this. Would it make sense to make
>> policy->cur zero atleast?
>> "
>>
> The driver implements get() and will return a valid value but the other
> components that track the current frequency will not have been notified
> about any change so there is nothing to be out of sync with.  There is no
> reason to call cpufreq_out_of_sync() where the driver being used implements
> an internal
> governor.

Not sure if we are discussing the same issue. What i am saying is, with your
patch we aren't calling following line:

>>>  policy.cur = cpufreq_driver->get(cpu);

and policy was a local variable. Hence policy->cur is having a garbage value.
policy->cur is used at multiple places in cpufreq.c . Please check everywhere
if this garbage value doesn't break set_policy() type systems.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 5/7] cpufreq: balance out cpufreq_cpu_{get,put} for scaling drivers using setpolicy

2013-02-05 Thread Viresh Kumar

On 6 February 2013 07:38, Dirk Brandewie  wrote:
> On 02/05/2013 05:58 PM, Viresh Kumar wrote:
>>
>> On Tue, Feb 5, 2013 at 11:54 PM,   wrote:
>>>
>>> From: Dirk Brandewie 
>>>
>>> There is an additional reference added to the driver in
>>> cpufreq_add_dev()  that is removed in__cpufreq_governor() if the
>>>
>>> driver implements target().  Remove the last reference when the
>>> driver implements setpolicy()
>>>
>>> Signed-off-by: Dirk Brandewie 
>>>
>>> ---
>>>   drivers/cpufreq/cpufreq.c |3 +++
>>>   1 files changed, 3 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>>> index 622e282..d17477b 100644
>>> --- a/drivers/cpufreq/cpufreq.c
>>> +++ b/drivers/cpufreq/cpufreq.c
>>> @@ -1049,6 +1049,9 @@ static int __cpufreq_remove_dev(struct device *dev,
>>> struct subsys_interface *sif
>>>
>>>  if (cpufreq_driver->target)
>>>  __cpufreq_governor(data, CPUFREQ_GOV_STOP);
>>>
>>> +   if (cpufreq_driver->setpolicy)
>>> +   cpufreq_cpu_put(data);
>>
>>
>> I don't understand this patch at all.. I grepped both cpufreq_cpu_get() &
>> put()
>> in bleeding-edge and found everything to be correct.
>>
>> Can you please point me to the exact line numbers ?
>>
>
> Line 878 in cpufreq_add_dev()

Following is line 878:

for_each_online_cpu(sibling) {
struct cpufreq_policy *cp = per_cpu(cpufreq_cpu_data, sibling);
if (cp && cpumask_test_cpu(cpu, cp->related_cpus))
return cpufreq_add_policy_cpu(cpu, sibling, dev);
}

How is this related to your patch?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/7] cpufreq: Retrieve current frequency from scaling drivers with internal governors

2013-02-05 Thread Viresh Kumar

On 6 February 2013 08:01, Dirk Brandewie  wrote:
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 2817c3c..7516b7d 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1178,9 +1178,14 @@ static void cpufreq_out_of_sync(unsigned int cpu,
> unsigned int old_freq,
>   */
>
>  unsigned int cpufreq_quick_get(unsigned int cpu)
>  {
> -   struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
> +   struct cpufreq_policy *policy;
>
> unsigned int ret_freq = 0;
>
> +   if (cpufreq_driver && cpufreq_driver->setpolicy &&
> +   cpufreq_driver->get)
> +   return cpufreq_driver->get(cpu);
> +
> +   policy = cpufreq_cpu_get(cpu);
> if (policy) {
>     ret_freq = policy->cur;
> cpufreq_cpu_put(policy);

Acked-by: Viresh Kumar 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 linux-next 1/2] cpufreq: Convert the cpufreq_driver_lock to a rwlock

2013-02-05 Thread Viresh Kumar

On 6 February 2013 07:34, Nathan Zimmer  wrote:
> This eliminates the contention I am seeing in __cpufreq_cpu_get.
> It also nicely stages the lock to be replaced by the rcu.
>
> Cc: Viresh Kumar 
> Cc: "Rafael J. Wysocki" 
> Signed-off-by: Nathan Zimmer 
> ---
>  drivers/cpufreq/cpufreq.c | 42 +-
>  1 file changed, 21 insertions(+), 21 deletions(-)

Acked-by: Viresh Kumar 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 linux-next 2/2] cpufreq: Convert the cpufreq_driver_lock to use the rcu

2013-02-05 Thread Viresh Kumar

On 6 February 2013 07:34, Nathan Zimmer  wrote:
> In general rwlocks are discourged so we are moving it to use the rcu instead.
>
> Cc: Viresh Kumar 
> Cc: "Rafael J. Wysocki" 
> Signed-off-by: Nathan Zimmer 
> ---
>  drivers/cpufreq/cpufreq.c | 173 
> +-
>  1 file changed, 96 insertions(+), 77 deletions(-)

bleeding-edge got updated again and this patch would have conflicts :)

Acked-by: Viresh Kumar 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] cpufreq: ondemand: Replace down_differential tuner with adj_up_threshold

2013-02-05 Thread Viresh Kumar

On Wed, Feb 6, 2013 at 2:34 AM, Stratos Karafotis  wrote:
> In order to avoid the calculation of up_threshold - down_differential
> every time that the frequency must be decreased, we replace the
> down_differential tuner with the adj_up_threshold which keeps the
> difference across multiple checks.
>
> Update the adj_up_threshold only when the up_theshold is also updated.
>
> Signed-off-by: Stratos Karafotis 
> ---
>  drivers/cpufreq/cpufreq_governor.h |  2 +-
>  drivers/cpufreq/cpufreq_ondemand.c | 16 ++--
>  2 files changed, 11 insertions(+), 7 deletions(-)

Makes sense.

Acked-by: Viresh Kumar 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 linux-next 2/2] cpufreq: Convert the cpufreq_driver_lock to use the rcu

2013-02-06 Thread Viresh Kumar

On 6 February 2013 08:22, Viresh Kumar  wrote:
> On 6 February 2013 07:34, Nathan Zimmer  wrote:
>> In general rwlocks are discourged so we are moving it to use the rcu instead.
>>
>> Cc: Viresh Kumar 
>> Cc: "Rafael J. Wysocki" 
>> Signed-off-by: Nathan Zimmer 
>> ---
>>  drivers/cpufreq/cpufreq.c | 173 
>> +-
>>  1 file changed, 96 insertions(+), 77 deletions(-)
>
> bleeding-edge got updated again and this patch would have conflicts :)
>
> Acked-by: Viresh Kumar 

Rafael,

I have applied them over bleeding-edge after resolving few conflicts.
Please pick them up from:

Branch: for-rafael
Repo: git://git.linaro.org/people/vireshk/linux.git
http://git.linaro.org/gitweb?p=people/vireshk/linux.git;a=shortlog;h=refs/heads/for-rafael

I have got my other cpufreq work too in that repo:
http://git.linaro.org/gitweb?p=people/vireshk/linux.git;a=shortlog;h=refs/heads/cpufreq-updates
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-06 Thread Viresh Kumar

On 5 February 2013 21:51, Viresh Kumar  wrote:
> commit 15b5548c9ccfb8088270f7574710d9d67edfe33b
> Author: Viresh Kumar 
> Date:   Tue Feb 5 21:29:05 2013 +0530
>
> cpufreq: Make governors directory sysfs location based on
> have_multiple_policies
>
> Until now directory for governors tunables was getting created in
> cpu/cpufreq/. With the introduction of following patch:
> "cpufreq: governor: Implement per policy instances of governors"
>
> this directory would be created in
> cpu/cpu/cpufreq/. This might
> break userspace of existing platforms. Lets do this change only
> for platforms
> which need support for multiple policies and thus above mentioned patch.
>
> From now on, such platforms would be require to do following from
> their init()
> routines:
>
>     policy->have_multiple_policies = true;
>
> Signed-off-by: Viresh Kumar 
> ---
>  drivers/cpufreq/cpufreq_governor.c |  2 +-
>  include/linux/cpufreq.h| 14 ++
>  2 files changed, 15 insertions(+), 1 deletion(-)

Hi Rafael,

Because this patch was quite big (317 insertions(+), 238 deletions(-)), i was
planning a detailed self review to capture any mistakes and luckily i found
one for above patch :)

diff --git a/drivers/cpufreq/cpufreq_governor.c
b/drivers/cpufreq/cpufreq_governor.c
index 41ee86f..fe037c0 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -342,7 +342,8 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
mutex_lock(&dbs_data->mutex);
mutex_destroy(&cpu_cdbs->timer_mutex);

-   sysfs_remove_group(&policy->kobj, dbs_data->cdata->attr_group);
+   sysfs_remove_group(get_governor_parent_kobj(policy),
+   dbs_data->cdata->attr_group);
if (dbs_data->cdata->governor == GOV_CONSERVATIVE)
cpufreq_unregister_notifier(cs_ops->notifier_block,
CPUFREQ_TRANSITION_NOTIFIER);

I have pushed the complete patchset here:

http://git.linaro.org/gitweb?p=people/vireshk/linux.git;a=shortlog;h=refs/heads/cpufreq-updates
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-06 Thread Viresh Kumar

On 6 February 2013 15:38, Amit Kucheria  wrote:
> On Wed, Feb 6, 2013 at 3:28 PM, Viresh Kumar  wrote:
>> I have pushed the complete patchset here:
>>
>> http://git.linaro.org/gitweb?p=people/vireshk/linux.git;a=shortlog;h=refs/heads/cpufreq-updates
>>
>
> Viresh, perhaps you should ask Stephen Rothwell to pull in your tree
> to get some more testing before Rafael pulls it in for 3.10?

Its has been made clear by Rafael that these patches wouldn't make it for
3.9 (though i wanted them to :) ), and so once the merge window is over
Rafael might pull them in and so they would reach Stephen's linux-next too...

I am not sure if sending a cpufreq pull request directly to Stephen is
preferred.
@Rafael: ??
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 5/7] cpufreq: balance out cpufreq_cpu_{get,put} for scaling drivers using setpolicy

2013-02-06 Thread Viresh Kumar

On 6 February 2013 21:41, Dirk Brandewie  wrote:
> our files are clearly out of sync :-)  The code in cpufreq_add_dev() is

I was at latest bleeding-edge then, but i still have below code in my HEAD.

> #ifdef CONFIG_SMP
> /* check whether a different CPU already registered this
>  * CPU because it is in the same boat. */
> policy = cpufreq_cpu_get(cpu);
> if (unlikely(policy)) {
> cpufreq_cpu_put(policy);
> return 0;
> }
>
> The reference added by this cpufreq_cpu_get() is finally dropped in
> __cpufreq_remove_dev() with the call to __cpufreq_governor()

__cpufreq_governor() is not calling cpufreq_cpu_put() at all and is dropping
reference for the governor only, not for policy.

> if (driver->target)
> __cpufreq_governor(data, CPUFREQ_GOV_STOP);
>
> Without this change I hang at:
> pr_debug("waiting for dropping of refcount\n");
> wait_for_completion(cmp);

Below should have fixed it for you i believe.

pr_debug("%s: removing link, cpu: %d\n", __func__, cpu);
cpufreq_cpu_put(data);
unlock_policy_rwsem_write(cpu);

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] cpufreq: exynos: simplify .init() for setting policy->cpus

2013-02-06 Thread Viresh Kumar

On 4 February 2013 17:52, Viresh Kumar  wrote:
> On 31 January 2013 07:56, Viresh Kumar  wrote:
>> With the recent changes in cpufreq core, we just need to set mask of all
>> possible cpus into policy->cpus. Rest would be done by core.
>>
>> Signed-off-by: Viresh Kumar 
>> ---
>>  drivers/cpufreq/exynos-cpufreq.c | 14 +-
>>  1 file changed, 1 insertion(+), 13 deletions(-)
>
> Hi Rafael,
>
> Are you picking up this patch ?

Ping!!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: next-20130206 cpufreq - WARN in sysfs_add_one

2013-02-06 Thread Viresh Kumar

On Thu, Feb 7, 2013 at 2:54 AM, Rafael J. Wysocki  wrote:
> On Wednesday, February 06, 2013 12:44:35 PM Valdis Kletnieks wrote:
>> Seen in dmesg.  next-20130128 was OK. Haven't done a bisect, but can
>> do so if the offender isn't obvious...
>
> I suppose this is 73bf0fc "cpufreq: Don't remove sysfs link for policy->cpu".

Not really. :)

Hi Valdis,

First of all i want to confirm something about your system. I am sure it is a
multi-policy system (or multi cluster system). i.e. there are more than one
clock line for different cpus ? And so multiple struct policy exist
simultaneously.

Because this crash can only come on those.

Anyway, i have tested and pushed a fix here:

http://git.linaro.org/gitweb?p=people/vireshk/linux.git;a=shortlog;h=refs/heads/for-valdis

Please test it.

For others, the patch is:

commit 007dda326f1b1415846671d7fcfbd520f4f16151
Author: Viresh Kumar 
Date:   Thu Feb 7 12:51:27 2013 +0530

cpufreq: governors: Fix WARN_ON() for multi-policy platforms

On multi-policy systems there is a single instance of governor for both the
policies (if same governor is chosen for both policies). With the
code update
from following patches:

8eeed09 cpufreq: governors: Get rid of dbs_data->enable field
b394058 cpufreq: governors: Reset tunables only for
cpufreq_unregister_governor()

We are creating/removing sysfs directory of governor for for every call to
GOV_START and STOP. This would fail for multi-policy system as there is a
per-policy call to START/STOP.

This patch reuses the governor->initialized variable to detect
total users of
governor.

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/cpufreq.c  |  6 --
 drivers/cpufreq/cpufreq_governor.c | 32 +++-
 2 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index ccc598a..3b941a1 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1567,8 +1567,10 @@ static int __cpufreq_governor(struct
cpufreq_policy *policy,
policy->cpu, event);
ret = policy->governor->governor(policy, event);

-   if (!policy->governor->initialized && (event == CPUFREQ_GOV_START))
-   policy->governor->initialized = 1;
+   if (event == CPUFREQ_GOV_START)
+   policy->governor->initialized++;
+   else if (event == CPUFREQ_GOV_STOP)
+   policy->governor->initialized--;

/* we keep one module reference alive for
each CPU governed by this CPU */
diff --git a/drivers/cpufreq/cpufreq_governor.c
b/drivers/cpufreq/cpufreq_governor.c
index e4a306c..5a76086 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -247,11 +247,13 @@ int cpufreq_governor_dbs(struct dbs_data *dbs_data,
 dbs_data->gov_dbs_timer);
}

-   rc = sysfs_create_group(cpufreq_global_kobject,
-   dbs_data->attr_group);
-   if (rc) {
-   mutex_unlock(&dbs_data->mutex);
-   return rc;
+   if (!policy->governor->initialized) {
+   rc = sysfs_create_group(cpufreq_global_kobject,
+   dbs_data->attr_group);
+   if (rc) {
+   mutex_unlock(&dbs_data->mutex);
+   return rc;
+   }
}

/*
@@ -262,13 +264,15 @@ int cpufreq_governor_dbs(struct dbs_data *dbs_data,
cs_dbs_info->down_skip = 0;
cs_dbs_info->enable = 1;
cs_dbs_info->requested_freq = policy->cur;
-   cpufreq_register_notifier(cs_ops->notifier_block,
-   CPUFREQ_TRANSITION_NOTIFIER);

-   if (!policy->governor->initialized)
+   if (!policy->governor->initialized) {
+
cpufreq_register_notifier(cs_ops->notifier_block,
+   CPUFREQ_TRANSITION_NOTIFIER);
+
dbs_data->min_sampling_rate =
MIN_SAMPLING_RATE_RATIO *
jiffies_to_usecs(10);
+   }
} else {
od_dbs_info->rate_mult = 1;
od_dbs_info->sample_type = OD_NORMAL_SAMPLE;
@@ -311,11 +315,13 @@ unlock:
mutex_lock(&dbs_data->mutex);
mutex_destroy(&cpu_cdbs->timer_mutex);

-   sysfs_remove_group(cpufreq_global_kobj

[PATCH 0/4] CPUFreq Fixes for 3.9

2013-02-07 Thread Viresh Kumar

Hi Rafael,

This is another unplanned patchset for all the platforms that i broke. :)

Okay, there are two important fixes (1 & 4) and two general cleanups (2 & 3). I
hope most of the issues would be resolved by these and we would be able to push
clean cpufreq core into 3.9.

I have pushed them in my for-rafael branch at:

http://git.linaro.org/gitweb?p=people/vireshk/linux.git;a=shortlog;h=refs/heads/for-rafael

@Artem & Valdis: Please test them and reply with your Tested-by's (in case they
work :) ).

Viresh Kumar (4):
  cpufreq: governors: Fix WARN_ON() for multi-policy platforms
  cpufreq: Remove unused HOTPLUG_CPU code
  cpufreq: Create a macro for unlock_policy_rwsem{read,write}
  cpufreq: Fix locking issues

 drivers/cpufreq/cpufreq.c  | 126 ++---
 drivers/cpufreq/cpufreq_governor.c |  32 ++
 2 files changed, 79 insertions(+), 79 deletions(-)

-- 
1.7.12.rc2.18.g61b472e


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 3/4] cpufreq: Create a macro for unlock_policy_rwsem{read,write}

2013-02-07 Thread Viresh Kumar

On the lines of macro: lock_policy_rwsem, we can create another macro for
unlock_policy_rwsem. Lets do it.

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/cpufreq.c | 24 +---
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 795b0e8..c46aed4 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -70,8 +70,7 @@ static DEFINE_PER_CPU(int, cpufreq_policy_cpu);
 static DEFINE_PER_CPU(struct rw_semaphore, cpu_policy_rwsem);
 
 #define lock_policy_rwsem(mode, cpu)   \
-static int lock_policy_rwsem_##mode\
-(int cpu)  \
+static int lock_policy_rwsem_##mode(int cpu)   \
 {  \
int policy_cpu = per_cpu(cpufreq_policy_cpu, cpu);  \
BUG_ON(policy_cpu == -1);   \
@@ -81,23 +80,18 @@ static int lock_policy_rwsem_##mode 
\
 }
 
 lock_policy_rwsem(read, cpu);
-
 lock_policy_rwsem(write, cpu);
 
-static void unlock_policy_rwsem_read(int cpu)
-{
-   int policy_cpu = per_cpu(cpufreq_policy_cpu, cpu);
-   BUG_ON(policy_cpu == -1);
-   up_read(&per_cpu(cpu_policy_rwsem, policy_cpu));
-}
-
-static void unlock_policy_rwsem_write(int cpu)
-{
-   int policy_cpu = per_cpu(cpufreq_policy_cpu, cpu);
-   BUG_ON(policy_cpu == -1);
-   up_write(&per_cpu(cpu_policy_rwsem, policy_cpu));
+#define unlock_policy_rwsem(mode, cpu) \
+static void unlock_policy_rwsem_##mode(int cpu)
\
+{  \
+   int policy_cpu = per_cpu(cpufreq_policy_cpu, cpu);  \
+   BUG_ON(policy_cpu == -1);   \
+   up_##mode(&per_cpu(cpu_policy_rwsem, policy_cpu));  \
 }
 
+unlock_policy_rwsem(read, cpu);
+unlock_policy_rwsem(write, cpu);
 
 /* internal prototypes */
 static int __cpufreq_governor(struct cpufreq_policy *policy,
-- 
1.7.12.rc2.18.g61b472e


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 4/4] cpufreq: Fix locking issues

2013-02-07 Thread Viresh Kumar

cpufreq core uses two locks:
- cpufreq_driver_lock: General lock for driver and cpufreq_cpu_data array.
- cpu_policy_rwsemfix locking: per CPU reader-writer semaphore designed to cure
  all cpufreq/hotplug/workqueue/etc related lock issues.

These locks were not used properly and are placed against their principle
(present before their definition) at various places. This patch is an attempt to
fix their use.

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/cpufreq.c | 79 +++
 1 file changed, 45 insertions(+), 34 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index c46aed4..79511ab 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -59,8 +59,6 @@ static DEFINE_SPINLOCK(cpufreq_driver_lock);
  *   mode before doing so.
  *
  * Additional rules:
- * - All holders of the lock should check to make sure that the CPU they
- *   are concerned with are online after they get the lock.
  * - Governor routines that can be called in cpufreq hotplug path should not
  *   take this sem as top level hotplug notifier handler takes this.
  * - Lock should not be held across
@@ -257,6 +255,7 @@ void cpufreq_notify_transition(struct cpufreq_freqs *freqs, 
unsigned int state)
 {
struct cpufreq_policy *policy;
struct cpufreq_driver *driver;
+   unsigned long flags;
 
BUG_ON(irqs_disabled());
 
@@ -269,7 +268,10 @@ void cpufreq_notify_transition(struct cpufreq_freqs 
*freqs, unsigned int state)
pr_debug("notification %u of frequency transition to %u kHz\n",
state, freqs->new);
 
+   spin_lock_irqsave(&cpufreq_driver_lock, flags);
policy = per_cpu(cpufreq_cpu_data, freqs->cpu);
+   spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
+
switch (state) {
 
case CPUFREQ_PRECHANGE:
@@ -792,8 +794,10 @@ static int cpufreq_add_dev_interface(unsigned int cpu,
 
if (ret) {
pr_debug("setting policy failed\n");
+   spin_lock_irqsave(&cpufreq_driver_lock, flags);
if (driver->exit)
driver->exit(policy);
+   spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
}
return ret;
 
@@ -814,22 +818,22 @@ static int cpufreq_add_policy_cpu(unsigned int cpu, 
unsigned int sibling,
policy = cpufreq_cpu_get(sibling);
WARN_ON(!policy);
 
-   per_cpu(cpufreq_policy_cpu, cpu) = policy->cpu;
-
-   lock_policy_rwsem_write(cpu);
-
__cpufreq_governor(policy, CPUFREQ_GOV_STOP);
 
+   lock_policy_rwsem_write(sibling);
+
spin_lock_irqsave(&cpufreq_driver_lock, flags);
+
cpumask_set_cpu(cpu, policy->cpus);
+   per_cpu(cpufreq_policy_cpu, cpu) = policy->cpu;
per_cpu(cpufreq_cpu_data, cpu) = policy;
spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
 
+   unlock_policy_rwsem_write(sibling);
+
__cpufreq_governor(policy, CPUFREQ_GOV_START);
__cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
 
-   unlock_policy_rwsem_write(cpu);
-
ret = sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq");
if (ret) {
cpufreq_cpu_put(policy);
@@ -878,11 +882,15 @@ static int cpufreq_add_dev(struct device *dev, struct 
subsys_interface *sif)
 
 #ifdef CONFIG_HOTPLUG_CPU
/* Check if this cpu was hot-unplugged earlier and has siblings */
+   spin_lock_irqsave(&cpufreq_driver_lock, flags);
for_each_online_cpu(sibling) {
struct cpufreq_policy *cp = per_cpu(cpufreq_cpu_data, sibling);
-   if (cp && cpumask_test_cpu(cpu, cp->related_cpus))
+   if (cp && cpumask_test_cpu(cpu, cp->related_cpus)) {
+   spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
return cpufreq_add_policy_cpu(cpu, sibling, dev);
+   }
}
+   spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
 #endif
 #endif
 
@@ -907,20 +915,21 @@ static int cpufreq_add_dev(struct device *dev, struct 
subsys_interface *sif)
 
/* Initially set CPU itself as the policy_cpu */
per_cpu(cpufreq_policy_cpu, cpu) = cpu;
-   ret = (lock_policy_rwsem_write(cpu) < 0);
-   WARN_ON(ret);
 
init_completion(&policy->kobj_unregister);
INIT_WORK(&policy->update, handle_update);
 
+   spin_lock_irqsave(&cpufreq_driver_lock, flags);
/* call driver. From then on the cpufreq must be able
 * to accept all calls to ->verify and ->setpolicy for this CPU
 */
ret = driver->init(policy);
if (ret) {
pr_debug("initialization failed\n");
-   goto err_unlock_policy;
+   spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
+

[PATCH 2/4] cpufreq: Remove unused HOTPLUG_CPU code

2013-02-07 Thread Viresh Kumar

Because the sibling cpu of any online cpu is identified very early in
cpufreq_add_dev(), below code is never executed. And so can be removed.

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/cpufreq.c | 17 ++---
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 3b941a1..795b0e8 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -858,7 +858,7 @@ static int cpufreq_add_policy_cpu(unsigned int cpu, 
unsigned int sibling,
 static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
 {
unsigned int j, cpu = dev->id;
-   int ret = -ENOMEM, found = 0;
+   int ret = -ENOMEM;
struct cpufreq_policy *policy;
struct cpufreq_driver *driver;
unsigned long flags;
@@ -908,6 +908,7 @@ static int cpufreq_add_dev(struct device *dev, struct 
subsys_interface *sif)
goto err_free_cpumask;
 
policy->cpu = cpu;
+   policy->governor = CPUFREQ_DEFAULT_GOVERNOR;
cpumask_copy(policy->cpus, cpumask_of(cpu));
 
/* Initially set CPU itself as the policy_cpu */
@@ -918,20 +919,6 @@ static int cpufreq_add_dev(struct device *dev, struct 
subsys_interface *sif)
init_completion(&policy->kobj_unregister);
INIT_WORK(&policy->update, handle_update);
 
-   /* Set governor before ->init, so that driver could check it */
-#ifdef CONFIG_HOTPLUG_CPU
-   for_each_online_cpu(sibling) {
-   struct cpufreq_policy *cp = per_cpu(cpufreq_cpu_data, sibling);
-   if (cp && cp->governor &&
-   (cpumask_test_cpu(cpu, cp->related_cpus))) {
-   policy->governor = cp->governor;
-   found = 1;
-   break;
-   }
-   }
-#endif
-   if (!found)
-   policy->governor = CPUFREQ_DEFAULT_GOVERNOR;
/* call driver. From then on the cpufreq must be able
 * to accept all calls to ->verify and ->setpolicy for this CPU
 */
-- 
1.7.12.rc2.18.g61b472e


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 9135 matches

Mail list logo