Re: [PATCH 03/10] ALSA: add AXD Audio Processing IP alsa driver

2015-08-28 Thread Qais Yousef

On 08/27/2015 04:32 PM, Mark Brown wrote:

On Thu, Aug 27, 2015 at 01:15:51PM +0100, Qais Yousef wrote:

On 08/26/2015 07:37 PM, Mark Brown wrote:

On Mon, Aug 24, 2015 at 01:39:12PM +0100, Qais Yousef wrote:

+#define AXD_INPUT_DESCRIPTORS 10
+struct axd_input {
+   struct axd_buffer_desc descriptors[AXD_INPUT_DESCRIPTORS];
+};

Where do these numbers come from?  Are they hardware limits or something
else?

These numbers are what the firmware is designed to work with. We had to set a
limit and we thought 10 was a good one for our purposes. We don't expect to
need to change this number.

So we have hard coded numbers in the firmware that we need in the driver
but we can't read those numbers back from the firmware.  That's sad.


+#define AXD_BASE_VADDR 0xD000

This sounds like something that is going to be platform dependant,
should this be supplied from board configuration?

I don't expect this to change. Can we add the configuration later if we hit
the need to change it?

It should be trivial to make things configurable shouldn't it?


Yes, and I am all for configurability, but I don't think it makes sense 
here. AXD will always have its own MMU and will not share its virtual 
address space, so the possibility of us wanting to move this somewhere 
else is really very thin. Also I don't think this is the kind of detail 
we need to concern the user with. I'll see if I can make the binary 
header parsing more flexible so we can add more info like this and the 
one above in the future and be more future proof.



+   if (!*offp) {
+   unsigned int flags = axd_platform_lock();
+   unsigned int log_offset = ioread32(log_addr);
+   unsigned int log_wrapped = ioread32(log_addr + 8);
+   char __iomem *log_buff = (char __iomem *)(log_addr + 12);
+
+   /* new read from beginning, fill up our internal buffer */
+   if (!log_wrapped) {
+   memcpy_fromio(axd->log_rbuf, log_buff, log_offset);
+   axd->log_rbuf_rem = log_offset;
+   } else {
+   char __iomem *pos = log_buff + log_offset;
+   unsigned int rem = log_size - log_offset;
+
+   memcpy_fromio(axd->log_rbuf, pos, rem);
+   memcpy_fromio(axd->log_rbuf + rem, log_buff, log_offset);
+   axd->log_rbuf_rem = log_size;
+   }
+   axd_platform_unlock(flags);

I didn't see the lock being taken?

The lock is the first line in the block (unsigned int flags =
axd_platform_lock()). I'll tidy it up to make it more readable.

It's very bad practice to bury lock taking in with the variable
declaration.


Yes. I'll fix it.
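
Something along these lines (a sketch only; same names as the quoted code,
with the lock taken as its own statement):

	if (!*offp) {
		unsigned int flags, log_offset, log_wrapped;
		char __iomem *log_buff;

		/* take the platform lock explicitly, not in a declaration */
		flags = axd_platform_lock();

		log_offset = ioread32(log_addr);
		log_wrapped = ioread32(log_addr + 8);
		log_buff = (char __iomem *)(log_addr + 12);

		/* ... fill axd->log_rbuf as before ... */

		axd_platform_unlock(flags);
	}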


+#ifdef CONFIG_CRYPTO_LZO
+#include 

This include should be with all the other includes, not down here.

Was trying to reduce the ifdefery. Will fix.

You don't need any ifdefs for the include, you can just include the
header.


+{
+   dev_err(axd->dev, "The firmware must be lzo decompressed first, compile driver again with CONFIG_CRYPTO_LZO enabled in kernel or do the decompression in user space.\n");

Please split this up into a few prints for wrapping, similarly in
several other places.

OK. I thought the convention for strings was to leave them as is to allow
grepping. I'll fix it.

You should keep strings that are displayed as a single string together
but if you are splitting something in the output then that split won't
hurt grepping in the source.
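
For example (a sketch; the exact split points are arbitrary):

	dev_err(axd->dev,
		"The firmware must be lzo decompressed first\n");
	dev_err(axd->dev,
		"Enable CONFIG_CRYPTO_LZO in the kernel or decompress in user space\n");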


+* We copy through the cache; the fw will do the necessary cache
+* flushes and syncing at startup.
+* Copying from uncached makes it more difficult for the
+* firmware to keep the caches coherent with memory when it sets
+* up TLBs and starts running.
+*/
+   memcpy_toio((void *)cached_fw_base, fw->data, fw->size);

Why the cast here?  I'm also not seeing where we handled the copying to
I/O in the decompression case?

I couldn't avoid the cast. If cached_fw_base is 'void *' I'll get a warning
when initialising cached_fw_base from CAC_ADDR().

Why do you get a warning from that?  Perhaps the warnings are trying to
tell us something...


Because we try to assign an int to a pointer, so the error is 'makes 
pointer from integer without a cast'. To convert an address from 
uncached to cached on MIPS we need to convert it to an integer, add or 
subtract an offset, and then convert the value back to its original form.
I'll see if I can find a better way to fix the coherency issue when we 
copy through uncached.
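
For reference, the conversion being described looks roughly like this
(CAC_ADDR() is the MIPS helper from asm/addrspace.h; the variable names
are illustrative):

	/* convert the uncached alias to its cached alias via integer arithmetic */
	unsigned long uncached_fw_base = (unsigned long)axd->fw_base_m;
	void *cached_fw_base = (void *)CAC_ADDR(uncached_fw_base);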





Good point. When decompressing, crypto_comp_decompress() will write directly
to the memory. It is safe but it doesn't go through the correct API. Not
sure what I can do here.

Uncompress to a buffer then write that buffer
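
i.e. something like this sketch (tfm, max_fw_size and cached_fw_base are
assumed names from the surrounding discussion):

	unsigned int dlen = max_fw_size;	/* assumed upper bound for the image */
	void *buf = vmalloc(dlen);
	int ret;

	if (!buf)
		return -ENOMEM;

	ret = crypto_comp_decompress(tfm, fw->data, fw->size, buf, &dlen);
	if (!ret)
		memcpy_toio(cached_fw_base, buf, dlen);	/* copy via the I/O API */

	vfree(buf);
	return ret;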

Re: [PATCH 01/10] irqchip: irq-mips-gic: export gic_send_ipi

2015-08-28 Thread Qais Yousef

On 08/26/2015 10:40 PM, Thomas Gleixner wrote:

On Wed, 26 Aug 2015, Qais Yousef wrote:

On 08/26/2015 04:08 PM, Thomas Gleixner wrote:

IPI = Inter Processor Interrupt

As the name says that's an interrupt which goes from one cpu to
another. So an IPI has a very clear target.

OK understood. My interpretation of 'processor' here made the difference: I
was viewing all the Linux cpus as one unit with regard to its coprocessors.

You can only view it this way if you talk about peripheral interrupts
which are not used as per cpu interrupts and can be routed to a single
cpu or a set of cpus via set_affinity.


Whether the platform implements IPIs via general interrupts which are
made affine to a particular cpu or some other specialized mechanism is
completely irrelevant. An IPI is not subject to affinity settings,
period.

So if you want to use an IPI then you need a target cpu for that IPI.

If you want something which can be affined to any cpu, then you need a
general interrupt and not an IPI.

We are using IPIs to exchange interrupts. Affinity is not important to me.

That's a bold statement. If you chose CPU x as the target for the
interrupts received from the coprocessor, then you have pinned the
processing for this stuff on to CPU x. So you limit the freedom of
moving stuff around on the linux cpus.


I said that because I thought you were telling me that if I expect my 
IPIs to be movable then I must use general interrupts. So what I was 
saying is that we use IPIs, and if it's against the rules for them to 
have affinity then we can live with that.




And if your root irq controller supports sending normal device
interrupts in the same or a similar way as it sends IPIs you can spare
quite some extra handling on the linux side for receiving the
coprocessor interrupt, i.e. you can use just the bog standard
request_irq() mechanism and have the ability to set the affinity of
that interrupt from user space so you can move it to the core on which
your processing happens. Definitely simpler and more flexible, so I
would go there if the hardware allows.


That's what I was trying to say, but maybe words failed me to explain it 
clearly :(




But back to the IPIs. We need infrastructure and DT support to:

1) reserve an IPI

2) send an IPI

3) request/free an IPI

#1 We have no infrastructure for that, but we definitely need one.

We can look at the IPI as a single linux irq number which is
replicated on all cpu cores. The replication can happen in hardware
or by software, but that depends on the underlying root irq
controller. How that is implemented does not matter for the
reservation.

The most flexible and platform independent solution would be to
describe the IPI space as a separate irq domain. In most cases this
would be a hierarchical domain stacked on the root irq domain:

[IPI-domain] --> [GIC-MIPS-domain]

on x86 this would be:

[IPI-domain] --> [vector-domain]

That needs some change how the IPIs which are used by the kernel
(rescheduling, function call ..) are set up, but we get a proper
management and collision avoidance that way. Depending on the
platform we could actually remove the whole IPI compile time
reservation and hand out IPIs at boot time on demand and
dynamically.

So the reservation function would be something like:

unsigned int irq_reserve_ipi(const struct cpumask *dest, void *devid);

@dest contains the possible targets for the IPI. So for generic
linux IPIs this would be cpu_possible_mask. For your coprocessor
the target would be a cpumask with just the bit of the coprocessor
core set. If you need to use an IPI for sending an interrupt from
the coprocessor to a specific linux core then @dest will contain
just that target cpu.

@devid is stored in the IPI domain for sanity checks during
operation.

The function returns a linux irq number or 0 if allocation fails.

We need a complementary interface as well, so you can hand back the
IPI to the core when the coprocessor is disabled:

void irq_destroy_ipi(unsigned int irq, void *devid);


To configure your coprocessor proper, we need a translation
mechanism from the linux interrupt number to the magic value which
needs to be written into the trigger register when the coprocessor
wants to send an interrupt or an IPI.

int irq_get_irq_hwcfg(unsigned int irq, struct irq_hwcfg *cfg);

struct irq_hwcfg needs to be defined, but it might look like this:

  {
/* Generic fields */
x;
...
union {
  mips_gic;
  ...
};
  };

The actual hw specific value(s) need to be filled in from the irq
domain specific code.


#2 We have no generic mechanism for that either.

Something like this is needed:

void irq_send_ipi(unsigned int irq, const struct cpumask *dest,
 void *devid);
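
From the driver side, usage of the proposed API would look roughly like
this (a sketch only: these are the signatures spec'ed in this mail, not
an existing kernel API; coproc_cpu and devid are illustrative):

	struct cpumask dest;
	unsigned int virq;

	cpumask_clear(&dest);
	cpumask_set_cpu(coproc_cpu, &dest);	/* target just the coprocessor */

	virq = irq_reserve_ipi(&dest, devid);	/* returns 0 on failure */
	if (!virq)
		return -EBUSY;

	irq_send_ipi(virq, cpumask_of(coproc_cpu), devid);	/* kick it */
	...
	irq_destroy_ipi(virq, devid);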

Re: [PATCH 01/10] irqchip: irq-mips-gic: export gic_send_ipi

2015-08-28 Thread Qais Yousef

On 08/28/2015 03:22 PM, Thomas Gleixner wrote:

 To configure your coprocessor proper, we need a translation
 mechanism from the linux interrupt number to the magic value which
 needs to be written into the trigger register when the coprocessor
 wants to send an interrupt or an IPI.

 int irq_get_irq_hwcfg(unsigned int irq, struct irq_hwcfg *cfg);

 struct irq_hwcfg needs to be defined, but it might look like this:

   {
/* Generic fields */
x;
...
union {
  mips_gic;
  ...
};
   };

That function provides you the information which you have to hand over
to your coprocessor firmware.




Of course!

* me slapping myself on the back *

Thanks,
Qais


Re: [PATCH 05/10] ALSA: axd: add buffers manipulation files

2015-09-01 Thread Qais Yousef

On 08/29/2015 10:47 AM, Mark Brown wrote:

On Thu, Aug 27, 2015 at 03:21:17PM +0100, Qais Yousef wrote:

On 08/26/2015 07:43 PM, Mark Brown wrote:

On Mon, Aug 24, 2015 at 01:39:14PM +0100, Qais Yousef wrote:

+   /*
+* must ensure we have one access at a time to the queue and rd_idx
+* to be preemption and SMP safe.
+* Semaphores will ensure that we will only read after a complete write
+* has finished, so we will never read and write from the same location.
+*/

In what way will semaphores ensure that we will only read after a
complete write?

This comment needs fixing. What it is trying to say is that if we have
reached this point of the code then we're certainly allowed to modify the
buffer queue and {rd, wr}_idx, because the semaphore would have gone to
sleep otherwise if the queue is full/empty.
Should I just remove the reference to semaphores from the comment, or is
it worth rephrasing?

Any comments need to be comprehensible.


Would it be better to rename {rd, wr}_{idx, sem} to {take, put}_{idx, sem}?

I'm not sure that helps to be honest, the main issue is that the scheme
is fairly complex and unexplained.


+   buf = bufferq->queue[bufferq->rd_idx];

So buffers are always retired in the same order that they are acquired?

I don't think I get you here. axd_bufferq_take() and axd_bufferq_put() could
be called in any order.

Retiring buffers in the order they are acquired means that buffers are
always freed in the same order they are acquired, you can't free one
buffer before another that was acquired first.

What this code is trying to do is make a contiguous memory area behave as
a ring buffer, and then make this ring buffer behave as a queue. We use
semaphore counts to control how many buffers are available to take/put.
rd_idx and wr_idx should always point at the next location to take/put
from/to. Does this help answer your question?

No.  Why are we doing this?  Essentially all ALSA buffers are ring
buffers handled in blocks, why does this one need this complex locking
scheme?


There are 2 sides to this: the ALSA/driver interface and the 
driver/firmware one. The ALSA/driver interface is called from ALSA ops, 
but the driver/firmware one is handled by the interrupt and workqueues. 
The code is trying to deal with this concurrency. Also, once AXD has 
consumed a buffer it sends back an interrupt to the driver so the buffer 
can be reused; there's no guarantee that returned buffers come back in 
the same order they were sent.


I hear you though. Let me see how I can simplify this :-)
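
For reference, a minimal sketch of the scheme being described, with
hypothetical simplified names (the real driver's axd_bufferq also tracks
abort flags):

	#define NUM_BUFS 10

	struct bufq {
		void *queue[NUM_BUFS];
		unsigned int rd_idx;		/* next slot to take from */
		unsigned int wr_idx;		/* next slot to put to */
		struct semaphore rd_sem;	/* counts filled slots, starts at 0 */
		struct semaphore wr_sem;	/* counts empty slots, starts at NUM_BUFS */
		spinlock_t lock;		/* serialises index updates */
	};

	static void *bufq_take(struct bufq *q)
	{
		void *buf;

		down(&q->rd_sem);	/* sleep until a filled slot exists */
		spin_lock(&q->lock);
		buf = q->queue[q->rd_idx];
		q->rd_idx = (q->rd_idx + 1) % NUM_BUFS;
		spin_unlock(&q->lock);
		up(&q->wr_sem);		/* one more empty slot for writers */
		return buf;
	}

	static void bufq_put(struct bufq *q, void *buf)
	{
		down(&q->wr_sem);	/* sleep until an empty slot exists */
		spin_lock(&q->lock);
		q->queue[q->wr_idx] = buf;
		q->wr_idx = (q->wr_idx + 1) % NUM_BUFS;
		spin_unlock(&q->lock);
		up(&q->rd_sem);		/* one more filled slot for readers */
	}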


+void axd_bufferq_abort_put(struct axd_bufferq *bufferq)
+{
+   if (axd_bufferq_is_full(bufferq)) {
+   bufferq->abort_put = 1;
+   up(&bufferq->wr_sem);
+   }
+}

These look *incredibly* racy.  Why are they here and why are they safe?

If we want to restart the firmware we will need to abort any blocking reads
or writes for the user space to react. I also needed that to implement
nonblocking access in user space when this was a sysfs based driver. It was
important then to implement the OMX IL component correctly.

I'm not questioning what the functions are doing, I'm questioning their
implementation - it doesn't look like they are safe or reliable.  They
just set a flag, relying on something else to notice that the flag has
been set and act appropriately before it goes on and corrupts data.
That just screams concurrency issues.

Nobody cares about OMX ILs in mainline or sysfs based interfaces.


Do I need to support nonblock reads and writes in ALSA? If I use SIGKILL as
you suggested in the other email when restarting and nonblock is not
important then I can remove this.

It would be better to support non blocking access.


OK. I'll see how I can rework the code to address all of your comments.

Thanks,
Qais




Re: [PATCH 06/10] ALSA: axd: add basic files for sending/receiving axd cmds

2015-09-01 Thread Qais Yousef

On 08/29/2015 11:18 AM, Mark Brown wrote:

On Thu, Aug 27, 2015 at 04:40:09PM +0100, Qais Yousef wrote:

On 08/26/2015 08:16 PM, Mark Brown wrote:

On Mon, Aug 24, 2015 at 01:39:15PM +0100, Qais Yousef wrote:

+unsigned long  axd_cmd_get_datain_address(struct axd_cmd *cmd)
+{
+   struct axd_dev *axd = container_of(cmd, struct axd_dev, cmd);
+
+   return (unsigned long) axd->buf_base_m;
+}

What's going on with these casts?

As with the other cases: buf_base_m is 'void __iomem *' but we want to do
some arithmetic to help AXD start up and understand where it needs to run.
I agree they don't look nice, and if I can avoid them I'd be happy to do so.

C supports pointer arithmetic...


+static inline void axd_set_flag(unsigned int *flag, unsigned int value)
+{
+   *flag = value;
+   smp_wmb();  /* guarantee smp ordering */
+}
+static inline unsigned int axd_get_flag(unsigned int *flag)
+{
+   smp_rmb();  /* guarantee smp ordering */
+   return *flag;
+}

Please use a normal locking construct rather than hand rolling
something, or alternatively introduce new generic operations.  The fact
that you're hand rolling these things that have no driver specific
content is really worrying in terms of their safety.

I need to check atomic_ops.txt again but I think atomic_t is not always SMP
safe. I was definitely running on a version of the Meta architecture in the
past where atomic_t wasn't always SMP safe.
I'll check if the rules have changed or something new was introduced to deal
with this.

It is true that when using atomic_t with multiprocessor you still need
memory barriers but that doesn't mean atomics bring no benefit.  But
that's not really the point here - the point is the more general one
that the whole idea of open coding memory barrier concurrency constructs
doesn't look great.  It makes the code much more complex and error prone
compared to using normal locking and other concurrency constructs (which
Linux has a rich set of).

If we really need performance then it can make sense but I'm not seeing
anything here that appears to motivate this.  In general all the
concurrency code looks much more complex than I would expect.


OK I'll improve on this.
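
For instance, the open-coded barriers above could be replaced with the
kernel's standard acquire/release primitives (a sketch, not necessarily
the final fix):

	static inline void axd_set_flag(unsigned int *flag, unsigned int value)
	{
		smp_store_release(flag, value);	/* store with release ordering */
	}

	static inline unsigned int axd_get_flag(unsigned int *flag)
	{
		return smp_load_acquire(flag);	/* paired acquire ordering */
	}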




+#define PIPE_STARTED   1
+#define PIPE_RUNNING   2

Why is the case with in place buffers not a simple zero iteration loop?

This is important when AXD is not consuming the data through I2S and is
instead returning them to Linux. What we're trying to deal with here is
that the firmware processed some data and expects Linux to consume whatever
it has sent back. We want to ensure that if the user suddenly stopped
consuming this data by closing the pipe, we drop anything we receive back
from AXD; otherwise the workqueue would block indefinitely waiting for the
user that disappeared to consume it, causing a deadlock.

That doesn't seem to address the question...


I'm sorry I don't understand your question then. Can you rephrase it please?


+   axd_platform_irq_ack();

When would this ever be called anywhere else?  Just inline it (and it's
better practice to only ack things we handle...).

It wouldn't be called anywhere else, but its implementation could be
platform specific; that's why it's abstracted. At the moment it does
nothing now that we're using MIPS, but we shouldn't assume that this will
always be the case.
The main purpose of this function is to deassert the interrupt line if the
way interrupts are wired for that platform requires it. In the past we were
running on hardware where interrupts were sent through a special slave port
and the interrupt had to be acked or deasserted.

This sounds like something that should be in the interrupt controller
implementation not the leaf driver, just remove this unless you're
actually abstracting something.


We're actually abstracting something. This mechanism might not be part 
of an interrupt controller that is understood by Linux. At least I had 
this case in the past, where the interrupt generated by AXD had to be 
acked by writing to a special memory address.
We don't have a current user for it now though, so it makes sense to 
remove it; if a similar user comes up in the future we can sort it out 
then.



+   flags = axd_platform_lock();
+   int_status = ioread32(&cmd->message->int_status);
+   iowrite32(0, &cmd->message->int_status);
+
+   if (!int_status)
+   goto out;

This should cause us to return IRQ_NONE.

I don't think it's necessary. It could happen that AXD sent a DATAIN
interrupt and shortly afterwards a DATAOUT interrupt, but the handler was
already running before the DATAOUT case was handled, causing both
interrupts to be handled in one go; the handler could then be called again
only to find that there's nothing left to do.

Please implement your interrupt handler properly so that the genirq
error handling code 
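
In other words, roughly (a sketch based on the quoted handler; locking
and the actual DATAIN/DATAOUT handling elided):

	static irqreturn_t axd_irq(int irq, void *dev_id)
	{
		struct axd_cmd *cmd = dev_id;	/* illustrative */
		unsigned int int_status;

		int_status = ioread32(&cmd->message->int_status);
		iowrite32(0, &cmd->message->int_status);

		if (!int_status)
			return IRQ_NONE;	/* lets genirq catch spurious/stuck irqs */

		/* ... handle the status bits we actually acked ... */
		return IRQ_HANDLED;
	}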

[RFC v2 PATCH 01/14] irq: add new IRQ_DOMAIN_FLAGS_IPI

2015-10-13 Thread Qais Yousef
This flag will be used to identify an IPI domain.

Signed-off-by: Qais Yousef 
---
 include/linux/irqdomain.h | 13 +
 1 file changed, 13 insertions(+)

diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index d3ca79236fb0..9b3dc6c2a3cc 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -153,6 +153,9 @@ enum {
/* Core calls alloc/free recursive through the domain hierarchy. */
IRQ_DOMAIN_FLAG_AUTO_RECURSIVE  = (1 << 1),
 
+   /* Irq domain is an IPI domain */
+   IRQ_DOMAIN_FLAG_IPI = (1 << 2),
+
/*
 * Flags starting from IRQ_DOMAIN_FLAG_NONCORE are reserved
 * for implementation specific purposes and ignored by the
@@ -326,6 +329,11 @@ static inline bool irq_domain_is_hierarchy(struct irq_domain *domain)
 {
return domain->flags & IRQ_DOMAIN_FLAG_HIERARCHY;
 }
+
+static inline bool irq_domain_is_ipi(struct irq_domain *domain)
+{
+   return domain->flags & IRQ_DOMAIN_FLAG_IPI;
+}
 #else  /* CONFIG_IRQ_DOMAIN_HIERARCHY */
 static inline void irq_domain_activate_irq(struct irq_data *data) { }
 static inline void irq_domain_deactivate_irq(struct irq_data *data) { }
@@ -339,6 +347,11 @@ static inline bool irq_domain_is_hierarchy(struct irq_domain *domain)
 {
return false;
 }
+
+static inline bool irq_domain_is_ipi(struct irq_domain *domain)
+{
+   return false;
+}
 #endif /* CONFIG_IRQ_DOMAIN_HIERARCHY */
 
 #else /* CONFIG_IRQ_DOMAIN */
-- 
2.1.0



[RFC v2 PATCH 04/14] irq: add a new irq_send_ipi() to irq_chip

2015-10-13 Thread Qais Yousef
Introduce the new function to allow generic IPI send mechanism to be
used from drivers code.

Signed-off-by: Qais Yousef 
---
 include/linux/irq.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index 4b537e4d393b..504133671985 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -369,6 +369,7 @@ static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
  * @irq_get_irqchip_state: return the internal state of an interrupt
  * @irq_set_irqchip_state: set the internal state of a interrupt
  * @irq_set_vcpu_affinity: optional to target a vCPU in a virtual machine
+ * @irq_send_ipi:  send an IPI to destination cpus
  * @flags: chip specific flags
  */
 struct irq_chip {
@@ -413,6 +414,8 @@ struct irq_chip {
 
	int (*irq_set_vcpu_affinity)(struct irq_data *data, void *vcpu_info);

+   void(*irq_send_ipi)(struct irq_data *data, const struct ipi_mask *dest);
+
unsigned long   flags;
 };
 
-- 
2.1.0



[RFC v2 PATCH 02/14] irq: add GENERIC_IRQ_IPI Kconfig symbol

2015-10-13 Thread Qais Yousef
irqchip should select this config to denote it supports generic IPI.

This will aid generic arch code to know when it can use generic IPI layer.

Signed-off-by: Qais Yousef 
---
 kernel/irq/Kconfig | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
index 9a76e3beda54..cb5044c54288 100644
--- a/kernel/irq/Kconfig
+++ b/kernel/irq/Kconfig
@@ -60,6 +60,10 @@ config IRQ_DOMAIN_HIERARCHY
bool
select IRQ_DOMAIN
 
+# Generic IRQ IPI support
+config GENERIC_IRQ_IPI
+   bool
+
 # Generic MSI interrupt support
 config GENERIC_MSI_IRQ
bool
-- 
2.1.0



[RFC v2 PATCH 00/14] Implement generic IPI support mechanism

2015-10-13 Thread Qais Yousef
This RFC series attempts to implement a generic IPI layer for reserving and 
sending IPIs.

It is based on the discussion in these links

https://lkml.org/lkml/2015/8/26/713
https://lkml.org/lkml/2015/9/29/875

In summary: we need a generic IPI layer to allow driver code to send IPIs to
coprocessors without caring about implementation details of the underlying
controller.

Also it will help in making SMP IPI support more generic.

The goal is to have a mechanism to dynamically reserve an IPI to destination
CPUs and provide a single virq to send an IPI to any of these CPUs using the
generic irq_send_ipi() API.
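
As a sketch, the intended flow looks like this (using the signatures
introduced later in this series; the mask setup and coproc_cpu are
illustrative):

	DECLARE_BITMAP(targets, NR_CPUS);
	struct ipi_mask dest = { .cpumask = targets, .nbits = NR_CPUS };
	unsigned int virq;

	bitmap_zero(targets, NR_CPUS);
	set_bit(coproc_cpu, targets);		/* illustrative target */

	virq = irq_reserve_ipi(NULL, &dest);	/* NULL => default IPI domain */
	if (virq)
		irq_send_ipi(virq, &dest);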

This v2 addresses the comments from v1; it implements a simpler mapping
mechanism and moves irq_send_ipi() to be part of irqchip instead of
irqdomain.

The implementation falls into place more naturally now (hopefully), so
hopefully the next series will be non-RFC. The only thing I haven't addressed
is whether we want request_percpu_irq() to enable a coprocessor or defer that
to the coprocessor itself.

This series is based on Linus' tree. I couldn't compile test it because MIPS
compilation was broken due to other reasons. I expect some breakage because
of the introduction of struct irq_common_data, which is not present on the
4.1 tree I was testing my code on before porting it to Linus' tip. I will fix
these issues and introduce proper accessors for accessing struct ipi_mask,
given that the concept is approved.

I hope my commit messages aren't too terse.

Credit goes to Thomas for spec'ing and outlining the proper way to get this new 
API in.

Qais Yousef (14):
  irq: add new IRQ_DOMAIN_FLAGS_IPI
  irq: add GENERIC_IRQ_IPI Kconfig symbol
  irq: add new struct ipi_mask
  irq: add a new irq_send_ipi() to irq_chip
  irq: add struct ipi_mask to irq_data
  irq: add struct ipi_mapping and its helper functions
  irq: add a new generic IPI reservation code to irq core
  irq: implement irq_send_ipi
  MIPS: add support for generic SMP IPI support
  MIPS: make smp CMP, CPS and MT use the new generic IPI functions
  MIPS: delete smp-gic.c
  irqchip: mips-gic: add a IPI hierarchy domain
  irqchip: mips-gic: implement the new irq_send_ipi
  irqchip: mips-gic: remove IPI init code

 arch/mips/Kconfig|   6 --
 arch/mips/include/asm/smp-ops.h  |   5 +-
 arch/mips/kernel/Makefile|   1 -
 arch/mips/kernel/smp-cmp.c   |   4 +-
 arch/mips/kernel/smp-cps.c   |   4 +-
 arch/mips/kernel/smp-gic.c   |  64 ---
 arch/mips/kernel/smp-mt.c|   2 +-
 arch/mips/kernel/smp.c   | 117 
 drivers/irqchip/Kconfig  |   2 +
 drivers/irqchip/irq-mips-gic.c   | 225 ---
 include/linux/irq.h  |  43 
 include/linux/irqchip/mips-gic.h |   3 -
 include/linux/irqdomain.h|  19 
 kernel/irq/Kconfig   |   4 +
 kernel/irq/irqdomain.c   |  84 +++
 kernel/irq/manage.c  | 103 ++
 16 files changed, 517 insertions(+), 169 deletions(-)
 delete mode 100644 arch/mips/kernel/smp-gic.c

-- 
2.1.0



[RFC v2 PATCH 11/14] MIPS: delete smp-gic.c

2015-10-13 Thread Qais Yousef
We now have a generic IPI layer that will use GIC automatically
if it's compiled in.

Signed-off-by: Qais Yousef 
---
 arch/mips/Kconfig  |  6 -
 arch/mips/kernel/Makefile  |  1 -
 arch/mips/kernel/smp-gic.c | 64 --
 3 files changed, 71 deletions(-)
 delete mode 100644 arch/mips/kernel/smp-gic.c

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index e3aa5b0b4ef1..5a73c1217af7 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -2123,7 +2123,6 @@ config MIPS_MT_SMP
select CPU_MIPSR2_IRQ_VI
select CPU_MIPSR2_IRQ_EI
select SYNC_R4K
-   select MIPS_GIC_IPI
select MIPS_MT
select SMP
select SMP_UP
@@ -2221,7 +2220,6 @@ config MIPS_VPE_APSP_API_MT
 config MIPS_CMP
bool "MIPS CMP framework support (DEPRECATED)"
depends on SYS_SUPPORTS_MIPS_CMP && !CPU_MIPSR6
-   select MIPS_GIC_IPI
select SMP
select SYNC_R4K
select SYS_SUPPORTS_SMP
@@ -2241,7 +2239,6 @@ config MIPS_CPS
select MIPS_CM
select MIPS_CPC
select MIPS_CPS_PM if HOTPLUG_CPU
-   select MIPS_GIC_IPI
select SMP
select SYNC_R4K if (CEVT_R4K || CSRC_R4K)
select SYS_SUPPORTS_HOTPLUG_CPU
@@ -2259,9 +2256,6 @@ config MIPS_CPS_PM
select MIPS_CPC
bool
 
-config MIPS_GIC_IPI
-   bool
-
 config MIPS_CM
bool
 
diff --git a/arch/mips/kernel/Makefile b/arch/mips/kernel/Makefile
index d982be1ea1c3..a4bfc41d46b5 100644
--- a/arch/mips/kernel/Makefile
+++ b/arch/mips/kernel/Makefile
@@ -51,7 +51,6 @@ obj-$(CONFIG_MIPS_MT_FPAFF)   += mips-mt-fpaff.o
 obj-$(CONFIG_MIPS_MT_SMP)  += smp-mt.o
 obj-$(CONFIG_MIPS_CMP) += smp-cmp.o
 obj-$(CONFIG_MIPS_CPS) += smp-cps.o cps-vec.o
-obj-$(CONFIG_MIPS_GIC_IPI) += smp-gic.o
 obj-$(CONFIG_MIPS_SPRAM)   += spram.o
 
 obj-$(CONFIG_MIPS_VPE_LOADER)  += vpe.o
diff --git a/arch/mips/kernel/smp-gic.c b/arch/mips/kernel/smp-gic.c
deleted file mode 100644
index 5f0ab5bcd01e..
--- a/arch/mips/kernel/smp-gic.c
+++ /dev/null
@@ -1,64 +0,0 @@
-/*
- * Copyright (C) 2013 Imagination Technologies
- * Author: Paul Burton 
- *
- * Based on smp-cmp.c:
- *  Copyright (C) 2007 MIPS Technologies, Inc.
- *  Author: Chris Dearman (ch...@mips.com)
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by the
- * Free Software Foundation;  either version 2 of the  License, or (at your
- * option) any later version.
- */
-
-#include 
-#include 
-
-#include 
-#include 
-
-void gic_send_ipi_single(int cpu, unsigned int action)
-{
-   unsigned long flags;
-   unsigned int intr;
-   unsigned int core = cpu_data[cpu].core;
-
-   pr_debug("CPU%d: %s cpu %d action %u status %08x\n",
-smp_processor_id(), __func__, cpu, action, read_c0_status());
-
-   local_irq_save(flags);
-
-   switch (action) {
-   case SMP_CALL_FUNCTION:
-   intr = plat_ipi_call_int_xlate(cpu);
-   break;
-
-   case SMP_RESCHEDULE_YOURSELF:
-   intr = plat_ipi_resched_int_xlate(cpu);
-   break;
-
-   default:
-   BUG();
-   }
-
-   gic_send_ipi(intr);
-
-   if (mips_cpc_present() && (core != current_cpu_data.core)) {
-   while (!cpumask_test_cpu(cpu, &cpu_coherent_mask)) {
-   mips_cpc_lock_other(core);
-   write_cpc_co_cmd(CPC_Cx_CMD_PWRUP);
-   mips_cpc_unlock_other();
-   }
-   }
-
-   local_irq_restore(flags);
-}
-
-void gic_send_ipi_mask(const struct cpumask *mask, unsigned int action)
-{
-   unsigned int i;
-
-   for_each_cpu(i, mask)
-   gic_send_ipi_single(i, action);
-}
-- 
2.1.0



[RFC v2 PATCH 06/14] irq: add struct ipi_mapping and its helper functions

2015-10-13 Thread Qais Yousef
struct ipi_mapping will provide a mechanism for irqdomain/architecture
code to fill out the mapping, for the generic code to later implement
generic IPI reserve and send functions.

Signed-off-by: Qais Yousef 
---
 include/linux/irq.h | 21 +++
 kernel/irq/manage.c | 59 +
 2 files changed, 80 insertions(+)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index b000b217ea24..c3d0f26c3eff 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -964,4 +964,25 @@ static inline u32 irq_reg_readl(struct irq_chip_generic *gc,
return readl(gc->reg_base + reg_offset);
 }
 
+#define INVALID_HWIRQ  -1
+
+/**
+ * struct ipi_mapping - IPI mapping information object
+ * @nr_hwirqs: number of hwirqs mapped
+ * @nr_cpus: number of cpus the controller can talk to
+ * @cpumap: per cpu hwirq mapping table
+ */
+struct ipi_mapping {
+   unsigned int nr_hwirqs;
+   unsigned int nr_cpus;
+   irq_hw_number_t *cpumap;
+};
+
+struct ipi_mapping *irq_alloc_ipi_mapping(unsigned int nr_cpus);
+void irq_free_ipi_mapping(struct ipi_mapping *map);
+int irq_map_ipi(struct ipi_mapping *map,
+   unsigned int cpu, irq_hw_number_t hwirq);
+int irq_unmap_ipi(struct ipi_mapping *map,
+ unsigned int cpu, irq_hw_number_t *hwirq);
+
 #endif /* _LINUX_IRQ_H */
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index f9a59f6cabd2..9a9bc0822c8f 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1924,3 +1924,62 @@ int irq_set_irqchip_state(unsigned int irq, enum irqchip_irq_state which,
return err;
 }
 EXPORT_SYMBOL_GPL(irq_set_irqchip_state);
+
+struct ipi_mapping *irq_alloc_ipi_mapping(unsigned int nr_cpus)
+{
+   struct ipi_mapping *map;
+   int i;
+
+   map = kzalloc(sizeof(struct ipi_mapping), GFP_KERNEL);
+   if (!map)
+   return NULL;
+
+   map->nr_cpus = nr_cpus;
+
+   map->cpumap = kmalloc(sizeof(irq_hw_number_t) * nr_cpus, GFP_KERNEL);
+   if (!map->cpumap) {
+   kfree(map);
+   return NULL;
+   }
+
+   for (i = 0; i < nr_cpus; i++)
+   map->cpumap[i] = INVALID_HWIRQ;
+
+   return map;
+}
+
+void irq_free_ipi_mapping(struct ipi_mapping *map)
+{
+   kfree(map->cpumap);
+   kfree(map);
+}
+
+int irq_map_ipi(struct ipi_mapping *map,
+   unsigned int cpu, irq_hw_number_t hwirq)
+{
+   if (cpu >= map->nr_cpus)
+   return -EINVAL;
+
+   map->cpumap[cpu] = hwirq;
+   map->nr_hwirqs++;
+
+   return 0;
+}
+
+int irq_unmap_ipi(struct ipi_mapping *map,
+ unsigned int cpu, irq_hw_number_t *hwirq)
+{
+   if (cpu >= map->nr_cpus)
+   return -EINVAL;
+
+   if (map->cpumap[cpu] == INVALID_HWIRQ)
+   return -EINVAL;
+
+   if (hwirq)
+   *hwirq = map->cpumap[cpu];
+
+   map->cpumap[cpu] = INVALID_HWIRQ;
+   map->nr_hwirqs--;
+
+   return 0;
+}
-- 
2.1.0



[RFC v2 PATCH 10/14] MIPS: make smp CMP, CPS and MT use the new generic IPI functions

2015-10-13 Thread Qais Yousef
Only the SMP variants that use GIC were converted, as it's the only irqchip
that will have support for generic IPIs for now; that support is introduced
in the following patches.

Signed-off-by: Qais Yousef 
---
 arch/mips/include/asm/smp-ops.h | 5 +++--
 arch/mips/kernel/smp-cmp.c  | 4 ++--
 arch/mips/kernel/smp-cps.c  | 4 ++--
 arch/mips/kernel/smp-mt.c   | 2 +-
 4 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/mips/include/asm/smp-ops.h b/arch/mips/include/asm/smp-ops.h
index 6ba1fb8b11e2..5e83aec0a14c 100644
--- a/arch/mips/include/asm/smp-ops.h
+++ b/arch/mips/include/asm/smp-ops.h
@@ -44,8 +44,9 @@ static inline void plat_smp_setup(void)
mp_ops->smp_setup();
 }
 
-extern void gic_send_ipi_single(int cpu, unsigned int action);
-extern void gic_send_ipi_mask(const struct cpumask *mask, unsigned int action);
+extern void generic_smp_send_ipi_single(int cpu, unsigned int action);
+extern void generic_smp_send_ipi_mask(const struct cpumask *mask,
+ unsigned int action);
 
 #else /* !CONFIG_SMP */
 
diff --git a/arch/mips/kernel/smp-cmp.c b/arch/mips/kernel/smp-cmp.c
index d5e0f949dc48..a70080500e04 100644
--- a/arch/mips/kernel/smp-cmp.c
+++ b/arch/mips/kernel/smp-cmp.c
@@ -149,8 +149,8 @@ void __init cmp_prepare_cpus(unsigned int max_cpus)
 }
 
 struct plat_smp_ops cmp_smp_ops = {
-   .send_ipi_single= gic_send_ipi_single,
-   .send_ipi_mask  = gic_send_ipi_mask,
+   .send_ipi_single= generic_smp_send_ipi_single,
+   .send_ipi_mask  = generic_smp_send_ipi_mask,
.init_secondary = cmp_init_secondary,
.smp_finish = cmp_smp_finish,
.boot_secondary = cmp_boot_secondary,
diff --git a/arch/mips/kernel/smp-cps.c b/arch/mips/kernel/smp-cps.c
index c88937745b4e..dd6834564ecb 100644
--- a/arch/mips/kernel/smp-cps.c
+++ b/arch/mips/kernel/smp-cps.c
@@ -444,8 +444,8 @@ static struct plat_smp_ops cps_smp_ops = {
.boot_secondary = cps_boot_secondary,
.init_secondary = cps_init_secondary,
.smp_finish = cps_smp_finish,
-   .send_ipi_single= gic_send_ipi_single,
-   .send_ipi_mask  = gic_send_ipi_mask,
+   .send_ipi_single= generic_smp_send_ipi_single,
+   .send_ipi_mask  = generic_smp_send_ipi_mask,
 #ifdef CONFIG_HOTPLUG_CPU
.cpu_disable= cps_cpu_disable,
.cpu_die= cps_cpu_die,
diff --git a/arch/mips/kernel/smp-mt.c b/arch/mips/kernel/smp-mt.c
index 86311a164ef1..33793abe5a20 100644
--- a/arch/mips/kernel/smp-mt.c
+++ b/arch/mips/kernel/smp-mt.c
@@ -121,7 +121,7 @@ static void vsmp_send_ipi_single(int cpu, unsigned int action)
 
 #ifdef CONFIG_MIPS_GIC
if (gic_present) {
-   gic_send_ipi_single(cpu, action);
+   generic_smp_send_ipi_single(cpu, action);
return;
}
 #endif
-- 
2.1.0



[RFC v2 PATCH 14/14] irqchip: mips-gic: remove IPI init code

2015-10-13 Thread Qais Yousef
It's now all handled in generic arch code.

Signed-off-by: Qais Yousef 
---
 drivers/irqchip/irq-mips-gic.c   | 79 
 include/linux/irqchip/mips-gic.h |  3 --
 2 files changed, 82 deletions(-)

diff --git a/drivers/irqchip/irq-mips-gic.c b/drivers/irqchip/irq-mips-gic.c
index 764470375cd0..8a56965d5980 100644
--- a/drivers/irqchip/irq-mips-gic.c
+++ b/drivers/irqchip/irq-mips-gic.c
@@ -585,83 +585,6 @@ static void gic_irq_dispatch(struct irq_desc *desc)
gic_handle_shared_int(true);
 }
 
-#ifdef CONFIG_MIPS_GIC_IPI
-static int gic_resched_int_base;
-static int gic_call_int_base;
-
-unsigned int plat_ipi_resched_int_xlate(unsigned int cpu)
-{
-   return gic_resched_int_base + cpu;
-}
-
-unsigned int plat_ipi_call_int_xlate(unsigned int cpu)
-{
-   return gic_call_int_base + cpu;
-}
-
-static irqreturn_t ipi_resched_interrupt(int irq, void *dev_id)
-{
-   scheduler_ipi();
-
-   return IRQ_HANDLED;
-}
-
-static irqreturn_t ipi_call_interrupt(int irq, void *dev_id)
-{
-   generic_smp_call_function_interrupt();
-
-   return IRQ_HANDLED;
-}
-
-static struct irqaction irq_resched = {
-   .handler= ipi_resched_interrupt,
-   .flags  = IRQF_PERCPU,
-   .name   = "IPI resched"
-};
-
-static struct irqaction irq_call = {
-   .handler= ipi_call_interrupt,
-   .flags  = IRQF_PERCPU,
-   .name   = "IPI call"
-};
-
-static __init void gic_ipi_init_one(unsigned int intr, int cpu,
-   struct irqaction *action)
-{
-   int virq = irq_create_mapping(gic_irq_domain,
- GIC_SHARED_TO_HWIRQ(intr));
-   int i;
-
-   gic_map_to_vpe(intr, mips_cm_vp_id(cpu));
-   for (i = 0; i < NR_CPUS; i++)
-   clear_bit(intr, pcpu_masks[i].pcpu_mask);
-   set_bit(intr, pcpu_masks[cpu].pcpu_mask);
-
-   irq_set_irq_type(virq, IRQ_TYPE_EDGE_RISING);
-
-   irq_set_handler(virq, handle_percpu_irq);
-   setup_irq(virq, action);
-}
-
-static __init void gic_ipi_init(void)
-{
-   int i;
-
-   /* Use last 2 * NR_CPUS interrupts as IPIs */
-   gic_resched_int_base = gic_shared_intrs - nr_cpu_ids;
-   gic_call_int_base = gic_resched_int_base - nr_cpu_ids;
-
-   for (i = 0; i < nr_cpu_ids; i++) {
-   gic_ipi_init_one(gic_call_int_base + i, i, &irq_call);
-   gic_ipi_init_one(gic_resched_int_base + i, i, &irq_resched);
-   }
-}
-#else
-static inline void gic_ipi_init(void)
-{
-}
-#endif
-
 static void __init gic_basic_init(void)
 {
unsigned int i;
@@ -979,8 +902,6 @@ static void __init __gic_init(unsigned long gic_base_addr,
bitmap_set(ipi_resrv, gic_shared_intrs - 2 * NR_CPUS, 2 * NR_CPUS);
 
gic_basic_init();
-
-   gic_ipi_init();
 }
 
 void __init gic_init(unsigned long gic_base_addr,
diff --git a/include/linux/irqchip/mips-gic.h b/include/linux/irqchip/mips-gic.h
index 4e6861605050..321278767506 100644
--- a/include/linux/irqchip/mips-gic.h
+++ b/include/linux/irqchip/mips-gic.h
@@ -258,9 +258,6 @@ extern void gic_write_compare(cycle_t cnt);
 extern void gic_write_cpu_compare(cycle_t cnt, int cpu);
 extern void gic_start_count(void);
 extern void gic_stop_count(void);
-extern void gic_send_ipi(unsigned int intr);
-extern unsigned int plat_ipi_call_int_xlate(unsigned int);
-extern unsigned int plat_ipi_resched_int_xlate(unsigned int);
 extern int gic_get_c0_compare_int(void);
 extern int gic_get_c0_perfcount_int(void);
 extern int gic_get_c0_fdc_int(void);
-- 
2.1.0



[RFC v2 PATCH 08/14] irq: implement irq_send_ipi

2015-10-13 Thread Qais Yousef
There are 2 variants. __irq_desc_send_ipi() is meant to be used by arch code to
save the desc lookup when doing SMP IPIs.

irq_send_ipi() is meant for drivers that want to send IPIs to coprocessors they
interact with.

Signed-off-by: Qais Yousef 
---
 include/linux/irq.h |  3 +++
 kernel/irq/manage.c | 44 
 2 files changed, 47 insertions(+)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index c3d0f26c3eff..32c740ac95b4 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -985,4 +985,7 @@ int irq_map_ipi(struct ipi_mapping *map,
 int irq_unmap_ipi(struct ipi_mapping *map,
  unsigned int cpu, irq_hw_number_t *hwirq);
 
+int __irq_desc_send_ipi(struct irq_desc *desc, const struct ipi_mask *dest);
+int irq_send_ipi(unsigned int virq, const struct ipi_mask *dest);
+
 #endif /* _LINUX_IRQ_H */
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 9a9bc0822c8f..f2425116a243 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1983,3 +1983,47 @@ int irq_unmap_ipi(struct ipi_mapping *map,
 
return 0;
 }
+
+int __irq_desc_send_ipi(struct irq_desc *desc, const struct ipi_mask *dest)
+{
+   struct irq_data *data = irq_desc_get_irq_data(desc);
+   struct irq_chip *chip = irq_data_get_irq_chip(data);
+
+   if (!chip || !chip->irq_send_ipi)
+   return -EINVAL;
+
+   /*
+* Do not validate the mask for IPIs marked global. These are
+* regular IPIs so we can avoid the operation as their target
+* mask is the cpu_possible_mask.
+*/
+   if (!dest->global) {
+   if (!bitmap_subset(dest->cpumask, data->ipi_mask.cpumask,
+  dest->nbits))
+   return -EINVAL;
+   }
+
+   chip->irq_send_ipi(data, dest);
+   return 0;
+}
+
+/**
+ * irq_send_ipi() - send an IPI to target CPU(s)
+ * @irq: linux irq number from irq_reserve_ipi()
+ * @dest: dest CPU(s), must be the same or a subset of the mask passed to
+ *   irq_reserve_ipi()
+ *
+ * Sends an IPI to all cpus in dest mask.
+ *
+ * Returns 0 on success and an errno otherwise.
+ */
+int irq_send_ipi(unsigned int virq, const struct ipi_mask *dest)
+{
+   struct irq_desc *desc = irq_to_desc(virq);
+
+   if (!desc)
+   return -EINVAL;
+
+   return __irq_desc_send_ipi(desc, dest);
+}
+EXPORT_SYMBOL(irq_send_ipi);
-- 
2.1.0



[RFC v2 PATCH 13/14] irqchip: mips-gic: implement the new irq_send_ipi

2015-10-13 Thread Qais Yousef
Now that irq_chip has an irq_send_ipi() function, implement gic_send_ipi()
to make use of it.

Signed-off-by: Qais Yousef 
---
 drivers/irqchip/irq-mips-gic.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/irqchip/irq-mips-gic.c b/drivers/irqchip/irq-mips-gic.c
index 9c8ece1d90f2..764470375cd0 100644
--- a/drivers/irqchip/irq-mips-gic.c
+++ b/drivers/irqchip/irq-mips-gic.c
@@ -266,9 +266,24 @@ static void gic_bind_eic_interrupt(int irq, int set)
  GIC_VPE_EIC_SS(irq), set);
 }
 
-void gic_send_ipi(unsigned int intr)
+static void gic_send_ipi(struct irq_data *d, const struct ipi_mask *cpumask)
 {
-   gic_write(GIC_REG(SHARED, GIC_SH_WEDGE), GIC_SH_WEDGE_SET(intr));
+   struct ipi_mapping *map = irq_data_get_irq_chip_data(d);
+   irq_hw_number_t intr;
+   int vpe;
+
+   if (!map)
+   return;
+
+   vpe = find_first_bit(cpumask->cpumask, cpumask->nbits);
+   while (vpe != cpumask->nbits) {
+   intr = map->cpumap[vpe];
+   if (intr != INVALID_HWIRQ)
+   gic_write(GIC_REG(SHARED, GIC_SH_WEDGE), GIC_SH_WEDGE_SET(intr));
+
+   vpe = find_next_bit(cpumask->cpumask, cpumask->nbits, vpe + 1);
+   }
 }
 
 int gic_get_c0_compare_int(void)
@@ -476,6 +491,7 @@ static struct irq_chip gic_edge_irq_controller = {
 #ifdef CONFIG_SMP
.irq_set_affinity   =   gic_set_affinity,
 #endif
+   .irq_send_ipi   =   gic_send_ipi,
 };
 
 static void gic_handle_local_int(bool chained)
-- 
2.1.0



[RFC v2 PATCH 12/14] irqchip: mips-gic: add a IPI hierarchy domain

2015-10-13 Thread Qais Yousef
Add a new ipi domain on top of the normal domain.

We set it as the default domain to allow arch code to use it without
looking up the DT.

Signed-off-by: Qais Yousef 
---
 drivers/irqchip/Kconfig|   2 +
 drivers/irqchip/irq-mips-gic.c | 126 ++---
 2 files changed, 121 insertions(+), 7 deletions(-)

diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
index 27b52c8729cd..d9a9c8c8cbe1 100644
--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -167,6 +167,8 @@ config KEYSTONE_IRQ
 
 config MIPS_GIC
bool
+   select GENERIC_IRQ_IPI
+   select IRQ_DOMAIN_HIERARCHY
select MIPS_CM
 
 config INGENIC_IRQ
diff --git a/drivers/irqchip/irq-mips-gic.c b/drivers/irqchip/irq-mips-gic.c
index aeaa061f0dbf..9c8ece1d90f2 100644
--- a/drivers/irqchip/irq-mips-gic.c
+++ b/drivers/irqchip/irq-mips-gic.c
@@ -33,11 +33,14 @@ static void __iomem *gic_base;
 static struct gic_pcpu_mask pcpu_masks[NR_CPUS];
 static DEFINE_SPINLOCK(gic_lock);
 static struct irq_domain *gic_irq_domain;
+static struct irq_domain *gic_ipi_domain;
 static int gic_shared_intrs;
 static int gic_vpes;
 static unsigned int gic_cpu_pin;
 static unsigned int timer_cpu_pin;
 static struct irq_chip gic_level_irq_controller, gic_edge_irq_controller;
+DECLARE_BITMAP(ipi_intrs, GIC_MAX_INTRS);
+DECLARE_BITMAP(ipi_resrv, GIC_MAX_INTRS);
 
 static void __gic_irq_dispatch(void);
 
@@ -335,8 +338,14 @@ static void gic_handle_shared_int(bool chained)
 
intr = find_first_bit(pending, gic_shared_intrs);
while (intr != gic_shared_intrs) {
-   virq = irq_linear_revmap(gic_irq_domain,
-GIC_SHARED_TO_HWIRQ(intr));
+   if (test_bit(intr, ipi_intrs)) {
+   virq = irq_linear_revmap(gic_ipi_domain,
+   GIC_SHARED_TO_HWIRQ(intr));
+   } else {
+   virq = irq_linear_revmap(gic_irq_domain,
+   GIC_SHARED_TO_HWIRQ(intr));
+   }
+
if (chained)
generic_handle_irq(virq);
else
@@ -741,7 +750,7 @@ static int gic_local_irq_domain_map(struct irq_domain *d, unsigned int virq,
 }
 
 static int gic_shared_irq_domain_map(struct irq_domain *d, unsigned int virq,
-irq_hw_number_t hw)
+irq_hw_number_t hw, unsigned int vpe)
 {
int intr = GIC_HWIRQ_TO_SHARED(hw);
unsigned long flags;
@@ -751,9 +760,8 @@ static int gic_shared_irq_domain_map(struct irq_domain *d, unsigned int virq,
 
spin_lock_irqsave(&gic_lock, flags);
gic_map_to_pin(intr, gic_cpu_pin);
-   /* Map to VPE 0 by default */
-   gic_map_to_vpe(intr, 0);
-   set_bit(intr, pcpu_masks[0].pcpu_mask);
+   gic_map_to_vpe(intr, vpe);
+   set_bit(intr, pcpu_masks[vpe].pcpu_mask);
spin_unlock_irqrestore(&gic_lock, flags);
 
return 0;
@@ -764,7 +772,7 @@ static int gic_irq_domain_map(struct irq_domain *d, unsigned int virq,
 {
if (GIC_HWIRQ_TO_LOCAL(hw) < GIC_NUM_LOCAL_INTRS)
return gic_local_irq_domain_map(d, virq, hw);
-   return gic_shared_irq_domain_map(d, virq, hw);
+   return gic_shared_irq_domain_map(d, virq, hw, 0);
 }
 
static int gic_irq_domain_xlate(struct irq_domain *d, struct device_node *ctrlr,
@@ -791,6 +799,98 @@ static const struct irq_domain_ops gic_irq_domain_ops = {
.xlate = gic_irq_domain_xlate,
 };
 
+static int gic_ipi_domain_xlate(struct irq_domain *d, struct device_node *ctrlr,
+   const u32 *intspec, unsigned int intsize,
+   irq_hw_number_t *out_hwirq,
+   unsigned int *out_type)
+{
+   /*
+* There's nothing to translate here. hwirq is dynamically allocated
+* and the irq type is always edge triggered.
+*/
+   *out_hwirq = 0;
+   *out_type = IRQ_TYPE_EDGE_RISING;
+
+   return 0;
+}
+
+static int gic_ipi_domain_alloc(struct irq_domain *d, unsigned int virq,
+   unsigned int nr_irqs, void *arg)
+{
+   struct ipi_mask *cpumask = arg;
+   int cpu, ret;
+   irq_hw_number_t hwirq;
+   struct ipi_mapping *map;
+
+   map = irq_alloc_ipi_mapping(gic_vpes);
+   if (!map)
+   return -ENOMEM;
+
+   if (cpumask->nbits > map->nr_cpus) {
+   irq_free_ipi_mapping(map);
+   return -EINVAL;
+   }
+
+   cpu = find_first_bit(cpumask->cpumask, cpumask->nbits);
+   while (cpu != cpumask->nbits) {
+   hwirq = find_first_bit(ipi_resrv, gic_shared_intrs);
+   if (hwirq == gic_shared_intrs) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   bitmap_clear(ipi_resrv, hwirq, 1);
+   bitmap_set(ipi_intrs, hwirq, 1);
+

[RFC v2 PATCH 09/14] MIPS: add support for generic SMP IPI support

2015-10-13 Thread Qais Yousef
Use the new generic IPI layer to provide generic SMP IPI support if the irqchip
supports it.

Signed-off-by: Qais Yousef 
---
 arch/mips/kernel/smp.c | 117 +
 1 file changed, 117 insertions(+)

diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
index bd4385a8e6e8..6cbadfd17439 100644
--- a/arch/mips/kernel/smp.c
+++ b/arch/mips/kernel/smp.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -79,6 +80,11 @@ static cpumask_t cpu_core_setup_map;
 
 cpumask_t cpu_coherent_mask;
 
+#ifdef CONFIG_GENERIC_IRQ_IPI
+static struct irq_desc *call_desc;
+static struct irq_desc *sched_desc;
+#endif
+
 static inline void set_cpu_sibling_map(int cpu)
 {
int i;
@@ -145,6 +151,117 @@ void register_smp_ops(struct plat_smp_ops *ops)
mp_ops = ops;
 }
 
+#ifdef CONFIG_GENERIC_IRQ_IPI
+void generic_smp_send_ipi_single(int cpu, unsigned int action)
+{
+   generic_smp_send_ipi_mask(cpumask_of(cpu), action);
+}
+
+void generic_smp_send_ipi_mask(const struct cpumask *mask, unsigned int action)
+{
+   unsigned long flags;
+   unsigned int core;
+   int cpu;
+   struct ipi_mask ipi_mask;
+
+   ipi_mask.cpumask = ((struct cpumask *)mask)->bits;
+   ipi_mask.nbits = NR_CPUS;
+   ipi_mask.global = true;
+
+   local_irq_save(flags);
+
+   switch (action) {
+   case SMP_CALL_FUNCTION:
+   __irq_desc_send_ipi(call_desc, &ipi_mask);
+   break;
+
+   case SMP_RESCHEDULE_YOURSELF:
+   __irq_desc_send_ipi(sched_desc, &ipi_mask);
+   break;
+
+   default:
+   BUG();
+   }
+
+   if (mips_cpc_present()) {
+   for_each_cpu(cpu, mask) {
+   core = cpu_data[cpu].core;
+   if (core == current_cpu_data.core)
+   continue;
+   while (!cpumask_test_cpu(cpu, &cpu_coherent_mask)) {
+   mips_cpc_lock_other(core);
+   write_cpc_co_cmd(CPC_Cx_CMD_PWRUP);
+   mips_cpc_unlock_other();
+   }
+   }
+   }
+
+   local_irq_restore(flags);
+}
+
+
+static irqreturn_t ipi_resched_interrupt(int irq, void *dev_id)
+{
+   scheduler_ipi();
+
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t ipi_call_interrupt(int irq, void *dev_id)
+{
+   smp_call_function_interrupt();
+
+   return IRQ_HANDLED;
+}
+
+static struct irqaction irq_resched = {
+   .handler= ipi_resched_interrupt,
+   .flags  = IRQF_PERCPU,
+   .name   = "IPI resched"
+};
+
+static struct irqaction irq_call = {
+   .handler= ipi_call_interrupt,
+   .flags  = IRQF_PERCPU,
+   .name   = "IPI call"
+};
+
+static __init void smp_ipi_init_one(unsigned int virq,
+   struct irqaction *action)
+{
+   int ret;
+
+   irq_set_handler(virq, handle_percpu_irq);
+   ret = setup_irq(virq, action);
+   BUG_ON(ret);
+}
+
+static int __init generic_smp_ipi_init(void)
+{
+   unsigned int call_virq, sched_virq;
+   struct ipi_mask ipi_mask;
+   int cpu;
+
+   ipi_mask.cpumask = ((struct cpumask *)cpu_possible_mask)->bits;
+   ipi_mask.nbits = NR_CPUS;
+
+   call_virq = irq_reserve_ipi(NULL, &ipi_mask);
+   BUG_ON(!call_virq);
+
+   sched_virq = irq_reserve_ipi(NULL, &ipi_mask);
+   BUG_ON(!sched_virq);
+
+   for_each_cpu(cpu, cpu_possible_mask) {
+   smp_ipi_init_one(call_virq + cpu, &irq_call);
+   smp_ipi_init_one(sched_virq + cpu, &irq_resched);
+   }
+
+   call_desc = irq_to_desc(call_virq);
+   sched_desc = irq_to_desc(sched_virq);
+
+   return 0;
+}
+early_initcall(generic_smp_ipi_init);
+#endif
+
 /*
  * First C code run on the secondary CPUs after being started up by
  * the master.
-- 
2.1.0



[RFC v2 PATCH 07/14] irq: add a new generic IPI reservation code to irq core

2015-10-13 Thread Qais Yousef
Add a generic mechanism to dynamically allocate an IPI.

With this change the user can call irq_reserve_ipi() to dynamically allocate
an IPI and use the associated virq to send one to 1 or more cpus.

Signed-off-by: Qais Yousef 
---
 include/linux/irqdomain.h |  6 
 kernel/irq/irqdomain.c| 84 +++
 2 files changed, 90 insertions(+)

diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index 9b3dc6c2a3cc..f5003f5fd530 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -41,6 +41,7 @@ struct irq_domain;
 struct of_device_id;
 struct irq_chip;
 struct irq_data;
+struct ipi_mask;
 
 /* Number of irqs reserved for a legacy isa controller */
 #define NUM_ISA_INTERRUPTS 16
@@ -280,6 +281,11 @@ int irq_domain_xlate_onetwocell(struct irq_domain *d, struct device_node *ctrlr,
const u32 *intspec, unsigned int intsize,
irq_hw_number_t *out_hwirq, unsigned int *out_type);
 
+/* IPI functions */
+unsigned int irq_reserve_ipi(struct irq_domain *domain,
+const struct ipi_mask *dest);
+void irq_destroy_ipi(unsigned int irq);
+
 /* V2 interfaces to support hierarchy IRQ domains. */
 extern struct irq_data *irq_domain_get_irq_data(struct irq_domain *domain,
unsigned int virq);
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index dc9d27c0c158..781407f7d692 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -749,6 +749,90 @@ static int irq_domain_alloc_descs(int virq, unsigned int cnt,
return virq;
 }
 
+/**
+ * irq_reserve_ipi() - setup an IPI to destination cpumask
+ * @domain: IPI domain
+ * @dest: cpumask of cpus to receive the IPI
+ *
+ * Allocate a virq that can be used to send IPI to any CPU in dest mask.
+ *
+ * On success it returns a linux irq number; on failure it returns 0.
+ */
+unsigned int irq_reserve_ipi(struct irq_domain *domain,
+const struct ipi_mask *dest)
+{
+   struct irq_data *data;
+   int virq;
+   unsigned int nr_irqs;
+
+   if (domain == NULL)
+   domain = irq_default_domain; /* need a separate ipi_default_domain? */
+
+   if (domain == NULL) {
+   pr_warn("Must provide a valid IPI domain!\n");
+   return 0;
+   }
+
+   if (!irq_domain_is_ipi(domain)) {
+   pr_warn("Not an IPI domain!\n");
+   return 0;
+   }
+
+   /* always allocate a virq per cpu */
+   nr_irqs = bitmap_weight(dest->cpumask, dest->nbits);
+
+   virq = irq_domain_alloc_descs(-1, nr_irqs, 0, NUMA_NO_NODE);
+   if (virq <= 0) {
+   pr_warn("Can't reserve IPI, failed to alloc descs\n");
+   return 0;
+   }
+
+   /* we are reusing the hierarchy alloc function, should we create another one? */
+   virq = __irq_domain_alloc_irqs(domain, virq, nr_irqs, NUMA_NO_NODE,
+   (void *) dest, true);
+   if (virq <= 0) {
+   pr_warn("Can't reserve IPI, failed to alloc irqs\n");
+   goto free_descs;
+   }
+
+   data = irq_get_irq_data(virq);
+   bitmap_copy(data->ipi_mask.cpumask, dest->cpumask, dest->nbits);
+   data->ipi_mask.nbits = dest->nbits;
+
+   return virq;
+
+free_descs:
+   irq_free_descs(virq, nr_irqs);
+   return 0;
+}
+
+/**
+ * irq_destroy_ipi() - unreserve an IPI that was previously allocated
+ * @irq: linux irq number to be destroyed
+ *
+ * Return an IPI allocated with irq_reserve_ipi() to the system.
+ */
+void irq_destroy_ipi(unsigned int irq)
+{
+   struct irq_data *data = irq_get_irq_data(irq);
+   struct irq_domain *domain;
+
+   if (!irq || !data)
+   return;
+
+   domain = data->domain;
+   if (WARN_ON(domain == NULL))
+   return;
+
+   if (!irq_domain_is_ipi(domain)) {
+   pr_warn("Not an IPI domain!\n");
+   return;
+   }
+
+   irq_domain_free_irqs(irq,
+   bitmap_weight(data->ipi_mask.cpumask, data->ipi_mask.nbits));
+}
+
 #ifdef CONFIG_IRQ_DOMAIN_HIERARCHY
 /**
  * irq_domain_add_hierarchy - Add a irqdomain into the hierarchy
-- 
2.1.0



[RFC v2 PATCH 03/14] irq: add new struct ipi_mask

2015-10-13 Thread Qais Yousef
cpumask is limited to NR_CPUS. Introduce ipi_mask, which allows us to address
a cpu range higher than NR_CPUS; this is required for drivers to send IPIs to
coprocessors that are outside the Linux CPU range.

Signed-off-by: Qais Yousef 
---
 include/linux/irq.h | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index 11bf09288ddb..4b537e4d393b 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -125,6 +125,21 @@ enum {
 struct msi_desc;
 struct irq_domain;
 
+ /**
+ * struct ipi_mask - IPI mask information
+ * @cpumask: bitmap of cpus
+ * @nbits: number of bits in cpumask
+ * @global: whether the mask is an SMP IPI, i.e. a subset of cpu_possible_mask
+ *
+ * ipi_mask is similar to cpumask, but it provides nbits that's configurable
+ * rather than fixed to NR_CPUS.
+ */
+struct ipi_mask {
+   unsigned long   *cpumask;
+   unsigned intnbits;
+   boolglobal;
+};
+
 /**
  * struct irq_common_data - per irq data shared by all irqchips
  * @state_use_accessors: status information for irq chip functions.
-- 
2.1.0



[RFC v2 PATCH 05/14] irq: add struct ipi_mask to irq_data

2015-10-13 Thread Qais Yousef
It has a similar role to affinity mask, but tracks the IPI affinity instead.

Signed-off-by: Qais Yousef 
---
 include/linux/irq.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index 504133671985..b000b217ea24 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -157,6 +157,7 @@ struct irq_common_data {
void *handler_data;
struct msi_desc *msi_desc;
cpumask_var_t   affinity;
+   struct ipi_mask ipi_mask;
 };
 
 /**
-- 
2.1.0



Re: [RFC v2 PATCH 03/14] irq: add new struct ipi_mask

2015-10-13 Thread Qais Yousef

On 10/13/2015 02:26 PM, Thomas Gleixner wrote:

On Tue, 13 Oct 2015, Qais Yousef wrote:

cpumask is limited to NR_CPUS. introduce ipi_mask which allows us to address
cpu range that is higher than NR_CPUS which is required for drivers to send
IPIs for coprocessor that are outside Linux CPU range.

Signed-off-by: Qais Yousef 
---
  include/linux/irq.h | 15 +++
  1 file changed, 15 insertions(+)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index 11bf09288ddb..4b537e4d393b 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -125,6 +125,21 @@ enum {
  struct msi_desc;
  struct irq_domain;
  
+ /**

+ * struct ipi_mask - IPI mask information
+ * @cpumask: bitmap of cpumasks
+ * @nbits: number of bits in cpumask
+ * @global: whether the mask is SMP IPI ie: subset of cpu_possible_mask or not
+ *
+ * ipi_mask is similar to cpumask, but it provides nbits that's configurable
+ * rather than fixed to NR_CPUS.
+ */
+struct ipi_mask {
+   unsigned long   *cpumask;
+   unsigned intnbits;
+   boolglobal;
+};

Can you make that:

struct ipi_mask {
unsigned intnbits;
boolglobal;
unsigned long   cpu_bitmap[];
};

That allows you to allocate the data structure in one go. So the
ipi_mask in irq_data_common becomes a pointer which is only filled in
when ipi_mask is actually used.

Note, I renamed cpumask to cpu_bitmap to avoid confusion with
cpumasks.

We also want a helper function

struct cpumask *irq_data_get_ipi_mask(struct irq_data *data);

so we can use normal cpumask operations for the majority of cases.




Will do.
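
For the record, with that layout the allocation side collapses to 
something like this (untested sketch; assuming the ipi_mask in 
irq_common_data becomes a pointer as you suggest):

	struct ipi_mask *mask;

	mask = kzalloc(sizeof(*mask) +
		       BITS_TO_LONGS(nbits) * sizeof(unsigned long),
		       GFP_KERNEL);
	if (mask)
		mask->nbits = nbits;

and the helper is then just:

	struct cpumask *irq_data_get_ipi_mask(struct irq_data *data)
	{
		return to_cpumask(data->common->ipi_mask->cpu_bitmap);
	}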

Thanks,
Qais


Re: [RFC v2 PATCH 06/14] irq: add struct ipi_mapping and its helper functions

2015-10-13 Thread Qais Yousef

On 10/13/2015 02:31 PM, Thomas Gleixner wrote:

On Tue, 13 Oct 2015, Qais Yousef wrote:

+
+int irq_unmap_ipi(struct ipi_mapping *map,
+ unsigned int cpu, irq_hw_number_t *hwirq)
+{
+   if (cpu >= map->nr_cpus)
+   return -EINVAL;
+
+   if (map->cpumap[cpu] == INVALID_HWIRQ)
+   return -EINVAL;
+
+   if (hwirq)
+   *hwirq = map->cpumap[cpu];
Why do we store hwirq in unmap?


So that the irqchip driver can return the hwirq to its available IPI pool.
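
i.e. something like this on the irqchip side (untested sketch; 
gic_ipi_pool_put() is a made-up name for whatever returns the hwirq to 
the chip's pool):

	irq_hw_number_t hwirq;

	if (!irq_unmap_ipi(map, cpu, &hwirq))
		gic_ipi_pool_put(hwirq);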



All these new functions lack kerneldoc comments.


Apologies about the missing docs here and in other places.

Thanks,
Qais


Re: [RFC v2 PATCH 07/14] irq: add a new generic IPI reservation code to irq core

2015-10-13 Thread Qais Yousef

On 10/13/2015 02:37 PM, Thomas Gleixner wrote:

On Tue, 13 Oct 2015, Qais Yousef wrote:


+   if (domain == NULL)
+   domain = irq_default_domain; /* need a separate ipi_default_domain? */

No tail comments please.

We should neither use irq_default_domain nor have an
ipi_default_domain.


OK though I understood that you were OK with using the irq_default_domain.

This means that arch code must parse the DT for an IPI domain. I think 
I've seen arch code using the root FDT to search for a specific node. 
I'll try to do something similar to search for an IPI domain.
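
Something along these lines is what I have in mind (untested sketch, 
relying on the new IPI domain flag from this series):

	struct device_node *np;
	struct irq_domain *ipi_domain = NULL;

	for_each_node_with_property(np, "interrupt-controller") {
		struct irq_domain *d = irq_find_host(np);

		if (d && irq_domain_is_ipi(d)) {
			ipi_domain = d;
			break;
		}
	}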



+
+   virq = irq_domain_alloc_descs(-1, nr_irqs, 0, NUMA_NO_NODE);
+   if (virq <= 0) {
+   pr_warn("Can't reserve IPI, failed to alloc descs\n");
+   return 0;
+   }
+
+   /* we are reusing hierarchy alloc function, should we create another one? */
+   virq = __irq_domain_alloc_irqs(domain, virq, nr_irqs, NUMA_NO_NODE,
+   (void *) dest, true);
+   if (virq <= 0) {
+   pr_warn("Can't reserve IPI, failed to alloc irqs\n");
+   goto free_descs;
+   }
+
+   data = irq_get_irq_data(virq);
+   bitmap_copy(data->ipi_mask.cpumask, dest->cpumask, dest->nbits);
+   data->ipi_mask.nbits = dest->nbits;

This does only initialize the first virq data. What about the others?


Right I missed that. I'll fix it.
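
It should just be a loop over all of them (sketch based on the code 
above):

	int i;

	for (i = 0; i < nr_irqs; i++) {
		data = irq_get_irq_data(virq + i);
		bitmap_copy(data->ipi_mask.cpumask, dest->cpumask,
			    dest->nbits);
		data->ipi_mask.nbits = dest->nbits;
	}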





+   return virq;
+
+free_descs:
+   irq_free_descs(virq, nr_irqs);
+   return 0;
+}
+
+/**
+ * irq_destroy_ipi() - unreserve an IPI that was previously allocated
+ * @irq: linux irq number to be destroyed
+ *
+ * Return an IPI allocated with irq_reserve_ipi() to the system.

That wants to explain that it actually destroys a number of virqs not
just the primary one.




OK I'll expand on that.
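
Draft wording:

	/**
	 * irq_destroy_ipi() - unreserve an IPI that was previously allocated
	 * @irq: linux irq number to be destroyed
	 *
	 * Return the IPIs allocated with irq_reserve_ipi() to the system.
	 * Note that this destroys all the virqs associated with the
	 * reservation - one per CPU in the IPI mask - not just the one
	 * passed in.
	 */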

Thanks,
Qais


Re: [RFC v2 PATCH 08/14] irq: implement irq_send_ipi

2015-10-13 Thread Qais Yousef

On 10/13/2015 02:40 PM, Thomas Gleixner wrote:

On Tue, 13 Oct 2015, Qais Yousef wrote:

Lacks kerneldoc


+int __irq_desc_send_ipi(struct irq_desc *desc, const struct ipi_mask *dest)
+{
+   struct irq_data *data = irq_desc_get_irq_data(desc);
+   struct irq_chip *chip = irq_data_get_irq_chip(data);
+
+   if (!chip || !chip->irq_send_ipi)
+   return -EINVAL;
+
+   /*
+* Do not validate the mask for IPIs marked global. These are
+* regular IPIs so we can avoid the operation as their target
+* mask is the cpu_possible_mask.
+*/
+   if (!dest->global) {
+   if (!bitmap_subset(dest->cpumask, data->ipi_mask.cpumask,
+  dest->nbits))
+   return -EINVAL;
+   }

This looks half thought out. You rely on the caller getting the global
bit right. There should be a sanity check for this versus
data->ipi_mask and also you need to validate nbits.


Yes I might have rushed this part as I did it last. I'll improve it.
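
Roughly like this, I think (untested sketch):

	if (dest->nbits > data->ipi_mask.nbits)
		return -EINVAL;

	if (dest->global && !data->ipi_mask.global)
		return -EINVAL;

	if (!dest->global &&
	    !bitmap_subset(dest->cpumask, data->ipi_mask.cpumask,
			   dest->nbits))
		return -EINVAL;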




+EXPORT_SYMBOL(irq_send_ipi);

EXPORT_SYMBOL_GPL please




OK.

Thanks,
Qais


Re: [RFC v2 PATCH 00/14] Implement generic IPI support mechanism

2015-10-13 Thread Qais Yousef

On 10/13/2015 02:53 PM, Thomas Gleixner wrote:

On Tue, 13 Oct 2015, Qais Yousef wrote:


This series is based on Linus tree. I couldn't compile test it
because MIPS compilation was broken due to other reasons. I expect
some brokenness because of the introduction of struct irq_common_data
which is not present on the 4.1 tree I was testing my code on before
porting it to Linus tip. I will fix these issues and introduce
proper accessors for accessing struct ipi_mask given that the
concept is approved.

Please base it on 4.1-rc5 + irq/core.
  

   irq: add new IRQ_DOMAIN_FLAGS_IPI

The proper prefix for the core parts is 'genirq:'. Please start the
sentence after the prefix with an uppercase letter


   irq: add GENERIC_IRQ_IPI Kconfig symbol
   irq: add new struct ipi_mask
   irq: add a new irq_send_ipi() to irq_chip
   irq: add struct ipi_mask to irq_data
   irq: add struct ipi_mapping and its helper functions
   irq: add a new generic IPI reservation code to irq core
   irq: implement irq_send_ipi
   MIPS: add support for generic SMP IPI support
   MIPS: make smp CMP, CPS and MT use the new generic IPI functions
   MIPS: delete smp-gic.c
   irqchip: mips-gic: add a IPI hierarchy domain

Please make that

irqchip/mips-gic: Add 





Will do. Thanks a lot for the review and all the pointers. I need to 
revive the DT binding discussion now, in the proper list this time.


Thanks,
Qais


Re: [RFC v2 PATCH 09/14] MIPS: add support for generic SMP IPI support

2015-10-13 Thread Qais Yousef

On 10/13/2015 02:48 PM, Thomas Gleixner wrote:

On Tue, 13 Oct 2015, Qais Yousef wrote:
  
+#ifdef CONFIG_GENERIC_IRQ_IPI

+void generic_smp_send_ipi_single(int cpu, unsigned int action)

Please use a mips name space. This suggests that it is completely
generic, which is not true.


+{
+   generic_smp_send_ipi_mask(cpumask_of(cpu), action);
+}
+
+void generic_smp_send_ipi_mask(const struct cpumask *mask, unsigned int action)

Ditto.


OK.


+{
+   unsigned long flags;
+   unsigned int core;
+   int cpu;
+   struct ipi_mask ipi_mask;
+
+   ipi_mask.cpumask = ((struct cpumask *)mask)->bits;

We have accessors for that. Hmm, so for this case we must make the
ipi_mask different:

struct ipi_mask {
unsigned intnbits;
boolglobal;
union {
 struct cpumask *mask;
 unsigned long  cpu_bitmap[];
};
};



Right. This looks cleaner for sure. Not sure why I haven't thought about 
this.
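
The MIPS side then loses the bitfield poking (sketch, assuming the union 
layout above):

	struct ipi_mask ipi_mask = {
		.nbits	= nr_cpu_ids,
		.global	= true,
		.mask	= (struct cpumask *)mask,	/* cast only drops const */
	};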


Thanks,
Qais


Re: [RFC v2 PATCH 10/14] MIPS: make smp CMP, CPS and MT use the new generic IPI functions

2015-10-13 Thread Qais Yousef

On 10/13/2015 02:49 PM, Thomas Gleixner wrote:

On Tue, 13 Oct 2015, Qais Yousef wrote:


Only the SMP variants that use GIC were converted as it's the only irqchip that
will have the support for generic IPI for now which will be introduced in
the following patches.

You break bisectability. Introduce the new gic magic first and then
switch it over.




Yes I was trying to make the review easier by grouping things together. 
I might need to squash some of these last patches together to avoid 
breaking bisectability.


Thanks,
Qais


Generic DT binding for IPIs

2015-10-14 Thread Qais Yousef

Hi,

This is an attempt to revive a discussion on the right list this time 
with all the correct people hopefully on CC.


While trying to upstream a driver, Thomas and Marc Zyngier pointed out 
the need for generic IPI support in the kernel to allow drivers to 
reserve and send them. Hopefully my latest RFC patch will help to 
clarify what's being done.


https://lkml.org/lkml/2015/10/13/227

We need a generic DT binding support to accompany that to allow a driver 
to reserve an IPI using this new mechanism.


MarcZ had the following suggestion:

https://lkml.org/lkml/2015/8/24/628

Which in summary is

mydevice@f000 {
interrupt-source = <&intc INT_SPEC 2 &inttarg1 &inttarg2>;
};

inttarg1: mydevice@f100 {
interrupt-sink = <&intc HWAFFINITY1>;
};

inttarg2: cpu@1 {
interrupt-sink = <&intc HWAFFINITY2>;
};


interrupt-sink requests to reserve an IPI that it will receive at 
HWAFFINITY cpumask. interrupt-source will not do any reservation. It 
will simply connect an IPI reserved by interrupt-sink to the device that 
will be responsible for generating that IPI. This description should 
allow connecting any 2 devices.

Correct me Marc if I got it wrong please.

I suggested a simplification by assuming that IPIs will only be between 
the host OS and a coprocessor, which gives us this form that I think is 
easier to deal with


coprocessor {
 interrupt-source = <&intc INT_SPEC COP_HWAFFINITY>;
 interrupt-sink = <&intc INT_SPEC CPU_HWAFFINITY>;
}


interrupt-source here reserves an IPI to be sent from host OS to 
coprocessor at COP_HWAFFINITY. interrupt-sink will reserve an IPI to be 
received by host OS at CPU_HWAFFINITY. Less generic but I don't know how 
important it is for the host OS to set up IPIs between 2 external 
coprocessors and whether it should really be doing that.
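
For concreteness, the driver side would look roughly like this with my 
RFC (a sketch only - the parsing glue for interrupt-source/interrupt-sink 
doesn't exist yet, and coproc_isr()/the masks are made up):

	unsigned int tx_virq, rx_virq;

	/* interrupt-source: IPI the host sends to the coprocessor */
	tx_virq = irq_reserve_ipi(ipi_domain, &cop_mask);

	/* interrupt-sink: IPI the host receives from the coprocessor */
	rx_virq = irq_reserve_ipi(ipi_domain, &cpu_mask);
	ret = request_irq(rx_virq, coproc_isr, 0, "coproc", coproc);

	/* kick the coprocessor */
	irq_send_ipi(tx_virq, &cop_mask);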


What do the DT experts think? Any preference or a better suggestion?

I tried to keep this short and simple, please let me know if you need 
more info or if there's anything that needs more clarification.


Thanks,
Qais


Re: [RFC v2 PATCH 05/14] irq: add struct ipi_mask to irq_data

2015-10-14 Thread Qais Yousef

On 10/14/2015 03:50 PM, Davidlohr Bueso wrote:

On Tue, 13 Oct 2015, Qais Yousef wrote:

It has a similar role to affinity mask, but tracks the IPI affinity 
instead.


Signed-off-by: Qais Yousef 
---
include/linux/irq.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index 504133671985..b000b217ea24 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -157,6 +157,7 @@ struct irq_common_data {
void*handler_data;
struct msi_desc*msi_desc;
cpumask_var_taffinity;
+struct ipi_maskipi_mask;
};


This should be folded into patch 3, no?


For me they are 2 separate changes, hence the 2 commits. They could be 
folded but I prefer the separation unless there's a strong opinion not to.


Thanks,
Qais


Re: [RFC v2 PATCH 00/14] Implement generic IPI support mechanism

2015-10-14 Thread Qais Yousef

On 10/14/2015 04:04 PM, Davidlohr Bueso wrote:

On Tue, 13 Oct 2015, Qais Yousef wrote:


Qais Yousef (14):
 irq: add new IRQ_DOMAIN_FLAGS_IPI
 irq: add GENERIC_IRQ_IPI Kconfig symbol
 irq: add new struct ipi_mask
 irq: add a new irq_send_ipi() to irq_chip
 irq: add struct ipi_mask to irq_data
 irq: add struct ipi_mapping and its helper functions
 irq: add a new generic IPI reservation code to irq core
 irq: implement irq_send_ipi
 MIPS: add support for generic SMP IPI support
 MIPS: make smp CMP, CPS and MT use the new generic IPI functions
 MIPS: delete smp-gic.c
 irqchip: mips-gic: add a IPI hierarchy domain
 irqchip: mips-gic: implement the new irq_send_ipi
 irqchip: mips-gic: remove IPI init code

arch/mips/Kconfig|   6 --
arch/mips/include/asm/smp-ops.h  |   5 +-
arch/mips/kernel/Makefile|   1 -
arch/mips/kernel/smp-cmp.c   |   4 +-
arch/mips/kernel/smp-cps.c   |   4 +-
arch/mips/kernel/smp-gic.c   |  64 ---
arch/mips/kernel/smp-mt.c|   2 +-
arch/mips/kernel/smp.c   | 117 
drivers/irqchip/Kconfig  |   2 +
drivers/irqchip/irq-mips-gic.c   | 225 
---

include/linux/irq.h  |  43 
include/linux/irqchip/mips-gic.h |   3 -
include/linux/irqdomain.h|  19 
kernel/irq/Kconfig   |   4 +
kernel/irq/irqdomain.c   |  84 +++
kernel/irq/manage.c  | 103 ++
16 files changed, 517 insertions(+), 169 deletions(-)
delete mode 100644 arch/mips/kernel/smp-gic.c


It strikes me that Documentation/ should at least get _some_ love. 
Perhaps IRQ-ipi.txt? I dunno...


Since this was an RFC I didn't update documentation without first making 
sure the changes are OK. In the next series I'll add the documentation 
changes.


Thanks for pointing it out though. I could have missed it in the next 
series to be honest as I'm getting more focused on the small details :)


Thanks,
Qais


Re: [PATCH v2 2/3] irqchip: irq-mips-gic: Provide function to map GIC user section

2015-10-15 Thread Qais Yousef

On 10/12/2015 11:16 AM, Thomas Gleixner wrote:

On Mon, 12 Oct 2015, Marc Zyngier wrote:

On 12/10/15 10:40, Markos Chandras wrote:

From: Alex Smith 

The GIC provides a "user-mode visible" section containing a mirror of
the counter registers which can be mapped into user memory. This will
be used by the VDSO time function implementations, so provide a
function to map it in.


  

This looks much better than the previous version (though I cannot find
the two other patches on LKML just yet).

Yes, it looks better. But I really have to ask the question why we are
trying to pack the world and some more into an irq chip driver. We
already have the completely misplaced gic_read_count() there.


This code has a bad history. It was scattered all over the place in arch 
code. Andrew Bresticker did a good job cleaning it up and moved it to 
this irqchip driver.


https://lkml.org/lkml/2014/9/18/487
https://lkml.org/lkml/2014/10/20/481



While I understand that all of this is in the GIC block at least
according to the documentation, technically it's different hardware
blocks. And logically its different as well.


Yes but they're exposed through the same register interface.



So why not describe the various blocks (interrupt controller, timer,
shadow timer) as separate entities in the device tree and let each
subsystem look them up on their own. This cross subsystem hackery is
just horrible and does not buy anything except merge dependencies and
other avoidable hassle.


There's a mips-gic-timer driver in drivers/clocksource. But in device 
tree it's a subnode of the irqchip driver.


http://lxr.free-electrons.com/source/drivers/clocksource/mips-gic-timer.c
http://lxr.free-electrons.com/source/Documentation/devicetree/bindings/interrupt-controller/mips-gic.txt



Thoughts?



It could be refactored but the DT binding already specifies the GIC 
timer as a subnode of GIC. Exposing this usermode register is the only 
thing left in the register set that the GIC driver wasn't dealing with.


Little gain in changing all of this now I think?

Thanks,
Qais


Re: Generic DT binding for IPIs

2015-10-22 Thread Qais Yousef
Is there anything more I can do to get more attention on this? I think 
Marc's suggestion is more generic and future proof. If I send RFC 
patches for that, would this be better?


Thanks,
Qais

On 10/14/2015 11:18 AM, Qais Yousef wrote:

Hi,

This is an attempt to revive a discussion on the right list this time 
with all the correct people hopefully on CC.


While trying to upstream a driver, Thomas and Marc Zyngier pointed out 
the need for generic IPI support in the kernel to allow drivers to 
reserve and send them. Hopefully my latest RFC patch will help to 
clarify what's being done.


https://lkml.org/lkml/2015/10/13/227

We need a generic DT binding support to accompany that to allow a 
driver to reserve an IPI using this new mechanism.


MarcZ had the following suggestion:

https://lkml.org/lkml/2015/8/24/628

Which in summary is

mydevice@f000 {
interrupt-source = <&intc INT_SPEC 2 &inttarg1 &inttarg2>;
};

inttarg1: mydevice@f100 {
interrupt-sink = <&intc HWAFFINITY1>;
};

inttarg2: cpu@1 {
interrupt-sink = <&intc HWAFFINITY2>;
};


interrupt-sink requests to reserve an IPI that it will receive at 
HWAFFINITY cpumask. interrupt-source will not do any reservation. It 
will simply connect an IPI reserved by interrupt-sink to the device 
that will be responsible for generating that IPI. This description 
should allow connecting any 2 devices.

Correct me Marc if I got it wrong please.

I suggested a simplification by assuming that IPIs will only be 
between the host OS and a coprocessor, which gives us this form that 
I think is easier to deal with


coprocessor {
 interrupt-source = <&intc INT_SPEC COP_HWAFFINITY>;
 interrupt-sink = <&intc INT_SPEC CPU_HWAFFINITY>;
}


interrupt-source here reserves an IPI to be sent from host OS to 
coprocessor at COP_HWAFFINITY. interrupt-sink will reserve an IPI to 
be received by host OS at CPU_HWAFFINITY. Less generic but I don't 
know how important it is for host OS to setup IPIs between 2 external 
coprocessors and whether it should really be doing that.


What do the DT experts think? Any preference or a better suggestion?

I tried to keep this short and simple, please let me know if you need 
more info or if there's anything that needs more clarification.


Thanks,
Qais




Re: [PATCH 01/10] irqchip: irq-mips-gic: export gic_send_ipi

2015-09-02 Thread Qais Yousef

On 08/28/2015 03:22 PM, Thomas Gleixner wrote:

On Fri, 28 Aug 2015, Qais Yousef wrote:

Thanks a lot for the detailed explanation. I wasn't looking for a quick and
dirty solution but my view of the problem is much simpler than yours so my
idea of a solution would look quick and dirty. I have a better appreciation of
the problem now and a way to approach it :-)

 From DT point of view are we OK with this form then

 coprocessor {
 interrupt-source = <&intc INT_SPEC COP_HWAFFINITY>;
 interrupt-sink = <&intc INT_SPEC CPU_HWAFFINITY>;
 }

and if the root controller sends normal IPI as it sends normal device
interrupts then interrupt-sink can be a standard interrupts property (like in
my case)

 coprocessor {
 interrupt-source = <&intc INT_SPEC COP_HWAFFINITY>;
interrupts = <INT_SPEC>;
 }

Does this look right to you? Is there something else that needs to be covered
still?

I'm not an DT wizard. I leave that to the DT experts.
  


Hi Marc Zyngier, Mark Rutland,

Any comments about the DT binding for the IPIs?

To recap, the proposal which is based on Marc Zyngier's is to use 
interrupt-source to represent an IPI from Linux CPU to a coprocessor and 
interrupt-sink to receive an IPI from coprocessor to Linux CPU. 
Hopefully the description above is self explanatory. Please let me know 
if you need more info. Thomas covered the routing, synthesising, and 
requesting parts in the core code. The remaining (high level) issue is 
how to describe the IPIs in DT.


Thanks,
Qais


Re: [PATCH 01/10] irqchip: irq-mips-gic: export gic_send_ipi

2015-09-02 Thread Qais Yousef

On 09/02/2015 10:55 AM, Marc Zyngier wrote:

On 02/09/15 10:33, Qais Yousef wrote:

On 08/28/2015 03:22 PM, Thomas Gleixner wrote:

On Fri, 28 Aug 2015, Qais Yousef wrote:

Thanks a lot for the detailed explanation. I wasn't looking for a quick and
dirty solution but my view of the problem is much simpler than yours so my
idea of a solution would look quick and dirty. I have a better appreciation of
the problem now and a way to approach it :-)

  From DT point of view are we OK with this form then

  coprocessor {
  interrupt-source = <&intc INT_SPEC COP_HWAFFINITY>;
  interrupt-sink = <&intc INT_SPEC CPU_HWAFFINITY>;
  }

and if the root controller sends normal IPI as it sends normal device
interrupts then interrupt-sink can be a standard interrupts property (like in
my case)

  coprocessor {
  interrupt-source = <&intc INT_SPEC COP_HWAFFINITY>;
interrupts = <INT_SPEC>;
  }

Does this look right to you? Is there something else that needs to be covered
still?

I'm not an DT wizard. I leave that to the DT experts.
   

Hi Marc Zyngier, Mark Rutland,

Any comments about the DT binding for the IPIs?

To recap, the proposal which is based on Marc Zyngier's is to use
interrupt-source to represent an IPI from Linux CPU to a coprocessor and
interrupt-sink to receive an IPI from coprocessor to Linux CPU.
Hopefully the description above is self explanatory. Please let me know
if you need more info. Thomas covered the routing, synthesising, and
requesting parts in the core code. The remaining (high level) issue is
how to describe the IPIs in DT.

I'm definitely *not* a DT expert! ;-) My initial binding proposal was
only for wired interrupts, not for IPIs. There is definitely some common
aspects, except for one part:

Who decides on the IPI number? So far, we've avoided encoding IPI
numbers in the DT just like we don't encode MSIs, because they are
programmable things. My feeling is that we shouldn't put the IPI number
in the DT because the rest of the kernel uses them as well and could
decide to use this particular IPI number for its own use: *clash*.


I think this is covered in Thomas' proposal to reserve IPIs. His idea 
is to use a separate irq-domain for IPIs and use irq_reserve_ipi() and 
irq_destroy_ipi() to get and release IPIs.




The way I see it would be to have a pool of IPI numbers that the kernel
requests for its own use first, leaving whatever remains to drivers.


That's what Thomas thinks too and he covered this by using 
irq_reserve_ipi() and irq_destroy_ipi().


https://lkml.org/lkml/2015/8/26/713

It's worth noting in light of this that INT_SPEC should be optional. 
For hardware similar to mine there's not much to tell the controller if 
it's all dynamic, except where we want the IPI to be routed to - the 
INT_SPEC is implicitly defined by the notion that it's an IPI.


Thanks,
Qais



Mark (as *you* are the expert ;-), what do you think?

M.




Re: [PATCH 01/10] irqchip: irq-mips-gic: export gic_send_ipi

2015-09-02 Thread Qais Yousef

On 09/02/2015 12:53 PM, Marc Zyngier wrote:

On 02/09/15 11:48, Qais Yousef wrote:

It's worth noting in the light of this that INT_SPEC should be optional
since for hardware similar to mine there's not much to tell the
controller if it's all dynamic except where we want the IPI to be routed
to - the INT_SPEC is implicitly defined by the notion it's an IPI.

Well, I'd think that the INT_SPEC should say that it is an IPI, and I
don't believe we should omit it. On the ARM GIC side, our interrupts are
typed (type 0 is a normal wired interrupt, type 1 a per-cpu interrupt,
and we could allocate type 2 to identify an IPI).


I didn't mean to omit it completely, just to make it optional so it's 
only specified if the intc needs this info. I'm assuming that INT_SPEC 
is interrupt controller specific. If not, then ignore me :-)




But we do need to identify it properly, as we should be able to cover
both IPIs and normal wired interrupts.


I'm a bit confused here. What do you mean by normal wired interrupts? I 
thought this DT binding is only to describe IPIs that need reserving 
and routing. What am I missing?


Thanks,
Qais


Re: Generic DT binding for IPIs

2015-10-23 Thread Qais Yousef

On 10/22/2015 02:43 PM, Rob Herring wrote:

On Wed, Oct 14, 2015 at 5:18 AM, Qais Yousef  wrote:

Hi,

This is an attempt to revive a discussion on the right list this time with
all the correct people hopefully on CC.

devicetree-spec would be a more appropriate list for something like this.


Thanks I didn't know about it.




While trying to upstream a driver, Thomas and Marc Zyngier pointed out the
need for generic IPI support in the kernel to allow drivers to reserve and
send them. Hopefully my latest RFC patch will help to clarify what's being
done.

 https://lkml.org/lkml/2015/10/13/227

We need a generic DT binding support to accompany that to allow a driver to
reserve an IPI using this new mechanism.

MarcZ had the following suggestion:

 https://lkml.org/lkml/2015/8/24/628

Which in summary is

 mydevice@f000 {
 interrupt-source = <&intc INT_SPEC 2 &inttarg1 &inttarg2>;

What is INT_SPEC and "2"? A drawing of the h/w connections and then
what the binding looks like would be helpful.


INT_SPEC are the interrupt controller specific parameters. I think 2 
refers to how many 'interrupt-sink' references it's being passed.





 };

 inttarg1: mydevice@f100 {
 interrupt-sink = <&intc HWAFFINITY1>;

What is HWAFFINITY1? I want to be able to see if say this value is 1,
then the affinity is for cpu0.


HWAFFINITY is the CPU to map the IPI to. The actual value would be 
specific to the interrupt controller. The controller implementation can 
always provide a set of defines if the mapping is not straightforward.





 };

 inttarg2: cpu@1 {
 interrupt-sink = <&intc HWAFFINITY2>;
 };


interrupt-sink requests to reserve an IPI that it will receive at HWAFFINITY
cpumask. interrupt-source will not do any reservation. It will simply
connect an IPI reserved by interrupt-sink to the device that will be
responsible for generating that IPI. This description should allow
connecting any 2 devices.
Correct me Marc if I got it wrong please.

I suggested a simplification by assuming that IPIs will only be between the
host OS and a coprocessor, which gives us this form that I think is easier
to deal with

 coprocessor {
  interrupt-source = <&intc INT_SPEC COP_HWAFFINITY>;
  interrupt-sink = <&intc INT_SPEC CPU_HWAFFINITY>;
 }


interrupt-source here reserves an IPI to be sent from host OS to coprocessor
at COP_HWAFFINITY. interrupt-sink will reserve an IPI to be received by host
OS at CPU_HWAFFINITY. Less generic but I don't know how important it is for
host OS to setup IPIs between 2 external coprocessors and whether it should
really be doing that.

Could we use the existing interrupts binding for interrupt-sink?


No. interrupt-sink will cause an IPI to be dynamically allocated and 
mapped to that processor. Of course, if the connection is hardwired then 
interrupts property is the right thing to use.





What do the DT experts think? Any preference or a better suggestion?

Depends how you would assign coproc to coproc IPIs in a system. It may
be fixed in firmware, or more complex coprocs may read the dtb.


OK. I'll try to cook some RFC patches that implement this which 
hopefully will make it easier to review and build on.


Thanks,
Qais


Re: [RFC PATCH 3/4] sched/topology: remove smt_gain

2018-08-29 Thread Qais Yousef

On 29/08/18 14:19, Vincent Guittot wrote:

smt_gain is used to compute the capacity of CPUs of a SMT core with the
constraint 1 < smt_gain < 2 in order to be able to compute number of CPUs
per core. The field has_free_capacity of struct numa_stat, which was the
last user of this computation of number of CPUs per core, has been removed
by :

   commit 2d4056fafa19 ("sched/numa: Remove numa_has_capacity()")

We can now remove this constraint on core capacity and use the default value
SCHED_CAPACITY_SCALE for SMT CPUs. With this removal, SCHED_CAPACITY_SCALE
becomes the maximum compute capacity of CPUs on every system. This should
help to simplify some code and remove fields like rd->max_cpu_capacity

Furthermore, arch_scale_cpu_capacity() is used with a NULL sd in several other
places in the code when it wants the capacity of a CPU to scale
some metrics like in pelt, deadline or schedutil. In case of SMT, the value
returned is not the capacity of SMT CPUs but the default SCHED_CAPACITY_SCALE.

Remove the smt_gain field from sched_domain struct


You beat me to it. I got confused by smt_gain recently when I stumbled 
on it; I found out it's not used on ARM and had to spend some time to 
convince myself it's not really necessary.


It was hard to track the history of this and *why* it's needed.

The only 'theoretical' case I found where smt_gain can be useful is when 
you have an asymmetric system, for example:


Node_A: 1 Core 2 Threads
Node_B: 1 Core 4 Threads

Then with smt_gain the group_capacity at the core level will be limited 
to smt_gain. But without smt_gain Node_B will look twice as powerful as 
Node_A, which AFAICT will affect balancing and cause Node_B's single core 
to be oversubscribed as the 4 threads will still have to share the same 
underlying hardware resources. I don't think such systems exist in 
practice, or even make sense, though.
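
To put numbers on it, with the current smt_gain = 1178 each thread gets 
smt_gain / span_weight:

    Node_A (2 threads): 2 * (1178/2) = 1178  ->  2 * 1024 = 2048 after removal
    Node_B (4 threads): 4 * (1178/4) = 1176  ->  4 * 1024 = 4096 after removal

i.e. both cores sum to ~1178 today, but after the removal Node_B's single 
core looks twice as big as Node_A's.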


So +1 from my side for the removal.


Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Cc: linux-kernel@vger.kernel.org (open list)
Signed-off-by: Vincent Guittot 
---
  include/linux/sched/topology.h | 1 -
  kernel/sched/sched.h   | 3 ---
  kernel/sched/topology.c| 2 --
  3 files changed, 6 deletions(-)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 2634774..212792b 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -89,7 +89,6 @@ struct sched_domain {
unsigned int newidle_idx;
unsigned int wake_idx;
unsigned int forkexec_idx;
-   unsigned int smt_gain;
  
  	int nohz_idle;			/* NOHZ IDLE status */

int flags;  /* See SD_* */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 4a2e8ca..b1715b8 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1758,9 +1758,6 @@ unsigned long arch_scale_freq_capacity(int cpu)
  static __always_inline
  unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
  {
-   if (sd && (sd->flags & SD_SHARE_CPUCAPACITY) && (sd->span_weight > 1))
-   return sd->smt_gain / sd->span_weight;
-
return SCHED_CAPACITY_SCALE;
  }
  #endif
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 56a0fed..069c924 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1129,7 +1129,6 @@ sd_init(struct sched_domain_topology_level *tl,
  
  		.last_balance		= jiffies,

.balance_interval   = sd_weight,
-   .smt_gain   = 0,
.max_newidle_lb_cost= 0,
.next_decay_max_lb_cost = jiffies,
.child  = child,
@@ -1155,7 +1154,6 @@ sd_init(struct sched_domain_topology_level *tl,
if (sd->flags & SD_SHARE_CPUCAPACITY) {
sd->flags |= SD_PREFER_SIBLING;
sd->imbalance_pct = 110;
-   sd->smt_gain = 1178; /* ~15% */
  
  	} else if (sd->flags & SD_SHARE_PKG_RESOURCES) {

sd->flags |= SD_PREFER_SIBLING;



--
Qais Yousef



Re: [PATCH 1/4] sched/topology: SD_ASYM_CPUCAPACITY flag detection

2018-07-23 Thread Qais Yousef
-   struct sched_domain *sd = sd_init(tl, cpu_map, child, cpu);
+   struct sched_domain *sd = sd_init(tl, cpu_map, child, dflags, cpu);
  
  	if (child) {

sd->level = child->level + 1;
@@ -1636,6 +1637,65 @@ static struct sched_domain *build_sched_domain(struct 
sched_domain_topology_leve
  }
  
  /*

+ * Find the sched_domain_topology_level where all cpu capacities are visible
+ * for all cpus.
+ */
+static struct sched_domain_topology_level
+*asym_cpu_capacity_level(const struct cpumask *cpu_map)
+{
+   int i, j, asym_level = 0;
+   bool asym = false;
+   struct sched_domain_topology_level *tl, *asym_tl = NULL;
+   unsigned long cap;
+
+   /* Is there any asymmetry? */
+   cap = arch_scale_cpu_capacity(NULL, cpumask_first(cpu_map));
+
+   for_each_cpu(i, cpu_map) {
+   if (arch_scale_cpu_capacity(NULL, i) != cap) {
+   asym = true;
+   break;
+   }
+   }
+
+   if (!asym)
+   return NULL;
+
+   /*
+* Examine topology from all cpu's point of views to detect the lowest
+* sched_domain_topology_level where a highest capacity cpu is visible
+* to everyone.
+*/
+   for_each_cpu(i, cpu_map) {
+   unsigned long max_capacity = arch_scale_cpu_capacity(NULL, i);
+   int tl_id = 0;
+
+   for_each_sd_topology(tl) {
+   if (tl_id < asym_level)
+   goto next_level;
+


I think if you increment and then continue here you might save the extra 
branch. I didn't look at any disassembly though to verify the generated 
code.


I wonder if we can introduce for_each_sd_topology_from(tl, 
starting_level) so that you can start searching from a provided level - 
which will make this skipping logic unnecessary? So the code will look like


            for_each_sd_topology_from(tl, asym_level) {
                ...
            }
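
i.e. something like this (untested sketch, taking a starting tl pointer 
rather than an index, mirroring how for_each_sd_topology() is defined):

#define for_each_sd_topology_from(tl, from) \
	for (tl = (from); (tl)->mask; (tl)++)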


+   for_each_cpu_and(j, tl->mask(i), cpu_map) {
+   unsigned long capacity;
+
+   capacity = arch_scale_cpu_capacity(NULL, j);
+
+   if (capacity <= max_capacity)
+   continue;
+
+   max_capacity = capacity;
+   asym_level = tl_id;
+   asym_tl = tl;
+   }
+next_level:
+   tl_id++;
+   }
+   }
+
+   return asym_tl;
+}
+
+
+/*
   * Build sched domains for a given set of CPUs and attach the sched domains
   * to the individual CPUs
   */
@@ -1647,18 +1707,27 @@ build_sched_domains(const struct cpumask *cpu_map, 
struct sched_domain_attr *att
struct s_data d;
struct rq *rq = NULL;
int i, ret = -ENOMEM;
+   struct sched_domain_topology_level *tl_asym;
  
  	alloc_state = __visit_domain_allocation_hell(&d, cpu_map);

if (alloc_state != sa_rootdomain)
goto error;
  
+	tl_asym = asym_cpu_capacity_level(cpu_map);

+


Or maybe this is not a hot path and we don't care that much about 
optimizing the search since you call it unconditionally here even for 
systems that don't care?



/* Set up domains for CPUs specified by the cpu_map: */
for_each_cpu(i, cpu_map) {
struct sched_domain_topology_level *tl;
  
  		sd = NULL;

for_each_sd_topology(tl) {
-   sd = build_sched_domain(tl, cpu_map, attr, sd, i);
+   int dflags = 0;
+
+   if (tl == tl_asym)
+   dflags |= SD_ASYM_CPUCAPACITY;
+
+   sd = build_sched_domain(tl, cpu_map, attr, sd, dflags, 
i);
+
if (tl == sched_domain_topology)
*per_cpu_ptr(d.sd, i) = sd;
if (tl->flags & SDTL_OVERLAP)



--
Qais Yousef



Re: [PATCH 1/4] sched/topology: SD_ASYM_CPUCAPACITY flag detection

2018-07-23 Thread Qais Yousef

On 23/07/18 16:27, Morten Rasmussen wrote:

[...]


+   /*
+* Examine topology from all cpu's point of views to detect the lowest
+* sched_domain_topology_level where a highest capacity cpu is visible
+* to everyone.
+*/
+   for_each_cpu(i, cpu_map) {
+   unsigned long max_capacity = arch_scale_cpu_capacity(NULL, i);
+   int tl_id = 0;
+
+   for_each_sd_topology(tl) {
+   if (tl_id < asym_level)
+   goto next_level;
+

I think if you increment and then continue here you might save the extra
branch. I didn't look at any disassembly though to verify the generated
code.

I wonder if we can introduce for_each_sd_topology_from(tl, starting_level)
so that you can start searching from a provided level - which will make this
skipping logic unnecessary? So the code will look like

             for_each_sd_topology_from(tl, asymc_level) {
                 ...
             }

Both options would work. Increment+continue instead of goto would be
slightly less readable I think since we would still have the increment
at the end of the loop, but easy to do. Introducing
for_each_sd_topology_from() improves things too, but I wonder if it is
worth it.


I don't mind the current form to be honest. I agree it's not worth it if 
it is called infrequently enough.



@@ -1647,18 +1707,27 @@ build_sched_domains(const struct cpumask *cpu_map, 
struct sched_domain_attr *att
struct s_data d;
struct rq *rq = NULL;
int i, ret = -ENOMEM;
+   struct sched_domain_topology_level *tl_asym;
alloc_state = __visit_domain_allocation_hell(&d, cpu_map);
if (alloc_state != sa_rootdomain)
goto error;
+   tl_asym = asym_cpu_capacity_level(cpu_map);
+

Or maybe this is not a hot path and we don't care that much about optimizing
the search since you call it unconditionally here even for systems that
don't care?

It does increase the cost of things like hotplug slightly and
repartitioning of root_domains a slightly but I don't see how we can
avoid it if we want generic code to set this flag. If the costs are not
acceptable I think the only option is to make the detection architecture
specific.


I think hotplug is already expensive and this overhead would be small in 
comparison. But this could be called when frequency changes, if I 
understood correctly - that's the path I wasn't sure how 'hot' it could 
be. I wouldn't expect frequency changes at a very high rate because they 
are relatively expensive too.



In any case, AFAIK rebuilding the sched_domain hierarchy shouldn't be a
normal and common thing to do. If checking for the flag is not
acceptable on SMP-only architectures, I can move it under arch/arm[,64]
although it is not as clean.



I like the approach and I think it's nice and clean. If it actually 
appears in some profiles I think we have room to optimize it.


--
Qais Yousef



Re: [PATCH 1/4] sched/topology: SD_ASYM_CPUCAPACITY flag detection

2018-07-24 Thread Qais Yousef

On 24/07/18 09:37, Morten Rasmussen wrote:

On Mon, Jul 23, 2018 at 05:07:50PM +0100, Qais Yousef wrote:

On 23/07/18 16:27, Morten Rasmussen wrote:

It does increase the cost of things like hotplug slightly and
repartitioning of root_domains a slightly but I don't see how we can
avoid it if we want generic code to set this flag. If the costs are not
acceptable I think the only option is to make the detection architecture
specific.

I think hotplug is already expensive and this overhead would be small in
comparison. But this could be called when frequency changes if I understood
correctly - this is the one I wasn't sure how 'hot' it could be. I wouldn't
expect frequency changes at a very high rate because it's relatively
expensive too..

A frequency change shouldn't lead to a flag change or a rebuild of the
sched_domain hierarhcy. The situations where the hierarchy should be
rebuild to update the flag is during boot as we only know the amount of
asymmetry once cpufreq has been initialized, when cpus are hotplugged
in/out, and when root_domains change due to cpuset reconfiguration. So
it should be a relatively rare event.



Ah OK I misunderstood that part then.

The series LGTM then.

--
Qais Yousef



[PATCH] sched/rt: Clean up usage of rt_task()

2024-05-14 Thread Qais Yousef
rt_task() checks if a task has RT priority. But depending on your
dictionary, this could mean it belongs to the RT class, or is a 'realtime'
task, which includes both RT and DL classes.

Since this has caused some confusion already in discussion [1], it
seemed a cleanup was due.

I define the usage of rt_task() to be tasks that belong to RT class.
Make sure that it returns true only for RT class and audit the users and
replace them with the new realtime_task() which returns true for RT and
DL classes - the old behavior. Introduce similar realtime_prio() to
create similar distinction to rt_prio() and update the users.

Move MAX_DL_PRIO to prio.h so it can be used in the new definitions.

Document the functions to make it more obvious what is the difference
between them. PI-boosted tasks is a factor that must be taken into
account when choosing which function to use.

Rename task_is_realtime() to task_has_realtime_policy() as the old name
is confusing against the new realtime_task().

No functional changes were intended.

[1] 
https://lore.kernel.org/lkml/20240506100509.gl40...@noisy.programming.kicks-ass.net/

Signed-off-by: Qais Yousef 
---
 fs/select.c   |  2 +-
 include/linux/ioprio.h|  2 +-
 include/linux/sched/deadline.h|  6 --
 include/linux/sched/prio.h|  1 +
 include/linux/sched/rt.h  | 27 ++-
 kernel/locking/rtmutex.c  |  4 ++--
 kernel/locking/rwsem.c|  4 ++--
 kernel/locking/ww_mutex.h |  2 +-
 kernel/sched/core.c   |  6 +++---
 kernel/time/hrtimer.c |  6 +++---
 kernel/trace/trace_sched_wakeup.c |  2 +-
 mm/page-writeback.c   |  4 ++--
 mm/page_alloc.c   |  2 +-
 13 files changed, 48 insertions(+), 20 deletions(-)

diff --git a/fs/select.c b/fs/select.c
index 9515c3fa1a03..8d5c1419416c 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -82,7 +82,7 @@ u64 select_estimate_accuracy(struct timespec64 *tv)
 * Realtime tasks get a slack of 0 for obvious reasons.
 */
 
-   if (rt_task(current))
+   if (realtime_task(current))
return 0;
 
ktime_get_ts64(&now);
diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
index db1249cd9692..6c00342b6166 100644
--- a/include/linux/ioprio.h
+++ b/include/linux/ioprio.h
@@ -40,7 +40,7 @@ static inline int task_nice_ioclass(struct task_struct *task)
 {
if (task->policy == SCHED_IDLE)
return IOPRIO_CLASS_IDLE;
-   else if (task_is_realtime(task))
+   else if (task_has_realtime_policy(task))
return IOPRIO_CLASS_RT;
else
return IOPRIO_CLASS_BE;
diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
index df3aca89d4f5..5cb88b748ad6 100644
--- a/include/linux/sched/deadline.h
+++ b/include/linux/sched/deadline.h
@@ -10,8 +10,6 @@
 
 #include 
 
-#define MAX_DL_PRIO    0
-
 static inline int dl_prio(int prio)
 {
if (unlikely(prio < MAX_DL_PRIO))
@@ -19,6 +17,10 @@ static inline int dl_prio(int prio)
return 0;
 }
 
+/*
+ * Returns true if a task has a priority that belongs to DL class. PI-boosted
+ * tasks will return true. Use dl_policy() to ignore PI-boosted tasks.
+ */
 static inline int dl_task(struct task_struct *p)
 {
return dl_prio(p->prio);
diff --git a/include/linux/sched/prio.h b/include/linux/sched/prio.h
index ab83d85e1183..6ab43b4f72f9 100644
--- a/include/linux/sched/prio.h
+++ b/include/linux/sched/prio.h
@@ -14,6 +14,7 @@
  */
 
 #define MAX_RT_PRIO    100
+#define MAX_DL_PRIO    0
 
 #define MAX_PRIO   (MAX_RT_PRIO + NICE_WIDTH)
 #define DEFAULT_PRIO   (MAX_RT_PRIO + NICE_WIDTH / 2)
diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
index b2b9e6eb9683..b31be3c50152 100644
--- a/include/linux/sched/rt.h
+++ b/include/linux/sched/rt.h
@@ -7,18 +7,43 @@
 struct task_struct;
 
 static inline int rt_prio(int prio)
+{
+   if (unlikely(prio < MAX_RT_PRIO && prio >= MAX_DL_PRIO))
+   return 1;
+   return 0;
+}
+
+static inline int realtime_prio(int prio)
 {
if (unlikely(prio < MAX_RT_PRIO))
return 1;
return 0;
 }
 
+/*
+ * Returns true if a task has a priority that belongs to RT class. PI-boosted
+ * tasks will return true. Use rt_policy() to ignore PI-boosted tasks.
+ */
 static inline int rt_task(struct task_struct *p)
 {
return rt_prio(p->prio);
 }
 
-static inline bool task_is_realtime(struct task_struct *tsk)
+/*
+ * Returns true if a task has a priority that belongs to RT or DL classes.
+ * PI-boosted tasks will return true. Use task_has_realtime_policy() to ignore
+ * PI-boosted tasks.
+ */
+static inline int realtime_task(struct task_struct *p)
+{
+   return realtime_prio(p->prio);
+}
+
+/*
+ * Returns true if a task has a policy that belongs to RT or DL classes.
+ * PI-booste

Re: [PATCH] sched/rt: Clean up usage of rt_task()

2024-05-15 Thread Qais Yousef
On 05/15/24 10:32, Peter Zijlstra wrote:
> On Tue, May 14, 2024 at 07:58:51PM -0400, Phil Auld wrote:
> > 
> > Hi Qais,
> > 
> > On Wed, May 15, 2024 at 12:41:12AM +0100 Qais Yousef wrote:
> > > rt_task() checks if a task has RT priority. But depends on your
> > > dictionary, this could mean it belongs to RT class, or is a 'realtime'
> > > task, which includes RT and DL classes.
> > > 
> > > Since this has caused some confusion already on discussion [1], it
> > > seemed a clean up is due.
> > > 
> > > I define the usage of rt_task() to be tasks that belong to RT class.
> > > Make sure that it returns true only for RT class and audit the users and
> > > replace them with the new realtime_task() which returns true for RT and
> > > DL classes - the old behavior. Introduce similar realtime_prio() to
> > > create similar distinction to rt_prio() and update the users.
> > 
> > I think making the difference clear is good. However, I think rt_task() is
> > a better name. We have dl_task() still.  And rt tasks are things managed
> > by rt.c, basically. Not realtime.c :)  I know that doesn't work for 
> > deadline.c
> > and dl_ but this change would be the reverse of that pattern.
> 
> It's going to be a mess either way around, but I think rt_task() and
> dl_task() being distinct is more sensible than the current overlap.

Judging by some of the users I've seen, I think some users were not
expecting them to overlap, as they were checking for !dl_task() &&
!rt_task(), which I replaced with !realtime_task(). Similar users were
checking for dl_prio() and rt_prio() in places, and others were using
rt_prio() to encompass dl_prio(). There were BUG_ON(!rt_task())/
WARN_ON(!rt_prio()) in rt.c which I don't think are intended to encompass
DL there.
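
To illustrate the overlap with the prio ranges (a sketch, not from the
patch):

	/*
	 * DL:  prio < MAX_DL_PRIO (0), i.e. -1 for deadline tasks
	 * RT:  MAX_DL_PRIO <= prio < MAX_RT_PRIO (0..99)
	 *
	 * old rt_task(p): p->prio < MAX_RT_PRIO -> true for RT *and* DL
	 * new rt_task(p): RT range only         -> false for DL
	 * realtime_task(p):                        the old rt_task() behavior
	 */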

> 
> > > Move MAX_DL_PRIO to prio.h so it can be used in the new definitions.
> > > 
> > > Document the functions to make it more obvious what is the difference
> > > between them. PI-boosted tasks is a factor that must be taken into
> > > account when choosing which function to use.
> > > 
> > > Rename task_is_realtime() to task_has_realtime_policy() as the old name
> > > is confusing against the new realtime_task().
> 
> realtime_task_policy() perhaps?

Better yes. Updated.

Thanks!



Re: [PATCH] sched/rt: Clean up usage of rt_task()

2024-05-15 Thread Qais Yousef
On 05/15/24 07:20, Phil Auld wrote:
> On Wed, May 15, 2024 at 10:32:38AM +0200 Peter Zijlstra wrote:
> > On Tue, May 14, 2024 at 07:58:51PM -0400, Phil Auld wrote:
> > > 
> > > Hi Qais,
> > > 
> > > On Wed, May 15, 2024 at 12:41:12AM +0100 Qais Yousef wrote:
> > > > rt_task() checks if a task has RT priority. But depends on your
> > > > dictionary, this could mean it belongs to RT class, or is a 'realtime'
> > > > task, which includes RT and DL classes.
> > > > 
> > > > Since this has caused some confusion already on discussion [1], it
> > > > seemed a clean up is due.
> > > > 
> > > > I define the usage of rt_task() to be tasks that belong to RT class.
> > > > Make sure that it returns true only for RT class and audit the users and
> > > > replace them with the new realtime_task() which returns true for RT and
> > > > DL classes - the old behavior. Introduce similar realtime_prio() to
> > > > create similar distinction to rt_prio() and update the users.
> > > 
> > > I think making the difference clear is good. However, I think rt_task() is
> > > a better name. We have dl_task() still.  And rt tasks are things managed
> > > by rt.c, basically. Not realtime.c :)  I know that doesn't work for 
> > > deadline.c
> > > and dl_ but this change would be the reverse of that pattern.
> > 
> > It's going to be a mess either way around, but I think rt_task() and
> > dl_task() being distinct is more sensible than the current overlap.
> >
> 
> Yes, indeed.
> 
> My point was just to call it rt_task() still. 

It is called rt_task() still. I just added a new realtime_task() to return true
for RT and DL. rt_task() will return true only for RT now.

How do you think this should be done instead? I'm not seeing the problem.



Re: [PATCH] sched/rt: Clean up usage of rt_task()

2024-05-15 Thread Qais Yousef
On 05/15/24 08:50, Phil Auld wrote:

> > > My point was just to call it rt_task() still. 
> > 
> > It is called rt_task() still. I just added a new realtime_task() to return 
> > true
> > for RT and DL. rt_task() will return true only for RT now.
> > 
> > How do you see this should be done instead? I'm not seeing the problem.
> >
> 
> Right, sorry. I misread your commit message completely and then all the
> places where you changed rt_task() to realtime_task() fit my misreading.
> 
> rt_task() means rt class and realtime_task does what rt_task() used to do.
> That's how I would do it, too :)

Ah, I see. I updated the commit message to hopefully read better :)

"""
I define the usage of rt_task() to be tasks that belong to the RT class.
Make sure that it returns true only for the RT class, audit the users, and
replace the ones that required the old behavior with the new
realtime_task() which returns true for RT and DL classes. Introduce a
similar realtime_prio() to create a similar distinction to rt_prio() and
update the users that required the old behavior to use the new function.
"""

> Reviewed-by: Phil Auld 

Thanks for having a look!

Cheers

--
Qais Yousef



[PATCH v2] sched/rt: Clean up usage of rt_task()

2024-05-15 Thread Qais Yousef
rt_task() checks if a task has RT priority. But depending on your
dictionary, this could mean it belongs to the RT class, or is a 'realtime'
task, which includes both RT and DL classes.

Since this has caused some confusion already in discussion [1], it
seemed a cleanup was due.

I define the usage of rt_task() to be tasks that belong to the RT class.
Make sure that it returns true only for the RT class, audit the users, and
replace the ones that required the old behavior with the new
realtime_task() which returns true for RT and DL classes. Introduce a
similar realtime_prio() to create a similar distinction to rt_prio() and
update the users that required the old behavior to use the new function.

Move MAX_DL_PRIO to prio.h so it can be used in the new definitions.

Document the functions to make it more obvious what is the difference
between them. PI-boosted tasks is a factor that must be taken into
account when choosing which function to use.

Rename task_is_realtime() to realtime_task_policy() as the old name is
confusing against the new realtime_task().

No functional changes were intended.

[1] 
https://lore.kernel.org/lkml/20240506100509.gl40...@noisy.programming.kicks-ass.net/

Reviewed-by: Phil Auld 
Signed-off-by: Qais Yousef 
---

Changes since v1:

* Use realtime_task_policy() instead task_has_realtime_policy() (Peter)
* Improve commit message readability about replace some rt_task()
  users.

v1 discussion: 
https://lore.kernel.org/lkml/20240514234112.792989-1-qyou...@layalina.io/

 fs/select.c   |  2 +-
 include/linux/ioprio.h|  2 +-
 include/linux/sched/deadline.h|  6 --
 include/linux/sched/prio.h|  1 +
 include/linux/sched/rt.h  | 27 ++-
 kernel/locking/rtmutex.c  |  4 ++--
 kernel/locking/rwsem.c|  4 ++--
 kernel/locking/ww_mutex.h |  2 +-
 kernel/sched/core.c   |  6 +++---
 kernel/time/hrtimer.c |  6 +++---
 kernel/trace/trace_sched_wakeup.c |  2 +-
 mm/page-writeback.c   |  4 ++--
 mm/page_alloc.c   |  2 +-
 13 files changed, 48 insertions(+), 20 deletions(-)

diff --git a/fs/select.c b/fs/select.c
index 9515c3fa1a03..8d5c1419416c 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -82,7 +82,7 @@ u64 select_estimate_accuracy(struct timespec64 *tv)
 * Realtime tasks get a slack of 0 for obvious reasons.
 */
 
-   if (rt_task(current))
+   if (realtime_task(current))
return 0;
 
ktime_get_ts64(&now);
diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
index db1249cd9692..75859b78d540 100644
--- a/include/linux/ioprio.h
+++ b/include/linux/ioprio.h
@@ -40,7 +40,7 @@ static inline int task_nice_ioclass(struct task_struct *task)
 {
if (task->policy == SCHED_IDLE)
return IOPRIO_CLASS_IDLE;
-   else if (task_is_realtime(task))
+   else if (realtime_task_policy(task))
return IOPRIO_CLASS_RT;
else
return IOPRIO_CLASS_BE;
diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
index df3aca89d4f5..5cb88b748ad6 100644
--- a/include/linux/sched/deadline.h
+++ b/include/linux/sched/deadline.h
@@ -10,8 +10,6 @@
 
 #include 
 
-#define MAX_DL_PRIO    0
-
 static inline int dl_prio(int prio)
 {
if (unlikely(prio < MAX_DL_PRIO))
@@ -19,6 +17,10 @@ static inline int dl_prio(int prio)
return 0;
 }
 
+/*
+ * Returns true if a task has a priority that belongs to DL class. PI-boosted
+ * tasks will return true. Use dl_policy() to ignore PI-boosted tasks.
+ */
 static inline int dl_task(struct task_struct *p)
 {
return dl_prio(p->prio);
diff --git a/include/linux/sched/prio.h b/include/linux/sched/prio.h
index ab83d85e1183..6ab43b4f72f9 100644
--- a/include/linux/sched/prio.h
+++ b/include/linux/sched/prio.h
@@ -14,6 +14,7 @@
  */
 
 #define MAX_RT_PRIO   100
+#define MAX_DL_PRIO   0
 
 #define MAX_PRIO   (MAX_RT_PRIO + NICE_WIDTH)
 #define DEFAULT_PRIO   (MAX_RT_PRIO + NICE_WIDTH / 2)
diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
index b2b9e6eb9683..a055dd68a77c 100644
--- a/include/linux/sched/rt.h
+++ b/include/linux/sched/rt.h
@@ -7,18 +7,43 @@
 struct task_struct;
 
 static inline int rt_prio(int prio)
+{
+   if (unlikely(prio < MAX_RT_PRIO && prio >= MAX_DL_PRIO))
+   return 1;
+   return 0;
+}
+
+static inline int realtime_prio(int prio)
 {
if (unlikely(prio < MAX_RT_PRIO))
return 1;
return 0;
 }
 
+/*
+ * Returns true if a task has a priority that belongs to RT class. PI-boosted
+ * tasks will return true. Use rt_policy() to ignore PI-boosted tasks.
+ */
 static inline int rt_task(struct task_struct *p)
 {
return rt_prio(p->prio);
 }
 
-static inline bool task_is_realtime(struct task_struct *tsk)
+/*
+ * Returns true 
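
For reference, the boundaries these helpers encode are easy to check in
isolation. Below is a minimal userspace sketch (illustrative only, not kernel
code) that mirrors the predicates from the patch, assuming the mainline
constants MAX_DL_PRIO == 0 and MAX_RT_PRIO == 100:

#include <stdbool.h>
#include <stdio.h>

#define MAX_DL_PRIO   0
#define MAX_RT_PRIO   100

/* same shape as the kernel helpers, minus the unlikely() annotations */
static bool dl_prio(int prio)       { return prio < MAX_DL_PRIO; }
static bool rt_prio(int prio)       { return prio >= MAX_DL_PRIO && prio < MAX_RT_PRIO; }
static bool realtime_prio(int prio) { return prio < MAX_RT_PRIO; }

int main(void)
{
	/* prio < 0 is DL, 0..99 is RT, 100..139 is the fair (nice) range */
	const int samples[] = { -1, 0, 50, 99, 100, 120, 139 };

	for (unsigned int i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		printf("prio %4d: dl=%d rt=%d realtime=%d\n", samples[i],
		       dl_prio(samples[i]), rt_prio(samples[i]),
		       realtime_prio(samples[i]));
	return 0;
}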

Re: [PATCH v2] sched/rt: Clean up usage of rt_task()

2024-05-27 Thread Qais Yousef
On 05/21/24 13:00, Sebastian Andrzej Siewior wrote:
> On 2024-05-15 23:05:36 [+0100], Qais Yousef wrote:
> > rt_task() checks if a task has RT priority. But depending on your
> > dictionary, this could mean it belongs to the RT class, or is
> > a 'realtime' task, which includes the RT and DL classes.
> > 
> > Since this has already caused some confusion in discussion [1],
> > a cleanup seemed due.
> > 
> > I define the usage of rt_task() to be tasks that belong to the RT
> > class. Make sure that it returns true only for the RT class, audit the
> > users, and replace the ones that required the old behavior with the new
> > realtime_task(), which returns true for RT and DL classes. Introduce
> > a similar realtime_prio() to create a distinction analogous to
> > rt_prio(), and update the users that required the old behavior to use
> > the new function.
> > 
> > Move MAX_DL_PRIO to prio.h so it can be used in the new definitions.
> > 
> > Document the functions to make the difference between them more
> > obvious. PI boosting is a factor that must be taken into account when
> > choosing which function to use.
> > 
> > Rename task_is_realtime() to realtime_task_policy() as the old name
> > would be confusing alongside the new realtime_task().
> 
> I *think* everyone using rt_task() means to include DL tasks. And
> 'everyone' means !SCHED people, since SCHED people know when the difference matters.

yes, this makes sense

> 
> > No functional changes were intended.
> > 
> > [1] 
> > https://lore.kernel.org/lkml/20240506100509.gl40...@noisy.programming.kicks-ass.net/
> > 
> > Reviewed-by: Phil Auld 
> > Signed-off-by: Qais Yousef 
> > ---
> > 
> > Changes since v1:
> > 
> > * Use realtime_task_policy() instead task_has_realtime_policy() (Peter)
> > * Improve commit message readability regarding replacing some rt_task()
> >   users.
> > 
> > v1 discussion: 
> > https://lore.kernel.org/lkml/20240514234112.792989-1-qyou...@layalina.io/
> > 
> >  fs/select.c   |  2 +-
> 
> fs/bcachefs/six.c
> six_owner_running() has rt_task(). But imho it should have realtime_task()
> to consider DL. But I think it is way worse that it has its own locking
> rather than using what everyone else uses, but then again it wouldn't be
> the new hot thing…

I think I missed this one. Converted now. Thanks!

> 
> >  include/linux/ioprio.h|  2 +-
> >  include/linux/sched/deadline.h|  6 --
> >  include/linux/sched/prio.h|  1 +
> >  include/linux/sched/rt.h  | 27 ++-
> >  kernel/locking/rtmutex.c  |  4 ++--
> >  kernel/locking/rwsem.c|  4 ++--
> >  kernel/locking/ww_mutex.h |  2 +-
> >  kernel/sched/core.c   |  6 +++---
> >  kernel/time/hrtimer.c |  6 +++---
> >  kernel/trace/trace_sched_wakeup.c |  2 +-
> >  mm/page-writeback.c   |  4 ++--
> >  mm/page_alloc.c   |  2 +-
> >  13 files changed, 48 insertions(+), 20 deletions(-)
> …
> > diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
> > index 70625dff62ce..08b95e0a41ab 100644
> > --- a/kernel/time/hrtimer.c
> > +++ b/kernel/time/hrtimer.c
> > @@ -1996,7 +1996,7 @@ static void __hrtimer_init_sleeper(struct hrtimer_sleeper *sl,
> >  * expiry.
> >  */
> > if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
> > -   if (task_is_realtime(current) && !(mode & HRTIMER_MODE_SOFT))
> > +   if (realtime_task_policy(current) && !(mode & HRTIMER_MODE_SOFT))
> > mode |= HRTIMER_MODE_HARD;
> > }
> >  
> > @@ -2096,7 +2096,7 @@ long hrtimer_nanosleep(ktime_t rqtp, const enum hrtimer_mode mode,
> > u64 slack;
> >  
> > slack = current->timer_slack_ns;
> > -   if (rt_task(current))
> > +   if (realtime_task(current))
> > slack = 0;
> >  
> > hrtimer_init_sleeper_on_stack(&t, clockid, mode);
> > @@ -2301,7 +2301,7 @@ schedule_hrtimeout_range_clock(ktime_t *expires, u64 delta,
> >  * Override any slack passed by the user if under
> >  * rt contraints.
> >  */
> > -   if (rt_task(current))
> > +   if (realtime_task(current))
> > delta = 0;
> 
> I know this is just converting what is already here but…
> __hrtimer_init_sleeper() looks at the policy to figure out if the task
> is realtime to decide if it should expire in HARD-IRQ context. This is
> correct, a booste

Re: [PATCH v2] sched/rt: Clean up usage of rt_task()

2024-05-27 Thread Qais Yousef
On 05/23/24 11:45, Steven Rostedt wrote:
> On Wed, 15 May 2024 23:05:36 +0100
> Qais Yousef  wrote:
> > diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
> > index df3aca89d4f5..5cb88b748ad6 100644
> > --- a/include/linux/sched/deadline.h
> > +++ b/include/linux/sched/deadline.h
> > @@ -10,8 +10,6 @@
> >  
> >  #include 
> >  
> > -#define MAX_DL_PRIO   0
> > -
> >  static inline int dl_prio(int prio)
> >  {
> > if (unlikely(prio < MAX_DL_PRIO))
> > @@ -19,6 +17,10 @@ static inline int dl_prio(int prio)
> > return 0;
> >  }
> >  
> > +/*
> > + * Returns true if a task has a priority that belongs to DL class. PI-boosted
> > + * tasks will return true. Use dl_policy() to ignore PI-boosted tasks.
> > + */
> >  static inline int dl_task(struct task_struct *p)
> >  {
> > return dl_prio(p->prio);
> > diff --git a/include/linux/sched/prio.h b/include/linux/sched/prio.h
> > index ab83d85e1183..6ab43b4f72f9 100644
> > --- a/include/linux/sched/prio.h
> > +++ b/include/linux/sched/prio.h
> > @@ -14,6 +14,7 @@
> >   */
> >  
> >  #define MAX_RT_PRIO   100
> > +#define MAX_DL_PRIO   0
> >  
> >  #define MAX_PRIO   (MAX_RT_PRIO + NICE_WIDTH)
> >  #define DEFAULT_PRIO   (MAX_RT_PRIO + NICE_WIDTH / 2)
> > diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
> > index b2b9e6eb9683..a055dd68a77c 100644
> > --- a/include/linux/sched/rt.h
> > +++ b/include/linux/sched/rt.h
> > @@ -7,18 +7,43 @@
> >  struct task_struct;
> >  
> >  static inline int rt_prio(int prio)
> > +{
> > +   if (unlikely(prio < MAX_RT_PRIO && prio >= MAX_DL_PRIO))
> > +   return 1;
> > +   return 0;
> > +}
> > +
> > +static inline int realtime_prio(int prio)
> >  {
> > if (unlikely(prio < MAX_RT_PRIO))
> > return 1;
> > return 0;
> >  }
> 
> I'm thinking we should change the above to bool (separate patch), as
> returning an int may give one the impression that it returns the actual
> priority number. Having it return bool will clear that up.
> 
> In fact, if we are touching these functions, might as well change all of
> them to bool when returning true/false. Just to make it easier to
> understand what they are doing.

I can add a patch on top, sure.
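
The misread the int return invites is easy to demonstrate with a contrived
userspace sketch (illustrative only; rt_prio_as_int()/rt_prio_as_bool() are
made-up names standing in for the before/after versions):

#include <stdbool.h>
#include <stdio.h>

#define MAX_RT_PRIO 100

static int rt_prio_as_int(int prio)   { return prio < MAX_RT_PRIO; } /* old */
static bool rt_prio_as_bool(int prio) { return prio < MAX_RT_PRIO; } /* new */

int main(void)
{
	/* With an int return this compiles silently and is always false:
	 * the function is a predicate returning 0/1, not a priority. */
	if (rt_prio_as_int(50) > 10)
		puts("never printed");

	/* With bool the intent is unambiguous, and compilers can flag
	 * out-of-range comparisons against boolean expressions. */
	if (rt_prio_as_bool(50))
		puts("RT priority");
	return 0;
}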

> 
> >  
> > +/*
> > + * Returns true if a task has a priority that belongs to RT class. PI-boosted
> > + * tasks will return true. Use rt_policy() to ignore PI-boosted tasks.
> > + */
> >  static inline int rt_task(struct task_struct *p)
> >  {
> > return rt_prio(p->prio);
> >  }
> >  
> > -static inline bool task_is_realtime(struct task_struct *tsk)
> > +/*
> > + * Returns true if a task has a priority that belongs to RT or DL classes.
> > + * PI-boosted tasks will return true. Use realtime_task_policy() to ignore
> > + * PI-boosted tasks.
> > + */
> > +static inline int realtime_task(struct task_struct *p)
> > +{
> > +   return realtime_prio(p->prio);
> > +}
> > +
> > +/*
> > + * Returns true if a task has a policy that belongs to RT or DL classes.
> > + * PI-boosted tasks will return false.
> > + */
> > +static inline bool realtime_task_policy(struct task_struct *tsk)
> >  {
> > int policy = tsk->policy;
> >  
> 
> 
> 
> > diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
> > index 0469a04a355f..19d737742e29 100644
> > --- a/kernel/trace/trace_sched_wakeup.c
> > +++ b/kernel/trace/trace_sched_wakeup.c
> > @@ -545,7 +545,7 @@ probe_wakeup(void *ignore, struct task_struct *p)
> >  *  - wakeup_dl handles tasks belonging to sched_dl class only.
> >  */
> > if (tracing_dl || (wakeup_dl && !dl_task(p)) ||
> > -   (wakeup_rt && !dl_task(p) && !rt_task(p)) ||
> > +   (wakeup_rt && !realtime_task(p)) ||
> > (!dl_task(p) && (p->prio >= wakeup_prio || p->prio >= current->prio)))
> > return;
> >  
> 
> Reviewed-by: Steven Rostedt (Google) 

Thanks!

--
Qais Yousef

> 
> 
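
(The priority vs policy split documented in the comment blocks above is easy
to see in a userspace toy model. Purely illustrative: struct toy_task and the
toy_* helpers are made up and only mimic the shape of the kernel predicates.)

#include <stdbool.h>
#include <stdio.h>

#define MAX_RT_PRIO 100

enum toy_policy { TOY_OTHER, TOY_FIFO };

struct toy_task {
	enum toy_policy policy;	/* what the task asked for */
	int prio;		/* effective prio; PI can push it below 100 */
};

/* mirrors realtime_task(): looks at the (possibly boosted) priority */
static bool toy_realtime_task(const struct toy_task *p)
{
	return p->prio < MAX_RT_PRIO;
}

/* mirrors realtime_task_policy(): looks only at the policy */
static bool toy_realtime_task_policy(const struct toy_task *p)
{
	return p->policy == TOY_FIFO;
}

int main(void)
{
	struct toy_task p = { .policy = TOY_OTHER, .prio = 120 }; /* nice 0 */

	p.prio = 50;	/* an RT waiter blocks on a rtmutex we hold: PI boost */

	printf("boosted: realtime_task()=%d realtime_task_policy()=%d\n",
	       toy_realtime_task(&p), toy_realtime_task_policy(&p)); /* 1 0 */
	return 0;
}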



[PATCH v3 0/3] Clean up usage of rt_task()

2024-05-27 Thread Qais Yousef
Make rt_task() return true only for the RT class, and add a new
realtime_task() that returns true for RT and DL classes, to avoid some
confusion the old API can cause.

No functional changes intended in patch 1. Patch 2 changes hrtimer users as
suggested by Sebastian. Patch 3 cleans up the return type as suggested by
Steve.

Changes since v2:

* Fix one user that should use realtime_task() but remained using
  rt_task() (Sebastian)
* New patch to convert all hrtimer users to use realtime_task_policy()
  (Sebastian)
* Add a new patch to convert return type to bool (Steve)
* Rebase on tip/sched/core and handle a conflict with code shuffle to
  syscalls.c
* Add Reviewed-by Steve

Changes since v1:

* Use realtime_task_policy() instead task_has_realtime_policy() (Peter)
* Improve commit message readability regarding replacing some rt_task()
  users.

v1 discussion: 
https://lore.kernel.org/lkml/20240514234112.792989-1-qyou...@layalina.io/
v2 discussion: 
https://lore.kernel.org/lkml/20240515220536.823145-1-qyou...@layalina.io/

Qais Yousef (3):
  sched/rt: Clean up usage of rt_task()
  hrtimer: Convert realtime_task() to realtime_task_policy()
  sched/rt, dl: Convert functions to return bool

 fs/bcachefs/six.c |  2 +-
 fs/select.c   |  2 +-
 include/linux/ioprio.h|  2 +-
 include/linux/sched/deadline.h| 10 ++
 include/linux/sched/prio.h|  1 +
 include/linux/sched/rt.h  | 31 ---
 kernel/locking/rtmutex.c  |  4 ++--
 kernel/locking/rwsem.c|  4 ++--
 kernel/locking/ww_mutex.h |  2 +-
 kernel/sched/core.c   |  4 ++--
 kernel/sched/syscalls.c   |  2 +-
 kernel/time/hrtimer.c |  6 +++---
 kernel/trace/trace_sched_wakeup.c |  2 +-
 mm/page-writeback.c   |  4 ++--
 mm/page_alloc.c   |  2 +-
 15 files changed, 53 insertions(+), 25 deletions(-)

-- 
2.34.1




[PATCH v3 1/3] sched/rt: Clean up usage of rt_task()

2024-05-27 Thread Qais Yousef
rt_task() checks if a task has RT priority. But depending on your
dictionary, this could mean it belongs to the RT class, or is
a 'realtime' task, which includes the RT and DL classes.

Since this has already caused some confusion in discussion [1],
a cleanup seemed due.

I define the usage of rt_task() to be tasks that belong to the RT
class. Make sure that it returns true only for the RT class, audit the
users, and replace the ones that required the old behavior with the new
realtime_task(), which returns true for RT and DL classes. Introduce
a similar realtime_prio() to create a distinction analogous to
rt_prio(), and update the users that required the old behavior to use
the new function.

Move MAX_DL_PRIO to prio.h so it can be used in the new definitions.

Document the functions to make the difference between them more
obvious. PI boosting is a factor that must be taken into account when
choosing which function to use.

Rename task_is_realtime() to realtime_task_policy() as the old name
would be confusing alongside the new realtime_task().

No functional changes were intended.

[1] 
https://lore.kernel.org/lkml/20240506100509.gl40...@noisy.programming.kicks-ass.net/

Reviewed-by: Phil Auld 
Reviewed-by: Steven Rostedt (Google) 
Signed-off-by: Qais Yousef 
---
 fs/bcachefs/six.c |  2 +-
 fs/select.c   |  2 +-
 include/linux/ioprio.h|  2 +-
 include/linux/sched/deadline.h|  6 --
 include/linux/sched/prio.h|  1 +
 include/linux/sched/rt.h  | 27 ++-
 kernel/locking/rtmutex.c  |  4 ++--
 kernel/locking/rwsem.c|  4 ++--
 kernel/locking/ww_mutex.h |  2 +-
 kernel/sched/core.c   |  4 ++--
 kernel/sched/syscalls.c   |  2 +-
 kernel/time/hrtimer.c |  6 +++---
 kernel/trace/trace_sched_wakeup.c |  2 +-
 mm/page-writeback.c   |  4 ++--
 mm/page_alloc.c   |  2 +-
 15 files changed, 49 insertions(+), 21 deletions(-)

diff --git a/fs/bcachefs/six.c b/fs/bcachefs/six.c
index 3a494c5d1247..b30870bf7e4a 100644
--- a/fs/bcachefs/six.c
+++ b/fs/bcachefs/six.c
@@ -335,7 +335,7 @@ static inline bool six_owner_running(struct six_lock *lock)
 */
rcu_read_lock();
struct task_struct *owner = READ_ONCE(lock->owner);
-   bool ret = owner ? owner_on_cpu(owner) : !rt_task(current);
+   bool ret = owner ? owner_on_cpu(owner) : !realtime_task(current);
rcu_read_unlock();
 
return ret;
diff --git a/fs/select.c b/fs/select.c
index 9515c3fa1a03..8d5c1419416c 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -82,7 +82,7 @@ u64 select_estimate_accuracy(struct timespec64 *tv)
 * Realtime tasks get a slack of 0 for obvious reasons.
 */
 
-   if (rt_task(current))
+   if (realtime_task(current))
return 0;
 
ktime_get_ts64(&now);
diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
index db1249cd9692..75859b78d540 100644
--- a/include/linux/ioprio.h
+++ b/include/linux/ioprio.h
@@ -40,7 +40,7 @@ static inline int task_nice_ioclass(struct task_struct *task)
 {
if (task->policy == SCHED_IDLE)
return IOPRIO_CLASS_IDLE;
-   else if (task_is_realtime(task))
+   else if (realtime_task_policy(task))
return IOPRIO_CLASS_RT;
else
return IOPRIO_CLASS_BE;
diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
index df3aca89d4f5..5cb88b748ad6 100644
--- a/include/linux/sched/deadline.h
+++ b/include/linux/sched/deadline.h
@@ -10,8 +10,6 @@
 
 #include 
 
-#define MAX_DL_PRIO   0
-
 static inline int dl_prio(int prio)
 {
if (unlikely(prio < MAX_DL_PRIO))
@@ -19,6 +17,10 @@ static inline int dl_prio(int prio)
return 0;
 }
 
+/*
+ * Returns true if a task has a priority that belongs to DL class. PI-boosted
+ * tasks will return true. Use dl_policy() to ignore PI-boosted tasks.
+ */
 static inline int dl_task(struct task_struct *p)
 {
return dl_prio(p->prio);
diff --git a/include/linux/sched/prio.h b/include/linux/sched/prio.h
index ab83d85e1183..6ab43b4f72f9 100644
--- a/include/linux/sched/prio.h
+++ b/include/linux/sched/prio.h
@@ -14,6 +14,7 @@
  */
 
 #define MAX_RT_PRIO   100
+#define MAX_DL_PRIO   0
 
 #define MAX_PRIO   (MAX_RT_PRIO + NICE_WIDTH)
 #define DEFAULT_PRIO   (MAX_RT_PRIO + NICE_WIDTH / 2)
diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
index b2b9e6eb9683..a055dd68a77c 100644
--- a/include/linux/sched/rt.h
+++ b/include/linux/sched/rt.h
@@ -7,18 +7,43 @@
 struct task_struct;
 
 static inline int rt_prio(int prio)
+{
+   if (unlikely(prio < MAX_RT_PRIO && prio >= MAX_DL_PRIO))
+   return 1;
+   return 0;
+}
+
+static inline int realtime_prio(int prio)
 {
if (unlikely(prio < MAX_RT_PRIO))
return 1;
re

[PATCH v3 2/3] hrtimer: Convert realtime_task() to realtime_task_policy()

2024-05-27 Thread Qais Yousef
As Sebastian explained in [1], we need only look at the policy to decide
whether to remove the slack, because PI-boosted tasks should not
sleep.

[1] https://lore.kernel.org/lkml/20240521110035.kriwl...@linutronix.de/

Suggested-by: Sebastian Andrzej Siewior 
Signed-off-by: Qais Yousef 
---
 kernel/time/hrtimer.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 89d4da59059d..36086ab46d08 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -2073,7 +2073,7 @@ long hrtimer_nanosleep(ktime_t rqtp, const enum hrtimer_mode mode,
u64 slack;
 
slack = current->timer_slack_ns;
-   if (realtime_task(current))
+   if (realtime_task_policy(current))
slack = 0;
 
hrtimer_init_sleeper_on_stack(&t, clockid, mode);
@@ -2278,7 +2278,7 @@ schedule_hrtimeout_range_clock(ktime_t *expires, u64 delta,
 * Override any slack passed by the user if under
 * rt contraints.
 */
-   if (realtime_task(current))
+   if (realtime_task_policy(current))
delta = 0;
 
hrtimer_init_sleeper_on_stack(&t, clock_id, mode);
-- 
2.34.1




[PATCH v3 3/3] sched/rt, dl: Convert functions to return bool

2024-05-27 Thread Qais Yousef
{rt, realtime, dl}_{task, prio}() functions' return value is actually
a bool. Convert their return type to reflect that.

Suggested-by: Steven Rostedt (Google) 
Signed-off-by: Qais Yousef 
---
 include/linux/sched/deadline.h | 4 ++--
 include/linux/sched/rt.h   | 8 
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
index 5cb88b748ad6..87d2370dd3db 100644
--- a/include/linux/sched/deadline.h
+++ b/include/linux/sched/deadline.h
@@ -10,7 +10,7 @@
 
 #include 
 
-static inline int dl_prio(int prio)
+static inline bool dl_prio(int prio)
 {
if (unlikely(prio < MAX_DL_PRIO))
return 1;
@@ -21,7 +21,7 @@ static inline int dl_prio(int prio)
  * Returns true if a task has a priority that belongs to DL class. PI-boosted
  * tasks will return true. Use dl_policy() to ignore PI-boosted tasks.
  */
-static inline int dl_task(struct task_struct *p)
+static inline bool dl_task(struct task_struct *p)
 {
return dl_prio(p->prio);
 }
diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
index a055dd68a77c..78da97cdac48 100644
--- a/include/linux/sched/rt.h
+++ b/include/linux/sched/rt.h
@@ -6,14 +6,14 @@
 
 struct task_struct;
 
-static inline int rt_prio(int prio)
+static inline bool rt_prio(int prio)
 {
if (unlikely(prio < MAX_RT_PRIO && prio >= MAX_DL_PRIO))
return 1;
return 0;
 }
 
-static inline int realtime_prio(int prio)
+static inline bool realtime_prio(int prio)
 {
if (unlikely(prio < MAX_RT_PRIO))
return 1;
@@ -24,7 +24,7 @@ static inline int realtime_prio(int prio)
  * Returns true if a task has a priority that belongs to RT class. PI-boosted
  * tasks will return true. Use rt_policy() to ignore PI-boosted tasks.
  */
-static inline int rt_task(struct task_struct *p)
+static inline bool rt_task(struct task_struct *p)
 {
return rt_prio(p->prio);
 }
@@ -34,7 +34,7 @@ static inline int rt_task(struct task_struct *p)
  * PI-boosted tasks will return true. Use realtime_task_policy() to ignore
  * PI-boosted tasks.
  */
-static inline int realtime_task(struct task_struct *p)
+static inline bool realtime_task(struct task_struct *p)
 {
return realtime_prio(p->prio);
 }
-- 
2.34.1




Re: [PATCH v3 3/3] sched/rt, dl: Convert functions to return bool

2024-05-29 Thread Qais Yousef
On 05/29/24 09:34, Sebastian Andrzej Siewior wrote:
> On 2024-05-28 00:45:08 [+0100], Qais Yousef wrote:
> > diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
> > index 5cb88b748ad6..87d2370dd3db 100644
> > --- a/include/linux/sched/deadline.h
> > +++ b/include/linux/sched/deadline.h
> > @@ -10,7 +10,7 @@
> >  
> >  #include 
> >  
> > -static inline int dl_prio(int prio)
> > +static inline bool dl_prio(int prio)
> >  {
> > if (unlikely(prio < MAX_DL_PRIO))
> > return 1;
> 
> if we return a bool we should return true/false.

Right. Fixed.

Thanks!

--
Qais Yousef



Re: [PATCH v2] sched/rt: Clean up usage of rt_task()

2024-05-29 Thread Qais Yousef
On 05/29/24 10:29, Sebastian Andrzej Siewior wrote:
> On 2024-05-27 18:26:50 [+0100], Qais Yousef wrote:
> > > In order to be PI-boosted you need to acquire a lock and the only lock
> > > you can sleep while acquired without generating a warning is a mutex_t
> > > (or equivalent sleeping lock) on PREEMPT_RT. 
> > 
> > Note we care about the behavior for !PREEMPT_RT. PI issues are important 
> > there
> > too. I assume the fact the PREEMPT_RT changes the locks behavior is what 
> > you're
> > referring to here and not applicable to normal case.
> 
> So for !PREEMPT_RT you need a rtmutex for PI. RCU and i2c is using it
> within the kernel and this shouldn't go via the `slack' API.
> 
> The FUTEX API on the other hand is a different story and it might
> matter. So you have one task running SCHED_OTHER and acquiring a lock in
> userspace (pthread_mutex_t, PTHREAD_PRIO_INHERIT). Another task running
> at SCHED_FIFO/RR/DL would also try to acquire that lock, block on it, and
> the SCHED_OTHER task would then inherit its priority.
> This is the point where the former task has a different policy vs
> priority considering PI-boosting. You could argue that the task
> shouldn't sleep or invoke anything possibly sleeping with a timeout > 0
> because it is using an important lock.
> But then it is userland and has the freedom to do whatever it wants you
> know…

Yes..

> 
> So it might be better to forget what I said and keep the current

Okay, I'll drop the patch in the next posting then.

> behaviour. But then it is inconsistent, which matters only in the RT case.
> Phew. Any sched folks care to comment on the policy?

I am not sure I understood you here. Could you rephrase please?
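
For concreteness, the userspace scenario above boils down to a
PTHREAD_PRIO_INHERIT mutex shared between a SCHED_OTHER holder and a FIFO
waiter. A minimal sketch (illustrative; error handling elided, and creating
the FIFO thread needs root or CAP_SYS_NICE):

#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock;

static void *holder(void *arg)
{
	/* SCHED_OTHER holder: while the FIFO waiter below blocks on 'lock',
	 * this thread is PI-boosted (its prio changes, its policy does not) */
	pthread_mutex_lock(&lock);
	usleep(100 * 1000);
	pthread_mutex_unlock(&lock);
	return NULL;
}

static void *waiter(void *arg)
{
	pthread_mutex_lock(&lock);	/* blocks, lending its RT priority */
	pthread_mutex_unlock(&lock);
	return NULL;
}

int main(void)
{
	pthread_mutexattr_t ma;
	pthread_attr_t ta;
	struct sched_param sp = { .sched_priority = 50 };
	pthread_t h, w;

	pthread_mutexattr_init(&ma);
	pthread_mutexattr_setprotocol(&ma, PTHREAD_PRIO_INHERIT);
	pthread_mutex_init(&lock, &ma);

	pthread_create(&h, NULL, holder, NULL);
	usleep(10 * 1000);		/* let the holder take the lock first */

	pthread_attr_init(&ta);
	pthread_attr_setinheritsched(&ta, PTHREAD_EXPLICIT_SCHED);
	pthread_attr_setschedpolicy(&ta, SCHED_FIFO);
	pthread_attr_setschedparam(&ta, &sp);
	pthread_create(&w, &ta, waiter, NULL);

	pthread_join(h, NULL);
	pthread_join(w, NULL);
	return 0;
}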



Re: [PATCH v2] sched/rt: Clean up usage of rt_task()

2024-05-30 Thread Qais Yousef
On 05/29/24 12:55, Sebastian Andrzej Siewior wrote:
> On 2024-05-29 11:34:09 [+0100], Qais Yousef wrote:
> > > behaviour. But then it is inconsistent, which matters only in the RT case.
> > > Phew. Any sched folks care to comment on the policy?
> > 
> > I am not sure I understood you here. Could you rephrase please?
> 
> Right now a SCHED_OTHER task boosted to a realtime priority gets
> slack=0. In the !RT scenario everything is fine.
> For RT the slack=0 also happens, but the init of the timer looks at the
> policy instead of the possibly boosted priority and uses a different
> clock attribute. This can lead to a delayed wake up (so avoiding the
> slack does not solve the problem).
> 
> This is not consistent because IMHO the clock setup & slack should be
> handled equally. So I am asking the sched folks for a policy and I am
> leaning towards looking at task-policy in this case instead of prio
> because you shouldn't do anything that can delay.

Can't we do that based on the is_soft/is_hard flags in the hrtimer struct when
we apply the slack in hrtimer_set_expires_range_ns() instead?

(not compile tested even)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index aa1e65ccb615..e001f20bbea9 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -102,12 +102,16 @@ static inline void hrtimer_set_expires(struct hrtimer *timer, ktime_t time)
 
 static inline void hrtimer_set_expires_range(struct hrtimer *timer, ktime_t time, ktime_t delta)
 {
+   if (timer->is_soft || timer->is_hard)
+   delta = 0;
timer->_softexpires = time;
timer->node.expires = ktime_add_safe(time, delta);
 }
 
 static inline void hrtimer_set_expires_range_ns(struct hrtimer *timer, ktime_t time, u64 delta)
 {
+   if (timer->is_soft || timer->is_hard)
+   delta = 0;
timer->_softexpires = time;
timer->node.expires = ktime_add_safe(time, ns_to_ktime(delta));
 }



[PATCH v4 0/2] Clean up usage of rt_task()

2024-06-01 Thread Qais Yousef
Make rt_task() return true only for the RT class, and add a new
realtime_task() that returns true for RT and DL classes, to avoid some
confusion the old API can cause.

No functional changes intended in patch 1. Patch 2 cleans up the return type as
suggested by Steve.

Changes since v3:

* Make sure the 'new' bool functions return true/false instead of 1/0.
* Drop patch 2 about hrtimer usage of realtime_task() as the ongoing
  discussion on v1 indicates its scope is outside this simple cleanup.

Changes since v2:

* Fix one user that should use realtime_task() but remained using
  rt_task() (Sebastian)
* New patch to convert all hrtimer users to use realtime_task_policy()
  (Sebastian)
* Add a new patch to convert return type to bool (Steve)
* Rebase on tip/sched/core and handle a conflict with code shuffle to
  syscalls.c
* Add Reviewed-by Steve

Changes since v1:

* Use realtime_task_policy() instead task_has_realtime_policy() (Peter)
* Improve commit message readability regarding replacing some rt_task()
  users.

v1 discussion: 
https://lore.kernel.org/lkml/20240514234112.792989-1-qyou...@layalina.io/
v2 discussion: 
https://lore.kernel.org/lkml/20240515220536.823145-1-qyou...@layalina.io/
v3 discussion: 
https://lore.kernel.org/lkml/20240527234508.1062360-1-qyou...@layalina.io/

Qais Yousef (2):
  sched/rt: Clean up usage of rt_task()
  sched/rt, dl: Convert functions to return bool

 fs/bcachefs/six.c |  2 +-
 fs/select.c   |  2 +-
 include/linux/ioprio.h|  2 +-
 include/linux/sched/deadline.h| 14 +++--
 include/linux/sched/prio.h|  1 +
 include/linux/sched/rt.h  | 35 ++-
 kernel/locking/rtmutex.c  |  4 ++--
 kernel/locking/rwsem.c|  4 ++--
 kernel/locking/ww_mutex.h |  2 +-
 kernel/sched/core.c   |  4 ++--
 kernel/sched/syscalls.c   |  2 +-
 kernel/time/hrtimer.c |  6 +++---
 kernel/trace/trace_sched_wakeup.c |  2 +-
 mm/page-writeback.c   |  4 ++--
 mm/page_alloc.c   |  2 +-
 15 files changed, 57 insertions(+), 29 deletions(-)

-- 
2.34.1




[PATCH v4 1/2] sched/rt: Clean up usage of rt_task()

2024-06-01 Thread Qais Yousef
rt_task() checks if a task has RT priority. But depending on your
dictionary, this could mean it belongs to the RT class, or is
a 'realtime' task, which includes the RT and DL classes.

Since this has already caused some confusion in discussion [1],
a cleanup seemed due.

I define the usage of rt_task() to be tasks that belong to the RT
class. Make sure that it returns true only for the RT class, audit the
users, and replace the ones that required the old behavior with the new
realtime_task(), which returns true for RT and DL classes. Introduce
a similar realtime_prio() to create a distinction analogous to
rt_prio(), and update the users that required the old behavior to use
the new function.

Move MAX_DL_PRIO to prio.h so it can be used in the new definitions.

Document the functions to make the difference between them more
obvious. PI boosting is a factor that must be taken into account when
choosing which function to use.

Rename task_is_realtime() to realtime_task_policy() as the old name
would be confusing alongside the new realtime_task().

No functional changes were intended.

[1] 
https://lore.kernel.org/lkml/20240506100509.gl40...@noisy.programming.kicks-ass.net/

Reviewed-by: Phil Auld 
Reviewed-by: Steven Rostedt (Google) 
Signed-off-by: Qais Yousef 
---
 fs/bcachefs/six.c |  2 +-
 fs/select.c   |  2 +-
 include/linux/ioprio.h|  2 +-
 include/linux/sched/deadline.h|  6 --
 include/linux/sched/prio.h|  1 +
 include/linux/sched/rt.h  | 27 ++-
 kernel/locking/rtmutex.c  |  4 ++--
 kernel/locking/rwsem.c|  4 ++--
 kernel/locking/ww_mutex.h |  2 +-
 kernel/sched/core.c   |  4 ++--
 kernel/sched/syscalls.c   |  2 +-
 kernel/time/hrtimer.c |  6 +++---
 kernel/trace/trace_sched_wakeup.c |  2 +-
 mm/page-writeback.c   |  4 ++--
 mm/page_alloc.c   |  2 +-
 15 files changed, 49 insertions(+), 21 deletions(-)

diff --git a/fs/bcachefs/six.c b/fs/bcachefs/six.c
index 3a494c5d1247..b30870bf7e4a 100644
--- a/fs/bcachefs/six.c
+++ b/fs/bcachefs/six.c
@@ -335,7 +335,7 @@ static inline bool six_owner_running(struct six_lock *lock)
 */
rcu_read_lock();
struct task_struct *owner = READ_ONCE(lock->owner);
-   bool ret = owner ? owner_on_cpu(owner) : !rt_task(current);
+   bool ret = owner ? owner_on_cpu(owner) : !realtime_task(current);
rcu_read_unlock();
 
return ret;
diff --git a/fs/select.c b/fs/select.c
index 9515c3fa1a03..8d5c1419416c 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -82,7 +82,7 @@ u64 select_estimate_accuracy(struct timespec64 *tv)
 * Realtime tasks get a slack of 0 for obvious reasons.
 */
 
-   if (rt_task(current))
+   if (realtime_task(current))
return 0;
 
ktime_get_ts64(&now);
diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
index db1249cd9692..75859b78d540 100644
--- a/include/linux/ioprio.h
+++ b/include/linux/ioprio.h
@@ -40,7 +40,7 @@ static inline int task_nice_ioclass(struct task_struct *task)
 {
if (task->policy == SCHED_IDLE)
return IOPRIO_CLASS_IDLE;
-   else if (task_is_realtime(task))
+   else if (realtime_task_policy(task))
return IOPRIO_CLASS_RT;
else
return IOPRIO_CLASS_BE;
diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
index df3aca89d4f5..5cb88b748ad6 100644
--- a/include/linux/sched/deadline.h
+++ b/include/linux/sched/deadline.h
@@ -10,8 +10,6 @@
 
 #include 
 
-#define MAX_DL_PRIO   0
-
 static inline int dl_prio(int prio)
 {
if (unlikely(prio < MAX_DL_PRIO))
@@ -19,6 +17,10 @@ static inline int dl_prio(int prio)
return 0;
 }
 
+/*
+ * Returns true if a task has a priority that belongs to DL class. PI-boosted
+ * tasks will return true. Use dl_policy() to ignore PI-boosted tasks.
+ */
 static inline int dl_task(struct task_struct *p)
 {
return dl_prio(p->prio);
diff --git a/include/linux/sched/prio.h b/include/linux/sched/prio.h
index ab83d85e1183..6ab43b4f72f9 100644
--- a/include/linux/sched/prio.h
+++ b/include/linux/sched/prio.h
@@ -14,6 +14,7 @@
  */
 
 #define MAX_RT_PRIO   100
+#define MAX_DL_PRIO   0
 
 #define MAX_PRIO   (MAX_RT_PRIO + NICE_WIDTH)
 #define DEFAULT_PRIO   (MAX_RT_PRIO + NICE_WIDTH / 2)
diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
index b2b9e6eb9683..a055dd68a77c 100644
--- a/include/linux/sched/rt.h
+++ b/include/linux/sched/rt.h
@@ -7,18 +7,43 @@
 struct task_struct;
 
 static inline int rt_prio(int prio)
+{
+   if (unlikely(prio < MAX_RT_PRIO && prio >= MAX_DL_PRIO))
+   return 1;
+   return 0;
+}
+
+static inline int realtime_prio(int prio)
 {
if (unlikely(prio < MAX_RT_PRIO))
return 1;
re

[PATCH v4 2/2] sched/rt, dl: Convert functions to return bool

2024-06-01 Thread Qais Yousef
{rt, realtime, dl}_{task, prio}() functions' return value is actually
a bool. Convert their return type to reflect that.

Suggested-by: Steven Rostedt (Google) 
Signed-off-by: Qais Yousef 
---
 include/linux/sched/deadline.h |  8 
 include/linux/sched/rt.h   | 16 
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
index 5cb88b748ad6..f2053f46f1d5 100644
--- a/include/linux/sched/deadline.h
+++ b/include/linux/sched/deadline.h
@@ -10,18 +10,18 @@
 
 #include 
 
-static inline int dl_prio(int prio)
+static inline bool dl_prio(int prio)
 {
if (unlikely(prio < MAX_DL_PRIO))
-   return 1;
-   return 0;
+   return true;
+   return false;
 }
 
 /*
  * Returns true if a task has a priority that belongs to DL class. PI-boosted
  * tasks will return true. Use dl_policy() to ignore PI-boosted tasks.
  */
-static inline int dl_task(struct task_struct *p)
+static inline bool dl_task(struct task_struct *p)
 {
return dl_prio(p->prio);
 }
diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
index a055dd68a77c..efbdd2e57765 100644
--- a/include/linux/sched/rt.h
+++ b/include/linux/sched/rt.h
@@ -6,25 +6,25 @@
 
 struct task_struct;
 
-static inline int rt_prio(int prio)
+static inline bool rt_prio(int prio)
 {
if (unlikely(prio < MAX_RT_PRIO && prio >= MAX_DL_PRIO))
-   return 1;
-   return 0;
+   return true;
+   return false;
 }
 
-static inline int realtime_prio(int prio)
+static inline bool realtime_prio(int prio)
 {
if (unlikely(prio < MAX_RT_PRIO))
-   return 1;
-   return 0;
+   return true;
+   return false;
 }
 
 /*
  * Returns true if a task has a priority that belongs to RT class. PI-boosted
  * tasks will return true. Use rt_policy() to ignore PI-boosted tasks.
  */
-static inline int rt_task(struct task_struct *p)
+static inline bool rt_task(struct task_struct *p)
 {
return rt_prio(p->prio);
 }
@@ -34,7 +34,7 @@ static inline int rt_task(struct task_struct *p)
  * PI-boosted tasks will return true. Use realtime_task_policy() to ignore
  * PI-boosted tasks.
  */
-static inline int realtime_task(struct task_struct *p)
+static inline bool realtime_task(struct task_struct *p)
 {
return realtime_prio(p->prio);
 }
-- 
2.34.1




Re: [PATCH v2] sched/rt: Clean up usage of rt_task()

2024-06-01 Thread Qais Yousef
On 05/31/24 08:30, Sebastian Andrzej Siewior wrote:
> On 2024-05-30 12:10:44 [+0100], Qais Yousef wrote:
> > > This is not consistent because IMHO the clock setup & slack should be
> > > handled equally. So I am asking the sched folks for a policy and I am
> > > leaning towards looking at task-policy in this case instead of prio
> > > because you shouldn't do anything that can delay.
> > 
> > Can't we do that based on the is_soft/is_hard flags in the hrtimer struct
> > when we apply the slack in hrtimer_set_expires_range_ns() instead?
> 
> We need to decide on a policy first.
> You don't want to add overhead on each invocation plus some in-kernel
> ask for the delta. ->is_soft is not a good criterion.

The suggestion was not to check policy or priority but to dictate the behavior
based on hrtimer properties set at init, which agrees with your thinking.

By the way, I think checking the policy at init to enforce a behavior is
a borderline hack; it would be better to get an explicit request from the
task and let the hrtimer code comply without checking for policies anywhere.
Whether this property should be inherited or not is a good question, and
I think that is the policy you're after here. Proxy Execution would probably
handle all these inheritance problems transparently, so we need not think
much about it if we agree that proxy execution is the way to deal with them.
There are always exceptions though, so it's worth a think.

I'll drop this patch as this discussion has diverged into something else now.
I'll leave this here until others get a chance to comment with their views too.
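
As an aside on the slack half of this: userspace can already make the request
explicit per task, independent of policy, via prctl(2). A minimal sketch (note
PR_SET_TIMERSLACK treats 0 as "reset to the default", so the smallest
effective slack is 1ns):

#include <stdio.h>
#include <sys/prctl.h>
#include <time.h>

int main(void)
{
	/* Request the minimum timer slack (1ns) for this task, whatever its
	 * scheduling policy. Passing 0 would reset to the default instead. */
	if (prctl(PR_SET_TIMERSLACK, 1UL, 0, 0, 0))
		perror("prctl(PR_SET_TIMERSLACK)");

	printf("timer slack now: %d ns\n",
	       (int)prctl(PR_GET_TIMERSLACK, 0, 0, 0, 0));

	/* Subsequent sleeps see (almost) no slack-based coalescing delay. */
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 500 * 1000 };
	nanosleep(&ts, NULL);
	return 0;
}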


Cheers

--
Qais Yousef



Re: [PATCH v4 2/2] sched/rt, dl: Convert functions to return bool

2024-06-04 Thread Qais Yousef
On 06/03/24 08:33, Metin Kaya wrote:
> On 01/06/2024 10:33 pm, Qais Yousef wrote:
> > {rt, realtime, dl}_{task, prio}() functions' return value is actually
> > a bool. Convert their return type to reflect that.
> > 
> > Suggested-by: Steven Rostedt (Google) 
> > Signed-off-by: Qais Yousef 
> > ---
> >   include/linux/sched/deadline.h |  8 
> >   include/linux/sched/rt.h   | 16 
> >   2 files changed, 12 insertions(+), 12 deletions(-)
> > 
> > diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
> > index 5cb88b748ad6..f2053f46f1d5 100644
> > --- a/include/linux/sched/deadline.h
> > +++ b/include/linux/sched/deadline.h
> > @@ -10,18 +10,18 @@
> >   #include 
> > -static inline int dl_prio(int prio)
> > +static inline bool dl_prio(int prio)
> >   {
> > if (unlikely(prio < MAX_DL_PRIO))
> > -   return 1;
> > -   return 0;
> > +   return true;
> > +   return false;
> 
> Nit: `return unlikely(prio < MAX_DL_PRIO)` would be simpler.
> The same can be applied to rt_prio() and realtime_prio(). This would make
> {dl, rt, realtime}_task() single-liners. Maybe further simplification can be
> done.

Fair. Thanks.

> 
> >   }
> >   /*
> >* Returns true if a task has a priority that belongs to DL class. PI-boosted
> >* tasks will return true. Use dl_policy() to ignore PI-boosted tasks.
> >*/
> > -static inline int dl_task(struct task_struct *p)
> > +static inline bool dl_task(struct task_struct *p)
> >   {
> > return dl_prio(p->prio);
> >   }
> > diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
> > index a055dd68a77c..efbdd2e57765 100644
> > --- a/include/linux/sched/rt.h
> > +++ b/include/linux/sched/rt.h
> > @@ -6,25 +6,25 @@
> >   struct task_struct;
> > -static inline int rt_prio(int prio)
> > +static inline bool rt_prio(int prio)
> >   {
> > if (unlikely(prio < MAX_RT_PRIO && prio >= MAX_DL_PRIO))
> > -   return 1;
> > -   return 0;
> > +   return true;
> > +   return false;
> >   }
> > -static inline int realtime_prio(int prio)
> > +static inline bool realtime_prio(int prio)
> >   {
> > if (unlikely(prio < MAX_RT_PRIO))
> > -   return 1;
> > -   return 0;
> > +   return true;
> > +   return false;
> >   }
> >   /*
> >* Returns true if a task has a priority that belongs to RT class. PI-boosted
> >* tasks will return true. Use rt_policy() to ignore PI-boosted tasks.
> >*/
> > -static inline int rt_task(struct task_struct *p)
> > +static inline bool rt_task(struct task_struct *p)
> >   {
> > return rt_prio(p->prio);
> >   }
> > @@ -34,7 +34,7 @@ static inline int rt_task(struct task_struct *p)
> >* PI-boosted tasks will return true. Use realtime_task_policy() to ignore
> >* PI-boosted tasks.
> >*/
> > -static inline int realtime_task(struct task_struct *p)
> > +static inline bool realtime_task(struct task_struct *p)
> >   {
> > return realtime_prio(p->prio);
> >   }
> 



[PATCH v5 0/2] Clean up usage of rt_task()

2024-06-04 Thread Qais Yousef
Make rt_task() return true only for the RT class, and add a new
realtime_task() that returns true for RT and DL classes, to avoid some
confusion the old API can cause.

No functional changes intended in patch 1. Patch 2 cleans up the return type as
suggested by Steve.

Changes since v4:

* Simplify return of rt/realtime_prio() as the explicit true/false was
  not necessary (Metin).

Changes since v3:

* Make sure the 'new' bool functions return true/false instead of 1/0.
* Drop patch 2 about hrtimer usage of realtime_task() as the ongoing
  discussion on v1 indicates its scope is outside this simple cleanup.

Changes since v2:

* Fix one user that should use realtime_task() but remained using
  rt_task() (Sebastian)
* New patch to convert all hrtimer users to use realtime_task_policy()
  (Sebastian)
* Add a new patch to convert return type to bool (Steve)
* Rebase on tip/sched/core and handle a conflict with code shuffle to
  syscalls.c
* Add Reviewed-by Steve

Changes since v1:

* Use realtime_task_policy() instead task_has_realtime_policy() (Peter)
* Improve commit message readability regarding replacing some rt_task()
  users.

v1 discussion: 
https://lore.kernel.org/lkml/20240514234112.792989-1-qyou...@layalina.io/
v2 discussion: 
https://lore.kernel.org/lkml/20240515220536.823145-1-qyou...@layalina.io/
v3 discussion: 
https://lore.kernel.org/lkml/20240527234508.1062360-1-qyou...@layalina.io/
v4 discussion: 
https://lore.kernel.org/lkml/20240601213309.1262206-1-qyou...@layalina.io/

Qais Yousef (2):
  sched/rt: Clean up usage of rt_task()
  sched/rt, dl: Convert functions to return bool

 fs/bcachefs/six.c |  2 +-
 fs/select.c   |  2 +-
 include/linux/ioprio.h|  2 +-
 include/linux/sched/deadline.h| 14 ++---
 include/linux/sched/prio.h|  1 +
 include/linux/sched/rt.h  | 33 +--
 kernel/locking/rtmutex.c  |  4 ++--
 kernel/locking/rwsem.c|  4 ++--
 kernel/locking/ww_mutex.h |  2 +-
 kernel/sched/core.c   |  4 ++--
 kernel/sched/syscalls.c   |  2 +-
 kernel/time/hrtimer.c |  6 +++---
 kernel/trace/trace_sched_wakeup.c |  2 +-
 mm/page-writeback.c   |  4 ++--
 mm/page_alloc.c   |  2 +-
 15 files changed, 53 insertions(+), 31 deletions(-)

-- 
2.34.1




[PATCH v5 1/2] sched/rt: Clean up usage of rt_task()

2024-06-04 Thread Qais Yousef
rt_task() checks if a task has RT priority. But depending on your
dictionary, this could mean it belongs to the RT class, or is
a 'realtime' task, which includes the RT and DL classes.

Since this has already caused some confusion in discussion [1],
a cleanup seemed due.

I define the usage of rt_task() to be tasks that belong to the RT
class. Make sure that it returns true only for the RT class, audit the
users, and replace the ones that required the old behavior with the new
realtime_task(), which returns true for RT and DL classes. Introduce
a similar realtime_prio() to create a distinction analogous to
rt_prio(), and update the users that required the old behavior to use
the new function.

Move MAX_DL_PRIO to prio.h so it can be used in the new definitions.

Document the functions to make the difference between them more
obvious. PI boosting is a factor that must be taken into account when
choosing which function to use.

Rename task_is_realtime() to realtime_task_policy() as the old name
would be confusing alongside the new realtime_task().

No functional changes were intended.

[1] 
https://lore.kernel.org/lkml/20240506100509.gl40...@noisy.programming.kicks-ass.net/

Reviewed-by: Phil Auld 
Reviewed-by: Steven Rostedt (Google) 
Signed-off-by: Qais Yousef 
---
 fs/bcachefs/six.c |  2 +-
 fs/select.c   |  2 +-
 include/linux/ioprio.h|  2 +-
 include/linux/sched/deadline.h|  6 --
 include/linux/sched/prio.h|  1 +
 include/linux/sched/rt.h  | 27 ++-
 kernel/locking/rtmutex.c  |  4 ++--
 kernel/locking/rwsem.c|  4 ++--
 kernel/locking/ww_mutex.h |  2 +-
 kernel/sched/core.c   |  4 ++--
 kernel/sched/syscalls.c   |  2 +-
 kernel/time/hrtimer.c |  6 +++---
 kernel/trace/trace_sched_wakeup.c |  2 +-
 mm/page-writeback.c   |  4 ++--
 mm/page_alloc.c   |  2 +-
 15 files changed, 49 insertions(+), 21 deletions(-)

diff --git a/fs/bcachefs/six.c b/fs/bcachefs/six.c
index 3a494c5d1247..b30870bf7e4a 100644
--- a/fs/bcachefs/six.c
+++ b/fs/bcachefs/six.c
@@ -335,7 +335,7 @@ static inline bool six_owner_running(struct six_lock *lock)
 */
rcu_read_lock();
struct task_struct *owner = READ_ONCE(lock->owner);
-   bool ret = owner ? owner_on_cpu(owner) : !rt_task(current);
+   bool ret = owner ? owner_on_cpu(owner) : !realtime_task(current);
rcu_read_unlock();
 
return ret;
diff --git a/fs/select.c b/fs/select.c
index 9515c3fa1a03..8d5c1419416c 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -82,7 +82,7 @@ u64 select_estimate_accuracy(struct timespec64 *tv)
 * Realtime tasks get a slack of 0 for obvious reasons.
 */
 
-   if (rt_task(current))
+   if (realtime_task(current))
return 0;
 
ktime_get_ts64(&now);
diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
index db1249cd9692..75859b78d540 100644
--- a/include/linux/ioprio.h
+++ b/include/linux/ioprio.h
@@ -40,7 +40,7 @@ static inline int task_nice_ioclass(struct task_struct *task)
 {
if (task->policy == SCHED_IDLE)
return IOPRIO_CLASS_IDLE;
-   else if (task_is_realtime(task))
+   else if (realtime_task_policy(task))
return IOPRIO_CLASS_RT;
else
return IOPRIO_CLASS_BE;
diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
index df3aca89d4f5..5cb88b748ad6 100644
--- a/include/linux/sched/deadline.h
+++ b/include/linux/sched/deadline.h
@@ -10,8 +10,6 @@
 
 #include 
 
-#define MAX_DL_PRIO   0
-
 static inline int dl_prio(int prio)
 {
if (unlikely(prio < MAX_DL_PRIO))
@@ -19,6 +17,10 @@ static inline int dl_prio(int prio)
return 0;
 }
 
+/*
+ * Returns true if a task has a priority that belongs to DL class. PI-boosted
+ * tasks will return true. Use dl_policy() to ignore PI-boosted tasks.
+ */
 static inline int dl_task(struct task_struct *p)
 {
return dl_prio(p->prio);
diff --git a/include/linux/sched/prio.h b/include/linux/sched/prio.h
index ab83d85e1183..6ab43b4f72f9 100644
--- a/include/linux/sched/prio.h
+++ b/include/linux/sched/prio.h
@@ -14,6 +14,7 @@
  */
 
 #define MAX_RT_PRIO   100
+#define MAX_DL_PRIO   0
 
 #define MAX_PRIO   (MAX_RT_PRIO + NICE_WIDTH)
 #define DEFAULT_PRIO   (MAX_RT_PRIO + NICE_WIDTH / 2)
diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
index b2b9e6eb9683..a055dd68a77c 100644
--- a/include/linux/sched/rt.h
+++ b/include/linux/sched/rt.h
@@ -7,18 +7,43 @@
 struct task_struct;
 
 static inline int rt_prio(int prio)
+{
+   if (unlikely(prio < MAX_RT_PRIO && prio >= MAX_DL_PRIO))
+   return 1;
+   return 0;
+}
+
+static inline int realtime_prio(int prio)
 {
if (unlikely(prio < MAX_RT_PRIO))
return 1;
re

[PATCH v5 2/2] sched/rt, dl: Convert functions to return bool

2024-06-04 Thread Qais Yousef
{rt, realtime, dl}_{task, prio}() functions' return value is actually
a bool. Convert their return type to reflect that.

Suggested-by: Steven Rostedt (Google) 
Signed-off-by: Qais Yousef 
---
 include/linux/sched/deadline.h |  8 +++-
 include/linux/sched/rt.h   | 16 ++--
 2 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
index 5cb88b748ad6..3a912ab42bb5 100644
--- a/include/linux/sched/deadline.h
+++ b/include/linux/sched/deadline.h
@@ -10,18 +10,16 @@
 
 #include 
 
-static inline int dl_prio(int prio)
+static inline bool dl_prio(int prio)
 {
-   if (unlikely(prio < MAX_DL_PRIO))
-   return 1;
-   return 0;
+   return unlikely(prio < MAX_DL_PRIO);
 }
 
 /*
  * Returns true if a task has a priority that belongs to DL class. PI-boosted
  * tasks will return true. Use dl_policy() to ignore PI-boosted tasks.
  */
-static inline int dl_task(struct task_struct *p)
+static inline bool dl_task(struct task_struct *p)
 {
return dl_prio(p->prio);
 }
diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
index a055dd68a77c..91ef1ef2019f 100644
--- a/include/linux/sched/rt.h
+++ b/include/linux/sched/rt.h
@@ -6,25 +6,21 @@
 
 struct task_struct;
 
-static inline int rt_prio(int prio)
+static inline bool rt_prio(int prio)
 {
-   if (unlikely(prio < MAX_RT_PRIO && prio >= MAX_DL_PRIO))
-   return 1;
-   return 0;
+   return unlikely(prio < MAX_RT_PRIO && prio >= MAX_DL_PRIO);
 }
 
-static inline int realtime_prio(int prio)
+static inline bool realtime_prio(int prio)
 {
-   if (unlikely(prio < MAX_RT_PRIO))
-   return 1;
-   return 0;
+   return unlikely(prio < MAX_RT_PRIO);
 }
 
 /*
  * Returns true if a task has a priority that belongs to RT class. PI-boosted
  * tasks will return true. Use rt_policy() to ignore PI-boosted tasks.
  */
-static inline int rt_task(struct task_struct *p)
+static inline bool rt_task(struct task_struct *p)
 {
return rt_prio(p->prio);
 }
@@ -34,7 +30,7 @@ static inline int rt_task(struct task_struct *p)
  * PI-boosted tasks will return true. Use realtime_task_policy() to ignore
  * PI-boosted tasks.
  */
-static inline int realtime_task(struct task_struct *p)
+static inline bool realtime_task(struct task_struct *p)
 {
return realtime_prio(p->prio);
 }
-- 
2.34.1




Re: [PATCH v5 1/2] sched/rt: Clean up usage of rt_task()

2024-06-05 Thread Qais Yousef
On 06/05/24 11:32, Sebastian Andrzej Siewior wrote:
> On 2024-06-04 17:57:46 [+0200], Daniel Bristot de Oliveira wrote:
> > On 6/4/24 16:42, Qais Yousef wrote:
> > > - (wakeup_rt && !dl_task(p) && !rt_task(p)) ||
> > > + (wakeup_rt && !realtime_task(p)) ||
> > 
> > I do not like bikeshedding, and no hard feelings...

No hard feelings :-)

> > 
> > But rt is a shortened version of realtime, and so it makes it *less*
> > clear that we also have DL here.
> 
> Can SCHED_DL be considered a real-time scheduling class, as opposed
> to SCHED_BATCH for instance? Due to its requirements it qualifies as
> a real-time scheduling class, right?
> And RT (as in real time) already includes SCHED_RR and SCHED_FIFO.

Yeah, I think the usage of realtime to cover both makes sense. I followed your
precedent with task_is_realtime().

Anyway, if people really find this confusing, what would make sense is to split
them and ask users to call rt_task() and dl_task() explicitly without this
wrapper. I personally like it better with the wrapper, but I'm happy to follow
the crowd.



[PATCH v6 1/3] sched/rt: Clean up usage of rt_task()

2024-06-10 Thread Qais Yousef
rt_task() checks if a task has RT priority. But depending on your
dictionary, this could mean it belongs to the RT class, or is
a 'realtime' task, which includes the RT and DL classes.

Since this has already caused some confusion in discussion [1],
a cleanup seemed due.

I define the usage of rt_task() to be tasks that belong to the RT
class. Make sure that it returns true only for the RT class, audit the
users, and replace the ones that required the old behavior with the new
realtime_task(), which returns true for RT and DL classes. Introduce
a similar realtime_prio() to create a distinction analogous to
rt_prio(), and update the users that required the old behavior to use
the new function.

Move MAX_DL_PRIO to prio.h so it can be used in the new definitions.

Document the functions to make the difference between them more
obvious. PI boosting is a factor that must be taken into account when
choosing which function to use.

Rename task_is_realtime() to realtime_task_policy() as the old name
would be confusing alongside the new realtime_task().

No functional changes were intended.

[1] 
https://lore.kernel.org/lkml/20240506100509.gl40...@noisy.programming.kicks-ass.net/

Reviewed-by: Phil Auld 
Reviewed-by: Steven Rostedt (Google) 
Reviewed-by: Sebastian Andrzej Siewior 
Signed-off-by: Qais Yousef 
---
 fs/bcachefs/six.c |  2 +-
 fs/select.c   |  2 +-
 include/linux/ioprio.h|  2 +-
 include/linux/sched/deadline.h|  6 --
 include/linux/sched/prio.h|  1 +
 include/linux/sched/rt.h  | 27 ++-
 kernel/locking/rtmutex.c  |  4 ++--
 kernel/locking/rwsem.c|  4 ++--
 kernel/locking/ww_mutex.h |  2 +-
 kernel/sched/core.c   |  4 ++--
 kernel/sched/syscalls.c   |  2 +-
 kernel/time/hrtimer.c |  6 +++---
 kernel/trace/trace_sched_wakeup.c |  2 +-
 mm/page-writeback.c   |  4 ++--
 mm/page_alloc.c   |  2 +-
 15 files changed, 49 insertions(+), 21 deletions(-)

diff --git a/fs/bcachefs/six.c b/fs/bcachefs/six.c
index 3a494c5d1247..b30870bf7e4a 100644
--- a/fs/bcachefs/six.c
+++ b/fs/bcachefs/six.c
@@ -335,7 +335,7 @@ static inline bool six_owner_running(struct six_lock *lock)
 */
rcu_read_lock();
struct task_struct *owner = READ_ONCE(lock->owner);
-   bool ret = owner ? owner_on_cpu(owner) : !rt_task(current);
+   bool ret = owner ? owner_on_cpu(owner) : !realtime_task(current);
rcu_read_unlock();
 
return ret;
diff --git a/fs/select.c b/fs/select.c
index 9515c3fa1a03..8d5c1419416c 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -82,7 +82,7 @@ u64 select_estimate_accuracy(struct timespec64 *tv)
 * Realtime tasks get a slack of 0 for obvious reasons.
 */
 
-   if (rt_task(current))
+   if (realtime_task(current))
return 0;
 
ktime_get_ts64(&now);
diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
index db1249cd9692..75859b78d540 100644
--- a/include/linux/ioprio.h
+++ b/include/linux/ioprio.h
@@ -40,7 +40,7 @@ static inline int task_nice_ioclass(struct task_struct *task)
 {
if (task->policy == SCHED_IDLE)
return IOPRIO_CLASS_IDLE;
-   else if (task_is_realtime(task))
+   else if (realtime_task_policy(task))
return IOPRIO_CLASS_RT;
else
return IOPRIO_CLASS_BE;
diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
index df3aca89d4f5..5cb88b748ad6 100644
--- a/include/linux/sched/deadline.h
+++ b/include/linux/sched/deadline.h
@@ -10,8 +10,6 @@
 
 #include 
 
-#define MAX_DL_PRIO   0
-
 static inline int dl_prio(int prio)
 {
if (unlikely(prio < MAX_DL_PRIO))
@@ -19,6 +17,10 @@ static inline int dl_prio(int prio)
return 0;
 }
 
+/*
+ * Returns true if a task has a priority that belongs to DL class. PI-boosted
+ * tasks will return true. Use dl_policy() to ignore PI-boosted tasks.
+ */
 static inline int dl_task(struct task_struct *p)
 {
return dl_prio(p->prio);
diff --git a/include/linux/sched/prio.h b/include/linux/sched/prio.h
index ab83d85e1183..6ab43b4f72f9 100644
--- a/include/linux/sched/prio.h
+++ b/include/linux/sched/prio.h
@@ -14,6 +14,7 @@
  */
 
 #define MAX_RT_PRIO   100
+#define MAX_DL_PRIO   0
 
 #define MAX_PRIO   (MAX_RT_PRIO + NICE_WIDTH)
 #define DEFAULT_PRIO   (MAX_RT_PRIO + NICE_WIDTH / 2)
diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
index b2b9e6eb9683..a055dd68a77c 100644
--- a/include/linux/sched/rt.h
+++ b/include/linux/sched/rt.h
@@ -7,18 +7,43 @@
 struct task_struct;
 
 static inline int rt_prio(int prio)
+{
+   if (unlikely(prio < MAX_RT_PRIO && prio >= MAX_DL_PRIO))
+   return 1;
+   return 0;
+}
+
+static inline int realtime_prio(int prio)
 {
if (unlikely(prio < MAX_RT_PRIO

[PATCH v6 0/3] Clean up usage of rt_task()

2024-06-10 Thread Qais Yousef
Make rt_task() return true only for the RT class, and add a new
realtime_task() that returns true for RT and DL classes, to avoid some
confusion the old API can cause.

No functional changes intended in patch 1. Patch 2 cleans up the return type as
suggested by Steve. Patch 3 uses rt_or_dl() instead of 'realtime' as suggested
by Daniel. As the name was debatable, I'll leave it up to the maintainers to pick
their preference.

Changes since v5:

* Added a new patch to s/realtime/rt_or_dl/ as suggested by Daniel.
* Added Reviewed-bys.

Changes since v4:

* Simplify return of rt/realtime_prio() as the explicit true/false was
  not necessary.

Changes since v3:

* Make sure the 'new' bool functions return true/false instead of 1/0.
* Drop patch 2 about hrtimer usage of realtime_task() as the ongoing
  discussion on v1 indicates its scope is outside this simple cleanup.

Changes since v2:

* Fix one user that should use realtime_task() but remained using
  rt_task() (Sebastian)
* New patch to convert all hrtimer users to use realtime_task_policy()
  (Sebastian)
* Add a new patch to convert return type to bool (Steve)
* Rebase on tip/sched/core and handle a conflict with code shuffle to
  syscalls.c
* Add Reviewed-by Steve

Changes since v1:

* Use realtime_task_policy() instead task_has_realtime_policy() (Peter)
* Improve commit message readability regarding replacing some rt_task()
  users.

v1 discussion: 
https://lore.kernel.org/lkml/20240514234112.792989-1-qyou...@layalina.io/
v2 discussion: 
https://lore.kernel.org/lkml/20240515220536.823145-1-qyou...@layalina.io/
v3 discussion: 
https://lore.kernel.org/lkml/20240527234508.1062360-1-qyou...@layalina.io/
v4 discussion: 
https://lore.kernel.org/lkml/20240601213309.1262206-1-qyou...@layalina.io/
v5 discussion: 
https://lore.kernel.org/lkml/20240604144228.1356121-1-qyou...@layalina.io/

Qais Yousef (3):
  sched/rt: Clean up usage of rt_task()
  sched/rt, dl: Convert functions to return bool
  sched/rt: Rename realtime_{prio, task}() to rt_or_dl_{prio, task}()

 fs/bcachefs/six.c |  2 +-
 fs/select.c   |  2 +-
 include/linux/ioprio.h|  2 +-
 include/linux/sched/deadline.h| 14 ++---
 include/linux/sched/prio.h|  1 +
 include/linux/sched/rt.h  | 33 +--
 kernel/locking/rtmutex.c  |  4 ++--
 kernel/locking/rwsem.c|  4 ++--
 kernel/locking/ww_mutex.h |  2 +-
 kernel/sched/core.c   |  4 ++--
 kernel/sched/syscalls.c   |  2 +-
 kernel/time/hrtimer.c |  6 +++---
 kernel/trace/trace_sched_wakeup.c |  2 +-
 mm/page-writeback.c   |  4 ++--
 mm/page_alloc.c   |  2 +-
 15 files changed, 53 insertions(+), 31 deletions(-)

-- 
2.34.1




[PATCH v6 2/3] sched/rt, dl: Convert functions to return bool

2024-06-10 Thread Qais Yousef
{rt, realtime, dl}_{task, prio}() functions' return value is actually
a bool. Convert their return type to reflect that.

Suggested-by: Steven Rostedt (Google) 
Reviewed-by: Sebastian Andrzej Siewior 
Reviewed-by: Steven Rostedt (Google) 
Reviewed-by: Metin Kaya 
Signed-off-by: Qais Yousef 
---
 include/linux/sched/deadline.h |  8 +++-
 include/linux/sched/rt.h   | 16 ++--
 2 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
index 5cb88b748ad6..3a912ab42bb5 100644
--- a/include/linux/sched/deadline.h
+++ b/include/linux/sched/deadline.h
@@ -10,18 +10,16 @@
 
 #include 
 
-static inline int dl_prio(int prio)
+static inline bool dl_prio(int prio)
 {
-   if (unlikely(prio < MAX_DL_PRIO))
-   return 1;
-   return 0;
+   return unlikely(prio < MAX_DL_PRIO);
 }
 
 /*
  * Returns true if a task has a priority that belongs to DL class. PI-boosted
  * tasks will return true. Use dl_policy() to ignore PI-boosted tasks.
  */
-static inline int dl_task(struct task_struct *p)
+static inline bool dl_task(struct task_struct *p)
 {
return dl_prio(p->prio);
 }
diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
index a055dd68a77c..91ef1ef2019f 100644
--- a/include/linux/sched/rt.h
+++ b/include/linux/sched/rt.h
@@ -6,25 +6,21 @@
 
 struct task_struct;
 
-static inline int rt_prio(int prio)
+static inline bool rt_prio(int prio)
 {
-   if (unlikely(prio < MAX_RT_PRIO && prio >= MAX_DL_PRIO))
-   return 1;
-   return 0;
+   return unlikely(prio < MAX_RT_PRIO && prio >= MAX_DL_PRIO);
 }
 
-static inline int realtime_prio(int prio)
+static inline bool realtime_prio(int prio)
 {
-   if (unlikely(prio < MAX_RT_PRIO))
-   return 1;
-   return 0;
+   return unlikely(prio < MAX_RT_PRIO);
 }
 
 /*
  * Returns true if a task has a priority that belongs to RT class. PI-boosted
  * tasks will return true. Use rt_policy() to ignore PI-boosted tasks.
  */
-static inline int rt_task(struct task_struct *p)
+static inline bool rt_task(struct task_struct *p)
 {
return rt_prio(p->prio);
 }
@@ -34,7 +30,7 @@ static inline int rt_task(struct task_struct *p)
  * PI-boosted tasks will return true. Use realtime_task_policy() to ignore
  * PI-boosted tasks.
  */
-static inline int realtime_task(struct task_struct *p)
+static inline bool realtime_task(struct task_struct *p)
 {
return realtime_prio(p->prio);
 }
-- 
2.34.1




[PATCH v6 3/3] sched/rt: Rename realtime_{prio, task}() to rt_or_dl_{prio, task}()

2024-06-10 Thread Qais Yousef
Some find the name realtime overloaded. Use rt_or_dl() as an
alternative, hopefully better, name.

Suggested-by: Daniel Bristot de Oliveira 
Signed-off-by: Qais Yousef 
---
 fs/bcachefs/six.c |  2 +-
 fs/select.c   |  2 +-
 include/linux/ioprio.h|  2 +-
 include/linux/sched/rt.h  | 10 +-
 kernel/locking/rtmutex.c  |  4 ++--
 kernel/locking/rwsem.c|  4 ++--
 kernel/locking/ww_mutex.h |  2 +-
 kernel/sched/core.c   |  4 ++--
 kernel/sched/syscalls.c   |  2 +-
 kernel/time/hrtimer.c |  6 +++---
 kernel/trace/trace_sched_wakeup.c |  2 +-
 mm/page-writeback.c   |  4 ++--
 mm/page_alloc.c   |  2 +-
 13 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/fs/bcachefs/six.c b/fs/bcachefs/six.c
index b30870bf7e4a..9cbd3c14c94f 100644
--- a/fs/bcachefs/six.c
+++ b/fs/bcachefs/six.c
@@ -335,7 +335,7 @@ static inline bool six_owner_running(struct six_lock *lock)
 */
rcu_read_lock();
struct task_struct *owner = READ_ONCE(lock->owner);
-   bool ret = owner ? owner_on_cpu(owner) : !realtime_task(current);
+   bool ret = owner ? owner_on_cpu(owner) : !rt_or_dl_task(current);
rcu_read_unlock();
 
return ret;
diff --git a/fs/select.c b/fs/select.c
index 8d5c1419416c..73fce145eb72 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -82,7 +82,7 @@ u64 select_estimate_accuracy(struct timespec64 *tv)
 * Realtime tasks get a slack of 0 for obvious reasons.
 */
 
-   if (realtime_task(current))
+   if (rt_or_dl_task(current))
return 0;
 
ktime_get_ts64(&now);
diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
index 75859b78d540..b25377b6ea98 100644
--- a/include/linux/ioprio.h
+++ b/include/linux/ioprio.h
@@ -40,7 +40,7 @@ static inline int task_nice_ioclass(struct task_struct *task)
 {
if (task->policy == SCHED_IDLE)
return IOPRIO_CLASS_IDLE;
-   else if (realtime_task_policy(task))
+   else if (rt_or_dl_task_policy(task))
return IOPRIO_CLASS_RT;
else
return IOPRIO_CLASS_BE;
diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
index 91ef1ef2019f..4e3338103654 100644
--- a/include/linux/sched/rt.h
+++ b/include/linux/sched/rt.h
@@ -11,7 +11,7 @@ static inline bool rt_prio(int prio)
return unlikely(prio < MAX_RT_PRIO && prio >= MAX_DL_PRIO);
 }
 
-static inline bool realtime_prio(int prio)
+static inline bool rt_or_dl_prio(int prio)
 {
return unlikely(prio < MAX_RT_PRIO);
 }
@@ -27,19 +27,19 @@ static inline bool rt_task(struct task_struct *p)
 
 /*
  * Returns true if a task has a priority that belongs to RT or DL classes.
- * PI-boosted tasks will return true. Use realtime_task_policy() to ignore
+ * PI-boosted tasks will return true. Use rt_or_dl_task_policy() to ignore
  * PI-boosted tasks.
  */
-static inline bool realtime_task(struct task_struct *p)
+static inline bool rt_or_dl_task(struct task_struct *p)
 {
-   return realtime_prio(p->prio);
+   return rt_or_dl_prio(p->prio);
 }
 
 /*
  * Returns true if a task has a policy that belongs to RT or DL classes.
  * PI-boosted tasks will return false.
  */
-static inline bool realtime_task_policy(struct task_struct *tsk)
+static inline bool rt_or_dl_task_policy(struct task_struct *tsk)
 {
int policy = tsk->policy;
 
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 55c9dab37f33..c2a530d704b4 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -347,7 +347,7 @@ static __always_inline int __waiter_prio(struct task_struct *task)
 {
int prio = task->prio;
 
-   if (!realtime_prio(prio))
+   if (!rt_or_dl_prio(prio))
return DEFAULT_PRIO;
 
return prio;
@@ -435,7 +435,7 @@ static inline bool rt_mutex_steal(struct rt_mutex_waiter *waiter,
 * Note that RT tasks are excluded from same priority (lateral)
 * steals to prevent the introduction of an unbounded latency.
 */
-   if (realtime_prio(waiter->tree.prio))
+   if (rt_or_dl_prio(waiter->tree.prio))
return false;
 
return rt_waiter_node_equal(&waiter->tree, &top_waiter->tree);
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index ad8d4438bc91..dcbd7b45359a 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -631,7 +631,7 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 * if it is an RT task or wait in the wait queue
 * for too long.
 */
-   if (has_handoff || (!realtime_task(waiter->task) &&
+   if (has_handoff || (!rt_or_dl_task(waiter->task) &&
  

Re: [PATCH v5 1/2] sched/rt: Clean up usage of rt_task()

2024-06-10 Thread Qais Yousef
On 06/05/24 16:07, Daniel Bristot de Oliveira wrote:
> On 6/5/24 15:24, Qais Yousef wrote:
> >>> But rt is a shortened version of realtime, and so it is making *it less*
> >>> clear that we also have DL here.
> >> Can SCHED_DL be considered a real-time scheduling class as in opposite
> >> to SCHED_BATCH for instance? Due to its requirements it fits for a real
> >> time scheduling class, right?
> >> And RT (as in real time) already includes SCHED_RR and SCHED_FIFO.
> > Yeah I think the usage of realtime to cover both makes sense. I followed your
> > precedent with task_is_realtime().
> > 
> > Anyway. If people really find this confusing, what would make sense is to
> > split them and ask users to call rt_task() and dl_task() explicitly without
> > this wrapper. I personally like it better with the wrapper. But happy to
> > follow the crowd.
> 
> For me, doing dl_ things it is better to keep them separate, so I can
> easily search for dl_ specific checks.
> 
> rt_or_dl_task(p);

I posted a new version with this suggestion as the top patch so that it can be
shredded more :-)

Thanks for having a look.


Cheers

--
Qais Yousef



Re: [PATCH v6 0/3] Clean up usage of rt_task()

2024-07-24 Thread Qais Yousef
On 06/10/24 20:20, Qais Yousef wrote:
> Make rt_task() return true only for RT class and add new realtime_task() to
> return true for RT and DL classes to avoid some confusion the old API can
> cause.

I am not aware of any pending review comments for this series. Is it ready to
be picked up?


Thanks!

--
Qais Yousef

> 
> No functional changes intended in patch 1. Patch 2 cleans up the return type
> as suggested by Steve. Patch 3 uses rt_or_dl() instead of 'realtime' as
> suggested by Daniel. As the name was debatable, I'll leave it up to the
> maintainers to pick their preference.
> 
> Changes since v5:
> 
>   * Added a new patch to s/realtime/rt_or_dl/ as suggested by Daniel.
>   * Added Reviewed-bys.
> 
> Changes since v4:
> 
>   * Simplify return of rt/realtime_prio() as the explicit true/false was
> not necessary.
> 
> Changes since v3:
> 
>   * Make sure the 'new' bool functions return true/false instead of 1/0.
>   * Drop patch 2 about hrtimer usage of realtime_task() as ongoing
> discussion on v1 indicates its scope outside of this simple cleanup.
> 
> Changes since v2:
> 
>   * Fix one user that should use realtime_task() but remained using
> rt_task() (Sebastian)
>   * New patch to convert all hrtimer users to use realtime_task_policy()
> (Sebastian)
>   * Add a new patch to convert return type to bool (Steve)
>   * Rebase on tip/sched/core and handle a conflict with code shuffle to
> syscalls.c
>   * Add Reviewed-by Steve
> 
> Changes since v1:
> 
>   * Use realtime_task_policy() instead task_has_realtime_policy() (Peter)
>   * Improve commit message readability about replacing some rt_task()
> users.
> 
> v1 discussion: https://lore.kernel.org/lkml/20240514234112.792989-1-qyou...@layalina.io/
> v2 discussion: https://lore.kernel.org/lkml/20240515220536.823145-1-qyou...@layalina.io/
> v3 discussion: https://lore.kernel.org/lkml/20240527234508.1062360-1-qyou...@layalina.io/
> v4 discussion: https://lore.kernel.org/lkml/20240601213309.1262206-1-qyou...@layalina.io/
> v5 discussion: https://lore.kernel.org/lkml/20240604144228.1356121-1-qyou...@layalina.io/
> 
> Qais Yousef (3):
>   sched/rt: Clean up usage of rt_task()
>   sched/rt, dl: Convert functions to return bool
>   sched/rt: Rename realtime_{prio, task}() to rt_or_dl_{prio, task}()
> 
>  fs/bcachefs/six.c |  2 +-
>  fs/select.c   |  2 +-
>  include/linux/ioprio.h|  2 +-
>  include/linux/sched/deadline.h| 14 ++---
>  include/linux/sched/prio.h|  1 +
>  include/linux/sched/rt.h  | 33 +--
>  kernel/locking/rtmutex.c  |  4 ++--
>  kernel/locking/rwsem.c|  4 ++--
>  kernel/locking/ww_mutex.h |  2 +-
>  kernel/sched/core.c   |  4 ++--
>  kernel/sched/syscalls.c   |  2 +-
>  kernel/time/hrtimer.c |  6 +++---
>  kernel/trace/trace_sched_wakeup.c |  2 +-
>  mm/page-writeback.c   |  4 ++--
>  mm/page_alloc.c   |  2 +-
>  15 files changed, 53 insertions(+), 31 deletions(-)
> 
> -- 
> 2.34.1
> 



Re: [PATCH] docs: reporting-issues.rst: explain how to decode stack traces

2021-02-14 Thread Qais Yousef
thors of this document (see intro), as they might be able to write
> -   something then.
> +When the kernel detects an internal problem, it will log some information 
> about
> +the executed code. This makes it possible to pinpoint the exact line in the
> +source code that triggered the issue and shows how it was called. But that 
> only
> +works if you enabled CONFIG_DEBUG_INFO and CONFIG_KALLSYMS when configuring
> +your kernel. If you did so, consider decoding the information from the
> +kernel's log. That will make it a lot easier to understand what led to the
> +'panic', 'oops', or 'warning', which increases the chances enormously that
> +someone can provide a fix.

I suggest removing the word enormously. It helps, but it all depends on the
particular circumstances. Sometimes it does, other times it doesn't.

>  
> -   This section in the end should answer questions like "when is this 
> actually
> -   needed", "what .config options to ideally set earlier to make this step 
> easy
> -   or unnecessary?" (likely CONFIG_UNWINDER_ORC when it's available, 
> otherwise
> -   CONFIG_UNWINDER_FRAME_POINTER; but is there anything else needed?).
> +Decoding can be done with a script you find in the Linux source tree. If you
> +are running a kernel you compiled yourself earlier, call it like this::
>  
> -..
> +   [user@something ~]$ sudo dmesg | 
> ./linux-5.10.5/scripts/decode_stacktrace.sh ./linux-5.10.5/vmlinux
> +
> +If you are running a packaged vanilla kernel, you will likely have to install
> +the corresponding packages with debug symbols. Then call the script (which 
> you
> +might need to get from the Linux sources if your distro does not package it)
> +like this::
> +
> +   [user@something ~]$ sudo dmesg | 
> ./linux-5.10.5/scripts/decode_stacktrace.sh \
> +/usr/lib/debug/lib/modules/5.10.10-4.1.x86_64/vmlinux 
> /usr/src/kernels/5.10.10-4.1.x86_64/
> +
> +The script will work on log lines like the following, which show the address 
> of
> +the code the kernel was executing when the error occurred::
> +
> +   [   68.387301] RIP: 0010:test_module_init+0x5/0xffa [test_module]
> +
> +Once decoded, these lines will look like this::
> +
> +   [   68.387301] RIP: 0010:test_module_init 
> (/home/username/linux-5.10.5/test-module/test-module.c:16) test_module
> +
> +In this case the executed code was built from the file
> +'~/linux-5.10.5/test-module/test-module.c' and the error occurred by the
> +instructions found in line '16'.
>  
> -*If the failure includes a stack dump, like an Oops does, consider 
> decoding
> -it to find the offending line of code.*
> +The script will similarly decode the addresses mentioned in the section
> +starting with 'Call trace', which show the path to the function where the
> +problem occurred. Additionally, the script will show the assembler output for
> +the code section the kernel was executing.
>  
> -When the kernel detects an error, it will print a stack dump that allows to
> -identify the exact line of code where the issue happens. But that information
> -sometimes needs to get decoded to be readable, which is explained in
> -admin-guide/bug-hunting.rst.
> +Note, if you can't get this to work, simply skip this step and mention the
> +reason for it in the report. If you're lucky, it might not be needed. And if 
> it
> +is, someone might help you to get things going. Also be aware this is just 
> one
> +of several ways to decode kernel stack traces. Sometimes different steps will
> +be required to retrieve the relevant details. Don't worry about that, if 
> that's
> +needed in your case, developers will tell you what to do.

Ah, you already clarify nicely here that this is a good-to-have rather than
a must-have, as I was trying to allude to above :-)

This looks good to me in general. With the above minor nits fixed, feel free to
add my

Reviewed-by: Qais Yousef 

Thanks!

--
Qais Yousef

>  
>  
>  Special care for regressions
> -- 
> 2.29.2
> 


Re: [PATCH] docs: reporting-issues.rst: explain how to decode stack traces

2021-02-15 Thread Qais Yousef
Hi Thorsten

On 02/15/21 06:55, Thorsten Leemhuis wrote:
> Hi! Many thx for looking into this, much appreciated!
> 
> Am 14.02.21 um 17:00 schrieb Qais Yousef:
> > On 02/10/21 06:48, Thorsten Leemhuis wrote:
> >
> >> - * If the failure includes a stack dump, like an Oops does, consider 
> >> decoding
> >> -   it to find the offending line of code.
> >> + * If your failure involves a 'panic', 'oops', or 'warning', consider 
> >> decoding
> > or 'BUG'? There are other similar places below that could benefit from this
> > addition too.
> 
> Good point. In fact there are other places in the document where this is
> needed as well. Will address those in another patch.
> 
> >> +   the kernel log to find the line of code that triggered the error.
> >>  
> >>   * If your problem is a regression, try to narrow down when the issue was
> >> introduced as much as possible.
> >> @@ -869,6 +869,15 @@ pick up the configuration of your current kernel and 
> >> then tries to adjust it
> >>  somewhat for your system. That does not make the resulting kernel any 
> >> better,
> >>  but quicker to compile.
> >>  
> >> +Note: If you are dealing with a kernel panic, oops, or warning, please 
> >> make
> >> +sure to enable CONFIG_KALLSYMS when configuring your kernel. Additionally,
> > 
> > s/make sure/try/
> 
> I did that, but ignored...
> 
> > s/kernel./kernel if you can./
> 
> ...this. Yes, you have a point with...
> 
> > Less demanding wording in case the user doesn't have the capability to 
> > rebuild
> > or deploy such a kernel where the problem happens. Maybe you can tweak it 
> > more
> > if you like too :-)
> 
> ...that, but that section in the document is about building your own
> kernel, so I'd say we don't have to be that careful here.

Cool. Works for me.

Thanks

--
Qais Yousef


Re: [PATCH v2] cpu/hotplug: wait for cpuset_hotplug_work to finish on cpu onlining

2021-02-16 Thread Qais Yousef
On 02/12/21 00:30, Alexey Klimov wrote:
> When a CPU offlined and onlined via device_offline() and device_online()
> the userspace gets uevent notification. If, after receiving "online" uevent,
> userspace executes sched_setaffinity() on some task trying to move it
> to a recently onlined CPU, then it often fails with -EINVAL. Userspace needs
> to wait around 5..30 ms before sched_setaffinity() will succeed for recently
> onlined CPU after receiving uevent.
> 
> If in_mask argument for sched_setaffinity() has only recently onlined CPU,
> it often fails with such flow:
> 
>   sched_setaffinity()
> cpuset_cpus_allowed()
>   guarantee_online_cpus()   <-- cs->effective_cpus mask does not
> contain recently onlined cpu
> cpumask_and()   <-- final new_mask is empty
> __set_cpus_allowed_ptr()
>   cpumask_any_and_distribute() <-- returns dest_cpu equal to nr_cpu_ids
>   returns -EINVAL
> 
> Cpusets used in guarantee_online_cpus() are updated using workqueue from
> cpuset_update_active_cpus() which in its turn is called from cpu hotplug 
> callback
> sched_cpu_activate() hence it may not be observable by sched_setaffinity() if
> it is called immediately after uevent.

nit: newline

> Out of line uevent can be avoided if we will ensure that cpuset_hotplug_work
> has run to completion using cpuset_wait_for_hotplug() after onlining the
> cpu in cpu_device_up() and in cpuhp_smt_enable().
> 
> Co-analyzed-by: Joshua Baker 
> Signed-off-by: Alexey Klimov 
> ---

This looks good to me.
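
FWIW, the race is easy to see from userspace. A hypothetical reproducer
based on the flow described in the commit message (CPU number hard-coded,
error handling trimmed):

	#define _GNU_SOURCE
	#include <sched.h>
	#include <stdio.h>
	#include <stdlib.h>

	/* toggle cpu1 through sysfs, like the uevent-driven tools do */
	static void cpu1_online(const char *val)
	{
		FILE *f = fopen("/sys/devices/system/cpu/cpu1/online", "w");

		if (!f) { perror("open"); exit(1); }
		fputs(val, f);
		fclose(f);
	}

	int main(void)
	{
		cpu_set_t set;

		cpu1_online("0");
		cpu1_online("1");

		CPU_ZERO(&set);
		CPU_SET(1, &set);
		/* without this patch, often fails with EINVAL for a few ms */
		if (sched_setaffinity(0, sizeof(set), &set))
			perror("sched_setaffinity");
		return 0;
	}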

Reviewed-by: Qais Yousef 

Thanks

--
Qais Yousef


Re: [PATCH] ALSA: ASoC: soc-compress.c: fix NULL dereference

2015-01-02 Thread Qais Yousef

On 12/29/2014 04:13 PM, Mark Brown wrote:

On Tue, Dec 23, 2014 at 09:09:27AM +, Qais Yousef wrote:

In soc_new_compress() when rtd->dai_link->daynmic is set, we create the pcm
substreams with this call:

ret = snd_pcm_new_internal(rtd->card->snd_card, new_name, num,
1, 0, &be_pcm);

which passes 0 as capture_count leading to

be_pcm->streams[SNDRV_PCM_STREAM_CAPTURE].substream

being NULL, hence when trying to set rtd a few lines below we get an oops.
Fix by removing this line of code since CAPTURE substream will always be NULL.

Why will the capture stream always be NULL?  There should be no
intrinsic reason why we can't have hardware support for capturing
compressed audio.


I think it's because we pass 0 as capture_count to snd_pcm_new_internal().
If I read the code correctly, this leads to _snd_pcm_new() being called,
which in turn calls snd_pcm_new_stream(pcm, SNDRV_PCM_STREAM_CAPTURE,
capture_count), so no substream gets allocated for the capture case,
hence it being NULL. I get an oops in my experimental driver when I set
dynamic = 1 in the FE dai link. If I did something wrong there that
caused this, it's not obvious to me how.


Maybe a better fix would be to replace the 1 and 0 in the
snd_pcm_new_internal() call with rtd->dai_link->dpcm_playback and
rtd->dai_link->dpcm_capture.
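
I.e. the call would become something like this (untested sketch of the
idea):

	ret = snd_pcm_new_internal(rtd->card->snd_card, new_name, num,
				   rtd->dai_link->dpcm_playback,
				   rtd->dai_link->dpcm_capture, &be_pcm);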



Re: [PATCH 2/3] MIPS: Add full ISA emulator.

2014-12-04 Thread Qais Yousef
Admittedly I don't really understand all the issues surrounding this,
but this approach looks better to me as well. It seems more generic and
future proof, and at least I can understand the patch series.


But did I say I don't understand all of this? It would be nice to hear
from more people :)


Qais

On 12/04/2014 10:16 AM, Paul Burton wrote:

Nice work David, I like this approach. It's so much simpler than hacking
atop the current dsemul code. I also imagine this could be reused for
emulation of instructions removed in r6, when running pre-r6 userland
binaries on r6 systems.

On Wed, Dec 03, 2014 at 06:21:36PM -0800, David Daney wrote:

On 12/03/2014 05:56 PM, Leonid Yegoshin wrote:

I see only two technical issues here which differs:

1.  You believe your GCC experts, I trust HW Architecture manual and
don't trust toolchain people too much ==> we see a different value in
fact that your approach has a subset of emulated ISAs (and it can't, of
course, emulate anything because some custom opcodes are reused).

Yes, I agree that the emulation approach cannot handle some of the cases you
mention (most would have to be the result of hand coded assembly
specifically trying to break it).

I'm not sure I'd agree even with that - ASEs & vendor-specific
instructions could easily be added if necessary.

On Thu, Dec 04, 2014 at 05:56:51PM -0800, Leonid Yehoshin wrote:

2.  My approach is ready to use and is used right now, you still have a
framework which passed an initial boot.

Subjective.

Thanks,
 Paul




[PATCH] ALSA: ASoC: soc-compress.c: fix NULL dereference

2014-12-23 Thread Qais Yousef
In soc_new_compress() when rtd->dai_link->daynmic is set, we create the pcm
substreams with this call:

   ret = snd_pcm_new_internal(rtd->card->snd_card, new_name, num,
   1, 0, &be_pcm);

which passes 0 as capture_count leading to

   be_pcm->streams[SNDRV_PCM_STREAM_CAPTURE].substream

being NULL, hence when trying to set rtd a few lines below we get an oops.
Fix by removing this line of code since CAPTURE substream will always be NULL.

Signed-off-by: Qais Yousef 
Cc: Vinod Koul 
Cc: Liam Girdwood 
Cc: Mark Brown 
Cc: Jaroslav Kysela 
Cc: Takashi Iwai 
Cc: linux-kernel@vger.kernel.org
---
Not sure if this is the correct fix but that's what I could come up with my
limited knowledge.

I think the more correct solution would be to use the value of
rtd->dai_link->dpcm_playback and rtd->dai_link->dpcm_capture in the args of
snd_pcm_new_internal() for playback_count and capture_count.

 sound/soc/soc-compress.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/sound/soc/soc-compress.c b/sound/soc/soc-compress.c
index 590a82f01d0b..7ab39f65384c 100644
--- a/sound/soc/soc-compress.c
+++ b/sound/soc/soc-compress.c
@@ -669,7 +669,6 @@ int soc_new_compress(struct snd_soc_pcm_runtime *rtd, int num)
 		rtd->pcm = be_pcm;
 		rtd->fe_compr = 1;
 		be_pcm->streams[SNDRV_PCM_STREAM_PLAYBACK].substream->private_data = rtd;
-		be_pcm->streams[SNDRV_PCM_STREAM_CAPTURE].substream->private_data = rtd;
 		memcpy(compr->ops, &soc_compr_dyn_ops, sizeof(soc_compr_dyn_ops));
 	} else
 		memcpy(compr->ops, &soc_compr_ops, sizeof(soc_compr_ops));
-- 
2.1.0



Re: [PATCH v2] ALSA: ASoC: soc-compress.c: fix NULL dereference

2015-01-13 Thread Qais Yousef

On 01/13/2015 02:59 PM, Vinod Koul wrote:

On Tue, Jan 13, 2015 at 11:18:53AM +, Qais Yousef wrote:

In soc_new_compress() when rtd->dai_link->daynmic is set, we create the pcm

   
typo

substreams with this call:

ret = snd_pcm_new_internal(rtd->card->snd_card, new_name, num,
1, 0, &be_pcm);

which passes 0 as capture_count leading to

be_pcm->streams[SNDRV_PCM_STREAM_CAPTURE].substream

being NULL, hence when trying to set rtd a few lines below we get an oops.

It is a good practice to add the oops here


Will this really be helpful? I think it'll just add clutter (the
backtrace on the metag arch is not great):


Oops: err 8007 (Unknown fault) addr 0008 [#1]
 Modules linked in:
 CPU: 0 PID: 6 Comm: kworker/u2:0 Not tainted 3.18.0-rc4+ #1904
 Workqueue: deferwq _deferred_probe_work_func
 task: 4f030780 ti: 4f044000 task.ti: 4f044000
  pt_regs @ 4f044388
  SaveMask = 0x4041
  Flags = 0x0008 (Znoc)
  TXRPT = 0x
  PC = 0x402e6c58
  A0StP = 0x4f044388 A1GbP = 0x60001000
  A0FrP = 0x4f044110 A1LbP = 0x4048
  A0.2  = 0x A1.2  = 0x
  A0.3  = 0x4009 A1.3  = 0x0001
  D0Re0 = 0x D1Re0 = 0x0001
  D0Ar6 = 0x D1Ar5 = 0x4b5c1a00
  D0Ar4 = 0x4f044330 D1Ar3 = 0x405833a8
  D0Ar2 = 0x4f1d7170 D1Ar1 = 0x4b5c25a0
  D0FrT = 0x0001 D1RtP = 0x402e6c20
  D0.5  = 0x D1.5  = 0x4f1f65c4
  D0.6  = 0x4f1f65c4 D1.6  = 0x4f1d0500
  D0.7  = 0x0001 D1.7  = 0x4f1e3e40

 Call trace:
 [<40410004>] _ieee80211_change_bss+0x1b4/0x220
 [<400f8034>] _kernfs_add_one+0x10c/0x17c
 [<400fa2b0>] ___kernfs_create_file+0x94/0xdc
 [<402d8bf0>] _snd_soc_register_card+0x12b8/0x1380
 [<400170e8>] ___request_region+0x58/0x150
 [<402068b4>] _devres_add+0x14/0x2c
 [<402e97f0>] _zero1xx_probe+0x2b8/0x37c
 [<40205004>] _platform_drv_probe+0x4c/0xc0
 [<40204fb4>] _platform_drv_remove+0x3c/0x40
 [<402032e0>] _driver_probe_device+0xc8/0x294
 [<40204fb4>] _platform_drv_remove+0x3c/0x40
 [<40203624>] _wait_for_device_probe+0x7c/0x80
 [<40201cac>] _bus_for_each_drv+0x5c/0xb0
 [<40203708>] _device_attach+0x84/0x9c
 [<40202258>] _bus_probe_device+0x90/0xd0
 [<4020354c>] _deferred_probe_work_func+0x70/0xac
 [<40025d84>] _process_one_work+0x110/0x364
 [<402034d8>] _device_bind_driver+0x2c/0x30
 [<40046150>] _mod_timer+0xc4/0x178
 [<400286a4>] _worker_thread+0x14c/0x4d4
 [<4002b90c>] _kthread_parkme+0x14/0x18
 [<40028554>] _pool_mayday_timeout+0xe8/0xec
 [<4002ba08>] _kthread+0xf8/0x100
 [<4000aeb4>] _ret_from_fork+0x44/0x110
 [<4002b90c>] _kthread_parkme+0x14/0x18
 [<4044>] _text+0x44/0x48
 [<4044>] _text+0x44/0x48
 [<4002b90c>] _kthread_parkme+0x14/0x18

 Process: kworker/u2:0 (pid: 6, stack limit = 4f046000)
 ---[ end trace fabdbb359f5c60d8 ]---




Fix by using rtd->dai_link->dpcm_playback and rtd->dai_link->dpcm_capture as
playback_count and capture_count to snd_pcm_new_internal().

Signed-off-by: Qais Yousef 
Cc: Vinod Koul 
Cc: Liam Girdwood 
Cc: Mark Brown 
Cc: Jaroslav Kysela 
Cc: Takashi Iwai 
Cc: linux-kernel@vger.kernel.org
---
v2->v1:
- use better way to fix it than just removing the line that caused the oops

  sound/soc/soc-compress.c | 9 ++---
  1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/sound/soc/soc-compress.c b/sound/soc/soc-compress.c
index 590a82f01d0b..27a668463ad7 100644
--- a/sound/soc/soc-compress.c
+++ b/sound/soc/soc-compress.c
@@ -659,7 +659,8 @@ int soc_new_compress(struct snd_soc_pcm_runtime *rtd, int num)
 			rtd->dai_link->stream_name);
 
 		ret = snd_pcm_new_internal(rtd->card->snd_card, new_name, num,
-				1, 0, &be_pcm);
+				rtd->dai_link->dpcm_playback,
+				rtd->dai_link->dpcm_capture, &be_pcm);
 		if (ret < 0) {
 			dev_err(rtd->card->dev, "ASoC: can't create compressed for %s\n",
 				rtd->dai_link->name);
@@ -668,8 +669,10 @@ int soc_new_compress(struct snd_soc_pcm_runtime *rtd, int num)
 
 		rtd->pcm = be_pcm;
 		rtd->fe_compr = 1;
-		be_pcm->streams[SNDRV_PCM_STREAM_PLAYBACK].substream->private_data = rtd;
-		be_pcm->streams[SNDRV_PCM_STREAM_CAPTURE].substream->private_data = rtd;
+		if (rtd->dai_link->dpcm_playback)
+			be_pcm->streams[SNDRV_PCM_STREAM_PLAYBACK].substream->private_data = rtd;
+		if (rtd->dai_link->dpcm_capture)

this should be else if, as for a compressed device we can have playback or
capture, not both


+ 

Re: [PATCH v2] ALSA: ASoC: soc-compress.c: fix NULL dereference

2015-01-13 Thread Qais Yousef

On 01/13/2015 04:20 PM, Mark Brown wrote:

On Tue, Jan 13, 2015 at 03:16:10PM +, Qais Yousef wrote:

On 01/13/2015 02:59 PM, Vinod Koul wrote:

being NULL, hence when trying to set rtd a few lines below we get an oops.

It is a good practice to add the oops here

Will this really be helpful? I think it'll be more clutter (the backtrace on
metag arch is not great):

It's better in general to leave it out unless it's adding something (for
example sometimes the particular call path is important) and even there
edit it down to relevant details - the splat from the full oops normally
overwhelms the commit message.


I think the commit message explains what's going on. So unless Vinod
insists I'll send v3 with the other 2 requested fixes.


Thanks for the review!


Re: [PATCH v2] ALSA: ASoC: soc-compress.c: fix NULL dereference

2015-01-14 Thread Qais Yousef

On 01/13/2015 05:21 PM, James Hogan wrote:

On 13 January 2015 15:16:10 GMT+00:00, Qais Yousef  
wrote:


Will this really be helpful? I think it'll just add clutter (the
backtrace on the metag arch is not great):

I suspect you don't have frame pointers enabled in your kernel config. That 
should improve the meaningfulness of the backtrace.




I forgot about this option. Thanks.
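
For the record, I believe that's the following in the kernel config
(assuming the arch supports frame pointers; CONFIG_DEBUG_INFO helps on
top if you want file/line info):

	CONFIG_FRAME_POINTER=y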


[PATCH v3] ALSA: ASoC: soc-compress.c: fix NULL dereference

2015-01-14 Thread Qais Yousef
In soc_new_compress() when rtd->dai_link->daynamic is set, we create the pcm
substreams with this call:

   ret = snd_pcm_new_internal(rtd->card->snd_card, new_name, num,
   1, 0, &be_pcm);

which passes 0 as capture_count leading to

   be_pcm->streams[SNDRV_PCM_STREAM_CAPTURE].substream

being NULL, hence when trying to set rtd a few lines below we get an oops.

Fix by using rtd->dai_link->dpcm_playback and rtd->dai_link->dpcm_capture as
playback_count and capture_count to snd_pcm_new_internal().

Signed-off-by: Qais Yousef 
Cc: Vinod Koul 
Cc: Liam Girdwood 
Cc: Mark Brown 
Cc: Jaroslav Kysela 
Cc: Takashi Iwai 
Cc: linux-kernel@vger.kernel.org
---
v3->v2:
  - fix a typo s/dynmic/dynamic/ in commit message
  - use if else if because either dpcm_playback or dpcm_capture should be on
for compressed devices

v2->v1:
   - use better way to fix it than just removing the line that caused the oops

 sound/soc/soc-compress.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/sound/soc/soc-compress.c b/sound/soc/soc-compress.c
index 590a82f01d0b..025c38fbe3c0 100644
--- a/sound/soc/soc-compress.c
+++ b/sound/soc/soc-compress.c
@@ -659,7 +659,8 @@ int soc_new_compress(struct snd_soc_pcm_runtime *rtd, int num)
 			rtd->dai_link->stream_name);
 
 		ret = snd_pcm_new_internal(rtd->card->snd_card, new_name, num,
-				1, 0, &be_pcm);
+				rtd->dai_link->dpcm_playback,
+				rtd->dai_link->dpcm_capture, &be_pcm);
 		if (ret < 0) {
 			dev_err(rtd->card->dev, "ASoC: can't create compressed for %s\n",
 				rtd->dai_link->name);
@@ -668,8 +669,10 @@ int soc_new_compress(struct snd_soc_pcm_runtime *rtd, int num)
 
 		rtd->pcm = be_pcm;
 		rtd->fe_compr = 1;
-		be_pcm->streams[SNDRV_PCM_STREAM_PLAYBACK].substream->private_data = rtd;
-		be_pcm->streams[SNDRV_PCM_STREAM_CAPTURE].substream->private_data = rtd;
+		if (rtd->dai_link->dpcm_playback)
+			be_pcm->streams[SNDRV_PCM_STREAM_PLAYBACK].substream->private_data = rtd;
+		else if (rtd->dai_link->dpcm_capture)
+			be_pcm->streams[SNDRV_PCM_STREAM_CAPTURE].substream->private_data = rtd;
 		memcpy(compr->ops, &soc_compr_dyn_ops, sizeof(soc_compr_dyn_ops));
 	} else
 		memcpy(compr->ops, &soc_compr_ops, sizeof(soc_compr_ops));
-- 
2.1.0



[PATCH v4] ALSA: ASoC: soc-compress.c: fix NULL dereference

2015-01-14 Thread Qais Yousef
In soc_new_compress() when rtd->dai_link->dynamic is set, we create the pcm
substreams with this call:

   ret = snd_pcm_new_internal(rtd->card->snd_card, new_name, num,
   1, 0, &be_pcm);

which passes 0 as capture_count leading to

   be_pcm->streams[SNDRV_PCM_STREAM_CAPTURE].substream

being NULL, hence when trying to set rtd a few lines below we get an oops.

Fix by using rtd->dai_link->dpcm_playback and rtd->dai_link->dpcm_capture as
playback_count and capture_count to snd_pcm_new_internal().

Signed-off-by: Qais Yousef 
Cc: Vinod Koul 
Cc: Liam Girdwood 
Cc: Mark Brown 
Cc: Jaroslav Kysela 
Cc: Takashi Iwai 
Cc: linux-kernel@vger.kernel.org
---
v4->v3:
  - the typo fix went wrong, apologies for the noise

v3->v2:
  - fix a typo s/dynmic/dynamic/ in commit message
  - use if else if because either dpcm_playback or dpcm_capture should be on
for compressed devices

v2->v1:
   - use better way to fix it than just removing the line that caused the oops

 sound/soc/soc-compress.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/sound/soc/soc-compress.c b/sound/soc/soc-compress.c
index 590a82f01d0b..025c38fbe3c0 100644
--- a/sound/soc/soc-compress.c
+++ b/sound/soc/soc-compress.c
@@ -659,7 +659,8 @@ int soc_new_compress(struct snd_soc_pcm_runtime *rtd, int num)
 			rtd->dai_link->stream_name);
 
 		ret = snd_pcm_new_internal(rtd->card->snd_card, new_name, num,
-				1, 0, &be_pcm);
+				rtd->dai_link->dpcm_playback,
+				rtd->dai_link->dpcm_capture, &be_pcm);
 		if (ret < 0) {
 			dev_err(rtd->card->dev, "ASoC: can't create compressed for %s\n",
 				rtd->dai_link->name);
@@ -668,8 +669,10 @@ int soc_new_compress(struct snd_soc_pcm_runtime *rtd, int num)
 
 		rtd->pcm = be_pcm;
 		rtd->fe_compr = 1;
-		be_pcm->streams[SNDRV_PCM_STREAM_PLAYBACK].substream->private_data = rtd;
-		be_pcm->streams[SNDRV_PCM_STREAM_CAPTURE].substream->private_data = rtd;
+		if (rtd->dai_link->dpcm_playback)
+			be_pcm->streams[SNDRV_PCM_STREAM_PLAYBACK].substream->private_data = rtd;
+		else if (rtd->dai_link->dpcm_capture)
+			be_pcm->streams[SNDRV_PCM_STREAM_CAPTURE].substream->private_data = rtd;
 		memcpy(compr->ops, &soc_compr_dyn_ops, sizeof(soc_compr_dyn_ops));
 	} else
 		memcpy(compr->ops, &soc_compr_ops, sizeof(soc_compr_ops));
-- 
2.1.0



Re: [PATCH V2 18/24] irqchip: mips-gic: Stop using per-platform mapping tables

2015-01-15 Thread Qais Yousef

On 01/15/2015 04:29 PM, James Hogan wrote:

On 15/01/15 11:59, James Hogan wrote:

Hi Andrew,

On 18/09/14 22:47, Andrew Bresticker wrote:

Now that the GIC properly uses IRQ domains, kill off the per-platform
routing tables that were used to make the GIC appear transparent.

This includes:
  - removing the mapping tables and the support for applying them,
  - moving GIC IPI support to the GIC driver,
  - properly routing the i8259 through the GIC on Malta, and
  - updating IRQ assignments on SEAD-3 when the GIC is present.

Platforms no longer will pass an interrupt mapping table to gic_init.
Instead, they will pass the CPU interrupt vector (2 - 7) that they
expect the GIC to route interrupts to.  Note that in EIC mode this
value is ignored and all GIC interrupts are routed to EIC vector 1.

Signed-off-by: Andrew Bresticker 
Acked-by: Jason Cooper 
Reviewed-by: Qais Yousef 
Tested-by: Qais Yousef 

This commit (18743d2781d01d34d132f952a2e16353ccb4c3de) appears to break
boot of interAptiv, dual core, dual vpe per core, on malta with
malta_defconfig.

It gets to here:
...
CPU1 revision is: 0001a120 (MIPS interAptiv (multi))
FPU revision is: 0173a000
Primary instruction cache 64kB, VIPT, 4-way, linesize 32 bytes.
Primary data cache 64kB, 4-way, PIPT, no aliases, linesize 32 bytes
MIPS secondary cache 1024kB, 8-way, linesize 32 bytes.
Synchronize counters for CPU 1: done.
Brought up 2 CPUs

and then appears to just hang. Passing nosmp works around it, allowing
it to get to userland.

Is that a problem you've already come across?

I'll keep debugging.

Right, it appears the CPU IRQ line that the GIC is using doesn't get
unmasked (STATUSF_IP2) when a new VPE is brought up, so only the first
CPU will actually get any interrupts after your patch (including the
rather critical IPIs), i.e. hacking it in vsmp_init_secondary() in
smp-mt.c allows it to boot.

Hmm, I'll have a think about what the most generic fix is, since
arbitrary stuff may or may not have registered handlers for the raw CPU
interrupts (timer, performance counter, gic etc)...

Cheers
James



Is this similar to the issue addressed by this (ff1e29ade4c6 MIPS: 
smp-cps: Enable all hardware interrupts on secondary CPUs)?
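
If so, I'd expect the workaround James describes to look something along
these lines in the secondary bring-up path (just a sketch of the idea,
not the actual commit):

	/* unmask the CPU IRQ line the GIC is routed to on this VPE */
	set_c0_status(STATUSF_IP2);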


Qais


[PATCH v2] ALSA: ASoC: soc-compress.c: fix NULL dereference

2015-01-13 Thread Qais Yousef
In soc_new_compress() when rtd->dai_link->daynmic is set, we create the pcm
substreams with this call:

   ret = snd_pcm_new_internal(rtd->card->snd_card, new_name, num,
   1, 0, &be_pcm);

which passes 0 as capture_count leading to

   be_pcm->streams[SNDRV_PCM_STREAM_CAPTURE].substream

being NULL, hence when trying to set rtd a few lines below we get an oops.

Fix by using rtd->dai_link->dpcm_playback and rtd->dai_link->dpcm_capture as
playback_count and capture_count to snd_pcm_new_internal().

Signed-off-by: Qais Yousef 
Cc: Vinod Koul 
Cc: Liam Girdwood 
Cc: Mark Brown 
Cc: Jaroslav Kysela 
Cc: Takashi Iwai 
Cc: linux-kernel@vger.kernel.org
---
v2->v1:
   - use better way to fix it than just removing the line that caused the oops

 sound/soc/soc-compress.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/sound/soc/soc-compress.c b/sound/soc/soc-compress.c
index 590a82f01d0b..27a668463ad7 100644
--- a/sound/soc/soc-compress.c
+++ b/sound/soc/soc-compress.c
@@ -659,7 +659,8 @@ int soc_new_compress(struct snd_soc_pcm_runtime *rtd, int num)
 			rtd->dai_link->stream_name);
 
 		ret = snd_pcm_new_internal(rtd->card->snd_card, new_name, num,
-				1, 0, &be_pcm);
+				rtd->dai_link->dpcm_playback,
+				rtd->dai_link->dpcm_capture, &be_pcm);
 		if (ret < 0) {
 			dev_err(rtd->card->dev, "ASoC: can't create compressed for %s\n",
 				rtd->dai_link->name);
@@ -668,8 +669,10 @@ int soc_new_compress(struct snd_soc_pcm_runtime *rtd, int num)
 
 		rtd->pcm = be_pcm;
 		rtd->fe_compr = 1;
-		be_pcm->streams[SNDRV_PCM_STREAM_PLAYBACK].substream->private_data = rtd;
-		be_pcm->streams[SNDRV_PCM_STREAM_CAPTURE].substream->private_data = rtd;
+		if (rtd->dai_link->dpcm_playback)
+			be_pcm->streams[SNDRV_PCM_STREAM_PLAYBACK].substream->private_data = rtd;
+		if (rtd->dai_link->dpcm_capture)
+			be_pcm->streams[SNDRV_PCM_STREAM_CAPTURE].substream->private_data = rtd;
 		memcpy(compr->ops, &soc_compr_dyn_ops, sizeof(soc_compr_dyn_ops));
 	} else
 		memcpy(compr->ops, &soc_compr_ops, sizeof(soc_compr_ops));
-- 
2.1.0



Re: [PATCH] ALSA: compress_driver.h: include sound/core.h explicitly

2015-01-13 Thread Qais Yousef

Ping

On 12/19/2014 11:38 AM, Qais Yousef wrote:

Fixes the following compilation error:

include/sound/compress_driver.h: In function ‘snd_compr_drain_notify’:
include/sound/compress_driver.h:177:2: error: implicit declaration of 
function ‘snd_BUG_ON’ [-Werror=implicit-function-declaration]
  if (snd_BUG_ON(!stream))

snd_BUG_ON() is defined in sound/core.h but the file is not included explicitly,
so include it.

Signed-off-by: Qais Yousef 
Cc: Vinod Koul 
Cc: Jaroslav Kysela 
Cc: Takashi Iwai 
Cc: linux-kernel@vger.kernel.org

---
  include/sound/compress_driver.h | 1 +
  1 file changed, 1 insertion(+)

diff --git a/include/sound/compress_driver.h b/include/sound/compress_driver.h
index 396e8f73670a..1e2531058b7e 100644
--- a/include/sound/compress_driver.h
+++ b/include/sound/compress_driver.h
@@ -27,6 +27,7 @@
  
  #include 

  #include 
+#include <sound/core.h>
  #include 
  #include 
  #include 




[PATCH] MIPS: irq-mips-gic.c: handle pending interrupts once in __gic_irq_dispatch()

2015-01-19 Thread Qais Yousef
When an interrupt occurs, __gic_irq_dispatch() continuously reads the local
and shared pending registers until all pending interrupts are serviced
before returning. The problem with that is that it could introduce a long
delay before returning if a piece of hardware keeps triggering while we
are in one of these loops.

To ensure fairness and avoid skewing priorities too much, read the local
and shared pending registers once and service each pending IRQ once.
If another interrupt triggers while servicing the current ones, we shall
re-enter the handler after we return.

Signed-off-by: Qais Yousef 
Cc: Thomas Gleixner 
Cc: Jason Cooper 
Cc: linux-kernel@vger.kernel.org
Cc: Andrew Bresticker 
---
 drivers/irqchip/irq-mips-gic.c | 44 +-
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/drivers/irqchip/irq-mips-gic.c b/drivers/irqchip/irq-mips-gic.c
index 2b0468e3df6a..f3f9873dfb68 100644
--- a/drivers/irqchip/irq-mips-gic.c
+++ b/drivers/irqchip/irq-mips-gic.c
@@ -234,9 +234,9 @@ int gic_get_c0_perfcount_int(void)
  GIC_LOCAL_TO_HWIRQ(GIC_LOCAL_INT_PERFCTR));
 }
 
-static unsigned int gic_get_int(void)
+static void gic_handle_shared_int(void)
 {
-   unsigned int i;
+   unsigned int i, intr, virq;
unsigned long *pcpu_mask;
unsigned long pending_reg, intrmask_reg;
DECLARE_BITMAP(pending, GIC_MAX_INTRS);
@@ -258,7 +258,16 @@ static unsigned int gic_get_int(void)
bitmap_and(pending, pending, intrmask, gic_shared_intrs);
bitmap_and(pending, pending, pcpu_mask, gic_shared_intrs);
 
-   return find_first_bit(pending, gic_shared_intrs);
+   intr = find_first_bit(pending, gic_shared_intrs);
+   while (intr != gic_shared_intrs) {
+   virq = irq_linear_revmap(gic_irq_domain,
+GIC_SHARED_TO_HWIRQ(intr));
+   do_IRQ(virq);
+
+   /* go to next pending bit */
+   bitmap_clear(pending, intr, 1);
+   intr = find_first_bit(pending, gic_shared_intrs);
+   }
 }
 
 static void gic_mask_irq(struct irq_data *d)
@@ -385,16 +394,26 @@ static struct irq_chip gic_edge_irq_controller = {
 #endif
 };
 
-static unsigned int gic_get_local_int(void)
+static void gic_handle_local_int(void)
 {
unsigned long pending, masked;
+   unsigned int intr, virq;
 
pending = gic_read(GIC_REG(VPE_LOCAL, GIC_VPE_PEND));
masked = gic_read(GIC_REG(VPE_LOCAL, GIC_VPE_MASK));
 
bitmap_and(&pending, &pending, &masked, GIC_NUM_LOCAL_INTRS);
 
-   return find_first_bit(&pending, GIC_NUM_LOCAL_INTRS);
+   intr = find_first_bit(&pending, GIC_NUM_LOCAL_INTRS);
+   while (intr != GIC_NUM_LOCAL_INTRS) {
+   virq = irq_linear_revmap(gic_irq_domain,
+GIC_LOCAL_TO_HWIRQ(intr));
+   do_IRQ(virq);
+
+   /* go to next pending bit */
+   bitmap_clear(&pending, intr, 1);
+   intr = find_first_bit(&pending, GIC_NUM_LOCAL_INTRS);
+   }
 }
 
 static void gic_mask_local_irq(struct irq_data *d)
@@ -453,19 +472,8 @@ static struct irq_chip gic_all_vpes_local_irq_controller = {
 
 static void __gic_irq_dispatch(void)
 {
-   unsigned int intr, virq;
-
-   while ((intr = gic_get_local_int()) != GIC_NUM_LOCAL_INTRS) {
-   virq = irq_linear_revmap(gic_irq_domain,
-GIC_LOCAL_TO_HWIRQ(intr));
-   do_IRQ(virq);
-   }
-
-   while ((intr = gic_get_int()) != gic_shared_intrs) {
-   virq = irq_linear_revmap(gic_irq_domain,
-GIC_SHARED_TO_HWIRQ(intr));
-   do_IRQ(virq);
-   }
+   gic_handle_local_int();
+   gic_handle_shared_int();
 }
 
 static void gic_irq_dispatch(unsigned int irq, struct irq_desc *desc)
-- 
2.1.0



Re: [PATCH v3 1/2] Provide in-kernel headers for making it easy to extend the kernel

2019-02-28 Thread Qais Yousef
Hi Joel

On 02/27/19 14:37, Joel Fernandes (Google) wrote:
> Introduce in-kernel headers and other artifacts which are made available
> as an archive through proc (/proc/kheaders.tar.xz file). This archive makes
> it possible to build kernel modules, run eBPF programs, and other
> tracing programs that need to extend the kernel for tracing purposes
> without any dependency on the file system having headers and build
> artifacts.

Thanks for doing this work. This will be a very helpful feature and simplify
the workflow, especially when dealing with multiple targets each with its own
kernel tree/version.

> 
> On Android and embedded systems, it is common to switch kernels but not
> have kernel headers available on the file system. Raw kernel headers
> also cannot be copied into the filesystem like they can be on other
> distros, due to licensing and other issues. There's no linux-headers
> package on Android. Further once a different kernel is booted, any

I use non-Android systems quite often for development. And yes, while getting
the headers onto the target isn't always a problem, when moving across trees
or versions it's very easy to mess this up, and having the headers embedded in
the kernel means you're always guaranteed to use the right ones. eBPF is my
main use case here.

> headers stored on the file system will no longer be useful. By storing
> the headers as a compressed archive within the kernel, we can avoid these
> issues that have been a hindrance for a long time.
> 
> The feature is also buildable as a module just in case the user desires
> it not being part of the kernel image. This makes it possible to load
> and unload the headers on demand. A tracing program, or a kernel module
> builder can load the module, do its operations, and then unload the
> module to save kernel memory. The total memory needed is 3.8MB.

[...]

> diff --git a/scripts/gen_ikh_data.sh b/scripts/gen_ikh_data.sh
> new file mode 100755
> index ..7329262bed2f
> --- /dev/null
> +++ b/scripts/gen_ikh_data.sh
> @@ -0,0 +1,76 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +
> +spath="$(dirname "$(readlink -f "$0")")"
> +kroot="$spath/.."
> +outdir="$(pwd)"
> +tarfile=$1
> +cpio_dir=$outdir/$tarfile.tmp
> +
> +src_file_list=""
> +for f in $file_list; do

$file_list is not assigned here.

I applied the patches and the generated tar was empty. Setting `file_list=$*`
fixed it for me - though I'm not sure this is the right fix to use. Did a last
minute change/cleanup accidentally remove the line that assigns $file_list?
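
I.e. something like this near the top of the script (just to illustrate -
as said, not sure it's the right fix):

	file_list=$*	# was never assigned, so the loops below had nothing to do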

Cheers

--
Qais Yousef

> + src_file_list="$src_file_list $(echo $f | grep -v OBJDIR)"
> +done
> +
> +obj_file_list=""
> +for f in $file_list; do
> + f=$(echo $f | grep OBJDIR | sed -e 's/OBJDIR\///g')
> + obj_file_list="$obj_file_list $f";
> +done
> +
> +# Support incremental builds by skipping archive generation
> +# if timestamps of files being archived are not changed.
> +
> +# This block is useful for debugging the incremental builds.
> +# Uncomment it for debugging.
> +# iter=1
> +# if [ ! -f /tmp/iter ]; then echo 1 > /tmp/iter;
> +# else;  iter=$(($(cat /tmp/iter) + 1)); fi
> +# find $src_file_list -type f | xargs ls -lR > /tmp/src-ls-$iter
> +# find $obj_file_list -type f | xargs ls -lR > /tmp/obj-ls-$iter
> +
> +# modules.order and include/generated/compile.h are ignored because these are
> +# touched even when none of the source files changed. This causes pointless
> +# regeneration, so let us ignore them for md5 calculation.
> +pushd $kroot > /dev/null
> +src_files_md5="$(find $src_file_list -type f ! -name modules.order |
> + grep -v "include/generated/compile.h"  |
> + xargs ls -lR | md5sum | cut -d ' ' -f1)"
> +popd > /dev/null
> +obj_files_md5="$(find $obj_file_list -type f ! -name modules.order |
> + grep -v "include/generated/compile.h"  |
> + xargs ls -lR | md5sum | cut -d ' ' -f1)"
> +
> +if [ -f $tarfile ]; then tarfile_md5="$(md5sum $tarfile | cut -d ' ' -f1)"; 
> fi
> +if [ -f kernel/kheaders.md5 ] &&
> + [ "$(cat kernel/kheaders.md5|head -1)" == "$src_files_md5" ] &&
> + [ "$(cat kernel/kheaders.md5|head -2|tail -1)" == "$obj_files_md5" ] &&
> + [ "$(cat kernel/kheaders.md5|tail -1)" == "$tarfile_md5" ]; then
> + exit
> +fi
> +
> +rm -rf $cpio_dir
> +mkdir $cpio_dir
> +
> +pushd $kroot > /dev/null
> +for f

Re: [PATCH v3 1/2] Provide in-kernel headers for making it easy to extend the kernel

2019-02-28 Thread Qais Yousef
On 02/28/19 17:04, Dietmar Eggemann wrote:
> Hi Joel,
> 
> On 2/28/19 3:47 PM, Joel Fernandes wrote:
> > On Thu, Feb 28, 2019 at 01:53:43PM +0000, Qais Yousef wrote:
> > > Hi Joel
> > > 
> > > On 02/27/19 14:37, Joel Fernandes (Google) wrote:
> 
> [...]
> 
> > Ah good catch, I made this change for "file_list=${@:2}" in my tree but
> > forgot to push it. Below is the updated patch. Sorry and I'll refresh the
> > series with the change after we finish the discussion in the other thread.
> > Meanwhile the updated patch is as follows...
> > 
> > ---8<---
> > 
> > From: "Joel Fernandes (Google)" 
> > Subject: [PATCH v3.1] Provide in-kernel headers for making it easy to 
> > extend the kernel
> > 
> > Introduce in-kernel headers and other artifacts which are made available
> > as an archive through proc (/proc/kheaders.tar.xz file). This archive makes
> > it possible to build kernel modules, run eBPF programs, and other
> > tracing programs that need to extend the kernel for tracing purposes
> > without any dependency on the file system having headers and build
> > artifacts.
> > 
> > On Android and embedded systems, it is common to switch kernels but not
> > have kernel headers available on the file system. Raw kernel headers
> > also cannot be copied into the filesystem like they can be on other
> > distros, due to licensing and other issues. There's no linux-headers
> > package on Android. Further once a different kernel is booted, any
> > headers stored on the file system will no longer be useful. By storing
> > the headers as a compressed archive within the kernel, we can avoid these
> > issues that have been a hindrance for a long time.
> > 
> > The feature is also buildable as a module just in case the user desires
> > it not being part of the kernel image. This makes it possible to load
> > and unload the headers on demand. A tracing program, or a kernel module
> > builder can load the module, do its operations, and then unload the
> > module to save kernel memory. The total memory needed is 3.8MB.
> > 
> > The code to read the headers is based on /proc/config.gz code and uses
> > the same technique to embed the headers.
> 
> This version gives me the header files on a v5.0-rc8 kernel on my arm64 box
> but does not compile anymore on v4.20:
> 
> kernel/kheaders.c:25:22: error: expected identifier or ‘(’ before string
> constant
>  #define KH_MAGIC_END "IKHD_ED"
>   ^
> kernel/kheaders_data.h:1:1: note: in expansion of macro ‘KH_MAGIC_END’

I believe this is caused by ad774086356d (kbuild: change filechk to surround
the given command with { }), which changes how filechk is defined - and since
filechk_ikheadersxz is used to generate kheaders_data.h, the backport fails
because the syntax has changed.

HTH

--
Qais Yousef

>  KH_MAGIC_END;
>  ^~~~
> kernel/kheaders.c: In function ‘ikheaders_read_current’:
> kernel/kheaders.c:38:12: error: ‘kernel_headers_data’ undeclared (first use
> in this function); did you mean ‘kernel_headers_data_size’?
> kernel_headers_data + KH_MAGIC_SIZE,
> ^~~
> kernel_headers_data_size
> kernel/kheaders.c:38:12: note: each undeclared identifier is reported only
> once for each function it appears in
> kernel/kheaders.c: In function ‘ikheaders_init’:
> kernel/kheaders.c:31:10: error: ‘kernel_headers_data’ undeclared (first use
> in this function); did you mean ‘kernel_headers_data_size’?
>   (sizeof(kernel_headers_data) - 1 - KH_MAGIC_SIZE * 2)
>   ^
> kernel/kheaders.c:57:23: note: in expansion of macro
> ‘kernel_headers_data_size’
>   proc_set_size(entry, kernel_headers_data_size);
>^~~~
> kernel/kheaders.c: In function ‘ikheaders_read_current’:
> kernel/kheaders.c:40:1: warning: control reaches end of non-void function
> [-Wreturn-type]
>  }
> 
> 
> The reason for me to stay on v4.20 is that with v5.0-rc8 I don't have ebpf
> 'raw tracepoint' support any more on my arm64 board. But this issue is not
> related to your patch though.
> 
> Another point which supports the functionality your patch provides is the
> fact that maintainers don't want to see new TRACE_EVENTs in their code. So
> here your patch comes handy when using ebpf for tracing in embedded
> environments.


Re: [PATCH v3 1/2] Provide in-kernel headers for making it easy to extend the kernel

2019-02-28 Thread Qais Yousef
On 02/28/19 17:04, Dietmar Eggemann wrote:
> Hi Joel,
> 
> On 2/28/19 3:47 PM, Joel Fernandes wrote:
> > On Thu, Feb 28, 2019 at 01:53:43PM +0000, Qais Yousef wrote:
> > > Hi Joel
> > > 
> > > On 02/27/19 14:37, Joel Fernandes (Google) wrote:
> 
> [...]
> 
> > Ah good catch, I made this change for "file_list=${@:2}" in my tree but
> > forgot to push it. Below is the updated patch. Sorry and I'll refresh the
> > series with the change after we finish the discussion in the other thread.
> > Meanwhile the updated patch is as follows...
> > 
> > ---8<---
> > 
> > From: "Joel Fernandes (Google)" 
> > Subject: [PATCH v3.1] Provide in-kernel headers for making it easy to 
> > extend the kernel
> > 
> > Introduce in-kernel headers and other artifacts which are made available
> > as an archive through proc (/proc/kheaders.tar.xz file). This archive makes
> > it possible to build kernel modules, run eBPF programs, and other
> > tracing programs that need to extend the kernel for tracing purposes
> > without any dependency on the file system having headers and build
> > artifacts.
> > 
> > On Android and embedded systems, it is common to switch kernels but not
> > have kernel headers available on the file system. Raw kernel headers
> > also cannot be copied into the filesystem like they can be on other
> > distros, due to licensing and other issues. There's no linux-headers
> > package on Android. Further once a different kernel is booted, any
> > headers stored on the file system will no longer be useful. By storing
> > the headers as a compressed archive within the kernel, we can avoid these
> > issues that have been a hindrance for a long time.
> > 
> > The feature is also buildable as a module just in case the user desires
> > it not being part of the kernel image. This makes it possible to load
> > and unload the headers on demand. A tracing program, or a kernel module
> > builder can load the module, do its operations, and then unload the
> > module to save kernel memory. The total memory needed is 3.8MB.
> > 
> > The code to read the headers is based on /proc/config.gz code and uses
> > the same technique to embed the headers.
> 
> This version gives me the header files on a v5.0-rc8 kernel on my arm64 box
> but does not compile anymore on v4.20:
> 
> kernel/kheaders.c:25:22: error: expected identifier or ‘(’ before string
> constant
>  #define KH_MAGIC_END "IKHD_ED"
>   ^
> kernel/kheaders_data.h:1:1: note: in expansion of macro ‘KH_MAGIC_END’
>  KH_MAGIC_END;
>  ^~~~
> kernel/kheaders.c: In function ‘ikheaders_read_current’:
> kernel/kheaders.c:38:12: error: ‘kernel_headers_data’ undeclared (first use
> in this function); did you mean ‘kernel_headers_data_size’?
> kernel_headers_data + KH_MAGIC_SIZE,
> ^~~
> kernel_headers_data_size
> kernel/kheaders.c:38:12: note: each undeclared identifier is reported only
> once for each function it appears in
> kernel/kheaders.c: In function ‘ikheaders_init’:
> kernel/kheaders.c:31:10: error: ‘kernel_headers_data’ undeclared (first use
> in this function); did you mean ‘kernel_headers_data_size’?
>   (sizeof(kernel_headers_data) - 1 - KH_MAGIC_SIZE * 2)
>   ^
> kernel/kheaders.c:57:23: note: in expansion of macro
> ‘kernel_headers_data_size’
>   proc_set_size(entry, kernel_headers_data_size);
>^~~~
> kernel/kheaders.c: In function ‘ikheaders_read_current’:
> kernel/kheaders.c:40:1: warning: control reaches end of non-void function
> [-Wreturn-type]
>  }
> 
> 
> The reason for me to stay on v4.20 is that with v5.0-rc8 I don't have ebpf
> 'raw tracepoint' support any more on my arm64 board. But this issue is not
> related to your patch though.

And this one is caused by a38d1107f937 (bpf: support raw tracepoints in
modules), which renamed bpf_find_raw_tracepoint(), which bcc-tools relies on
to detect whether raw tracepoints are supported.

https://github.com/iovisor/bcc/blob/master/src/python/bcc/__init__.py#L860

Speaking of fragile dependencies :-)

I guess the check can be extended to check for both symbols - but it'll stay
fragile. Not sure if they can do better.
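
E.g. a sketch of what I mean - assuming the renamed symbol is
bpf_get_raw_tracepoint() and reusing the kallsyms lookup bcc already has:

	# accept either the old or the new symbol name
	for sym in (b"bpf_find_raw_tracepoint", b"bpf_get_raw_tracepoint"):
	    if BPF.ksymname(sym) != -1:
	        return True
	return False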

I filed a bug

https://github.com/iovisor/bcc/issues/2240

--
Qais Yousef

> 
> Another point which supports the functionality your patch provides is the
> fact that maintainers don't want to see new TRACE_EVENTs in their code. So
> here your patch comes handy when using ebpf for tracing in embedded
> environments.


Re: [PATCH v3 1/2] Provide in-kernel headers for making it easy to extend the kernel

2019-03-01 Thread Qais Yousef
On 02/28/19 22:26, Joel Fernandes wrote:
> On Fri, Mar 01, 2019 at 11:28:26AM +0900, Masami Hiramatsu wrote:
> > Hi Joel,
> 
> Hi Masami,
> 
> > On Thu, 28 Feb 2019 10:00:54 -0500
> > Joel Fernandes  wrote:
> > 
> > > > Hmm, isn't it easier to add a kernel-headers package on Android?
> > > 
> > > I have already been down that road. In the Android ecosystem, the Android
> > > teams only provide a "userspace system image" which goes on the system
> > > partition of the flash (and a couple other images are also provided but
> > > system is the main one). The system image cannot contain GPL source code. 
> > > It
> > > is also not possible to put kernel headers for every kernel version on the
> > > system images that ship, nor is it practical. Android boots on 1000s of 
> > > forked
> > > kernels. It does not make sense to provide headers on the system image for
> > > every kernel version and I already had many discussions on the subject 
> > > with
> > > the teams, it is something that is just not done. Now for kernel modules,
> > > there's another image called the "vendor image" which is flashed onto the
> > > vendor partition, this is where kernel modules go.  This vendor image is 
> > > not
> > > provided by Google for non-Pixel devices. So we have no control over what
> > > goes there BUT we do know that kernel modules that are enabled will go 
> > > there,
> > > and we do have control over enforcing that certain kernel modules should 
> > > be
> > > built and available as they are mandatory for Android to function 
> > > properly.
> > > We would also possibly make it a built-in option as well. Anyway my point 
> > > is
> > > keeping it in the kernel is really the easiest and the smartest choice 
> > > IMO.
> > 
> > Sorry, I'm not convinced yet. This sounds like "because Android decided not
> > to put the header files on vendor partition, but kernel module is OK"
> > Why doesn't Google ask vendors to put their kernel headers (or a header
> > tarball) on the vendor partition instead?
> 
> Maybe Google can do that, but I think you missed the point of the patches.
> Making it a module is not mandatory, people can build it into the kernel as
> well (CONFIG_IKHEADERS_PROC=y). In this case, the proc entry will be
> available on boot without any dependency on the filesystem. If you go through
> the other threads such as folks from ARM who replied, they just boot a kernel
> image for debug purpose and want headers on device available without any
> additional step of copying headers to the filesystem. And folks from Google
> also said that they wanted a built-in option.

Yes, I do see this patch as a useful addition. When you need the headers and
you're dealing with multiple devices, kernel trees and versions, having the
headers be part of the kernel just makes it easier to use them.
Especially when you change a config option (like enabling a new syscall, e.g.
bpf), it's very easy to forget that the headers have changed as well and that
you need to push an updated copy.

FWIW most of the time I run on non-android systems for development/debugging
purposes. I use the built-in version although I can see a module version
helpful too. You can save some space if your kernel image partition is small,
and it's easier to install the module along with all the other modules when
rebuilding the kernel than to remember to update the headers. It's a (very)
nice convenience.

--
Qais Yousef

> 
> There are many use cases for this; I have often run into issues with Linux
> over the years, not only with Android but with other distros, where I boot
> custom kernels with no linux-headers package. This is quite painful. It is
> convenient to have it as a /proc file since the file matches the kernel
> being booted, and this will work across all Linux distros and systems. I
> feel that if you can keep an open mind about it, you will see that a lot of
> people will use this feature if it is accepted and there is a lot of positive
> feedback in earlier posts of this set.
> 
> > > > > The code to read the headers is based on /proc/config.gz code and uses
> > > > > the same technique to embed the headers.
> > > > > 
> > > > > To build a module, the below steps have been tested on an x86 machine:
> > > > > modprobe kheaders
> > > > > rm -rf $HOME/headers
> > > > > mkdir -p $HOME/headers
> > > > > tar -xvf /proc/kheaders.tar.xz -C $HOME/headers >/dev/null
> > > > >

Re: [RFC][PATCH] Export supported trace features in debugfs

2019-03-12 Thread Qais Yousef
On 03/12/19 11:07, Qais Yousef wrote:
> eBPF tools like bcc-tools have a hard time figuring out whether features like
> raw_tracepoint are supported in the kernel on which we are running. At
> the moment a fragile mechanism of matching bpf_find_raw_tracepoint()
> function in /proc/kallsyms is used to find out whether raw tracepoints
> can be used or not. But when this function was renamed recently to
> bpf_get_raw_tracepoint() the tool started to fail to use raw
> tracepoints.
> 
> To help in providing a more reliable way to detect features like
> RAW_TRACEPOINT, add a new file in trace debugfs to export the supported
> features.
> 
> $ cat /sys/kernel/debug/tracing/trace_features
> RAW_TRACEPOINT
> EXAMPLE_FEATURE_1
> EXAMPLE_FEATURE_2
> 
> Signed-off-by: Qais Yousef 
> ---
> 
> This is a half-baked patch to probe the potential of this solution.
> 
> The breakage mentioned in the commit message is here:
> 
> https://github.com/iovisor/bcc/pull/2241/commits/0f5849187972a50adf0d9eaa8788c11f9fd926ea
> 
> I am not sure what else besides raw_tracepoint makes sense to expose right now.
> 
>  kernel/trace/trace.c | 25 +
>  1 file changed, 25 insertions(+)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index c4238b441624..daae09238e62 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -6670,6 +6670,28 @@ static const struct file_operations snapshot_raw_fops 
> = {
> 
>  #endif /* CONFIG_TRACER_SNAPSHOT */
> 
> +#define TRACE_FEATURE(feat)	__stringify(feat) "\n"
> +
> +#define TRACE_FEATURES				\
> +	TRACE_FEATURE(RAW_TRACEPOINT)		\
> +	TRACE_FEATURE(EXAMPLE_FEATURE_1)	\
> +	TRACE_FEATURE(EXAMPLE_FEATURE_2)
> +
> +static ssize_t
> +tracing_read_trace_features(struct file *filp, char __user *ubuf,
> +			    size_t cnt, loff_t *ppos)
> +{
> +	char *buf = TRACE_FEATURES;
> +	size_t len = sizeof(TRACE_FEATURES);
> +
> +	return simple_read_from_buffer(ubuf, cnt, ppos, buf, len);
> +}
> +
> +static const struct file_operations show_trace_features_fops = {
> +	.read	= tracing_read_trace_features,
> +	.llseek	= no_llseek,
> +};
> +
>  static int tracing_buffers_open(struct inode *inode, struct file *filp)
>  {
>  	struct trace_array *tr = inode->i_private;
> @@ -8242,6 +8264,9 @@ static __init int tracer_init_tracefs(void)
>  			&ftrace_update_tot_cnt, &tracing_dyn_info_fops);
>  #endif
> 
> +	trace_create_file("trace_features", 0444, d_tracer,
> +			  NULL, &show_trace_features_fops);
> +
>  	create_trace_instances(d_tracer);
> 
>  	update_tracer_options(&global_trace);
> --
> 2.17.1
> 
> IMPORTANT NOTICE: The contents of this email and any attachments are 
> confidential and may also be privileged. If you are not the intended 
> recipient, please notify the sender immediately and do not disclose the 
> contents to any other person, use it for any purpose, or store or copy the 
> information in any medium. Thank you.

Apologies, that shouldn't have appeared. I can resend the patch if it matters.
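
FWIW, the consumption side for tools like bcc would then be trivial; a minimal
sketch in C (assuming the file keeps the trace_features name used in the
patch; for bcc it would be the same logic in Python):

        #include <stdio.h>
        #include <string.h>

        static int trace_feature_supported(const char *feat)
        {
                FILE *f = fopen("/sys/kernel/debug/tracing/trace_features", "r");
                char line[64];

                if (!f)
                        return 0; /* no file: fall back to the kallsyms hack */

                while (fgets(line, sizeof(line), f)) {
                        line[strcspn(line, "\n")] = '\0';
                        if (!strcmp(line, feat)) {
                                fclose(f);
                                return 1;
                        }
                }

                fclose(f);
                return 0;
        }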

--
Qais Yousef


Re: [PATCH 0/3] drivers: Frequency constraint infrastructure

2019-01-28 Thread Qais Yousef
> On Fri, Jan 18, 2019 at 1:39 PM Juri Lelli  wrote:
> >
> > On 17/01/19 15:55, Rafael J. Wysocki wrote:
> > > On Thu, Jan 17, 2019 at 2:16 PM Juri Lelli  wrote:
> > > >
> > > > On 11/01/19 10:47, Rafael J. Wysocki wrote:
> > > > > On Fri, Jan 11, 2019 at 10:18 AM Viresh Kumar 
> > > > >  wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > This commit introduces the frequency constraint infrastructure, 
> > > > > > which
> > > > > > provides a generic interface for parts of the kernel to constrain 
> > > > > > the
> > > > > > working frequency range of a device.
> > > > > >
> > > > > > The primary users of this are the cpufreq and devfreq frameworks. 
> > > > > > The
> > > > > > cpufreq framework already implements such constraints with help of
> > > > > > notifier chains (for thermal and other constraints) and some local 
> > > > > > code
> > > > > > (for user-space constraints). The devfreq framework developers have 
> > > > > > also
> > > > > > shown interest [1] in such a framework, which may use it at a later
> > > > > > point of time.
> > > > > >
> > > > > > The idea here is to provide a generic interface and get rid of the
> > > > > > notifier based mechanism.
> > > > > >
> > > > > > Only one constraint is added for now for the cpufreq framework and 
> > > > > > the
> > > > > > rest will follow after this stuff is merged.
> > > > > >
> > > > > > Matthias Kaehlcke was involved in the preparation of the first 
> > > > > > draft of
> > > > > > this work and so I have added him as Co-author to the first patch.
> > > > > > Thanks Matthias.
> > > > > >
> > > > > > FWIW, this doesn't have anything to do with the boot-constraints
> > > > > > framework [2] I was trying to upstream earlier :)
> > > > >
> > > > > This is quite a bit of code to review, so it will take some time.
> > > > >
> > > > > One immediate observation is that it seems to do quite a bit of what
> > > > > is done in the PM QoS framework, so maybe there is an opportunity for
> > > > > some consolidation in there.
> > > >
> > > > Right, had the same impression. :-)
> > > >
> > > > I was also wondering how this new framework is dealing with
> > > > constraints/requests imposed/generated by the scheduler and related
> > > > interfaces (thinking about schedutil and Patrick's util_clamp).
> > >
> > > My understanding is that it is orthogonal to them, like adding extra
> > > constraints on top of them etc.
> >
> > Mmm, ok. But, if that is indeed the case, I now wonder why and how
> > existing (or hopefully to be added soon) interfaces are not sufficient.
> > I'm not against this proposal, just trying to understand if this might
> > create unwanted, hard to manage, overlap.

I echo these concerns as well.

>
> That is a valid concern IMO.  Especially the utilization clamping and
> the interconnect framework seem to approach the same problem space
> from different directions.
>
> For cpufreq this work can be regarded as a replacement for notifiers
> which are a bandaid of sorts and it would be good to get rid of them.
> They are mostly used for thermal management and I guess that devfreq
> users also may want to reduce frequency for thermal reasons and I'd
> rather not add notifiers to that framework for this purpose.
>
> However, as stated previously, this resembles the PM QoS framework
> quite a bit to me and whatever thermal entity, say, sets these
> constraints, it should not work against schedutil and similar.  In

But we have no way to enforce this, no? I'm thinking if frequency can be
constrained in the PM QoS framework, then we will end up with some drivers that
think it's a good idea to use it and potentially end up breaking this "should
not work against schedutil and similar".

Or did I miss something?

My point is that if we introduce something too generic we might encourage more
users and end up with a complex set of rules/interactions, and lose some
determinism. But I could be reading too much into it :-)

Cheers

--
Qais Yousef

> some situations setting a max frequency limit to control thermals is
> not the most efficient way to go as it effectively turns into
> throttling and makes performance go south.  For example, it may cause
> things to run at the limit frequency all the time which may be too
> slow and it may be more efficient to allow higher frequencies to be
> used, but instead control how much of the time they can be used.  So
> we need to be careful here.


Re: Tracehooks in scheduler

2019-04-09 Thread Qais Yousef
(+ LKML)

Apologies forgot to CC the list.

On 04/07/19 18:52, Qais Yousef wrote:
> Hi Steve, Peter
> 
> I know the topic has sprung up in the past but I couldn't find anything that
> points to any conclusion.
> 
> As far as I understand, adding new TRACE_EVENT()s in the scheduler (and
> probably other subsystems) isn't desirable as it introduces a sort of ABI
> that can be painful to maintain.
> 
> But for us to be able to test various aspect of EAS, we rely on some events
> that track load_avg, util_avg and some other metrics in the scheduler.
> Example of such patches that are in android and we maintain out of tree can be
> found here:
> 
> https://android.googlesource.com/kernel/common/+/42903694913697da88a4ac627a92bbfdf44f0a2e
> https://android.googlesource.com/kernel/common/+/6dfaed989ea4ca223f0913dfc11cdafd9664fc1c
> 
> Dietmar and Quentin pointed me to a discussion you guys had with Daniel 
> Bristot
> at the last LPC when he had a similar need. So it is something that could
> benefit other users as well.
> 
> What is the best way forward to be able to add tracehooks into the scheduler
> and any other subsystem for that matter?
> 
> We tried using DECLARE_TRACE() to create a tracepoint which doesn't export
> anything in /sys/kernel/debug/tracing/events and hoped that we could use eBPF or
> a kernel module to attach to this tracepoint and access the args to inject our
> own trace_printks() but this didn't work. The glue logic necessary to attach
> to this tracepoint in a similar manner to how RAW_TRACEPOINT() in eBPF works
> isn't there AFAICT.
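> 
> Roughly, what we tried has this shape (a quick sketch for illustration; the
> tracepoint name and fields are made up, not from an actual patch, and the
> matching DEFINE_TRACE() in a C file is omitted):
> 
> 	/* kernel side: a bare tracepoint, no TRACE_EVENT(), so no ABI */
> 	DECLARE_TRACE(pelt_se,
> 		TP_PROTO(struct sched_entity *se),
> 		TP_ARGS(se));
> 
> 	/* module side: attach a probe and emit whatever format we want */
> 	static void probe_pelt_se(void *data, struct sched_entity *se)
> 	{
> 		trace_printk("load_avg=%lu util_avg=%lu\n",
> 			     se->avg.load_avg, se->avg.util_avg);
> 	}
> 
> 	/* at module init: register_trace_pelt_se(probe_pelt_se, NULL); */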
> 
> I can post the full example if the above doesn't make sense. I am still
> familiarizing myself with the different aspects of this code as well. There
> might be support for what we want but I failed to figure out the magic
> combination to get it to work.
> 
> If I got this glue logic done, would this be an acceptable solution? If not, 
> do
> you have any suggestions on how to progress?
> 
> Thanks
> 
> --
> Qais Yousef


Re: [PATCH 1/2] sched/topology: build_sched_groups: Skip duplicate group rewrites

2019-04-10 Thread Qais Yousef
On 04/09/19 18:35, Valentin Schneider wrote:
> While staring at build_sched_domains(), I realized that get_group()
> does several duplicate (thus useless) writes.
> 
> If you take the Arm Juno r0 (LITTLEs = [0, 3, 4, 5], bigs = [1, 2]), the
> sched_group build flow would look like this:
> 
> ('MC[cpu]->sg' means 'per_cpu_ptr(&tl->data->sg, cpu)' with 'tl == MC')
> 
> build_sched_groups(MC[CPU0]->sd, CPU0)
>   get_group(0) -> MC[CPU0]->sg
>   get_group(3) -> MC[CPU3]->sg
>   get_group(4) -> MC[CPU4]->sg
>   get_group(5) -> MC[CPU5]->sg
> 
> build_sched_groups(DIE[CPU0]->sd, CPU0)
>   get_group(0) -> DIE[CPU0]->sg
>   get_group(1) -> DIE[CPU1]->sg <-+
> |
> build_sched_groups(MC[CPU1]->sd, CPU1)|
>   get_group(1) -> MC[CPU1]->sg|
>   get_group(2) -> MC[CPU2]->sg|
> |
> build_sched_groups(DIE[CPU1]->sd, CPU1)   ^
>   get_group(1) -> DIE[CPU1]->sg  } We've already set these two up here!
>   get_group(3) -> DIE[CPU0]->sg  }
> 
> From this point on, we will only use sched_groups that have been
> previously visited & initialized. The only new operation will
> be deciding which group pointer we assign to sd->groups.
> 
> On the Juno r0 we get 32 get_group() calls, every single one of them
> writing to a sched_group->cpumask. However, all of the data structures
> we need are set up after 8 visits (see above).
> 
> Return early from get_group() if we've already visited (and thus
> initialized) the sched_group we're looking at. Overlapping domains
> are not affected as they do not use build_sched_groups().
> 
> Tested on a Juno and a 2 * (Xeon E5-2690) system.
> 
> Signed-off-by: Valentin Schneider 
> ---
> FWIW I initially checked the refs for both sg && sg->sgc, but figured if
> they weren't both 0 or > 1 then something must have gone wrong, so I
> threw in a WARN_ON().
> ---
>  kernel/sched/topology.c | 12 +---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 64bec54ded3e..6c0b7326f66e 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1059,6 +1059,7 @@ static struct sched_group *get_group(int cpu, struct 
> sd_data *sdd)
>   struct sched_domain *sd = *per_cpu_ptr(sdd->sd, cpu);
>   struct sched_domain *child = sd->child;
>   struct sched_group *sg;
> + bool already_visited;
>  
>   if (child)
>   cpu = cpumask_first(sched_domain_span(child));
> @@ -1066,9 +1067,14 @@ static struct sched_group *get_group(int cpu, struct 
> sd_data *sdd)
>   sg = *per_cpu_ptr(sdd->sg, cpu);
>   sg->sgc = *per_cpu_ptr(sdd->sgc, cpu);
>  
> - /* For claim_allocations: */
> - atomic_inc(&sg->ref);
> - atomic_inc(&sg->sgc->ref);
> + /* Increase refcounts for claim_allocations: */
> + already_visited = atomic_inc_return(&sg->ref) > 1;
> + /* sgc visits should follow a similar trend as sg */
> + WARN_ON(already_visited != (atomic_inc_return(&sg->sgc->ref) > 1));

NIT: I think it would be better not to have any side effect of calling
WARN_ON(), i.e. do the atomic_inc_return() outside the WARN_ON() condition.
Having two bools, already_visited_sg and already_visited_sgc, would make the
code more readable too. Something like:
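
	bool already_visited_sg, already_visited_sgc;

	/* Increase refcounts for claim_allocations: */
	already_visited_sg  = atomic_inc_return(&sg->ref) > 1;
	already_visited_sgc = atomic_inc_return(&sg->sgc->ref) > 1;

	/* sgc visits should follow a similar trend as sg */
	WARN_ON(already_visited_sg != already_visited_sgc);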

Thanks

--
Qais Yousef

> +
> + /* If we have already visited that group, it's already initialized. */
> + if (already_visited)
> + return sg;
>  
>   if (child) {
>   cpumask_copy(sched_group_span(sg), sched_domain_span(child));
> --
> 2.20.1
> 


Re: [PATCH 1/2] sched/topology: build_sched_groups: Skip duplicate group rewrites

2019-04-10 Thread Qais Yousef
On 04/10/19 11:17, Valentin Schneider wrote:
> On 10/04/2019 10:27, Qais Yousef wrote:
> [...]
> >> @@ -1066,9 +1067,14 @@ static struct sched_group *get_group(int cpu, 
> >> struct sd_data *sdd)
> >>sg = *per_cpu_ptr(sdd->sg, cpu);
> >>sg->sgc = *per_cpu_ptr(sdd->sgc, cpu);
> >>  
> >> -  /* For claim_allocations: */
> >> -  atomic_inc(&sg->ref);
> >> -  atomic_inc(&sg->sgc->ref);
> >> +  /* Increase refcounts for claim_allocations: */
> >> +  already_visited = atomic_inc_return(&sg->ref) > 1;
> >> +  /* sgc visits should follow a similar trend as sg */
> >> +  WARN_ON(already_visited != (atomic_inc_return(&sg->sgc->ref) > 1));
> > 
> > NIT: I think it would be better to not have any side effect of calling
> > WARN_ON(). Ie: do the atomic_inc_return() outside the WARN_ON() condition.
> > Having two bool already_visited_sg and already_visited_sgc will make the 
> > code
> > more readable too.
> > 
> 
> I agree it's not the nicest reading flow there is. It's already gone in
> tip/sched/core but if needed I can resend with this extra separation.

It was just a nit from my side. So for me it's not worth a resend if it's
already accepted.

--
Qais Yousef


Re: [PATCH V2 4/5] cpufreq: Register notifiers with the PM QoS framework

2019-02-26 Thread Qais Yousef
On 02/26/19 08:00, Viresh Kumar wrote:
> On 25-02-19, 12:14, Qais Yousef wrote:
> > On 02/25/19 14:39, Viresh Kumar wrote:
> > > On 25-02-19, 08:58, Qais Yousef wrote:
> > > > On 02/25/19 10:01, Viresh Kumar wrote:
> > > > > > > + min = dev_pm_qos_read_value(cpu_dev, DEV_PM_QOS_MIN_FREQUENCY);
> > > > > > > + max = dev_pm_qos_read_value(cpu_dev, DEV_PM_QOS_MAX_FREQUENCY);
> > > > > > > +
> > > > > > > + if (min > new_policy->min)
> > > > > > > + new_policy->min = min;
> > > > > > > + if (max < new_policy->max)
> > > > > > > + new_policy->max = max;
> > > 
> > > > And this is why we need to check here if the PM QoS value doesn't 
> > > > conflict with
> > > > the current min/max, right? Until the current notifier code is removed 
> > > > they
> > > > could trip over each others.
> > > 
> > > No. The above if/else block is already removed as part of patch 5/5. It 
> > > was
> > > required because of conflict between userspace specific min/max and qos 
> > > min/max,
> > > which are migrated to use qos by patch 5/5.
> > > 
> > > The cpufreq notifier mechanism already lets users play with min/max and 
> > > that is
> > > already safe from conflicts.
> > > 
> > > 
> > > > It would be nice to add a comment here about PM QoS managing and 
> > > > remembering
> > > > values
> > > 
> > > I am not sure if that would add any value. Some documentation update may 
> > > be
> > > useful for people looking for details though, that I shall do after all 
> > > the
> > > changes get in and things become a bit stable.
> > > 
> > 
> > Up to you. But not everyone is familiar with the code and a one line comment
> > that points to where aggregation is happening would be helpful for someone
> > scanning this code IMHO.
> 
> Okay, will add something then.
> 
> > > > and that we need to be careful that both mechanisms don't trip over
> > > > each others until this transient period is over.
> > > 
> > > The second mechanism will die very very soon once this is merged, 
> > > migrating them
> > > shouldn't be a big challenge AFAICT. I didn't attempt that because I 
> > > didn't
> > > want to waste time updating things in case this version also doesn't 
> > > make
> > > sense to others.
> > > 
> > > > I have a nit too. It would be nice to explicitly state this is
> > > > CPU_{MIN,MAX}_FREQUENCY. I can see someone else adding 
> > > > {MIN,MAX}_FREQUENCY for
> > > something else (memory maybe?)
> > > 
> > > This is not CPU specific, but any device. The same interface shall be 
> > > used by
> > > devfreq as well, which wanted to use freq-constraints initially.
> > > 
> > 
> > I don't get that to be honest. I probably have to read more.
> > 
> > Is what you're saying that when applying a MIN_FREQUENCY constraint the same
> > value will be applied to both cpufreq and devfreq? Isn't this too coarse?
> 
> Oh no. A constraint with QoS is added like this:
> 
> dev_pm_qos_add_request(dev, req, DEV_PM_QOS_MIN_FREQUENCY, min);
> 
> Now dev here can be any device's struct: a CPU's, a GPU's or anything else's.
> All the MIN freq requests are stored/processed per device, and for a CPU in
> cpufreq all we will see is MIN requests for the CPUs. And so the macro needs
> to be a bit generic and shouldn't have the word CPU in it.
> 
> Hope I was able to clarify your doubt a bit. Thanks.

Ah I see, yes it all makes sense now. So the same request API covers any
device; something like this, if I understood correctly (a sketch with made-up
values and a hypothetical gpu_dev pointer):
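
	struct dev_pm_qos_request cpu_req, gpu_req;

	/* constrain CPU0's minimum frequency (kHz, as cpufreq uses) */
	dev_pm_qos_add_request(get_cpu_device(0), &cpu_req,
			       DEV_PM_QOS_MIN_FREQUENCY, 500000);

	/* ... and cap another device's maximum frequency with the same API */
	dev_pm_qos_add_request(gpu_dev, &gpu_req,
			       DEV_PM_QOS_MAX_FREQUENCY, 800000);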

Thanks!

--
Qais Yousef


Re: [PATCH RESEND v2 1/2] cpu/hotplug: Fix build error of using {add,remove}_cpu() with !CONFIG_SMP

2021-02-23 Thread Qais Yousef
On 02/21/21 21:43, shuo.a@intel.com wrote:
> From: Shuo Liu 
> 
> 279dcf693ac7 ("virt: acrn: Introduce an interface for Service VM to
> control vCPU") introduced {add,remove}_cpu() usage and it hit below
> error with !CONFIG_SMP:
> 
> ../drivers/virt/acrn/hsm.c: In function ‘remove_cpu_store’:
> ../drivers/virt/acrn/hsm.c:389:3: error: implicit declaration of function 
> ‘remove_cpu’; [-Werror=implicit-function-declaration]
>remove_cpu(cpu);
> 
> ../drivers/virt/acrn/hsm.c:402:2: error: implicit declaration of function 
> ‘add_cpu’; [-Werror=implicit-function-declaration]
>add_cpu(cpu);
> 
> Add add_cpu() function prototypes with !CONFIG_SMP and remove_cpu() with
> !CONFIG_HOTPLUG_CPU for such usage.
> 
> Fixes: 279dcf693ac7 ("virt: acrn: Introduce an interface for Service VM to 
> control vCPU")
> Reported-by: Randy Dunlap 
> Signed-off-by: Shuo Liu 
> Acked-by: Randy Dunlap  # build-tested
> Cc: Stephen Rothwell 
> Cc: Thomas Gleixner 
> Cc: Greg Kroah-Hartman 
> Cc: Qais Yousef 
> ---

Reviewed-by: Qais Yousef 

Thanks!

--
Qais Yousef

>  include/linux/cpu.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> index 3aaa0687e8df..94a578a96202 100644
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -108,6 +108,8 @@ static inline void cpu_maps_update_done(void)
>  {
>  }
>  
> +static inline int add_cpu(unsigned int cpu) { return 0;}
> +
>  #endif /* CONFIG_SMP */
>  extern struct bus_type cpu_subsys;
>  
> @@ -137,6 +139,7 @@ static inline int  cpus_read_trylock(void) { return true; 
> }
>  static inline void lockdep_assert_cpus_held(void) { }
>  static inline void cpu_hotplug_disable(void) { }
>  static inline void cpu_hotplug_enable(void) { }
> +static inline int remove_cpu(unsigned int cpu) { return -EPERM; }
>  static inline void smp_shutdown_nonboot_cpus(unsigned int primary_cpu) { }
>  #endif   /* !CONFIG_HOTPLUG_CPU */
>  
> 
> base-commit: abaf6f60176f1ae9d946d63e4db63164600b7b1a
> -- 
> 2.28.0
> 


Re: [PATCH RESEND v2 2/2] virt: acrn: Make remove_cpu sysfs invisible with !CONFIG_HOTPLUG_CPU

2021-02-23 Thread Qais Yousef
On 02/21/21 21:43, shuo.a@intel.com wrote:
> From: Shuo Liu 
> 
> Without cpu hotplug support, a vCPU cannot be removed from a Service VM.
> Don't expose the remove_cpu sysfs file when CONFIG_HOTPLUG_CPU is disabled.
> 
> Signed-off-by: Shuo Liu 
> Acked-by: Randy Dunlap  # build-tested
> Cc: Stephen Rothwell 
> Cc: Thomas Gleixner 
> Cc: Greg Kroah-Hartman 
> Cc: Qais Yousef 
> ---
>  drivers/virt/acrn/hsm.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/virt/acrn/hsm.c b/drivers/virt/acrn/hsm.c
> index 1f6b7c54a1a4..6996ea6219e5 100644
> --- a/drivers/virt/acrn/hsm.c
> +++ b/drivers/virt/acrn/hsm.c
> @@ -404,6 +404,14 @@ static ssize_t remove_cpu_store(struct device *dev,
>  }
>  static DEVICE_ATTR_WO(remove_cpu);
>  
> +static umode_t acrn_attr_visible(struct kobject *kobj, struct attribute *a, 
> int n)
> +{
> +   if (a == &dev_attr_remove_cpu.attr)
> +   return IS_ENABLED(CONFIG_HOTPLUG_CPU) ? a->mode : 0;
> +
> +   return a->mode;
> +}
> +

I can't find this code in Linus' master. But this looks fine from my narrow
PoV. Protecting the attribute with an #ifdef CONFIG_HOTPLUG_CPU (see the
sketch below) would be easier to read for me, but that doesn't mean this
approach is not fine.
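
Something like this is what I had in mind (just a sketch; the
remove_cpu_store() callback and the DEVICE_ATTR_WO() would need the same
guard):

	static struct attribute *acrn_attrs[] = {
	#ifdef CONFIG_HOTPLUG_CPU
		&dev_attr_remove_cpu.attr,
	#endif
		NULL
	};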

Thanks

--
Qais Yousef

>  static struct attribute *acrn_attrs[] = {
>   &dev_attr_remove_cpu.attr,
>   NULL
> @@ -411,6 +419,7 @@ static struct attribute *acrn_attrs[] = {
>  
>  static struct attribute_group acrn_attr_group = {
>   .attrs = acrn_attrs,
> + .is_visible = acrn_attr_visible,
>  };
>  
>  static const struct attribute_group *acrn_attr_groups[] = {
> -- 
> 2.28.0
> 


Re: [PATCH 0/8] sched/fair: misfit task load-balance tweaks

2021-02-03 Thread Qais Yousef
On 01/28/21 18:31, Valentin Schneider wrote:
> Hi folks,
> 
> Here is this year's series of misfit changes. On the menu:
> 
> o Patch 1 is an independent active balance cleanup
> o Patch 2 adds some more sched_asym_cpucapacity static branches
> o Patch 3 introduces yet another margin for capacity to capacity
>   comparisons
> o Patches 4-6 build on top of patch 3 and change capacity comparisons
>   throughout misfit load balancing  
> o Patches 7-8 fix some extra misfit issues I've been seeing on "real"
>   workloads.
> 
> IMO the somewhat controversial bit is patch 3, because it attempts to solve
> margin issues by... Adding another margin. This does solve issues on
> existing platforms (e.g. Pixel4), but we'll be back to square one the day
> some "clever" folks spin a platform with two different CPU capacities less 
> than
> 5% apart.

One more margin is a cause of apprehension to me. But in this case I think it
is the appropriate thing to do now. I can't think of a scenario where this
could hurt.

Thanks

--
Qais Yousef


Re: [PATCH 4/8] sched/fair: Use dst_cpu's capacity rather than group {min, max} capacity

2021-02-03 Thread Qais Yousef
On 01/28/21 18:31, Valentin Schneider wrote:
> Comparing capacity extrema of local and source sched_group's doesn't make
> much sense when at the end of the day the imbalance will be pulled by a
> known env->dst_cpu, whose capacity can be anywhere within the local group's
> capacity extrema.
> 
> Replace group_smaller_{min, max}_cpu_capacity() with comparisons of the
> source group's min/max capacity and the destination CPU's capacity.
> 
> Signed-off-by: Valentin Schneider 
> ---

Reviewed-by: Qais Yousef 

Thanks

--
Qais Yousef

>  kernel/sched/fair.c | 31 +++
>  1 file changed, 3 insertions(+), 28 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 58ce0b22fcb0..0959a770ecc0 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8352,26 +8352,6 @@ group_is_overloaded(unsigned int imbalance_pct, struct 
> sg_lb_stats *sgs)
>   return false;
>  }
>  
> -/*
> - * group_smaller_min_cpu_capacity: Returns true if sched_group sg has smaller
> - * per-CPU capacity than sched_group ref.
> - */
> -static inline bool
> -group_smaller_min_cpu_capacity(struct sched_group *sg, struct sched_group 
> *ref)
> -{
> - return capacity_greater(ref->sgc->min_capacity, sg->sgc->min_capacity);
> -}
> -
> -/*
> - * group_smaller_max_cpu_capacity: Returns true if sched_group sg has smaller
> - * per-CPU capacity_orig than sched_group ref.
> - */
> -static inline bool
> -group_smaller_max_cpu_capacity(struct sched_group *sg, struct sched_group 
> *ref)
> -{
> - return capacity_greater(ref->sgc->max_capacity, sg->sgc->max_capacity);
> -}
> -
>  static inline enum
>  group_type group_classify(unsigned int imbalance_pct,
> struct sched_group *group,
> @@ -8523,15 +8503,10 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>   if (!sgs->sum_h_nr_running)
>   return false;
>  
> - /*
> -  * Don't try to pull misfit tasks we can't help.
> -  * We can use max_capacity here as reduction in capacity on some
> -  * CPUs in the group should either be possible to resolve
> -  * internally or be covered by avg_load imbalance (eventually).
> -  */
> + /* Don't try to pull misfit tasks we can't help */
>   if (static_branch_unlikely(&sched_asym_cpucapacity) &&
>   sgs->group_type == group_misfit_task &&
> - (!group_smaller_max_cpu_capacity(sg, sds->local) ||
> + (!capacity_greater(capacity_of(env->dst_cpu), 
> sg->sgc->max_capacity) ||
>sds->local_stat.group_type != group_has_spare))
>   return false;
>  
> @@ -8615,7 +8590,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>*/
>   if (sd_has_asym_cpucapacity(env->sd) &&
>   (sgs->group_type <= group_fully_busy) &&
> - (group_smaller_min_cpu_capacity(sds->local, sg)))
> + (capacity_greater(sg->sgc->min_capacity, 
> capacity_of(env->dst_cpu
>   return false;
>  
>   return true;
> -- 
> 2.27.0
> 


Re: [PATCH 3/8] sched/fair: Tweak misfit-related capacity checks

2021-02-03 Thread Qais Yousef
On 01/28/21 18:31, Valentin Schneider wrote:
> During load-balance, groups classified as group_misfit_task are filtered
> out if they do not pass
> 
>   group_smaller_max_cpu_capacity(<candidate group>, <local group>);
> 
> which itself employs fits_capacity() to compare the sgc->max_capacity of
> both groups.
> 
> Due to the underlying margin, fits_capacity(X, 1024) will return false for
> any X > 819. Tough luck, the capacity_orig's on e.g. the Pixel 4 are
> {261, 871, 1024}. If a CPU-bound task ends up on one of those "medium"
> CPUs, misfit migration will never intentionally upmigrate it to a CPU of
> higher capacity due to the aforementioned margin.
> 
> One may argue the 20% margin of fits_capacity() is excessive in the advent
> of counter-enhanced load tracking (APERF/MPERF, AMUs), but one point here
> is that fits_capacity() is meant to compare a utilization value to a
> capacity value, whereas here it is being used to compare two capacity
> values. As CPU capacity and task utilization have different dynamics, a
> sensible approach here would be to add a new helper dedicated to comparing
> CPU capacities.
> 
> Introduce capacity_greater(), which uses a 5% margin. Use it to replace the
> existing capacity checks. Note that check_cpu_capacity() uses yet another
> margin (sd->imbalance_pct), and is left alone for now.
> 
> Signed-off-by: Valentin Schneider 
> ---
>  kernel/sched/fair.c | 15 +++
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7d01fa0bfc7e..58ce0b22fcb0 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -113,6 +113,13 @@ int __weak arch_asym_cpu_priority(int cpu)
>   */
>  #define fits_capacity(cap, max)  ((cap) * 1280 < (max) * 1024)
>  
> +/*
> + * The margin used when comparing CPU capacities.
> + * is 'cap' noticeably greater than 'ref'
> + *
> + * (default: ~5%)
> + */
> +#define capacity_greater(cap, ref) ((ref) * 1078 < (cap) * 1024)

nit: can we use cap1 and cap2 and make the implementation use '>' instead of
'<'? ie:

#define capacity_greater(cap1, cap2) ((cap1) * 1024 > (cap2) * 1078)

this is more intuitive to read IMHO. Especially since a few lines below we have

    return capacity_greater(ref->sgc->max_capacity, sg->sgc->max_capacity);

which passes 'ref->...' as cap, which can be confusing when looking at the
function signature's @ref.
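
To put numbers on it with the Pixel 4 capacities from the changelog: with the
old 20% margin, fits_capacity(871, 1024) evaluates 871 * 1280 < 1024 * 1024,
i.e. 1114880 < 1048576, which is false, so a medium CPU was never seen as
smaller than a big one. With the new 5% margin, capacity_greater(1024, 871)
evaluates 1024 * 1024 > 871 * 1078, i.e. 1048576 > 938938, which is true, so
the upmigration can happen.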

Either way, this LGTM

Reviewed-by: Qais Yousef 

Thanks

--
Qais Yousef

>  #endif
>  
>  #ifdef CONFIG_CFS_BANDWIDTH
> @@ -8253,7 +8260,7 @@ check_cpu_capacity(struct rq *rq, struct sched_domain 
> *sd)
>  static inline int check_misfit_status(struct rq *rq, struct sched_domain *sd)
>  {
>   return rq->misfit_task_load &&
> - (rq->cpu_capacity_orig < rq->rd->max_cpu_capacity ||
> + (capacity_greater(rq->rd->max_cpu_capacity, 
> rq->cpu_capacity_orig) ||
>check_cpu_capacity(rq, sd));
>  }
>  
> @@ -8352,7 +8359,7 @@ group_is_overloaded(unsigned int imbalance_pct, struct 
> sg_lb_stats *sgs)
>  static inline bool
>  group_smaller_min_cpu_capacity(struct sched_group *sg, struct sched_group 
> *ref)
>  {
> - return fits_capacity(sg->sgc->min_capacity, ref->sgc->min_capacity);
> + return capacity_greater(ref->sgc->min_capacity, sg->sgc->min_capacity);
>  }
>  
>  /*
> @@ -8362,7 +8369,7 @@ group_smaller_min_cpu_capacity(struct sched_group *sg, 
> struct sched_group *ref)
>  static inline bool
>  group_smaller_max_cpu_capacity(struct sched_group *sg, struct sched_group 
> *ref)
>  {
> - return fits_capacity(sg->sgc->max_capacity, ref->sgc->max_capacity);
> + return capacity_greater(ref->sgc->max_capacity, sg->sgc->max_capacity);
>  }
>  
>  static inline enum
> @@ -9421,7 +9428,7 @@ static struct rq *find_busiest_queue(struct lb_env *env,
>* average load.
>*/
>   if (sd_has_asym_cpucapacity(env->sd) &&
> - capacity_of(env->dst_cpu) < capacity &&
> + !capacity_greater(capacity_of(env->dst_cpu), capacity) &&
>   nr_running == 1)
>   continue;
>  
> -- 
> 2.27.0
> 


Re: [PATCH 5/8] sched/fair: Make check_misfit_status() only compare dynamic capacities

2021-02-03 Thread Qais Yousef
On 01/28/21 18:31, Valentin Schneider wrote:
> check_misfit_status() checks for both capacity pressure & available CPUs
> with higher capacity. Now that we have a sane(ish) capacity comparison
> margin which is used throughout load-balance, this can be condensed into a
> single check:
> 
>   capacity_greater(<max CPU capacity>, <misfit task CPU's capacity>);
> 
> This has the added benefit of returning false if the misfit task's CPU is
> heavily pressured, but there are no better candidates for migration.
> 
> Signed-off-by: Valentin Schneider 
> ---

check_cpu_capacity() call looks redundant now, yes.

Reviewed-by: Qais Yousef 

Thanks

--
Qais Yousef

>  kernel/sched/fair.c | 10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0959a770ecc0..ef44474b8fbf 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8254,14 +8254,12 @@ check_cpu_capacity(struct rq *rq, struct sched_domain 
> *sd)
>  
>  /*
>   * Check whether a rq has a misfit task and if it looks like we can actually
> - * help that task: we can migrate the task to a CPU of higher capacity, or
> - * the task's current CPU is heavily pressured.
> + * help that task: we can migrate the task to a CPU of higher capacity.
>   */
> -static inline int check_misfit_status(struct rq *rq, struct sched_domain *sd)
> +static inline int check_misfit_status(struct rq *rq)
>  {
>   return rq->misfit_task_load &&
> - (capacity_greater(rq->rd->max_cpu_capacity, 
> rq->cpu_capacity_orig) ||
> -  check_cpu_capacity(rq, sd));
> + capacity_greater(rq->rd->max_cpu_capacity, rq->cpu_capacity);
>  }
>  
>  /*
> @@ -10238,7 +10236,7 @@ static void nohz_balancer_kick(struct rq *rq)
>* When ASYM_CPUCAPACITY; see if there's a higher capacity CPU
>* to run the misfit task on.
>*/
> - if (check_misfit_status(rq, sd)) {
> + if (check_misfit_status(rq)) {
>   flags = NOHZ_KICK_MASK;
>   goto unlock;
>   }
> -- 
> 2.27.0
> 

