Re: [PATCH 1/3] iommu/amd: Add logic to decode AMD IOMMU event flag

2013-04-02 Thread Suravee Suthikulpanit

On 4/2/2013 10:29 AM, Borislav Petkov wrote:

On Tue, Apr 02, 2013 at 05:03:04PM +0200, Joerg Roedel wrote:

On Tue, Apr 02, 2013 at 04:40:37PM +0200, Borislav Petkov wrote:

While you guys are at it, can someone fix this too pls (ASUS board with
a PD on it).

[0.220342] [Firmware Bug]: AMD-Vi: IOAPIC[9] not in IVRS table
[0.220398] [Firmware Bug]: AMD-Vi: IOAPIC[10] not in IVRS table
[0.220451] [Firmware Bug]: AMD-Vi: No southbridge IOAPIC found in IVRS table
[0.220506] AMD-Vi: Disabling interrupt remapping due to BIOS Bug(s)

That is actually a BIOS problem. I wonder whether it would help to turn this
into a WARN_ON to get the board vendors to release working BIOSes.
Opinions?

Good luck trying to get ASUS to fix anything in their BIOS :(.
I have tried to contact ASUS in the past to have them fix the issue, but 
had no luck.  Once a board is out in the field, it's very difficult to get 
them to make changes.  I am also raising this issue with the BIOS 
team for future hardware.


Turning this into WARN_ON() at this point might break a lot of systems 
currently out in the field.  However, users can always switch to booting with 
"intremap=off", although that might not be obvious to them.


Suravee


Can't we detect the SB IOAPIC some other way in this case?






Re: [PATCH V5 0/3] perf/x86/amd: AMD Family 16h Data Breakpoint Extensions

2013-10-02 Thread Suravee Suthikulpanit

On 10/2/2013 11:15 AM, Oleg Nesterov wrote:

On 10/02,suravee.suthikulpa...@amd.com  wrote:

>
>From: Suravee Suthikulpanit
>
>Frederic, this is the V4 patch rebased onto linux-3.12.0-rc3 (linux.git)
>and retested.

But the code is the same? If yes,

Reviewed-by: Oleg Nesterov


The only change I made was in tools/perf/util/parse-events.y, due to a change 
in parse_events_add_breakpoint(), but the logic should be the same.

Suravee.




Re: [PATCH] x86, microcode, AMD: Fix patch level reporting for family15h

2013-09-26 Thread Suravee Suthikulpanit

On 9/26/2013 6:06 PM, Andreas Herrmann wrote:

On Fri, Sep 27, 2013 at 12:13:22AM +0200, Borislav Petkov wrote:

On Thu, Sep 26, 2013 at 04:54:32PM -0500, suravee.suthikulpa...@amd.com wrote:

From: Suravee Suthikulpanit 

On AMD family 15h, applying a microcode patch on one core (core0)
also affects the other core (core1) in the same compute unit.
The driver skips applying the patch on core1, but it still needs
to update the kernel structures to reflect the proper patch level.

The current logic does not update struct ucode_cpu_info.cpu_sig.rev
of the skipped core. This causes
/sys/devices/system/cpu/cpu1/microcode/version to report an incorrect
patch level, as shown below:

[   10.708841] microcode: CPU0: new patch_level=0x0600063d
[   10.714256] microcode: CPU1: patch_level=0x06000626
[   10.719345] microcode: CPU2: patch_level=0x06000626
[   10.748095] microcode: CPU2: new patch_level=0x0600063d
[   10.753365] microcode: CPU3: patch_level=0x06000626
[   10.758264] microcode: CPU4: patch_level=0x06000626
[   10.786999] microcode: CPU4: new patch_level=0x0600063d

Actually, this is collect_cpu_info_amd()'s normal operation and shows
that there's no need to apply a microcode patch on the odd core since
the even core's ucode has been updated.

Hmm, I think Boris is right, above messages are just logging what
happened during µcode update. I think the patch_level in "CPU1:
patch_level=0x06000626" is based on c->microcode which is updated
shortly after this message was printed.

I assume with your patch, above message won't look different but just
the contents in /sys/devices/system/cpu/cpu1/microcode/version will
show the correct version, right?


Andreas

Yes, the dmesg output still looks the same. Only the sysfs... 
version is now fixed.
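
(A minimal sketch of the idea being discussed, assuming the usual AMD
microcode driver structures; the helper name sync_skipped_core_rev() is
illustrative, not the actual patch.)

static void sync_skipped_core_rev(int cpu, u32 rev)
{
	struct ucode_cpu_info *uci = ucode_cpu_info + cpu;

	/* rev is the patch level read back from the patch-level MSR; the core
	 * was skipped because its compute-unit sibling already carries it. */
	cpu_data(cpu).microcode = rev;	/* keeps /proc/cpuinfo consistent */
	uci->cpu_sig.rev = rev;		/* backs the sysfs "version" file */
}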


Suravee



Re: [PATCH V4 0/3] perf/x86/amd: AMD Family 16h Data Breakpoint Extensions

2013-09-30 Thread Suravee Suthikulpanit

On 4/29/2013 7:30 AM, Oleg Nesterov wrote:

On 04/29, Ingo Molnar wrote:

* Oleg Nesterov  wrote:

Obviously I can't ack the changes in this area, but to me the whole
series looks fine.

Thanks Oleg - can I add your Reviewed-by tags?

Yes, sure, thanks,

Reviewed-by: Oleg Nesterov 


Hi All,

I am following up on the status of this patch, as I have not seen it 
merged upstream yet.
I was working with Jacob, and I can help follow up on any pending 
issues.


Thank you,

Suravee



Re: [PATCH 1/1] Change IBS PMU to use perf_hw_context

2012-12-18 Thread Suravee Suthikulpanit
[The quoted IBS OP MSR (0xc00110[33-3a]) snapshot is truncated by the archive:
cores 20-30 read back all zeros, while core 31 shows live IBS state
(e.g. 00034d89 811370cc 0012 ... 88082592bc58 00082592bc58 0100).]


Suravee


On Mon, 2012-12-17 at 10:44 +0100, Robert Richter wrote:
> On 16.12.12 10:04:10, Ingo Molnar wrote:
> > 
> > * suravee.suthikulpa...@amd.com  wrote:
> > 
> > > From: Suravee Suthikulpanit 
> > > 
> > > Currently, the AMD IBS PMU initializes pmu.task_ctx_nr to 
> > > perf_invalid_context, which only allows IBS to run 
> > > in system-wide mode (e.g. perf record -a). IBS hardware is 
> > > available in each core and should be per-context.  This patch 
> > > modifies task_ctx_nr to use perf_hw_context (the default) 
> > > instead.
> > 
> > I'm wondering how extensively was it tested/verified that it's 
> > safe to enable IBS in per context mode as well, and that the 
> > profiling results are precise and accurate?
> 
> From the implementation's point of view this is very similar to hw
> perf counters. I wouldn't expect any issues here. Since IBS can be
> immediately started/stopped and there is no caching, there won't be any
> incoming sample that is not related to that context.
> 
> The only potential problem I see could be a security risk in a way
> that an IBS sample might expose data related to other contexts such as
> cache information. This is similar to uncore/northbridge events so I
> don't think this is an issue, but we might want to evaluate this.
> 
> > We never used the IBS hardware in this fashion before, so some 
> > extra care is prudent - and traces of that extra care should be 
> > visible in the changelog as well.
> 
> Yeah, a comparison of numbers for IBS and hw counter (-e r076:p,r076
> and -e r0C1:p,r0C1) in per-context mode would be useful here.
> 
> -Robert
> 





Re: [PATCH 1/2 V2] iommu/amd: Add workaround for ERBT1312

2013-04-18 Thread Suravee Suthikulpanit

On 4/18/2013 1:35 PM, Joerg Roedel wrote:

On Thu, Apr 18, 2013 at 11:59:58AM -0500, Suthikulpanit, Suravee wrote:

One last concern I have for this patch is the case where we re-enable
the interrupt, and then another interrupt happens while we are processing
the log, setting the bit again.  If the interrupt thread doesn't check this
right before it exits the handler, we could still end up
leaving the interrupt disabled.

That can't happen, the patch checks whether the bit is really 0 and then
it processes the event/ppr-log entries. If any new entry is queued while
we process the logs another interrupt will be fired and the irq-thread
will run again. So we will not miss any log entry.
According to the "kernel/irq/handle.c:irq_wake_thread()", I thought that 
for the threaded IRQ, if the system getting a new interrupt from the 
device while the thread is running, it will just return and do nothing.
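
(For reference, a sketch of the pattern the workaround ends up with, modelled
on the ERBT1312 handling visible in amd_iommu_int_thread(); simplified, not
the exact final code.)

irqreturn_t amd_iommu_int_thread(int irq, void *data)
{
	struct amd_iommu *iommu = data;
	u32 status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);

	while (status & (MMIO_STATUS_EVT_INT_MASK | MMIO_STATUS_PPR_INT_MASK)) {
		/* Re-enable (write-1-to-clear) the interrupt bits */
		writel(MMIO_STATUS_EVT_INT_MASK | MMIO_STATUS_PPR_INT_MASK,
		       iommu->mmio_base + MMIO_STATUS_OFFSET);

		if (status & MMIO_STATUS_EVT_INT_MASK)
			iommu_poll_events(iommu);
		if (status & MMIO_STATUS_PPR_INT_MASK)
			iommu_poll_ppr_log(iommu);

		/* Re-read the status before leaving: if the hardware queued new
		 * log entries (and set the bit) while we were polling, loop
		 * again instead of returning with the interrupt effectively
		 * lost, since the hard-irq handler won't wake us again while
		 * the thread is still running. */
		status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);
	}

	return IRQ_HANDLED;
}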


Suravee


Joerg








Re: [PATCH 0/3] iommu/amd: IOMMU Error Reporting/Handling/Filtering

2013-06-03 Thread Suravee Suthikulpanit

Ping

On 5/22/2013 2:15 PM, suravee.suthikulpa...@amd.com wrote:

From: Suravee Suthikulpanit 

This patch set implements a framework for handling errors reported via the IOMMU
event log. It also implements a mechanism to filter/suppress error messages when
the IOMMU hardware generates a large number of event log entries, which is often
caused by devices performing invalid operations or by a misconfigured IOMMU
(e.g. IO_PAGE_FAULT and INVALID_DEVICE_REQUEST).

DEVICE vs IOMMU ERRORS:
===
Event types in the AMD IOMMU event log can be categorized as:
 - IOMMU error : An error which is specific to the IOMMU hardware
 - Device error: An error which is specific to a device
 - Non-error   : Miscellaneous events which are not classified as errors.
This patch set implements frameworks for handling "IOMMU error" and "device error".
For an IOMMU error, the driver will log the event in dmesg and panic, since the
IOMMU hardware is no longer functioning. For a device error, the driver will decode
and log the error in dmesg based on the error logging level specified at boot time.

ERROR LOGGING LEVEL:

The filtering framework introduces three levels of event logging,
"AMD_IOMMU_LOG_[DEFAULT|VERBOSE|DEBUG]".  Users can specify the level
via a new boot option "amd_iommu_log=[default|verbose|debug]".
 - default: Each error message is truncated. Filtering is enabled.
 - verbose: Output detailed error messages. Filtering is enabled.
 - debug  : Output detailed error messages. Filtering is disabled.

ERROR THRESHOLD LEVEL:
==
The error threshold is used by the log filtering logic to determine when to suppress
the errors from a particular device. The threshold is defined as "the number of
errors (X) over a specified period (Y sec)". When the threshold is reached, the
IOMMU driver will suppress subsequent error messages from the device for a
predefined period (Z sec). X, Y, and Z are currently hard-coded to 10 errors,
5 sec, and 30 sec.
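
(A sketch of how such a threshold check could be implemented. The structure
name matches the dte_err_info introduced below, but its fields and the
constants are illustrative, not taken from the posted patches.)

struct dte_err_info {			/* illustrative fields only */
	unsigned long window_start;	/* start of the current Y-sec window */
	unsigned long suppress_until;	/* end of the Z-sec suppress period  */
	unsigned int  err_count;	/* errors seen in the current window */
};

static bool dte_err_over_threshold(struct dte_err_info *info)
{
	unsigned long now = jiffies;

	if (time_after(now, info->window_start + 5 * HZ)) {
		/* Start a new accounting window (Y = 5 sec) */
		info->window_start = now;
		info->err_count = 0;
	}

	if (++info->err_count >= 10) {			/* X = 10 errors */
		info->suppress_until = now + 30 * HZ;	/* Z = 30 sec    */
		return true;
	}

	return false;
}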

DATA STRUCTURE:
===
A new structure "struct dte_err_info" is added. It contains error information
specific to each device table entry (DTE). The structure is allocated 
dynamically
per DTE when IOMMU driver handle device error for the first time.

ERROR STATES and LOG FILTERING:

The filtering framework defines three device error states: "NONE", "PROBATION" and
"SUPPRESS" (a sketch of the resulting state machine follows this list).
  1. At IOMMU driver initialization, all devices are in the DEV_ERR_NONE state.
  2. During interrupt handling, the IOMMU driver processes each entry in the event log.
  3. If an entry is a device error, the driver tags the DTE with DEV_ERR_PROBATION and
     reports the error via dmesg.
  4. In non-debug mode, if the device threshold is reached, the device is moved into
     the DEV_ERR_SUPPRESS state, in which all error messages are suppressed.
  5. After the suppress period has passed, the driver puts the device back into the
     probation state, and errors are reported once again. If the device continues to
     generate errors, it will be suppressed again once the next threshold is reached.
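
(A sketch of that state machine, reusing the illustrative dte_err_info fields
and threshold helper from above; again an assumption about the implementation,
not a copy of the patches.)

enum dte_err_state { DEV_ERR_NONE, DEV_ERR_PROBATION, DEV_ERR_SUPPRESS };

static void dte_err_update_state(struct dte_err_info *info,
				 enum dte_err_state *state)
{
	switch (*state) {
	case DEV_ERR_NONE:
		*state = DEV_ERR_PROBATION;		/* first error: report it  */
		break;
	case DEV_ERR_PROBATION:
		if (dte_err_over_threshold(info))	/* X errors within Y sec   */
			*state = DEV_ERR_SUPPRESS;
		break;
	case DEV_ERR_SUPPRESS:
		if (time_after(jiffies, info->suppress_until))
			*state = DEV_ERR_PROBATION;	/* report again after Z sec */
		break;
	}
}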

EXAMPLE OUTPUT:
===
AMD-Vi: Event=IO_PAGE_FAULT dev=3:0.0 dom=0x1b addr=0x97040 flg=N Ex Sup M P W Pm Ill Ta
AMD-Vi: Event=IO_PAGE_FAULT dev=3:0.0 dom=0x1b addr=0x97070 flg=N Ex Sup M P W Pm Ill Ta
AMD-Vi: Event=IO_PAGE_FAULT dev=3:0.0 dom=0x1b addr=0x97060 flg=N Ex Sup M P W Pm Ill Ta
AMD-Vi: Event=IO_PAGE_FAULT dev=3:0.0 dom=0x1b addr=0x4970 flg=N Ex Sup M P W Pm Ill Ta
AMD-Vi: Event=IO_PAGE_FAULT dev=3:0.0 dom=0x1b addr=0x98840 flg=N Ex Sup M P W Pm Ill Ta
AMD-Vi: Event=IO_PAGE_FAULT dev=3:0.0 dom=0x1b addr=0x98870 flg=N Ex Sup M P W Pm Ill Ta
AMD-Vi: Event=IO_PAGE_FAULT dev=3:0.0 dom=0x1b addr=0x98860 flg=N Ex Sup M P W Pm Ill Ta
AMD-Vi: Event=IO_PAGE_FAULT dev=3:0.0 dom=0x1b addr=0x4980 flg=N Ex Sup M P W Pm Ill Ta
AMD-Vi: Event=IO_PAGE_FAULT dev=3:0.0 dom=0x1b addr=0x99040 flg=N Ex Sup M P W Pm Ill Ta
AMD-Vi: Event=IO_PAGE_FAULT dev=3:0.0 dom=0x1b addr=0x99060 flg=N Ex Sup M P W Pm Ill Ta
AMD-Vi: Warning: IOMMU error threshold (10) reached for device=3:0.0. Suppress for 30 secs.!!!

Suravee Suthikulpanit (3):
   iommu/amd: Adding amd_iommu_log cmdline option
   iommu/amd: Add error handling/reporting/filtering logic
   iommu/amd: Remove old event printing logic

  Documentation/kernel-parameters.txt |   10 +
  drivers/iommu/Makefile  |2 +-
  drivers/iommu/amd_iommu.c   |   85 +---
  drivers/iommu/amd_iommu_fault.c |  368 +++
  drivers/iommu/amd_iommu_init.c  |   19 ++
  drivers/iommu/amd_iommu_proto.h |6 +
  drivers/iommu/amd_iommu_types.h |   16 ++
  7 files changed, 426 insertions(+), 80 deletions(-)
  create mode 100644 drivers/iommu/amd_iommu_fault.c





Re: [GIT PULL] perf changes for v3.11

2013-07-03 Thread Suravee Suthikulpanit

On 7/3/2013 2:55 AM, Peter Zijlstra wrote:

On Tue, Jul 02, 2013 at 05:50:29PM -0700, Linus Torvalds wrote:

On Mon, Jul 1, 2013 at 2:03 AM, Ingo Molnar  wrote:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
perf-core-for-linus

Kernel improvements:

  * AMD IOMMU uncore PMU support by Suravee Suthikulpanit

This one prints a really annoying error message if you're not on an
AMD platform:

+   if (!amd_iommu_pc_supported()) {
+   pr_err("perf: amd_iommu PMU not installed. No support!\n");
+   return -ENODEV;
+   }

and you know what? That's not acceptable. It damn well is *not* an
error to not have an AMD IOMMU.

It should - at most - be a pr_info(). Maybe nothing at all. "pr_err()"
is just totally out of line.

Quite; it prints enough stuff when it does find one so I'm all for
scrapping that one print when it doesn't find it.

Sorry for not seeing that; when I initially read that code I thought it was for
the case where the hardware was expected to have the device but we couldn't
find it for some weird reason.

---
Subject: perf, amd: Do not print an error when the device is not present

As Linus said, it's not an error to not have an AMD IOMMU; esp. when you're not
even running on an AMD platform.

Requested-by: Linus Torvalds 
Signed-off-by: Peter Zijlstra 
---
  arch/x86/kernel/cpu/perf_event_amd_iommu.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd_iommu.c 
b/arch/x86/kernel/cpu/perf_event_amd_iommu.c
index 0db655e..639d128 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_iommu.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_iommu.c
@@ -491,10 +491,8 @@ static struct perf_amd_iommu __perf_iommu = {
  static __init int amd_iommu_pc_init(void)
  {
/* Make sure the IOMMU PC resource is available */
-   if (!amd_iommu_pc_supported()) {
-   pr_err("perf: amd_iommu PMU not installed. No support!\n");
+   if (!amd_iommu_pc_supported())
return -ENODEV;
-   }
  
  	_init_perf_amd_iommu(&__perf_iommu, "amd_iommu");
  

Linus, sorry for the inconvenience.  This was a mistake on my part.

Peter, thank you for sending out the patch quickly.  I have checked the 
patch and this is okay.


Suravee



[PATCH 0/2 v2] perf/x86/amd: IOMMU Performance Counter Support

2013-05-13 Thread Suravee Suthikulpanit
This is the rework of the original patch set here:

http://lists.linuxfoundation.org/pipermail/iommu/2013-January/005075.html

These patches implement the AMD IOMMU Performance Counter functionality
via a custom perf PMU and implement static counting for various IOMMU
translations.

1) Extend the AMD IOMMU initialization to include performance
   counter enablement.

2) The perf AMD IOMMU PMU to manage performance counters, which
   interface with the AMD IOMMU core driver.

The command-line, to invoke the iommuv2 PMU, is:

perf stat -e amd_iommu/config=[data],config1=[data]/{u,r} [command]

The IOMMU performance counter support is available starting with 
AMD family 15h, model 0x30.

For information regarding IOMMU performance counter configuration,
please see the AMD IOMMU v2.5 specification.

Steven L Kinney (2):
  IOMMU/AMD: Adding IOMMUv2 PC resource management
  IOMMU/AMD: IOMMUV2 PC PERF uncore PMU implementation

 arch/x86/kernel/cpu/Makefile |2 +-
 arch/x86/kernel/cpu/perf_event_amd_iommuv2.c |  409 ++
 arch/x86/kernel/cpu/perf_event_amd_iommuv2.h |   55 
 drivers/iommu/amd_iommu_init.c   |  116 +++-
 drivers/iommu/amd_iommu_proto.h  |7 +
 drivers/iommu/amd_iommu_types.h  |   12 +-
 6 files changed, 594 insertions(+), 7 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/perf_event_amd_iommuv2.c
 create mode 100644 arch/x86/kernel/cpu/perf_event_amd_iommuv2.h

-- 
1.7.10.4




Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out

2013-01-21 Thread Suravee Suthikulpanit
Udo, 

I am trying to debug the issue but need to check one thing on your
system.  Would you please try the following and check the output value
on your system?

# setpci -s 00:00.02 F0.w=90
# setpci -s 00:00.02 F4.w
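
(For context: those two commands use config offset 0xF0 as an index register
and 0xF4 as the data register on the IOMMU function at 00:00.2. The equivalent
in-kernel access would look roughly like the sketch below; this is just an
illustration of what is being read, not code from a patch.)

static u32 read_iommu_l2_reg(struct pci_dev *iommu_pdev, u32 index)
{
	u32 value;

	pci_write_config_dword(iommu_pdev, 0xf0, index);	/* setpci F0.w=90 */
	pci_read_config_dword(iommu_pdev, 0xf4, &value);	/* setpci F4.w    */

	return value;
}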


Thank you,

Suravee


On Mon, 2013-01-21 at 10:04 -0600, Jacob Shin wrote:
> On Sun, Jan 20, 2013 at 12:48:28PM +0100, Borislav Petkov wrote:
> > On Sun, Jan 20, 2013 at 12:40:11PM +0100, Jörg Rödel wrote:
> > > Yes, the BIOS vendor can fix this issue. They need to disable NB clock
> > > gating for the IOMMU.
> > 
> > Right, Udo, you can try Gigabyte first.
> > 
> > Btw, can't we add a quirk to disable NB clock gating? Maybe Boris and
> > Jacob could help. CCed.
> 
> Hi, yes we will try and reproduce the NB clock gating issue on our
> end and submit a patch ASAP.
> 
> And Boris P., I think your IOAPIC not in IVRS issue we've also seen
> something similar recently (on Xen), so we'll atempt to tackle that
> one too afterwards.
> 
> -Jacob
> 
> > 
> > Guys, the error description is at
> > http://marc.info/?l=linux-kernel&m=135867802432660
> > 
> > Thanks.
> > 
> > -- 
> > Regards/Gruss,
> > Boris.
> > 
> > Sent from a fat crate under my desk. Formatting is fine.
> > --
> > 





Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out

2013-01-23 Thread Suravee Suthikulpanit

On 1/23/2013 8:19 AM, Udo van den Heuvel wrote:

On 2013-01-23 00:29, Suravee Suthikulpanit wrote:

I sent out a patch
(http://marc.info/?l=linux-kernel&m=135889686523524&w=2) which should
implement the workaround for AMD processor family 15h, model 10h-1Fh,
erratum 746 in the IOMMU driver.
In your case, the output from "setpci -s 00:00.02 F4.w" is "0050", which
tells me that the BIOS doesn't implement the workaround. After patching,
you should see the following message in "dmesg".

"AMD-Vi: Applying erratum 746 for IOMMU at :00:00.2"

Thanks!
I'll check for that after these messages.


The following patch slightly modifies
the code to always issue "COMPLETION_WAIT" after every command.  This
should help increase the chance of reproducing the issue.

Should I test with these two patches together?
Or should I apply the first one first and then see whether the second one helps?
Please try the first one first.  If the issue doesn't reproduce, you can 
use the second patch to try to trigger it.


Thank you,

Suravee



Kind regards,
Udo






Re: 3.6.11 AMD-Vi: Completion-Wait loop timed out

2013-01-23 Thread Suravee Suthikulpanit

On 1/23/2013 8:23 AM, Udo van den Heuvel wrote:

On 2013-01-23 00:29, Suravee Suthikulpanit wrote:

message in "dmesg".

"AMD-Vi: Applying erratum 746 for IOMMU at :00:00.2"

[1.091733] AMD-Vi: Found IOMMU at :00:00.2 cap 0x40

I assume that is correct.

Kind regards,
Udo


This is expected.

Regards,

Suravee






Re: [PATCH 1/1] AMD Family15h Model10-1Fh erratum 746 Workaround

2013-01-23 Thread Suravee Suthikulpanit

On 1/23/2013 1:06 AM, Joerg Roedel wrote:


On Tue, Jan 22, 2013 at 05:19:10PM -0600, Suthikulpanit, Suravee wrote:

From: Suravee Suthikulpanit 
@@ -1171,6 +1195,8 @@ static int iommu_init_pci(struct amd_iommu *iommu)
for (i = 0; i < 0x83; i++)
iommu->stored_l2[i] = iommu_read_l2(iommu, i);
}
+   
+   amd_iommu_apply_erratum_746(iommu);

This will also be applied to RD890 IOMMUs, right? This workaround should
be limited to Trinity IOMMUs.


Inside the function, it checks the family and model number.  Only Trinity is 
affected.
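
(For reference, a sketch of the gating check being referred to, matching the
amd_iommu_apply_erratum_746() call in the hunk above; the body is abbreviated
and should be read as an illustration, not the final patch.)

static void amd_iommu_apply_erratum_746(struct amd_iommu *iommu)
{
	/* Erratum 746 only affects family 15h, models 10h-1Fh (Trinity) */
	if (boot_cpu_data.x86 != 0x15 ||
	    boot_cpu_data.x86_model < 0x10 ||
	    boot_cpu_data.x86_model > 0x1f)
		return;

	/* ... apply the L2 register workaround via config offsets 0xF0/0xF4 ... */
}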

Suravee




Re: [PATCH 1/1] Change IBS PMU to use perf_hw_context

2013-01-16 Thread Suravee Suthikulpanit
Hi,

I am following up with this patch. Please let me know if you would like
me to provide any more data or verifications.

Thank you,

Suravee

On Tue, 2012-12-18 at 16:54 -0600, Suravee Suthikulpanit wrote:
> Ingo, Robert
> 
> I am including a set of output from "perf report" to help validate IBS in 
> per-process mode. 
> In this experiment I ran a couple of test cases:
> 
> case 1. perf record -e cycles   (baseline per-process mode w/ regular 
> counter)
> case 2. perf record -a -e cycles:p  (baseline system-wide mode w/ IBS)
> case 3. perf record -e cycles:p (the proposed per-process mode w/IBS)
> 
> In all 3 test cases, the target application (classic) shows about 27K 
> samples.
> I am also including the IBS OP MSRs (0xc00110[33-3a]) snapshots on all 32 
> cores 
> (using rdmsr tools) from case 2 and 3 above.
> 
> 
> CASE1:
> 
> # 
> # captured on: Tue Dec 18 16:32:43 2012
> # hostname : sos-dev02
> # os release : 3.7.0-IBS+
> # perf version : 3.7.rc8.g805f38
> # arch : x86_64
> # nrcpus online : 32
> # nrcpus avail : 32
> # cpudesc : AMD Eng Sample, 1S228145TGG54_31/22/20_2/16
> # cpuid : AuthenticAMD,21,2,0
> # total memory : 32863836 kB
> # cmdline : /sandbox/kernels/suravee/tools/perf/perf record -e cycles taskset 
> -c 31 src/classic 
> # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 
> 0x0, excl_usr = 0, excl_kern = 0, excl_host = 0, excl_guest = 1, precise_ip = 
> 0, id = { 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 
> 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 
> 226, 227, 228, 229 }
> # HEADER_CPU_TOPOLOGY info available, use -I to display
> # HEADER_NUMA_TOPOLOGY info available, use -I to display
> # pmu mappings: cpu = 4, software = 1, tracepoint = 2, ibs_fetch = 6, ibs_op 
> = 7, breakpoint = 5
> # 
> #
> # Samples: 27K of event 'cycles'
> # Event count (approx.): 20938245323
> #
> # Overhead  Samples  Command  Shared Object   
> Symbol
> #   ...  ...  .  
> ...
> #
> 99.16%26927  classic  classic[.] multiply_matrices()  
> <--- TARGET APP 
>  0.32%   78  classic  libc-2.15.so   [.] random   
>   
>  0.10%   23  classic  libc-2.15.so   [.] random_r 
>   
>  0.07%   16  classic  classic[.] 
> initialize_matrices()  
>  0.04%   10  classic  [kernel.kallsyms]  [k] ttwu_do_wakeup   
>   
>  0.03%9  classic  [kernel.kallsyms]  [k] clear_page_c 
>   
>  0.02%   11  classic  [kernel.kallsyms]  [k] 
> native_write_msr_safe  
>  0.02%5  classic  libc-2.15.so   [.] rand 
>   
>  0.02%2  classic  ld-2.15.so [.] 0xa456   
>   
> 
> 
> CASE 2:
> 
> # 
> # captured on: Tue Dec 18 16:11:35 2012
> # hostname : sos-dev02
> # os release : 3.7.0-IBS+
> # perf version : 3.7.rc8.g805f38
> # arch : x86_64
> # nrcpus online : 32
> # nrcpus avail : 32
> # cpudesc : AMD Eng Sample, 1S228145TGG54_31/22/20_2/16
> # cpuid : AuthenticAMD,21,2,0
> # total memory : 32863836 kB
> # cmdline : /sandbox/kernels/suravee/tools/perf/perf record -a -e cycles:p 
> taskset -c 31 src/classic 
> # event : name = cycles:p, type = 0, config = 0x0, config1 = 0x0, config2 = 
> 0x0, excl_usr = 0, excl_kern = 0, excl_host = 0, excl_guest = 0, precise_ip = 
> 1, id = { 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 
> 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98 }
> # HEADER_CPU_TOPOLOGY info available, use -I to display
> # HEADER_NUMA_TOPOLOGY info available, use -I to display
> # pmu mappings: cpu = 4, software = 1, tracepoint = 2, ibs_fetch = 6, ibs_op 
> = 7, breakpoint = 5
> # 
> #
> # Samples: 189K of event 'cycles:p'
> # Event count (approx.): 40504131338
> #
> # Overhead  Samples  Command Shared Object
>Symbol
> #   ...  ...    
> ...
> #
> 51.07%26959  classic  classic   
> [.] multiply_matrices()   <-- TARGET APP 
> 35.39%   131620  swapp

Re: [PATCH] kvm: svm: fix unsigned compare less than zero comparison

2016-09-19 Thread Suravee Suthikulpanit

Hi,

On 9/19/16 13:11, Colin King wrote:

From: Colin Ian King 

vm_data->avic_vm_id is a u32, so the check for an error
return (less than zero) such as -EAGAIN from
avic_get_next_vm_id currently has no effect whatsoever.
Fix this by using a temporary int for the comparison
and assigning it to vm_data->avic_vm_id. I used an explicit
u32 cast in the assignment to show why vm_data->avic_vm_id
cannot be used in the assign/compare steps.

Signed-off-by: Colin Ian King 
---
 arch/x86/kvm/svm.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1b66c5a..2ca66aa 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1419,7 +1419,7 @@ static void avic_vm_destroy(struct kvm *kvm)
 static int avic_vm_init(struct kvm *kvm)
 {
unsigned long flags;
-   int err = -ENOMEM;
+   int vm_id, err = -ENOMEM;
struct kvm_arch *vm_data = &kvm->arch;
struct page *p_page;
struct page *l_page;
@@ -1427,9 +1427,10 @@ static int avic_vm_init(struct kvm *kvm)
if (!avic)
return 0;

-   vm_data->avic_vm_id = avic_get_next_vm_id();
-   if (vm_data->avic_vm_id < 0)
-   return vm_data->avic_vm_id;
+   vm_id = avic_get_next_vm_id();
+   if (vm_id < 0)
+   return vm_id;
+   vm_data->avic_vm_id = (u32)vm_id;

/* Allocating physical APIC ID table (4KB) */
p_page = alloc_page(GFP_KERNEL);



Thanks for catching this.
Suravee


Re: [PART2 PATCH v7 00/12] iommu/AMD: Introduce IOMMU AVIC support

2016-09-02 Thread Suravee Suthikulpanit

Thanks All. Please let me know if you need anything else from my side.

Suravee

On 9/2/16 21:05, Joerg Roedel wrote:

Hi Paolo,

On Fri, Sep 02, 2016 at 12:46:28PM +0200, Paolo Bonzini wrote:

Joerg, if there's no other issues, could you apply the first 9 patches
to a branch based on 4.8-rc1 or similar, so that I can pull it into the
KVM tree?


Sure, I was actually waiting for your Acked-By to put all the
patches into the IOMMU tree, but this will work too :) I'll let you know
when I pushed the branch.


Regards,

Joerg



Re: [PART2 PATCH v5 00/12] iommu/AMD: Introduce IOMMU AVIC support

2016-08-08 Thread Suravee Suthikulpanit

Hi Joerg/Radim/Paolo,

Are there any other concerns about this series?

Thanks,
Suravee

On 7/25/16 16:31, Suravee Suthikulpanit wrote:

From: Suravee Suthikulpanit 

CHANGES FROM V4
===
  * Remove the hash look-up in amd_iommu_update_ga() (see patch 7/12).
Instead, use a per-vcpu pi_list to keep track of posted interrupts so that
SVM can update IOMMU interrupt remapping table entries directly
when rescheduling VCPUs. (see patch 8/12 and 12/12) (per Radim's suggestion)

  * Re-implement the AVIC VM-ID using a set of bitmasks to ensure no ID conflict
between active VMs. (see patch 10/12)

  * Verify VM-ID of the hash entry in avic_ga_log_notifier() before referencing
each per-VM data structure. (see patch 11/12) (per Radim's suggestion)

GITHUB
==
Latest git tree can be found at:
http://github.com/ssuthiku/linux.git  avic_part2_v5

OVERVIEW

This patch set is the second part of a two-part patch series to introduce
the new AMD Advanced Virtual Interrupt Controller (AVIC) support.

In addition to SVM AVIC, the AMD IOMMU also extends the AVIC capability
to allow I/O interrupts to be injected directly into the virtualized guest's
local APIC without the need for hypervisor intervention.

This patch series introduces a new hardware interrupt remapping (IR) mode
in AMD IOMMU driver, the Guest Virtual APIC (GA) mode. This is in contrast
to the existing "legacy" mode. The IR mode can be specified with a new
kernel parameter:

amd_iommu_guest_ir=[vapic (default) | legacy]

When GA mode is enabled, the AMD IOMMU driver will configure device interrupt
remapping in GA mode when possible (i.e. SVM AVIC must be enabled, and
the interrupt types must be supported). Otherwise, the driver will fall back
to using the legacy IR mode.

This patch series also introduces new interfaces between SVM and IOMMU
to allow:
  * SVM driver to communicate updated vcpu scheduling information
to the IOMMU.
  * IOMMU driver to notify the SVM driver to schedule a vcpu onto a physical core
to handle an IOMMU GALog entry.

DOCUMENTATIONS
==
More information about SVM AVIC can be found in the
AMD64 Architecture Programmer’s Manual Volume 2 - System Programming.

http://support.amd.com/TechDocs/24593.pdf

More information about IOMMU AVIC can be found in the
AMD I/O Virtualization Technology (IOMMU) Specification - Rev 2.62.

http://support.amd.com/TechDocs/48882_IOMMU.pdf

Any feedback and comments are very much appreciated.

Thank you,
Suravee

Suravee Suthikulpanit (12):
  iommu/amd: Detect and enable guest vAPIC support
  iommu/amd: Move and introduce new IRTE-related unions and structures
  iommu/amd: Introduce interrupt remapping ops structure
  iommu/amd: Add support for multiple IRTE formats
  iommu/amd: Detect and initialize guest vAPIC log
  iommu/amd: Adding GALOG interrupt handler
  iommu/amd: Introduce amd_iommu_update_ga()
  iommu/amd: Implements irq_set_vcpu_affinity() hook to setup vapic mode
for pass-through devices
  iommu/amd: Enable vAPIC interrupt remapping mode by default
  svm: Introduces AVIC per-VM ID
  svm: Introduce AMD IOMMU avic_ga_log_notifier
  svm: Implements update_pi_irte hook to setup posted interrupt

 arch/x86/include/asm/kvm_host.h |   2 +
 arch/x86/kvm/svm.c  | 381 +-
 drivers/iommu/amd_iommu.c   | 501 +++-
 drivers/iommu/amd_iommu_init.c  | 183 ++-
 drivers/iommu/amd_iommu_proto.h |   1 +
 drivers/iommu/amd_iommu_types.h | 151 
 include/linux/amd-iommu.h   |  42 +++-
 7 files changed, 1191 insertions(+), 70 deletions(-)



[PART2 PATCH v7 01/12] iommu/amd: Detect and enable guest vAPIC support

2016-08-23 Thread Suravee Suthikulpanit
From: Suravee Suthikulpanit 

This patch introduces a new IOMMU driver parameter, amd_iommu_guest_ir,
which can be used to specify different interrupt remapping modes for
passthrough devices assigned to a VM guest:
* legacy: Legacy interrupt remapping (w/ 32-bit IRTE)
* vapic : Guest vAPIC interrupt remapping (w/ GA mode 128-bit IRTE)

Note that in vapic mode, it also supports legacy interrupt remapping
for non-passthrough devices with the 128-bit IRTE.

Signed-off-by: Suravee Suthikulpanit 
---
 Documentation/kernel-parameters.txt |  9 +
 drivers/iommu/amd_iommu_init.c  | 71 +
 drivers/iommu/amd_iommu_proto.h |  1 +
 drivers/iommu/amd_iommu_types.h | 24 +
 4 files changed, 99 insertions(+), 6 deletions(-)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 17e33db..66c8f4b 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -460,6 +460,15 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
driver will print ACPI tables for AMD IOMMU during
IOMMU initialization.
 
+   amd_iommu_intr= [HW,X86-64]
+   Specifies one of the following AMD IOMMU interrupt
+   remapping modes:
+   legacy - Use legacy interrupt remapping mode.
+   vapic  - Use virtual APIC mode, which allows IOMMU
+to inject interrupts directly into guest.
+This mode requires kvm-amd.avic=1.
+(Default when IOMMU HW support is present.)
+
amijoy.map= [HW,JOY] Amiga joystick support
Map of devices attached to JOY0DAT and JOY1DAT
Format: ,
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 59741ea..c3afd86 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -145,6 +145,8 @@ struct ivmd_header {
 bool amd_iommu_dump;
 bool amd_iommu_irq_remap __read_mostly;
 
+int amd_iommu_guest_ir;
+
 static bool amd_iommu_detected;
 static bool __initdata amd_iommu_disabled;
 static int amd_iommu_target_ivhd_type;
@@ -1258,6 +1260,8 @@ static int __init init_iommu_one(struct amd_iommu *iommu, 
struct ivhd_header *h)
iommu->mmio_phys_end = MMIO_REG_END_OFFSET;
else
iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET;
+   if (((h->efr_attr & (0x1 << IOMMU_FEAT_GASUP_SHIFT)) == 0))
+   amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY;
break;
case 0x11:
case 0x40:
@@ -1265,6 +1269,8 @@ static int __init init_iommu_one(struct amd_iommu *iommu, 
struct ivhd_header *h)
iommu->mmio_phys_end = MMIO_REG_END_OFFSET;
else
iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET;
+   if (((h->efr_reg & (0x1 << IOMMU_EFR_GASUP_SHIFT)) == 0))
+   amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY;
break;
default:
return -EINVAL;
@@ -1488,6 +1494,14 @@ static int iommu_init_pci(struct amd_iommu *iommu)
if (iommu_feature(iommu, FEATURE_PPR) && alloc_ppr_log(iommu))
return -ENOMEM;
 
+   /* Note: We have already checked GASup from IVRS table.
+*   Now, we need to make sure that GAMSup is set.
+*/
+   if (AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir) &&
+   !iommu_feature(iommu, FEATURE_GAM_VAPIC))
+   amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY_GA;
+
+
if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE))
amd_iommu_np_cache = true;
 
@@ -1545,16 +1559,24 @@ static void print_iommu_info(void)
dev_name(&iommu->dev->dev), iommu->cap_ptr);
 
if (iommu->cap & (1 << IOMMU_CAP_EFR)) {
-   pr_info("AMD-Vi:  Extended features: ");
+   pr_info("AMD-Vi: Extended features (%#llx):\n",
+   iommu->features);
for (i = 0; i < ARRAY_SIZE(feat_str); ++i) {
if (iommu_feature(iommu, (1ULL << i)))
pr_cont(" %s", feat_str[i]);
}
+
+   if (iommu->features & FEATURE_GAM_VAPIC)
+   pr_cont(" GA_vAPIC");
+
pr_cont("\n");
}
}
-   if (irq_remapping_enabled)
+   if (irq_remapping_enabled) {
pr_info("AMD-Vi: Interrupt remapping enabled\n&qu

[PART2 PATCH v7 03/12] iommu/amd: Introduce interrupt remapping ops structure

2016-08-23 Thread Suravee Suthikulpanit
From: Suravee Suthikulpanit 

Currently, the IOMMU supports two interrupt remapping table entry formats,
32-bit (legacy) and 128-bit (GA). The spec also implies that it might
support additional modes/formats in the future.

So, this patch introduces the new struct amd_irte_ops, which allows
the same code to work with different irte formats by providing hooks
for various operations on an interrupt remapping table entry.

Suggested-by: Joerg Roedel 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd_iommu.c   | 190 ++--
 drivers/iommu/amd_iommu_types.h |  20 +
 2 files changed, 205 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index ac2962f..5260b42 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3826,11 +3826,12 @@ out:
return index;
 }
 
-static int modify_irte(u16 devid, int index, union irte irte)
+static int modify_irte_ga(u16 devid, int index, struct irte_ga *irte)
 {
struct irq_remap_table *table;
struct amd_iommu *iommu;
unsigned long flags;
+   struct irte_ga *entry;
 
iommu = amd_iommu_rlookup_table[devid];
if (iommu == NULL)
@@ -3841,7 +3842,38 @@ static int modify_irte(u16 devid, int index, union irte 
irte)
return -ENOMEM;
 
spin_lock_irqsave(&table->lock, flags);
-   table->table[index] = irte.val;
+
+   entry = (struct irte_ga *)table->table;
+   entry = &entry[index];
+   entry->lo.fields_remap.valid = 0;
+   entry->hi.val = irte->hi.val;
+   entry->lo.val = irte->lo.val;
+   entry->lo.fields_remap.valid = 1;
+
+   spin_unlock_irqrestore(&table->lock, flags);
+
+   iommu_flush_irt(iommu, devid);
+   iommu_completion_wait(iommu);
+
+   return 0;
+}
+
+static int modify_irte(u16 devid, int index, union irte *irte)
+{
+   struct irq_remap_table *table;
+   struct amd_iommu *iommu;
+   unsigned long flags;
+
+   iommu = amd_iommu_rlookup_table[devid];
+   if (iommu == NULL)
+   return -EINVAL;
+
+   table = get_irq_table(devid, false);
+   if (!table)
+   return -ENOMEM;
+
+   spin_lock_irqsave(&table->lock, flags);
+   table->table[index] = irte->val;
spin_unlock_irqrestore(&table->lock, flags);
 
iommu_flush_irt(iommu, devid);
@@ -3872,6 +3904,134 @@ static void free_irte(u16 devid, int index)
iommu_completion_wait(iommu);
 }
 
+static void irte_prepare(void *entry,
+u32 delivery_mode, u32 dest_mode,
+u8 vector, u32 dest_apicid)
+{
+   union irte *irte = (union irte *) entry;
+
+   irte->val= 0;
+   irte->fields.vector  = vector;
+   irte->fields.int_type= delivery_mode;
+   irte->fields.destination = dest_apicid;
+   irte->fields.dm  = dest_mode;
+   irte->fields.valid   = 1;
+}
+
+static void irte_ga_prepare(void *entry,
+   u32 delivery_mode, u32 dest_mode,
+   u8 vector, u32 dest_apicid)
+{
+   struct irte_ga *irte = (struct irte_ga *) entry;
+
+   irte->lo.val  = 0;
+   irte->hi.val  = 0;
+   irte->lo.fields_remap.guest_mode  = 0;
+   irte->lo.fields_remap.int_type= delivery_mode;
+   irte->lo.fields_remap.dm  = dest_mode;
+   irte->hi.fields.vector= vector;
+   irte->lo.fields_remap.destination = dest_apicid;
+   irte->lo.fields_remap.valid   = 1;
+}
+
+static void irte_activate(void *entry, u16 devid, u16 index)
+{
+   union irte *irte = (union irte *) entry;
+
+   irte->fields.valid = 1;
+   modify_irte(devid, index, irte);
+}
+
+static void irte_ga_activate(void *entry, u16 devid, u16 index)
+{
+   struct irte_ga *irte = (struct irte_ga *) entry;
+
+   irte->lo.fields_remap.valid = 1;
+   modify_irte_ga(devid, index, irte);
+}
+
+static void irte_deactivate(void *entry, u16 devid, u16 index)
+{
+   union irte *irte = (union irte *) entry;
+
+   irte->fields.valid = 0;
+   modify_irte(devid, index, irte);
+}
+
+static void irte_ga_deactivate(void *entry, u16 devid, u16 index)
+{
+   struct irte_ga *irte = (struct irte_ga *) entry;
+
+   irte->lo.fields_remap.valid = 0;
+   modify_irte_ga(devid, index, irte);
+}
+
+static void irte_set_affinity(void *entry, u16 devid, u16 index,
+ u8 vector, u32 dest_apicid)
+{
+   union irte *irte = (union irte *) entry;
+
+   irte->fields.vector = vector;
+   irte->fields.destination = dest_apicid;
+   modify_irte(devid, index, irte);
+}
+
+static void irte_ga_set_affinity(void *entry, u16 devid, u16 index,
+u8 vect
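
(The diff above is truncated by the archive. Based on the helpers it does show,
and on the iommu->irte_ops->{prepare,set_allocated,is_allocated,clear_allocated}
calls in the follow-on patches, the ops structure introduced here plausibly
looks like the sketch below; field order and any members beyond those visible
in this series are assumptions.)

struct amd_irte_ops {
	void (*prepare)(void *entry, u32 delivery_mode, u32 dest_mode,
			u8 vector, u32 dest_apicid);
	void (*activate)(void *entry, u16 devid, u16 index);
	void (*deactivate)(void *entry, u16 devid, u16 index);
	void (*set_affinity)(void *entry, u16 devid, u16 index,
			     u8 vector, u32 dest_apicid);
	void (*set_allocated)(struct irq_remap_table *table, int index);
	bool (*is_allocated)(struct irq_remap_table *table, int index);
	void (*clear_allocated)(struct irq_remap_table *table, int index);
};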

[PART2 PATCH v7 02/12] iommu/amd: Move and introduce new IRTE-related unions and structures

2016-08-23 Thread Suravee Suthikulpanit
From: Suravee Suthikulpanit 

Move existing unions and structs for accessing/managing IRTE to a proper
header file. This is mainly to simplify variable declarations in subsequent
patches.

Besides, this patch also introduces new struct irte_ga for the new
128-bit IRTE format.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd_iommu.c   | 28 ---
 drivers/iommu/amd_iommu_types.h | 76 +
 2 files changed, 76 insertions(+), 28 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 634f636..ac2962f 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3693,34 +3693,6 @@ EXPORT_SYMBOL(amd_iommu_device_info);
  *
  */
 
-union irte {
-   u32 val;
-   struct {
-   u32 valid   : 1,
-   no_fault: 1,
-   int_type: 3,
-   rq_eoi  : 1,
-   dm  : 1,
-   rsvd_1  : 1,
-   destination : 8,
-   vector  : 8,
-   rsvd_2  : 8;
-   } fields;
-};
-
-struct irq_2_irte {
-   u16 devid; /* Device ID for IRTE table */
-   u16 index; /* Index into IRTE table*/
-};
-
-struct amd_ir_data {
-   struct irq_2_irte   irq_2_irte;
-   union irte  irte_entry;
-   union {
-   struct msi_msg  msi_entry;
-   };
-};
-
 static struct irq_chip amd_ir_chip;
 
 #define DTE_IRQ_PHYS_ADDR_MASK (((1ULL << 45)-1) << 6)
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index 25f939b..c37c5c4 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -22,6 +22,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -706,4 +707,79 @@ enum amd_iommu_intr_mode_type {
 x == AMD_IOMMU_GUEST_IR_LEGACY_GA)
 
 #define AMD_IOMMU_GUEST_IR_VAPIC(x)(x == AMD_IOMMU_GUEST_IR_VAPIC)
+
+union irte {
+   u32 val;
+   struct {
+   u32 valid   : 1,
+   no_fault: 1,
+   int_type: 3,
+   rq_eoi  : 1,
+   dm  : 1,
+   rsvd_1  : 1,
+   destination : 8,
+   vector  : 8,
+   rsvd_2  : 8;
+   } fields;
+};
+
+union irte_ga_lo {
+   u64 val;
+
+   /* For int remapping */
+   struct {
+   u64 valid   : 1,
+   no_fault: 1,
+   /* -- */
+   int_type: 3,
+   rq_eoi  : 1,
+   dm  : 1,
+   /* -- */
+   guest_mode  : 1,
+   destination : 8,
+   rsvd: 48;
+   } fields_remap;
+
+   /* For guest vAPIC */
+   struct {
+   u64 valid   : 1,
+   no_fault: 1,
+   /* -- */
+   ga_log_intr : 1,
+   rsvd1   : 3,
+   is_run  : 1,
+   /* -- */
+   guest_mode  : 1,
+   destination : 8,
+   rsvd2   : 16,
+   ga_tag  : 32;
+   } fields_vapic;
+};
+
+union irte_ga_hi {
+   u64 val;
+   struct {
+   u64 vector  : 8,
+   rsvd_1  : 4,
+   ga_root_ptr : 40,
+   rsvd_2  : 12;
+   } fields;
+};
+
+struct irte_ga {
+   union irte_ga_lo lo;
+   union irte_ga_hi hi;
+};
+
+struct irq_2_irte {
+   u16 devid; /* Device ID for IRTE table */
+   u16 index; /* Index into IRTE table*/
+};
+
+struct amd_ir_data {
+   struct irq_2_irte irq_2_irte;
+   union irte irte_entry;
+   struct msi_msg msi_entry;
+};
+
 #endif /* _ASM_X86_AMD_IOMMU_TYPES_H */
-- 
1.9.1



[PART2 PATCH v7 04/12] iommu/amd: Add support for multiple IRTE formats

2016-08-23 Thread Suravee Suthikulpanit
From: Suravee Suthikulpanit 

This patch enables support for the new 128-bit IOMMU IRTE format,
which can be used for both legacy and vapic interrupt remapping modes.
It replaces the existing operations on the IRTE, which can only support
the older 32-bit IRTE format, with calls through the new struct amd_irte_ops.

It also provides helper functions for setting up, accessing, and
updating interrupt remapping table entries in the different modes.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd_iommu.c   | 72 +++--
 drivers/iommu/amd_iommu_init.c  |  2 ++
 drivers/iommu/amd_iommu_types.h |  1 -
 3 files changed, 50 insertions(+), 25 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 5260b42..52e1e4a 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3714,8 +3714,6 @@ static void set_dte_irq_entry(u16 devid, struct 
irq_remap_table *table)
amd_iommu_dev_table[devid].data[2] = dte;
 }
 
-#define IRTE_ALLOCATED (~1U)
-
 static struct irq_remap_table *get_irq_table(u16 devid, bool ioapic)
 {
struct irq_remap_table *table = NULL;
@@ -3761,13 +3759,18 @@ static struct irq_remap_table *get_irq_table(u16 devid, 
bool ioapic)
goto out;
}
 
-   memset(table->table, 0, MAX_IRQS_PER_TABLE * sizeof(u32));
+   if (!AMD_IOMMU_GUEST_IR_GA(amd_iommu_guest_ir))
+   memset(table->table, 0,
+  MAX_IRQS_PER_TABLE * sizeof(u32));
+   else
+   memset(table->table, 0,
+  (MAX_IRQS_PER_TABLE * (sizeof(u64) * 2)));
 
if (ioapic) {
int i;
 
for (i = 0; i < 32; ++i)
-   table->table[i] = IRTE_ALLOCATED;
+   iommu->irte_ops->set_allocated(table, i);
}
 
irq_lookup_table[devid] = table;
@@ -3793,6 +3796,10 @@ static int alloc_irq_index(u16 devid, int count)
struct irq_remap_table *table;
unsigned long flags;
int index, c;
+   struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
+
+   if (!iommu)
+   return -ENODEV;
 
table = get_irq_table(devid, false);
if (!table)
@@ -3804,14 +3811,14 @@ static int alloc_irq_index(u16 devid, int count)
for (c = 0, index = table->min_index;
 index < MAX_IRQS_PER_TABLE;
 ++index) {
-   if (table->table[index] == 0)
+   if (!iommu->irte_ops->is_allocated(table, index))
c += 1;
else
c = 0;
 
if (c == count) {
for (; c != 0; --c)
-   table->table[index - c + 1] = IRTE_ALLOCATED;
+   iommu->irte_ops->set_allocated(table, index - c 
+ 1);
 
index -= count - 1;
goto out;
@@ -3897,7 +3904,7 @@ static void free_irte(u16 devid, int index)
return;
 
spin_lock_irqsave(&table->lock, flags);
-   table->table[index] = 0;
+   iommu->irte_ops->clear_allocated(table, index);
spin_unlock_irqrestore(&table->lock, flags);
 
iommu_flush_irt(iommu, devid);
@@ -3987,6 +3994,7 @@ static void irte_ga_set_affinity(void *entry, u16 devid, 
u16 index,
modify_irte_ga(devid, index, irte);
 }
 
+#define IRTE_ALLOCATED (~1U)
 static void irte_set_allocated(struct irq_remap_table *table, int index)
 {
table->table[index] = IRTE_ALLOCATED;
@@ -4116,19 +4124,17 @@ static void irq_remapping_prepare_irte(struct 
amd_ir_data *data,
 {
struct irq_2_irte *irte_info = &data->irq_2_irte;
struct msi_msg *msg = &data->msi_entry;
-   union irte *irte = &data->irte_entry;
struct IO_APIC_route_entry *entry;
+   struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
+
+   if (!iommu)
+   return;
 
data->irq_2_irte.devid = devid;
data->irq_2_irte.index = index + sub_handle;
-
-   /* Setup IRTE for IOMMU */
-   irte->val = 0;
-   irte->fields.vector  = irq_cfg->vector;
-   irte->fields.int_type= apic->irq_delivery_mode;
-   irte->fields.destination = irq_cfg->dest_apicid;
-   irte->fields.dm  = apic->irq_dest_mode;
-   irte->fields.valid   = 1;
+   iommu->irte_ops->prepare(data->entry, apic->irq_delivery_mode,
+apic->irq_dest_mode, irq_cfg->vector,
+irq_cfg->dest_apicid);
 
switch (info->type) {
case X86_IRQ_ALLOC_TYPE_IOAPIC:
@@ -4184,7 +4190,7 @@ static int irq_remapping_alloc(struct irq_domain *domain, 
unsigned int virq,
 {
struct irq_alloc_info *info = arg;
struct irq_data *irq_data;
-   struc

[PART2 PATCH v7 09/12] iommu/amd: Enable vAPIC interrupt remapping mode by default

2016-08-23 Thread Suravee Suthikulpanit
From: Suravee Suthikulpanit 

Introduce the struct iommu_dev_data.use_vapic flag, which the IOMMU driver
uses to determine whether it should enable vAPIC support by setting
the ga_mode bit in the device's interrupt remapping table entry.

Currently, it is enabled for all pass-through devices if vAPIC mode
is enabled.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd_iommu.c   | 44 +
 drivers/iommu/amd_iommu_init.c  | 12 ++-
 drivers/iommu/amd_iommu_types.h |  2 +-
 3 files changed, 48 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 7aa0c08..8f9e534 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -105,6 +105,7 @@ struct iommu_dev_data {
bool pri_tlp; /* PASID TLB required for
 PPR completions */
u32 errata;   /* Bitmap for errata to apply */
+   bool use_vapic;   /* Enable device to use vapic mode */
 };
 
 /*
@@ -3211,6 +3212,12 @@ static void amd_iommu_detach_device(struct iommu_domain 
*dom,
if (!iommu)
return;
 
+#ifdef CONFIG_IRQ_REMAP
+   if (AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir) &&
+   (dom->type == IOMMU_DOMAIN_UNMANAGED))
+   dev_data->use_vapic = 0;
+#endif
+
iommu_completion_wait(iommu);
 }
 
@@ -3236,6 +3243,15 @@ static int amd_iommu_attach_device(struct iommu_domain 
*dom,
 
ret = attach_device(dev, domain);
 
+#ifdef CONFIG_IRQ_REMAP
+   if (AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir)) {
+   if (dom->type == IOMMU_DOMAIN_UNMANAGED)
+   dev_data->use_vapic = 1;
+   else
+   dev_data->use_vapic = 0;
+   }
+#endif
+
iommu_completion_wait(iommu);
 
return ret;
@@ -3983,7 +3999,7 @@ static void free_irte(u16 devid, int index)
 
 static void irte_prepare(void *entry,
 u32 delivery_mode, u32 dest_mode,
-u8 vector, u32 dest_apicid)
+u8 vector, u32 dest_apicid, int devid)
 {
union irte *irte = (union irte *) entry;
 
@@ -3997,13 +4013,14 @@ static void irte_prepare(void *entry,
 
 static void irte_ga_prepare(void *entry,
u32 delivery_mode, u32 dest_mode,
-   u8 vector, u32 dest_apicid)
+   u8 vector, u32 dest_apicid, int devid)
 {
struct irte_ga *irte = (struct irte_ga *) entry;
+   struct iommu_dev_data *dev_data = search_dev_data(devid);
 
irte->lo.val  = 0;
irte->hi.val  = 0;
-   irte->lo.fields_remap.guest_mode  = 0;
+   irte->lo.fields_remap.guest_mode  = dev_data ? dev_data->use_vapic : 0;
irte->lo.fields_remap.int_type= delivery_mode;
irte->lo.fields_remap.dm  = dest_mode;
irte->hi.fields.vector= vector;
@@ -4057,11 +4074,14 @@ static void irte_ga_set_affinity(void *entry, u16 
devid, u16 index,
 u8 vector, u32 dest_apicid)
 {
struct irte_ga *irte = (struct irte_ga *) entry;
+   struct iommu_dev_data *dev_data = search_dev_data(devid);
 
-   irte->hi.fields.vector = vector;
-   irte->lo.fields_remap.destination = dest_apicid;
-   irte->lo.fields_remap.guest_mode = 0;
-   modify_irte_ga(devid, index, irte, NULL);
+   if (!dev_data || !dev_data->use_vapic) {
+   irte->hi.fields.vector = vector;
+   irte->lo.fields_remap.destination = dest_apicid;
+   irte->lo.fields_remap.guest_mode = 0;
+   modify_irte_ga(devid, index, irte, NULL);
+   }
 }
 
 #define IRTE_ALLOCATED (~1U)
@@ -4204,7 +4224,7 @@ static void irq_remapping_prepare_irte(struct amd_ir_data 
*data,
data->irq_2_irte.index = index + sub_handle;
iommu->irte_ops->prepare(data->entry, apic->irq_delivery_mode,
 apic->irq_dest_mode, irq_cfg->vector,
-irq_cfg->dest_apicid);
+irq_cfg->dest_apicid, devid);
 
switch (info->type) {
case X86_IRQ_ALLOC_TYPE_IOAPIC:
@@ -4404,6 +4424,14 @@ static int amd_ir_set_vcpu_affinity(struct irq_data 
*data, void *vcpu_info)
struct amd_ir_data *ir_data = data->chip_data;
struct irte_ga *irte = (struct irte_ga *) ir_data->entry;
struct irq_2_irte *irte_info = &ir_data->irq_2_irte;
+   struct iommu_dev_data *dev_data = search_dev_data(irte_info->devid);
+
+   /* Note:
+* This device has never been set up for guest mode.
+* we should not modify the IRTE
+*/
+   if (!dev_data || !dev_data->use_vapic)
+  

[PART2 PATCH v7 08/12] iommu/amd: Implements irq_set_vcpu_affinity() hook to setup vapic mode for pass-through devices

2016-08-23 Thread Suravee Suthikulpanit
From: Suravee Suthikulpanit 

This patch implements the irq_set_vcpu_affinity() function to set up the
interrupt remapping table entry in vapic mode for pass-through devices.

In case the requirements for vapic mode are not met, it falls back to setting
up the IRTE in legacy mode.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd_iommu.c   | 68 ++---
 drivers/iommu/amd_iommu_types.h |  1 +
 include/linux/amd-iommu.h   | 14 +
 3 files changed, 79 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 9f91480..7aa0c08 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3900,7 +3900,8 @@ out:
return index;
 }
 
-static int modify_irte_ga(u16 devid, int index, struct irte_ga *irte)
+static int modify_irte_ga(u16 devid, int index, struct irte_ga *irte,
+ struct amd_ir_data *data)
 {
struct irq_remap_table *table;
struct amd_iommu *iommu;
@@ -3923,6 +3924,8 @@ static int modify_irte_ga(u16 devid, int index, struct 
irte_ga *irte)
entry->hi.val = irte->hi.val;
entry->lo.val = irte->lo.val;
entry->lo.fields_remap.valid = 1;
+   if (data)
+   data->ref = entry;
 
spin_unlock_irqrestore(&table->lock, flags);
 
@@ -4021,7 +4024,7 @@ static void irte_ga_activate(void *entry, u16 devid, u16 
index)
struct irte_ga *irte = (struct irte_ga *) entry;
 
irte->lo.fields_remap.valid = 1;
-   modify_irte_ga(devid, index, irte);
+   modify_irte_ga(devid, index, irte, NULL);
 }
 
 static void irte_deactivate(void *entry, u16 devid, u16 index)
@@ -4037,7 +4040,7 @@ static void irte_ga_deactivate(void *entry, u16 devid, 
u16 index)
struct irte_ga *irte = (struct irte_ga *) entry;
 
irte->lo.fields_remap.valid = 0;
-   modify_irte_ga(devid, index, irte);
+   modify_irte_ga(devid, index, irte, NULL);
 }
 
 static void irte_set_affinity(void *entry, u16 devid, u16 index,
@@ -4058,7 +4061,7 @@ static void irte_ga_set_affinity(void *entry, u16 devid, 
u16 index,
irte->hi.fields.vector = vector;
irte->lo.fields_remap.destination = dest_apicid;
irte->lo.fields_remap.guest_mode = 0;
-   modify_irte_ga(devid, index, irte);
+   modify_irte_ga(devid, index, irte, NULL);
 }
 
 #define IRTE_ALLOCATED (~1U)
@@ -4393,6 +4396,62 @@ static struct irq_domain_ops amd_ir_domain_ops = {
.deactivate = irq_remapping_deactivate,
 };
 
+static int amd_ir_set_vcpu_affinity(struct irq_data *data, void *vcpu_info)
+{
+   struct amd_iommu *iommu;
+   struct amd_iommu_pi_data *pi_data = vcpu_info;
+   struct vcpu_data *vcpu_pi_info = pi_data->vcpu_data;
+   struct amd_ir_data *ir_data = data->chip_data;
+   struct irte_ga *irte = (struct irte_ga *) ir_data->entry;
+   struct irq_2_irte *irte_info = &ir_data->irq_2_irte;
+
+   pi_data->ir_data = ir_data;
+
+   /* Note:
+* SVM tries to set up for VAPIC mode, but we are in
+* legacy mode. So, we force legacy mode instead.
+*/
+   if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir)) {
+   pr_debug("AMD-Vi: %s: Fall back to using intr legacy remap\n",
+__func__);
+   pi_data->is_guest_mode = false;
+   }
+
+   iommu = amd_iommu_rlookup_table[irte_info->devid];
+   if (iommu == NULL)
+   return -EINVAL;
+
+   pi_data->prev_ga_tag = ir_data->cached_ga_tag;
+   if (pi_data->is_guest_mode) {
+   /* Setting */
+   irte->hi.fields.ga_root_ptr = (pi_data->base >> 12);
+   irte->hi.fields.vector = vcpu_pi_info->vector;
+   irte->lo.fields_vapic.guest_mode = 1;
+   irte->lo.fields_vapic.ga_tag = pi_data->ga_tag;
+
+   ir_data->cached_ga_tag = pi_data->ga_tag;
+   } else {
+   /* Un-Setting */
+   struct irq_cfg *cfg = irqd_cfg(data);
+
+   irte->hi.val = 0;
+   irte->lo.val = 0;
+   irte->hi.fields.vector = cfg->vector;
+   irte->lo.fields_remap.guest_mode = 0;
+   irte->lo.fields_remap.destination = cfg->dest_apicid;
+   irte->lo.fields_remap.int_type = apic->irq_delivery_mode;
+   irte->lo.fields_remap.dm = apic->irq_dest_mode;
+
+   /*
+* This communicates the ga_tag back to the caller
+* so that it can do all the necessary clean up.
+*/
+   ir_data->cached_ga_tag = 0;
+   }
+
+   return modify_irte_ga(irte_info->devid, irte_info->index, irte, 
ir_data);
+}
+
 static int amd_ir_set_affinity(struct irq_data *data,
   const struct cpumask 

[PART2 PATCH v7 06/12] iommu/amd: Adding GALOG interrupt handler

2016-08-23 Thread Suravee Suthikulpanit
From: Suravee Suthikulpanit 

This patch adds the AMD IOMMU guest virtual APIC log (GALOG) handler.
When the IOMMU hardware receives an interrupt targeting a blocking vcpu,
it creates an entry in the GALOG and generates an interrupt to notify
the AMD IOMMU driver.

At this point, the driver processes the log entry and notifies the SVM
driver via the registered iommu_ga_log_notifier function.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd_iommu.c | 73 +--
 include/linux/amd-iommu.h | 20 +++--
 2 files changed, 87 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 52e1e4a..8df3dbf 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -741,14 +741,74 @@ static void iommu_poll_ppr_log(struct amd_iommu *iommu)
}
 }
 
+#ifdef CONFIG_IRQ_REMAP
+static int (*iommu_ga_log_notifier)(u32);
+
+int amd_iommu_register_ga_log_notifier(int (*notifier)(u32))
+{
+   iommu_ga_log_notifier = notifier;
+
+   return 0;
+}
+EXPORT_SYMBOL(amd_iommu_register_ga_log_notifier);
+
+static void iommu_poll_ga_log(struct amd_iommu *iommu)
+{
+   u32 head, tail, cnt = 0;
+
+   if (iommu->ga_log == NULL)
+   return;
+
+   head = readl(iommu->mmio_base + MMIO_GA_HEAD_OFFSET);
+   tail = readl(iommu->mmio_base + MMIO_GA_TAIL_OFFSET);
+
+   while (head != tail) {
+   volatile u64 *raw;
+   u64 log_entry;
+
+   raw = (u64 *)(iommu->ga_log + head);
+   cnt++;
+
+   /* Avoid memcpy function-call overhead */
+   log_entry = *raw;
+
+   /* Update head pointer of hardware ring-buffer */
+   head = (head + GA_ENTRY_SIZE) % GA_LOG_SIZE;
+   writel(head, iommu->mmio_base + MMIO_GA_HEAD_OFFSET);
+
+   /* Handle GA entry */
+   switch (GA_REQ_TYPE(log_entry)) {
+   case GA_GUEST_NR:
+   if (!iommu_ga_log_notifier)
+   break;
+
+   pr_debug("AMD-Vi: %s: devid=%#x, ga_tag=%#x\n",
+__func__, GA_DEVID(log_entry),
+GA_TAG(log_entry));
+
+   if (iommu_ga_log_notifier(GA_TAG(log_entry)) != 0)
+   pr_err("AMD-Vi: GA log notifier failed.\n");
+   break;
+   default:
+   break;
+   }
+   }
+}
+#endif /* CONFIG_IRQ_REMAP */
+
+#define AMD_IOMMU_INT_MASK \
+   (MMIO_STATUS_EVT_INT_MASK | \
+MMIO_STATUS_PPR_INT_MASK | \
+MMIO_STATUS_GALOG_INT_MASK)
+
 irqreturn_t amd_iommu_int_thread(int irq, void *data)
 {
struct amd_iommu *iommu = (struct amd_iommu *) data;
u32 status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);
 
-   while (status & (MMIO_STATUS_EVT_INT_MASK | MMIO_STATUS_PPR_INT_MASK)) {
-   /* Enable EVT and PPR interrupts again */
-   writel((MMIO_STATUS_EVT_INT_MASK | MMIO_STATUS_PPR_INT_MASK),
+   while (status & AMD_IOMMU_INT_MASK) {
+   /* Enable EVT and PPR and GA interrupts again */
+   writel(AMD_IOMMU_INT_MASK,
iommu->mmio_base + MMIO_STATUS_OFFSET);
 
if (status & MMIO_STATUS_EVT_INT_MASK) {
@@ -761,6 +821,13 @@ irqreturn_t amd_iommu_int_thread(int irq, void *data)
iommu_poll_ppr_log(iommu);
}
 
+#ifdef CONFIG_IRQ_REMAP
+   if (status & MMIO_STATUS_GALOG_INT_MASK) {
+   pr_devel("AMD-Vi: Processing IOMMU GA Log\n");
+   iommu_poll_ga_log(iommu);
+   }
+#endif
+
/*
 * Hardware bug: ERBT1312
 * When re-enabling interrupt (by writing 1
diff --git a/include/linux/amd-iommu.h b/include/linux/amd-iommu.h
index 2b08e79..465d096 100644
--- a/include/linux/amd-iommu.h
+++ b/include/linux/amd-iommu.h
@@ -168,11 +168,25 @@ typedef void (*amd_iommu_invalidate_ctx)(struct pci_dev 
*pdev, int pasid);
 
 extern int amd_iommu_set_invalidate_ctx_cb(struct pci_dev *pdev,
   amd_iommu_invalidate_ctx cb);
-
-#else
+#else /* CONFIG_AMD_IOMMU */
 
 static inline int amd_iommu_detect(void) { return -ENODEV; }
 
-#endif
+#endif /* CONFIG_AMD_IOMMU */
+
+#if defined(CONFIG_AMD_IOMMU) && defined(CONFIG_IRQ_REMAP)
+
+/* IOMMU AVIC Function */
+extern int amd_iommu_register_ga_log_notifier(int (*notifier)(u32));
+
+#else /* defined(CONFIG_AMD_IOMMU) && defined(CONFIG_IRQ_REMAP) */
+
+static inline int
+amd_iommu_register_ga_log_notifier(int (*notifier)(u32))
+{
+   return 0;
+}
+
+#endif /* defined(CONFIG_AMD_IOMMU) && defined(CONFIG_IRQ_REMAP) */
 
 #endif /* _ASM_X86_AMD_IOMMU_H */
-- 
1.9.1



[PART2 PATCH v7 00/12] iommu/AMD: Introduce IOMMU AVIC support

2016-08-23 Thread Suravee Suthikulpanit
From: Suravee Suthikulpanit 

CHANGES FROM V6
===

Per Radim:
* No longer expose struct amd_ir_data to SVM.
* Introduce struct amd_svm_iommu_ir (amd_ir_data wrapper).
* Fix logic to manage ir_list where we need to remove
  the posted interrupt from the previous ir_list before
  mapping it to a new vcpu. Tested running an SMP VM with:
  -  Using irqbalance
  -  No irqbalance (manually set /proc/irq/smp_affinity)

Misc:
* 08/12: Only set ga_root_ptr in amd_ir_set_vcpu_affinity().
* 10/12: Fix bug in #define AVIC_GATAG_TO_VCPUID.

GITHUB
==
Latest git tree can be found at:
http://github.com/ssuthiku/linux.git avic_part2_v7

OVERVIEW

This patch set is the second part of the two-part patch series to introduce
the new AMD Advanced Virtual Interrupt Controller (AVIC) support.

In addition to the SVM AVIC, the AMD IOMMU also extends the AVIC capability
to allow I/O interrupt injection directly into the virtualized guest
local APIC without the need for hypervisor intervention.

This patch series introduces a new hardware interrupt remapping (IR) mode
in the AMD IOMMU driver, the Guest Virtual APIC (GA) mode. This is in contrast
to the existing "legacy" mode. The IR mode can be specified with a new
kernel parameter:

amd_iommu_guest_ir=[vapic (default) | legacy]

When GA mode is enabled, the AMD IOMMU driver will configure device interrupt
remapping in GA mode when possible (i.e. SVM AVIC must be enabled and the
interrupt types must be supported). Otherwise, the driver will fall back
to using the legacy IR mode.
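
For example, on a system where the guest vAPIC path needs to be avoided,
the existing behavior can be forced from the kernel command line (this
simply selects the legacy mode documented above):

  amd_iommu_guest_ir=legacy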

This patch series also introduces new interfaces between SVM and the IOMMU
to allow:
  * the SVM driver to communicate updated vcpu scheduling information
    to the IOMMU.
  * the IOMMU driver to notify the SVM driver to schedule a vcpu onto a
    physical core to handle an IOMMU GALog entry (prototypes sketched below).
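
Concretely, the two cross-driver interfaces referred to above are the
following prototypes, added in patches 06/12 and 07/12 of this series:

  /* IOMMU -> SVM: called for each GA log entry; ga_tag identifies the vcpu */
  int amd_iommu_register_ga_log_notifier(int (*notifier)(u32));

  /* SVM -> IOMMU: update destination/is-running in the guest-mode IRTE */
  int amd_iommu_update_ga(int cpu, bool is_run, void *data);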

DOCUMENTATIONS
==
More information about SVM AVIC can be found in the
AMD64 Architecture Programmer’s Manual Volume 2 - System Programming.

http://support.amd.com/TechDocs/24593.pdf

More information about IOMMU AVIC can be found in the
AMD I/O Virtualization Technology (IOMMU) Specification - Rev 2.62.

http://support.amd.com/TechDocs/48882_IOMMU.pdf

Any feedback and comments are very much appreciated.

Thank you,
Suravee

Suravee Suthikulpanit (12):
  iommu/amd: Detect and enable guest vAPIC support
  iommu/amd: Move and introduce new IRTE-related unions and structures
  iommu/amd: Introduce interrupt remapping ops structure
  iommu/amd: Add support for multiple IRTE formats
  iommu/amd: Detect and initialize guest vAPIC log
  iommu/amd: Adding GALOG interrupt handler
  iommu/amd: Introduce amd_iommu_update_ga()
  iommu/amd: Implements irq_set_vcpu_affinity() hook to setup vapic mode
for pass-through devices
  iommu/amd: Enable vAPIC interrupt remapping mode by default
  svm: Introduces AVIC per-VM ID
  svm: Introduce AMD IOMMU avic_ga_log_notifier
  svm: Implements update_pi_irte hook to setup posted interrupt

 Documentation/kernel-parameters.txt |   9 +
 arch/x86/include/asm/kvm_host.h |   2 +
 arch/x86/kvm/svm.c  | 406 --
 drivers/iommu/amd_iommu.c   | 484 +++-
 drivers/iommu/amd_iommu_init.c  | 181 +-
 drivers/iommu/amd_iommu_proto.h |   1 +
 drivers/iommu/amd_iommu_types.h | 149 +++
 include/linux/amd-iommu.h   |  43 +++-
 8 files changed, 1188 insertions(+), 87 deletions(-)

-- 
1.9.1



[PART2 PATCH v7 05/12] iommu/amd: Detect and initialize guest vAPIC log

2016-08-23 Thread Suravee Suthikulpanit
From: Suravee Suthikulpanit 

This patch adds support to detect and initialize the IOMMU Guest vAPIC log
(GALOG). By default, it also enables the GALOG interrupt to notify the IOMMU
driver when a GA log entry is created.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd_iommu_init.c  | 112 +---
 drivers/iommu/amd_iommu_types.h |  28 ++
 2 files changed, 133 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index c17febb..156ab4b 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -84,6 +84,7 @@
 #define ACPI_DEVFLAG_LINT1  0x80
 #define ACPI_DEVFLAG_ATSDIS 0x1000
 
+#define LOOP_TIMEOUT   10
 /*
  * ACPI table definitions
  *
@@ -388,6 +389,10 @@ static void iommu_disable(struct amd_iommu *iommu)
iommu_feature_disable(iommu, CONTROL_EVT_INT_EN);
iommu_feature_disable(iommu, CONTROL_EVT_LOG_EN);
 
+   /* Disable IOMMU GA_LOG */
+   iommu_feature_disable(iommu, CONTROL_GALOG_EN);
+   iommu_feature_disable(iommu, CONTROL_GAINT_EN);
+
/* Disable IOMMU hardware itself */
iommu_feature_disable(iommu, CONTROL_IOMMU_EN);
 }
@@ -673,6 +678,99 @@ static void __init free_ppr_log(struct amd_iommu *iommu)
free_pages((unsigned long)iommu->ppr_log, get_order(PPR_LOG_SIZE));
 }
 
+static void free_ga_log(struct amd_iommu *iommu)
+{
+#ifdef CONFIG_IRQ_REMAP
+   if (iommu->ga_log)
+   free_pages((unsigned long)iommu->ga_log,
+   get_order(GA_LOG_SIZE));
+   if (iommu->ga_log_tail)
+   free_pages((unsigned long)iommu->ga_log_tail,
+   get_order(8));
+#endif
+}
+
+static int iommu_ga_log_enable(struct amd_iommu *iommu)
+{
+#ifdef CONFIG_IRQ_REMAP
+   u32 status, i;
+
+   if (!iommu->ga_log)
+   return -EINVAL;
+
+   status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);
+
+   /* Check if already running */
+   if (status & (MMIO_STATUS_GALOG_RUN_MASK))
+   return 0;
+
+   iommu_feature_enable(iommu, CONTROL_GAINT_EN);
+   iommu_feature_enable(iommu, CONTROL_GALOG_EN);
+
+   for (i = 0; i < LOOP_TIMEOUT; ++i) {
+   status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);
+   if (status & (MMIO_STATUS_GALOG_RUN_MASK))
+   break;
+   }
+
+   if (i >= LOOP_TIMEOUT)
+   return -EINVAL;
+#endif /* CONFIG_IRQ_REMAP */
+   return 0;
+}
+
+#ifdef CONFIG_IRQ_REMAP
+static int iommu_init_ga_log(struct amd_iommu *iommu)
+{
+   u64 entry;
+
+   if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir))
+   return 0;
+
+   iommu->ga_log = (u8 *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
+   get_order(GA_LOG_SIZE));
+   if (!iommu->ga_log)
+   goto err_out;
+
+   iommu->ga_log_tail = (u8 *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
+   get_order(8));
+   if (!iommu->ga_log_tail)
+   goto err_out;
+
+   entry = (u64)virt_to_phys(iommu->ga_log) | GA_LOG_SIZE_512;
+   memcpy_toio(iommu->mmio_base + MMIO_GA_LOG_BASE_OFFSET,
+   &entry, sizeof(entry));
+   entry = ((u64)virt_to_phys(iommu->ga_log) & 0xFULL) & ~7ULL;
+   memcpy_toio(iommu->mmio_base + MMIO_GA_LOG_TAIL_OFFSET,
+   &entry, sizeof(entry));
+   writel(0x00, iommu->mmio_base + MMIO_GA_HEAD_OFFSET);
+   writel(0x00, iommu->mmio_base + MMIO_GA_TAIL_OFFSET);
+
+   return 0;
+err_out:
+   free_ga_log(iommu);
+   return -EINVAL;
+}
+#endif /* CONFIG_IRQ_REMAP */
+
+static int iommu_init_ga(struct amd_iommu *iommu)
+{
+   int ret = 0;
+
+#ifdef CONFIG_IRQ_REMAP
+   /* Note: We have already checked GASup from IVRS table.
+*   Now, we need to make sure that GAMSup is set.
+*/
+   if (AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir) &&
+   !iommu_feature(iommu, FEATURE_GAM_VAPIC))
+   amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY_GA;
+
+   ret = iommu_init_ga_log(iommu);
+#endif /* CONFIG_IRQ_REMAP */
+
+   return ret;
+}
+
 static void iommu_enable_gt(struct amd_iommu *iommu)
 {
if (!iommu_feature(iommu, FEATURE_GT))
@@ -1146,6 +1244,7 @@ static void __init free_iommu_one(struct amd_iommu *iommu)
free_command_buffer(iommu);
free_event_buffer(iommu);
free_ppr_log(iommu);
+   free_ga_log(iommu);
iommu_unmap_mmio_space(iommu);
 }
 
@@ -1438,6 +1537,7 @@ static int iommu_init_pci(struct amd_iommu *iommu)
 {
int cap_ptr = iommu->cap_ptr;
u32 range, misc, low, high;
+   int ret;
 
iommu->dev = pci_get_bus_and_slot(PCI_BUS_NUM(iommu->devid),
  

[PART2 PATCH v7 10/12] svm: Introduces AVIC per-VM ID

2016-08-23 Thread Suravee Suthikulpanit
From: Suravee Suthikulpanit 

Introduces a per-VM AVIC ID and helper functions to manage the IDs.
Currently, the ID will be used to implement the 32-bit AVIC IOMMU GA tag.

The ID is a 24-bit, one-based indexing value, and is managed via helper
functions to get the next ID or to free an ID once a VM is destroyed.
There should be no ID conflict for any active VMs.
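
To illustrate the encoding implemented by the AVIC_GATAG* macros below
(the numbers are made up for the example): a VM with avic_vm_id 0x2 and a
vcpu with id 0x5 yield ga_tag = (0x2 << 8) | 0x5 = 0x205, and
AVIC_GATAG_TO_VMID(0x205) / AVIC_GATAG_TO_VCPUID(0x205) recover 0x2 and
0x5 respectively.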

Reviewed-by: Radim Krčmář 
Signed-off-by: Suravee Suthikulpanit 
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm.c  | 51 +
 2 files changed, 52 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 69e62862..16b4d1d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -776,6 +776,7 @@ struct kvm_arch {
bool disabled_lapic_found;
 
/* Struct members for AVIC */
+   u32 avic_vm_id;
u32 ldr_mode;
struct page *avic_logical_id_table_page;
struct page *avic_physical_id_table_page;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 16ef31b..a718854 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -96,6 +96,19 @@ MODULE_DEVICE_TABLE(x86cpu, svm_cpu_id);
 #define AVIC_UNACCEL_ACCESS_OFFSET_MASK0xFF0
 #define AVIC_UNACCEL_ACCESS_VECTOR_MASK0x
 
+/* AVIC GATAG is encoded using VM and VCPU IDs */
+#define AVIC_VCPU_ID_BITS  8
+#define AVIC_VCPU_ID_MASK  ((1 << AVIC_VCPU_ID_BITS) - 1)
+
+#define AVIC_VM_ID_BITS24
+#define AVIC_VM_ID_NR  (1 << AVIC_VM_ID_BITS)
+#define AVIC_VM_ID_MASK((1 << AVIC_VM_ID_BITS) - 1)
+
+#define AVIC_GATAG(x, y)   (((x & AVIC_VM_ID_MASK) << 
AVIC_VCPU_ID_BITS) | \
+   (y & AVIC_VCPU_ID_MASK))
+#define AVIC_GATAG_TO_VMID(x)  ((x >> AVIC_VCPU_ID_BITS) & 
AVIC_VM_ID_MASK)
+#define AVIC_GATAG_TO_VCPUID(x)(x & AVIC_VCPU_ID_MASK)
+
 static bool erratum_383_found __read_mostly;
 
 static const u32 host_save_user_msrs[] = {
@@ -242,6 +255,10 @@ static int avic;
 module_param(avic, int, S_IRUGO);
 #endif
 
+/* AVIC VM ID bit masks and lock */
+static DECLARE_BITMAP(avic_vm_id_bitmap, AVIC_VM_ID_NR);
+static DEFINE_SPINLOCK(avic_vm_id_lock);
+
 static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
 static void svm_flush_tlb(struct kvm_vcpu *vcpu);
 static void svm_complete_interrupts(struct vcpu_svm *svm);
@@ -1280,10 +1297,40 @@ static int avic_init_backing_page(struct kvm_vcpu *vcpu)
return 0;
 }
 
+static inline int avic_get_next_vm_id(void)
+{
+   int id;
+
+   spin_lock(&avic_vm_id_lock);
+
+   /* AVIC VM ID is one-based. */
+   id = find_next_zero_bit(avic_vm_id_bitmap, AVIC_VM_ID_NR, 1);
+   if (id <= AVIC_VM_ID_MASK)
+   __set_bit(id, avic_vm_id_bitmap);
+   else
+   id = -EAGAIN;
+
+   spin_unlock(&avic_vm_id_lock);
+   return id;
+}
+
+static inline int avic_free_vm_id(int id)
+{
+   if (id <= 0 || id > AVIC_VM_ID_MASK)
+   return -EINVAL;
+
+   spin_lock(&avic_vm_id_lock);
+   __clear_bit(id, avic_vm_id_bitmap);
+   spin_unlock(&avic_vm_id_lock);
+   return 0;
+}
+
 static void avic_vm_destroy(struct kvm *kvm)
 {
struct kvm_arch *vm_data = &kvm->arch;
 
+   avic_free_vm_id(vm_data->avic_vm_id);
+
if (vm_data->avic_logical_id_table_page)
__free_page(vm_data->avic_logical_id_table_page);
if (vm_data->avic_physical_id_table_page)
@@ -1300,6 +1347,10 @@ static int avic_vm_init(struct kvm *kvm)
if (!avic)
return 0;
 
+   vm_data->avic_vm_id = avic_get_next_vm_id();
+   if (vm_data->avic_vm_id < 0)
+   return vm_data->avic_vm_id;
+
/* Allocating physical APIC ID table (4KB) */
p_page = alloc_page(GFP_KERNEL);
if (!p_page)
-- 
1.9.1



[PART2 PATCH v7 07/12] iommu/amd: Introduce amd_iommu_update_ga()

2016-08-23 Thread Suravee Suthikulpanit
From: Suravee Suthikulpanit 

Introduces a new IOMMU API, amd_iommu_update_ga(), which allows
KVM (SVM) to update an existing posted-interrupt IOMMU IRTE when
loading/unloading a vcpu.
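
A minimal sketch of the intended call pattern on the KVM (SVM) side (the
real code is in patch 12/12; ir_data here stands for the struct amd_ir_data
pointer handed back through the irq_set_vcpu_affinity() path):

  /* vcpu scheduled in on the physical core with APIC ID h_apic_id */
  amd_iommu_update_ga(h_apic_id, true, ir_data);

  /* vcpu scheduled out: clear is-running, keep the last destination */
  amd_iommu_update_ga(-1, false, ir_data);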

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd_iommu.c   | 39 +++
 drivers/iommu/amd_iommu_types.h |  1 +
 include/linux/amd-iommu.h   |  9 +
 3 files changed, 49 insertions(+)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 8df3dbf..9f91480 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -4451,4 +4451,43 @@ int amd_iommu_create_irq_domain(struct amd_iommu *iommu)
 
return 0;
 }
+
+int amd_iommu_update_ga(int cpu, bool is_run, void *data)
+{
+   unsigned long flags;
+   struct amd_iommu *iommu;
+   struct irq_remap_table *irt;
+   struct amd_ir_data *ir_data = (struct amd_ir_data *)data;
+   int devid = ir_data->irq_2_irte.devid;
+   struct irte_ga *entry = (struct irte_ga *) ir_data->entry;
+   struct irte_ga *ref = (struct irte_ga *) ir_data->ref;
+
+   if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir) ||
+   !ref || !entry || !entry->lo.fields_vapic.guest_mode)
+   return 0;
+
+   iommu = amd_iommu_rlookup_table[devid];
+   if (!iommu)
+   return -ENODEV;
+
+   irt = get_irq_table(devid, false);
+   if (!irt)
+   return -ENODEV;
+
+   spin_lock_irqsave(&irt->lock, flags);
+
+   if (ref->lo.fields_vapic.guest_mode) {
+   if (cpu >= 0)
+   ref->lo.fields_vapic.destination = cpu;
+   ref->lo.fields_vapic.is_run = is_run;
+   barrier();
+   }
+
+   spin_unlock_irqrestore(&irt->lock, flags);
+
+   iommu_flush_irt(iommu, devid);
+   iommu_completion_wait(iommu);
+   return 0;
+}
+EXPORT_SYMBOL(amd_iommu_update_ga);
 #endif
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index a3b6e22..6973952 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -811,6 +811,7 @@ struct amd_ir_data {
struct irq_2_irte irq_2_irte;
struct msi_msg msi_entry;
void *entry;/* Pointer to union irte or struct irte_ga */
+   void *ref;  /* Pointer to the actual irte */
 };
 
 struct amd_irte_ops {
diff --git a/include/linux/amd-iommu.h b/include/linux/amd-iommu.h
index 465d096..d8d48ac 100644
--- a/include/linux/amd-iommu.h
+++ b/include/linux/amd-iommu.h
@@ -179,6 +179,9 @@ static inline int amd_iommu_detect(void) { return -ENODEV; }
 /* IOMMU AVIC Function */
 extern int amd_iommu_register_ga_log_notifier(int (*notifier)(u32));
 
+extern int
+amd_iommu_update_ga(int cpu, bool is_run, void *data);
+
 #else /* defined(CONFIG_AMD_IOMMU) && defined(CONFIG_IRQ_REMAP) */
 
 static inline int
@@ -187,6 +190,12 @@ amd_iommu_register_ga_log_notifier(int (*notifier)(u32))
return 0;
 }
 
+static inline int
+amd_iommu_update_ga(int cpu, bool is_run, void *data)
+{
+   return 0;
+}
+
 #endif /* defined(CONFIG_AMD_IOMMU) && defined(CONFIG_IRQ_REMAP) */
 
 #endif /* _ASM_X86_AMD_IOMMU_H */
-- 
1.9.1



[PART2 PATCH v7 11/12] svm: Introduce AMD IOMMU avic_ga_log_notifier

2016-08-23 Thread Suravee Suthikulpanit
From: Suravee Suthikulpanit 

This patch introduces avic_ga_log_notifier, which will be called
by the IOMMU driver whenever it handles a Guest vAPIC (GA) log entry.

Reviewed-by: Radim Krčmář 
Signed-off-by: Suravee Suthikulpanit 
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm.c  | 70 +++--
 2 files changed, 69 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 16b4d1d..a9466ad 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -780,6 +780,7 @@ struct kvm_arch {
u32 ldr_mode;
struct page *avic_logical_id_table_page;
struct page *avic_physical_id_table_page;
+   struct hlist_node hnode;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index a718854..8f87a0a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -34,6 +34,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -945,6 +947,55 @@ static void svm_disable_lbrv(struct vcpu_svm *svm)
set_msr_interception(msrpm, MSR_IA32_LASTINTTOIP, 0, 0);
 }
 
+/* Note:
+ * This hash table is used to map VM_ID to a struct kvm_arch,
+ * when handling AMD IOMMU GALOG notification to schedule in
+ * a particular vCPU.
+ */
+#define SVM_VM_DATA_HASH_BITS  8
+DECLARE_HASHTABLE(svm_vm_data_hash, SVM_VM_DATA_HASH_BITS);
+static spinlock_t svm_vm_data_hash_lock;
+
+/* Note:
+ * This function is called from IOMMU driver to notify
+ * SVM to schedule in a particular vCPU of a particular VM.
+ */
+static int avic_ga_log_notifier(u32 ga_tag)
+{
+   unsigned long flags;
+   struct kvm_arch *ka = NULL;
+   struct kvm_vcpu *vcpu = NULL;
+   u32 vm_id = AVIC_GATAG_TO_VMID(ga_tag);
+   u32 vcpu_id = AVIC_GATAG_TO_VCPUID(ga_tag);
+
+   pr_debug("SVM: %s: vm_id=%#x, vcpu_id=%#x\n", __func__, vm_id, vcpu_id);
+
+   spin_lock_irqsave(&svm_vm_data_hash_lock, flags);
+   hash_for_each_possible(svm_vm_data_hash, ka, hnode, vm_id) {
+   struct kvm *kvm = container_of(ka, struct kvm, arch);
+   struct kvm_arch *vm_data = &kvm->arch;
+
+   if (vm_data->avic_vm_id != vm_id)
+   continue;
+   vcpu = kvm_get_vcpu_by_id(kvm, vcpu_id);
+   break;
+   }
+   spin_unlock_irqrestore(&svm_vm_data_hash_lock, flags);
+
+   if (!vcpu)
+   return 0;
+
+   /* Note:
+* At this point, the IOMMU should have already set the pending
+* bit in the vAPIC backing page. So, we just need to schedule
+* in the vcpu.
+*/
+   if (vcpu->mode == OUTSIDE_GUEST_MODE)
+   kvm_vcpu_wake_up(vcpu);
+
+   return 0;
+}
+
 static __init int svm_hardware_setup(void)
 {
int cpu;
@@ -1003,10 +1054,15 @@ static __init int svm_hardware_setup(void)
if (avic) {
if (!npt_enabled ||
!boot_cpu_has(X86_FEATURE_AVIC) ||
-   !IS_ENABLED(CONFIG_X86_LOCAL_APIC))
+   !IS_ENABLED(CONFIG_X86_LOCAL_APIC)) {
avic = false;
-   else
+   } else {
pr_info("AVIC enabled\n");
+
+   hash_init(svm_vm_data_hash);
+   spin_lock_init(&svm_vm_data_hash_lock);
+   
amd_iommu_register_ga_log_notifier(&avic_ga_log_notifier);
+   }
}
 
return 0;
@@ -1327,6 +1383,7 @@ static inline int avic_free_vm_id(int id)
 
 static void avic_vm_destroy(struct kvm *kvm)
 {
+   unsigned long flags;
struct kvm_arch *vm_data = &kvm->arch;
 
avic_free_vm_id(vm_data->avic_vm_id);
@@ -1335,10 +1392,15 @@ static void avic_vm_destroy(struct kvm *kvm)
__free_page(vm_data->avic_logical_id_table_page);
if (vm_data->avic_physical_id_table_page)
__free_page(vm_data->avic_physical_id_table_page);
+
+   spin_lock_irqsave(&svm_vm_data_hash_lock, flags);
+   hash_del(&vm_data->hnode);
+   spin_unlock_irqrestore(&svm_vm_data_hash_lock, flags);
 }
 
 static int avic_vm_init(struct kvm *kvm)
 {
+   unsigned long flags;
int err = -ENOMEM;
struct kvm_arch *vm_data = &kvm->arch;
struct page *p_page;
@@ -1367,6 +1429,10 @@ static int avic_vm_init(struct kvm *kvm)
vm_data->avic_logical_id_table_page = l_page;
clear_page(page_address(l_page));
 
+   spin_lock_irqsave(&svm_vm_data_hash_lock, flags);
+   hash_add(svm_vm_data_hash, &vm_data->hnode, vm_data->avic_vm_id);
+   spin_unlock_irqrestore(&svm_vm_data_hash_lock, flags);
+
return 0;
 
 free_avic:
-- 
1.9.1



[PART2 PATCH v7 12/12] svm: Implements update_pi_irte hook to setup posted interrupt

2016-08-23 Thread Suravee Suthikulpanit
From: Suravee Suthikulpanit 

This patch implements the update_pi_irte function hook to allow SVM
to communicate to the IOMMU driver how to set up the IRTE for handling
posted interrupts.

When AVIC is enabled, SVM needs to update the IOMMU IRTE with the
appropriate host physical APIC ID during vcpu_load/unload. Also, when a
vcpu is blocking/unblocking, SVM needs to update the is-running bit in
the IOMMU IRTE. Both are achieved by calling amd_iommu_update_ga().

However, if GA mode is not enabled for the pass-through device,
the IOMMU driver simply returns when amd_iommu_update_ga() is called.

Signed-off-by: Suravee Suthikulpanit 
---
 arch/x86/kvm/svm.c | 285 +
 1 file changed, 266 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 8f87a0a..c921024 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -43,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include "trace.h"
@@ -200,6 +201,23 @@ struct vcpu_svm {
struct page *avic_backing_page;
u64 *avic_physical_id_cache;
bool avic_is_running;
+
+   /*
+* Per-vcpu list of struct amd_svm_iommu_ir:
+* This is used mainly to store interrupt remapping information used
+* when update the vcpu affinity. This avoids the need to scan for
+* IRTE and try to match ga_tag in the IOMMU driver.
+*/
+   struct list_head ir_list;
+   spinlock_t ir_list_lock;
+};
+
+/*
+ * This is a wrapper of struct amd_iommu_ir_data.
+ */
+struct amd_svm_iommu_ir {
+   struct list_head node;  /* Used by SVM for per-vcpu ir_list */
+   void *data; /* Storing pointer to struct amd_ir_data */
 };
 
 #define AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK   (0xFF)
@@ -1440,31 +1458,34 @@ free_avic:
return err;
 }
 
-/**
- * This function is called during VCPU halt/unhalt.
- */
-static void avic_set_running(struct kvm_vcpu *vcpu, bool is_run)
+static inline int
+avic_update_iommu_vcpu_affinity(struct kvm_vcpu *vcpu, int cpu, bool r)
 {
-   u64 entry;
-   int h_physical_id = kvm_cpu_get_apicid(vcpu->cpu);
+   int ret = 0;
+   unsigned long flags;
+   struct amd_svm_iommu_ir *ir;
struct vcpu_svm *svm = to_svm(vcpu);
 
-   if (!kvm_vcpu_apicv_active(vcpu))
-   return;
-
-   svm->avic_is_running = is_run;
+   if (!kvm_arch_has_assigned_device(vcpu->kvm))
+   return 0;
 
-   /* ID = 0xff (broadcast), ID > 0xff (reserved) */
-   if (WARN_ON(h_physical_id >= AVIC_MAX_PHYSICAL_ID_COUNT))
-   return;
+   /*
+* Here, we go through the per-vcpu ir_list to update all existing
+* interrupt remapping table entry targeting this vcpu.
+*/
+   spin_lock_irqsave(&svm->ir_list_lock, flags);
 
-   entry = READ_ONCE(*(svm->avic_physical_id_cache));
-   WARN_ON(is_run == !!(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK));
+   if (list_empty(&svm->ir_list))
+   goto out;
 
-   entry &= ~AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
-   if (is_run)
-   entry |= AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
-   WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
+   list_for_each_entry(ir, &svm->ir_list, node) {
+   ret = amd_iommu_update_ga(cpu, r, ir->data);
+   if (ret)
+   break;
+   }
+out:
+   spin_unlock_irqrestore(&svm->ir_list_lock, flags);
+   return ret;
 }
 
 static void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
@@ -1491,6 +1512,8 @@ static void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
entry |= AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
 
WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
+   avic_update_iommu_vcpu_affinity(vcpu, h_physical_id,
+   svm->avic_is_running);
 }
 
 static void avic_vcpu_put(struct kvm_vcpu *vcpu)
@@ -1502,10 +1525,27 @@ static void avic_vcpu_put(struct kvm_vcpu *vcpu)
return;
 
entry = READ_ONCE(*(svm->avic_physical_id_cache));
+   if (entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK)
+   avic_update_iommu_vcpu_affinity(vcpu, -1, 0);
+
entry &= ~AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
 }
 
+/**
+ * This function is called during VCPU halt/unhalt.
+ */
+static void avic_set_running(struct kvm_vcpu *vcpu, bool is_run)
+{
+   struct vcpu_svm *svm = to_svm(vcpu);
+
+   svm->avic_is_running = is_run;
+   if (is_run)
+   avic_vcpu_load(vcpu, vcpu->cpu);
+   else
+   avic_vcpu_put(vcpu);
+}
+
 static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 {
struct vcpu_svm *svm = to_svm(vcpu);
@@ -1567,6 +1607,9 @@ static struct kvm_vcpu *svm_create

Re: [PATCH 03/16] irqdomain: Allow irq domain lookup by fwnode

2015-10-12 Thread Suravee Suthikulpanit

[RESEND] Not sure if the email went out the first time.

Hi Marc,

On 10/6/15 12:36, Marc Zyngier wrote:

So far, our irq domains are still looked up by device node.
Let's change this and allow a domain to be looked up using
a fwnode_handle pointer.

The existing interfaces are preserved with a couple of helpers.

Signed-off-by: Marc Zyngier 
---
  include/linux/irqdomain.h | 11 +--
  kernel/irq/irqdomain.c| 14 ++
  2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index 2f508f4..607c185 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -183,10 +183,17 @@ struct irq_domain *irq_domain_add_legacy(struct 
device_node *of_node,
 irq_hw_number_t first_hwirq,
 const struct irq_domain_ops *ops,
 void *host_data);
-extern struct irq_domain *irq_find_matching_host(struct device_node *node,
-enum irq_domain_bus_token 
bus_token);
+extern struct irq_domain *irq_find_matching_fwnode(struct fwnode_handle 
*fwnode,
+  enum irq_domain_bus_token 
bus_token);
  extern void irq_set_default_host(struct irq_domain *host);

+static inline struct irq_domain *irq_find_matching_host(struct device_node 
*node,
+   enum 
irq_domain_bus_token bus_token)
+{
+   return irq_find_matching_fwnode(node ? &node->fwnode : NULL,
+   bus_token);
+}
+
  static inline struct irq_domain *irq_find_host(struct device_node *node)
  {
return irq_find_matching_host(node, DOMAIN_BUS_ANY);
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 1aee5c1..10b6105 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -191,12 +191,12 @@ struct irq_domain *irq_domain_add_legacy(struct 
device_node *of_node,
  EXPORT_SYMBOL_GPL(irq_domain_add_legacy);

  /**
- * irq_find_matching_host() - Locates a domain for a given device node
- * @node: device-tree node of the interrupt controller
+ * irq_find_matching_fwnode() - Locates a domain for a given fwnode
+ * @fwnode: FW descriptor of the interrupt controller
   * @bus_token: domain-specific data
   */
-struct irq_domain *irq_find_matching_host(struct device_node *node,
- enum irq_domain_bus_token bus_token)
+struct irq_domain *irq_find_matching_fwnode(struct fwnode_handle *fwnode,
+   enum irq_domain_bus_token bus_token)
  {
struct irq_domain *h, *found = NULL;
int rc;
@@ -212,12 +212,10 @@ struct irq_domain *irq_find_matching_host(struct 
device_node *node,
 */
mutex_lock(&irq_domain_mutex);
list_for_each_entry(h, &irq_domain_list, link) {
-   struct device_node *of_node;
-   of_node = irq_domain_get_of_node(h);
if (h->ops->match)
-   rc = h->ops->match(h, node, bus_token);
+   rc = h->ops->match(h, to_of_node(fwnode), bus_token);
else
-   rc = ((of_node != NULL) && (of_node == node) &&
+   rc = ((fwnode != NULL) && (h->fwnode == fwnode) &&
  ((bus_token == DOMAIN_BUS_ANY) ||
   (h->bus_token == bus_token)));




In kernel/irq/irqdomain.c, shouldn't you also change the export
symbol from:


EXPORT_SYMBOL_GPL(irq_find_matching_host);

to:

EXPORT_SYMBOL_GPL(irq_find_matching_fwnode);

at the end of this function as well?

Thanks,
Suravee




Re: [PATCH V3 0/4] PCI: ACPI: Setting up DMA coherency for PCI device from _CCA attribute

2015-10-12 Thread Suravee Suthikulpanit

Hi Rafael,

On 9/9/15 21:48, Suthikulpanit, Suravee wrote:

Hi Rafael,

On 9/10/2015 3:38 AM, Rafael J. Wysocki wrote:

On Wednesday, September 09, 2015 07:16:49 PM Suthikulpanit, Suravee
wrote:

>Hi All,
>
>Are there any other concerns about this patch series?

I have none, but then it sort of missed the merge window.

I can easily queue it up for the next one unless it is super-urgent,
but in that case I need to know why that's the case.



This is not urgent. It is mainly needed to enable ACPI PCI support
for ARM64, which is still a work in progress.

Thanks,
Suravee


Just wondering if you are planning to queue this series up for 4.4 as well?

Thank you,
Suravee


Re: [PATCH V3 0/4] PCI: ACPI: Setting up DMA coherency for PCI device from _CCA attribute

2015-10-12 Thread Suravee Suthikulpanit

Hi

On 10/12/15 15:27, Rafael J. Wysocki wrote:

Just wondering if you are planning to queue this series up for 4.4 as well?

You don't seem to have addressed the Bjorn's comments on patch [2/4].

They need to be addressed before I can take this series.

Thanks,
Rafael



Ah, I missed that one. Sorry. I'll get back on that.

Suravee


Re: [PATCH V3 2/4] ACPI/scan: Clean up acpi_check_dma

2015-10-13 Thread Suravee Suthikulpanit

Hi Bjorn,

Thanks for your feedback, and sorry for the late response. Somehow I didn't
see this earlier. Please see my comments below.


On 09/14/2015 09:34 AM, Bjorn Helgaas wrote:

[..]
So, in order to simplify the function, this patch renames acpi_check_dma()
to acpi_check_dma_coherency() to clearly indicate the purpose of this
function, and only returns an integer where -1 means DMA not supported,
1 means coherent DMA, and 0 means non-coherent DMA.


I think acpi_check_dma_coherency() is better, but only slightly.  It
still doesn't give a hint about the *sense* of the return value.  I
think it'd be easier to read if there were two functions, e.g.,


I have gone back and forth between the current version and the
two-function approach in the past. I can definitely go this route
if you would prefer. However, if acpi_dma_is_coherent() == 0, it would
be ambiguous whether DMA is not supported or non-coherent DMA is
supported, and we would need to call acpi_dma_is_supported() to find
out. Is that okay with you?
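
For clarity, a rough sketch of the two-helper split being discussed (the
names and exact semantics are only illustrative, not what the current
patch implements):

  /* True if the ACPI companion describes DMA at all (_CCA present or
   * a platform default applies). */
  bool acpi_dma_is_supported(struct acpi_device *adev);

  /* True if DMA is cache coherent; only meaningful when
   * acpi_dma_is_supported() returned true. */
  bool acpi_dma_is_coherent(struct acpi_device *adev);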



[...]
+
+   /**
+* Currently, we only support _CCA=1 (i.e. coherent_dma=1)
+* This should be equivalent to specifying dma-coherent for
+* a device in OF.
+*
+* For the case when _CCA=0 (i.e. coherent_dma=0 && cca_seen=1),
+* we have two choices:
+*   1. Do not support and disable DMA.


I know you didn't write this comment, but do we actually *disable* DMA in
the sense of turning off PCI bus mastering or calling an ACPI method that
disables DMA by this device?  I suspect we just don't set up DMA ops and
masks for this device.


Actually, I wrote this comment. When we disable DMA, we basically set
dma-mask=0 and do not set up DMA ops, as you mentioned. We don't actually
touch the hardware.


Thanks,
Suravee


[PATCH 1/4] pci: msi: Add support to query MSI domain for pci device

2015-10-13 Thread Suravee Suthikulpanit
This patch introduces an interface for an irqchip driver to register a
callback that provides a way to determine the appropriate MSI domain for
a PCI device.
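
A minimal sketch of how an irqchip driver is expected to use this hook
(patch 4/4 does exactly this for GICv2m; the names below are illustrative):

  #include <linux/msi.h>

  static struct fwnode_handle *my_msi_fwnode; /* set when the MSI frame is probed */

  static struct fwnode_handle *my_get_msi_fwnode(struct device *dev)
  {
          /* Return the fwnode of the MSI controller serving @dev. */
          return my_msi_fwnode;
  }

  static int __init my_irqchip_init(void)
  {
          pci_msi_register_fwnode_provider(&my_get_msi_fwnode);
          return 0;
  }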

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/pci/msi.c   | 30 ++
 include/linux/msi.h |  7 +++
 2 files changed, 37 insertions(+)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index ddd59fe..2c87843 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1327,4 +1327,34 @@ struct irq_domain 
*pci_msi_create_default_irq_domain(struct fwnode_handle *fwnod
 
return domain;
 }
+
+static struct fwnode_handle *(*pci_msi_get_fwnode_cb)(struct device *dev);
+
+/**
+ * pci_msi_register_fwnode_provider - Register callback to retrieve fwnode
+ * @fn:Callback used to retrieve the fwnode of the MSI controller for a device
+ *
+ * This should be called by the irqchip driver, which is the parent of
+ * the MSI domain, to provide a callback interface to query the fwnode.
+ */
+void
+pci_msi_register_fwnode_provider(struct fwnode_handle *(*fn)(struct device *))
+{
+   pci_msi_get_fwnode_cb = fn;
+}
+
+/**
+ * pci_msi_get_fwnode - Query fwnode for MSI controller of the @dev
+ * @dev:   The device that we try to query MSI domain token for
+ *
+ * This is used to query MSI domain token when setting up MSI domain
+ * for a device. Returns fwnode_handle * if token found / NULL if not found
+ */
+struct fwnode_handle *pci_msi_get_fwnode(struct device *dev)
+{
+   if (pci_msi_get_fwnode_cb)
+   return pci_msi_get_fwnode_cb(dev);
+
+   return NULL;
+}
 #endif /* CONFIG_PCI_MSI_IRQ_DOMAIN */
diff --git a/include/linux/msi.h b/include/linux/msi.h
index 32a24b9..ceaebf6 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -3,6 +3,7 @@
 
 #include 
 #include 
+#include 
 
 struct msi_msg {
u32 address_lo; /* low 32 bits of msi message address */
@@ -294,6 +295,12 @@ irq_hw_number_t pci_msi_domain_calc_hwirq(struct pci_dev 
*dev,
  struct msi_desc *desc);
 int pci_msi_domain_check_cap(struct irq_domain *domain,
 struct msi_domain_info *info, struct device *dev);
+
+void
+pci_msi_register_fwnode_provider(struct fwnode_handle *(*fn)(struct device *));
+
+struct fwnode_handle *pci_msi_get_fwnode(struct device *dev);
+
 #endif /* CONFIG_PCI_MSI_IRQ_DOMAIN */
 
 #endif /* LINUX_MSI_H */
-- 
2.1.0



[PATCH 4/4] gicv2m: acpi: Introducing GICv2m ACPI support

2015-10-13 Thread Suravee Suthikulpanit
This patch introduces gicv2m_acpi_init(), which uses information
in the MADT GIC MSI frame structures to initialize the GICv2m driver.

Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Hanjun Guo 
---
 drivers/irqchip/irq-gic-v2m.c   | 106 
 drivers/irqchip/irq-gic.c   |   3 ++
 include/linux/irqchip/arm-gic.h |   6 +++
 3 files changed, 115 insertions(+)

diff --git a/drivers/irqchip/irq-gic-v2m.c b/drivers/irqchip/irq-gic-v2m.c
index 97d1bf4..b52560b 100644
--- a/drivers/irqchip/irq-gic-v2m.c
+++ b/drivers/irqchip/irq-gic-v2m.c
@@ -15,7 +15,10 @@
 
 #define pr_fmt(fmt) "GICv2m: " fmt
 
+#include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -51,6 +54,7 @@
 #define GICV2M_NEEDS_SPI_OFFSET0x0001
 
 struct v2m_data {
+   struct list_head list;
spinlock_t msi_cnt_lock;
struct resource res;/* GICv2m resource */
void __iomem *base; /* GICv2m virt address */
@@ -58,8 +62,11 @@ struct v2m_data {
u32 nr_spis;/* The number of SPIs for MSIs */
unsigned long *bm;  /* MSI vector bitmap */
u32 flags;  /* v2m flags for specific implementation */
+   struct fwnode_handle *fwnode;
 };
 
+static LIST_HEAD(v2m_data_list);
+
 static void gicv2m_mask_msi_irq(struct irq_data *d)
 {
pci_msi_mask_irq(d);
@@ -134,6 +141,12 @@ static int gicv2m_irq_gic_domain_alloc(struct irq_domain 
*domain,
fwspec.param[0] = 0;
fwspec.param[1] = hwirq - 32;
fwspec.param[2] = IRQ_TYPE_EDGE_RISING;
+   } else if (domain->parent->fwnode->type == FWNODE_IRQCHIP) {
+   /* Note: This is mainly for GICv2m ACPI. */
+   fwspec.fwnode = domain->parent->fwnode;
+   fwspec.param_count = 2;
+   fwspec.param[0] = hwirq;
+   fwspec.param[1] = IRQ_TYPE_EDGE_RISING & IRQ_TYPE_SENSE_MASK;
} else {
return -EINVAL;
}
@@ -317,6 +330,8 @@ static int __init gicv2m_init_one(struct irq_domain *parent,
}
 
spin_lock_init(&v2m->msi_cnt_lock);
+   v2m->fwnode = fwnode;
+   list_add(&v2m->list, &v2m_data_list);
 
pr_info("range[%#lx:%#lx], SPI[%d:%d]\n",
(unsigned long)res->start, (unsigned long)res->end,
@@ -379,3 +394,94 @@ int __init gicv2m_of_init(struct device_node *node, struct 
irq_domain *parent)
 
return ret;
 }
+
+#ifdef CONFIG_ACPI
+static int acpi_num_msi;
+
+/**
+ * Note:
+ * This is used as a temporary variable since we cannot
+ * pass args into acpi_parse_madt_msi() when calling
+ * acpi_parse_entries().
+ */
+struct irq_domain *acpi_parent_domain;
+
+static int __init
+acpi_parse_madt_msi(struct acpi_subtable_header *header,
+   const unsigned long end)
+{
+   int ret;
+   struct resource res;
+   u32 spi_start = 0, nr_spis = 0;
+   struct acpi_madt_generic_msi_frame *m;
+   struct fwnode_handle *domain_handle = NULL;
+
+   m = (struct acpi_madt_generic_msi_frame *)header;
+   if (BAD_MADT_ENTRY(m, end))
+   return -EINVAL;
+
+   res.start = m->base_address;
+   res.end = m->base_address + 0x1000;
+
+   if (m->flags & ACPI_MADT_OVERRIDE_SPI_VALUES) {
+   spi_start = m->spi_base;
+   nr_spis = m->spi_count;
+
+   pr_info("ACPI overriding V2M MSI_TYPER (base:%u, num:%u)\n",
+   spi_start, nr_spis);
+   }
+
+   domain_handle = irq_domain_alloc_fwnode((void *)m->base_address);
+   if (!domain_handle) {
+   pr_err("Unable to allocate GICv2m domain token\n");
+   return -EINVAL;
+   }
+
+   if (gicv2m_init_one(acpi_parent_domain, spi_start, nr_spis, &res,
+   domain_handle)) {
+   ret = -EINVAL;
+   goto err_out;
+   }
+
+   return 0;
+err_out:
+   if (domain_handle)
+   irq_domain_free_fwnode(domain_handle);
+   return ret;
+}
+
+static struct fwnode_handle *gicv2m_get_fwnode(struct device *dev)
+{
+   struct v2m_data *data;
+
+   if (!acpi_num_msi)
+   return NULL;
+
+   /* We only support one MSI frame at the moment. */
+   data = list_first_entry_or_null(&v2m_data_list,
+   struct v2m_data, list);
+   if (!data)
+   return NULL;
+
+   return data->fwnode;
+}
+
+int __init gicv2m_acpi_init(struct irq_domain *parent)
+{
+   if (acpi_num_msi > 0)
+   return 0;
+
+   acpi_parent_domain = parent;
+
+   acpi_num_msi = acpi_table_parse_madt(ACPI_MADT_TYPE_GENERIC_MSI_FRAME,
+ acpi_parse_madt_msi, 0);
+
+   if (acpi_num_msi)
+   pci_msi_register_fwnode_provider(&gicv

[PATCH 3/4] gicv2m: Refactor to prepare for ACPI support

2015-10-13 Thread Suravee Suthikulpanit
This patch refactors gicv2m_init_one() to prepare for ACPI support.
It also replaces irq_domain_add_tree() with irq_domain_create_tree(),
since we will need to pass a struct fwnode_handle instead of a
struct device_node when adding ACPI support later.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/irqchip/irq-gic-v2m.c | 51 ++-
 1 file changed, 31 insertions(+), 20 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v2m.c b/drivers/irqchip/irq-gic-v2m.c
index bf9b3c0..97d1bf4 100644
--- a/drivers/irqchip/irq-gic-v2m.c
+++ b/drivers/irqchip/irq-gic-v2m.c
@@ -239,8 +239,10 @@ static struct msi_domain_info gicv2m_pmsi_domain_info = {
.chip   = &gicv2m_pmsi_irq_chip,
 };
 
-static int __init gicv2m_init_one(struct device_node *node,
- struct irq_domain *parent)
+static int __init gicv2m_init_one(struct irq_domain *parent,
+ u32 spi_start, u32 nr_spis,
+ struct resource *res,
+ struct fwnode_handle *fwnode)
 {
int ret;
struct v2m_data *v2m;
@@ -252,23 +254,17 @@ static int __init gicv2m_init_one(struct device_node 
*node,
return -ENOMEM;
}
 
-   ret = of_address_to_resource(node, 0, &v2m->res);
-   if (ret) {
-   pr_err("Failed to allocate v2m resource.\n");
-   goto err_free_v2m;
-   }
-
-   v2m->base = ioremap(v2m->res.start, resource_size(&v2m->res));
+   v2m->base = ioremap(res->start, resource_size(res));
if (!v2m->base) {
pr_err("Failed to map GICv2m resource\n");
ret = -ENOMEM;
goto err_free_v2m;
}
+   memcpy(&v2m->res, res, sizeof(struct resource));
 
-   if (!of_property_read_u32(node, "arm,msi-base-spi", &v2m->spi_start) &&
-   !of_property_read_u32(node, "arm,msi-num-spis", &v2m->nr_spis)) {
-   pr_info("Overriding V2M MSI_TYPER (base:%u, num:%u)\n",
-   v2m->spi_start, v2m->nr_spis);
+   if (spi_start && nr_spis) {
+   v2m->spi_start = spi_start;
+   v2m->nr_spis = nr_spis;
} else {
u32 typer = readl_relaxed(v2m->base + V2M_MSI_TYPER);
 
@@ -299,7 +295,7 @@ static int __init gicv2m_init_one(struct device_node *node,
goto err_iounmap;
}
 
-   inner_domain = irq_domain_add_tree(node, &gicv2m_domain_ops, v2m);
+   inner_domain = irq_domain_create_tree(fwnode, &gicv2m_domain_ops, v2m);
if (!inner_domain) {
pr_err("Failed to create GICv2m domain\n");
ret = -ENOMEM;
@@ -308,10 +304,10 @@ static int __init gicv2m_init_one(struct device_node 
*node,
 
inner_domain->bus_token = DOMAIN_BUS_NEXUS;
inner_domain->parent = parent;
-   pci_domain = pci_msi_create_irq_domain(of_node_to_fwnode(node),
+   pci_domain = pci_msi_create_irq_domain(fwnode,
   &gicv2m_msi_domain_info,
   inner_domain);
-   plat_domain = platform_msi_create_irq_domain(of_node_to_fwnode(node),
+   plat_domain = platform_msi_create_irq_domain(fwnode,
 &gicv2m_pmsi_domain_info,
 inner_domain);
if (!pci_domain || !plat_domain) {
@@ -322,10 +318,9 @@ static int __init gicv2m_init_one(struct device_node *node,
 
spin_lock_init(&v2m->msi_cnt_lock);
 
-   pr_info("Node %s: range[%#lx:%#lx], SPI[%d:%d]\n", node->name,
-   (unsigned long)v2m->res.start, (unsigned long)v2m->res.end,
+   pr_info("range[%#lx:%#lx], SPI[%d:%d]\n",
+   (unsigned long)res->start, (unsigned long)res->end,
v2m->spi_start, (v2m->spi_start + v2m->nr_spis));
-
return 0;
 
 err_free_domains:
@@ -356,10 +351,26 @@ int __init gicv2m_of_init(struct device_node *node, 
struct irq_domain *parent)
 
for (child = of_find_matching_node(node, gicv2m_device_id); child;
 child = of_find_matching_node(child, gicv2m_device_id)) {
+   u32 spi_start = 0, nr_spis = 0;
+   struct resource res;
+
if (!of_find_property(child, "msi-controller", NULL))
continue;
 
-   ret = gicv2m_init_one(child, parent);
+   ret = of_address_to_resource(child, 0, &res);
+   if (ret) {
+   pr_err("Failed to allocate v2m resource.\n");
+   break;
+   }
+
+   if (!of_property_read_u32

[PATCH 2/4] acpi: pci: Setup MSI domain for ACPI based pci devices

2015-10-13 Thread Suravee Suthikulpanit
This patch introduces pci_host_bridge_acpi_msi_domain(), which returns
the MSI domain of the specified PCI host bridge with the DOMAIN_BUS_PCI_MSI
bus token. It is then assigned to the PCI device.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/pci/pci-acpi.c | 13 +
 drivers/pci/probe.c|  2 ++
 include/linux/pci.h|  7 +++
 3 files changed, 22 insertions(+)

diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index a32ba75..0e21ef4 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -9,7 +9,9 @@
 
 #include 
 #include 
+#include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -689,6 +691,17 @@ static struct acpi_bus_type acpi_pci_bus = {
.cleanup = pci_acpi_cleanup,
 };
 
+struct irq_domain *pci_host_bridge_acpi_msi_domain(struct pci_bus *bus)
+{
+   struct irq_domain *dom = NULL;
+   struct fwnode_handle *fwnode = pci_msi_get_fwnode(&bus->dev);
+
+   if (fwnode)
+   dom = irq_find_matching_fwnode(fwnode,
+  DOMAIN_BUS_PCI_MSI);
+   return dom;
+}
+
 static int __init acpi_pci_init(void)
 {
int ret;
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 0dbc7fb..bea1840 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -671,6 +671,8 @@ static struct irq_domain *pci_host_bridge_msi_domain(struct 
pci_bus *bus)
 * should be called from here.
 */
d = pci_host_bridge_of_msi_domain(bus);
+   if (!d)
+   d = pci_host_bridge_acpi_msi_domain(bus);
 
return d;
 }
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e90eb22..4a7f6a9 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1925,6 +1925,13 @@ static inline struct irq_domain *
 pci_host_bridge_of_msi_domain(struct pci_bus *bus) { return NULL; }
 #endif  /* CONFIG_OF */
 
+#ifdef CONFIG_ACPI
+struct irq_domain *pci_host_bridge_acpi_msi_domain(struct pci_bus *bus);
+#else
+static inline struct irq_domain *
+pci_host_bridge_acpi_msi_domain(struct pci_bus *bus) { return NULL; }
+#endif
+
 #ifdef CONFIG_EEH
 static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev)
 {
-- 
2.1.0



[PATCH 0/4] gicv2m: acpi: Add ACPI support for GICv2m MSI

2015-10-13 Thread Suravee Suthikulpanit
This patch series has been forked from the following patch series since
it no longer depends on the rest of the patches.

  [PATCH v4 00/10] ACPI GIC Self-probing, GICv2m and GICv3 support
  https://lkml.org/lkml/2015/7/29/234

It has been ported to use the newly introduced fwnode_handle for
ACPI irqdomains, added by Marc in the following patch series:

  [PATCH v2 00/17] Divorcing irqdomain and device_node
  http://git.kernel.org/cgit/linux/kernel/git/maz/arm-platforms.git 
irq/irq-domain-fwnode-v2

The following git branch contains the submitted patches along with
the prerequisite patches (mainly for ARM64 PCI support for ACPI).

  https://github.com/ssuthiku/linux.git irq-domain-fwnode-v2-v2m

This has been tested on an AMD Seattle (Overdrive) RevB system.

Suravee Suthikulpanit (4):
  pci: msi: Add support to query MSI domain for pci device
  acpi: pci: Setup MSI domain for ACPI based pci devices
  gicv2m: Refactor to prepare for ACPI support
  gicv2m: acpi: Introducing GICv2m ACPI support

 drivers/irqchip/irq-gic-v2m.c   | 157 +++-
 drivers/irqchip/irq-gic.c   |   3 +
 drivers/pci/msi.c   |  30 
 drivers/pci/pci-acpi.c  |  13 
 drivers/pci/probe.c |   2 +
 include/linux/irqchip/arm-gic.h |   6 ++
 include/linux/msi.h |   7 ++
 include/linux/pci.h |   7 ++
 8 files changed, 205 insertions(+), 20 deletions(-)

-- 
2.1.0



[PATCH V2 4/6] irqdomain: Introduce irq_domain_get_irqchip_fwnode_name helper function

2015-10-14 Thread Suravee Suthikulpanit
This patch adds an accessor function to retrieve struct irqchip_fwid.name.
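
A minimal usage sketch, assuming a fwnode that was previously allocated
with irq_domain_alloc_fwnode() (as the GICv2m ACPI path does); for any
other kind of fwnode the helper returns NULL:

  const char *name = irq_domain_get_irqchip_fwnode_name(fwnode);

  if (name)
          pr_info("MSI frame %s\n", name);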

Signed-off-by: Suravee Suthikulpanit 
---
 include/linux/irqdomain.h |  1 +
 kernel/irq/irqdomain.c| 18 ++
 2 files changed, 19 insertions(+)

diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index 4950a71..006633d 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -187,6 +187,7 @@ static inline struct device_node 
*irq_domain_get_of_node(struct irq_domain *d)
 #ifdef CONFIG_IRQ_DOMAIN
 struct fwnode_handle *irq_domain_alloc_fwnode(void *data);
 void irq_domain_free_fwnode(struct fwnode_handle *fwnode);
+const char *irq_domain_get_irqchip_fwnode_name(struct fwnode_handle *fwnode);
 struct irq_domain *__irq_domain_add(struct fwnode_handle *fwnode, int size,
irq_hw_number_t hwirq_max, int direct_max,
const struct irq_domain_ops *ops,
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 7f34d98..a8c1cf6 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -79,6 +79,24 @@ void irq_domain_free_fwnode(struct fwnode_handle *fwnode)
 }
 
 /**
+ * irq_domain_get_irqchip_fwnode_name - Retrieve associated name of
+ *  specified irqchip fwnode
+ * @fwnode: Specified fwnode_handle
+ *
+ * Returns associated name of the specified fwnode, or NULL on failure.
+ */
+const char *irq_domain_get_irqchip_fwnode_name(struct fwnode_handle *fwnode)
+{
+   struct irqchip_fwid *fwid;
+
+   if (!is_fwnode_irqchip(fwnode))
+   return NULL;
+
+   fwid = container_of(fwnode, struct irqchip_fwid, fwnode);
+   return fwid->name;
+}
+
+/**
  * __irq_domain_add() - Allocate a new irq_domain data structure
  * @of_node: optional device-tree node of the interrupt controller
  * @size: Size of linear map; 0 for radix mapping only
-- 
2.1.0



[PATCH V2 0/6] gicv2m: acpi: Add ACPI support for GICv2m MSI

2015-10-14 Thread Suravee Suthikulpanit
This patch series has been forked from the following patch series since
it no longer depends on the rest of the patches.

  [PATCH v4 00/10] ACPI GIC Self-probing, GICv2m and GICv3 support
  https://lkml.org/lkml/2015/7/29/234

It has been ported to use the newly introduced fwnode_handle for
ACPI irqdomains, added by Marc in the following patch series:

  [PATCH v2 00/17] Divorcing irqdomain and device_node
  http://git.kernel.org/cgit/linux/kernel/git/maz/arm-platforms.git 
irq/irq-domain-fwnode-v2

The following git branch contains the submitted patches along with
the prerequisite patches (mainly for ARM64 PCI support for ACPI).

  https://github.com/ssuthiku/linux.git irq-domain-fwnode-v2-v2m-multiframe

This has been tested on an AMD Seattle (Overdrive) RevB system.

NOTE: I have not tested ACPI GICv2m multiframe support since
I don't have access to such a system. Any help is appreciated.

Thanks,
Suravee

Changes from V1: (https://lkml.org/lkml/2015/10/13/859)
  - Rebase on top of Marc's patch adding support for multiple MSI frames
(https://lkml.org/lkml/2015/10/14/271)
  - Add fwnode convenience functions (patches 3 and 4)

Suravee Suthikulpanit (6):
  pci: msi: Add support to query MSI domain for pci device
  acpi: pci: Setup MSI domain for ACPI based pci devices
  irqdomain: introduce is_fwnode_irqchip helper
  irqdomain: Introduce irq_domain_get_irqchip_fwnode_name helper
function
  gicv2m: Refactor to prepare for ACPI support
  gicv2m: acpi: Introducing GICv2m ACPI support

 drivers/irqchip/irq-gic-v2m.c   | 151 ++--
 drivers/irqchip/irq-gic.c   |   5 +-
 drivers/pci/msi.c   |  30 
 drivers/pci/pci-acpi.c  |  13 
 drivers/pci/probe.c |   2 +
 include/linux/irqchip/arm-gic.h |   6 ++
 include/linux/irqdomain.h   |   6 ++
 include/linux/msi.h |   7 ++
 include/linux/pci.h |   7 ++
 kernel/irq/irqdomain.c  |  20 +-
 10 files changed, 224 insertions(+), 23 deletions(-)

-- 
2.1.0



[PATCH V2 5/6] gicv2m: Refactor to prepare for ACPI support

2015-10-14 Thread Suravee Suthikulpanit
This patch replaces the struct device_node in v2m_data with a
struct fwnode_handle, since the latter is common between DT and ACPI.

It also refactors gicv2m_init_one() to prepare for ACPI support.
There should be no functional changes.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/irqchip/irq-gic-v2m.c | 57 +++
 1 file changed, 36 insertions(+), 21 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v2m.c b/drivers/irqchip/irq-gic-v2m.c
index 87f8d10..7e60f7e 100644
--- a/drivers/irqchip/irq-gic-v2m.c
+++ b/drivers/irqchip/irq-gic-v2m.c
@@ -55,7 +55,7 @@ static DEFINE_SPINLOCK(v2m_lock);
 
 struct v2m_data {
struct list_head entry;
-   struct device_node *node;
+   struct fwnode_handle *fwnode;
struct resource res;/* GICv2m resource */
void __iomem *base; /* GICv2m virt address */
u32 spi_start;  /* The SPI number that MSIs start */
@@ -254,7 +254,7 @@ static void gicv2m_teardown(void)
list_del(&v2m->entry);
kfree(v2m->bm);
iounmap(v2m->base);
-   of_node_put(v2m->node);
+   of_node_put(to_of_node(v2m->fwnode));
kfree(v2m);
}
 }
@@ -268,7 +268,7 @@ static int gicv2m_allocate_domains(struct irq_domain 
*parent)
if (!v2m)
return 0;
 
-   inner_domain = irq_domain_create_tree(of_node_to_fwnode(v2m->node),
+   inner_domain = irq_domain_create_tree(v2m->fwnode,
  &gicv2m_domain_ops, v2m);
if (!inner_domain) {
pr_err("Failed to create GICv2m domain\n");
@@ -277,10 +277,10 @@ static int gicv2m_allocate_domains(struct irq_domain 
*parent)
 
inner_domain->bus_token = DOMAIN_BUS_NEXUS;
inner_domain->parent = parent;
-   pci_domain = pci_msi_create_irq_domain(of_node_to_fwnode(v2m->node),
+   pci_domain = pci_msi_create_irq_domain(v2m->fwnode,
   &gicv2m_msi_domain_info,
   inner_domain);
-   plat_domain = 
platform_msi_create_irq_domain(of_node_to_fwnode(v2m->node),
+   plat_domain = platform_msi_create_irq_domain(v2m->fwnode,
 &gicv2m_pmsi_domain_info,
 inner_domain);
if (!pci_domain || !plat_domain) {
@@ -296,11 +296,13 @@ static int gicv2m_allocate_domains(struct irq_domain 
*parent)
return 0;
 }
 
-static int __init gicv2m_init_one(struct device_node *node,
- struct irq_domain *parent)
+static int __init gicv2m_init_one(struct fwnode_handle *fwnode,
+ u32 spi_start, u32 nr_spis,
+ struct resource *res)
 {
int ret;
struct v2m_data *v2m;
+   const char *name = NULL;
 
v2m = kzalloc(sizeof(struct v2m_data), GFP_KERNEL);
if (!v2m) {
@@ -309,13 +311,9 @@ static int __init gicv2m_init_one(struct device_node *node,
}
 
INIT_LIST_HEAD(&v2m->entry);
-   v2m->node = node;
+   v2m->fwnode = fwnode;
 
-   ret = of_address_to_resource(node, 0, &v2m->res);
-   if (ret) {
-   pr_err("Failed to allocate v2m resource.\n");
-   goto err_free_v2m;
-   }
+   memcpy(&v2m->res, res, sizeof(struct resource));
 
v2m->base = ioremap(v2m->res.start, resource_size(&v2m->res));
if (!v2m->base) {
@@ -324,10 +322,9 @@ static int __init gicv2m_init_one(struct device_node *node,
goto err_free_v2m;
}
 
-   if (!of_property_read_u32(node, "arm,msi-base-spi", &v2m->spi_start) &&
-   !of_property_read_u32(node, "arm,msi-num-spis", &v2m->nr_spis)) {
-   pr_info("Overriding V2M MSI_TYPER (base:%u, num:%u)\n",
-   v2m->spi_start, v2m->nr_spis);
+   if (spi_start && nr_spis) {
+   v2m->spi_start = spi_start;
+   v2m->nr_spis = nr_spis;
} else {
u32 typer = readl_relaxed(v2m->base + V2M_MSI_TYPER);
 
@@ -359,10 +356,13 @@ static int __init gicv2m_init_one(struct device_node 
*node,
}
 
list_add_tail(&v2m->entry, &v2m_nodes);
-   pr_info("Node %s: range[%#lx:%#lx], SPI[%d:%d]\n", node->name,
-   (unsigned long)v2m->res.start, (unsigned long)v2m->res.end,
-   v2m->spi_start, (v2m->spi_start + v2m->nr_spis));
 
+   if (to_of_node(fwnode))
+   name = to_of_node(fwnode)->name;
+
+   pr_info("Frame %s: range[%#lx:%#lx], SPI[%d:%d]\n", name,
+   (unsigned long)res->start,

[PATCH V2 2/6] acpi: pci: Setup MSI domain for ACPI based pci devices

2015-10-14 Thread Suravee Suthikulpanit
This patch introduces pci_host_bridge_acpi_msi_domain(), which returns
the MSI domain of the specified PCI host bridge with the DOMAIN_BUS_PCI_MSI
bus token. It is then assigned to the PCI device.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/pci/pci-acpi.c | 13 +
 drivers/pci/probe.c|  2 ++
 include/linux/pci.h|  7 +++
 3 files changed, 22 insertions(+)

diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index a32ba75..0e21ef4 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -9,7 +9,9 @@
 
 #include 
 #include 
+#include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -689,6 +691,17 @@ static struct acpi_bus_type acpi_pci_bus = {
.cleanup = pci_acpi_cleanup,
 };
 
+struct irq_domain *pci_host_bridge_acpi_msi_domain(struct pci_bus *bus)
+{
+   struct irq_domain *dom = NULL;
+   struct fwnode_handle *fwnode = pci_msi_get_fwnode(&bus->dev);
+
+   if (fwnode)
+   dom = irq_find_matching_fwnode(fwnode,
+  DOMAIN_BUS_PCI_MSI);
+   return dom;
+}
+
 static int __init acpi_pci_init(void)
 {
int ret;
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 0dbc7fb..bea1840 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -671,6 +671,8 @@ static struct irq_domain *pci_host_bridge_msi_domain(struct 
pci_bus *bus)
 * should be called from here.
 */
d = pci_host_bridge_of_msi_domain(bus);
+   if (!d)
+   d = pci_host_bridge_acpi_msi_domain(bus);
 
return d;
 }
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e90eb22..4a7f6a9 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1925,6 +1925,13 @@ static inline struct irq_domain *
 pci_host_bridge_of_msi_domain(struct pci_bus *bus) { return NULL; }
 #endif  /* CONFIG_OF */
 
+#ifdef CONFIG_ACPI
+struct irq_domain *pci_host_bridge_acpi_msi_domain(struct pci_bus *bus);
+#else
+static inline struct irq_domain *
+pci_host_bridge_acpi_msi_domain(struct pci_bus *bus) { return NULL; }
+#endif
+
 #ifdef CONFIG_EEH
 static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev)
 {
-- 
2.1.0

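As a side note, the net effect for an ACPI-enumerated PCI device is that the
usual MSI-domain accessor now resolves through the fallback added above. A
minimal illustration (not part of the patch; the helper name is made up):

static int example_check_msi_domain(struct pci_dev *pdev)
{
	/* Set by pci_set_msi_domain() from the host bridge at device-add time */
	struct irq_domain *d = dev_get_msi_domain(&pdev->dev);

	/*
	 * On an ACPI system this is the DOMAIN_BUS_PCI_MSI domain found via
	 * the fwnode returned by pci_msi_get_fwnode() (patch 1/6).
	 */
	return d ? 0 : -ENODEV;
}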


[PATCH V2 3/6] irqdomain: introduce is_fwnode_irqchip helper

2015-10-14 Thread Suravee Suthikulpanit
Since there will be several places checking whether fwnode.type
is equal to FWNODE_IRQCHIP, this patch adds a convenience function
for this purpose.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/irqchip/irq-gic.c | 2 +-
 include/linux/irqdomain.h | 5 +
 kernel/irq/irqdomain.c| 2 +-
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index 1d0e768..6685b33 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -939,7 +939,7 @@ static int gic_irq_domain_translate(struct irq_domain *d,
return 0;
}
 
-   if (fwspec->fwnode->type == FWNODE_IRQCHIP) {
+   if (is_fwnode_irqchip(fwspec->fwnode)) {
if(fwspec->param_count != 2)
return -EINVAL;
 
diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index d5e5c5b..4950a71 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -211,6 +211,11 @@ static inline struct fwnode_handle 
*of_node_to_fwnode(struct device_node *node)
return node ? &node->fwnode : NULL;
 }
 
+static inline bool is_fwnode_irqchip(struct fwnode_handle *fwnode)
+{
+   return fwnode && fwnode->type == FWNODE_IRQCHIP;
+}
+
 static inline struct irq_domain *irq_find_matching_host(struct device_node 
*node,
enum 
irq_domain_bus_token bus_token)
 {
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 22aa961..7f34d98 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -70,7 +70,7 @@ void irq_domain_free_fwnode(struct fwnode_handle *fwnode)
 {
struct irqchip_fwid *fwid;
 
-   if (WARN_ON(fwnode->type != FWNODE_IRQCHIP))
+   if (WARN_ON(!is_fwnode_irqchip(fwnode)))
return;
 
fwid = container_of(fwnode, struct irqchip_fwid, fwnode);
-- 
2.1.0



[PATCH V2 6/6] gicv2m: acpi: Introducing GICv2m ACPI support

2015-10-14 Thread Suravee Suthikulpanit
This patch introduces gicv2m_acpi_init(), which uses information
from the MADT GIC MSI frame structures to initialize the GICv2m driver.

Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Hanjun Guo 
---
 drivers/irqchip/irq-gic-v2m.c   | 94 +
 drivers/irqchip/irq-gic.c   |  3 ++
 include/linux/irqchip/arm-gic.h |  6 +++
 3 files changed, 103 insertions(+)

diff --git a/drivers/irqchip/irq-gic-v2m.c b/drivers/irqchip/irq-gic-v2m.c
index 7e60f7e..290f5b3 100644
--- a/drivers/irqchip/irq-gic-v2m.c
+++ b/drivers/irqchip/irq-gic-v2m.c
@@ -15,9 +15,11 @@
 
 #define pr_fmt(fmt) "GICv2m: " fmt
 
+#include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -138,6 +140,11 @@ static int gicv2m_irq_gic_domain_alloc(struct irq_domain 
*domain,
fwspec.param[0] = 0;
fwspec.param[1] = hwirq - 32;
fwspec.param[2] = IRQ_TYPE_EDGE_RISING;
+   } else if (is_fwnode_irqchip(domain->parent->fwnode)) {
+   fwspec.fwnode = domain->parent->fwnode;
+   fwspec.param_count = 2;
+   fwspec.param[0] = hwirq;
+   fwspec.param[1] = IRQ_TYPE_EDGE_RISING & IRQ_TYPE_SENSE_MASK;
} else {
return -EINVAL;
}
@@ -255,6 +262,8 @@ static void gicv2m_teardown(void)
kfree(v2m->bm);
iounmap(v2m->base);
of_node_put(to_of_node(v2m->fwnode));
+   if (is_fwnode_irqchip(v2m->fwnode))
+   irq_domain_free_fwnode(v2m->fwnode);
kfree(v2m);
}
 }
@@ -359,6 +368,8 @@ static int __init gicv2m_init_one(struct fwnode_handle 
*fwnode,
 
if (to_of_node(fwnode))
name = to_of_node(fwnode)->name;
+   else
+   name = irq_domain_get_irqchip_fwnode_name(fwnode);
 
pr_info("Frame %s: range[%#lx:%#lx], SPI[%d:%d]\n", name,
(unsigned long)res->start, (unsigned long)res->end,
@@ -415,3 +426,86 @@ int __init gicv2m_of_init(struct device_node *node, struct 
irq_domain *parent)
gicv2m_teardown();
return ret;
 }
+
+#ifdef CONFIG_ACPI
+static int acpi_num_msi;
+
+static struct fwnode_handle *gicv2m_get_fwnode(struct device *dev)
+{
+   struct v2m_data *data;
+
+   if (WARN_ON(acpi_num_msi <= 0))
+   return NULL;
+
+   /* We only return the fwnode of the first MSI frame. */
+   data = list_first_entry_or_null(&v2m_nodes,
+   struct v2m_data, entry);
+   if (!data)
+   return NULL;
+
+   return data->fwnode;
+}
+
+static int __init
+acpi_parse_madt_msi(struct acpi_subtable_header *header,
+   const unsigned long end)
+{
+   int ret;
+   struct resource res;
+   u32 spi_start = 0, nr_spis = 0;
+   struct acpi_madt_generic_msi_frame *m;
+   struct fwnode_handle *fwnode = NULL;
+
+   m = (struct acpi_madt_generic_msi_frame *)header;
+   if (BAD_MADT_ENTRY(m, end))
+   return -EINVAL;
+
+   res.start = m->base_address;
+   res.end = m->base_address + 0x1000;
+
+   if (m->flags & ACPI_MADT_OVERRIDE_SPI_VALUES) {
+   spi_start = m->spi_base;
+   nr_spis = m->spi_count;
+
+   pr_info("ACPI overriding V2M MSI_TYPER (base:%u, num:%u)\n",
+   spi_start, nr_spis);
+   }
+
+   fwnode = irq_domain_alloc_fwnode((void *)m->base_address);
+   if (!fwnode) {
+   pr_err("Unable to allocate GICv2m domain token\n");
+   return -EINVAL;
+   }
+
+   ret = gicv2m_init_one(fwnode, spi_start, nr_spis, &res);
+
+   return ret;
+}
+
+int __init gicv2m_acpi_init(struct irq_domain *parent)
+{
+   int ret;
+
+   if (acpi_num_msi > 0)
+   return 0;
+
+   acpi_num_msi = acpi_table_parse_madt(ACPI_MADT_TYPE_GENERIC_MSI_FRAME,
+ acpi_parse_madt_msi, 0);
+
+   if (acpi_num_msi <= 0)
+   goto err_out;
+
+   ret = gicv2m_allocate_domains(parent);
+   if (ret)
+   goto err_out;
+
+   pci_msi_register_fwnode_provider(&gicv2m_get_fwnode);
+
+   return 0;
+
+err_out:
+   gicv2m_teardown();
+   return -EINVAL;
+}
+
+#endif /* CONFIG_ACPI */
diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index 6685b33..bb3e1f2 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -1329,6 +1329,9 @@ gic_v2_acpi_init(struct acpi_table_header *table)
 
__gic_init_bases(0, -1, dist_base, cpu_base, 0, domain_handle);
 
+   if (IS_ENABLED(CONFIG_ARM_GIC_V2M))
+   gicv2m_acpi_init(gic_data[0].domain);
+
acpi_set_irq_model(ACPI_IRQ_MODEL_GIC, domain_handle);
return 0;
 }
diff --git a/include/linux

[PATCH V2 1/6] pci: msi: Add support to query MSI domain for pci device

2015-10-14 Thread Suravee Suthikulpanit
This patch introduces an interface for an irqchip driver to register a
callback that provides a way to determine the appropriate MSI domain for
a PCI device.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/pci/msi.c   | 30 ++
 include/linux/msi.h |  7 +++
 2 files changed, 37 insertions(+)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index ddd59fe..2c87843 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1327,4 +1327,34 @@ struct irq_domain 
*pci_msi_create_default_irq_domain(struct fwnode_handle *fwnod
 
return domain;
 }
+
+static struct fwnode_handle *(*pci_msi_get_fwnode_cb)(struct device *dev);
+
+/**
+ * pci_msi_register_fwnode_provider - Register callback to retrieve fwnode
+ * @fn:The interrupt domain to retrieve
+ *
+ * This should be called by irqchip driver, which is the parent of
+ * the MSI domain to provide callback interface to query fwnode.
+ */
+void
+pci_msi_register_fwnode_provider(struct fwnode_handle *(*fn)(struct device *))
+{
+   pci_msi_get_fwnode_cb = fn;
+}
+
+/**
+ * pci_msi_get_fwnode - Query fwnode for MSI controller of the @dev
+ * @dev:   The device that we try to query MSI domain token for
+ *
+ * This is used to query MSI domain token when setting up MSI domain
+ * for a device. Returns fwnode_handle * if token found / NULL if not found
+ */
+struct fwnode_handle *pci_msi_get_fwnode(struct device *dev)
+{
+   if (pci_msi_get_fwnode_cb)
+   return pci_msi_get_fwnode_cb(dev);
+
+   return NULL;
+}
 #endif /* CONFIG_PCI_MSI_IRQ_DOMAIN */
diff --git a/include/linux/msi.h b/include/linux/msi.h
index 32a24b9..ceaebf6 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -3,6 +3,7 @@
 
 #include 
 #include 
+#include 
 
 struct msi_msg {
u32 address_lo; /* low 32 bits of msi message address */
@@ -294,6 +295,12 @@ irq_hw_number_t pci_msi_domain_calc_hwirq(struct pci_dev 
*dev,
  struct msi_desc *desc);
 int pci_msi_domain_check_cap(struct irq_domain *domain,
 struct msi_domain_info *info, struct device *dev);
+
+void
+pci_msi_register_fwnode_provider(struct fwnode_handle *(*fn)(struct device *));
+
+struct fwnode_handle *pci_msi_get_fwnode(struct device *dev);
+
 #endif /* CONFIG_PCI_MSI_IRQ_DOMAIN */
 
 #endif /* LINUX_MSI_H */
-- 
2.1.0

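For reference, a minimal sketch of how an irqchip driver is expected to use
this interface; the my_msi_* names are illustrative only (the real user is
the GICv2m ACPI code in patch 6/6):

static struct fwnode_handle *my_msi_fwnode;

static struct fwnode_handle *my_msi_get_fwnode(struct device *dev)
{
	/* Return the fwnode the PCI MSI irqdomain was created against */
	return my_msi_fwnode;
}

static int __init my_msi_init(void)
{
	/* ... allocate my_msi_fwnode and create the MSI domains here ... */
	pci_msi_register_fwnode_provider(&my_msi_get_fwnode);
	return 0;
}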


[PATCH V4 8/8] PCI: ACPI: Add support for PCI device DMA coherency

2015-10-21 Thread Suravee Suthikulpanit
This patch adds support for setting up PCI device DMA coherency from
the ACPI _CCA object, which should normally be specified in the DSDT
node of its PCI host bridge.

Signed-off-by: Suravee Suthikulpanit 
CC: Bjorn Helgaas 
CC: Catalin Marinas 
CC: Rob Herring 
CC: Will Deacon 
CC: Rafael J. Wysocki 
CC: Murali Karicheri 
---
 drivers/pci/probe.c | 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 09264f8..6e9f7e6 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1642,17 +1642,26 @@ static void pci_set_msi_domain(struct pci_dev *dev)
  * @dev: ptr to pci_dev struct of the PCI device
  *
  * Function to update PCI devices's DMA configuration using the same
- * info from the OF node of host bridge's parent (if any).
+ * info from the OF node or ACPI node of host bridge's parent (if any).
  */
 static void pci_dma_configure(struct pci_dev *dev)
 {
struct device *bridge = pci_get_host_bridge_device(dev);
 
if (IS_ENABLED(CONFIG_OF) && dev->dev.of_node) {
-   if (!bridge->parent)
-   return;
-
-   of_dma_configure(&dev->dev, bridge->parent->of_node);
+   if (bridge->parent)
+   of_dma_configure(&dev->dev,
+bridge->parent->of_node);
+   } else if (has_acpi_companion(bridge)) {
+   struct acpi_device *adev = to_acpi_node(bridge->fwnode);
+   enum dev_dma_attr attr = acpi_get_dma_attr(adev);
+
+   if (attr != DEV_DMA_NOT_SUPPORTED)
+   arch_setup_dma_ops(&dev->dev, 0, 0, NULL,
+  attr == DEV_DMA_COHERENT);
+   else
+   WARN(1, FW_BUG "PCI device %s fail to setup DMA.\n",
+pci_name(dev));
}
 
pci_put_host_bridge_device(bridge);
-- 
2.1.0



[PATCH V4 7/8] PCI: OF: Move of_pci_dma_configure() to pci_dma_configure()

2015-10-21 Thread Suravee Suthikulpanit
This patch moves of_pci_dma_configure() to a more generic
pci_dma_configure(), which can be extended by non-OF code (e.g. ACPI).

This has no functional change.

Signed-off-by: Suravee Suthikulpanit 
Acked-by: Rob Herring 
CC: Bjorn Helgaas 
CC: Catalin Marinas 
CC: Will Deacon 
CC: Rafael J. Wysocki 
CC: Murali Karicheri 
---
 drivers/of/of_pci.c| 20 
 drivers/pci/probe.c| 27 +--
 include/linux/of_pci.h |  3 ---
 3 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
index 5751dc5..b66ee4e 100644
--- a/drivers/of/of_pci.c
+++ b/drivers/of/of_pci.c
@@ -117,26 +117,6 @@ int of_get_pci_domain_nr(struct device_node *node)
 }
 EXPORT_SYMBOL_GPL(of_get_pci_domain_nr);
 
-/**
- * of_pci_dma_configure - Setup DMA configuration
- * @dev: ptr to pci_dev struct of the PCI device
- *
- * Function to update PCI devices's DMA configuration using the same
- * info from the OF node of host bridge's parent (if any).
- */
-void of_pci_dma_configure(struct pci_dev *pci_dev)
-{
-   struct device *dev = &pci_dev->dev;
-   struct device *bridge = pci_get_host_bridge_device(pci_dev);
-
-   if (!bridge->parent)
-   return;
-
-   of_dma_configure(dev, bridge->parent->of_node);
-   pci_put_host_bridge_device(bridge);
-}
-EXPORT_SYMBOL_GPL(of_pci_dma_configure);
-
 #if defined(CONFIG_OF_ADDRESS)
 /**
  * of_pci_get_host_bridge_resources - Parse PCI host bridge resources from DT
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index eea8b42..09264f8 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -6,12 +6,14 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include "pci.h"
 
@@ -1635,6 +1637,27 @@ static void pci_set_msi_domain(struct pci_dev *dev)
   dev_get_msi_domain(&dev->bus->dev));
 }
 
+/**
+ * pci_dma_configure - Setup DMA configuration
+ * @dev: ptr to pci_dev struct of the PCI device
+ *
+ * Function to update PCI devices's DMA configuration using the same
+ * info from the OF node of host bridge's parent (if any).
+ */
+static void pci_dma_configure(struct pci_dev *dev)
+{
+   struct device *bridge = pci_get_host_bridge_device(dev);
+
+   if (IS_ENABLED(CONFIG_OF) && dev->dev.of_node) {
+   if (!bridge->parent)
+   return;
+
+   of_dma_configure(&dev->dev, bridge->parent->of_node);
+   }
+
+   pci_put_host_bridge_device(bridge);
+}
+
 void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
 {
int ret;
@@ -1648,7 +1671,7 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus 
*bus)
dev->dev.dma_mask = &dev->dma_mask;
dev->dev.dma_parms = &dev->dma_parms;
	dev->dev.coherent_dma_mask = 0xffffffffull;
-   of_pci_dma_configure(dev);
+   pci_dma_configure(dev);
 
pci_set_dma_max_seg_size(dev, 65536);
	pci_set_dma_seg_boundary(dev, 0xffffffff);
diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
index 29fd3fe..ce0e5ab 100644
--- a/include/linux/of_pci.h
+++ b/include/linux/of_pci.h
@@ -16,7 +16,6 @@ int of_pci_get_devfn(struct device_node *np);
 int of_irq_parse_and_map_pci(const struct pci_dev *dev, u8 slot, u8 pin);
 int of_pci_parse_bus_range(struct device_node *node, struct resource *res);
 int of_get_pci_domain_nr(struct device_node *node);
-void of_pci_dma_configure(struct pci_dev *pci_dev);
 #else
 static inline int of_irq_parse_pci(const struct pci_dev *pdev, struct 
of_phandle_args *out_irq)
 {
@@ -51,8 +50,6 @@ of_get_pci_domain_nr(struct device_node *node)
 {
return -1;
 }
-
-static inline void of_pci_dma_configure(struct pci_dev *pci_dev) { }
 #endif
 
 #if defined(CONFIG_OF_ADDRESS)
-- 
2.1.0



[PATCH V4 2/8] device property: Introducing enum dev_dma_attr

2015-10-21 Thread Suravee Suthikulpanit
A device could have one of the following DMA attributes:
* DMA not supported
* DMA non-coherent
* DMA coherent

So, this patch introduces enum dev_dma_attr. This will be used by
new APIs introduced in later patches.

Signed-off-by: Suravee Suthikulpanit 
CC: Rafael J. Wysocki 
CC: Bjorn Helgaas 
---
 include/linux/property.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/property.h b/include/linux/property.h
index a59c6ee..522c1bf 100644
--- a/include/linux/property.h
+++ b/include/linux/property.h
@@ -27,6 +27,12 @@ enum dev_prop_type {
DEV_PROP_MAX,
 };
 
+enum dev_dma_attr {
+   DEV_DMA_NOT_SUPPORTED,
+   DEV_DMA_NON_COHERENT,
+   DEV_DMA_COHERENT,
+};
+
 bool device_property_present(struct device *dev, const char *propname);
 int device_property_read_u8_array(struct device *dev, const char *propname,
  u8 *val, size_t nval);
-- 
2.1.0

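For illustration only (not part of this patch), this is how a consumer is
expected to map the three values onto a DMA setup decision; it mirrors what
the later patches do in acpi_bind_one() and pci_dma_configure():

static void example_setup_dma(struct device *dev, enum dev_dma_attr attr)
{
	if (attr == DEV_DMA_NOT_SUPPORTED) {
		dev_warn(dev, "DMA is not supported\n");
		return;
	}

	/* The remaining two values only select the coherency */
	arch_setup_dma_ops(dev, 0, 0, NULL, attr == DEV_DMA_COHERENT);
}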


[PATCH V4 6/8] device property: acpi: Remove unused DMA APIs

2015-10-21 Thread Suravee Suthikulpanit
These DMA APIs are replaced with the newer versions, which return
the enum dev_dma_attr. So, we can safely remove them.

Signed-off-by: Suravee Suthikulpanit 
CC: Rafael J. Wysocki 
---
 drivers/base/property.c  | 13 -
 include/acpi/acpi_bus.h  | 34 --
 include/linux/acpi.h |  5 -
 include/linux/property.h |  2 --
 4 files changed, 54 deletions(-)

diff --git a/drivers/base/property.c b/drivers/base/property.c
index baac186..c79611e 100644
--- a/drivers/base/property.c
+++ b/drivers/base/property.c
@@ -534,19 +534,6 @@ unsigned int device_get_child_node_count(struct device 
*dev)
 }
 EXPORT_SYMBOL_GPL(device_get_child_node_count);
 
-bool device_dma_is_coherent(struct device *dev)
-{
-   bool coherent = false;
-
-   if (IS_ENABLED(CONFIG_OF) && dev->of_node)
-   coherent = of_dma_is_coherent(dev->of_node);
-   else
-   acpi_check_dma(ACPI_COMPANION(dev), &coherent);
-
-   return coherent;
-}
-EXPORT_SYMBOL_GPL(device_dma_is_coherent);
-
 bool device_dma_supported(struct device *dev)
 {
/* For DT, this is always supported.
diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index 13417d0..273b909 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -378,40 +378,6 @@ struct acpi_device {
void (*remove)(struct acpi_device *);
 };
 
-static inline bool acpi_check_dma(struct acpi_device *adev, bool *coherent)
-{
-   bool ret = false;
-
-   if (!adev)
-   return ret;
-
-   /**
-* Currently, we only support _CCA=1 (i.e. coherent_dma=1)
-* This should be equivalent to specifyig dma-coherent for
-* a device in OF.
-*
-* For the case when _CCA=0 (i.e. coherent_dma=0 && cca_seen=1),
-* There are two cases:
-* case 1. Do not support and disable DMA.
-* case 2. Support but rely on arch-specific cache maintenance for
-* non-coherence DMA operations.
-* Currently, we implement case 2 above.
-*
-* For the case when _CCA is missing (i.e. cca_seen=0) and
-* platform specifies ACPI_CCA_REQUIRED, we do not support DMA,
-* and fallback to arch-specific default handling.
-*
-* See acpi_init_coherency() for more info.
-*/
-   if (adev->flags.coherent_dma ||
-   (adev->flags.cca_seen && IS_ENABLED(CONFIG_ARM64))) {
-   ret = true;
-   if (coherent)
-   *coherent = adev->flags.coherent_dma;
-   }
-   return ret;
-}
-
 static inline bool is_acpi_node(struct fwnode_handle *fwnode)
 {
return fwnode && fwnode->type == FWNODE_ACPI;
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index c47892c..08bd395 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -564,11 +564,6 @@ static inline int acpi_device_modalias(struct device *dev,
return -ENODEV;
 }
 
-static inline bool acpi_check_dma(struct acpi_device *adev, bool *coherent)
-{
-   return false;
-}
-
 static inline bool acpi_dma_supported(struct acpi_device *adev)
 {
return false;
diff --git a/include/linux/property.h b/include/linux/property.h
index bde8de3..8b69a88 100644
--- a/include/linux/property.h
+++ b/include/linux/property.h
@@ -170,8 +170,6 @@ struct property_set {
 
 void device_add_property_set(struct device *dev, struct property_set *pset);
 
-bool device_dma_is_coherent(struct device *dev);
-
 bool device_dma_supported(struct device *dev);
 
 enum dev_dma_attr device_get_dma_attr(struct device *dev);
-- 
2.1.0



[PATCH V4 1/8] acpi: Honor ACPI _CCA attribute setting

2015-10-21 Thread Suravee Suthikulpanit
From: Jeremy Linton 

ACPI configurations can now mark devices as non-coherent;
support that choice.

NOTE: This is required to support USB on ARM Juno Development Board.

Signed-off-by: Jeremy Linton 
Signed-off-by: Suravee Suthikulpanit 
CC: Bjorn Helgaas 
CC: Catalin Marinas 
CC: Rob Herring 
CC: Will Deacon 
CC: Rafael J. Wysocki 
---
 include/acpi/acpi_bus.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index 5ba8fb6..5a42204 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -395,7 +395,7 @@ static inline bool acpi_check_dma(struct acpi_device *adev, 
bool *coherent)
 * case 1. Do not support and disable DMA.
 * case 2. Support but rely on arch-specific cache maintenance for
 * non-coherence DMA operations.
-* Currently, we implement case 1 above.
+* Currently, we implement case 2 above.
 *
 * For the case when _CCA is missing (i.e. cca_seen=0) and
 * platform specifies ACPI_CCA_REQUIRED, we do not support DMA,
@@ -403,7 +403,8 @@ static inline bool acpi_check_dma(struct acpi_device *adev, 
bool *coherent)
 *
 * See acpi_init_coherency() for more info.
 */
-   if (adev->flags.coherent_dma) {
+   if (adev->flags.coherent_dma ||
+   (adev->flags.cca_seen && IS_ENABLED(CONFIG_ARM64))) {
ret = true;
if (coherent)
*coherent = adev->flags.coherent_dma;
-- 
2.1.0



[PATCH V4 0/8] PCI: ACPI: Setting up DMA coherency for PCI device from _CCA attribute

2015-10-21 Thread Suravee Suthikulpanit
This patch series adds support for setting up DMA coherency for PCI devices
using the ACPI _CCA attribute. According to the ACPI spec, the _CCA attribute
is required for ARM64; therefore, this series is a prerequisite for the ACPI
PCI support for ARM64 that is currently in development. It should not affect
other architectures that do not define CONFIG_ACPI_CCA_REQUIRED, since the
default value there is coherent.

In the process, this series also introduces enum dev_dma_attr and a set
of APIs to query device DMA attributes. These APIs replace the obsolete
device_dma_is_coherent() and acpi_check_dma().

I have also included a patch from Jeremy posted here:
http://www.spinics.net/lists/linux-usb/msg128582.html

This patch series has been tested on the AMD Seattle RevB platform.
The git tree containing the tested code and prerequisite patches is posted here:

http://github.com/ssuthiku/linux.git pci-cca-v4

Changes from V3: (https://lkml.org/lkml/2015/8/26/389)
* Clean up suggested by Bjorn
* Introduce enum dev_dma_attr
* Replace device_dma_is_coherent() and acpi_check_dma() with
  new APIs.

Changes from V2: (https://lkml.org/lkml/2015/8/25/549)
* Return -ENOSUPP instead of -1 (per Rafael's suggestion)
* Add WARN() when fail to setup DMA for PCI device when booting
  ACPI (per Arnd's suggestion)
* Added Acked-by from Rob.
* Minor clean up

Changes from V1: (https://lkml.org/lkml/2015/8/13/182)
* Include patch 1 from Jeremy to enable support for _CCA=0
* Clean up acpi_check_dma() per Bjorn suggestions
* Split the original V1 patch into two patches (patch 3 and 4)

Jeremy Linton (1):
  Honor ACPI _CCA attribute setting

Suravee Suthikulpanit (7):
  device property: Introducing enum dev_dma_attr
  acpi: Adding DMA Attribute APIs for ACPI Device
  device property: Adding DMA Attribute APIs for Generic Devices
  device property: acpi: Make use of the new DMA Attribute APIs
  device property: acpi: Remove unused DMA APIs
  PCI: OF: Move of_pci_dma_configure() to pci_dma_configure()
  PCI: ACPI: Add support for PCI device DMA coherency

 drivers/acpi/acpi_platform.c  |  7 +-
 drivers/acpi/glue.c   |  8 +++---
 drivers/acpi/scan.c   | 42 +++
 drivers/base/property.c   | 32 +--
 drivers/crypto/ccp/ccp-platform.c |  9 ++-
 drivers/net/ethernet/amd/xgbe/xgbe-main.c |  9 ++-
 drivers/of/of_pci.c   | 20 ---
 drivers/pci/probe.c   | 36 --
 include/acpi/acpi_bus.h   | 36 +++---
 include/linux/acpi.h  |  7 +-
 include/linux/of_pci.h|  3 ---
 include/linux/property.h  | 10 +++-
 12 files changed, 145 insertions(+), 74 deletions(-)

-- 
2.1.0



[PATCH V4 5/8] device property: acpi: Make use of the new DMA Attribute APIs

2015-10-21 Thread Suravee Suthikulpanit
Now that we have the new DMA attribute APIs, we can replace the older
acpi_check_dma() and device_dma_is_coherent().

Signed-off-by: Suravee Suthikulpanit 
CC: Rafael J. Wysocki 
CC: Tom Lendacky 
CC: Herbert Xu 
CC: David S. Miller 
---
 drivers/acpi/acpi_platform.c  | 7 ++-
 drivers/acpi/glue.c   | 8 +---
 drivers/crypto/ccp/ccp-platform.c | 9 -
 drivers/net/ethernet/amd/xgbe/xgbe-main.c | 9 -
 4 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/acpi/acpi_platform.c b/drivers/acpi/acpi_platform.c
index 06a67d5..296b7a1 100644
--- a/drivers/acpi/acpi_platform.c
+++ b/drivers/acpi/acpi_platform.c
@@ -103,7 +103,12 @@ struct platform_device *acpi_create_platform_device(struct 
acpi_device *adev)
pdevinfo.res = resources;
pdevinfo.num_res = count;
pdevinfo.fwnode = acpi_fwnode_handle(adev);
-   pdevinfo.dma_mask = acpi_check_dma(adev, NULL) ? DMA_BIT_MASK(32) : 0;
+
+   if (acpi_dma_supported(adev))
+   pdevinfo.dma_mask = DMA_BIT_MASK(32);
+   else
+   pdevinfo.dma_mask = 0;
+
pdev = platform_device_register_full(&pdevinfo);
if (IS_ERR(pdev))
dev_err(&adev->dev, "platform device creation failed: %ld\n",
diff --git a/drivers/acpi/glue.c b/drivers/acpi/glue.c
index b9657af..a66e776 100644
--- a/drivers/acpi/glue.c
+++ b/drivers/acpi/glue.c
@@ -168,7 +168,7 @@ int acpi_bind_one(struct device *dev, struct acpi_device 
*acpi_dev)
struct list_head *physnode_list;
unsigned int node_id;
int retval = -EINVAL;
-   bool coherent;
+   enum dev_dma_attr attr;
 
if (has_acpi_companion(dev)) {
if (acpi_dev) {
@@ -225,8 +225,10 @@ int acpi_bind_one(struct device *dev, struct acpi_device 
*acpi_dev)
if (!has_acpi_companion(dev))
ACPI_COMPANION_SET(dev, acpi_dev);
 
-   if (acpi_check_dma(acpi_dev, &coherent))
-   arch_setup_dma_ops(dev, 0, 0, NULL, coherent);
+   attr = acpi_get_dma_attr(acpi_dev);
+   if (attr != DEV_DMA_NOT_SUPPORTED)
+   arch_setup_dma_ops(dev, 0, 0, NULL,
+  attr == DEV_DMA_COHERENT);
 
acpi_physnode_link_name(physical_node_name, node_id);
retval = sysfs_create_link(&acpi_dev->dev.kobj, &dev->kobj,
diff --git a/drivers/crypto/ccp/ccp-platform.c 
b/drivers/crypto/ccp/ccp-platform.c
index bb241c3..edd9e16 100644
--- a/drivers/crypto/ccp/ccp-platform.c
+++ b/drivers/crypto/ccp/ccp-platform.c
@@ -96,6 +96,7 @@ static int ccp_platform_probe(struct platform_device *pdev)
struct ccp_platform *ccp_platform;
struct device *dev = &pdev->dev;
struct acpi_device *adev = ACPI_COMPANION(dev);
+   enum dev_dma_attr attr;
struct resource *ior;
int ret;
 
@@ -122,13 +123,19 @@ static int ccp_platform_probe(struct platform_device 
*pdev)
}
ccp->io_regs = ccp->io_map;
 
+   attr = device_get_dma_attr(dev);
+   if (attr == DEV_DMA_NOT_SUPPORTED) {
+   dev_err(dev, "DMA is not supported");
+   goto e_err;
+   }
+   ccp_platform->coherent = (attr == DEV_DMA_COHERENT);
+
ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(48));
if (ret) {
dev_err(dev, "dma_set_mask_and_coherent failed (%d)\n", ret);
goto e_err;
}
 
-   ccp_platform->coherent = device_dma_is_coherent(ccp->dev);
if (ccp_platform->coherent)
ccp->axcache = CACHE_WB_NO_ALLOC;
else
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-main.c 
b/drivers/net/ethernet/amd/xgbe/xgbe-main.c
index e83bd76..b596c7f 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-main.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-main.c
@@ -342,6 +342,7 @@ static int xgbe_probe(struct platform_device *pdev)
struct resource *res;
const char *phy_mode;
unsigned int i, phy_memnum, phy_irqnum;
+   enum dev_dma_attr attr;
int ret;
 
DBGPR("--> xgbe_probe\n");
@@ -608,8 +609,14 @@ static int xgbe_probe(struct platform_device *pdev)
if (ret)
goto err_io;
 
+   attr = device_get_dma_attr(dev);
+   if (attr == DEV_DMA_NOT_SUPPORTED) {
+   dev_err(dev, "DMA is not supported");
+   goto err_io;
+   }
+   pdata->coherent = (attr == DEV_DMA_COHERENT);
+
/* Set the DMA coherency values */
-   pdata->coherent = device_dma_is_coherent(pdata->dev);
if (pdata->coherent) {
pdata->axdomain = XGBE_DMA_OS_AXDOMAIN;
pdata->arcache = XGBE_DMA_OS_ARCACHE;
-- 
2.1.0



[PATCH V4 4/8] device property: Adding DMA Attribute APIs for Generic Devices

2015-10-21 Thread Suravee Suthikulpanit
The function device_dma_is_coherent() does not sufficiently
communicate device DMA attributes. Instead, this patch introduces
device_get_dma_attr(), which returns enum dev_dma_attr.
It replaces acpi_check_dma(), which will be removed in a
subsequent patch.

This also provides a convenience function, device_dma_supported(),
to check the DMA support of the specified device.

Signed-off-by: Suravee Suthikulpanit 
CC: Rafael J. Wysocki 
---
 drivers/base/property.c  | 29 +
 include/linux/property.h |  4 
 2 files changed, 33 insertions(+)

diff --git a/drivers/base/property.c b/drivers/base/property.c
index 2d75366..baac186 100644
--- a/drivers/base/property.c
+++ b/drivers/base/property.c
@@ -547,6 +547,35 @@ bool device_dma_is_coherent(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(device_dma_is_coherent);
 
+bool device_dma_supported(struct device *dev)
+{
+   /* For DT, this is always supported.
+* For ACPI, this depends on CCA, which
+* is determined by the acpi_dma_supported().
+*/
+   if (IS_ENABLED(CONFIG_OF) && dev->of_node)
+   return true;
+
+   return acpi_dma_supported(ACPI_COMPANION(dev));
+}
+EXPORT_SYMBOL_GPL(device_dma_supported);
+
+enum dev_dma_attr device_get_dma_attr(struct device *dev)
+{
+   enum dev_dma_attr attr = DEV_DMA_NOT_SUPPORTED;
+
+   if (IS_ENABLED(CONFIG_OF) && dev->of_node) {
+   if (of_dma_is_coherent(dev->of_node))
+   attr = DEV_DMA_COHERENT;
+   else
+   attr = DEV_DMA_NON_COHERENT;
+   } else
+   attr = acpi_get_dma_attr(ACPI_COMPANION(dev));
+
+   return attr;
+}
+EXPORT_SYMBOL_GPL(device_get_dma_attr);
+
 /**
  * device_get_phy_mode - Get phy mode for given device
  * @dev:   Pointer to the given device
diff --git a/include/linux/property.h b/include/linux/property.h
index 522c1bf..bde8de3 100644
--- a/include/linux/property.h
+++ b/include/linux/property.h
@@ -172,6 +172,10 @@ void device_add_property_set(struct device *dev, struct 
property_set *pset);
 
 bool device_dma_is_coherent(struct device *dev);
 
+bool device_dma_supported(struct device *dev);
+
+enum dev_dma_attr device_get_dma_attr(struct device *dev);
+
 int device_get_phy_mode(struct device *dev);
 
 void *device_get_mac_address(struct device *dev, char *addr, int alen);
-- 
2.1.0

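A short usage sketch (illustrative, not from the patch) showing that the
same call works for both DT- and ACPI-enumerated devices, much like the ccp
and xgbe changes in patch 5/8:

static int example_probe(struct platform_device *pdev)
{
	enum dev_dma_attr attr = device_get_dma_attr(&pdev->dev);
	bool coherent;

	if (attr == DEV_DMA_NOT_SUPPORTED) {
		dev_err(&pdev->dev, "DMA is not supported\n");
		return -ENODEV;
	}

	/* The enum hides whether DT ("dma-coherent") or ACPI (_CCA) set this */
	coherent = (attr == DEV_DMA_COHERENT);
	dev_dbg(&pdev->dev, "%scoherent DMA\n", coherent ? "" : "non-");

	return 0;
}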


[PATCH V4 3/8] acpi: Adding DMA Attribute APIs for ACPI Device

2015-10-21 Thread Suravee Suthikulpanit
Add acpi_get_dma_attr() to query the DMA attributes of an ACPI device.
It returns enum dev_dma_attr, which communicates the DMA information
more clearly. This API replaces acpi_check_dma(), which will be
removed in a subsequent patch.

This patch also provides a convenience function, acpi_dma_supported(),
to check the DMA support of the specified ACPI device.

Signed-off-by: Suravee Suthikulpanit 
Suggested-by: Bjorn Helgaas 
CC: Rafael J. Wysocki 
---
 drivers/acpi/scan.c | 42 ++
 include/acpi/acpi_bus.h |  3 +++
 include/linux/acpi.h| 10 ++
 3 files changed, 55 insertions(+)

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 01136b8..3be213e 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -1328,6 +1328,48 @@ void acpi_free_pnp_ids(struct acpi_device_pnp *pnp)
kfree(pnp->unique_id);
 }
 
+/**
+ * acpi_dma_supported - Check DMA support for the specified device.
+ * @adev: The pointer to acpi device
+ *
+ * Return false if DMA is not supported. Otherwise, return true
+ */
+bool acpi_dma_supported(struct acpi_device *adev)
+{
+   if (!adev)
+   return false;
+
+   if (adev->flags.cca_seen)
+   return true;
+
+   /*
+   * Per ACPI 6.0 sec 6.2.17, assume devices can do cache-coherent
+   * DMA on "Intel platforms".  Presumably that includes all x86 and
+   * ia64, and other arches will set CONFIG_ACPI_CCA_REQUIRED=y.
+   */
+   if (!IS_ENABLED(CONFIG_ACPI_CCA_REQUIRED))
+   return true;
+
+   return false;
+}
+
+/**
+ * acpi_get_dma_attr - Check the supported DMA attr for the specified device.
+ * @adev: The pointer to acpi device
+ *
+ * Return enum dev_dma_attr.
+ */
+enum dev_dma_attr acpi_get_dma_attr(struct acpi_device *adev)
+{
+   if (!acpi_dma_supported(adev))
+   return DEV_DMA_NOT_SUPPORTED;
+
+   if (adev->flags.coherent_dma)
+   return DEV_DMA_COHERENT;
+   else
+   return DEV_DMA_NON_COHERENT;
+}
+
 static void acpi_init_coherency(struct acpi_device *adev)
 {
unsigned long long cca = 0;
diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index 5a42204..13417d0 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -567,6 +567,9 @@ struct acpi_pci_root {
 
 /* helper */
 
+bool acpi_dma_supported(struct acpi_device *adev);
+enum dev_dma_attr acpi_get_dma_attr(struct acpi_device *adev);
+
 struct acpi_device *acpi_find_child_device(struct acpi_device *parent,
   u64 address, bool check_children);
 int acpi_is_root_bridge(acpi_handle);
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index dd39202..c47892c 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -569,6 +569,16 @@ static inline bool acpi_check_dma(struct acpi_device 
*adev, bool *coherent)
return false;
 }
 
+static inline bool acpi_dma_supported(struct acpi_device *adev)
+{
+   return false;
+}
+
+static inline enum dev_dma_attr acpi_get_dma_attr(struct acpi_device *adev)
+{
+   return DEV_DMA_NOT_SUPPORTED;
+}
+
 #define ACPI_PTR(_ptr) (NULL)
 
 #endif /* !CONFIG_ACPI */
-- 
2.1.0

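For illustration (not part of the patch), the typical two-call pattern on a
device's ACPI companion looks like this:

static bool example_acpi_dma_is_coherent(struct device *dev)
{
	struct acpi_device *adev = ACPI_COMPANION(dev);

	/* No companion, or no _CCA on a CCA-required arch: no DMA at all */
	if (!acpi_dma_supported(adev))
		return false;

	return acpi_get_dma_attr(adev) == DEV_DMA_COHERENT;
}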


[PATCH V3 1/6] pci: msi: Add support to query MSI domain for pci device

2015-10-21 Thread Suravee Suthikulpanit
This patch introduces an interface for an irqchip driver to register a
callback that provides a way to determine the appropriate MSI domain for
a PCI device.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/pci/msi.c   | 30 ++
 include/linux/msi.h |  7 +++
 2 files changed, 37 insertions(+)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index ddd59fe..2c87843 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1327,4 +1327,34 @@ struct irq_domain 
*pci_msi_create_default_irq_domain(struct fwnode_handle *fwnod
 
return domain;
 }
+
+static struct fwnode_handle *(*pci_msi_get_fwnode_cb)(struct device *dev);
+
+/**
+ * pci_msi_register_fwnode_provider - Register callback to retrieve fwnode
+ * @fn:The interrupt domain to retrieve
+ *
+ * This should be called by irqchip driver, which is the parent of
+ * the MSI domain to provide callback interface to query fwnode.
+ */
+void
+pci_msi_register_fwnode_provider(struct fwnode_handle *(*fn)(struct device *))
+{
+   pci_msi_get_fwnode_cb = fn;
+}
+
+/**
+ * pci_msi_get_fwnode - Query fwnode for MSI controller of the @dev
+ * @dev:   The device that we try to query MSI domain token for
+ *
+ * This is used to query MSI domain token when setting up MSI domain
+ * for a device. Returns fwnode_handle * if token found / NULL if not found
+ */
+struct fwnode_handle *pci_msi_get_fwnode(struct device *dev)
+{
+   if (pci_msi_get_fwnode_cb)
+   return pci_msi_get_fwnode_cb(dev);
+
+   return NULL;
+}
 #endif /* CONFIG_PCI_MSI_IRQ_DOMAIN */
diff --git a/include/linux/msi.h b/include/linux/msi.h
index 32a24b9..ceaebf6 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -3,6 +3,7 @@
 
 #include 
 #include 
+#include 
 
 struct msi_msg {
u32 address_lo; /* low 32 bits of msi message address */
@@ -294,6 +295,12 @@ irq_hw_number_t pci_msi_domain_calc_hwirq(struct pci_dev 
*dev,
  struct msi_desc *desc);
 int pci_msi_domain_check_cap(struct irq_domain *domain,
 struct msi_domain_info *info, struct device *dev);
+
+void
+pci_msi_register_fwnode_provider(struct fwnode_handle *(*fn)(struct device *));
+
+struct fwnode_handle *pci_msi_get_fwnode(struct device *dev);
+
 #endif /* CONFIG_PCI_MSI_IRQ_DOMAIN */
 
 #endif /* LINUX_MSI_H */
-- 
2.1.0



[PATCH V3 6/6] gicv2m: acpi: Introducing GICv2m ACPI support

2015-10-21 Thread Suravee Suthikulpanit
This patch introduces gicv2m_acpi_init(), which uses information
from the MADT GIC MSI frame structures to initialize the GICv2m driver.

Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Hanjun Guo 
---
 drivers/irqchip/irq-gic-v2m.c   | 95 +
 drivers/irqchip/irq-gic.c   |  3 ++
 include/linux/irqchip/arm-gic.h |  4 ++
 3 files changed, 102 insertions(+)

diff --git a/drivers/irqchip/irq-gic-v2m.c b/drivers/irqchip/irq-gic-v2m.c
index 7e60f7e..4f52e9a 100644
--- a/drivers/irqchip/irq-gic-v2m.c
+++ b/drivers/irqchip/irq-gic-v2m.c
@@ -15,9 +15,11 @@
 
 #define pr_fmt(fmt) "GICv2m: " fmt
 
+#include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -138,6 +140,11 @@ static int gicv2m_irq_gic_domain_alloc(struct irq_domain 
*domain,
fwspec.param[0] = 0;
fwspec.param[1] = hwirq - 32;
fwspec.param[2] = IRQ_TYPE_EDGE_RISING;
+   } else if (is_fwnode_irqchip(domain->parent->fwnode)) {
+   fwspec.fwnode = domain->parent->fwnode;
+   fwspec.param_count = 2;
+   fwspec.param[0] = hwirq;
+   fwspec.param[1] = IRQ_TYPE_EDGE_RISING;
} else {
return -EINVAL;
}
@@ -255,6 +262,8 @@ static void gicv2m_teardown(void)
kfree(v2m->bm);
iounmap(v2m->base);
of_node_put(to_of_node(v2m->fwnode));
+   if (is_fwnode_irqchip(v2m->fwnode))
+   irq_domain_free_fwnode(v2m->fwnode);
kfree(v2m);
}
 }
@@ -359,6 +368,8 @@ static int __init gicv2m_init_one(struct fwnode_handle 
*fwnode,
 
if (to_of_node(fwnode))
name = to_of_node(fwnode)->name;
+   else
+   name = irq_domain_get_irqchip_fwnode_name(fwnode);
 
pr_info("Frame %s: range[%#lx:%#lx], SPI[%d:%d]\n", name,
(unsigned long)res->start, (unsigned long)res->end,
@@ -415,3 +426,87 @@ int __init gicv2m_of_init(struct device_node *node, struct 
irq_domain *parent)
gicv2m_teardown();
return ret;
 }
+
+#ifdef CONFIG_ACPI
+static int acpi_num_msi;
+
+static struct fwnode_handle *gicv2m_get_fwnode(struct device *dev)
+{
+   struct v2m_data *data;
+
+   if (WARN_ON(acpi_num_msi <= 0))
+   return NULL;
+
+   /* We only return the fwnode of the first MSI frame. */
+   data = list_first_entry_or_null(&v2m_nodes, struct v2m_data, entry);
+   if (!data)
+   return NULL;
+
+   return data->fwnode;
+}
+
+static int __init
+acpi_parse_madt_msi(struct acpi_subtable_header *header,
+   const unsigned long end)
+{
+   int ret;
+   struct resource res;
+   u32 spi_start = 0, nr_spis = 0;
+   struct acpi_madt_generic_msi_frame *m;
+   struct fwnode_handle *fwnode = NULL;
+
+   m = (struct acpi_madt_generic_msi_frame *)header;
+   if (BAD_MADT_ENTRY(m, end))
+   return -EINVAL;
+
+   res.start = m->base_address;
+   res.end = m->base_address + 0x1000;
+
+   if (m->flags & ACPI_MADT_OVERRIDE_SPI_VALUES) {
+   spi_start = m->spi_base;
+   nr_spis = m->spi_count;
+
+   pr_info("ACPI overriding V2M MSI_TYPER (base:%u, num:%u)\n",
+   spi_start, nr_spis);
+   }
+
+   fwnode = irq_domain_alloc_fwnode((void *)m->base_address);
+   if (!fwnode) {
+   pr_err("Unable to allocate GICv2m domain token\n");
+   return -EINVAL;
+   }
+
+   ret = gicv2m_init_one(fwnode, spi_start, nr_spis, &res);
+   if (ret)
+   irq_domain_free_fwnode(fwnode);
+
+   return ret;
+}
+
+int __init gicv2m_acpi_init(struct irq_domain *parent)
+{
+   int ret;
+
+   if (acpi_num_msi > 0)
+   return 0;
+
+   acpi_num_msi = acpi_table_parse_madt(ACPI_MADT_TYPE_GENERIC_MSI_FRAME,
+ acpi_parse_madt_msi, 0);
+
+   if (acpi_num_msi <= 0)
+   goto err_out;
+
+   ret = gicv2m_allocate_domains(parent);
+   if (ret)
+   goto err_out;
+
+   pci_msi_register_fwnode_provider(&gicv2m_get_fwnode);
+
+   return 0;
+
+err_out:
+   gicv2m_teardown();
+   return -EINVAL;
+}
+
+#endif /* CONFIG_ACPI */
diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index 6685b33..bb3e1f2 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -1329,6 +1329,9 @@ gic_v2_acpi_init(struct acpi_table_header *table)
 
__gic_init_bases(0, -1, dist_base, cpu_base, 0, domain_handle);
 
+   if (IS_ENABLED(CONFIG_ARM_GIC_V2M))
+   gicv2m_acpi_init(gic_data[0].domain);
+
acpi_set_irq_model(ACPI_IRQ_MODEL_GIC, domain_handle);
return 0;
 }
diff --git a/include/linux/irq

[PATCH V3 5/6] gicv2m: Refactor to prepare for ACPI support

2015-10-21 Thread Suravee Suthikulpanit
This patch replaces struct device_node with struct fwnode_handle,
since the latter is common to both DT and ACPI.

It also refactors gicv2m_init_one() to prepare for ACPI support.
There should be no functional changes.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/irqchip/irq-gic-v2m.c | 57 +++
 1 file changed, 36 insertions(+), 21 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v2m.c b/drivers/irqchip/irq-gic-v2m.c
index 87f8d10..7e60f7e 100644
--- a/drivers/irqchip/irq-gic-v2m.c
+++ b/drivers/irqchip/irq-gic-v2m.c
@@ -55,7 +55,7 @@ static DEFINE_SPINLOCK(v2m_lock);
 
 struct v2m_data {
struct list_head entry;
-   struct device_node *node;
+   struct fwnode_handle *fwnode;
struct resource res;/* GICv2m resource */
void __iomem *base; /* GICv2m virt address */
u32 spi_start;  /* The SPI number that MSIs start */
@@ -254,7 +254,7 @@ static void gicv2m_teardown(void)
list_del(&v2m->entry);
kfree(v2m->bm);
iounmap(v2m->base);
-   of_node_put(v2m->node);
+   of_node_put(to_of_node(v2m->fwnode));
kfree(v2m);
}
 }
@@ -268,7 +268,7 @@ static int gicv2m_allocate_domains(struct irq_domain 
*parent)
if (!v2m)
return 0;
 
-   inner_domain = irq_domain_create_tree(of_node_to_fwnode(v2m->node),
+   inner_domain = irq_domain_create_tree(v2m->fwnode,
  &gicv2m_domain_ops, v2m);
if (!inner_domain) {
pr_err("Failed to create GICv2m domain\n");
@@ -277,10 +277,10 @@ static int gicv2m_allocate_domains(struct irq_domain 
*parent)
 
inner_domain->bus_token = DOMAIN_BUS_NEXUS;
inner_domain->parent = parent;
-   pci_domain = pci_msi_create_irq_domain(of_node_to_fwnode(v2m->node),
+   pci_domain = pci_msi_create_irq_domain(v2m->fwnode,
   &gicv2m_msi_domain_info,
   inner_domain);
-   plat_domain = 
platform_msi_create_irq_domain(of_node_to_fwnode(v2m->node),
+   plat_domain = platform_msi_create_irq_domain(v2m->fwnode,
 &gicv2m_pmsi_domain_info,
 inner_domain);
if (!pci_domain || !plat_domain) {
@@ -296,11 +296,13 @@ static int gicv2m_allocate_domains(struct irq_domain 
*parent)
return 0;
 }
 
-static int __init gicv2m_init_one(struct device_node *node,
- struct irq_domain *parent)
+static int __init gicv2m_init_one(struct fwnode_handle *fwnode,
+ u32 spi_start, u32 nr_spis,
+ struct resource *res)
 {
int ret;
struct v2m_data *v2m;
+   const char *name = NULL;
 
v2m = kzalloc(sizeof(struct v2m_data), GFP_KERNEL);
if (!v2m) {
@@ -309,13 +311,9 @@ static int __init gicv2m_init_one(struct device_node *node,
}
 
INIT_LIST_HEAD(&v2m->entry);
-   v2m->node = node;
+   v2m->fwnode = fwnode;
 
-   ret = of_address_to_resource(node, 0, &v2m->res);
-   if (ret) {
-   pr_err("Failed to allocate v2m resource.\n");
-   goto err_free_v2m;
-   }
+   memcpy(&v2m->res, res, sizeof(struct resource));
 
v2m->base = ioremap(v2m->res.start, resource_size(&v2m->res));
if (!v2m->base) {
@@ -324,10 +322,9 @@ static int __init gicv2m_init_one(struct device_node *node,
goto err_free_v2m;
}
 
-   if (!of_property_read_u32(node, "arm,msi-base-spi", &v2m->spi_start) &&
-   !of_property_read_u32(node, "arm,msi-num-spis", &v2m->nr_spis)) {
-   pr_info("Overriding V2M MSI_TYPER (base:%u, num:%u)\n",
-   v2m->spi_start, v2m->nr_spis);
+   if (spi_start && nr_spis) {
+   v2m->spi_start = spi_start;
+   v2m->nr_spis = nr_spis;
} else {
u32 typer = readl_relaxed(v2m->base + V2M_MSI_TYPER);
 
@@ -359,10 +356,13 @@ static int __init gicv2m_init_one(struct device_node 
*node,
}
 
list_add_tail(&v2m->entry, &v2m_nodes);
-   pr_info("Node %s: range[%#lx:%#lx], SPI[%d:%d]\n", node->name,
-   (unsigned long)v2m->res.start, (unsigned long)v2m->res.end,
-   v2m->spi_start, (v2m->spi_start + v2m->nr_spis));
 
+   if (to_of_node(fwnode))
+   name = to_of_node(fwnode)->name;
+
+   pr_info("Frame %s: range[%#lx:%#lx], SPI[%d:%d]\n", name,
+   (unsigned long)res->start, (unsigned long)res->

[PATCH V3 2/6] acpi: pci: Setup MSI domain for ACPI based pci devices

2015-10-21 Thread Suravee Suthikulpanit
This patch introduces pci_host_bridge_acpi_msi_domain(), which returns
the MSI domain of the specified PCI host bridge with the DOMAIN_BUS_PCI_MSI
bus token. The resulting domain is then assigned to the PCI device.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/pci/pci-acpi.c | 13 +
 drivers/pci/probe.c|  2 ++
 include/linux/pci.h|  7 +++
 3 files changed, 22 insertions(+)

diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index a32ba75..0e21ef4 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -9,7 +9,9 @@
 
 #include 
 #include 
+#include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -689,6 +691,17 @@ static struct acpi_bus_type acpi_pci_bus = {
.cleanup = pci_acpi_cleanup,
 };
 
+struct irq_domain *pci_host_bridge_acpi_msi_domain(struct pci_bus *bus)
+{
+   struct irq_domain *dom = NULL;
+   struct fwnode_handle *fwnode = pci_msi_get_fwnode(&bus->dev);
+
+   if (fwnode)
+   dom = irq_find_matching_fwnode(fwnode,
+  DOMAIN_BUS_PCI_MSI);
+   return dom;
+}
+
 static int __init acpi_pci_init(void)
 {
int ret;
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 0dbc7fb..bea1840 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -671,6 +671,8 @@ static struct irq_domain *pci_host_bridge_msi_domain(struct 
pci_bus *bus)
 * should be called from here.
 */
d = pci_host_bridge_of_msi_domain(bus);
+   if (!d)
+   d = pci_host_bridge_acpi_msi_domain(bus);
 
return d;
 }
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e90eb22..4a7f6a9 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1925,6 +1925,13 @@ static inline struct irq_domain *
 pci_host_bridge_of_msi_domain(struct pci_bus *bus) { return NULL; }
 #endif  /* CONFIG_OF */
 
+#ifdef CONFIG_ACPI
+struct irq_domain *pci_host_bridge_acpi_msi_domain(struct pci_bus *bus);
+#else
+static inline struct irq_domain *
+pci_host_bridge_acpi_msi_domain(struct pci_bus *bus) { return NULL; }
+#endif
+
 #ifdef CONFIG_EEH
 static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev)
 {
-- 
2.1.0



[PATCH V3 3/6] irqdomain: introduce is_fwnode_irqchip helper

2015-10-21 Thread Suravee Suthikulpanit
Since there will be several places checking whether fwnode.type
is equal to FWNODE_IRQCHIP, this patch adds a convenience function
for this purpose.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/irqchip/irq-gic.c | 2 +-
 include/linux/irqdomain.h | 5 +
 kernel/irq/irqdomain.c| 2 +-
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index 1d0e768..6685b33 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -939,7 +939,7 @@ static int gic_irq_domain_translate(struct irq_domain *d,
return 0;
}
 
-   if (fwspec->fwnode->type == FWNODE_IRQCHIP) {
+   if (is_fwnode_irqchip(fwspec->fwnode)) {
if(fwspec->param_count != 2)
return -EINVAL;
 
diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index d5e5c5b..4950a71 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -211,6 +211,11 @@ static inline struct fwnode_handle 
*of_node_to_fwnode(struct device_node *node)
return node ? &node->fwnode : NULL;
 }
 
+static inline bool is_fwnode_irqchip(struct fwnode_handle *fwnode)
+{
+   return fwnode && fwnode->type == FWNODE_IRQCHIP;
+}
+
 static inline struct irq_domain *irq_find_matching_host(struct device_node 
*node,
enum 
irq_domain_bus_token bus_token)
 {
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 22aa961..7f34d98 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -70,7 +70,7 @@ void irq_domain_free_fwnode(struct fwnode_handle *fwnode)
 {
struct irqchip_fwid *fwid;
 
-   if (WARN_ON(fwnode->type != FWNODE_IRQCHIP))
+   if (WARN_ON(!is_fwnode_irqchip(fwnode)))
return;
 
fwid = container_of(fwnode, struct irqchip_fwid, fwnode);
-- 
2.1.0



[PATCH V3 0/6] gicv2m: acpi: Add ACPI support for GICv2m MSI

2015-10-21 Thread Suravee Suthikulpanit
This patch series has been forked from the following patch series since
it no longer depends on the rest of the patches.

  [PATCH v4 00/10] ACPI GIC Self-probing, GICv2m and GICv3 support
  https://lkml.org/lkml/2015/7/29/234

It has been ported to use the newly introduced device fwnode_handle
for ACPI irqdomain support introduced by Marc in the following patch series:

  [PATCH v2 00/17] Divorcing irqdomain and device_node
  http://git.kernel.org/cgit/linux/kernel/git/maz/arm-platforms.git 
irq/irq-domain-fwnode-v2

The following git branch contains the submitted patches along with
the prerequisite patches (mainly for ARM64 PCI support for ACPI).

  https://github.com/ssuthiku/linux.git v2m-multiframe-v3

This has been tested on an AMD Seattle (Overdrive) RevB system.

NOTE: I have not tested ACPI GICv2m multiframe support since
I don't have access to such a system. Any help is appreciated.

Thanks,
Suravee

Changes from V2: (https://lkml.org/lkml/2015/10/14/1010)
  - Minor clean up from Tomasz review comment in patch 6/6.

Changes from V1: (https://lkml.org/lkml/2015/10/13/859)
  - Rebase on top of Marc's patch to addng support for multiple MSI frames
(https://lkml.org/lkml/2015/10/14/271)
  - Adding fwnode convenient functions (patch 3 and 4)


Suravee Suthikulpanit (6):
  pci: msi: Add support to query MSI domain for pci device
  acpi: pci: Setup MSI domain for ACPI based pci devices
  irqdomain: introduce is_fwnode_irqchip helper
  irqdomain: Introduce irq_domain_get_irqchip_fwnode_name helper
function
  gicv2m: Refactor to prepare for ACPI support
  gicv2m: acpi: Introducing GICv2m ACPI support

 drivers/irqchip/irq-gic-v2m.c   | 152 ++--
 drivers/irqchip/irq-gic.c   |   5 +-
 drivers/pci/msi.c   |  30 
 drivers/pci/pci-acpi.c  |  13 
 drivers/pci/probe.c |   2 +
 include/linux/irqchip/arm-gic.h |   4 ++
 include/linux/irqdomain.h   |   6 ++
 include/linux/msi.h |   7 ++
 include/linux/pci.h |   7 ++
 kernel/irq/irqdomain.c  |  20 +-
 10 files changed, 223 insertions(+), 23 deletions(-)

-- 
2.1.0



[PATCH V3 4/6] irqdomain: Introduce irq_domain_get_irqchip_fwnode_name helper function

2015-10-21 Thread Suravee Suthikulpanit
This patch adds an accessor function to retrieve struct irqchip_fwid.name.

Signed-off-by: Suravee Suthikulpanit 
---
 include/linux/irqdomain.h |  1 +
 kernel/irq/irqdomain.c| 18 ++
 2 files changed, 19 insertions(+)

diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index 4950a71..006633d 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -187,6 +187,7 @@ static inline struct device_node 
*irq_domain_get_of_node(struct irq_domain *d)
 #ifdef CONFIG_IRQ_DOMAIN
 struct fwnode_handle *irq_domain_alloc_fwnode(void *data);
 void irq_domain_free_fwnode(struct fwnode_handle *fwnode);
+const char *irq_domain_get_irqchip_fwnode_name(struct fwnode_handle *fwnode);
 struct irq_domain *__irq_domain_add(struct fwnode_handle *fwnode, int size,
irq_hw_number_t hwirq_max, int direct_max,
const struct irq_domain_ops *ops,
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 7f34d98..a8c1cf6 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -79,6 +79,24 @@ void irq_domain_free_fwnode(struct fwnode_handle *fwnode)
 }
 
 /**
+ * irq_domain_get_irqchip_fwnode_name - Retrieve associated name of
+ *  specified irqchip fwnode
+ * @fwnode: Specified fwnode_handle
+ *
+ * Returns associated name of the specified fwnode, or NULL on failure.
+ */
+const char *irq_domain_get_irqchip_fwnode_name(struct fwnode_handle *fwnode)
+{
+   struct irqchip_fwid *fwid;
+
+   if (!is_fwnode_irqchip(fwnode))
+   return NULL;
+
+   fwid = container_of(fwnode, struct irqchip_fwid, fwnode);
+   return fwid->name;
+}
+
+/**
  * __irq_domain_add() - Allocate a new irq_domain data structure
  * @of_node: optional device-tree node of the interrupt controller
  * @size: Size of linear map; 0 for radix mapping only
-- 
2.1.0

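A short sketch of the intended use; it mirrors what gicv2m_init_one() does
in patch 6/6 (the wrapper itself is illustrative, not part of this patch):

static const char *example_frame_name(struct fwnode_handle *fwnode)
{
	/* DT case: the node carries its own name */
	if (to_of_node(fwnode))
		return to_of_node(fwnode)->name;

	/* ACPI case: the fwnode came from irq_domain_alloc_fwnode() */
	return irq_domain_get_irqchip_fwnode_name(fwnode);
}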


Re: [PATCH V3 2/4] ACPI/scan: Clean up acpi_check_dma

2015-10-20 Thread Suravee Suthikulpanit

Hi Bjorn/Rafael,

Let me redo the patch with enum then. At least, that's more clear to 
everyone.


Thanks,

Suravee

On 10/19/15 21:17, Bjorn Helgaas wrote:

On Tue, Oct 13, 2015 at 06:53:28PM -0500, Suravee Suthikulanit wrote:

Bjorn / Rafael,

On 10/13/2015 10:52 AM, Suravee Suthikulpanit wrote:


On 09/14/2015 09:34 AM, Bjorn Helgaas wrote:

[..]
I think acpi_check_dma_coherency() is better, but only slightly.  It
still doesn't give a hint about the *sense* of the return value.  I
think it'd be easier to read if there were two functions, e.g.,


I have been going back and forth between the current version and the
two-function approach in the past. I can definitely go with this route
if you would prefer. Although, if acpi_dma_is_coherent() == 0, it would
be ambiguous whether DMA is not supported or non-coherent DMA is
supported; then we would need to call acpi_dma_is_supported() to find
out. Is that okay with you?


Thinking about this again, I still think having one API (which can
tell whether DMA is supported or not, and if so whether it is
coherent or non-coherent) would be the least confusing and least
error prone.

What if we would just have:

 enum dev_dma_type acpi_get_dev_dma_type(struct acpi_device *adev);

where:
 enum dev_dma_type {
 DEV_DMA_NOT_SUPPORTED,
 DEV_DMA_NON_COHERENT,
 DEV_DMA_COHERENT,
 };

This would probably mean that we should modify
drivers/base/property.c to replace:
 bool device_dma_is_coherent()
to:
 enum dev_dma_type device_get_dma_type()

We used to discuss the enum approach in the past
(https://lkml.org/lkml/2015/8/25/868). But we only considered at the
ACPI level at the time. Actually, this should also reflect in the
property.c.

At this point, only drivers/crypto/ccp/ccp-platform.c and
drivers/net/ethernet/amd/xgbe/xgbe-main.c are calling the
device_dma_is_coherent(). So, it should be easy to change this API.


OK, I'm fine with either the enum or Rafael's 0/1/-ENOTSUPP idea.

Bjorn
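
For reference, a minimal sketch of how a consumer might use the enum-based API
discussed above; enum dev_dma_type and acpi_get_dev_dma_type() are the proposed
names from this thread (not an existing interface), and example_setup_dma() is
purely illustrative:

#include <linux/errno.h>
#include <linux/types.h>
#include <acpi/acpi_bus.h>

/* Proposed (not yet merged) API, as sketched in the discussion above. */
enum dev_dma_type {
	DEV_DMA_NOT_SUPPORTED,
	DEV_DMA_NON_COHERENT,
	DEV_DMA_COHERENT,
};
enum dev_dma_type acpi_get_dev_dma_type(struct acpi_device *adev);

/* Hypothetical consumer: one call answers both "is DMA supported?" and
 * "is it coherent?", removing the ambiguity of a boolean return. */
static int example_setup_dma(struct acpi_device *adev, bool *coherent)
{
	switch (acpi_get_dev_dma_type(adev)) {
	case DEV_DMA_NOT_SUPPORTED:
		return -ENODEV;
	case DEV_DMA_NON_COHERENT:
		*coherent = false;
		return 0;
	case DEV_DMA_COHERENT:
		*coherent = true;
		return 0;
	}
	return -EINVAL;
}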


[PATCH 2/3] iommu/amd: Add support for higher 64-bit IOMMU Control Register

2018-06-22 Thread Suravee Suthikulpanit
Currently, the driver only supports the lower 32 bits of the IOMMU Control
register. However, the newer AMD IOMMU specification has extended this register
to 64 bits. Therefore, replace the access API with the 64-bit version.

Cc: Joerg Roedel 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd_iommu_init.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 904c575..7d494f2 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -280,9 +280,9 @@ static void clear_translation_pre_enabled(struct amd_iommu 
*iommu)
 
 static void init_translation_status(struct amd_iommu *iommu)
 {
-   u32 ctrl;
+   u64 ctrl;
 
-   ctrl = readl(iommu->mmio_base + MMIO_CONTROL_OFFSET);
+   ctrl = readq(iommu->mmio_base + MMIO_CONTROL_OFFSET);
	if (ctrl & (1<<CONTROL_IOMMU_EN))
		iommu->flags |= AMD_IOMMU_FLAG_TRANS_PRE_ENABLED;
 }
@@ -386,30 +386,30 @@ static void iommu_set_device_table(struct amd_iommu 
*iommu)
 /* Generic functions to enable/disable certain features of the IOMMU. */
 static void iommu_feature_enable(struct amd_iommu *iommu, u8 bit)
 {
-   u32 ctrl;
+   u64 ctrl;
 
-   ctrl = readl(iommu->mmio_base + MMIO_CONTROL_OFFSET);
-   ctrl |= (1 << bit);
-   writel(ctrl, iommu->mmio_base + MMIO_CONTROL_OFFSET);
+   ctrl = readq(iommu->mmio_base +  MMIO_CONTROL_OFFSET);
+   ctrl |= (1ULL << bit);
+   writeq(ctrl, iommu->mmio_base +  MMIO_CONTROL_OFFSET);
 }
 
 static void iommu_feature_disable(struct amd_iommu *iommu, u8 bit)
 {
-   u32 ctrl;
+   u64 ctrl;
 
-   ctrl = readl(iommu->mmio_base + MMIO_CONTROL_OFFSET);
-   ctrl &= ~(1 << bit);
-   writel(ctrl, iommu->mmio_base + MMIO_CONTROL_OFFSET);
+   ctrl = readq(iommu->mmio_base + MMIO_CONTROL_OFFSET);
+   ctrl &= ~(1ULL << bit);
+   writeq(ctrl, iommu->mmio_base + MMIO_CONTROL_OFFSET);
 }
 
 static void iommu_set_inv_tlb_timeout(struct amd_iommu *iommu, int timeout)
 {
-   u32 ctrl;
+   u64 ctrl;
 
-   ctrl = readl(iommu->mmio_base + MMIO_CONTROL_OFFSET);
+   ctrl = readq(iommu->mmio_base + MMIO_CONTROL_OFFSET);
ctrl &= ~CTRL_INV_TO_MASK;
ctrl |= (timeout << CONTROL_INV_TIMEOUT) & CTRL_INV_TO_MASK;
-   writel(ctrl, iommu->mmio_base + MMIO_CONTROL_OFFSET);
+   writeq(ctrl, iommu->mmio_base + MMIO_CONTROL_OFFSET);
 }
 
 /* Function to enable the hardware */
-- 
2.7.4
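
As an aside, the reason iommu_feature_enable()/iommu_feature_disable() switch
from "1 << bit" to "1ULL << bit" is that control bits can now live above bit 31;
a standalone userspace illustration (CONTROL_EXAMPLE_BIT is a made-up name):

#include <stdio.h>
#include <inttypes.h>

#define CONTROL_EXAMPLE_BIT 45	/* made-up control bit above bit 31 */

int main(void)
{
	/* "1 << 45" would overflow a 32-bit int (undefined behaviour), so a
	 * 64-bit register update must build its mask with a 64-bit constant. */
	uint64_t mask = 1ULL << CONTROL_EXAMPLE_BIT;

	printf("control bit mask: 0x%016" PRIx64 "\n", mask);
	return 0;
}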



Re: [PATCH v3] vfio/type1: Adopt fast IOTLB flush interface when unmap IOVAs

2018-01-23 Thread Suravee Suthikulpanit

Alex/Joerg,

On 1/24/18 5:04 AM, Alex Williamson wrote:

+static size_t try_unmap_unpin_fast(struct vfio_domain *domain, dma_addr_t iova,
+  size_t len, phys_addr_t phys,
+  struct list_head *unmapped_regions)
+{
+   struct vfio_regions *entry;
+   size_t unmapped;
+
+   entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+   if (!entry)
+   return -ENOMEM;

size_t is unsigned, so pushing -ENOMEM out though this unsigned
function and the callee interpreting it as unsigned, means we're going
to see this as a very large unmap, not an error condition.  Looks like
the IOMMU API has problems in this space too, ex. iommu_unmap(), Joerg?




I can clean up the APIs to use ssize_t, where it can return errors.

Suravee
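
A standalone illustration of the problem pointed out above (userspace sketch,
not kernel code; broken_unmap() is a made-up stand-in): returning a negative
errno through an unsigned size_t turns the error into a huge "unmapped" count.

#include <stdio.h>
#include <stddef.h>

#define EX_ENOMEM 12	/* stand-in for the kernel's ENOMEM */

static size_t broken_unmap(void)
{
	return -EX_ENOMEM;	/* silently wraps to SIZE_MAX - 11 */
}

int main(void)
{
	size_t unmapped = broken_unmap();

	/* The caller cannot tell this apart from a very large successful unmap. */
	if (unmapped)
		printf("unmapped %zu bytes\n", unmapped);
	return 0;
}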


Re: [PATCH v3] vfio/type1: Adopt fast IOTLB flush interface when unmap IOVAs

2018-01-24 Thread Suravee Suthikulpanit

Alex / Joerg,

On 1/24/18 5:04 AM, Alex Williamson wrote:

@@ -648,12 +685,40 @@ static int vfio_iommu_type1_unpin_pages(void *iommu_data,
return i > npage ? npage : (i > 0 ? i : -EINVAL);
  }
  
+static size_t try_unmap_unpin_fast(struct vfio_domain *domain, dma_addr_t iova,

+  size_t len, phys_addr_t phys,
+  struct list_head *unmapped_regions)
+{
+   struct vfio_regions *entry;
+   size_t unmapped;
+
+   entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+   if (!entry)
+   return -ENOMEM;
+
+   unmapped = iommu_unmap_fast(domain->domain, iova, len);
+   if (WARN_ON(!unmapped)) {
+   kfree(entry);
+   return -EINVAL;
+   }

Not sure about the handling of this, the zero check is just a
consistency validation.  If there's nothing mapped where we think there
should be something mapped, we warn and throw out the whole vfio_dma.
After this patch, such an error gets warned twice, which doesn't
really seem to be an improvement.



Since iommu_unmap() and iommu_unmap_fast() can return errors, instead of just
the zero check we should also check for errors, warn, and bail out of the whole
vfio_dma.

Thanks,
Suravee


Re: [RFC PATCH v2 2/2] iommu/amd: Add support for fast IOTLB flushing

2018-01-24 Thread Suravee Suthikulpanit

Hi Joerg,

On 12/27/17 4:20 PM, Suravee Suthikulpanit wrote:

Implement the newly added IOTLB flushing interface for AMD IOMMU.

Signed-off-by: Suravee Suthikulpanit 
---
  drivers/iommu/amd_iommu.c   | 73 -
  drivers/iommu/amd_iommu_init.c  |  7 
  drivers/iommu/amd_iommu_types.h |  7 
  3 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 7d5eb00..42fe365 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
 ...
@@ -3163,6 +3168,69 @@ static bool amd_iommu_is_attach_deferred(struct 
iommu_domain *domain,
return dev_data->defer_attach;
  }
  
+static void amd_iommu_flush_iotlb_all(struct iommu_domain *domain)

+{
+   struct protection_domain *dom = to_pdomain(domain);
+
+   domain_flush_tlb_pde(dom);
+}


I made a mistake here ...


...
+static void amd_iommu_iotlb_sync(struct iommu_domain *domain)
+{
+   struct protection_domain *pdom = to_pdomain(domain);
+   struct amd_iommu_flush_entries *entry, *next;
+   unsigned long flags;
+
+   /* Note:
+* Currently, IOMMU driver just flushes the whole IO/TLB for
+* a given domain. So, just remove entries from the list here.
+*/
+   spin_lock_irqsave(&amd_iommu_flush_list_lock, flags);
+   list_for_each_entry_safe(entry, next, &amd_iommu_flush_list, list) {
+   list_del(&entry->list);
+   kfree(entry);
+   }
+   spin_unlock_irqrestore(&amd_iommu_flush_list_lock, flags);
+
+   domain_flush_tlb_pde(pdom);
+}


... and here where I should also call domain_flush_complete() after
domain_flush_tlb_pde(). I'll send out a new version (v3) as a separate patch
from the series.

Thanks,
Suravee
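
In other words, the corrected sequence pairs the domain TLB flush with a
completion wait, as the v3 patch later in this thread does; a minimal sketch
using the driver-internal helpers (example_flush_domain() is illustrative only):

static void example_flush_domain(struct protection_domain *pdom)
{
	domain_flush_tlb_pde(pdom);	/* queue INVALIDATE_IOMMU_PAGES for the domain */
	domain_flush_complete(pdom);	/* wait until the IOMMU has processed it */
}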


[PATCH 1/2] iommu: Fix iommu_unmap and iommu_unmap_fast return type

2018-01-30 Thread Suravee Suthikulpanit
Currently, iommu_unmap and iommu_unmap_fast return the number of unmapped
pages as size_t. However, the actual value returned could be an error
code (< 0), which can be misinterpreted as a large number of unmapped pages.
Therefore, change the return type to ssize_t.

Cc: Joerg Roedel 
Cc: Alex Williamson 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd_iommu.c   |  6 +++---
 drivers/iommu/intel-iommu.c |  4 ++--
 drivers/iommu/iommu.c   | 16 
 include/linux/iommu.h   | 20 ++--
 4 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 7d5eb00..3609f51 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3030,11 +3030,11 @@ static int amd_iommu_map(struct iommu_domain *dom, 
unsigned long iova,
return ret;
 }
 
-static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,
-  size_t page_size)
+static ssize_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,
+  size_t page_size)
 {
struct protection_domain *domain = to_pdomain(dom);
-   size_t unmap_size;
+   ssize_t unmap_size;
 
if (domain->mode == PAGE_MODE_NONE)
return -EINVAL;
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 4a2de34..15ba866 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5068,8 +5068,8 @@ static int intel_iommu_map(struct iommu_domain *domain,
return ret;
 }
 
-static size_t intel_iommu_unmap(struct iommu_domain *domain,
-   unsigned long iova, size_t size)
+static ssize_t intel_iommu_unmap(struct iommu_domain *domain,
+unsigned long iova, size_t size)
 {
struct dmar_domain *dmar_domain = to_dmar_domain(domain);
struct page *freelist = NULL;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3de5c0b..8f7da8a 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1557,12 +1557,12 @@ int iommu_map(struct iommu_domain *domain, unsigned 
long iova,
 }
 EXPORT_SYMBOL_GPL(iommu_map);
 
-static size_t __iommu_unmap(struct iommu_domain *domain,
-   unsigned long iova, size_t size,
-   bool sync)
+static ssize_t __iommu_unmap(struct iommu_domain *domain,
+unsigned long iova, size_t size,
+bool sync)
 {
const struct iommu_ops *ops = domain->ops;
-   size_t unmapped_page, unmapped = 0;
+   ssize_t unmapped_page, unmapped = 0;
unsigned long orig_iova = iova;
unsigned int min_pagesz;
 
@@ -1617,15 +1617,15 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
return unmapped;
 }
 
-size_t iommu_unmap(struct iommu_domain *domain,
-  unsigned long iova, size_t size)
+ssize_t iommu_unmap(struct iommu_domain *domain,
+   unsigned long iova, size_t size)
 {
return __iommu_unmap(domain, iova, size, true);
 }
 EXPORT_SYMBOL_GPL(iommu_unmap);
 
-size_t iommu_unmap_fast(struct iommu_domain *domain,
-   unsigned long iova, size_t size)
+ssize_t iommu_unmap_fast(struct iommu_domain *domain,
+unsigned long iova, size_t size)
 {
return __iommu_unmap(domain, iova, size, false);
 }
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 41b8c57..78df048 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -199,8 +199,8 @@ struct iommu_ops {
void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
int (*map)(struct iommu_domain *domain, unsigned long iova,
   phys_addr_t paddr, size_t size, int prot);
-   size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
-size_t size);
+   ssize_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
+size_t size);
size_t (*map_sg)(struct iommu_domain *domain, unsigned long iova,
 struct scatterlist *sg, unsigned int nents, int prot);
void (*flush_iotlb_all)(struct iommu_domain *domain);
@@ -299,10 +299,10 @@ extern void iommu_detach_device(struct iommu_domain 
*domain,
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
 phys_addr_t paddr, size_t size, int prot);
-extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
- size_t size);
-extern size_t iommu_unmap_fast(struct iommu_domain *domain,
-  unsigned long iova, size_t size);
+extern ssize_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
+  size_t size);
+extern ssize_t iommu_unmap_fast(struct iommu_domai

[PATCH 0/2] iommu / vfio: Clean up iommu_unmap[_fast] interface

2018-01-30 Thread Suravee Suthikulpanit
Change the iommu_unmap[_fast] interfaces' return type to ssize_t since
they can also return error codes.

Cc: Joerg Roedel 
Cc: Alex Williamson 

Suravee Suthikulpanit (2):
  iommu: Fix iommu_unmap and iommu_unmap_fast return type
  vfio/type1: Add iommu_unmap error check when vfio_unmap_unpin

 drivers/iommu/amd_iommu.c   |  6 +++---
 drivers/iommu/intel-iommu.c |  4 ++--
 drivers/iommu/iommu.c   | 16 
 drivers/vfio/vfio_iommu_type1.c |  5 +++--
 include/linux/iommu.h   | 20 ++--
 5 files changed, 26 insertions(+), 25 deletions(-)

-- 
1.8.3.1



[PATCH 2/2] vfio/type1: Add iommu_unmap error check when vfio_unmap_unpin

2018-01-30 Thread Suravee Suthikulpanit
Besides zero-checking the number of unmapped pages, also check
and handle iommu_unmap errors.

Cc: Alex Williamson 
Cc: Joerg Roedel 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/vfio/vfio_iommu_type1.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index e30e29a..c580518 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -677,7 +677,8 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, 
struct vfio_dma *dma,
}
 
while (iova < end) {
-   size_t unmapped, len;
+   ssize_t unmapped;
+   size_t len;
phys_addr_t phys, next;
 
phys = iommu_iova_to_phys(domain->domain, iova);
@@ -699,7 +700,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, 
struct vfio_dma *dma,
}
 
unmapped = iommu_unmap(domain->domain, iova, len);
-   if (WARN_ON(!unmapped))
+   if (WARN_ON(unmapped <= 0))
break;
 
unlocked += vfio_unpin_pages_remote(dma, iova,
-- 
1.8.3.1



[PATCH v3] iommu/amd: Add support for fast IOTLB flushing

2018-01-30 Thread Suravee Suthikulpanit
Implement the newly added IOTLB flushing interface for AMD IOMMU.

Cc: Joerg Roedel 
Signed-off-by: Suravee Suthikulpanit 
---
Changes from v2 (https://lkml.org/lkml/2017/12/27/44)
 * Call domain_flush_complete() after domain_flush_tlb_pde().

 drivers/iommu/amd_iommu.c   | 77 +++--
 drivers/iommu/amd_iommu_init.c  |  7 
 drivers/iommu/amd_iommu_types.h |  7 
 3 files changed, 88 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 3609f51..6c7ac3f 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -129,6 +129,12 @@ struct dma_ops_domain {
 static struct iova_domain reserved_iova_ranges;
 static struct lock_class_key reserved_rbtree_key;
 
+struct amd_iommu_flush_entries {
+   struct list_head list;
+   unsigned long iova;
+   size_t size;
+};
+
 /
  *
  * Helper functions
@@ -3043,9 +3049,6 @@ static ssize_t amd_iommu_unmap(struct iommu_domain *dom, 
unsigned long iova,
unmap_size = iommu_unmap_page(domain, iova, page_size);
mutex_unlock(&domain->api_lock);
 
-   domain_flush_tlb_pde(domain);
-   domain_flush_complete(domain);
-
return unmap_size;
 }
 
@@ -3163,6 +3166,71 @@ static bool amd_iommu_is_attach_deferred(struct 
iommu_domain *domain,
return dev_data->defer_attach;
 }
 
+static void amd_iommu_flush_iotlb_all(struct iommu_domain *domain)
+{
+   struct protection_domain *dom = to_pdomain(domain);
+
+   domain_flush_tlb_pde(dom);
+   domain_flush_complete(dom);
+}
+
+static void amd_iommu_iotlb_range_add(struct iommu_domain *domain,
+ unsigned long iova, size_t size)
+{
+   struct amd_iommu_flush_entries *entry, *p;
+   unsigned long flags;
+   bool found = false;
+
+   spin_lock_irqsave(&amd_iommu_flush_list_lock, flags);
+   list_for_each_entry(p, &amd_iommu_flush_list, list) {
+   if (iova != p->iova)
+   continue;
+
+   if (size > p->size) {
+   p->size = size;
+   pr_debug("%s: update range: iova=%#lx, size = %#lx\n",
+__func__, p->iova, p->size);
+   }
+   found = true;
+   break;
+   }
+
+   if (!found) {
+   entry = kzalloc(sizeof(struct amd_iommu_flush_entries),
+   GFP_KERNEL);
+   if (entry) {
+   pr_debug("%s: new range: iova=%lx, size=%#lx\n",
+__func__, iova, size);
+
+   entry->iova = iova;
+   entry->size = size;
+   list_add(&entry->list, &amd_iommu_flush_list);
+   }
+   }
+   spin_unlock_irqrestore(&amd_iommu_flush_list_lock, flags);
+}
+
+static void amd_iommu_iotlb_sync(struct iommu_domain *domain)
+{
+   struct protection_domain *pdom = to_pdomain(domain);
+   struct amd_iommu_flush_entries *entry, *next;
+   unsigned long flags;
+
+   /* Note:
+* Currently, IOMMU driver just flushes the whole IO/TLB for
+* a given domain. So, just remove entries from the list here.
+*/
+   spin_lock_irqsave(&amd_iommu_flush_list_lock, flags);
+   list_for_each_entry_safe(entry, next, &amd_iommu_flush_list, list) {
+   list_del(&entry->list);
+   kfree(entry);
+   }
+   spin_unlock_irqrestore(&amd_iommu_flush_list_lock, flags);
+
+   domain_flush_tlb_pde(pdom);
+   domain_flush_complete(pdom);
+}
+
 const struct iommu_ops amd_iommu_ops = {
.capable = amd_iommu_capable,
.domain_alloc = amd_iommu_domain_alloc,
@@ -3181,6 +3249,9 @@ static bool amd_iommu_is_attach_deferred(struct 
iommu_domain *domain,
.apply_resv_region = amd_iommu_apply_resv_region,
.is_attach_deferred = amd_iommu_is_attach_deferred,
.pgsize_bitmap  = AMD_IOMMU_PGSIZES,
+   .flush_iotlb_all = amd_iommu_flush_iotlb_all,
+   .iotlb_range_add = amd_iommu_iotlb_range_add,
+   .iotlb_sync = amd_iommu_iotlb_sync,
 };
 
 /*
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 6fe2d03..e8f8cee 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -185,6 +185,12 @@ struct ivmd_header {
 bool amd_iommu_force_isolation __read_mostly;
 
 /*
+ * IOTLB flush list
+ */
+LIST_HEAD(amd_iommu_flush_list);
+spinlock_t amd_iommu_flush_list_lock;
+
+/*
  * List of protection domains - used during resume
  */
 LIST_HEAD(amd_iommu_pd_list);
@@ -2490,6 +2496,7 @@ static int __init early_amd_iommu_init(void)
__set_bit(0, 

[PATCH v4] vfio/type1: Adopt fast IOTLB flush interface when unmap IOVAs

2018-01-31 Thread Suravee Suthikulpanit
Currently, VFIO IOMMU type1 unmaps IOVA pages synchronously, which requires
an IOTLB flush for every IOVA unmap. This results in a large number of IOTLB
flushes during initialization of pass-through devices.

This can be avoided using the asynchronous (fast) IOTLB flush interface.

Cc: Alex Williamson 
Cc: Joerg Roedel 
Signed-off-by: Suravee Suthikulpanit 
---

Changes from v3 (https://lkml.org/lkml/2018/1/21/244)
 * Refactor the code to unmap_unpin_fast() and unmap_unpin_slow()
   to improve code readability.
 * Fix logic in vfio_unmap_unpin() to fallback to unmap_unpin_slow()
   only for the failing iova unmapping, and continue the next unmapping
   with the unmap_unpin_fast(). (per Alex)
 * Fix error handling in case of failing to do fast unmapping to warn
   only once.
 * Remove reference to GPU in the commit message.

 drivers/vfio/vfio_iommu_type1.c | 127 
 1 file changed, 116 insertions(+), 11 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index c580518..bec8512 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -102,6 +102,13 @@ struct vfio_pfn {
	atomic_t		ref_count;
 };
 
+struct vfio_regions {
+   struct list_head list;
+   dma_addr_t iova;
+   phys_addr_t phys;
+   size_t len;
+};
+
 #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)\
(!list_empty(&iommu->domain_list))
 
@@ -479,6 +486,29 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, 
dma_addr_t iova,
return unlocked;
 }
 
+static long vfio_sync_unpin(struct vfio_dma *dma, struct vfio_domain *domain,
+   struct list_head *regions)
+{
+   long unlocked = 0;
+   struct vfio_regions *entry, *next;
+
+   iommu_tlb_sync(domain->domain);
+
+   list_for_each_entry_safe(entry, next, regions, list) {
+   unlocked += vfio_unpin_pages_remote(dma,
+   entry->iova,
+   entry->phys >> PAGE_SHIFT,
+   entry->len >> PAGE_SHIFT,
+   false);
+   list_del(&entry->list);
+   kfree(entry);
+   }
+
+   cond_resched();
+
+   return unlocked;
+}
+
 static int vfio_pin_page_external(struct vfio_dma *dma, unsigned long vaddr,
  unsigned long *pfn_base, bool do_accounting)
 {
@@ -648,12 +678,78 @@ static int vfio_iommu_type1_unpin_pages(void *iommu_data,
return i > npage ? npage : (i > 0 ? i : -EINVAL);
 }
 
+static ssize_t unmap_unpin_slow(struct vfio_domain *domain,
+   struct vfio_dma *dma, dma_addr_t *iova,
+   size_t len, phys_addr_t phys,
+   long *unlocked)
+{
+   ssize_t unmapped = iommu_unmap(domain->domain, *iova, len);
+
+   if (unmapped <= 0)
+   return unmapped;
+
+   *unlocked += vfio_unpin_pages_remote(dma, *iova,
+phys >> PAGE_SHIFT,
+unmapped >> PAGE_SHIFT,
+false);
+   *iova += unmapped;
+   cond_resched();
+   return unmapped;
+}
+
+/*
+ * Generally, VFIO needs to unpin remote pages after each IOTLB flush.
+ * Therefore, when using the IOTLB flush sync interface, VFIO needs to keep track
+ * of these regions (currently using a list).
+ *
+ * This value specifies maximum number of regions for each IOTLB flush sync.
+ */
+#define VFIO_IOMMU_TLB_SYNC_MAX		512
+
+static ssize_t unmap_unpin_fast(struct vfio_domain *domain,
+   struct vfio_dma *dma, dma_addr_t *iova,
+   size_t len, phys_addr_t phys,
+   struct list_head *unmapped_regions,
+   long *unlocked, int *cnt)
+{
+   struct vfio_regions *entry;
+   ssize_t unmapped;
+
+   entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+   if (!entry)
+   return -ENOMEM;
+
+   unmapped = iommu_unmap_fast(domain->domain, *iova, len);
+   if (unmapped <= 0) {
+   kfree(entry);
+   } else {
+   iommu_tlb_range_add(domain->domain, *iova, unmapped);
+   entry->iova = *iova;
+   entry->phys = phys;
+   entry->len  = unmapped;
+   list_add_tail(&entry->list, unmapped_regions);
+
+   *iova += unmapped;
+   (*cnt)++;
+   }
+
+   if (*cnt >= VFIO_IOMMU_TLB_SYNC_MAX || unmapped <= 0) {
+   *unlocked += vfio_sync_unpin(dma, domain,
+unmapp

Re: [PATCH 1/2] iommu: Fix iommu_unmap and iommu_unmap_fast return type

2018-01-31 Thread Suravee Suthikulpanit

Hi Robin,

On 2/1/18 1:02 AM, Robin Murphy wrote:

Hi Suravee,

On 31/01/18 01:48, Suravee Suthikulpanit wrote:

Currently, iommu_unmap and iommu_unmap_fast return unmapped
pages with size_t.  However, the actual value returned could
be error codes (< 0), which can be misinterpreted as large
number of unmapped pages. Therefore, change the return type to ssize_t.

Cc: Joerg Roedel 
Cc: Alex Williamson 
Signed-off-by: Suravee Suthikulpanit 
---
  drivers/iommu/amd_iommu.c   |  6 +++---
  drivers/iommu/intel-iommu.c |  4 ++--


Er, there are a few more drivers than that implementing iommu_ops ;)


Ahh right.


It seems like it might be more sensible to fix the single instance of a driver returning 
-EINVAL (which appears to be a "should never happen if used correctly" kinda 
thing anyway) and leave the API-internal callback prototype as-is. I do agree the 
inconsistency of iommu_unmap() itself wants sorting, though (particularly the !IOMMU_API 
stubs which are wrong either way).

Robin.


Makes sense. I'll leave the API alone and change the code to not return an
error then. There are a few places to fix.

Thanks,
Suravee
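
For clarity, a sketch of the agreed direction (illustrative only, not the
posted patch): keep the callback's size_t return type and report "nothing
unmapped" instead of a negative errno in the PAGE_MODE_NONE case.

static size_t amd_iommu_unmap_sketch(struct iommu_domain *dom,
				     unsigned long iova, size_t page_size)
{
	struct protection_domain *domain = to_pdomain(dom);
	size_t unmap_size;

	/* Previously returned -EINVAL here, which a size_t cannot express. */
	if (domain->mode == PAGE_MODE_NONE)
		return 0;

	mutex_lock(&domain->api_lock);
	unmap_size = iommu_unmap_page(domain, iova, page_size);
	mutex_unlock(&domain->api_lock);

	return unmap_size;
}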


Re: [PATCH v4] vfio/type1: Adopt fast IOTLB flush interface when unmap IOVAs

2018-01-31 Thread Suravee Suthikulpanit

Alex,

On 1/31/18 4:45 PM, Suravee Suthikulpanit wrote:

Currently, VFIO IOMMU type1 unmaps IOVA pages synchronously, which requires
IOTLB flush for every IOVA unmap. This results in a large number of IOTLB
flushes during initialization of pass-through devices.

This can be avoided using the asynchronous (fast) IOTLB flush interface.

Cc: Alex Williamson
Cc: Joerg Roedel
Signed-off-by: Suravee Suthikulpanit
---

Changes from v3 (https://lkml.org/lkml/2018/1/21/244)
  * Refactor the code to unmap_unpin_fast() and unmap_unpin_slow()
to improve code readability.
  * Fix logic in vfio_unmap_unpin() to fallback to unmap_unpin_slow()
only for the failing iova unmapping, and continue the next unmapping
with the unmap_unpin_fast(). (per Alex)
  * Fix error handling in case of failing to do fast unmapping to warn
only once.
  * Remove reference to GPU in the commit message.


Please ignore v4. I found an issue in the error handling logic. Also, I need
to change the return value back to size_t (as discussed in a separate thread).

Sorry for confusion. I'll clean up and send out v5.

Thanks,
Suravee


[PATCH v5] vfio/type1: Adopt fast IOTLB flush interface when unmap IOVAs

2018-01-31 Thread Suravee Suthikulpanit
VFIO IOMMU type1 currently unmaps IOVA pages synchronously, which requires
IOTLB flushing for every unmapping. This results in large IOTLB flushing
overhead when a pass-through device has a large number of mapped IOVAs.
This can be avoided by using the new IOTLB flushing interface.

Cc: Alex Williamson 
Cc: Joerg Roedel 
Signed-off-by: Suravee Suthikulpanit 
---

Changes from v4 (https://lkml.org/lkml/2018/1/31/153)
 * Change the return type from ssize_t back to size_t since we are no longer
   changing the IOMMU API. Also update the error handling logic accordingly.
 * In unmap_unpin_fast(), also sync when failing to allocate entry.
 * Some code restructuring and variable renaming.

 drivers/vfio/vfio_iommu_type1.c | 128 
 1 file changed, 117 insertions(+), 11 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index e30e29a..6041530 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -102,6 +102,13 @@ struct vfio_pfn {
	atomic_t		ref_count;
 };
 
+struct vfio_regions {
+   struct list_head list;
+   dma_addr_t iova;
+   phys_addr_t phys;
+   size_t len;
+};
+
 #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)\
(!list_empty(&iommu->domain_list))
 
@@ -648,11 +655,102 @@ static int vfio_iommu_type1_unpin_pages(void *iommu_data,
return i > npage ? npage : (i > 0 ? i : -EINVAL);
 }
 
+static long vfio_sync_unpin(struct vfio_dma *dma, struct vfio_domain *domain,
+   struct list_head *regions)
+{
+   long unlocked = 0;
+   struct vfio_regions *entry, *next;
+
+   iommu_tlb_sync(domain->domain);
+
+   list_for_each_entry_safe(entry, next, regions, list) {
+   unlocked += vfio_unpin_pages_remote(dma,
+   entry->iova,
+   entry->phys >> PAGE_SHIFT,
+   entry->len >> PAGE_SHIFT,
+   false);
+   list_del(&entry->list);
+   kfree(entry);
+   }
+
+   cond_resched();
+
+   return unlocked;
+}
+
+/*
+ * Generally, VFIO needs to unpin remote pages after each IOTLB flush.
+ * Therefore, when using the IOTLB flush sync interface, VFIO needs to keep track
+ * of these regions (currently using a list).
+ *
+ * This value specifies maximum number of regions for each IOTLB flush sync.
+ */
+#define VFIO_IOMMU_TLB_SYNC_MAX		512
+
+static size_t unmap_unpin_fast(struct vfio_domain *domain,
+  struct vfio_dma *dma, dma_addr_t *iova,
+  size_t len, phys_addr_t phys, long *unlocked,
+  struct list_head *unmapped_list,
+  int *unmapped_cnt)
+{
+   size_t unmapped = 0;
+   struct vfio_regions *entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+
+   if (entry) {
+   unmapped = iommu_unmap_fast(domain->domain, *iova, len);
+
+   if (!unmapped) {
+   kfree(entry);
+   } else {
+   iommu_tlb_range_add(domain->domain, *iova, unmapped);
+   entry->iova = *iova;
+   entry->phys = phys;
+   entry->len  = unmapped;
+   list_add_tail(&entry->list, unmapped_list);
+
+   *iova += unmapped;
+   (*unmapped_cnt)++;
+   }
+   }
+
+   /*
+* Sync if the number of fast-unmap regions hits the limit
+* or in case of errors.
+*/
+   if (*unmapped_cnt >= VFIO_IOMMU_TLB_SYNC_MAX || !unmapped) {
+   *unlocked += vfio_sync_unpin(dma, domain,
+unmapped_list);
+   *unmapped_cnt = 0;
+   }
+
+   return unmapped;
+}
+
+static size_t unmap_unpin_slow(struct vfio_domain *domain,
+  struct vfio_dma *dma, dma_addr_t *iova,
+  size_t len, phys_addr_t phys,
+  long *unlocked)
+{
+   size_t unmapped = iommu_unmap(domain->domain, *iova, len);
+
+   if (unmapped) {
+   *unlocked += vfio_unpin_pages_remote(dma, *iova,
+phys >> PAGE_SHIFT,
+unmapped >> PAGE_SHIFT,
+false);
+   *iova += unmapped;
+   cond_resched();
+   }
+   return unmapped;
+}
+
 static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 bool do_accounting)
 {
dma_addr

Re: [RFC PATCH v2 2/2] iommu/amd: Add support for fast IOTLB flushing

2018-01-21 Thread Suravee Suthikulpanit

Hi Joerg,

Do you have any feedback regarding this patch for AMD IOMMU? I'm re-sending the 
patch 1/2
separately per Alex's suggestion.

Thanks,
Suravee

On 12/27/17 4:20 PM, Suravee Suthikulpanit wrote:

Implement the newly added IOTLB flushing interface for AMD IOMMU.

Signed-off-by: Suravee Suthikulpanit 
---
  drivers/iommu/amd_iommu.c   | 73 -
  drivers/iommu/amd_iommu_init.c  |  7 
  drivers/iommu/amd_iommu_types.h |  7 
  3 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 7d5eb00..42fe365 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -129,6 +129,12 @@ struct dma_ops_domain {
  static struct iova_domain reserved_iova_ranges;
  static struct lock_class_key reserved_rbtree_key;
  
+struct amd_iommu_flush_entries {

+   struct list_head list;
+   unsigned long iova;
+   size_t size;
+};
+
  /
   *
   * Helper functions
@@ -3043,7 +3049,6 @@ static size_t amd_iommu_unmap(struct iommu_domain *dom, 
unsigned long iova,
unmap_size = iommu_unmap_page(domain, iova, page_size);
mutex_unlock(&domain->api_lock);
  
-	domain_flush_tlb_pde(domain);

domain_flush_complete(domain);
  
  	return unmap_size;

@@ -3163,6 +3168,69 @@ static bool amd_iommu_is_attach_deferred(struct 
iommu_domain *domain,
return dev_data->defer_attach;
  }
  
+static void amd_iommu_flush_iotlb_all(struct iommu_domain *domain)

+{
+   struct protection_domain *dom = to_pdomain(domain);
+
+   domain_flush_tlb_pde(dom);
+}
+
+static void amd_iommu_iotlb_range_add(struct iommu_domain *domain,
+ unsigned long iova, size_t size)
+{
+   struct amd_iommu_flush_entries *entry, *p;
+   unsigned long flags;
+   bool found = false;
+
+   spin_lock_irqsave(&amd_iommu_flush_list_lock, flags);
+   list_for_each_entry(p, &amd_iommu_flush_list, list) {
+   if (iova != p->iova)
+   continue;
+
+   if (size > p->size) {
+   p->size = size;
+   pr_debug("%s: update range: iova=%#lx, size = %#lx\n",
+__func__, p->iova, p->size);
+   }
+   found = true;
+   break;
+   }
+
+   if (!found) {
+   entry = kzalloc(sizeof(struct amd_iommu_flush_entries),
+   GFP_KERNEL);
+   if (entry) {
+   pr_debug("%s: new range: iova=%lx, size=%#lx\n",
+__func__, iova, size);
+
+   entry->iova = iova;
+   entry->size = size;
+   list_add(&entry->list, &amd_iommu_flush_list);
+   }
+   }
+   spin_unlock_irqrestore(&amd_iommu_flush_list_lock, flags);
+}
+
+static void amd_iommu_iotlb_sync(struct iommu_domain *domain)
+{
+   struct protection_domain *pdom = to_pdomain(domain);
+   struct amd_iommu_flush_entries *entry, *next;
+   unsigned long flags;
+
+   /* Note:
+* Currently, IOMMU driver just flushes the whole IO/TLB for
+* a given domain. So, just remove entries from the list here.
+*/
+   spin_lock_irqsave(&amd_iommu_flush_list_lock, flags);
+   list_for_each_entry_safe(entry, next, &amd_iommu_flush_list, list) {
+   list_del(&entry->list);
+   kfree(entry);
+   }
+   spin_unlock_irqrestore(&amd_iommu_flush_list_lock, flags);
+
+   domain_flush_tlb_pde(pdom);
+}
+
  const struct iommu_ops amd_iommu_ops = {
.capable = amd_iommu_capable,
.domain_alloc = amd_iommu_domain_alloc,
@@ -3181,6 +3249,9 @@ static bool amd_iommu_is_attach_deferred(struct 
iommu_domain *domain,
.apply_resv_region = amd_iommu_apply_resv_region,
.is_attach_deferred = amd_iommu_is_attach_deferred,
.pgsize_bitmap  = AMD_IOMMU_PGSIZES,
+   .flush_iotlb_all = amd_iommu_flush_iotlb_all,
+   .iotlb_range_add = amd_iommu_iotlb_range_add,
+   .iotlb_sync = amd_iommu_iotlb_sync,
  };
  
  /*

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 6fe2d03..e8f8cee 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -185,6 +185,12 @@ struct ivmd_header {
  bool amd_iommu_force_isolation __read_mostly;
  
  /*

+ * IOTLB flush list
+ */
+LIST_HEAD(amd_iommu_flush_list);
+spinlock_t amd_iommu_flush_list_lock;
+
+/*
   * List of protection domains - used during resume
   */
  LIST_HEAD(amd_iommu_pd_list);
@@ -2490,6 +2496,7 @@ static int __init early_amd_iommu_init(void)

[PATCH v3] vfio/type1: Adopt fast IOTLB flush interface when unmap IOVAs

2018-01-21 Thread Suravee Suthikulpanit
VFIO IOMMU type1 currently unmaps IOVA pages synchronously, which requires
IOTLB flushing for every unmapping. This results in large IOTLB flushing
overhead when a pass-through device has a large number of mapped
IOVAs (e.g. GPUs). This could also cause an IOTLB invalidation time-out issue
on AMD IOMMU with certain dGPUs.

This can be avoided by using the new IOTLB flushing interface.

Cc: Alex Williamson 
Cc: Joerg Roedel 
Signed-off-by: Suravee Suthikulpanit 
---
Changes from V2: (https://lkml.org/lkml/2017/12/27/43)

  * In vfio_unmap_unpin(), fallback to use slow IOTLB flush
when fast IOTLB flush fails (per Alex).

  * Do not adopt fast IOTLB flush in map_try_harder().

  * Submit VFIO and AMD IOMMU patches separately.

 drivers/vfio/vfio_iommu_type1.c | 98 +
 1 file changed, 98 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index e30e29a..5c40b00 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -102,6 +102,13 @@ struct vfio_pfn {
	atomic_t		ref_count;
 };
 
+struct vfio_regions {
+   struct list_head list;
+   dma_addr_t iova;
+   phys_addr_t phys;
+   size_t len;
+};
+
 #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)\
(!list_empty(&iommu->domain_list))
 
@@ -479,6 +486,36 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, 
dma_addr_t iova,
return unlocked;
 }
 
+/*
+ * Generally, VFIO needs to unpin remote pages after each IOTLB flush.
+ * Therefore, when using the IOTLB flush sync interface, VFIO needs to keep track
+ * of these regions (currently using a list).
+ *
+ * This value specifies maximum number of regions for each IOTLB flush sync.
+ */
+#define VFIO_IOMMU_TLB_SYNC_MAX		512
+
+static long vfio_sync_unpin(struct vfio_dma *dma, struct vfio_domain *domain,
+   struct list_head *regions)
+{
+   long unlocked = 0;
+   struct vfio_regions *entry, *next;
+
+   iommu_tlb_sync(domain->domain);
+
+   list_for_each_entry_safe(entry, next, regions, list) {
+   unlocked += vfio_unpin_pages_remote(dma,
+   entry->iova,
+   entry->phys >> PAGE_SHIFT,
+   entry->len >> PAGE_SHIFT,
+   false);
+   list_del(&entry->list);
+   kfree(entry);
+   }
+   cond_resched();
+   return unlocked;
+}
+
 static int vfio_pin_page_external(struct vfio_dma *dma, unsigned long vaddr,
  unsigned long *pfn_base, bool do_accounting)
 {
@@ -648,12 +685,40 @@ static int vfio_iommu_type1_unpin_pages(void *iommu_data,
return i > npage ? npage : (i > 0 ? i : -EINVAL);
 }
 
+static size_t try_unmap_unpin_fast(struct vfio_domain *domain, dma_addr_t iova,
+  size_t len, phys_addr_t phys,
+  struct list_head *unmapped_regions)
+{
+   struct vfio_regions *entry;
+   size_t unmapped;
+
+   entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+   if (!entry)
+   return -ENOMEM;
+
+   unmapped = iommu_unmap_fast(domain->domain, iova, len);
+   if (WARN_ON(!unmapped)) {
+   kfree(entry);
+   return -EINVAL;
+   }
+
+   iommu_tlb_range_add(domain->domain, iova, unmapped);
+   entry->iova = iova;
+   entry->phys = phys;
+   entry->len  = unmapped;
+   list_add_tail(&entry->list, unmapped_regions);
+   return unmapped;
+}
+
 static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 bool do_accounting)
 {
dma_addr_t iova = dma->iova, end = dma->iova + dma->size;
struct vfio_domain *domain, *d;
+   struct list_head unmapped_regions;
+   bool use_fastpath = true;
long unlocked = 0;
+   int cnt = 0;
 
if (!dma->size)
return 0;
@@ -661,6 +726,8 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, 
struct vfio_dma *dma,
if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu))
return 0;
 
+   INIT_LIST_HEAD(&unmapped_regions);
+
/*
 * We use the IOMMU to track the physical addresses, otherwise we'd
 * need a much more complicated tracking system.  Unfortunately that
@@ -698,6 +765,33 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, 
struct vfio_dma *dma,
break;
}
 
+   /*
+* First, try to use fast unmap/unpin. In case of failure,
+* sync upto the current point, and continue the slow
+ 

Re: [RFC PATCH v2 1/2] vfio/type1: Adopt fast IOTLB flush interface when unmap IOVAs

2018-01-17 Thread Suravee Suthikulpanit

Hi Alex,

On 1/9/18 4:07 AM, Alex Williamson wrote:

@@ -661,6 +705,8 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, 
struct vfio_dma *dma,
if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu))
return 0;
  
+	INIT_LIST_HEAD(&unmapped_regions);

+
/*
 * We use the IOMMU to track the physical addresses, otherwise we'd
 * need a much more complicated tracking system.  Unfortunately that
@@ -698,24 +744,36 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, 
struct vfio_dma *dma,
break;
}
  
-		unmapped = iommu_unmap(domain->domain, iova, len);

-   if (WARN_ON(!unmapped))
+   entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+   if (!entry)
break;


Turns out this nagged at me a bit too, this function only gets called
once to dump the vfio_dma, so bailing out here leaves pages pinned and
IOMMU mappings in place, for a performance optimization that we could
just skip.  We could sync&unpin anything collected up to this point and
continue this step with a synchronous unmap/unpin.  Thanks,


Ah, that's an oversight on my part also. Thanks for catching this. I'll
implement the fallback mechanism per your suggestion in v3.

Thanks,
Suravee


Re: [RFC PATCH v2 1/2] vfio/type1: Adopt fast IOTLB flush interface when unmap IOVAs

2018-01-17 Thread Suravee Suthikulpanit

Hi Alex,

On 1/9/18 3:53 AM, Alex Williamson wrote:

On Wed, 27 Dec 2017 04:20:34 -0500
Suravee Suthikulpanit  wrote:

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index e30e29a..f000844 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c

... >>
@@ -479,6 +486,40 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, 
dma_addr_t iova,
return unlocked;
  }
  
+/*

+ * Generally, VFIO needs to unpin remote pages after each IOTLB flush.
+ * Therefore, when using the IOTLB flush sync interface, VFIO needs to keep track
+ * of these regions (currently using a list).
+ *
+ * This value specifies maximum number of regions for each IOTLB flush sync.
+ */
+#define VFIO_IOMMU_TLB_SYNC_MAX		512


Is this an arbitrary value or are there non-obvious considerations for
this value should we want to further tune it in the future?


This is just an arbitrary value for now, which we could try tuning further.
On some dGPUs that I have been using, I have seen a maximum of ~1500 regions
within an unmap call. In most cases, I see fewer than 100 regions per unmap
call. The structure is currently 40 bytes, so I figured capping the list at
512 entries (20 KB) is reasonable. Let me know what you think.




@@ -887,8 +946,14 @@ static int map_try_harder(struct vfio_domain *domain, 
dma_addr_t iova,
break;
}
  
-	for (; i < npage && i > 0; i--, iova -= PAGE_SIZE)

-   iommu_unmap(domain->domain, iova, PAGE_SIZE);
+   for (; i < npage && i > 0; i--, iova -= PAGE_SIZE) {
+   unmapped = iommu_unmap_fast(domain->domain, iova, PAGE_SIZE);
+   if (WARN_ON(!unmapped))
+   break;
+   iommu_tlb_range_add(domain->domain, iova, unmapped);
+   }
+   if (unmapped)
+   iommu_tlb_sync(domain->domain);


Using unmapped here seems a little sketchy, for instance if we got back
zero on the last call to iommu_unmap_fast() but had other ranges queued
for flush.  Do we even need a WARN_ON and break here, are we just
trying to skip adding a zero range?  The intent is that we either leave
this function with everything mapped or nothing mapped, so perhaps we
should warn and continue.  Assuming a spurious sync is ok, we could
check (i < npage) for the sync condition, the only risk being we had no
mappings at all and therefore no unmaps.

TBH, I wonder if this function is even needed anymore or if the mapping
problem in amd_iommu has since ben fixed.


Actually, I never hit this execution path in my test runs. I could just leave
this unchanged and use the slow unmap path to simplify the logic. I'm not aware
of the history of why this logic is needed for AMD IOMMU. Is this a bug in the
driver or the hardware?


Also, I'm not sure why you're gating adding fast flushing to amd_iommu
on vfio making use of it.  These can be done independently.  Thanks,


Currently, the fast unmap interface is mainly called by VFIO. So, I thought I 
would
submit the patches together for review. If you would prefer, I can submit the 
IOMMU part
separately.

Thanks,
Suravee
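
For context, the core pattern the series relies on is: unmap without flushing,
queue the unmapped range, then issue a single sync for the whole batch. Below
is a minimal sketch using the iommu_unmap_fast()/iommu_tlb_range_add()/
iommu_tlb_sync() interfaces referenced above; example_unmap_range() is
illustrative, and real VFIO code must also unpin the pages after the sync:

static void example_unmap_range(struct iommu_domain *d,
				unsigned long iova, size_t len)
{
	size_t unmapped = iommu_unmap_fast(d, iova, len);	/* no IOTLB flush yet */

	if (unmapped) {
		iommu_tlb_range_add(d, iova, unmapped);	/* queue the IOVA range */
		iommu_tlb_sync(d);			/* one IOTLB flush for the batch */
	}
}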




[PATCH 4/4] x86/CPU/AMD: Calculate LLC ID from number of sharing threads

2018-03-25 Thread Suravee Suthikulpanit
The Last-Level-Cache (LLC) ID can be calculated from the number of threads
sharing the cache, which is available from CPUID Fn0x8000_001D (Cache
Properties). The shift count derived from it is used to right-shift the
APIC ID to obtain the LLC ID.

Therefore, default to this method unless the APIC ID enumeration does not
follow the scheme.

Signed-off-by: Suravee Suthikulpanit 
---
 arch/x86/include/asm/cacheinfo.h |  7 +++
 arch/x86/kernel/cpu/amd.c| 19 +++
 arch/x86/kernel/cpu/cacheinfo.c  | 37 +
 3 files changed, 47 insertions(+), 16 deletions(-)
 create mode 100644 arch/x86/include/asm/cacheinfo.h

diff --git a/arch/x86/include/asm/cacheinfo.h b/arch/x86/include/asm/cacheinfo.h
new file mode 100644
index 000..e958e28
--- /dev/null
+++ b/arch/x86/include/asm/cacheinfo.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_CACHEINFO_H
+#define _ASM_X86_CACHEINFO_H
+
+void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, int cpu, u8 node_id);
+
+#endif /* _ASM_X86_CACHEINFO_H */
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 922f43c..2c1a9f2 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -343,22 +344,8 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
c->x86_max_cores /= smp_num_siblings;
}
 
-   /*
-* We may have multiple LLCs if L3 caches exist, so check if we
-* have an L3 cache by looking at the L3 cache CPUID leaf.
-*/
-	if (cpuid_edx(0x80000006)) {
-   if (c->x86 == 0x17) {
-   /*
-* LLC is at the core complex level.
-* Core complex id is ApicId[3].
-*/
-   per_cpu(cpu_llc_id, cpu) = c->apicid >> 3;
-   } else {
-   /* LLC is at the node level. */
-   per_cpu(cpu_llc_id, cpu) = node_id;
-   }
-   }
+   cacheinfo_amd_init_llc_id(c, cpu, node_id);
+
} else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) {
u64 value;
 
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 54d04d5..67f4790 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -637,6 +637,43 @@ static int find_num_cache_leaves(struct cpuinfo_x86 *c)
return i;
 }
 
+void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, int cpu, u8 node_id)
+{
+   /*
+* We may have multiple LLCs if L3 caches exist, so check if we
+* have an L3 cache by looking at the L3 cache CPUID leaf.
+*/
+	if (!cpuid_edx(0x80000006))
+   return;
+
+   if (c->x86 < 0x17) {
+   /* LLC is at the node level. */
+   per_cpu(cpu_llc_id, cpu) = node_id;
+   } else if (c->x86 == 0x17 &&
+  c->x86_model >= 0 && c->x86_model <= 0x1F) {
+   /*
+* LLC is at the core complex level.
+* Core complex id is ApicId[3] for these processors.
+*/
+   per_cpu(cpu_llc_id, cpu) = c->apicid >> 3;
+   } else {
+		/* LLC ID is calculated from the number of threads sharing the cache. */
+   u32 eax, ebx, ecx, edx, num_sharing_cache = 0;
+   u32 llc_index = find_num_cache_leaves(c) - 1;
+
+		cpuid_count(0x8000001d, llc_index, &eax, &ebx, &ecx, &edx);
+   if (eax)
+   num_sharing_cache = ((eax >> 14) & 0xfff) + 1;
+
+   if (num_sharing_cache) {
+   int bits = get_count_order(num_sharing_cache) - 1;
+
+   per_cpu(cpu_llc_id, cpu) = c->apicid >> bits;
+   }
+   }
+}
+EXPORT_SYMBOL_GPL(cacheinfo_amd_init_llc_id);
+
 void init_amd_cacheinfo(struct cpuinfo_x86 *c)
 {
 
-- 
2.7.4



[PATCH 0/4] x86/CPU: Update AMD Last-Level-Cache Information

2018-03-25 Thread Suravee Suthikulpanit
First, clean up the last-level-cache parameters so that they no longer
require #ifdef CONFIG_SMP. Then, consolidate cache-info-related
code for x86 into arch/x86/kernel/cpu/cacheinfo.c.

Finally, for AMD, introduce new logic to derive the LLC ID from the APIC ID.

Thanks,
Suravee

Borislav Petkov (2):
  x86/CPU/AMD: Remove unnecessary check for CONFIG_SMP
  x86/CPU: Rename intel_cacheinfo.c to cacheinfo.c

Suravee Suthikulpanit (2):
  perf/x86/amd/uncore: Fix amd_uncore_llc ID to use pre-defined
cpu_llc_id
  x86/CPU/AMD: Calculate LLC ID from number of sharing threads

 arch/x86/events/amd/uncore.c   | 21 ++--
 arch/x86/include/asm/cacheinfo.h   |  7 
 arch/x86/include/asm/smp.h |  1 -
 arch/x86/kernel/cpu/Makefile   |  2 +-
 arch/x86/kernel/cpu/amd.c  | 25 ++-
 .../kernel/cpu/{intel_cacheinfo.c => cacheinfo.c}  | 37 ++
 arch/x86/kernel/cpu/common.c   |  7 
 arch/x86/kernel/smpboot.c  |  7 
 8 files changed, 57 insertions(+), 50 deletions(-)
 create mode 100644 arch/x86/include/asm/cacheinfo.h
 rename arch/x86/kernel/cpu/{intel_cacheinfo.c => cacheinfo.c} (96%)

-- 
2.7.4



[PATCH 2/4] perf/x86/amd/uncore: Fix amd_uncore_llc ID to use pre-defined cpu_llc_id

2018-03-25 Thread Suravee Suthikulpanit
The current logic iterates over the CPUID Fn0x8000_001D leaves (Cache Properties)
to detect the last-level cache and derive the last-level cache ID.
However, this information is already available in cpu_llc_id.
Therefore, make use of it instead.

Reviewed-by: Borislav Petkov 
Cc: Janakarajan Natarajan 
Signed-off-by: Suravee Suthikulpanit 
---
 arch/x86/events/amd/uncore.c | 21 ++---
 1 file changed, 2 insertions(+), 19 deletions(-)

diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
index f5cbbba..981ba5e 100644
--- a/arch/x86/events/amd/uncore.c
+++ b/arch/x86/events/amd/uncore.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define NUM_COUNTERS_NB		4
 #define NUM_COUNTERS_L2		4
@@ -399,26 +400,8 @@ static int amd_uncore_cpu_starting(unsigned int cpu)
}
 
if (amd_uncore_llc) {
-   unsigned int apicid = cpu_data(cpu).apicid;
-   unsigned int nshared, subleaf, prev_eax = 0;
-
uncore = *per_cpu_ptr(amd_uncore_llc, cpu);
-   /*
-* Iterate over Cache Topology Definition leaves until no
-* more cache descriptions are available.
-*/
-   for (subleaf = 0; subleaf < 5; subleaf++) {
-			cpuid_count(0x8000001d, subleaf, &eax, &ebx, &ecx, &edx);
-
-   /* EAX[0:4] gives type of cache */
-   if (!(eax & 0x1f))
-   break;
-
-   prev_eax = eax;
-   }
-   nshared = ((prev_eax >> 14) & 0xfff) + 1;
-
-   uncore->id = apicid - (apicid % nshared);
+   uncore->id = per_cpu(cpu_llc_id, cpu);
 
uncore = amd_uncore_find_online_sibling(uncore, amd_uncore_llc);
*per_cpu_ptr(amd_uncore_llc, cpu) = uncore;
-- 
2.7.4



[PATCH 1/4] x86/CPU/AMD: Remove unnecessary check for CONFIG_SMP

2018-03-25 Thread Suravee Suthikulpanit
From: Borislav Petkov 

Move smp_num_siblings and cpu_llc_id to cpu/common.c so that they're
always present as symbols and not only in the CONFIG_SMP case. Then,
other code using them doesn't need ugly ifdeffery anymore.

Signed-off-by: Borislav Petkov 
Signed-off-by: Suravee Suthikulpanit 
---
 arch/x86/include/asm/smp.h   | 1 -
 arch/x86/kernel/cpu/amd.c| 6 --
 arch/x86/kernel/cpu/common.c | 7 +++
 arch/x86/kernel/smpboot.c| 7 ---
 4 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index a418976..59a01f6 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -171,7 +171,6 @@ static inline int wbinvd_on_all_cpus(void)
wbinvd();
return 0;
 }
-#define smp_num_siblings   1
 #endif /* CONFIG_SMP */
 
 extern unsigned disabled_cpus;
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index f0e6456..922f43c 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -297,7 +297,6 @@ static int nearby_node(int apicid)
 }
 #endif
 
-#ifdef CONFIG_SMP
 /*
  * Fix up cpu_core_id for pre-F17h systems to be in the
  * [0 .. cores_per_node - 1] range. Not really needed but
@@ -375,7 +374,6 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
legacy_fixup_core_id(c);
}
 }
-#endif
 
 /*
  * On a AMD dual core setup the lower bits of the APIC id distinguish the 
cores.
@@ -383,7 +381,6 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
  */
 static void amd_detect_cmp(struct cpuinfo_x86 *c)
 {
-#ifdef CONFIG_SMP
unsigned bits;
int cpu = smp_processor_id();
 
@@ -395,15 +392,12 @@ static void amd_detect_cmp(struct cpuinfo_x86 *c)
/* use socket ID also for last level cache */
per_cpu(cpu_llc_id, cpu) = c->phys_proc_id;
amd_get_topology(c);
-#endif
 }
 
 u16 amd_get_nb_id(int cpu)
 {
u16 id = 0;
-#ifdef CONFIG_SMP
id = per_cpu(cpu_llc_id, cpu);
-#endif
return id;
 }
 EXPORT_SYMBOL_GPL(amd_get_nb_id);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 348cf48..2afd854 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -66,6 +66,13 @@ cpumask_var_t cpu_callin_mask;
 /* representing cpus for which sibling maps can be computed */
 cpumask_var_t cpu_sibling_setup_mask;
 
+/* Number of siblings per CPU package */
+int smp_num_siblings = 1;
+EXPORT_SYMBOL(smp_num_siblings);
+
+/* Last level cache ID of each logical CPU */
+DEFINE_PER_CPU_READ_MOSTLY(u16, cpu_llc_id) = BAD_APICID;
+
 /* correctly size the local cpu masks */
 void __init setup_cpu_local_masks(void)
 {
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index ff99e2b..91d48f3 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -78,13 +78,6 @@
 #include 
 #include 
 
-/* Number of siblings per CPU package */
-int smp_num_siblings = 1;
-EXPORT_SYMBOL(smp_num_siblings);
-
-/* Last level cache ID of each logical CPU */
-DEFINE_PER_CPU_READ_MOSTLY(u16, cpu_llc_id) = BAD_APICID;
-
 /* representing HT siblings of each logical CPU */
 DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
 EXPORT_PER_CPU_SYMBOL(cpu_sibling_map);
-- 
2.7.4



[PATCH 3/4] x86/CPU: Rename intel_cacheinfo.c to cacheinfo.c

2018-03-25 Thread Suravee Suthikulpanit
From: Borislav Petkov 

Since this file contains general cache-related information for x86,
rename the file to a more appropriate name.

Signed-off-by: Borislav Petkov 
Signed-off-by: Suravee Suthikulpanit 
---
 arch/x86/kernel/cpu/Makefile   | 2 +-
 arch/x86/kernel/cpu/{intel_cacheinfo.c => cacheinfo.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename arch/x86/kernel/cpu/{intel_cacheinfo.c => cacheinfo.c} (100%)

diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 570e8bb..32591f2 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -17,7 +17,7 @@ KCOV_INSTRUMENT_perf_event.o := n
 nostackp := $(call cc-option, -fno-stack-protector)
 CFLAGS_common.o:= $(nostackp)
 
-obj-y  := intel_cacheinfo.o scattered.o topology.o
+obj-y  := cacheinfo.o scattered.o topology.o
 obj-y  += common.o
 obj-y  += rdrand.o
 obj-y  += match.o
diff --git a/arch/x86/kernel/cpu/intel_cacheinfo.c 
b/arch/x86/kernel/cpu/cacheinfo.c
similarity index 100%
rename from arch/x86/kernel/cpu/intel_cacheinfo.c
rename to arch/x86/kernel/cpu/cacheinfo.c
-- 
2.7.4



[PATCH 0/2] x86/CPU/AMD: Add support for Extended Topology Enumeration

2018-03-26 Thread Suravee Suthikulpanit
Linux currently provides the function detect_extended_topology()
for parsing CPUID Fn0xB and deriving CPU topology information.
Therefore, also call this function in the AMD code path.

Thanks,
Suravee

Suravee Suthikulpanit (2):
  x86/CPU: Modify detect_extended_topology() to return result
  x86/CPU/AMD: Derive CPU topology from CPUID Fn0xB when available

 arch/x86/include/asm/processor.h |  2 +-
 arch/x86/kernel/cpu/amd.c| 16 
 arch/x86/kernel/cpu/topology.c   |  8 
 3 files changed, 17 insertions(+), 9 deletions(-)

-- 
2.7.4



[PATCH 1/2] x86/CPU: Modify detect_extended_topology() to return result

2018-03-26 Thread Suravee Suthikulpanit
The current implementation does not communicate whether it can successfully
detect CPUID Fn0x000B information. Therefore, modify the function
to return a success or error code. This will be used by subsequent patches.

Reviewed-by: Borislav Petkov 
Signed-off-by: Suravee Suthikulpanit 
---
 arch/x86/include/asm/processor.h | 2 +-
 arch/x86/kernel/cpu/topology.c   | 8 
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index b0ccd48..2a5d5ed 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -193,7 +193,7 @@ extern u32 get_scattered_cpuid_leaf(unsigned int level,
 extern unsigned int init_intel_cacheinfo(struct cpuinfo_x86 *c);
 extern void init_amd_cacheinfo(struct cpuinfo_x86 *c);
 
-extern void detect_extended_topology(struct cpuinfo_x86 *c);
+extern int detect_extended_topology(struct cpuinfo_x86 *c);
 extern void detect_ht(struct cpuinfo_x86 *c);
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index b099024..81c0afb 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -27,7 +27,7 @@
  * exists, use it for populating initial_apicid and cpu topology
  * detection.
  */
-void detect_extended_topology(struct cpuinfo_x86 *c)
+int detect_extended_topology(struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_SMP
unsigned int eax, ebx, ecx, edx, sub_index;
@@ -36,7 +36,7 @@ void detect_extended_topology(struct cpuinfo_x86 *c)
static bool printed;
 
if (c->cpuid_level < 0xb)
-   return;
+   return -1;
 
cpuid_count(0xb, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
 
@@ -44,7 +44,7 @@ void detect_extended_topology(struct cpuinfo_x86 *c)
 * check if the cpuid leaf 0xb is actually implemented.
 */
if (ebx == 0 || (LEAFB_SUBTYPE(ecx) != SMT_TYPE))
-   return;
+   return -1;
 
set_cpu_cap(c, X86_FEATURE_XTOPOLOGY);
 
@@ -95,6 +95,6 @@ void detect_extended_topology(struct cpuinfo_x86 *c)
   c->cpu_core_id);
printed = 1;
}
-   return;
 #endif
+   return 0;
 }
-- 
2.7.4



[PATCH 2/2] x86/CPU/AMD: Derive CPU topology from CPUID Fn0xB

2018-03-26 Thread Suravee Suthikulpanit
Derive topology information from Extended Topology Enumeration
(CPUID Fn0x000B) when the information is available.

Signed-off-by: Suravee Suthikulpanit 
---
 arch/x86/kernel/cpu/amd.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 2c1a9f2..2b40144 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -338,10 +338,18 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
c->cu_id = ebx & 0xff;
 
if (c->x86 >= 0x17) {
-   c->cpu_core_id = ebx & 0xff;
+   int err = detect_extended_topology(c);
 
-   if (smp_num_siblings > 1)
-   c->x86_max_cores /= smp_num_siblings;
+   if (err) {
+   c->cpu_core_id = ebx & 0xff;
+
+   if (smp_num_siblings > 1)
+   c->x86_max_cores /= smp_num_siblings;
+   } else {
+   int bits = get_count_order(c->x86_max_cores);
+
+   c->x86_coreid_bits = get_count_order(bits);
+   }
}
 
cacheinfo_amd_init_llc_id(c, cpu, node_id);
@@ -378,7 +386,6 @@ static void amd_detect_cmp(struct cpuinfo_x86 *c)
c->phys_proc_id = c->initial_apicid >> bits;
/* use socket ID also for last level cache */
per_cpu(cpu_llc_id, cpu) = c->phys_proc_id;
-   amd_get_topology(c);
 }
 
 u16 amd_get_nb_id(int cpu)
@@ -823,6 +830,7 @@ static void init_amd(struct cpuinfo_x86 *c)
/* Multi core CPU? */
if (c->extended_cpuid_level >= 0x8008) {
amd_detect_cmp(c);
+   amd_get_topology(c);
srat_detect_node(c);
}
 
-- 
2.7.4



Re: [PATCH 0/2] x86/CPU/AMD: Add support for Extended Topology Enumeration

2018-03-27 Thread Suravee Suthikulpanit

Hi All,

On 3/27/18 1:52 PM, David Rientjes wrote:

On Tue, 27 Mar 2018, Ingo Molnar wrote:


Linux currently provides the function detect_extended_topology()
for parsing CPUID Fn0xB and deriving CPU topology information.
Therefore, also call this function in the AMD code path.

Thanks,
Suravee

Suravee Suthikulpanit (2):
   x86/CPU: Modify detect_extended_topology() to return result
   x86/CPU/AMD: Derive CPU topology from CPUID Fn0xB when available

  arch/x86/include/asm/processor.h |  2 +-
  arch/x86/kernel/cpu/amd.c| 16 
  arch/x86/kernel/cpu/topology.c   |  8 
  3 files changed, 17 insertions(+), 9 deletions(-)


Which tree is this again? The second patch does not apply to -linus or -tip.



I was wondering the exact same thing today when this came across.  I found
it's based on Suravee's other patch series posted today entitled "x86/CPU:
Update AMD Last-Level-Cache Information".

https://marc.info/?l=linux-kernel&m=152204614503522
1522046116-22578-1-git-send-email-suravee.suthikulpa...@amd.com



Sorry for failing to mention the dependency on the other series here
(https://lkml.org/lkml/2018/3/26/24).

Suravee


Re: [PATCH 2/2] x86/CPU/AMD: Derive CPU topology from CPUID Fn0xB

2018-03-27 Thread Suravee Suthikulpanit

Hi All,

On 3/26/18 3:05 PM, Suravee Suthikulpanit wrote:

Derive topology information from Extended Topology Enumeration
(CPUID Fn0x000B) when the information is available.

Signed-off-by: Suravee Suthikulpanit 
---
  arch/x86/kernel/cpu/amd.c | 16 
  1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 2c1a9f2..2b40144 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -338,10 +338,18 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
c->cu_id = ebx & 0xff;
  
  		if (c->x86 >= 0x17) {

-   c->cpu_core_id = ebx & 0xff;
+   int err = detect_extended_topology(c);
  
-			if (smp_num_siblings > 1)

-   c->x86_max_cores /= smp_num_siblings;
+   if (err) {
+   c->cpu_core_id = ebx & 0xff;
+
+   if (smp_num_siblings > 1)
+   c->x86_max_cores /= smp_num_siblings;
+   } else {
+   int bits = get_count_order(c->x86_max_cores);
+
+   c->x86_coreid_bits = get_count_order(bits);
+   }
}


I made a mistake here in an attempt to clean up the code. I'll send out V2 to 
fix this.

Thanks,
Suravee
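
The slip above is presumably the double application of get_count_order() in
the new else branch; a hedged sketch of the intended assignment (the actual
V2 may differ) would be:

		} else {
			/*
			 * x86_coreid_bits must cover all cores in a package,
			 * i.e. the count order of x86_max_cores itself.
			 */
			c->x86_coreid_bits = get_count_order(c->x86_max_cores);
		}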


[PATCH] x86/CPU/AMD: Fix LLC ID bit-shift calculation

2018-06-13 Thread Suravee Suthikulpanit
The current logic incorrectly calculates the LLC ID from the APIC ID.
Unless specified otherwise, the LLC ID should be derived by shifting the
APIC ID right by the count order of the number of threads sharing the
cache.

Fixes: 68091ee7ac3c ("Calculate last level cache ID from number of sharing 
threads")
Signed-off-by: Suravee Suthikulpanit 
---
 arch/x86/kernel/cpu/cacheinfo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 38354c6..0c5fcbd 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -671,7 +671,7 @@ void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, int 
cpu, u8 node_id)
num_sharing_cache = ((eax >> 14) & 0xfff) + 1;
 
if (num_sharing_cache) {
-   int bits = get_count_order(num_sharing_cache) - 1;
+   int bits = get_count_order(num_sharing_cache);
 
per_cpu(cpu_llc_id, cpu) = c->apicid >> bits;
}
-- 
2.7.4
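
A quick stand-alone illustration of the shift change above, using
hypothetical values (the get_count_order() stand-in below only
approximates the kernel helper for count >= 1):

#include <stdio.h>

/* Smallest n such that (1 << n) >= count, mirroring get_count_order(). */
static int get_count_order(unsigned int count)
{
	int order = 0;

	while ((1u << order) < count)
		order++;
	return order;
}

int main(void)
{
	unsigned int num_sharing_cache = 4;	/* 4 threads share one LLC */
	unsigned int apicid = 5;		/* hypothetical, contiguous APIC IDs */
	int bits = get_count_order(num_sharing_cache);	/* 2 */

	/* Fixed shift: APIC IDs 4-7 all map to LLC ID 1. */
	printf("fixed:  llc_id = %u\n", apicid >> bits);
	/* Old "- 1" shift: the same four threads get split across IDs 2 and 3. */
	printf("broken: llc_id = %u\n", apicid >> (bits - 1));
	return 0;
}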



[PATCH 0/2] iommu/amd: Revert and remove failing PMC test

2021-04-09 Thread Suravee Suthikulpanit
This has prevented the PMC from working on more recent desktop/mobile
platforms, where PMC power-gating is normally enabled. After consulting
with the HW designers and the IOMMU maintainer, we have decided to remove
the legacy test altogether to avoid future PMC enabling issues.

Thanks to the community for helping to test, investigate, provide data,
and report issues on several platforms in the field.

Regards,
Suravee 

Paul Menzel (1):
  Revert "iommu/amd: Fix performance counter initialization"

Suravee Suthikulpanit (1):
  iommu/amd: Remove performance counter pre-initialization test

 drivers/iommu/amd/init.c | 49 ++--
 1 file changed, 2 insertions(+), 47 deletions(-)

-- 
2.17.1



[PATCH 1/2] Revert "iommu/amd: Fix performance counter initialization"

2021-04-09 Thread Suravee Suthikulpanit
From: Paul Menzel 

This reverts commit 6778ff5b21bd8e78c8bd547fd66437cf2657fd9b.

The original commit tried to address an issue where PMC power-gating
causes the IOMMU PMC pre-init test to fail on certain desktop/mobile
platforms where power-gating is normally enabled.

There have been several reports that the workaround still is not
guaranteed to work, and can add up to 100 ms (in the worst case)
to the boot process on certain platforms such as the MSI B350M MORTAR
with AMD Ryzen 3 2200G.

Therefore, revert this commit as a prelude to removing the pre-init
test.

Link: 
https://lore.kernel.org/linux-iommu/alpine.lnx.3.20.13.2006030935570.3...@monopod.intra.ispras.ru/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753
Cc: Tj (Elloe Linux) 
Cc: Shuah Khan 
Cc: Alexander Monakov 
Cc: David Coe 
Signed-off-by: Paul Menzel 
Signed-off-by: Suravee Suthikulpanit 
---
Note: I have revised the commit message to add more detail
  and remove unnecessary information.

 drivers/iommu/amd/init.c | 45 ++--
 1 file changed, 11 insertions(+), 34 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 321f5906e6ed..648cdfd03074 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -12,7 +12,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -257,8 +256,6 @@ static enum iommu_init_state init_state = IOMMU_START_STATE;
 static int amd_iommu_enable_interrupts(void);
 static int __init iommu_go_to_state(enum iommu_init_state state);
 static void init_device_table_dma(void);
-static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr,
-   u8 fxn, u64 *value, bool is_write);
 
 static bool amd_iommu_pre_enabled = true;
 
@@ -1717,11 +1714,13 @@ static int __init init_iommu_all(struct 
acpi_table_header *table)
return 0;
 }
 
-static void __init init_iommu_perf_ctr(struct amd_iommu *iommu)
+static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr,
+   u8 fxn, u64 *value, bool is_write);
+
+static void init_iommu_perf_ctr(struct amd_iommu *iommu)
 {
-   int retry;
struct pci_dev *pdev = iommu->dev;
-   u64 val = 0xabcd, val2 = 0, save_reg, save_src;
+   u64 val = 0xabcd, val2 = 0, save_reg = 0;
 
if (!iommu_feature(iommu, FEATURE_PC))
return;
@@ -1729,39 +1728,17 @@ static void __init init_iommu_perf_ctr(struct amd_iommu 
*iommu)
amd_iommu_pc_present = true;
 
/* save the value to restore, if writable */
-   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false) ||
-   iommu_pc_get_set_reg(iommu, 0, 0, 8, &save_src, false))
-   goto pc_false;
-
-   /*
-* Disable power gating by programing the performance counter
-* source to 20 (i.e. counts the reads and writes from/to IOMMU
-* Reserved Register [MMIO Offset 1FF8h] that are ignored.),
-* which never get incremented during this init phase.
-* (Note: The event is also deprecated.)
-*/
-   val = 20;
-   if (iommu_pc_get_set_reg(iommu, 0, 0, 8, &val, true))
+   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false))
goto pc_false;
 
/* Check if the performance counters can be written to */
-   val = 0xabcd;
-   for (retry = 5; retry; retry--) {
-   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true) ||
-   iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false) ||
-   val2)
-   break;
-
-   /* Wait about 20 msec for power gating to disable and retry. */
-   msleep(20);
-   }
-
-   /* restore */
-   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, true) ||
-   iommu_pc_get_set_reg(iommu, 0, 0, 8, &save_src, true))
+   if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true)) ||
+   (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false)) ||
+   (val != val2))
goto pc_false;
 
-   if (val != val2)
+   /* restore */
+   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, true))
goto pc_false;
 
pci_info(pdev, "IOMMU performance counters supported\n");
-- 
2.17.1



[PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test

2021-04-09 Thread Suravee Suthikulpanit
In early AMD desktop/mobile platforms (during 2013), when IOMMU
Performance Counter (PMC) support was first introduced in
commit 30861ddc9cca ("perf/x86/amd: Add IOMMU Performance Counter
resource management"), there was a HW bug where the counters could not
be accessed. As a result, reading a counter always returned zero.

At the time, the suggested workaround was to add test logic prior to
initializing the PMC feature, to check whether the counters can be
programmed and read back with the same value. This has been working fine
until more recent desktop/mobile platforms started enabling power gating
for the PMC, which prevents access to the counters. This results in the
PMC support being disabled unnecessarily.

Unfortunately, there is no documentation of which hardware generation
fixed the original PMC HW bug, although it was fixed soon after the
first introduction of the PMC. Based on this, we assume that the buggy
platforms are unlikely to still be in use, and it should be relatively
safe to remove this legacy logic.

Link: 
https://lore.kernel.org/linux-iommu/alpine.lnx.3.20.13.2006030935570.3...@monopod.intra.ispras.ru/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753
Cc: Tj (Elloe Linux) 
Cc: Shuah Khan 
Cc: Alexander Monakov 
Cc: David Coe 
Cc: Paul Menzel 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/init.c | 24 +---
 1 file changed, 1 insertion(+), 23 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 648cdfd03074..247cdda5d683 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1714,33 +1714,16 @@ static int __init init_iommu_all(struct 
acpi_table_header *table)
return 0;
 }
 
-static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr,
-   u8 fxn, u64 *value, bool is_write);
-
 static void init_iommu_perf_ctr(struct amd_iommu *iommu)
 {
+   u64 val;
struct pci_dev *pdev = iommu->dev;
-   u64 val = 0xabcd, val2 = 0, save_reg = 0;
 
if (!iommu_feature(iommu, FEATURE_PC))
return;
 
amd_iommu_pc_present = true;
 
-   /* save the value to restore, if writable */
-   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false))
-   goto pc_false;
-
-   /* Check if the performance counters can be written to */
-   if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true)) ||
-   (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false)) ||
-   (val != val2))
-   goto pc_false;
-
-   /* restore */
-   if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, true))
-   goto pc_false;
-
pci_info(pdev, "IOMMU performance counters supported\n");
 
val = readl(iommu->mmio_base + MMIO_CNTR_CONF_OFFSET);
@@ -1748,11 +1731,6 @@ static void init_iommu_perf_ctr(struct amd_iommu *iommu)
iommu->max_counters = (u8) ((val >> 7) & 0xf);
 
return;
-
-pc_false:
-   pci_err(pdev, "Unable to read/write to IOMMU perf counter.\n");
-   amd_iommu_pc_present = false;
-   return;
 }
 
 static ssize_t amd_iommu_show_cap(struct device *dev,
-- 
2.17.1



[V2 PATCH 1/2] ACPI / scan: Add support for ACPI _CLS device matching

2015-01-05 Thread Suravee Suthikulpanit
Device drivers typically use the ACPI _HIDs/_CIDs listed in struct device_driver
acpi_match_table to match devices. However, for generic drivers, we do not
want to list the _HID of every supported device, and some device classes do
not have a _CID (e.g. SATA, USB). Instead, we can leverage ACPI _CLS, which
specifies the PCI-defined class code (i.e. base class, subclass and
programming interface).

This patch adds support for matching ACPI devices using the _CLS method.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/acpi/scan.c | 79 +++--
 include/acpi/acnames.h  |  1 +
 include/linux/acpi.h| 10 ++
 include/linux/device.h  |  1 +
 include/linux/mod_devicetable.h |  6 
 5 files changed, 94 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 16914cc..7b25221 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -987,13 +987,86 @@ static bool acpi_of_driver_match_device(struct device 
*dev,
 bool acpi_driver_match_device(struct device *dev,
  const struct device_driver *drv)
 {
-   if (!drv->acpi_match_table)
-   return acpi_of_driver_match_device(dev, drv);
+   bool ret = false;
 
-   return !!acpi_match_device(drv->acpi_match_table, dev);
+   if (drv->acpi_match_table)
+   ret = !!acpi_match_device(drv->acpi_match_table, dev);
+
+   /* Next, try to match with special "PRP0001" _HID */
+   if (!ret && drv->of_match_table)
+   ret = acpi_of_driver_match_device(dev, drv);
+
+   /* Next, try to match with PCI-defined class-code */
+   if (!ret && drv->acpi_match_cls)
+   ret = acpi_match_device_cls(drv->acpi_match_cls, dev);
+
+   return ret;
 }
 EXPORT_SYMBOL_GPL(acpi_driver_match_device);
 
+/**
+ * acpi_match_device_cls - Match a struct device against a ACPI _CLS method
+ * @dev_cls: A pointer to struct acpi_device_cls object to match against.
+ * @dev: The ACPI device structure to match.
+ *
+ * Check if @dev has a valid ACPI and _CLS handle. If there is a
+ * struct acpi_device_cls object for that handle, use that object to match
+ * against the given struct acpi_device_cls object.
+ *
+ * Return true on success or false on failure.
+ */
+bool acpi_match_device_cls(const struct acpi_device_cls *dev_cls,
+ const struct device *dev)
+{
+   bool ret = false;
+   acpi_status status;
+   union acpi_object *pkg;
+   struct acpi_device_cls cls;
+   struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
+   struct acpi_buffer format = { sizeof("NNN"), "NNN" };
+   struct acpi_buffer state = { 0, NULL };
+   struct acpi_device *adev = ACPI_COMPANION(dev);
+   acpi_handle handle = ACPI_HANDLE(dev);
+
+   if (!handle || !adev || !adev->status.present || !dev_cls)
+   return ret;
+
+   status = acpi_evaluate_object(handle, METHOD_NAME__CLS, NULL, &buffer);
+   if (ACPI_FAILURE(status))
+   return ret;
+
+   /**
+* Note:
+* ACPIv5.1 defines the package to contain 3 integers for
+* Base-Class code, Sub-Class code, and Programming Interface code.
+*/
+   pkg = buffer.pointer;
+   if (!pkg ||
+   (pkg->type != ACPI_TYPE_PACKAGE) ||
+   (pkg->package.count != 3)) {
+   dev_err(&adev->dev, "Invalid _CLS data\n");
+   goto out;
+   }
+
+   state.length = sizeof(struct acpi_device_cls);
+   state.pointer = &cls;
+
+   status = acpi_extract_package(pkg, &format, &state);
+   if (ACPI_FAILURE(status)) {
+   ACPI_EXCEPTION((AE_INFO, status, "Invalid data"));
+   goto out;
+   }
+
+   if ((dev_cls->base_class == cls.base_class) &&
+   (dev_cls->sub_class == cls.sub_class) &&
+   (dev_cls->prog_interface == cls.prog_interface))
+   ret = true;
+out:
+   kfree(buffer.pointer);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(acpi_match_device_cls);
+
 static void acpi_free_power_resources_lists(struct acpi_device *device)
 {
int i;
diff --git a/include/acpi/acnames.h b/include/acpi/acnames.h
index 7461327..22332a6 100644
--- a/include/acpi/acnames.h
+++ b/include/acpi/acnames.h
@@ -51,6 +51,7 @@
 #define METHOD_NAME__BBN"_BBN"
 #define METHOD_NAME__CBA"_CBA"
 #define METHOD_NAME__CID"_CID"
+#define METHOD_NAME__CLS"_CLS"
 #define METHOD_NAME__CRS"_CRS"
 #define METHOD_NAME__DDN"_DDN"
 #define METHOD_NAME__HID"_HID"
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 87f365e..2f2b8ce 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -432,6 +432,10

[V2 PATCH 0/2] Introduce ACPI support for ahci_platform driver

2015-01-05 Thread Suravee Suthikulpanit
This patch series introduce ACPI support for non-PCI AHCI platform driver.
Existing ACPI support for AHCI assumes the device controller is a PCI device.

Also, since there is no ACPI _HID/_CID for a generic AHCI controller, the driver
cannot use them for matching devices. Therefore, this patch introduces
a mechanism for drivers to match devices using the ACPI _CLS method.

This patch series is rebased from and tested with:

http://git.linaro.org/leg/acpi/acpi.git acpi-5.1-v7

This topic was discussed earlier here (as part of introducing support for
AMD Seattle SATA controller):

http://marc.info/?l=linux-arm-kernel&m=141083492521584&w=2

NOTE:
* PATCH 2/2 has already been Acked-by Tejun Heo in V1. I only renamed
  acpi_cls to acpi_match_cls for clarity in V2. It probably should be
  routed together with PATCH 1/2 (once acked), since it defines the new
  member in the struct.

Changes V1 (https://lkml.org/lkml/2014/12/19/345)
* Rebased to 3.19.0-rc2
* Change from acpi_cls in device_driver to acpi_match_cls (Hanjun 
comment)
* Change the matching logic in acpi_driver_match_device() due to the new
  special PRP0001 _HID.
* Simplify the return type of acpi_match_device_cls() to boolean.

Changes from RFC (https://lkml.org/lkml/2014/12/17/446)
* Remove #ifdef and make non-ACPI version of the acpi_match_device_cls
  as inline. (per Arnd)
* Simplify logic to retrieve and evaluate _CLS handle. (per Hanjun)

Suravee Suthikulpanit (2):
  ACPI / scan: Add support for ACPI _CLS device matching
  ata: ahci_platform: Add ACPI _CLS matching

 drivers/acpi/scan.c | 79 +++--
 drivers/ata/Kconfig |  2 +-
 drivers/ata/ahci_platform.c |  3 ++
 include/acpi/acnames.h  |  1 +
 include/linux/acpi.h| 10 ++
 include/linux/device.h  |  1 +
 include/linux/mod_devicetable.h |  6 
 7 files changed, 98 insertions(+), 4 deletions(-)

-- 
1.9.3



[V2 PATCH 2/2] ata: ahci_platform: Add ACPI _CLS matching

2015-01-05 Thread Suravee Suthikulpanit
This patch adds ACPI support for the AHCI platform driver, which uses the
_CLS method to match the device.

The following is an example of ASL structure in DSDT for a SATA controller,
which contains _CLS package to be matched by the ahci_platform driver:

  Device (AHC0) // AHCI Controller
  {
Name(_HID, "AMDI0600")
Name (_CCA, 1)
Name (_CLS, Package (3)
{
  0x01, // Base Class: Mass Storage
  0x06, // Sub-Class: serial ATA
  0x01, // Interface: AHCI
})
Name (_CRS, ResourceTemplate ()
{
  Memory32Fixed (ReadWrite, 0xE030, 0x0001)
  Interrupt (ResourceConsumer, Level, ActiveHigh, Exclusive,,,) { 387 }
})
  }

Also, since the ATA driver should not require PCI support for ATA_ACPI,
this patch removes that dependency from drivers/ata/Kconfig.

Acked-by: Tejun Heo 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/ata/Kconfig | 2 +-
 drivers/ata/ahci_platform.c | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/ata/Kconfig b/drivers/ata/Kconfig
index a3a1360..edca892 100644
--- a/drivers/ata/Kconfig
+++ b/drivers/ata/Kconfig
@@ -48,7 +48,7 @@ config ATA_VERBOSE_ERROR
 
 config ATA_ACPI
bool "ATA ACPI Support"
-   depends on ACPI && PCI
+   depends on ACPI
default y
help
  This option adds support for ATA-related ACPI objects.
diff --git a/drivers/ata/ahci_platform.c b/drivers/ata/ahci_platform.c
index 18d5398..ae66974 100644
--- a/drivers/ata/ahci_platform.c
+++ b/drivers/ata/ahci_platform.c
@@ -71,12 +71,15 @@ static const struct of_device_id ahci_of_match[] = {
 };
 MODULE_DEVICE_TABLE(of, ahci_of_match);
 
+static const struct acpi_device_cls ahci_cls = {0x01, 0x06, 0x01};
+
 static struct platform_driver ahci_driver = {
.probe = ahci_probe,
.remove = ata_platform_remove_one,
.driver = {
.name = "ahci",
.of_match_table = ahci_of_match,
+   .acpi_match_cls = &ahci_cls,
.pm = &ahci_pm_ops,
},
 };
-- 
1.9.3



Re: [RFC 2/4] PCI: generic: Add support for ARM64 and MSI(x)

2014-12-29 Thread Suravee Suthikulpanit

Hi,

I am not sure if this thread is still alive. I'm trying to see what I
can do to help clean up/convert to make the PCI GHC also work for arm64
with zero or minimal ifdefs.


Please let me know if someone is already working on this. I noticed that
Lorenzo's patches are already in 3.19-rc1, and in Bjorn's pci/domain
branch. Otherwise, I'll try to continue the work based on the sample
patch from Arnd here.


On 10/23/14 08:33, Arnd Bergmann wrote:

[...]
diff --git a/drivers/pci/host/pci-host-generic.c 
b/drivers/pci/host/pci-host-generic.c
index 3d2076f59911..3542a7b740e5 100644
--- a/drivers/pci/host/pci-host-generic.c
+++ b/drivers/pci/host/pci-host-generic.c
@@ -40,16 +40,20 @@ struct gen_pci_cfg_windows {

  struct gen_pci {
struct pci_host_bridge  host;
+   struct pci_sys_data sys;
struct gen_pci_cfg_windows  cfg;
-   struct list_headresources;
  };


Arnd, based on the patch here, if we are trying to use the
pci-host-generic driver on arm64, this means that we are going to have
to introduce struct pci_sys_data for arm64 as well (e.g. move the
struct from include/asm/mach/pci.h to include/linux/pci.h). Is this also
your intention?


Thanks,

Suravee



+static inline struct gen_pci *gen_pci_from_sys(struct pci_sys_data *sys)
+{
+   return container_of(sys, struct gen_pci, sys);
+}
+
  static void __iomem *gen_pci_map_cfg_bus_cam(struct pci_bus *bus,
 unsigned int devfn,
 int where)
  {
-   struct pci_sys_data *sys = bus->sysdata;
-   struct gen_pci *pci = sys->private_data;
+   struct gen_pci *pci = gen_pci_from_sys(bus->sysdata);
resource_size_t idx = bus->number - pci->cfg.bus_range.start;

return pci->cfg.win[idx] + ((devfn << 8) | where);
@@ -64,8 +68,7 @@ static void __iomem *gen_pci_map_cfg_bus_ecam(struct pci_bus 
*bus,
  unsigned int devfn,
  int where)
  {
-   struct pci_sys_data *sys = bus->sysdata;
-   struct gen_pci *pci = sys->private_data;
+   struct gen_pci *pci = gen_pci_from_sys(bus->sysdata);
resource_size_t idx = bus->number - pci->cfg.bus_range.start;

return pci->cfg.win[idx] + ((devfn << 12) | where);
@@ -80,8 +83,7 @@ static int gen_pci_config_read(struct pci_bus *bus, unsigned 
int devfn,
int where, int size, u32 *val)
  {
void __iomem *addr;
-   struct pci_sys_data *sys = bus->sysdata;
-   struct gen_pci *pci = sys->private_data;
+   struct gen_pci *pci = gen_pci_from_sys(bus->sysdata);

addr = pci->cfg.ops->map_bus(bus, devfn, where);

@@ -103,8 +105,7 @@ static int gen_pci_config_write(struct pci_bus *bus, 
unsigned int devfn,
 int where, int size, u32 val)
  {
void __iomem *addr;
-   struct pci_sys_data *sys = bus->sysdata;
-   struct gen_pci *pci = sys->private_data;
+   struct gen_pci *pci = gen_pci_from_sys(bus->sysdata);

addr = pci->cfg.ops->map_bus(bus, devfn, where);

@@ -181,10 +182,10 @@ static void gen_pci_release_of_pci_ranges(struct gen_pci 
*pci)
  {
struct pci_host_bridge_window *win;

-   list_for_each_entry(win, &pci->resources, list)
+   list_for_each_entry(win, &pci->sys.resources, list)
release_resource(win->res);

-   pci_free_resource_list(&pci->resources);
+   pci_free_resource_list(&pci->sys.resources);
  }

  static int gen_pci_parse_request_of_pci_ranges(struct gen_pci *pci)
@@ -237,7 +238,7 @@ static int gen_pci_parse_request_of_pci_ranges(struct 
gen_pci *pci)
if (err)
goto out_release_res;

-   pci_add_resource_offset(&pci->resources, res, offset);
+   pci_add_resource_offset(&pci->sys.resources, res, offset);
}

if (!res_valid) {
@@ -306,17 +307,10 @@ static int gen_pci_parse_map_cfg_windows(struct gen_pci 
*pci)
}

/* Register bus resource */
-   pci_add_resource(&pci->resources, bus_range);
+   pci_add_resource(&pci->sys.resources, bus_range);
return 0;
  }

-static int gen_pci_setup(int nr, struct pci_sys_data *sys)
-{
-   struct gen_pci *pci = sys->private_data;
-   list_splice_init(&pci->resources, &sys->resources);
-   return 1;
-}
-
  static int gen_pci_probe(struct platform_device *pdev)
  {
int err;
@@ -326,17 +320,12 @@ static int gen_pci_probe(struct platform_device *pdev)
struct device *dev = &pdev->dev;
struct device_node *np = dev->of_node;
struct gen_pci *pci = devm_kzalloc(dev, sizeof(*pci), GFP_KERNEL);
-   struct hw_pci hw = {
-   .nr_controllers = 1,
-   .private_data   = (void **)&pci,
-   .setup  = gen_pci_setup,
- 

Re: [Linaro-acpi] [PATCH v7 00/17] Introduce ACPI for ARM64 based on ACPI 5.1

2015-01-16 Thread Suravee Suthikulpanit



On 1/16/15 09:17, Al Stone wrote:

On 01/16/2015 03:20 AM, Catalin Marinas wrote:

On Thu, Jan 15, 2015 at 09:31:53PM +, Al Stone wrote:

On 01/15/2015 11:23 AM, Catalin Marinas wrote:

On Thu, Jan 15, 2015 at 04:26:20PM +, Grant Likely wrote:

On Wed, Jan 14, 2015 at 3:04 PM, Hanjun Guo  wrote:

This is the v7 of ACPI core patches for ARM64 based on ACPI 5.1


I'll get right to the point: Can we please have this series queued up
for v3.20?

[snip ... ]



5. Platform support patches need verification and review
* ACPI core works on at least the Foundation model, Juno, APM
Mustang, and AMD Seattle
* There still are driver patches being discussed. See Al's summary
for details
* As I argued above, the state of driver patches isn't going to be


We are still lacking here. To quote Al, "First version for AMD Seattle
has been posted to the public linaro-acpi mailing list for initial
review". Sorry but I don't follow linaro-acpi list. I don't know what's
in those patches and I can't tell which subsystems they touch, whether
maintainers agree with them. So in conclusion, I'm not confident the
arm64 hardware ACPI story looks that great yet.



This is solely my fault -- too much time on processes, email, and
documentation, not enough time on the Seattle patches.  And not
enough Seattles to go around for someone else to pick up the slack.

I am aware not everyone is subscribed to linaro-acpi; we use that
for internal review before posting more broadly, which is the only
reason I sent them there.

I'm in the middle of updating them as I have time, based on really
good feedback from Arnd; few of them are terribly new (the very first
posting was [0]) -- it's mostly a matter of rebasing, integrating
updates from AMD and others, and reacting to the comments.  One can
also see what these patches will probably look like via one of the
Fedora kernel trees [1].


Do you have some simple branch against mainline with just the ACPI core
patches and what's required for AMD Seattle? I have no plans to dig
through the Fedora kernels.



Nor was I expecting you to; I only added it as additional reference
material, should one be interested.

The version of patches sent to the linaro-acpi list are from the Linaro
acpi.git tree, and are precisely what you describe; those are the ones
being updated.



Catalin,

For Seattle, you could use the acpi-5.1-v7 branch of
https://git.linaro.org/leg/acpi/acpi.git, and you would also need the
AHCI ACPI patch here (https://lkml.org/lkml/2015/1/5/662).


Thanks,
Suravee


Re: [PATCH v6 09/30] PCI: Separate pci_host_bridge creation out of pci_create_root_bus()

2015-03-21 Thread Suravee Suthikulpanit



On 3/8/15 21:34, Yijing Wang wrote:

This patch separates pci_host_bridge creation out
of pci_create_root_bus() and tries to make a generic
pci_host_bridge, so we can place generic PCI
info like the domain number in it. Ripping
pci_host_bridge creation out of pci_create_root_bus()
also improves readability. Furthermore,
we can use the generic pci_host_bridge to hold
host-bridge-specific operations like
pcibios_root_bridge_prepare(). The changes are
transparent to platform host bridge drivers.

Signed-off-by: Yijing Wang 
Signed-off-by: Bjorn Helgaas 
---
  drivers/pci/host-bridge.c |   55 ++
  drivers/pci/pci.h |3 +
  drivers/pci/probe.c   |  114 -
  include/linux/pci.h   |1 +
  4 files changed, 109 insertions(+), 64 deletions(-)

diff --git a/drivers/pci/host-bridge.c b/drivers/pci/host-bridge.c
index 39b2dbe..3bd45e7 100644
--- a/drivers/pci/host-bridge.c
+++ b/drivers/pci/host-bridge.c
@@ -8,6 +8,61 @@

  #include "pci.h"

+static void pci_release_host_bridge_dev(struct device *dev)
+{
+   struct pci_host_bridge *bridge = to_pci_host_bridge(dev);
+
+   if (bridge->release_fn)
+   bridge->release_fn(bridge);
+
+   pci_free_resource_list(&bridge->windows);
+   kfree(bridge);
+}
+
+struct pci_host_bridge *pci_create_host_bridge(
+   struct device *parent, u32 db, struct list_head *resources)
+{
+   int error;
+   int bus = PCI_BUSNUM(db);
+   int domain = PCI_DOMAIN(db);
+   struct pci_host_bridge *host;
+   struct resource_entry *window, *n;
+
+   host = kzalloc(sizeof(*host), GFP_KERNEL);
+   if (!host)
+   return NULL;
+
+   host->busnum = bus;
+   host->domain = domain;
+   /* If support CONFIG_PCI_DOMAINS_GENERIC, use
+* pci_host_assign_domain_nr() to assign domain
+* number instead PCI_DOMAIN(db).
+*/
+   pci_host_assign_domain_nr(host);


At this point, host->dev.parent has not been assigned. However,
pci_host_assign_domain_nr(host) calls
pci_assign_domain_nr(host->dev.parent), which uses parent->of_node
directly without checking whether parent is NULL. This ends up causing a
NULL pointer exception in my testing.

I think we need to move host->dev.parent = parent before calling
pci_host_assign_domain_nr(host).
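
Something along these lines (a sketch against the quoted hunk only; the
rest of pci_create_host_bridge() is assumed unchanged):

	host->busnum = bus;
	host->domain = domain;
	/*
	 * Assign the parent before deriving the domain number, so that
	 * pci_assign_domain_nr() can safely use host->dev.parent->of_node.
	 */
	host->dev.parent = parent;
	pci_host_assign_domain_nr(host);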


Thanks,

Suravee


[V8 PATCH 1/3] ACPICA: Add ACPI _CLS processing

2015-03-30 Thread Suravee Suthikulpanit
ACPI device configurations often contain a _CLS object to supply the
PCI-defined class code for the device. This patch introduces logic to
process the _CLS object.

Acked-by: Mika Westerberg 
Reviewed-by: Hanjun Guo 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/acpi/acpica/acutils.h  |  3 ++
 drivers/acpi/acpica/nsxfname.c | 21 ++--
 drivers/acpi/acpica/utids.c| 73 ++
 3 files changed, 95 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/acpica/acutils.h b/drivers/acpi/acpica/acutils.h
index c2f03e8..2aef850 100644
--- a/drivers/acpi/acpica/acutils.h
+++ b/drivers/acpi/acpica/acutils.h
@@ -430,6 +430,9 @@ acpi_status
 acpi_ut_execute_CID(struct acpi_namespace_node *device_node,
struct acpi_pnp_device_id_list ** return_cid_list);
 
+acpi_status
+acpi_ut_execute_CLS(struct acpi_namespace_node *device_node,
+   struct acpi_pnp_device_id **return_id);
 /*
  * utlock - reader/writer locks
  */
diff --git a/drivers/acpi/acpica/nsxfname.c b/drivers/acpi/acpica/nsxfname.c
index d66c326..590ef06 100644
--- a/drivers/acpi/acpica/nsxfname.c
+++ b/drivers/acpi/acpica/nsxfname.c
@@ -276,11 +276,12 @@ acpi_get_object_info(acpi_handle handle,
struct acpi_pnp_device_id *hid = NULL;
struct acpi_pnp_device_id *uid = NULL;
struct acpi_pnp_device_id *sub = NULL;
+   struct acpi_pnp_device_id *cls = NULL;
char *next_id_string;
acpi_object_type type;
acpi_name name;
u8 param_count = 0;
-   u8 valid = 0;
+   u16 valid = 0;
u32 info_size;
u32 i;
acpi_status status;
@@ -320,7 +321,7 @@ acpi_get_object_info(acpi_handle handle,
if ((type == ACPI_TYPE_DEVICE) || (type == ACPI_TYPE_PROCESSOR)) {
/*
 * Get extra info for ACPI Device/Processor objects only:
-* Run the Device _HID, _UID, _SUB, and _CID methods.
+* Run the Device _HID, _UID, _SUB, _CID and _CLS methods.
 *
 * Note: none of these methods are required, so they may or may
 * not be present for this device. The Info->Valid bitfield is 
used
@@ -351,6 +352,14 @@ acpi_get_object_info(acpi_handle handle,
valid |= ACPI_VALID_SUB;
}
 
+   /* Execute the Device._CLS method */
+
+   status = acpi_ut_execute_CLS(node, &cls);
+   if (ACPI_SUCCESS(status)) {
+   info_size += cls->length;
+   valid |= ACPI_VALID_CLS;
+   }
+
/* Execute the Device._CID method */
 
status = acpi_ut_execute_CID(node, &cid_list);
@@ -468,6 +477,11 @@ acpi_get_object_info(acpi_handle handle,
sub, next_id_string);
}
 
+   if (cls) {
+   next_id_string = acpi_ns_copy_device_id(&info->cls,
+   cls, next_id_string);
+   }
+
if (cid_list) {
info->compatible_id_list.count = cid_list->count;
info->compatible_id_list.list_size = cid_list->list_size;
@@ -507,6 +521,9 @@ cleanup:
if (sub) {
ACPI_FREE(sub);
}
+   if (cls) {
+   ACPI_FREE(cls);
+   }
if (cid_list) {
ACPI_FREE(cid_list);
}
diff --git a/drivers/acpi/acpica/utids.c b/drivers/acpi/acpica/utids.c
index 27431cf..9745065 100644
--- a/drivers/acpi/acpica/utids.c
+++ b/drivers/acpi/acpica/utids.c
@@ -416,3 +416,76 @@ cleanup:
acpi_ut_remove_reference(obj_desc);
return_ACPI_STATUS(status);
 }
+
+/***
+ *
+ * FUNCTION:acpi_ut_execute_CLS
+ *
+ * PARAMETERS:  device_node - Node for the device
+ *  return_id   - Where the string UID is returned
+ *
+ * RETURN:  Status
+ *
+ * DESCRIPTION: Executes the _CLS control method that returns PCI-defined
+ *  class code of the device. The ACPI spec define _CLS as a
+ *  package with three integers. The returned string has format:
+ *
+ *  "bbsspp"
+ *  where:
+ *  bb = Base-class code
+ *  ss = Sub-class code
+ *  pp = Programming Interface code
+ *
+ 
**/
+
+acpi_status
+acpi_ut_execute_CLS(struct acpi_namespace_node *device_node,
+   struct acpi_pnp_device_id **return_id)
+{
+   struct acpi_pnp_device_id *cls;
+   union acpi_operand_object *obj_desc;
+   union acpi_operand_object **cls_objects;
+   acpi_status status;
+
+   ACPI_FUNCTION_TRACE(ut_execute_CLS);
+   status = acpi_ut_ev
