Re: [PATCH] AMD perf PMU events for AMD Family 17h.

Martin Liška Mon, 27 Aug 2018 07:57:01 -0700

On 08/23/2018 06:16 PM, William Cohen wrote:
> On 08/23/2018 10:31 AM, Arnaldo Carvalho de Melo wrote:
>> Em Thu, Aug 23, 2018 at 01:21:45PM +0200, Martin Liška escreveu:
>>> May I please ping this.
>> I was waiting for someone to give an ack; perhaps Will Cohen can take
>> a brief look and provide one? Will?
>>
>> Thanks,
>>
>> - Arnaldo
>>  
>>> Thanks,
>>> Martin
>>>
>>> On 08/06/2018 10:42 AM, Martin Liška wrote:
>>>> Hello.
>>>>
>>>> The following patch adds PMC events for AMD Family 17h CPUs as defined in [1].
>>>> It covers the events described in section 2.1.13. The regex pattern in
>>>> mapfile.csv covers all CPUs of the family.
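As a rough illustration of how a single mapfile.csv regex can cover a whole CPU family, the sketch below matches CPU identifiers against a Family 17h pattern. Both the identifier format (vendor-family-model, family in decimal, model in hex) and the pattern `AuthenticAMD-23-[0-9A-Fa-f]+` are assumptions for illustration only: the actual mapfile.csv line is not quoted in this message, and perf itself uses POSIX extended regexes rather than Python's `re`.

```python
import re

# Hypothetical Family 17h pattern (0x17 == 23 decimal). A POSIX class such as
# [[:xdigit:]] is written as [0-9A-Fa-f] because Python's re lacks it.
FAM17H_PATTERN = re.compile(r"AuthenticAMD-23-[0-9A-Fa-f]+")


def cpuid_string(vendor, family, model):
    """Build an assumed vendor-family-model ID (family decimal, model hex)."""
    return f"{vendor}-{family}-{model:X}"


def matches_fam17h(cpuid):
    """Require the whole identifier to match, not just a prefix."""
    return FAM17H_PATTERN.fullmatch(cpuid) is not None


if __name__ == "__main__":
    for family, model in [(0x17, 0x01), (0x17, 0x08), (0x15, 0x02)]:
        cid = cpuid_string("AuthenticAMD", family, model)
        print(cid, matches_fam17h(cid))
```

Every model number of family 23 matches, while other families do not, which is the property the cover letter describes.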
>>>>
>>>> Thanks,
>>>> Martin
>>>>
>>>> [1] https://support.amd.com/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf
>>>>
>>>> Signed-off-by: Martin Liška <mli...@suse.cz>
>>>>
>>>> ---
>>>>  .../pmu-events/arch/x86/amdfam17h/cache.json  | 332 ++++++++++++++++++
>>>>  .../pmu-events/arch/x86/amdfam17h/core.json   | 124 +++++++
>>>>  .../arch/x86/amdfam17h/floating-point.json    | 196 +++++++++++
>>>>  .../pmu-events/arch/x86/amdfam17h/memory.json | 225 ++++++++++++
>>>>  .../pmu-events/arch/x86/amdfam17h/other.json  |  51 +++
>>>>  tools/perf/pmu-events/arch/x86/mapfile.csv    |   1 +
>>>>  6 files changed, 929 insertions(+)
>>>>  create mode 100644 tools/perf/pmu-events/arch/x86/amdfam17h/cache.json
>>>>  create mode 100644 tools/perf/pmu-events/arch/x86/amdfam17h/core.json
>>>>  create mode 100644 tools/perf/pmu-events/arch/x86/amdfam17h/floating-point.json
>>>>  create mode 100644 tools/perf/pmu-events/arch/x86/amdfam17h/memory.json
>>>>  create mode 100644 tools/perf/pmu-events/arch/x86/amdfam17h/other.json
>>>>
>>>>
> Hi,
> 
> I had already deleted the patch from my mailbox, so I downloaded it from the
> archive and added some inline comments to the attached copy.


Hello.

First, I would like to thank you, William. In general, I fully agree that
AMD's documentation is quite poor. I've added my questions inline:

> 
> -Will
> 
> 
> PATCH-AMD-perf-PMU-events-for-AMD-Family-17h.txt
> 
> 
> From:   Martin Liška <mli...@suse.cz>
> Subject: [PATCH] AMD perf PMU events for AMD Family 17h.
> To:     gcc-patc...@gcc.gnu.org, linux-perf-us...@vger.kernel.org,
>         lkml <linux-kernel@vger.kernel.org>
> Cc:     Arnaldo Carvalho de Melo <a...@kernel.org>,
>         Jiri Olsa <jo...@redhat.com>
> Message-ID: <3ee15066-429e-b0f2-1255-aab100fad...@suse.cz>
> Date:   Mon, 6 Aug 2018 10:42:19 +0200
> Archived-At: 
> <https://lore.kernel.org/lkml/3ee15066-429e-b0f2-1255-aab100fad...@suse.cz/>
> 
> diff --git a/tools/perf/pmu-events/arch/x86/amdfam17h/cache.json b/tools/perf/pmu-events/arch/x86/amdfam17h/cache.json
> new file mode 100644
> index 000000000000..6a41cc9d1d5e
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/x86/amdfam17h/cache.json
> @@ -0,0 +1,332 @@
> +[
> +  {
> +    "EventName": "ic_fw32",
> +    "EventCode": "0x80",
> +    "BriefDescription": "The number of 32B fetch windows transferred from IC pipe to DE instruction decoder (includes non-cacheable and cacheable fill responses)."
> +  },
> +  {
> +    "EventName": "ic_fw32_miss",
> +    "EventCode": "0x81",
> +    "BriefDescription": "The number of 32B fetch windows tried to read the L1 IC and missed in the full tag."
> +  },
> +  {
> +    "EventName": "ic_cache_fill_l2",
> +    "EventCode": "0x82",
> +    "BriefDescription": "The number of 64 byte instruction cache line was fulfilled from the L2 cache."
> +  },
> +  {
> +    "EventName": "ic_cache_fill_sys",
> +    "EventCode": "0x83",
> +    "BriefDescription": "The number of 64 byte instruction cache line fulfilled from system memory or another cache."
> +  },
> +  {
> +    "EventName": "bp_l1_tlb_miss_l2_hit",
> +    "EventCode": "0x84",
> +    "BriefDescription": "The number of instruction fetches that miss in the L1 ITLB but hit in the L2 ITLB."
> +  },
> +  {
> +    "EventName": "bp_l1_tlb_miss_l2_miss",
> +    "EventCode": "0x85",
> +    "BriefDescription": "The number of instruction fetches that miss in both the L1 and L2 TLBs."
> +  },
> +  {
> +    "EventName": "bp_snp_re_sync",
> +    "EventCode": "0x86",
> +    "BriefDescription": "The number of pipeline restarts caused by invalidating probes that hit on the instruction stream currently being executed. This would happen if the active instruction stream was being modified by another processor in an MP system - typically a highly unlikely event."
> +  },
> +  {
> +    "EventName": "ic_fetch_stall.ic_stall_any",
> +    "EventCode": "0x87",
> +    "BriefDescription": "IC pipe was stalled during this clock cycle for any reason (nothing valid in pipe ICM1).",
> +    "PublicDescription": "Instruction Pipe Stall. IC pipe was stalled during this clock cycle for any reason (nothing valid in pipe ICM1).",
> +    "UMask": "0x4"
> +  },
> +  {
> +    "EventName": "ic_fetch_stall.ic_stall_dq_empty",
> +    "EventCode": "0x87",
> +    "BriefDescription": "IC pipe was stalled during this clock cycle (including IC to OC fetches) due to DQ empty.",
> +    "PublicDescription": "Instruction Pipe Stall. IC pipe was stalled during this clock cycle (including IC to OC fetches) due to DQ empty.",
> +    "UMask": "0x2"
> +  },
> +  {
> +    "EventName": "ic_fetch_stall.ic_stall_back_pressure",
> +    "EventCode": "0x87",
> +    "BriefDescription": "IC pipe was stalled during this clock cycle (including IC to OC fetches) due to back-pressure.",
> +    "PublicDescription": "Instruction Pipe Stall. IC pipe was stalled during this clock cycle (including IC to OC fetches) due to back-pressure.",
> +    "UMask": "0x1"
> +  },
> 
> Aren't the following bp_l1_btb_correct and bp_l2_btb_correct events related to
> branch prediction? If so, shouldn't they be in a branch.json file rather than
> be lumped in with the cache perf events?
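If the BTB events were moved, the split could be done mechanically. A minimal sketch, assuming the events are loaded as a list of dicts with an `EventName` key; it selects the two names from the comment rather than everything with a `bp_` prefix, because the ITLB events (`bp_l1_tlb_miss_*`, `bp_tlb_rel`) share that prefix. The helper name is hypothetical.

```python
import json

# The two events the review suggests moving out of cache.json.
BRANCH_EVENTS = {"bp_l1_btb_correct", "bp_l2_btb_correct"}


def split_branch_events(events):
    """Return (remaining, branch) lists, preserving the original order."""
    remaining = [e for e in events if e["EventName"] not in BRANCH_EVENTS]
    branch = [e for e in events if e["EventName"] in BRANCH_EVENTS]
    return remaining, branch


if __name__ == "__main__":
    cache = [
        {"EventName": "bp_tlb_rel", "EventCode": "0x99"},
        {"EventName": "bp_l1_btb_correct", "EventCode": "0x8a"},
        {"EventName": "bp_l2_btb_correct", "EventCode": "0x8b"},
    ]
    remaining, branch = split_branch_events(cache)
    print(json.dumps(branch, indent=2))  # candidate content for branch.json
```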
> 
> +  {
> +    "EventName": "bp_l1_btb_correct",
> +    "EventCode": "0x8a",
> +    "BriefDescription": "L1 BTB Correction."
> +  },
> +  {
> +    "EventName": "bp_l2_btb_correct",
> +    "EventCode": "0x8b",
> +    "BriefDescription": "L2 BTB Correction."
> +  },
> +  {
> +    "EventName": "ic_cache_inval.l2_invalidating_probe",
> +    "EventCode": "0x8c",
> +    "BriefDescription": "IC line invalidated due to L2 invalidating probe (external or LS).",
> +    "PublicDescription": "The number of instruction cache lines invalidated. A non-SMC event is CMC (cross modifying code), either from the other thread of the core or another core. IC line invalidated due to L2 invalidating probe (external or LS).",
> +    "UMask": "0x2"
> +  },
> +  {
> +    "EventName": "ic_cache_inval.fill_invalidated",
> +    "EventCode": "0x8c",
> +    "BriefDescription": "IC line invalidated due to overwriting fill response.",
> +    "PublicDescription": "The number of instruction cache lines invalidated. A non-SMC event is CMC (cross modifying code), either from the other thread of the core or another core. IC line invalidated due to overwriting fill response.",
> +    "UMask": "0x1"
> +  },
> +  {
> +    "EventName": "bp_tlb_rel",
> +    "EventCode": "0x99",
> +    "BriefDescription": "The number of ITLB reload requests."
> +  },
> 
> The AMD documentation isn't really clear about what the
> ic_oc_mode_switch.oc_ic_mode_switch and ic_oc_mode_switch.ic_oc_mode_switch
> events do. Should these two events go into other.json?
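A side note on encoding: event codes such as 0x28a (ic_oc_mode_switch), 0x1cf and 0x1d0 do not fit in 8 bits. As far as I know, AMD's PERF_CTL registers hold EventSelect[7:0] in bits 0-7, the unit mask in bits 8-15 and EventSelect[11:8] in bits 32-35, which is also what perf's AMD `event` format (`config:0-7,32-35`) describes. The sketch below packs a code/umask pair under that assumed layout; the function name is hypothetical.

```python
def amd_raw_config(event_code, umask=0):
    """Pack an AMD event code and unit mask into a perf raw config value.

    Assumed layout (AMD PERF_CTL): EventSelect[7:0] -> bits 0-7,
    UnitMask[7:0] -> bits 8-15, EventSelect[11:8] -> bits 32-35.
    """
    if not 0 <= event_code <= 0xFFF:
        raise ValueError("event code must fit in 12 bits")
    if not 0 <= umask <= 0xFF:
        raise ValueError("unit mask must fit in 8 bits")
    low = event_code & 0xFF
    high = (event_code >> 8) & 0xF
    return low | (umask << 8) | (high << 32)


if __name__ == "__main__":
    # ic_oc_mode_switch.ic_oc_mode_switch: EventCode 0x28a, UMask 0x1
    print(f"r{amd_raw_config(0x28a, 0x1):x}")  # r20000018a
```

Under that layout, `perf stat -e r20000018a` and `perf stat -e cpu/event=0x28a,umask=0x1/` should select the same counter setting.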
> 
> +  {
> +    "EventName": "ic_oc_mode_switch.oc_ic_mode_switch",
> +    "EventCode": "0x28a",
> +    "BriefDescription": "OC to IC mode switch.",
> +    "PublicDescription": "OC Mode Switch. OC to IC mode switch.",
> +    "UMask": "0x2"
> +  },
> +  {
> +    "EventName": "ic_oc_mode_switch.ic_oc_mode_switch",
> +    "EventCode": "0x28a",
> +    "BriefDescription": "IC to OC mode switch.",
> +    "PublicDescription": "OC Mode Switch. IC to OC mode switch.",
> +    "UMask": "0x1"
> +  },
> +  {
> +    "EventName": "l2_request_g1.rd_blk_l",
> +    "EventCode": "0x60",
> +    "BriefDescription": "Requests to L2 Group1.",
> +    "PublicDescription": "Requests to L2 Group1.",
> +    "UMask": "0x80"
> +  },
> +  {
> +    "EventName": "l2_request_g1.rd_blk_x",
> +    "EventCode": "0x60",
> +    "BriefDescription": "Requests to L2 Group1.",
> +    "PublicDescription": "Requests to L2 Group1.",
> +    "UMask": "0x40"
> +  },
> +  {
> +    "EventName": "l2_request_g1.ls_rd_blk_c_s",
> +    "EventCode": "0x60",
> +    "BriefDescription": "Requests to L2 Group1.",
> +    "PublicDescription": "Requests to L2 Group1.",
> +    "UMask": "0x20"
> +  },
> +  {
> +    "EventName": "l2_request_g1.cacheable_ic_read",
> +    "EventCode": "0x60",
> +    "BriefDescription": "Requests to L2 Group1.",
> +    "PublicDescription": "Requests to L2 Group1.",
> +    "UMask": "0x10"
> +  },
> +  {
> +    "EventName": "l2_request_g1.change_to_x",
> +    "EventCode": "0x60",
> +    "BriefDescription": "Requests to L2 Group1.",
> +    "PublicDescription": "Requests to L2 Group1.",
> +    "UMask": "0x8"
> +  },
> +  {
> +    "EventName": "l2_request_g1.prefetch_l2",
> +    "EventCode": "0x60",
> +    "BriefDescription": "Requests to L2 Group1.",
> +    "PublicDescription": "Requests to L2 Group1.",
> +    "UMask": "0x4"
> +  },
> +  {
> +    "EventName": "l2_request_g1.l2_hw_pf",
> +    "EventCode": "0x60",
> +    "BriefDescription": "Requests to L2 Group1.",
> +    "PublicDescription": "Requests to L2 Group1.",
> +    "UMask": "0x2"
> +  },
> +  {
> +    "EventName": "l2_request_g1.other_requests",
> +    "EventCode": "0x60",
> +    "BriefDescription": "Events covered by l2_request_g2.",
> +    "PublicDescription": "Requests to L2 Group1. Events covered by l2_request_g2.",
> +    "UMask": "0x1"
> +  },
> +  {
> +    "EventName": "l2_request_g2.group1",
> +    "EventCode": "0x61",
> +    "BriefDescription": "All Group 1 commands not in unit0.",
> +    "PublicDescription": "Multi-events in that LS and IF requests can be received simultaneous. All Group 1 commands not in unit0.",
> +    "UMask": "0x80"
> +  },
> +  {
> +    "EventName": "l2_request_g2.ls_rd_sized",
> +    "EventCode": "0x61",
> +    "BriefDescription": "RdSized, RdSized32, RdSized64.",
> +    "PublicDescription": "Multi-events in that LS and IF requests can be received simultaneous. RdSized, RdSized32, RdSized64.",
> +    "UMask": "0x40"
> +  },
> +  {
> +    "EventName": "l2_request_g2.ls_rd_sized_nc",
> +    "EventCode": "0x61",
> +    "BriefDescription": "RdSizedNC, RdSized32NC, RdSized64NC.",
> +    "PublicDescription": "Multi-events in that LS and IF requests can be received simultaneous. RdSizedNC, RdSized32NC, RdSized64NC.",
> +    "UMask": "0x20"
> +  },
> +  {
> +    "EventName": "l2_request_g2.ic_rd_sized",
> +    "EventCode": "0x61",
> +    "BriefDescription": "Multi-events in that LS and IF requests can be received simultaneous.",
> +    "PublicDescription": "Multi-events in that LS and IF requests can be received simultaneous.",
> +    "UMask": "0x10"
> +  },
> +  {
> +    "EventName": "l2_request_g2.ic_rd_sized_nc",
> +    "EventCode": "0x61",
> +    "BriefDescription": "Multi-events in that LS and IF requests can be received simultaneous.",
> +    "PublicDescription": "Multi-events in that LS and IF requests can be received simultaneous.",
> +    "UMask": "0x8"
> +  },
> +  {
> +    "EventName": "l2_request_g2.smc_inval",
> +    "EventCode": "0x61",
> +    "BriefDescription": "Multi-events in that LS and IF requests can be received simultaneous.",
> +    "PublicDescription": "Multi-events in that LS and IF requests can be received simultaneous.",
> +    "UMask": "0x4"
> +  },
> +  {
> +    "EventName": "l2_request_g2.bus_locks_originator",
> +    "EventCode": "0x61",
> +    "BriefDescription": "Multi-events in that LS and IF requests can be received simultaneous.",
> +    "PublicDescription": "Multi-events in that LS and IF requests can be received simultaneous.",
> +    "UMask": "0x2"
> +  },
> +  {
> +    "EventName": "l2_request_g2.bus_locks_responses",
> +    "EventCode": "0x61",
> +    "BriefDescription": "Multi-events in that LS and IF requests can be received simultaneous.",
> +    "PublicDescription": "Multi-events in that LS and IF requests can be received simultaneous.",
> +    "UMask": "0x1"
> +  },
> 
> The brief description of the following l2_latency event is too long. Also, it
> refers to an l2_request_g1 unit mask of FEh, but there is no way to program
> that: the l2_request_g1 configurations (like those of the other events) only
> allow setting a single bit.
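For reference, FEh is simply the OR of the seven single-bit l2_request_g1 unit masks in this patch (everything except other_requests, 0x1). The sketch below checks that and applies the arithmetic from the l2_latency description (count times four, divided by the total number of L2 fills). Whether summing the seven single-bit counts equals a single umask=0xfe count depends on each request setting exactly one bit, which is an assumption; the helper names and the sample counts are made up.

```python
# l2_request_g1 (EventCode 0x60) unit-mask bits, copied from the patch.
L2_REQUEST_G1 = {
    "rd_blk_l": 0x80,
    "rd_blk_x": 0x40,
    "ls_rd_blk_c_s": 0x20,
    "cacheable_ic_read": 0x10,
    "change_to_x": 0x08,
    "prefetch_l2": 0x04,
    "l2_hw_pf": 0x02,
    "other_requests": 0x01,
}


def l2_fill_mask():
    """OR of every l2_request_g1 bit except other_requests."""
    mask = 0
    for name, bit in L2_REQUEST_G1.items():
        if name != "other_requests":
            mask |= bit
    return mask


def avg_l2_fill_latency(l2_latency_count, l2_fills):
    """Average cycles per L2 fill, following the l2_latency description.

    l2_latency_count is the raw l2_latency.l2_cycles_waiting_on_fills value
    (already divided by four); l2_fills must include the fills of both
    threads, since the latency count covers both.
    """
    if l2_fills <= 0:
        raise ValueError("need a positive L2 fill count")
    return l2_latency_count * 4 / l2_fills


if __name__ == "__main__":
    print(hex(l2_fill_mask()))            # 0xfe
    print(avg_l2_fill_latency(250, 100))  # made-up counts -> 10.0
```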

Does that mean I should just drop the mention of 'l2_request_g1 == FEh' from
the description?

> 
> +  {
> +    "EventName": "l2_latency.l2_cycles_waiting_on_fills",
> +    "EventCode": "0x62",
> +    "BriefDescription": "Total cycles spent waiting for L2 fills to complete from L3 or memory, divided by four. This may be used to calculate average latency by multiplying this count by four and then dividing by the total number of L2 fills (unit mask l2_request_g1 == FEh). Event counts are for both threads. To calculate average latency, the number of fills from both threads must be used.",
> +    "PublicDescription": "Total cycles spent waiting for L2 fills to complete from L3 or memory, divided by four. This may be used to calculate average latency by multiplying this count by four and then dividing by the total number of L2 fills (unit mask l2_request_g1 == FEh). Event counts are for both threads. To calculate average latency, the number of fills from both threads must be used.",
> +    "UMask": "0x1"
> +  },
> 
> The AMD manual doesn't provide much detail, but are the following
> l2_wbc_req.* events supposed to have identical *Description sections?

I will mention 'write', 'close', 'cache line flush' and 'instruction line flush'
in the respective descriptions. Would that be fine?
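Following the pattern the patch already uses for l2_wbc_req.local_ic_clr (a short BriefDescription, and a PublicDescription prefixed with "LS to L2 WBC requests."), the per-umask entries could be generated as below. The unit masks come from the patch; the detail strings are hypothetical wording, not taken from AMD's documentation.

```python
import json

PREFIX = "LS to L2 WBC requests."

# (name suffix, unit mask from the patch, hypothetical detail text)
WBC_REQUESTS = [
    ("wcb_write", 0x40, "WCB Write."),
    ("wcb_close", 0x20, "WCB Close."),
    ("cache_line_flush", 0x10, "Cache Line Flush."),
    ("i_line_flush", 0x8, "Instruction Line Flush."),
]


def wbc_entry(name, umask, detail):
    """Build one l2_wbc_req entry in the same shape the patch uses."""
    return {
        "EventName": f"l2_wbc_req.{name}",
        "EventCode": "0x63",
        "BriefDescription": detail,
        "PublicDescription": f"{PREFIX} {detail}",
        "UMask": f"0x{umask:x}",
    }


if __name__ == "__main__":
    entries = [wbc_entry(*req) for req in WBC_REQUESTS]
    print(json.dumps(entries, indent=2))
```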

> 
> +  {
> +    "EventName": "l2_wbc_req.wcb_write",
> +    "EventCode": "0x63",
> +    "BriefDescription": "LS to L2 WBC requests.",
> +    "PublicDescription": "LS to L2 WBC requests.",
> +    "UMask": "0x40"
> +  },
> +  {
> +    "EventName": "l2_wbc_req.wcb_close",
> +    "EventCode": "0x63",
> +    "BriefDescription": "LS to L2 WBC requests.",
> +    "PublicDescription": "LS to L2 WBC requests.",
> +    "UMask": "0x20"
> +  },
> +  {
> +    "EventName": "l2_wbc_req.cache_line_flush",
> +    "EventCode": "0x63",
> +    "BriefDescription": "LS to L2 WBC requests.",
> +    "PublicDescription": "LS to L2 WBC requests.",
> +    "UMask": "0x10"
> +  },
> +  {
> +    "EventName": "l2_wbc_req.i_line_flush",
> +    "EventCode": "0x63",
> +    "BriefDescription": "LS to L2 WBC requests.",
> +    "PublicDescription": "LS to L2 WBC requests.",
> +    "UMask": "0x8"
> +  },
> +  {
> +    "EventName": "l2_wbc_req.zero_byte_store",
> +    "EventCode": "0x63",
> +    "BriefDescription": "This becomes WriteNoData at SDP; this count does not include DVM Sync Ops and bus locks which are counted in l2_request_g2.",
> +    "PublicDescription": "LS to L2 WBC requests. This becomes WriteNoData at SDP; this count does not include DVM Sync Ops and bus locks which are counted in l2_request_g2.",
> +    "UMask": "0x4"
> +  },
> +  {
> +    "EventName": "l2_wbc_req.local_ic_clr",
> +    "EventCode": "0x63",
> +    "BriefDescription": "Local IC Clear.",
> +    "PublicDescription": "LS to L2 WBC requests. Local IC Clear.",
> +    "UMask": "0x2"
> +  },
> +  {
> +    "EventName": "l2_wbc_req.cl_zero",
> +    "EventCode": "0x63",
> +    "BriefDescription": "Cache Line Zero.",
> +    "PublicDescription": "LS to L2 WBC requests. Cache Line Zero.",
> +    "UMask": "0x1"
> +  },
> +  {
> +    "EventName": "l2_cache_req_stat.ls_rd_blk_cs",
> +    "EventCode": "0x64",
> +    "BriefDescription": "LS ReadBlock C/S Hit.",
> +    "PublicDescription": "This event does not count accesses to the L2 cache by the L2 prefetcher, but it does count accesses by the L1 prefetcher. LS ReadBlock C/S Hit.",
> +    "UMask": "0x80"
> +  },
> +  {
> +    "EventName": "l2_cache_req_stat.ls_rd_blk_l_hit_x",
> +    "EventCode": "0x64",
> +    "BriefDescription": "LS Read Block L Hit X.",
> +    "PublicDescription": "This event does not count accesses to the L2 cache by the L2 prefetcher, but it does count accesses by the L1 prefetcher. LS Read Block L Hit X.",
> +    "UMask": "0x40"
> +  },
> +  {
> +    "EventName": "l2_cache_req_stat.ls_rd_blk_l_hit_s",
> +    "EventCode": "0x64",
> +    "BriefDescription": "LsRdBlkL Hit Shared.",
> +    "PublicDescription": "This event does not count accesses to the L2 cache by the L2 prefetcher, but it does count accesses by the L1 prefetcher. LsRdBlkL Hit Shared.",
> +    "UMask": "0x20"
> +  },
> +  {
> +    "EventName": "l2_cache_req_stat.ls_rd_blk_x",
> +    "EventCode": "0x64",
> +    "BriefDescription": "LsRdBlkX/ChgToX Hit X.  Count RdBlkX finding Shared as a Miss.",
> +    "PublicDescription": "This event does not count accesses to the L2 cache by the L2 prefetcher, but it does count accesses by the L1 prefetcher. LsRdBlkX/ChgToX Hit X.  Count RdBlkX finding Shared as a Miss.",
> +    "UMask": "0x10"
> +  },
> +  {
> +    "EventName": "l2_cache_req_stat.ls_rd_blk_c",
> +    "EventCode": "0x64",
> +    "BriefDescription": "LS Read Block C S L X Change to X Miss.",
> +    "PublicDescription": "This event does not count accesses to the L2 cache by the L2 prefetcher, but it does count accesses by the L1 prefetcher. LS Read Block C S L X Change to X Miss.",
> +    "UMask": "0x8"
> +  },
> +  {
> +    "EventName": "l2_cache_req_stat.ic_fill_hit_x",
> +    "EventCode": "0x64",
> +    "BriefDescription": "IC Fill Hit Exclusive Stale.",
> +    "PublicDescription": "This event does not count accesses to the L2 cache by the L2 prefetcher, but it does count accesses by the L1 prefetcher. IC Fill Hit Exclusive Stale.",
> +    "UMask": "0x4"
> +  },
> +  {
> +    "EventName": "l2_cache_req_stat.ic_fill_hit_s",
> +    "EventCode": "0x64",
> +    "BriefDescription": "IC Fill Hit Shared.",
> +    "PublicDescription": "This event does not count accesses to the L2 cache by the L2 prefetcher, but it does count accesses by the L1 prefetcher. IC Fill Hit Shared.",
> +    "UMask": "0x2"
> +  },
> +  {
> +    "EventName": "l2_cache_req_stat.ic_fill_miss",
> +    "EventCode": "0x64",
> +    "BriefDescription": "IC Fill Miss.",
> +    "PublicDescription": "This event does not count accesses to the L2 cache by the L2 prefetcher, but it does count accesses by the L1 prefetcher. IC Fill Miss.",
> +    "UMask": "0x1"
> +  },
> +  {
> +    "EventName": "l2_fill_pending.l2_fill_busy",
> +    "EventCode": "0x6d",
> +    "BriefDescription": "Total cycles spent with one or more fill requests in flight from L2.",
> +    "PublicDescription": "Total cycles spent with one or more fill requests in flight from L2.",
> +    "UMask": "0x1"
> +  }
> +]
> \ No newline at end of file
> diff --git a/tools/perf/pmu-events/arch/x86/amdfam17h/core.json b/tools/perf/pmu-events/arch/x86/amdfam17h/core.json
> new file mode 100644
> index 000000000000..79754a187fe5
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/x86/amdfam17h/core.json
> @@ -0,0 +1,124 @@
> +[
> +  {
> +    "EventName": "ex_ret_instr",
> +    "EventCode": "0xc0",
> +    "BriefDescription": "Retired Instructions."
> +  },
> 
> For the following ex_ret_* events, make the BriefDescription short, in the
> same form as ex_ret_instr above, and move the existing BriefDescription to the
> long description.
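A minimal sketch of the suggested restructuring for a single entry: the existing long text moves to PublicDescription (the long-description field already used elsewhere in the patch) and a short title in the style of "Retired Instructions." becomes the BriefDescription. The short title "Retired Uops." and the helper name are hypothetical; the long text is copied from the patch.

```python
import json


def split_description(event, short_title):
    """Return a copy of a pmu-events entry with the long text moved to
    PublicDescription and a short BriefDescription."""
    updated = dict(event)  # leave the input entry untouched
    updated["PublicDescription"] = event["BriefDescription"]
    updated["BriefDescription"] = short_title
    return updated


if __name__ == "__main__":
    ex_ret_cops = {
        "EventName": "ex_ret_cops",
        "EventCode": "0xc1",
        "BriefDescription": (
            "The number of uOps retired. This includes all processor activity "
            "(instructions, exceptions, interrupts, microcode assists, etc.). "
            "The number of events logged per cycle can vary from 0 to 4."
        ),
    }
    print(json.dumps(split_description(ex_ret_cops, "Retired Uops."), indent=2))
```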
> 
> +  {
> +    "EventName": "ex_ret_cops",
> +    "EventCode": "0xc1",
> +    "BriefDescription": "The number of uOps retired. This includes all processor activity (instructions, exceptions, interrupts, microcode assists, etc.). The number of events logged per cycle can vary from 0 to 4."
> +  },
> +  {
> +    "EventName": "ex_ret_brn",
> +    "EventCode": "0xc2",
> +    "BriefDescription": "The number of branch instructions retired. This includes all types of architectural control flow changes, including exceptions and interrupts."
> +  },
> +  {
> +    "EventName": "ex_ret_brn_misp",
> +    "EventCode": "0xc3",
> +    "BriefDescription": "The number of branch instructions retired, of any type, that were not correctly predicted. This includes those for which prediction is not attempted (far control transfers, exceptions and interrupts)."
> +  },
> +  {
> +    "EventName": "ex_ret_brn_tkn",
> +    "EventCode": "0xc4",
> +    "BriefDescription": "The number of taken branches that were retired. This includes all types of architectural control flow changes, including exceptions and interrupts."
> +  },
> +  {
> +    "EventName": "ex_ret_brn_tkn_misp",
> +    "EventCode": "0xc5",
> +    "BriefDescription": "The number of retired taken branch instructions that were mispredicted."
> +  },
> +  {
> +    "EventName": "ex_ret_brn_far",
> +    "EventCode": "0xc6",
> +    "BriefDescription": "The number of far control transfers retired including far call/jump/return, IRET, SYSCALL and SYSRET, plus exceptions and interrupts. Far control transfers are not subject to branch prediction."
> +  },
> +  {
> +    "EventName": "ex_ret_brn_resync",
> +    "EventCode": "0xc7",
> +    "BriefDescription": "The number of resync branches. These reflect pipeline restarts due to certain microcode assists and events such as writes to the active instruction stream, among other things. Each occurrence reflects a restart penalty similar to a branch mispredict. This is relatively rare."
> +  },
> +  {
> +    "EventName": "ex_ret_near_ret",
> +    "EventCode": "0xc8",
> +    "BriefDescription": "The number of near return instructions (RET or RET Iw) retired."
> +  },
> +  {
> +    "EventName": "ex_ret_near_ret_mispred",
> +    "EventCode": "0xc9",
> +    "BriefDescription": "The number of near returns retired that were not correctly predicted by the return address predictor. Each such mispredict incurs the same penalty as a mispredicted conditional branch instruction."
> +  },
> +  {
> +    "EventName": "ex_ret_brn_ind_misp",
> +    "EventCode": "0xca",
> +    "BriefDescription": "Retired Indirect Branch Instructions Mispredicted."
> +  },
> +  {
> +    "EventName": "ex_ret_mmx_fp_instr.sse_instr",
> +    "EventCode": "0xcb",
> +    "BriefDescription": "SSE instructions (SSE, SSE2, SSE3, SSSE3, SSE4A, SSE41, SSE42, AVX).",
> +    "PublicDescription": "The number of MMX, SSE or x87 instructions retired. The UnitMask allows the selection of the individual classes of instructions as given in the table. Each increment represents one complete instruction. Since this event includes non-numeric instructions it is not suitable for measuring MFLOPS. SSE instructions (SSE, SSE2, SSE3, SSSE3, SSE4A, SSE41, SSE42, AVX).",
> +    "UMask": "0x4"
> +  },
> +  {
> +    "EventName": "ex_ret_mmx_fp_instr.mmx_instr",
> +    "EventCode": "0xcb",
> +    "BriefDescription": "MMX instructions.",
> +    "PublicDescription": "The number of MMX, SSE or x87 instructions retired. The UnitMask allows the selection of the individual classes of instructions as given in the table. Each increment represents one complete instruction. Since this event includes non-numeric instructions it is not suitable for measuring MFLOPS. MMX instructions.",
> +    "UMask": "0x2"
> +  },
> +  {
> +    "EventName": "ex_ret_mmx_fp_instr.x87_instr",
> +    "EventCode": "0xcb",
> +    "BriefDescription": "x87 instructions.",
> +    "PublicDescription": "The number of MMX, SSE or x87 instructions retired. The UnitMask allows the selection of the individual classes of instructions as given in the table. Each increment represents one complete instruction. Since this event includes non-numeric instructions it is not suitable for measuring MFLOPS. x87 instructions.",
> +    "UMask": "0x1"
> +  },
> +  {
> +    "EventName": "ex_ret_cond",
> +    "EventCode": "0xd1",
> +    "BriefDescription": "Retired Conditional Branch Instructions."
> +  },
> +  {
> +    "EventName": "ex_ret_cond_misp",
> +    "EventCode": "0xd2",
> +    "BriefDescription": "Retired Conditional Branch Instructions Mispredicted."
> +  },
> +  {
> +    "EventName": "ex_div_busy",
> +    "EventCode": "0xd3",
> +    "BriefDescription": "Div Cycles Busy count."
> +  },
> +  {
> +    "EventName": "ex_div_count",
> +    "EventCode": "0xd4",
> +    "BriefDescription": "Div Op Count."
> +  },
> +  {
> +    "EventName": "ex_tagged_ibs_ops.ibs_count_rollover",
> +    "EventCode": "0x1cf",
> +    "BriefDescription": "Number of times an op could not be tagged by IBS because of a previous tagged op that has not retired.",
> +    "PublicDescription": "Tagged IBS Ops. Number of times an op could not be tagged by IBS because of a previous tagged op that has not retired.",
> +    "UMask": "0x4"
> +  },
> +  {
> +    "EventName": "ex_tagged_ibs_ops.ibs_tagged_ops_ret",
> +    "EventCode": "0x1cf",
> +    "BriefDescription": "Number of Ops tagged by IBS that retired.",
> +    "PublicDescription": "Tagged IBS Ops. Number of Ops tagged by IBS that retired.",
> +    "UMask": "0x2"
> +  },
> +  {
> +    "EventName": "ex_tagged_ibs_ops.ibs_tagged_ops",
> +    "EventCode": "0x1cf",
> +    "BriefDescription": "Number of Ops tagged by IBS.",
> +    "PublicDescription": "Tagged IBS Ops. Number of Ops tagged by IBS.",
> +    "UMask": "0x1"
> +  },
> +  {
> +    "EventName": "ex_ret_fus_brnch_inst",
> +    "EventCode": "0x1d0",
> +    "BriefDescription": "The number of fused retired branch instructions retired per cycle. The number of events logged per cycle can vary from 0 to 3."
> +  }
> +]
> \ No newline at end of file
> diff --git a/tools/perf/pmu-events/arch/x86/amdfam17h/floating-point.json b/tools/perf/pmu-events/arch/x86/amdfam17h/floating-point.json
> new file mode 100644
> index 000000000000..529e95c2d4bb
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/x86/amdfam17h/floating-point.json
> @@ -0,0 +1,196 @@
> 
> For fpu_pipe_assignment.*, does it make sense to only allow measuring one
> pipe at a time? It seems the likely use cases would be 0xf0 (dual, all
> multi-pipe uOps) and 0x0f (total, total number of uOps). Will people really
> care about the number of uOps sent to Pipe3 vs Pipe0?
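To make the suggested aggregate masks concrete: ORing the four dual* unit masks from the patch (0x80, 0x40, 0x20, 0x10) gives 0xf0, and the total* bits would then occupy the low nibble, 0x0f. The total* values are inferred from the comment rather than shown in the entries visible here, so treat them as an assumption. The helper name and the `cpu/event=...,umask=.../` strings are only a sketch of how one might request the aggregates from perf.

```python
from functools import reduce
from operator import or_

# fpu_pipe_assignment (EventCode 0x00) dual* unit masks, from the patch.
DUAL_BITS = {"dual3": 0x80, "dual2": 0x40, "dual1": 0x20, "dual0": 0x10}

ALL_DUAL = reduce(or_, DUAL_BITS.values())  # 0xf0: all multi-pipe uOps
# Assumption: total0..total3 are the low four bits, i.e. the complement of
# ALL_DUAL within the 8-bit unit mask.
ALL_TOTAL = ~ALL_DUAL & 0xFF  # 0x0f


def perf_event_spec(umask, event_code=0x00):
    """Format a cpu PMU event string for perf's -e option."""
    return f"cpu/event=0x{event_code:02x},umask=0x{umask:02x}/"


if __name__ == "__main__":
    print(perf_event_spec(ALL_DUAL))   # cpu/event=0x00,umask=0xf0/
    print(perf_event_spec(ALL_TOTAL))  # cpu/event=0x00,umask=0x0f/
```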
> 
> +[
> +  {
> +    "EventName": "fpu_pipe_assignment.dual3",
> +    "EventCode": "0x00",
> +    "BriefDescription": "Total number multi-pipe uOps assigned to Pipe 3.",
> +    "PublicDescription": "The number of operations (uOps) and dual-pipe uOps dispatched to each of the 4 FPU execution pipelines. This event reflects how busy the FPU pipelines are and may be used for workload characterization. This includes all operations performed by x87, MMXTM, and SSE instructions, including moves. Each increment represents a one- cycle dispatch event. This event is a speculative event. Since this event includes non-numeric operations it is not suitable for measuring MFLOPS. Total number multi-pipe uOps assigned to Pipe 3.",
> +    "UMask": "0x80"
> +  },
> +  {
> +    "EventName": "fpu_pipe_assignment.dual2",
> +    "EventCode": "0x00",
> +    "BriefDescription": "Total number multi-pipe uOps assigned to Pipe 2.",
> +    "PublicDescription": "The number of operations (uOps) and dual-pipe uOps dispatched to each of the 4 FPU execution pipelines. This event reflects how busy the FPU pipelines are and may be used for workload characterization. This includes all operations performed by x87, MMXTM, and SSE instructions, including moves. Each increment represents a one- cycle dispatch event. This event is a speculative event. Since this event includes non-numeric operations it is not suitable for measuring MFLOPS. Total number multi-pipe uOps assigned to Pipe 2.",
> +    "UMask": "0x40"
> +  },
> +  {
> +    "EventName": "fpu_pipe_assignment.dual1",
> +    "EventCode": "0x00",
> +    "BriefDescription": "Total number multi-pipe uOps assigned to Pipe 1.",
> +    "PublicDescription": "The number of operations (uOps) and dual-pipe uOps dispatched to each of the 4 FPU execution pipelines. This event reflects how busy the FPU pipelines are and may be used for workload characterization. This includes all operations performed by x87, MMXTM, and SSE instructions, including moves. Each increment represents a one- cycle dispatch event. This event is a speculative event. Since this event includes non-numeric operations it is not suitable for measuring MFLOPS. Total number multi-pipe uOps assigned to Pipe 1.",
> +    "UMask": "0x20"
> +  },
> +  {
> +    "EventName": "fpu_pipe_assignment.dual0",
> +    "EventCode": "0x00",
> +    "BriefDescription": "Total number multi-pipe uOps assigned to Pipe 0.",
> +    "PublicDescription": "The number of operations (uOps) and dual-pipe uOps dispatched to each of the 4 FPU execution pipelines. This event reflects how busy the FPU pipelines are and may be used for workload characterization. This includes all operations performed by x87, MMXTM, and SSE instructions, including moves. Each increment represents a one- cycle dispatch event. This event is a speculative event. Since this event includes non-numeric operations it is not suitable for measuring MFLOPS. Total number multi-pipe uOps assigned to Pipe 0.",
> +    "UMask": "0x10"
> +  },
> +  {
> +    "EventName": "fpu_pipe_assignment.total3",
> +    "EventCode": "0x00",
> +    "BriefDescription": "Total number uOps assigned to Pipe 3.",
> +    "PublicDescription": "The number of operations (uOps) and dual-pipe uOps 
> dispatched to each of the 4 FPU execution pipelines. This event reflects how 
> busy the FPU pipelines are and may be used for workload characterization. 
> This includes all operations performed by x87, MMXTM, and SSE instructions, 
> including moves. Each increment represents a one- cycle dispatch event. This 
> event is a speculative event. Since this event includes non-numeric 
> operations it is not suitable for measuring MFLOPS. Total number uOps 
> assigned to Pipe 3.",
> +    "UMask": "0x8"
> +  },
> +  {
> +    "EventName": "fpu_pipe_assignment.total2",
> +    "EventCode": "0x00",
> +    "BriefDescription": "Total number uOps assigned to Pipe 2.",
> +    "PublicDescription": "The number of operations (uOps) and dual-pipe uOps 
> dispatched to each of the 4 FPU execution pipelines. This event reflects how 
> busy the FPU pipelines are and may be used for workload characterization. 
> This includes all operations performed by x87, MMXTM, and SSE instructions, 
> including moves. Each increment represents a one- cycle dispatch event. This 
> event is a speculative event. Since this event includes non-numeric 
> operations it is not suitable for measuring MFLOPS. Total number uOps 
> assigned to Pipe 2.",
> +    "UMask": "0x4"
> +  },
> +  {
> +    "EventName": "fpu_pipe_assignment.total1",
> +    "EventCode": "0x00",
> +    "BriefDescription": "Total number uOps assigned to Pipe 1.",
> +    "PublicDescription": "The number of operations (uOps) and dual-pipe uOps 
> dispatched to each of the 4 FPU execution pipelines. This event reflects how 
> busy the FPU pipelines are and may be used for workload characterization. 
> This includes all operations performed by x87, MMXTM, and SSE instructions, 
> including moves. Each increment represents a one- cycle dispatch event. This 
> event is a speculative event. Since this event includes non-numeric 
> operations it is not suitable for measuring MFLOPS. Total number uOps 
> assigned to Pipe 1.",
> +    "UMask": "0x2"
> +  },
> +  {
> +    "EventName": "fpu_pipe_assignment.total0",
> +    "EventCode": "0x00",
> +    "BriefDescription": "Total number uOps assigned to Pipe 0.",
> +    "PublicDescription": "The number of operations (uOps) and dual-pipe uOps 
> dispatched to each of the 4 FPU execution pipelines. This event reflects how 
> busy the FPU pipelines are and may be used for workload characterization. 
> This includes all operations performed by x87, MMXTM, and SSE instructions, 
> including moves. Each increment represents a one- cycle dispatch event. This 
> event is a speculative event. Since this event includes non-numeric 
> operations it is not suitable for measuring MFLOPS. Total number uOps 
> assigned to Pipe 0.",
> +    "UMask": "0x1"
> +  },
> +  {
> +    "EventName": "fp_sched_empty",
> +    "EventCode": "0x01",
> +    "BriefDescription": "This is a speculative event. The number of cycles 
> in which the FPU scheduler is empty. Note that some Ops like FP loads bypass 
> the scheduler."
> +  },
> 
> For fp_retx87_fp_ops, would it be possible to have a setting for all events in 
> addition to the individual flags?
> 
> +  {
> +    "EventName": "fp_retx87_fp_ops.div_sqr_r_ops",
> +    "EventCode": "0x02",
> +    "BriefDescription": "Divide and square root Ops.",
> +    "PublicDescription": "The number of x87 floating-point Ops that have 
> retired. The number of events logged per cycle can vary from 0 to 8. Divide 
> and square root Ops.",
> +    "UMask": "0x4"
> +  },
> +  {
> +    "EventName": "fp_retx87_fp_ops.mul_ops",
> +    "EventCode": "0x02",
> +    "BriefDescription": "Multiply Ops.",
> +    "PublicDescription": "The number of x87 floating-point Ops that have 
> retired. The number of events logged per cycle can vary from 0 to 8. Multiply 
> Ops.",
> +    "UMask": "0x2"
> +  },
> +  {
> +    "EventName": "fp_retx87_fp_ops.add_sub_ops",
> +    "EventCode": "0x02",
> +    "BriefDescription": "Add/subtract Ops.",
> +    "PublicDescription": "The number of x87 floating-point Ops that have 
> retired. The number of events logged per cycle can vary from 0 to 8. 
> Add/subtract Ops.",
> +    "UMask": "0x1"
> +  },
> 
> For fp_ret_sse_avx_ops, I would like to have a umask setting that covers all 
> the sub-events it can measure.
> 
> +  {
> +    "EventName": "fp_ret_sse_avx_ops.dp_mult_add_flops",
> +    "EventCode": "0x03",
> +    "BriefDescription": "Double precision multiply-add FLOPS. Multiply-add 
> counts as 2 FLOPS.",
> +    "PublicDescription": "This is a retire-based event. The number of 
> retired SSE/AVX FLOPS. The number of events logged per cycle can vary from 0 
> to 64. This event can count above 15. Double precision multiply-add FLOPS. 
> Multiply-add counts as 2 FLOPS.",
> +    "UMask": "0x80"
> +  },
> +  {
> +    "EventName": "fp_ret_sse_avx_ops.dp_div_flops",
> +    "EventCode": "0x03",
> +    "BriefDescription": "Double precision divide/square root FLOPS.",
> +    "PublicDescription": "This is a retire-based event. The number of 
> retired SSE/AVX FLOPS. The number of events logged per cycle can vary from 0 
> to 64. This event can count above 15. Double precision divide/square root 
> FLOPS.",
> +    "UMask": "0x40"
> +  },
> +  {
> +    "EventName": "fp_ret_sse_avx_ops.dp_mult_flops",
> +    "EventCode": "0x03",
> +    "BriefDescription": "Double precision multiply FLOPS.",
> +    "PublicDescription": "This is a retire-based event. The number of 
> retired SSE/AVX FLOPS. The number of events logged per cycle can vary from 0 
> to 64. This event can count above 15. Double precision multiply FLOPS.",
> +    "UMask": "0x20"
> +  },
> +  {
> +    "EventName": "fp_ret_sse_avx_ops.dp_add_sub_flops",
> +    "EventCode": "0x03",
> +    "BriefDescription": "Double precision add/subtract FLOPS.",
> +    "PublicDescription": "This is a retire-based event. The number of 
> retired SSE/AVX FLOPS. The number of events logged per cycle can vary from 0 
> to 64. This event can count above 15. Double precision add/subtract FLOPS.",
> +    "UMask": "0x10"
> +  },
> +  {
> +    "EventName": "fp_ret_sse_avx_ops.sp_mult_add_flops",
> +    "EventCode": "0x03",
> +    "BriefDescription": "Single precision multiply-add FLOPS. Multiply-add 
> counts as 2 FLOPS.",
> +    "PublicDescription": "This is a retire-based event. The number of 
> retired SSE/AVX FLOPS. The number of events logged per cycle can vary from 0 
> to 64. This event can count above 15. Single precision multiply-add FLOPS. 
> Multiply-add counts as 2 FLOPS.",
> +    "UMask": "0x8"
> +  },
> +  {
> +    "EventName": "fp_ret_sse_avx_ops.sp_div_flops",
> +    "EventCode": "0x03",
> +    "BriefDescription": "Single-precision divide/square root FLOPS.",
> +    "PublicDescription": "This is a retire-based event. The number of 
> retired SSE/AVX FLOPS. The number of events logged per cycle can vary from 0 
> to 64. This event can count above 15. Single-precision divide/square root 
> FLOPS.",
> +    "UMask": "0x4"
> +  },
> +  {
> +    "EventName": "fp_ret_sse_avx_ops.sp_mult_flops",
> +    "EventCode": "0x03",
> +    "BriefDescription": "Single-precision multiply FLOPS.",
> +    "PublicDescription": "This is a retire-based event. The number of 
> retired SSE/AVX FLOPS. The number of events logged per cycle can vary from 0 
> to 64. This event can count above 15. Single-precision multiply FLOPS.",
> +    "UMask": "0x2"
> +  },
> +  {
> +    "EventName": "fp_ret_sse_avx_ops.sp_add_sub_flops",
> +    "EventCode": "0x03",
> +    "BriefDescription": "Single-precision add/subtract FLOPS.",
> +    "PublicDescription": "This is a retire-based event. The number of 
> retired SSE/AVX FLOPS. The number of events logged per cycle can vary from 0 
> to 64. This event can count above 15. Single-precision add/subtract FLOPS.",
> +    "UMask": "0x1"
> +  },
> +  {
> +    "EventName": "fp_num_mov_elim_scal_op.optimized",
> +    "EventCode": "0x04",
> +    "BriefDescription": "Number of Scalar Ops optimized.",
> +    "PublicDescription": "This is a dispatch based speculative event, and is 
> useful for measuring the effectiveness of the Move elimination and Scalar 
> code optimization schemes. Number of Scalar Ops optimized.",
> +    "UMask": "0x8"
> +  },
> +  {
> +    "EventName": "fp_num_mov_elim_scal_op.opt_potential",
> +    "EventCode": "0x04",
> +    "BriefDescription": "Number of Ops that are candidates for optimization 
> (have Z-bit either set or pass).",
> +    "PublicDescription": "This is a dispatch based speculative event, and is 
> useful for measuring the effectiveness of the Move elimination and Scalar 
> code optimization schemes. Number of Ops that are candidates for optimization 
> (have Z-bit either set or pass).",
> +    "UMask": "0x4"
> +  },
> +  {
> +    "EventName": "fp_num_mov_elim_scal_op.sse_mov_ops_elim",
> +    "EventCode": "0x04",
> +    "BriefDescription": "Number of SSE Move Ops eliminated.",
> +    "PublicDescription": "This is a dispatch based speculative event, and is 
> useful for measuring the effectiveness of the Move elimination and Scalar 
> code optimization schemes. Number of SSE Move Ops eliminated.",
> +    "UMask": "0x2"
> +  },
> +  {
> +    "EventName": "fp_num_mov_elim_scal_op.sse_mov_ops",
> +    "EventCode": "0x04",
> +    "BriefDescription": "Number of SSE Move Ops.",
> +    "PublicDescription": "This is a dispatch based speculative event, and is 
> useful for measuring the effectiveness of the Move elimination and Scalar 
> code optimization schemes. Number of SSE Move Ops.",
> +    "UMask": "0x1"
> +  },
> +  {
> +    "EventName": "fp_retired_ser_ops.x87_ctrl_ret",
> +    "EventCode": "0x05",
> +    "BriefDescription": "x87 control word mispredict traps due to 
> mispredictions in RC or PC, or changes in mask bits.",
> +    "PublicDescription": "The number of serializing Ops retired. x87 control 
> word mispredict traps due to mispredictions in RC or PC, or changes in mask 
> bits.",
> +    "UMask": "0x8"
> +  },
> +  {
> +    "EventName": "fp_retired_ser_ops.x87_bot_ret",
> +    "EventCode": "0x05",
> +    "BriefDescription": "x87 bottom-executing uOps retired.",
> +    "PublicDescription": "The number of serializing Ops retired. x87 
> bottom-executing uOps retired.",
> +    "UMask": "0x4"
> +  },
> +  {
> +    "EventName": "fp_retired_ser_ops.sse_ctrl_ret",
> +    "EventCode": "0x05",
> +    "BriefDescription": "SSE control word mispredict traps due to 
> mispredictions in RC, FTZ or DAZ, or changes in mask bits.",
> +    "PublicDescription": "The number of serializing Ops retired. SSE control 
> word mispredict traps due to mispredictions in RC, FTZ or DAZ, or changes in 
> mask bits.",
> +    "UMask": "0x2"
> +  },
> +  {
> +    "EventName": "fp_retired_ser_ops.sse_bot_ret",
> +    "EventCode": "0x05",
> +    "BriefDescription": "SSE bottom-executing uOps retired.",
> +    "PublicDescription": "The number of serializing Ops retired. SSE 
> bottom-executing uOps retired.",
> +    "UMask": "0x1"
> +  }
> +]
> \ No newline at end of file
> diff --git a/tools/perf/pmu-events/arch/x86/amdfam17h/memory.json 
> b/tools/perf/pmu-events/arch/x86/amdfam17h/memory.json
> new file mode 100644
> index 000000000000..15678880f90b
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/x86/amdfam17h/memory.json
> @@ -0,0 +1,225 @@
> +[
> 
> Is "Unit Masks ORed." really the description for ls_locks.*? That looks like 
> a documentation error in the AMD manual.

Definitely.

> 
> +  {
> +    "EventName": "ls_locks.spec_lock_map_commit",
> +    "EventCode": "0x25",
> +    "BriefDescription": "Unit Masks ORed.",
> +    "PublicDescription": "Unit Masks ORed.",
> +    "UMask": "0x8"
> +  },
> +  {
> +    "EventName": "ls_locks.spec_lock",
> +    "EventCode": "0x25",
> +    "BriefDescription": "Unit Masks ORed.",
> +    "PublicDescription": "Unit Masks ORed.",
> +    "UMask": "0x4"
> +  },
> +  {
> +    "EventName": "ls_locks.non_spec_lock",
> +    "EventCode": "0x25",
> +    "BriefDescription": "Unit Masks ORed.",
> +    "PublicDescription": "Unit Masks ORed.",
> +    "UMask": "0x2"
> +  },
> +  {
> +    "EventName": "ls_locks.bus_lock",
> +    "EventCode": "0x25",
> +    "BriefDescription": "Unit Masks ORed.",
> +    "PublicDescription": "Unit Masks ORed.",
> +    "UMask": "0x1"
> +  },
> +  {
> +    "EventName": "ls_dispatch.ld_st_dispatch",
> +    "EventCode": "0x29",
> +    "BriefDescription": "Load-op-Stores.",
> +    "PublicDescription": "Counts the number of operations dispatched to the 
> LS unit. Unit Masks ADDed. Load-op-Stores.",
> +    "UMask": "0x4"
> +  },
> +  {
> +    "EventName": "ls_dispatch.store_dispatch",
> +    "EventCode": "0x29",
> +    "BriefDescription": "Counts the number of operations dispatched to the 
> LS unit. Unit Masks ADDed.",
> +    "PublicDescription": "Counts the number of operations dispatched to the 
> LS unit. Unit Masks ADDed.",
> +    "UMask": "0x2"
> +  },
> +  {
> +    "EventName": "ls_dispatch.ld_dispatch",
> +    "EventCode": "0x29",
> +    "BriefDescription": "Counts the number of operations dispatched to the 
> LS unit. Unit Masks ADDed.",
> +    "PublicDescription": "Counts the number of operations dispatched to the 
> LS unit. Unit Masks ADDed.",
> +    "UMask": "0x1"
> +  },
> +  {
> +    "EventName": "ls_stlf",
> +    "EventCode": "0x35",
> +    "BriefDescription": "Number of STLF hits."
> +  },
> +  {
> +    "EventName": "ls_dc_accesses",
> +    "EventCode": "0x40",
> +    "BriefDescription": "The number of accesses to the data cache for load 
> and store references. This may include certain microcode scratchpad accesses, 
> although these are generally rare. Each increment represents an eight-byte 
> access, although the instruction may only be accessing a portion of that. 
> This event is a speculative event."
> +  },
> 
> Shouldn't there be some variation in the description of the 
> ls_mab_alloc_pipe.* events with the different unit masks?
> 
> +  {
> +    "EventName": "ls_mab_alloc_pipe.tlb_pipe_early",
> +    "EventCode": "0x41",
> +    "BriefDescription": "MAB Allocation by Pipe.",
> +    "PublicDescription": "MAB Allocation by Pipe.",
> +    "UMask": "0x10"
> +  },
> +  {
> +    "EventName": "ls_mab_alloc_pipe.hw_pf",
> +    "EventCode": "0x41",
> +    "BriefDescription": "MAB Allocation by Pipe.",
> +    "PublicDescription": "MAB Allocation by Pipe.",
> +    "UMask": "0x8"
> +  },
> +  {
> +    "EventName": "ls_mab_alloc_pipe.tlb_pipe_late",
> +    "EventCode": "0x41",
> +    "BriefDescription": "MAB Allocation by Pipe.",
> +    "PublicDescription": "MAB Allocation by Pipe.",
> +    "UMask": "0x4"
> +  },
> +  {
> +    "EventName": "ls_mab_alloc_pipe.st_pipe",
> +    "EventCode": "0x41",
> +    "BriefDescription": "MAB Allocation by Pipe.",
> +    "PublicDescription": "MAB Allocation by Pipe.",
> +    "UMask": "0x2"
> +  },
> +  {
> +    "EventName": "ls_mab_alloc_pipe.data_pipe",
> +    "EventCode": "0x41",
> +    "BriefDescription": "MAB Allocation by Pipe.",
> +    "PublicDescription": "MAB Allocation by Pipe.",
> +    "UMask": "0x1"
> +  },
> 
> Shouldn't the descriptions of ls_l1_d_tlb_miss.* mention the different page 
> sizes that the different unit masks refer to? Also, would it be possible to 
> have an entry that counts all variations of ls_l1_d_tlb_miss?
> 
> +  {
> +    "EventName": "ls_l1_d_tlb_miss.tlb_reload1_gl2_miss",
> +    "EventCode": "0x45",
> +    "BriefDescription": "L1 DTLB Miss.",
> +    "PublicDescription": "L1 DTLB Miss.",
> +    "UMask": "0x80"
> +  },
> +  {
> +    "EventName": "ls_l1_d_tlb_miss.tlb_reload2_ml2_miss",
> +    "EventCode": "0x45",
> +    "BriefDescription": "L1 DTLB Miss.",
> +    "PublicDescription": "L1 DTLB Miss.",
> +    "UMask": "0x40"
> +  },
> +  {
> +    "EventName": "ls_l1_d_tlb_miss.tlb_reload32_kl2_miss",
> +    "EventCode": "0x45",
> +    "BriefDescription": "L1 DTLB Miss.",
> +    "PublicDescription": "L1 DTLB Miss.",
> +    "UMask": "0x20"
> +  },
> +  {
> +    "EventName": "ls_l1_d_tlb_miss.tlb_reload4_kl2_miss",
> +    "EventCode": "0x45",
> +    "BriefDescription": "L1 DTLB Miss.",
> +    "PublicDescription": "L1 DTLB Miss.",
> +    "UMask": "0x10"
> +  },
> +  {
> +    "EventName": "ls_l1_d_tlb_miss.tlb_reload1_gl2_hit",
> +    "EventCode": "0x45",
> +    "BriefDescription": "L1 DTLB Miss.",
> +    "PublicDescription": "L1 DTLB Miss.",
> +    "UMask": "0x8"
> +  },
> +  {
> +    "EventName": "ls_l1_d_tlb_miss.tlb_reload2_ml2_hit",
> +    "EventCode": "0x45",
> +    "BriefDescription": "L1 DTLB Miss.",
> +    "PublicDescription": "L1 DTLB Miss.",
> +    "UMask": "0x4"
> +  },
> +  {
> +    "EventName": "ls_l1_d_tlb_miss.tlb_reload32_kl2_hit",
> +    "EventCode": "0x45",
> +    "BriefDescription": "L1 DTLB Miss.",
> +    "PublicDescription": "L1 DTLB Miss.",
> +    "UMask": "0x2"
> +  },
> +  {
> +    "EventName": "ls_l1_d_tlb_miss.tlb_reload4_kl2_hit",
> +    "EventCode": "0x45",
> +    "BriefDescription": "L1 DTLB Miss.",
> +    "PublicDescription": "L1 DTLB Miss.",
> +    "UMask": "0x1"
> +  },
> 
> Would it be possible to have a setting for ls_tablewalker.*iside* and another 
> setting for *dside*?
> 
> +  {
> +    "EventName": "ls_tablewalker.perf_mon_tablewalk_alloc_iside1",
> +    "EventCode": "0x46",
> +    "BriefDescription": "Tablewalker allocation.",
> +    "PublicDescription": "Tablewalker allocation.",
> +    "UMask": "0x8"
> +  },
> +  {
> +    "EventName": "ls_tablewalker.perf_mon_tablewalk_alloc_iside0",
> +    "EventCode": "0x46",
> +    "BriefDescription": "Tablewalker allocation.",
> +    "PublicDescription": "Tablewalker allocation.",
> +    "UMask": "0x4"
> +  },
> +  {
> +    "EventName": "ls_tablewalker.perf_mon_tablewalk_alloc_dside1",
> +    "EventCode": "0x46",
> +    "BriefDescription": "Tablewalker allocation.",
> +    "PublicDescription": "Tablewalker allocation.",
> +    "UMask": "0x2"
> +  },
> +  {
> +    "EventName": "ls_tablewalker.perf_mon_tablewalk_alloc_dside0",
> +    "EventCode": "0x46",
> +    "BriefDescription": "Tablewalker allocation.",
> +    "PublicDescription": "Tablewalker allocation.",
> +    "UMask": "0x1"
> +  },
> +  {
> +    "EventName": "ls_misal_accesses",
> +    "EventCode": "0x47",
> +    "BriefDescription": "Misaligned loads."
> +  },
> 
> 
> The descriptions for ls_pref_instr_disp.prefetch_nta and store_prefetch_w 
> should have some differences.

Again, I will incorporate the event name into the description.

Anyway, the documentation from AMD could definitely be improved. We have a 
communication channel with the company and I can raise these missing 
descriptions with them, but I can't guarantee when they will reply.

Would it be possible to get in a version that resolves your comments, filling 
in the missing documentation to the best of my knowledge?

Thanks,
Martin

> 
> +  {
> +    "EventName": "ls_pref_instr_disp.prefetch_nta",
> +    "EventCode": "0x4b",
> +    "BriefDescription": "Software Prefetch Instructions Dispatched.",
> +    "PublicDescription": "Software Prefetch Instructions Dispatched.",
> +    "UMask": "0x4"
> +  },
> +  {
> +    "EventName": "ls_pref_instr_disp.store_prefetch_w",
> +    "EventCode": "0x4b",
> +    "BriefDescription": "Software Prefetch Instructions Dispatched.",
> +    "PublicDescription": "Software Prefetch Instructions Dispatched.",
> +    "UMask": "0x2"
> +  },
> +  {
> +    "EventName": "ls_pref_instr_disp.load_prefetch_w",
> +    "EventCode": "0x4b",
> +    "BriefDescription": "Prefetch, Prefetch_T0_T1_T2.",
> +    "PublicDescription": "Software Prefetch Instructions Dispatched. 
> Prefetch, Prefetch_T0_T1_T2.",
> +    "UMask": "0x1"
> +  },
> +  {
> +    "EventName": "ls_inef_sw_pref.mab_mch_cnt",
> +    "EventCode": "0x52",
> +    "BriefDescription": "The number of software prefetches that did not 
> fetch data outside of the processor core.",
> +    "PublicDescription": "The number of software prefetches that did not 
> fetch data outside of the processor core.",
> +    "UMask": "0x2"
> +  },
> +  {
> +    "EventName": "ls_inef_sw_pref.data_pipe_sw_pf_dc_hit",
> +    "EventCode": "0x52",
> +    "BriefDescription": "The number of software prefetches that did not 
> fetch data outside of the processor core.",
> +    "PublicDescription": "The number of software prefetches that did not 
> fetch data outside of the processor core.",
> +    "UMask": "0x1"
> +  },
> +  {
> +    "EventName": "ls_not_halted_cyc",
> +    "EventCode": "0x76",
> +    "BriefDescription": "Cycles not in Halt."
> +  }
> +]
> \ No newline at end of file
> diff --git a/tools/perf/pmu-events/arch/x86/amdfam17h/other.json 
> b/tools/perf/pmu-events/arch/x86/amdfam17h/other.json
> new file mode 100644
> index 000000000000..03fa0d97ad3d
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/x86/amdfam17h/other.json
> @@ -0,0 +1,51 @@
> +[
> +  {
> +    "EventName": "de_dis_dispatch_token_stalls0.retire_token_stall",
> +    "EventCode": "0xaf",
> +    "BriefDescription": "RETIRE Tokens unavailable.",
> +    "PublicDescription": "Cycles where a dispatch group is valid but does 
> not get dispatched due to a token stall. RETIRE Tokens unavailable.",
> +    "UMask": "0x40"
> +  },
> +  {
> +    "EventName": "de_dis_dispatch_token_stalls0.agsq_token_stall",
> +    "EventCode": "0xaf",
> +    "BriefDescription": "AGSQ Tokens unavailable.",
> +    "PublicDescription": "Cycles where a dispatch group is valid but does 
> not get dispatched due to a token stall. AGSQ Tokens unavailable.",
> +    "UMask": "0x20"
> +  },
> +  {
> +    "EventName": "de_dis_dispatch_token_stalls0.alu_token_stall",
> +    "EventCode": "0xaf",
> +    "BriefDescription": "ALU tokens total unavailable.",
> +    "PublicDescription": "Cycles where a dispatch group is valid but does 
> not get dispatched due to a token stall. ALU tokens total unavailable.",
> +    "UMask": "0x10"
> +  },
> +  {
> +    "EventName": "de_dis_dispatch_token_stalls0.alsq3_0_token_stall",
> +    "EventCode": "0xaf",
> +    "BriefDescription": "Cycles where a dispatch group is valid but does not 
> get dispatched due to a token stall.",
> +    "PublicDescription": "Cycles where a dispatch group is valid but does 
> not get dispatched due to a token stall.",
> +    "UMask": "0x8"
> +  },
> +  {
> +    "EventName": "de_dis_dispatch_token_stalls0.alsq3_token_stall",
> +    "EventCode": "0xaf",
> +    "BriefDescription": "ALSQ 3 Tokens unavailable.",
> +    "PublicDescription": "Cycles where a dispatch group is valid but does 
> not get dispatched due to a token stall. ALSQ 3 Tokens unavailable.",
> +    "UMask": "0x4"
> +  },
> +  {
> +    "EventName": "de_dis_dispatch_token_stalls0.alsq2_token_stall",
> +    "EventCode": "0xaf",
> +    "BriefDescription": "ALSQ 2 Tokens unavailable.",
> +    "PublicDescription": "Cycles where a dispatch group is valid but does 
> not get dispatched due to a token stall. ALSQ 2 Tokens unavailable.",
> +    "UMask": "0x2"
> +  },
> +  {
> +    "EventName": "de_dis_dispatch_token_stalls0.alsq1_token_stall",
> +    "EventCode": "0xaf",
> +    "BriefDescription": "ALSQ 1 Tokens unavailable.",
> +    "PublicDescription": "Cycles where a dispatch group is valid but does 
> not get dispatched due to a token stall. ALSQ 1 Tokens unavailable.",
> +    "UMask": "0x1"
> +  }
> +]
> \ No newline at end of file
> diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv 
> b/tools/perf/pmu-events/arch/x86/mapfile.csv
> index 7e3cce3bcf3b..4e0973c08a52 100644
> --- a/tools/perf/pmu-events/arch/x86/mapfile.csv
> +++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
> @@ -32,3 +32,4 @@ GenuineIntel-6-2C,v2,westmereep-dp,core
>  GenuineIntel-6-25,v2,westmereep-sp,core
>  GenuineIntel-6-2F,v2,westmereex,core
>  GenuineIntel-6-55,v1,skylakex,core
> +AuthenticAMD-23-[[:xdigit:]]+,v1,amdfam17h,core
> 
> 
>
