Hi David,

On 14.07.2020 9:23, David E. Box wrote:
> PMT Telemetry is a capability of the Intel Platform Monitoring Technology.
> The Telemetry capability provides access to device telemetry metrics that
> provide hardware performance data to users from continuous, memory mapped,
> read-only register spaces.
> 
> Register mappings are not provided by the driver. Instead, a GUID is read
> from a header for each endpoint. The GUID identifies the device and is to
> be used with an XML, provided by the vendor, to discover the available set
> of metrics and their register mapping.  This allows firmware updates to
> modify the register space without needing to update the driver every time
> with new mappings. Firmware writes a new GUID in this case to specify the
> new mapping.  Software tools with access to the associated XML file can
> then interpret the changes.
> 
> This module manages access to all PMT Telemetry endpoints on a system,
> independent of the device exporting them. It creates a pmt_telemetry class
> to manage the devices. For each telemetry endpoint, sysfs files provide
> GUID and size information as well as a pointer to the parent device the
> telemetry came from. Software may discover the association between
> endpoints and devices by iterating through the list in sysfs, or by looking
> for the existence of the class folder under the device of interest.  A
> device node of the same name allows software to then map the telemetry
> space for direct access.
> 
> This patch also creates an pci device id list for early telemetry hardware
> that requires workarounds for known issues.
> 
> Signed-off-by: David E. Box <david.e....@linux.intel.com>
> Signed-off-by: Alexander Duyck <alexander.h.du...@linux.intel.com>
> ---
>  .../ABI/testing/sysfs-class-pmt_telemetry     |  46 ++
>  MAINTAINERS                                   |   1 +
>  drivers/platform/x86/Kconfig                  |  10 +
>  drivers/platform/x86/Makefile                 |   1 +
>  drivers/platform/x86/intel_pmt_telemetry.c    | 454 ++++++++++++++++++
>  5 files changed, 512 insertions(+)
>  create mode 100644 Documentation/ABI/testing/sysfs-class-pmt_telemetry
>  create mode 100644 drivers/platform/x86/intel_pmt_telemetry.c
> 
> diff --git a/Documentation/ABI/testing/sysfs-class-pmt_telemetry 
> b/Documentation/ABI/testing/sysfs-class-pmt_telemetry
> new file mode 100644
> index 000000000000..381924549ecb
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-class-pmt_telemetry
> @@ -0,0 +1,46 @@
> +What:                /sys/class/pmt_telemetry/
> +Date:                July 2020
> +KernelVersion:       5.9
> +Contact:     David Box <david.e....@linux.intel.com>
> +Description:
> +             The pmt_telemetry/ class directory contains information for
> +             devices that expose hardware telemetry using Intel Platform
> +             Monitoring Technology (PMT)
> +
> +What:                /sys/class/pmt_telemetry/telem<x>
> +Date:                July 2020
> +KernelVersion:       5.9
> +Contact:     David Box <david.e....@linux.intel.com>
> +Description:
> +             The telem<x> directory contains files describing an instance of
> +             a PMT telemetry device that exposes hardware telemetry. Each
> +             telem<x> directory has an associated /dev/telem<x> node. This
> +             node may be opened and mapped to access the telemetry space of
> +             the device. The register layout of the telemetry space is
> +             determined from an XML file that matches the pci device id and
> +             guid for the device.
> +
> +What:                /sys/class/pmt_telemetry/telem<x>/guid
> +Date:                July 2020
> +KernelVersion:       5.9
> +Contact:     David Box <david.e....@linux.intel.com>
> +Description:
> +             (RO) The guid for this telemetry device. The guid identifies
> +             the version of the XML file for the parent device that is to
> +             be used to get the register layout.
> +
> +What:                /sys/class/pmt_telemetry/telem<x>/size
> +Date:                July 2020
> +KernelVersion:       5.9
> +Contact:     David Box <david.e....@linux.intel.com>
> +Description:
> +             (RO) The size of telemetry region in bytes that corresponds to
> +             the mapping size for the /dev/telem<x> device node.
> +
> +What:                /sys/class/pmt_telemetry/telem<x>/offset
> +Date:                July 2020
> +KernelVersion:       5.9
> +Contact:     David Box <david.e....@linux.intel.com>
> +Description:
> +             (RO) The offset of telemetry region in bytes that corresponds to
> +             the mapping for the /dev/telem<x> device node.
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 2e42bf0c41ab..ebc145894abd 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8849,6 +8849,7 @@ INTEL PMT DRIVER
>  M:   "David E. Box" <david.e....@linux.intel.com>
>  S:   Maintained
>  F:   drivers/mfd/intel_pmt.c
> +F:   drivers/platform/x86/intel_pmt_*
>  
>  INTEL PRO/WIRELESS 2100, 2200BG, 2915ABG NETWORK CONNECTION SUPPORT
>  M:   Stanislav Yakovlev <stas.yakov...@gmail.com>
> diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig
> index 0581a54cf562..5e1f7ce6e69f 100644
> --- a/drivers/platform/x86/Kconfig
> +++ b/drivers/platform/x86/Kconfig
> @@ -1396,6 +1396,16 @@ config INTEL_TELEMETRY
>         directly via debugfs files. Various tools may use
>         this interface for SoC state monitoring.
>  
> +config INTEL_PMT_TELEMETRY
> +     tristate "Intel Platform Monitoring Technology (PMT) Telemetry driver"
> +     help
> +      The Intel Platform Monitory Technology (PMT) Telemetry driver provides
> +      access to hardware telemetry metrics on devices that support the
> +      feature.
> +
> +      For more information, see
> +      <file:Documentation/ABI/testing/sysfs-class-intel_pmt_telem>
> +
>  endif # X86_PLATFORM_DEVICES
>  
>  config PMC_ATOM
> diff --git a/drivers/platform/x86/Makefile b/drivers/platform/x86/Makefile
> index 2b85852a1a87..95cd3d0be17f 100644
> --- a/drivers/platform/x86/Makefile
> +++ b/drivers/platform/x86/Makefile
> @@ -139,6 +139,7 @@ obj-$(CONFIG_INTEL_MFLD_THERMAL)  += intel_mid_thermal.o
>  obj-$(CONFIG_INTEL_MID_POWER_BUTTON) += intel_mid_powerbtn.o
>  obj-$(CONFIG_INTEL_MRFLD_PWRBTN)     += intel_mrfld_pwrbtn.o
>  obj-$(CONFIG_INTEL_PMC_CORE)         += intel_pmc_core.o 
> intel_pmc_core_pltdrv.o
> +obj-$(CONFIG_INTEL_PMT_TELEMETRY)    += intel_pmt_telemetry.o
>  obj-$(CONFIG_INTEL_PUNIT_IPC)                += intel_punit_ipc.o
>  obj-$(CONFIG_INTEL_SCU_IPC)          += intel_scu_ipc.o
>  obj-$(CONFIG_INTEL_SCU_PCI)          += intel_scu_pcidrv.o
> diff --git a/drivers/platform/x86/intel_pmt_telemetry.c 
> b/drivers/platform/x86/intel_pmt_telemetry.c
> new file mode 100644
> index 000000000000..e1856fc8c209
> --- /dev/null
> +++ b/drivers/platform/x86/intel_pmt_telemetry.c
> @@ -0,0 +1,454 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Intel Platform Monitory Technology Telemetry driver
> + *
> + * Copyright (c) 2020, Intel Corporation.
> + * All Rights Reserved.
> + *
> + * Author: "David E. Box" <david.e....@linux.intel.com>
> + */
> +
> +#include <linux/bits.h>
> +#include <linux/cdev.h>
> +#include <linux/io-64-nonatomic-lo-hi.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/pci.h>
> +#include <linux/platform_device.h>
> +#include <linux/slab.h>
> +#include <linux/types.h>
> +#include <linux/uaccess.h>
> +#include <linux/xarray.h>
> +
> +#define TELEM_DEV_NAME               "pmt_telemetry"
> +
> +/* Telemetry access types */
> +#define TELEM_ACCESS_FUTURE  1
> +#define TELEM_ACCESS_BARID   2
> +#define TELEM_ACCESS_LOCAL   3
> +
> +#define TELEM_GUID_OFFSET    0x4
> +#define TELEM_BASE_OFFSET    0x8
> +#define TELEM_TBIR_MASK              GENMASK(2, 0)
> +#define TELEM_ACCESS(v)              ((v) & GENMASK(3, 0))
> +#define TELEM_TYPE(v)                (((v) & GENMASK(7, 4)) >> 4)
> +/* size is in bytes */
> +#define TELEM_SIZE(v)                (((v) & GENMASK(27, 12)) >> 10)
> +
> +#define TELEM_XA_START               0
> +#define TELEM_XA_MAX         INT_MAX
> +#define TELEM_XA_LIMIT               XA_LIMIT(TELEM_XA_START, TELEM_XA_MAX)
> +
> +/* Used by client hardware to identify a fixed telemetry entry*/
> +#define TELEM_CLIENT_FIXED_BLOCK_GUID        0x10000000
> +
> +static DEFINE_XARRAY_ALLOC(telem_array);
> +
> +struct pmt_telem_priv;
> +
> +struct telem_header {
> +     u8      access_type;
> +     u8      telem_type;
> +     u16     size;
> +     u32     guid;
> +     u32     base_offset;
> +     u8      tbir;
> +};
> +
> +struct pmt_telem_entry {
> +     struct pmt_telem_priv           *priv;
> +     struct telem_header             header;
> +     struct resource                 *header_res;
> +     unsigned long                   base_addr;
> +     void __iomem                    *disc_table;
> +     struct cdev                     cdev;
> +     dev_t                           devt;
> +     int                             devid;
> +};
> +
> +struct pmt_telem_priv {
> +     struct pmt_telem_entry          *entry;
> +     int                             num_entries;
> +     struct device                   *dev;
> +};
> +
> +/*
> + * devfs
> + */
> +static int pmt_telem_open(struct inode *inode, struct file *filp)
> +{
> +     struct pmt_telem_priv *priv;
> +     struct pmt_telem_entry *entry;
> +     struct pci_driver *pci_drv;
> +     struct pci_dev *pci_dev;
> +
> +     if (!capable(CAP_SYS_ADMIN))

Thanks for supplying these patches.
Are there any reasons not to expose this feature to CAP_PERFMON privileged
processes too that currently have access to performance monitoring features
of the kernel without root/CAP_SYS_ADMIN credentials? This could be done by
pefmon_capable() function call starting from v5.8+.

Thanks,
Alexei

> +             return -EPERM;
> +
> +     entry = container_of(inode->i_cdev, struct pmt_telem_entry, cdev);
> +     priv = entry->priv;
> +     pci_dev = to_pci_dev(priv->dev->parent);
> +
> +     pci_drv = pci_dev_driver(pci_dev);
> +     if (!pci_drv)
> +             return -ENODEV;
> +
> +     filp->private_data = entry;
> +     get_device(&pci_dev->dev);
> +
> +     if (!try_module_get(pci_drv->driver.owner)) {
> +             put_device(&pci_dev->dev);
> +             return -ENODEV;
> +     }
> +
> +     return 0;
> +}
> +
> +static int pmt_telem_release(struct inode *inode, struct file *filp)
> +{
> +     struct pmt_telem_entry *entry = filp->private_data;
> +     struct pci_dev *pci_dev = to_pci_dev(entry->priv->dev->parent);
> +     struct pci_driver *pci_drv = pci_dev_driver(pci_dev);
> +
> +     put_device(&pci_dev->dev);
> +     module_put(pci_drv->driver.owner);
> +
> +     return 0;
> +}
> +
> +static int pmt_telem_mmap(struct file *filp, struct vm_area_struct *vma)
> +{
> +     struct pmt_telem_entry *entry = filp->private_data;
> +     struct pmt_telem_priv *priv;
> +     unsigned long vsize = vma->vm_end - vma->vm_start;
> +     unsigned long phys = entry->base_addr;
> +     unsigned long pfn = PFN_DOWN(phys);
> +     unsigned long psize;
> +
> +     priv = entry->priv;
> +     psize = (PFN_UP(entry->base_addr + entry->header.size) - pfn) *
> +             PAGE_SIZE;
> +     if (vsize > psize) {
> +             dev_err(priv->dev, "Requested mmap size is too large\n");
> +             return -EINVAL;
> +     }
> +
> +     if ((vma->vm_flags & VM_WRITE) || (vma->vm_flags & VM_MAYWRITE))
> +             return -EPERM;
> +
> +     vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> +
> +     if (io_remap_pfn_range(vma, vma->vm_start, pfn, vsize,
> +                            vma->vm_page_prot))
> +             return -EINVAL;
> +
> +     return 0;
> +}
> +
> +static const struct file_operations pmt_telem_fops = {
> +     .owner =        THIS_MODULE,
> +     .open =         pmt_telem_open,
> +     .mmap =         pmt_telem_mmap,
> +     .release =      pmt_telem_release,
> +};
> +
> +/*
> + * sysfs
> + */
> +static ssize_t guid_show(struct device *dev, struct device_attribute *attr,
> +                      char *buf)
> +{
> +     struct pmt_telem_entry *entry = dev_get_drvdata(dev);
> +
> +     return sprintf(buf, "0x%x\n", entry->header.guid);
> +}
> +static DEVICE_ATTR_RO(guid);
> +
> +static ssize_t size_show(struct device *dev, struct device_attribute *attr,
> +                      char *buf)
> +{
> +     struct pmt_telem_entry *entry = dev_get_drvdata(dev);
> +
> +     /* Display buffer size in bytes */
> +     return sprintf(buf, "%u\n", entry->header.size);
> +}
> +static DEVICE_ATTR_RO(size);
> +
> +static ssize_t offset_show(struct device *dev, struct device_attribute *attr,
> +                        char *buf)
> +{
> +     struct pmt_telem_entry *entry = dev_get_drvdata(dev);
> +
> +     /* Display buffer offset in bytes */
> +     return sprintf(buf, "%lu\n", offset_in_page(entry->base_addr));
> +}
> +static DEVICE_ATTR_RO(offset);
> +
> +static struct attribute *pmt_telem_attrs[] = {
> +     &dev_attr_guid.attr,
> +     &dev_attr_size.attr,
> +     &dev_attr_offset.attr,
> +     NULL
> +};
> +ATTRIBUTE_GROUPS(pmt_telem);
> +
> +struct class pmt_telem_class = {
> +     .owner  = THIS_MODULE,
> +     .name   = "pmt_telemetry",
> +     .dev_groups = pmt_telem_groups,
> +};
> +
> +/*
> + * driver initialization
> + */
> +static const struct pci_device_id pmt_telem_early_client_pci_ids[] = {
> +     { PCI_VDEVICE(INTEL, 0x9a0d) }, /* TGL */
> +     { }
> +};
> +
> +static bool pmt_telem_is_early_client_hw(struct device *dev)
> +{
> +     struct pci_dev *parent;
> +
> +     parent = to_pci_dev(dev->parent);
> +
> +     return !!pci_match_id(pmt_telem_early_client_pci_ids, parent);
> +}
> +
> +static int pmt_telem_create_dev(struct pmt_telem_priv *priv,
> +                             struct pmt_telem_entry *entry)
> +{
> +     struct pci_dev *pci_dev;
> +     struct device *dev;
> +     int ret;
> +
> +     cdev_init(&entry->cdev, &pmt_telem_fops);
> +     ret = cdev_add(&entry->cdev, entry->devt, 1);
> +     if (ret) {
> +             dev_err(priv->dev, "Could not add char dev\n");
> +             return ret;
> +     }
> +
> +     pci_dev = to_pci_dev(priv->dev->parent);
> +     dev = device_create(&pmt_telem_class, &pci_dev->dev, entry->devt,
> +                         entry, "telem%d", entry->devid);
> +     if (IS_ERR(dev)) {
> +             dev_err(priv->dev, "Could not create device node\n");
> +             cdev_del(&entry->cdev);
> +     }
> +
> +     return PTR_ERR_OR_ZERO(dev);
> +}
> +
> +static void pmt_telem_populate_header(void __iomem *disc_offset,
> +                                   struct telem_header *header)
> +{
> +     header->access_type = TELEM_ACCESS(readb(disc_offset));
> +     header->telem_type = TELEM_TYPE(readb(disc_offset));
> +     header->size = TELEM_SIZE(readl(disc_offset));
> +     header->guid = readl(disc_offset + TELEM_GUID_OFFSET);
> +     header->base_offset = readl(disc_offset + TELEM_BASE_OFFSET);
> +
> +     /*
> +      * For non-local access types the lower 3 bits of base offset
> +      * contains the index of the base address register where the
> +      * telemetry can be found.
> +      */
> +     header->tbir = header->base_offset & TELEM_TBIR_MASK;
> +     header->base_offset ^= header->tbir;
> +}
> +
> +static int pmt_telem_add_entry(struct pmt_telem_priv *priv,
> +                            struct pmt_telem_entry *entry)
> +{
> +     struct resource *res = entry->header_res;
> +     struct pci_dev *pci_dev = to_pci_dev(priv->dev->parent);
> +     int ret;
> +
> +     pmt_telem_populate_header(entry->disc_table, &entry->header);
> +
> +     /* Local access and BARID only for now */
> +     switch (entry->header.access_type) {
> +     case TELEM_ACCESS_LOCAL:
> +             if (entry->header.tbir) {
> +                     dev_err(priv->dev,
> +                             "Unsupported BAR index %d for access type %d\n",
> +                             entry->header.tbir, entry->header.access_type);
> +                     return -EINVAL;
> +             }
> +
> +             /*
> +              * For access_type LOCAL, the base address is as follows:
> +              * base address = header address + header length + base offset
> +              */
> +             entry->base_addr = res->start + resource_size(res) +
> +                                entry->header.base_offset;
> +             break;
> +
> +     case TELEM_ACCESS_BARID:
> +             entry->base_addr = pci_dev->resource[entry->header.tbir].start +
> +                                entry->header.base_offset;
> +             break;
> +
> +     default:
> +             dev_err(priv->dev, "Unsupported access type %d\n",
> +                     entry->header.access_type);
> +             return -EINVAL;
> +     }
> +
> +     ret = alloc_chrdev_region(&entry->devt, 0, 1, TELEM_DEV_NAME);
> +     if (ret) {
> +             dev_err(priv->dev,
> +                     "PMT telemetry chrdev_region error: %d\n", ret);
> +             return ret;
> +     }
> +
> +     ret = xa_alloc(&telem_array, &entry->devid, entry, TELEM_XA_LIMIT,
> +                    GFP_KERNEL);
> +     if (ret)
> +             goto fail_xa_alloc;
> +
> +     ret = pmt_telem_create_dev(priv, entry);
> +     if (ret)
> +             goto fail_create_dev;
> +
> +     entry->priv = priv;
> +     priv->num_entries++;
> +     return 0;
> +
> +fail_create_dev:
> +     xa_erase(&telem_array, entry->devid);
> +fail_xa_alloc:
> +     unregister_chrdev_region(entry->devt, 1);
> +
> +     return ret;
> +}
> +
> +static bool pmt_telem_region_overlaps(struct platform_device *pdev,
> +                                   void __iomem *disc_table)
> +{
> +     u32 guid;
> +
> +     guid = readl(disc_table + TELEM_GUID_OFFSET);
> +
> +     return guid == TELEM_CLIENT_FIXED_BLOCK_GUID;
> +}
> +
> +static void pmt_telem_remove_entries(struct pmt_telem_priv *priv)
> +{
> +     int i;
> +
> +     for (i = 0; i < priv->num_entries; i++) {
> +             device_destroy(&pmt_telem_class, priv->entry[i].devt);
> +             cdev_del(&priv->entry[i].cdev);
> +             xa_erase(&telem_array, priv->entry[i].devid);
> +             unregister_chrdev_region(priv->entry[i].devt, 1);
> +     }
> +}
> +
> +static int pmt_telem_probe(struct platform_device *pdev)
> +{
> +     struct pmt_telem_priv *priv;
> +     struct pmt_telem_entry *entry;
> +     bool early_hw;
> +     int i;
> +
> +     priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
> +     if (!priv)
> +             return -ENOMEM;
> +
> +     platform_set_drvdata(pdev, priv);
> +     priv->dev = &pdev->dev;
> +
> +     priv->entry = devm_kcalloc(&pdev->dev, pdev->num_resources,
> +                                sizeof(struct pmt_telem_entry), GFP_KERNEL);
> +     if (!priv->entry)
> +             return -ENOMEM;
> +
> +     if (pmt_telem_is_early_client_hw(&pdev->dev))
> +             early_hw = true;
> +
> +     for (i = 0, entry = priv->entry; i < pdev->num_resources;
> +          i++, entry++) {
> +             int ret;
> +
> +             entry->header_res = platform_get_resource(pdev, IORESOURCE_MEM,
> +                                                       i);
> +             if (!entry->header_res) {
> +                     pmt_telem_remove_entries(priv);
> +                     return -ENODEV;
> +             }
> +
> +             entry->disc_table = devm_platform_ioremap_resource(pdev, i);
> +             if (IS_ERR(entry->disc_table)) {
> +                     pmt_telem_remove_entries(priv);
> +                     return PTR_ERR(entry->disc_table);
> +             }
> +
> +             if (pmt_telem_region_overlaps(pdev, entry->disc_table) &&
> +                 early_hw)
> +                     continue;
> +
> +             ret = pmt_telem_add_entry(priv, entry);
> +             if (ret) {
> +                     pmt_telem_remove_entries(priv);
> +                     return ret;
> +             }
> +     }
> +
> +     return 0;
> +}
> +
> +static int pmt_telem_remove(struct platform_device *pdev)
> +{
> +     struct pmt_telem_priv *priv = platform_get_drvdata(pdev);
> +
> +     pmt_telem_remove_entries(priv);
> +
> +     return 0;
> +}
> +
> +static const struct platform_device_id pmt_telem_table[] = {
> +     {
> +             .name = "pmt_telemetry",
> +     },
> +     {}
> +};
> +MODULE_DEVICE_TABLE(platform, pmt_telem_table);
> +
> +static struct platform_driver pmt_telem_driver = {
> +     .driver = {
> +             .name   = TELEM_DEV_NAME,
> +     },
> +     .probe  = pmt_telem_probe,
> +     .remove = pmt_telem_remove,
> +     .id_table = pmt_telem_table,
> +};
> +
> +static int __init pmt_telem_init(void)
> +{
> +     int ret = class_register(&pmt_telem_class);
> +
> +     if (ret)
> +             return ret;
> +
> +     ret = platform_driver_register(&pmt_telem_driver);
> +     if (ret)
> +             class_unregister(&pmt_telem_class);
> +
> +     return ret;
> +}
> +module_init(pmt_telem_init);
> +
> +static void __exit pmt_telem_exit(void)
> +{
> +     platform_driver_unregister(&pmt_telem_driver);
> +     class_unregister(&pmt_telem_class);
> +     xa_destroy(&telem_array);
> +}
> +module_exit(pmt_telem_exit);
> +
> +MODULE_AUTHOR("David E. Box <david.e....@linux.intel.com>");
> +MODULE_DESCRIPTION("Intel PMT Telemetry driver");
> +MODULE_ALIAS("platform:" TELEM_DEV_NAME);
> +MODULE_LICENSE("GPL v2");
> 

Reply via email to