[PATCH] perf diff: bug fix, donot overwrite build id in dso__load

2016-09-13 Thread kan . liang
From: Kan Liang This patch fixes a perf diff regression issue which was introduced by: commit 5baecbcd9c9a ("perf symbols: we can now read separate debug-info files based on a build ID") The binary name could be the same when perf diff compares different binaries. Build id is used to distingui

[RFC V2 PATCH 08/25] net/netpolicy: introduce NET policy object

2016-08-04 Thread kan . liang
From: Kan Liang This patch introduces the concept of NET policy object and policy object list. The NET policy object is the instance of CPU/queue mapping. The object can be shared between different tasks/sockets. So besides CPU and queue information, the object also maintains a reference

[RFC V2 PATCH 25/25] Documentation/networking: Document NET policy

2016-08-04 Thread kan . liang
From: Kan Liang Signed-off-by: Kan Liang --- Documentation/networking/netpolicy.txt | 157 + 1 file changed, 157 insertions(+) create mode 100644 Documentation/networking/netpolicy.txt diff --git a/Documentation/networking/netpolicy.txt b/Documentation

[RFC V2 PATCH 22/25] net/netpolicy: fast path for finding the queues

2016-08-04 Thread kan . liang
From: Kan Liang Current implementation searches the hash table to get assigned object for each transmit/receive packet. It's not necessary, because the assigned object usually remains unchanged. This patch stores the assigned queue to speed up the searching process. But under certain situa

[RFC V2 PATCH 06/25] net/netpolicy: set and remove IRQ affinity

2016-08-04 Thread kan . liang
From: Kan Liang This patch introduces functions to set and remove IRQ affinity according to cpu and queue mapping. The functions will not record the previous affinity status. After a set/remove cycle, it will set the affinity on all online CPUs with IRQ balance enabled. Signed-off-by: Kan

[RFC V2 PATCH 21/25] net/netpolicy: set per task policy by proc

2016-08-04 Thread kan . liang
From: Kan Liang Users may not want to change the source code to add per task net policy support. Or they may want to change a running task's net policy. prctl does not work for either case. This patch adds an interface in /proc, which can be used to set and retrieve policy of already ru

[RFC V2 PATCH 09/25] net/netpolicy: set NET policy by policy name

2016-08-04 Thread kan . liang
From: Kan Liang Users can write a policy name to /proc/net/netpolicy/$DEV/policy to enable net policy for a specific device. When the policy is enabled, the subsystem automatically disables IRQ balance and sets IRQ affinity. The object list is also generated accordingly. It is device driver'

[RFC V2 PATCH 18/25] net/netpolicy: set Tx queues according to policy

2016-08-04 Thread kan . liang
From: Kan Liang When the device tries to transmit a packet, netdev_pick_tx is called to find the available Tx queues. If the net policy is applied, it picks up the assigned Tx queue from net policy subsystem, and redirects the traffic to the assigned queue. Signed-off-by: Kan Liang --- include

[RFC V2 PATCH 24/25] net/netpolicy: limit the total record number

2016-08-04 Thread kan . liang
From: Kan Liang NET policy cannot fulfill users' requests without limit, because of the security consideration and device limitation. For security consideration, the attacker may fake millions of per task/socket requests to crash the system. For device limitation, the flow director rules number is

[RFC V2 PATCH 20/25] net/netpolicy: introduce per task net policy

2016-08-04 Thread kan . liang
From: Kan Liang Usually, an application as a whole has a specific requirement. Applying the net policy to all sockets one by one in the application is too complex. This patch introduces per task net policy to address this case. Once the per task net policy is applied, all the sockets in the

[RFC V2 PATCH 10/25] net/netpolicy: add three new NET policies

2016-08-04 Thread kan . liang
From: Kan Liang Introduce three NET policies CPU policy: configure for higher throughput and lower CPU% (power saving). BULK policy: configure for highest throughput. LATENCY policy: configure for lowest latency. Signed-off-by: Kan Liang --- include/linux/netpolicy.h | 3 +++ net/core

[RFC V2 PATCH 19/25] net/netpolicy: set Rx queues according to policy

2016-08-04 Thread kan . liang
From: Kan Liang For setting Rx queues, this patch configures Rx network flow classification rules to redirect the packets to the assigned queue. Since we may not get all the information required for the rule until the first packet arrives, it will add the rule after recvmsg. Also, to avoid

[RFC V2 PATCH 15/25] net/netpolicy: implement netpolicy register

2016-08-04 Thread kan . liang
From: Kan Liang The socket/task can only benefit when it registers itself with a specific policy. If it's the first time to register, a record will be created and inserted into RCU hash table. The record includes ptr, policy and object information. ptr is the socket/task's pointe

[RFC V2 PATCH 13/25] net/netpolicy: support CPU hotplug

2016-08-04 Thread kan . liang
From: Kan Liang For CPU hotplug, the NET policy subsystem will rebuild the sys map and object list. Signed-off-by: Kan Liang --- net/core/netpolicy.c | 76 1 file changed, 76 insertions(+) diff --git a/net/core/netpolicy.c b/net/core

[RFC V2 PATCH 07/25] net/netpolicy: enable and disable NET policy

2016-08-04 Thread kan . liang
From: Kan Liang This patch introduces functions to enable and disable NET policy. For enabling, it collects device and CPU information, sets up CPU/queue mapping, and sets IRQ affinity accordingly. For disabling, it removes the IRQ affinity and mapping information. np_lock should protect the

[RFC V2 PATCH 16/25] net/netpolicy: introduce per socket netpolicy

2016-08-04 Thread kan . liang
From: Kan Liang The network socket is the most basic unit that controls the network traffic. This patch introduces a new socket option SO_NETPOLICY to set/get net policy for a socket, so that the application can set its own policy on the socket to improve the network performance. Per socket net policy

[RFC V2 PATCH 05/25] net/netpolicy: create CPU and queue mapping

2016-08-04 Thread kan . liang
From: Kan Liang Current implementation forces CPU and queue 1:1 mapping. This patch introduces the function netpolicy_update_sys_map to create this mapping. The result is stored in netpolicy_sys_info. If the CPU count and queue count are different, the remaining CPUs/queues are not used for now
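The 1:1 pairing described in this snippet can be sketched roughly as follows. This is only an illustration, not the code from the patch: the struct cpu_queue_pair type and the build_one_to_one_map function are hypothetical names, and the sketch assumes CPUs and queues are simply numbered from 0.

```c
#include <stddef.h>

/* Hypothetical entry type for illustration; not the layout of the
 * kernel's netpolicy_sys_info. */
struct cpu_queue_pair {
    unsigned int cpu;
    unsigned int queue;
};

/*
 * Pair CPU i with queue i for every i below min(nr_cpus, nr_queues),
 * capped at the capacity of the output array. Any CPUs or queues beyond
 * that count are left unmapped, matching the "not used for now" remark.
 * Returns the number of pairs written.
 */
size_t build_one_to_one_map(struct cpu_queue_pair *map, size_t map_len,
                            unsigned int nr_cpus, unsigned int nr_queues)
{
    size_t n = nr_cpus < nr_queues ? nr_cpus : nr_queues;
    size_t i;

    if (n > map_len)
        n = map_len;
    for (i = 0; i < n; i++) {
        map[i].cpu = (unsigned int)i;
        map[i].queue = (unsigned int)i;
    }
    return n;
}
```

With 4 CPUs and 6 queues, for example, this yields 4 pairs and leaves queues 4 and 5 unused.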

[RFC V2 PATCH 17/25] net/netpolicy: introduce netpolicy_pick_queue

2016-08-04 Thread kan . liang
From: Kan Liang To achieve better network performance, the key step is to distribute the packets to dedicated queues according to policy and system run time status. This patch provides an interface which can return the proper dedicated queue for socket/task. Then the packets of the socket/task

[RFC V2 PATCH 14/25] net/netpolicy: handle channel changes

2016-08-04 Thread kan . liang
From: Kan Liang Users can use ethtool to set the channel number. This patch handles the channel changes by rebuilding the object list. Signed-off-by: Kan Liang --- include/linux/netpolicy.h | 8 net/core/ethtool.c| 8 +++- net/core/netpolicy.c | 1 + 3 files changed

[RFC V2 PATCH 00/25] Kernel NET policy

2016-08-04 Thread kan . liang
From: Kan Liang It is a big challenge to get good network performance. First, the network performance is not good with default system settings. Second, it is too difficult to do automatic tuning for all possible workloads, since workloads have different requirements. Some workloads may want high

[RFC V2 PATCH 12/25] net/netpolicy: NET device hotplug

2016-08-04 Thread kan . liang
From: Kan Liang Support NET device up/down/namechange in the NET policy code. Signed-off-by: Kan Liang --- net/core/netpolicy.c | 66 +--- 1 file changed, 58 insertions(+), 8 deletions(-) diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c

[RFC V2 PATCH 11/25] net/netpolicy: add MIX policy

2016-08-04 Thread kan . liang
From: Kan Liang MIX policy is a combination of other policies. It allows different queues to have different policies. If MIX policy is applied, /proc/net/netpolicy/$DEV/policy shows per queue policy. Usually, workloads require either high throughput or low latency. So for current implementation, MIX

[RFC V2 PATCH 01/25] net: introduce NET policy

2016-08-04 Thread kan . liang
From: Kan Liang This patch introduces NET policy subsystem. If proc is supported in the system, it creates netpolicy node in proc system. Signed-off-by: Kan Liang --- include/linux/netdevice.h | 7 +++ include/net/net_namespace.h | 3 ++ net/Kconfig | 7 +++ net/core

[RFC V2 PATCH 02/25] net/netpolicy: init NET policy

2016-08-04 Thread kan . liang
From: Kan Liang This patch tries to initialize NET policy for all the devices in the system. However, not all device drivers have NET policy support. For drivers that do not have NET policy support, the node will not be shown in /proc/net/netpolicy/. The device driver who has NET policy

[RFC V2 PATCH 04/25] net/netpolicy: get CPU information

2016-08-04 Thread kan . liang
From: Kan Liang Net policy also needs to know CPU information. Currently, online CPU number is enough. Signed-off-by: Kan Liang --- net/core/netpolicy.c | 5 + 1 file changed, 5 insertions(+) diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c index 7c34c8a..075aaca 100644 --- a/net

[RFC V2 PATCH 03/25] net/netpolicy: get device queue irq information

2016-08-04 Thread kan . liang
From: Kan Liang Net policy needs to know device information. Currently, it's enough to only get irq information of rx and tx queues. This patch introduces ndo ops to do so, not ethtool ops. Because there are already several ways to get irq information in userspace. It's not necessary

[RFC V2 PATCH 23/25] net/netpolicy: optimize for queue pair

2016-08-04 Thread kan . liang
From: Kan Liang Some drivers, like the i40e driver, do not support separate Tx and Rx queues as channels. Using Rx queue to stand for the channels, if queue_pair is set by driver. Signed-off-by: Kan Liang --- include/linux/netpolicy.h | 1 + net/core/netpolicy.c | 3 +++ 2 files changed, 4

[RFC V2 PATCH 00/25] Kernel NET policy

2016-08-04 Thread kan . liang
From: Kan Liang (re-send to correct system time issue. Sorry for any inconvenience.) It is a big challenge to get good network performance. First, the network performance is not good with default system settings. Second, it is too difficult to do automatic tuning for all possible workloads

[PATCH V4] perf: Add PERF_SAMPLE_PHYS_ADDR

2017-08-17 Thread kan . liang
From: Kan Liang For understanding how the workload maps to memory channels and hardware behavior, it's very important to collect address maps with physical addresses. For example, 3D XPoint access can only be found by filtering the physical address. However, perf doesn't collect physic

[PATCH V5] perf: Add PERF_SAMPLE_PHYS_ADDR

2017-08-17 Thread kan . liang
From: Kan Liang For understanding how the workload maps to memory channels and hardware behavior, it's very important to collect address maps with physical addresses. For example, 3D XPoint access can only be found by filtering the physical address. However, perf doesn't collect physic

[PATCH] kernel/watchdog: fix spurious hard lockups

2017-06-20 Thread kan . liang
From: Kan Liang Some users reported spurious NMI watchdog timeouts. We now have more and more systems where the Turbo range is wide enough that the NMI watchdog expires faster than the soft watchdog timer that updates the interrupt tick the NMI watchdog relies on. This problem was originally

[PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-06-21 Thread kan . liang
From: Kan Liang Some users reported spurious NMI watchdog timeouts. We now have more and more systems where the Turbo range is wide enough that the NMI watchdog expires faster than the soft watchdog timer that updates the interrupt tick the NMI watchdog relies on. This problem was originally

[PATCH] perf script: add script to profile and resolve physical mem type

2017-10-16 Thread kan . liang
From: Kan Liang There could be different types of memory in the system. E.g., normal System Memory, Persistent Memory. To understand how the workload maps to those memories, it's important to know the I/O statistics on different types of memory. Perf can collect address maps with phy

[PATCH V2 0/2] measure SMI cost (user)

2017-05-26 Thread kan . liang
From: Kan Liang Currently, there is no way to measure the time cost in System management mode (SMM) by perf. Intel perfmon supports FREEZE_WHILE_SMM bit in IA32_DEBUGCTL. Once it is set, the PMU core counters will freeze on SMI handler. But it will not have an effect on free running counters. E.g

[PATCH V2 1/2] tools lib api fs: Add sysfs__write_int function

2017-05-26 Thread kan . liang
From: Kan Liang Adding sysfs__write_int function to ease up writing int to sysfs. New interface is: int sysfs__write_int(const char *entry, int value); Also, introducing filename__write_int which is useful for new helpers to write sysctl values. Signed-off-by: Kan Liang --- tools/lib/api
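A helper of this kind can be sketched in userspace C as below. This is a minimal sketch under stated assumptions, not the code from the patch: sketch_write_int is a hypothetical name, and it takes a full file path, whereas the snippet's sysfs__write_int takes an entry name (how that entry is resolved is not shown in the snippet).

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Write the decimal representation of `value` to an existing file.
 * Hypothetical helper for illustration only.
 * Returns 0 on success, -1 if the file cannot be opened or fully written.
 */
int sketch_write_int(const char *path, int value)
{
    char buf[32];
    int fd, len, err = -1;

    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    len = snprintf(buf, sizeof(buf), "%d", value);
    if (len > 0 && write(fd, buf, (size_t)len) == (ssize_t)len)
        err = 0;

    close(fd);
    return err;
}
```

The file is opened without O_CREAT on purpose: sysfs and sysctl entries already exist, so a missing path is reported as an error rather than silently created.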

[PATCH V2 2/2] perf stat: Add support to measure SMI cost

2017-05-26 Thread kan . liang
From: Kan Liang Implementing a new --smi-cost mode in perf stat to measure SMI cost. During the measurement, the /sys/devices/cpu/freeze_on_smi will be set. The measurement can be done with one counter (unhalted core cycles), and two free running MSR counters (IA32_APERF and SMI_COUNT). In

[PATCH V4 2/8] perf/x86/intel/uncore: correct fixed counter index check for NHM

2017-11-02 Thread kan . liang
From: Kan Liang For Nehalem and Westmere, there is only one fixed counter for W-Box. There is no index which is bigger than UNCORE_PMC_IDX_FIXED. It is not correct to use >= to check fixed counter. The code quality issue will cause problems when a new counter index is introduced. Signed-off

[PATCH V4 6/8] perf/x86/intel/uncore: SKX support for IIO free running counters

2017-11-02 Thread kan . liang
From: Kan Liang As of Skylake Server, there are a number of free-running counters in each IIO Box that collect counts for per box IO clocks and per Port Input/Output x BW/Utilization. The free running counter is read-only and always active. Counting will be suspended only when the IIO Box is

[PATCH V4 7/8] perf/x86/intel/uncore: expose uncore_pmu_event functions

2017-11-02 Thread kan . liang
From: Kan Liang Some uncore has custom pmu. For custom pmu, it does not need to customize everything. For example, it only needs to customize init() function for client IMC uncore. Other functions like add()/del()/start()/stop()/read() can use generic code. Expose the uncore_pmu_event_add/del

[PATCH V4 1/8] perf/x86/intel/uncore: customized event_read for client IMC uncore

2017-11-02 Thread kan . liang
From: Kan Liang There are two free running counters for client IMC uncore. The custom event_init() function hardcodes their index to 'UNCORE_PMC_IDX_FIXED' and 'UNCORE_PMC_IDX_FIXED + 1'. To support the 'UNCORE_PMC_IDX_FIXED + 1' case, the generic uncore_perf_event_u

[PATCH V4 5/8] perf/x86/intel/uncore: add infrastructure for free running counter

2017-11-02 Thread kan . liang
From: Kan Liang The free running counter is read-only and always active. Current generic uncore code does not support this kind of counter. The free running counter is read-only. It cannot be enabled/disabled in event_start/stop. The free running counter event and free running counter are 1:1

[PATCH V4 8/8] perf/x86/intel/uncore: clean up client IMC uncore

2017-11-02 Thread kan . liang
From: Kan Liang The counters in client IMC uncore are free running counters, not fixed counters. It should be corrected. The new infrastructure for free running counter should be applied. Introduce free running counter type SNB_PCI_UNCORE_IMC_DATA for data read and data write counters. Keep

[PATCH V4 4/8] perf/x86/intel/uncore: add new data structures for free running counters

2017-11-02 Thread kan . liang
From: Kan Liang There are a number of free running counters introduced for uncore, which provide highly valuable information to a wide array of customers. For example, Skylake Server has IIO free running counters to collect Input/Output x BW/Utilization. The precious generic counters could be

[PATCH V4 3/8] perf/x86/intel/uncore: correct fixed counter index check in generic code

2017-11-02 Thread kan . liang
From: Kan Liang There is no index which is bigger than UNCORE_PMC_IDX_FIXED. The only exception is client IMC uncore. It has customized function to deal with the 'UNCORE_PMC_IDX_FIXED + 1' case. It does not touch the generic code. For generic code, it is not correct to use >= t

[PATCH V3 2/5] perf/x86/intel/uncore: add infrastructure for free running counter

2017-10-24 Thread kan . liang
From: Kan Liang There are a number of free running counters introduced for uncore, which provide highly valuable information to a wide array of customers. For example, Skylake Server has IIO freerunning counters to collect Input/Output x BW/Utilization. The precious generic counters could be

[PATCH V3 1/5] perf/x86/intel/uncore: customized pmu event read for client IMC uncore

2017-10-24 Thread kan . liang
From: Kan Liang The client IMC uncore obscurely hacks the generic uncore_perf_event_update to support the 'UNCORE_PMC_IDX_FIXED + 1' case. The code quality issue will cause problems when a new counter index is introduced into generic code. For example, free running counter. Introduce cust

[PATCH V3 3/5] perf/x86/intel/uncore: SKX support for IIO free running counter

2017-10-24 Thread kan . liang
From: Kan Liang As of Skylake Server, there are a number of free-running counters in each IIO Box that collect counts for per box IO clocks and per Port Input/Output x BW/Utilization. The free running counter is read-only and always active. Counting will be suspended only when the IIO Box is

[PATCH V3 4/5] perf/x86/intel/uncore: expose pmu counter operation functions

2017-10-24 Thread kan . liang
From: Kan Liang Some uncore has custom pmu. Usually, it is not fully customized. Most of the counter operation functions can still use the generic code. For example, it only needs to customize init() function for client IMC uncore. Other counter operation functions, add()/del()/start()/stop

[PATCH V3 5/5] perf/x86/intel/uncore: clean up client IMC uncore

2017-10-24 Thread kan . liang
From: Kan Liang The counters in client IMC uncore are free running counters, not fixed counters. It should be corrected. The new infrastructure for free running counter should be applied. Introduce free running counter type SNB_PCI_UNCORE_IMC_DATA for data read and data write counters. Keep

[PATCH V2] perf script: add script to profile and resolve physical mem type

2017-10-25 Thread kan . liang
From: Kan Liang There could be different types of memory in the system. E.g., normal System Memory, Persistent Memory. To understand how the workload maps to those memories, it's important to know the I/O statistics of them. Perf can collect physical addresses, but those are raw data. It

[PATCH] perf/x86/intel/uncore: add event constraint for BDX PCU

2017-11-14 Thread kan . liang
From: Kan Liang Event select bit 7 'Use Occupancy' in PCU Box is not available for counter 0 on BDX. Add a constraint to fix it. Reported-by: Stephane Eranian Signed-off-by: Kan Liang Tested-by: Stephane Eranian --- arch/x86/events/intel/uncore_snbep.c | 8 1 file

[PATCH] perf vendor events: Add Goldmont Plus V1 event file

2017-10-18 Thread kan . liang
From: Kan Liang Add an Intel event file for perf. Signed-off-by: Kan Liang --- .../pmu-events/arch/x86/goldmontplus/cache.json| 1453 .../pmu-events/arch/x86/goldmontplus/frontend.json | 62 + .../pmu-events/arch/x86/goldmontplus/memory.json | 38 + .../pmu

[PATCH V2 0/5] event synthesization multithreading for perf record

2017-10-18 Thread kan . liang
From: Kan Liang The event synthesization multithreading is introduced in ("perf top optimization") https://lkml.org/lkml/2017/9/29/269 But it was not enabled for perf record. Because the process function process_synthesized_event was not multithreading friendly. The patch series t

[PATCH V2 4/5] perf record: synthesize event multithreading support

2017-10-18 Thread kan . liang
From: Kan Liang The process function process_synthesized_event writes the process result to perf.data, which is not multithreading friendly. Create per thread file to temporarily keep the processing result. Write them to the perf.data at the end of event synthesization. The new method doesn'

[PATCH V2 5/5] perf record: add option to set the number of thread for event synthesize

2017-10-18 Thread kan . liang
From: Kan Liang Using UINT_MAX to indicate the default thread#, which is the number of online CPU. Signed-off-by: Kan Liang --- tools/perf/Documentation/perf-record.txt | 4 tools/perf/builtin-record.c | 13 +++-- 2 files changed, 15 insertions(+), 2 deletions
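The sentinel convention mentioned here (UINT_MAX meaning "use the number of online CPUs") can be sketched as below. The function name resolve_nr_threads is hypothetical and the use of sysconf() is an assumption for illustration; the snippet does not show how the patch itself obtains the CPU count.

```c
#include <limits.h>
#include <unistd.h>

/*
 * Map the user-supplied thread count to an effective one.
 * UINT_MAX is treated as "not specified" and replaced by the number of
 * online CPUs; any other value is passed through unchanged.
 * Falls back to 1 if the CPU count cannot be determined.
 */
unsigned int resolve_nr_threads(unsigned int requested)
{
    long online;

    if (requested != UINT_MAX)
        return requested;

    online = sysconf(_SC_NPROCESSORS_ONLN);
    return online > 0 ? (unsigned int)online : 1;
}
```

Using UINT_MAX as the sentinel keeps 0 and every realistic thread count available as explicit user values.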

[PATCH V2 3/5] perf tools: expose copyfile_offset()

2017-10-18 Thread kan . liang
From: Kan Liang copyfile_offset could be used to merge per thread file to perf.data in the following patch. Signed-off-by: Kan Liang --- tools/perf/util/util.c | 2 +- tools/perf/util/util.h | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/perf/util/util.c b/tools

[PATCH V2 2/5] perf tools: pass thread info in event synthesization

2017-10-18 Thread kan . liang
From: Kan Liang Pass the thread idx to process function, which is used by the following patch. Signed-off-by: Kan Liang --- tools/perf/builtin-record.c | 4 +- tools/perf/tests/dwarf-unwind.c | 3 +- tools/perf/util/event.c | 98 ++--- tools

[PATCH V2 1/5] perf tools: pass thread info to process function

2017-10-18 Thread kan . liang
From: Kan Liang For multithreading, the process function needs to know the thread related information. E.g. saving the process result to the buffer or file which belongs to specific thread. Add struct thread_info parameter for process function. Currently, it only includes thread index

[PATCH V2 2/4] perf/x86/intel/uncore: inline function to check the fixed counter event

2017-10-19 Thread kan . liang
From: Kan Liang Remove the special code in generic uncore_perf_event_update. Introduce inline function to check the fixed counter event. Signed-off-by: Kan Liang --- Changes since V1: - New file to address check event->hw.idx >= UNCORE_PMC_IDX_FIXED arch/x86/events/intel/uncore

[PATCH V2 4/4] perf/x86/intel/uncore: SKX support for IIO freerunning counter

2017-10-19 Thread kan . liang
From: Kan Liang As of Skylake Server, there are a number of free-running counters in each IIO Box that collect counts for per box IO clocks and per Port Input/Output x BW/Utilization. Freerunning counters cannot be written by SW. Counting will be suspended only when the IIO Box is powered down

[PATCH V2 1/4] perf/x86/intel/uncore: use same idx for clinet IMC uncore events

2017-10-19 Thread kan . liang
From: Kan Liang The client IMC uncore is the only one that claims two 'fixed counters'. To specially handle it, event->hw.idx >= UNCORE_PMC_IDX_FIXED is used to check fixed counters in the generic uncore_perf_event_update. It does not cause a problem in the current code, because there

[PATCH V2 3/4] perf/x86/intel/uncore: add infrastructure for freerunning counters

2017-10-19 Thread kan . liang
From: Kan Liang There are a number of freerunning counters introduced for uncore. For example, Skylake Server has IIO freerunning counters to collect Input/Output x BW/Utilization. The freerunning counter is similar to a fixed counter, except it cannot be written by SW. It needs to be specially

[PATCH V7 4/8] perf,tools: add backpointer for perf_env to evlist

2015-08-28 Thread Kan Liang
From: Kan Liang Add backpointer to evlist, so we can easily access env when processing something where we have an evsel or evlist. Suggested-by: Arnaldo Carvalho de Melo Signed-off-by: Kan Liang --- tools/perf/util/evlist.h | 1 + tools/perf/util/header.c | 1 + 2 files changed, 2 insertions

[PATCH V7 1/8] perf,tools: introduce generic FEAT for CPU attributes

2015-08-28 Thread Kan Liang
From: Kan Liang This patch introduces generic FEAT for CPU attributes. For the patch set, we only need cpu max frequency. But it can be easily extended to support other CPU attributes. The cpu max frequency is from the first online cpu. Signed-off-by: Kan Liang --- tools/perf/util

[PATCH V7 5/8] perf evsel: Add a backpointer to the evlist a evsel is in

2015-08-28 Thread Kan Liang
From: Arnaldo Carvalho de Melo So that functions that deal primarily with an evsel can access information that concerns the whole evlist it is in. Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Kan Liang --- tools/perf/util/evlist.c | 2 ++ tools/perf/util/evsel.c | 2 ++ tools/perf

[PATCH V7 0/8] Freq/CPU%/CORE_BUSY% support

2015-08-28 Thread Kan Liang
nv and add backpointer to evlist patches Arnaldo Carvalho de Melo (1): perf evsel: Add a backpointer to the evlist a evsel is in Kan Liang (7): perf,tools: introduce generic FEAT for CPU attributes perf,tools: read msr pmu type from header. perf,tools: rename perf_session_env to perf_env pe

[PATCH V7 2/8] perf,tools: read msr pmu type from header.

2015-08-28 Thread Kan Liang
From: Kan Liang Get msr pmu type when processing pmu_mappings Signed-off-by: Kan Liang --- tools/perf/util/header.c | 3 +++ tools/perf/util/header.h | 1 + 2 files changed, 4 insertions(+) diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c index df4461a..8df0582 100644 --- a

[PATCH V7 3/8] perf,tools: rename perf_session_env to perf_env

2015-08-28 Thread Kan Liang
From: Kan Liang Rename perf_session_env to perf_env. Suggested-by: Arnaldo Carvalho de Melo Signed-off-by: Kan Liang --- tools/perf/arch/common.c| 4 ++-- tools/perf/arch/common.h| 2 +- tools/perf/ui/browser.h | 4 ++-- tools/perf/ui/browsers/header.c | 2

[PATCH V7 7/8] perf,tools: caculate and save freq/CPU%/CORE_BUSY% in he_stat

2015-08-28 Thread Kan Liang
From: Kan Liang Calculate freq/CPU%/CORE_BUSY% in add_entry_cb, and update the value in he_stat. Signed-off-by: Kan Liang --- tools/perf/builtin-report.c | 36 tools/perf/util/sort.h | 3 +++ 2 files changed, 39 insertions(+) diff --git a/tools/perf

[PATCH V7 8/8] perf,tools: Show freq/CPU%/CORE_BUSY% in perf report --stdio

2015-08-28 Thread Kan Liang
From: Kan Liang Show frequency, CPU Utilization and percent performance for each symbol in perf report by --stdio --show-freq-perf. In a sampling group, only the group leader does sampling, so only the group leader's freq needs to be printed with --group. Here is an example. $ perf report --stdio --group -

[PATCH V7 6/8] perf,tools: Dump per-sample freq/CPU%/CORE_BUSY% in report -D

2015-08-28 Thread Kan Liang
From: Kan Liang The group read results from cycles/ref-cycles/TSC/ASTATE/MSTATE event can be used to calculate the frequency, CPU Utilization and percent performance during each sampling period. This patch shows them in report -D. Here is an example: $ perf record -e '{cycles,ref-cycle

[PATCH V2 1/2] perf,tools: store cpu socket_id and core_id in perf.date

2015-08-28 Thread Kan Liang
From: Kan Liang This patch stores cpu socket_id and core_id in perf.data, and reads them to perf_env in header process. Signed-off-by: Kan Liang --- Changes since V1: - Store core_id and socket_id in perf.data tools/perf/util/header.c | 97

[PATCH V2 2/2] perf,test: test cpu topology

2015-08-28 Thread Kan Liang
From: Jiri Olsa This patch tests cpu core_id and socket_id which are stored in perf_env. Signed-off-by: Jiri Olsa Signed-off-by: Kan Liang --- Changes since jirka's original version - Use pr_debug to replace fprintf - Add date_size to avoid warning - Introduce cpu_map, and compare co

[PATCH V3 2/3] perf,tools: store cpu socket_id and core_id in perf.date

2015-08-31 Thread Kan Liang
From: Kan Liang This patch stores cpu socket_id and core_id in perf.data, and reads them to perf_env in header process. Although the changes modify the CPU_TOPOLOGY feature, an old perf can still correctly read a new perf.data, because the new code is added at the end of the cpu_topology section

[PATCH V3 1/3] perf,tools: Separated functions to get core_id and socket_id

2015-08-31 Thread Kan Liang
From: Kan Liang This patch moves the code that reads core_id and socket_id into separate functions, and exposes them. Signed-off-by: Kan Liang --- tools/perf/util/cpumap.c | 51 +++- tools/perf/util/cpumap.h | 2 ++ 2 files changed, 35 insertions
