On Wed, Apr 01, 2015 at 02:32:54PM +0530, Anshuman Khandual wrote:
> Hello,
>
> perf report is not showing up the branch stack sample results in the
> from_symbol ---> to_symbol format even if the perf.data file has got
> the samples (through 'perf record -b ' session). Perf report
Sorry for the
On Tue, Apr 14, 2015 at 10:55:41AM +0200, Ingo Molnar wrote:
>
> * Sukadev Bhattiprolu wrote:
>
> > This is another attempt to resurrect Andi Kleen's patchset so users
> > can specify perf events by their event names rather than raw codes.
> >
> > This is a rebase of Andi Kleen's patchset from
> My suggestion to resolve the technical objections and lift the NAK
> would be:
>
> - to add the tables to the source code, in a more human readable
>   format and (optionally) structure the event names better into a
>   higher level hierarchy, than the humungous linear dumps with no
>
On Fri, Apr 17, 2015 at 05:31:26PM +0200, Jiri Olsa wrote:
> On Wed, Apr 15, 2015 at 01:50:42PM -0700, Sukadev Bhattiprolu wrote:
>
> SNIP
>
> > |
> > | - to blindly follow some poorly constructed vendor format with no
> > |high level structure, that IMHO didn't work very well when OProfil
> I personally like having a set of event files in JSON notation
> rather than having them directly in C structure
Yes, strings are better and JSON input is also better.
I prototyped translating JSON into the proposed structures. I already had to
add three new fields, and it wouldn't work for uncor
> +/*
> + * Return TRUE if the CPU identified by @vfm, @version, and @type
> + * matches the current CPU. vfm refers to [Vendor, Family, Model],
> + *
> + * Return FALSE otherwise.
> + *
> + * For Powerpc, we only compare @version to the processor PVR.
> + */
> +bool arch_pmu_events_match_cpu(cons
> Obviously, that does not fit into the VFM field. We could either
> add a new PVR field to the mapfile:
>
> [vfm, version, type, pvr]
>
> or, as the patch currently does, let architectures interpret the
> "version" field as they see fit?
>
> IOW, leave it to architectures to keep arch_pmu_
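The per-architecture matching discussed in this message can be sketched roughly as follows. This is only an illustration, not the patch's code: the row fields follow the [vfm, version, type] layout quoted above, while the function names, the dictionary representation, and every sample value are assumptions made up for the example.

```python
def match_generic(row, cpu):
    """Compare all three mapfile fields against the running CPU."""
    return (row["vfm"], row["version"], row["type"]) == (
        cpu["vfm"], cpu["version"], cpu["type"])


def match_powerpc(row, cpu):
    """Powerpc variant: only 'version' is compared, against the PVR."""
    return row["version"] == cpu["pvr"]


def find_table(rows, cpu, match):
    """Return the table name of the first row the matcher accepts."""
    for row in rows:
        if match(row, cpu):
            return row["table"]
    return None


# Hypothetical mapfile rows and CPU descriptions (placeholder values only).
ROWS = [
    {"vfm": "vendor-6-63", "version": "1", "type": "core", "table": "table_a"},
    {"vfm": "", "version": "pvr-example", "type": "core", "table": "table_b"},
]
X86_LIKE_CPU = {"vfm": "vendor-6-63", "version": "1", "type": "core"}
POWERPC_LIKE_CPU = {"pvr": "pvr-example"}

print(find_table(ROWS, X86_LIKE_CPU, match_generic))
print(find_table(ROWS, POWERPC_LIKE_CPU, match_powerpc))
```

Keeping the matcher as a separate callable mirrors the "let architectures interpret the version field" option: the mapfile layout stays the same and only the comparison differs per architecture.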
On Wed, May 20, 2015 at 10:02:04PM -0700, Sukadev Bhattiprolu wrote:
> Andi Kleen [a...@linux.intel.com] wrote:
> | If you need something else in vfm to identify the CPU
> | can't you just add it there? I wouldn't really call it vfm, it's
> | really a "abstract
> pmu-events.c depends only on JSON files relevant to the arch perf is
> being built on and there could be several JSON files per arch. So it
> would complicate the Makefiles.
Could just use a wildcard dependency on */$(ARCH)/*.json
Also it would be good to move the generated file into the objec
> Sure, but shouldn't we allow JSON files to be in subdirs
>
> pmu-events/arch/x86/HSX/Haswell_core.json
>
> and this could go to arbitrary levels?
I used a flat hierarchy. Should be good enough.
-Andi
--
a...@linux.intel.com -- Speaking for myself only
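The difference between a flat per-arch directory and arbitrary subdirectories can be sketched like this. It is a minimal illustration in Python rather than the Makefile wildcard discussed above; the helper names and the "flat_example.json" file are assumptions, and only the HSX/Haswell_core.json path comes from the quoted mail.

```python
import tempfile
from pathlib import Path


def json_inputs(root, arch, recursive=False):
    """List JSON event files for one arch, relative to the events root.

    recursive=False looks only at arch/<arch>/*.json (flat hierarchy);
    recursive=True also descends into subdirectories.
    """
    root = Path(root)
    arch_dir = root / "arch" / arch
    pattern = "**/*.json" if recursive else "*.json"
    return sorted(p.relative_to(root).as_posix() for p in arch_dir.glob(pattern))


def demo():
    """Build a throwaway tree and return (flat, recursive) listings."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "pmu-events"
        nested = root / "arch" / "x86" / "HSX"
        nested.mkdir(parents=True)
        (nested / "Haswell_core.json").write_text("[]")
        (root / "arch" / "x86" / "flat_example.json").write_text("[]")
        return json_inputs(root, "x86"), json_inputs(root, "x86", recursive=True)


flat, nested = demo()
print(flat)
print(nested)
```

With a flat layout a single wildcard covers every input; once subdirectories are allowed, the dependency list has to be gathered recursively.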
> So we build tables of all models in the architecture, and choose
> matching one when compiling perf, right? Can't we do that when
> building the tables? IOW, why don't we check the VFM and discard
> non-matching tables? Those non-matching tables are also needed?
We build it for all cpus in an
> > + {
> > +"EventCode": "0x2505e",
> > +"EventName": "PM_BACK_BR_CMPL",
> > +"BriefDescription": "Branch instruction completed with a target
> > address less than current instruction address,",
> > +"PublicDescription": "Branch instruction completed with a target
> > address le
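As a rough illustration of what such an entry enables, the sketch below maps an event name to its raw code. It assumes a JSON array of objects with the "EventCode" and "EventName" keys shown in the quoted patch; the helper names are hypothetical and this is not how perf or jevents implements the lookup.

```python
import json

# One entry modeled on the quoted patch; real files hold many such entries.
EVENTS_JSON = """
[
  {
    "EventCode": "0x2505e",
    "EventName": "PM_BACK_BR_CMPL",
    "BriefDescription": "Branch instruction completed with a target address less than current instruction address"
  }
]
"""


def build_event_table(text):
    """Map lower-cased event names to their integer raw codes."""
    table = {}
    for entry in json.loads(text):
        table[entry["EventName"].lower()] = int(entry["EventCode"], 16)
    return table


def resolve_event(table, name):
    """Look up an event by name, ignoring case; raises KeyError if unknown."""
    return table[name.lower()]


table = build_event_table(EVENTS_JSON)
print(hex(resolve_event(table, "PM_BACK_BR_CMPL")))
```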
On Thu, May 28, 2015 at 12:01:31AM +0900, Namhyung Kim wrote:
> On Wed, May 27, 2015 at 11:41 PM, Andi Kleen wrote:
> >> > + {
> >> > +"EventCode": "0x2505e",
> >> > +"EventName": "PM_BACK_BR_CMPL"
> So instead of this flat structure, there should at minimum be broad
> categorization of the various parts of the hardware they relate to:
> whether they relate to the branch predictor, memory caches, TLB caches,
> memory ops, offcore, decoders, execution units, FPU ops, etc., etc. - so t
On Thu, May 28, 2015 at 02:39:14PM +0200, Jiri Olsa wrote:
> On Wed, May 27, 2015 at 02:23:28PM -0700, Sukadev Bhattiprolu wrote:
> > From: Andi Kleen
> >
> > Add a --no-desc flag to perf list to not print the event descriptions
> > that were earlier added for JSON ev
On Fri, May 29, 2015 at 11:13:15AM +0200, Jiri Olsa wrote:
> On Thu, May 28, 2015 at 10:45:06PM -0700, Sukadev Bhattiprolu wrote:
> > Jiri Olsa [jo...@redhat.com] wrote:
> > | > if (line[0] == '#' || line[0] == '\n')
> > | > continue;
> > | > +
Ok I did some scripting to add these topics you requested to the Intel JSON
files, and changed perf list to group events by them.
I'll redirect any questions on their value to you.
And I certainly hope this is the last of your "improvements" for now.
The updated event lists are available in
> please split at least the jevents Topic parsing from the rest
> ideally also the alias update and the display change
What's the point of all these splits? It's already one logical unit,
not too large, and is bisectable.
-andi
--
a...@linux.intel.com -- Speaking for myself only
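Grouping events by topic, as described in this message, might look roughly like the sketch below. It assumes each JSON entry carries a "Topic" key next to "EventName"; the event names, topic strings, and the "other" fallback are invented for the example, and this is not perf list's actual code.

```python
from collections import defaultdict


def group_by_topic(events, default_topic="other"):
    """Return {topic: sorted event names}; untagged events get a fallback topic."""
    groups = defaultdict(list)
    for ev in events:
        groups[ev.get("Topic", default_topic)].append(ev["EventName"])
    return {topic: sorted(names) for topic, names in sorted(groups.items())}


# Hypothetical events; names and topics are placeholders.
EVENTS = [
    {"EventName": "example_branch_miss", "Topic": "branch"},
    {"EventName": "example_l1_miss", "Topic": "cache"},
    {"EventName": "example_branch_taken", "Topic": "branch"},
    {"EventName": "example_untagged"},
]

for topic, names in group_by_topic(EVENTS).items():
    print(topic + ":")
    for name in names:
        print("  " + name)
```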
On Fri, Jun 05, 2015 at 12:21:38PM +0200, Jiri Olsa wrote:
> On Thu, Jun 04, 2015 at 11:27:23PM -0700, Sukadev Bhattiprolu wrote:
>
> SNIP
>
> > ---
> > tools/perf/builtin-list.c | 11 ++++++++---
> > 1 file changed, 8 insertions(+), 3 deletions(-)
> >
> > diff --git a/tools/perf/builtin-list
Linus Torvalds writes:
> On Mon, Sep 30, 2013 at 1:01 PM, Waiman Long wrote:
>>
>> I think this patch is worth a trial if relevant hardware is more widely
>> available. The TSX code certainly need to be moved to an architecture
>> specific area and should be runtime enabled using a static key. W
On Wed, Dec 04, 2013 at 04:02:36PM +0530, Anshuman Khandual wrote:
> This patch adds conditional branch filtering support,
> enabling it for PERF_SAMPLE_BRANCH_COND in perf branch
> stack sampling framework by utilizing an available
> software filter X86_BR_JCC.
Newer Intel CPUs have a hardware filter
Anton Blanchard writes:
>
> Thoughts? It seems like we could hit a similar situation if a machine
> is balanced but we run out of memory on a single node.
Yes I agree, but your patch doesn't seem to attempt to handle this?
-Andi
>
> Index: b/mm/slub.c
> ==
> I'll NAK any external 'download area' (and I told that Andi
> before): tools/perf/event-tables/ or so is a good enough
> 'download area' with fast enough update cycles.
The proposal was to put it on kernel.org, similar to how
external firmware blobs are distributed. CPU event lists
are data sh
Well I'm tired of discussing this. I don't think what you
proposed makes sense, putting 3.4MB[1] of changing blob into perf.
I'll resubmit the JSON parser without the downloader. Then users
have the option to get their own events and use that.
If you don't like that, standard perf just has to st
Thanks for supporting the JSON format too.
> (c) If not, given we don't know how to get us out of the current
> status quo, can this patchseries still be applied, given the
> original complaint was the size of our events-list.h (whereas
The Intel core event lists are far larger even
(and will gr
On Mon, Nov 30, 2015 at 06:56:55PM -0800, Sukadev Bhattiprolu wrote:
> CPUs support a large number of performance monitoring events (PMU events)
> and often these events are very specific to an architecture/model of the
> CPU. To use most of these PMU events with perf, we currently have to identify
On Mon, Mar 02, 2020 at 11:13:32AM +0100, Peter Zijlstra wrote:
> On Mon, Mar 02, 2020 at 10:53:44AM +0530, Ravi Bangoria wrote:
> > Modern processors export such hazard data in Performance
> > Monitoring Unit (PMU) registers. Ex, 'Sampled Instruction Event
> > Register' on IBM PowerPC[1][2] and 'I
> > diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> > index 9e757d18d713..679aaa655824 100644
> > --- a/tools/perf/util/stat-display.c
> > +++ b/tools/perf/util/stat-display.c
> > @@ -237,8 +237,6 @@ static bool valid_only_metric(const char *unit)
> > if (!unit)
>
On Fri, Jan 17, 2020 at 06:16:19PM +0530, Kajol Jain wrote:
> Patch enhances current metric infrastructure to handle "?" in the metric
> expression. The "?" can be used for parameters whose value is not known while
> creating metric events and which can be replaced later at runtime with
> the proper value
> Here, '?' will be replaced with a runtime value and metric expression will
> be replicated.
Okay seems reasonable to me.
Thanks,
-Andi
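The '?' substitution described in the quoted patch can be pictured with a short sketch. It assumes the runtime values are simply handed in as a list; the function name and the sample expression are hypothetical, and how perf actually obtains the runtime value is not shown in this thread.

```python
def expand_metric(expr, runtime_values):
    """Replicate a metric expression once per runtime value, replacing each '?'."""
    return [expr.replace("?", str(value)) for value in runtime_values]


# Hypothetical expression; the event names and 'chip' parameter are placeholders.
EXPR = "example_pmu/event_a,chip=?/ + example_pmu/event_b,chip=?/"

for expanded in expand_metric(EXPR, [0, 1]):
    print(expanded)
```

Every '?' in one copy receives the same value, so a two-value list yields two complete expressions.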
> Moving 64 bytes per cycle is faster on Sandy Bridge, but slower on
> Westmere. Any preference? ;)
You have to be careful with these benchmarks.
- You need to make sure the data is cache cold, cache hot is misleading.
- The numbers can change if you have multiple CPUs doing this in parallel.
-A
Yasuaki Ishimatsu writes:
> + }
> +
> + /*
> + * We use 2M page, but we need to remove part of them,
> + * so split 2M page to 4K page.
> + */
> + pte = alloc_low_page(&pte_ph
Thomas Gleixner writes:
> On Mon, 11 Oct 2010, Tim Pepper wrote:
>
>> I'm not necessarily wanting to open up the age old question of "what is
>> a good HZ", but we were doing some testing on timer tick overheads for
>> HPC applications and this came up...
>
> Yeah. This comes always up when the t
On Tue, Mar 08, 2011 at 07:31:56PM -0500, Stephen Wilson wrote:
>
> Morally, the question of whether an address lies in a gate vma should be asked
> with respect to an mm, not a particular task.
>
> Practically, dropping the dependency on task_struct will help make current and
> future operations
On Thu, Mar 10, 2011 at 08:00:32AM -0800, Andi Kleen wrote:
> On Tue, Mar 08, 2011 at 07:31:56PM -0500, Stephen Wilson wrote:
> >
> > Morally, the question of whether an address lies in a gate vma should be asked
> > with respect to an mm, not a particular
On Thu, Mar 10, 2011 at 11:54:14AM -0500, Stephen Wilson wrote:
>
> On Thu, Mar 10, 2011 at 08:38:09AM -0800, Andi Kleen wrote:
> > On Thu, Mar 10, 2011 at 08:00:32AM -0800, Andi Kleen wrote:
> > > On Tue, Mar 08, 2011 at 07:31:56PM -0500, Stephen Wilson wrote:
> > >
Peter Zijlstra writes:
>
> So does it make sense to have a set of sets?
>
> Why not integrate them all into one set to be ruled by this governor
> thing?
cpuidle is currently optional, that is why the two level hierarchy
is there so that you can still have simple idle selection without it.
% siz
> How about something like this..
> If the arch does not enable CONFIG_CPU_IDLE, the cpuidle_idle_call
> which is called from cpu_idle() should call default_idle without
> involving the registering cpuidle steps. This should prevent bloating
> up of the kernel for archs which dont want to use cpuid
Ben Hutchings writes:
> WARN() is used in some places to report firmware or hardware bugs that
> are then worked-around. These bugs do not affect the stability of the
> kernel and should not set the usual TAINT_WARN flag. To allow for
> this, add WARN_TAINT() and WARN_TAINT_ONCE() macros that t
On Thu, May 12, 2011 at 04:13:54AM -0500, Milton Miller wrote:
>
> Move the smp_rmb after cpu_relax loop in read_seqlock and add
> ACCESS_ONCE to make sure the test and return are consistent.
>
> A multi-threaded core in the lab didn't like the update
Which core was that?
-Andi
While a cpu does the
additional work to update the system clock, the seqlock
implementation with the tight rmb spin loop goes back much
further, and is just waiting for the right trigger.
Signed-off-by: Milton Miller
Cc:
Cc: Linus Torvalds
Cc: Andi Kleen
Cc: Nick Piggin
Cc: Benjamin Herrenschmidt
, Ian Munsie wrote:
From: Ian Munsie
This patch converts numerous trivial compat syscalls through the generic
kernel code to use the COMPAT_SYSCALL_DEFINE family of macros.
Why? This just makes the code look uglier and the functions harder
to grep for.
-Andi
, Frederic Weisbecker wrote:
On Wed, Jun 23, 2010 at 12:19:38PM +0200, Andi Kleen wrote:
, Ian Munsie wrote:
From: Ian Munsie
This patch converts numerous trivial compat syscalls through the generic
kernel code to use the COMPAT_SYSCALL_DEFINE family of macros.
Why? This just makes the code
I haven't heard any complaints about existing syscall wrappers.
At least for me they always interrupt my grepping.
What kind of annotations could solve that?
If you put the annotation in a separate macro and leave the original
prototype alone, then C parsers could still parse it.
-Andi
"Markus Gutschke (ÜÒÐ)" writes:
>
> There are a large number of system calls that "normal" C/C++ code uses
> quite frequently, and that are not security sensitive. A typical
> example would be gettimeofday().
At least on x86-64 gettimeofday() (and time(2)) work inside seccomp because
they're vsy
"Chris Friesen" writes:
>
> One of the reasons I brought up this issue is that there is a lot of
> documentation out there that says "softirqs will be processed on return
> from a syscall". The fact that it actually depends on the scheduler
> parameters of the task issuing the syscall isn't ever
> "If a soft irq is raised in process context, raise_softirq() in
> kernel/softirq.c calls wakeup_softirqd() to make sure that ksoftirqd
softirqd is only used when the softirq runs for too long or when
there are no suitable irq exits for a long time.
In normal situations (not excessive time in so
Thomas Gleixner writes:
> Err, no. Chris is completely correct:
>
> if (!in_interrupt())
> wakeup_softirqd();
Yes you have to wake it up just in case, but it doesn't normally
process the data because a normal softirq comes in faster. It's
just a safety policy.
You can ch
> I have one machine SMP flooded by network frames, CPU0 handling all
Yes that's the case softirqd is supposed to handle. When you
spend a significant part of your CPU time in softirq context it kicks
in to provide somewhat fair additional CPU time.
But most systems (like mine) don't do that.
-
On Wed, May 13, 2009 at 09:05:01AM -0600, Chris Friesen wrote:
> Andi Kleen wrote:
> > Thomas Gleixner writes:
>
> >>Err, no. Chris is completely correct:
> >>
> >>if (!in_interrupt())
> >>wakeup_softirqd();
> >
>
On Wed, May 13, 2009 at 01:04:09PM -0600, Chris Friesen wrote:
> Andi Kleen wrote:
>
> > network packets are normally processed by the network packet interrupt's
> > softirq or alternatively in the NAPI poll loop.
>
> If we have a high priority task, ksoftirqd may
On Wed, May 13, 2009 at 01:44:59PM -0600, Chris Friesen wrote:
> Andi Kleen wrote:
> > On Wed, May 13, 2009 at 01:04:09PM -0600, Chris Friesen wrote:
> >> Andi Kleen wrote:
> >>
> >>> network packets are normally processed by the network packet interrupt's
Mike Mason writes:
>
> These patches supersede the previously submitted patch that
> implemented a fundamental reset bit field.
>
> Please review and let me know of any concerns.
Any plans to implement that for x86 too? Right now it seems to be a PPC
specific hack. And where is the driver that i
Arjan van de Ven <[EMAIL PROTECTED]> writes:
> you have more faith in the author's knowledge of how his code actually behaves
> than I think is warranted :)
iirc there was a mm patch some time ago to keep track of the actual unlikely
values at runtime and it showed indeed some wrong ones. But the
> Sometimes, for performance critical paths, I would like gcc to be dumb and
> follow *my* code and not its hard-coded probabilities.
If you really want that, simple: just disable optimization @)
> Maybe one thing we would need would be the ability to assign probabilities
> to each branch based
On Tue, Feb 19, 2008 at 01:33:53PM +1100, Nick Piggin wrote:
> On Tuesday 19 February 2008 01:39, Andi Kleen wrote:
> > Arjan van de Ven <[EMAIL PROTECTED]> writes:
> > > you have more faith in the author's knowledge of how his code actually
> > > behaves than I th
On Tue, Feb 19, 2008 at 08:46:46PM +1100, Nick Piggin wrote:
> On Tuesday 19 February 2008 20:25, Andi Kleen wrote:
> > On Tue, Feb 19, 2008 at 01:33:53PM +1100, Nick Piggin wrote:
>
> > > I actually once measured context switching performance in the scheduler,
> > &
Stefan Richter <[EMAIL PROTECTED]> writes:
>
> 1.) The ieee1394 subsystem is known to work on x86_64 with more than 4
> GB RAM,
It's actually ~3+GB where memory above the 4GB barrier starts appearing.
In some extreme cases even for 2+GB.
> so I gather that architecture code already sets a pr
On Wednesday 12 September 2007 03:56, [EMAIL PROTECTED] wrote:
> Note:
>
> This patch consolidates all the previous patches regarding
> the conversion of static arrays sized by NR_CPUS into per_cpu
> data arrays and is referenced against 2.6.23-rc6 .
Looks good to me from the x86 side. I'll leave
David Miller <[EMAIL PROTECTED]> writes:
> From: Arnd Bergmann <[EMAIL PROTECTED]>
> Date: Tue, 16 Oct 2007 21:50:35 +0200
>
> > The one point where it is expected to have changed now is when you
> > try to do these ioctls on something that is not a block device. Are
> > you sure that the files y
FWIW I turned over the hugepages patchkit to Nick Piggin. So send
all future patches to him please.
-Andi
Haven't reviewed it in detail, just noticed something.
> @@ -614,6 +610,7 @@ static int __init hugetlb_init(void)
> {
> if (HPAGE_SHIFT == 0)
> return 0;
> + INIT_LIST_HEAD(&huge_boot_pages);
> return hugetlb_init_hstate(&global_hstate);
I don't think adding the IN
Andrew Morton <[EMAIL PROTECTED]> writes:
> Do we expect that this change will be replicated in other
> memory-intensive apps? (I do).
The catch with 2MB pages on x86 is that x86 CPUs generally have
far fewer 2MB TLB entries than 4K entries. So if you're unlucky
and access a lot of mappings you
Benjamin Herrenschmidt <[EMAIL PROTECTED]> writes:
> The ISS simulator is a simple powerpc simulator used among other things
> for hardware bringup. It implements a simple memory mapped block device
> interface.
...
>
> --- /dev/null 1970-01-01 00:00:00.0 +
> +++ linux-work/drivers/bl
"H. Peter Anvin" <[EMAIL PROTECTED]> writes:
>
> For one thing, it looks like we're returning the wrong thing (EINVAL
> rather than ENOTTY) across the board. This was unfortunately a common
> misunderstanding with non-tty-related ioctls in the early days of Linux.
ENOTTY is so excessively misnam
On Fri, Jul 06, 2007 at 11:33:43AM -0700, Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
> >"H. Peter Anvin" <[EMAIL PROTECTED]> writes:
> >
> >>For one thing, it looks like we're returning the wrong thing (EINVAL
> >>rather than ENOT
> > What's the point of this indirection other than another way of avoiding
> > empty node 0?
>
> Honestly, I do not have any idea. I've traced it down to
> Author: Andi Kleen
> Date: Tue Jan 11 15:35:48 2005 -0800
I don't remember all the details, and
On Fri, Apr 07, 2017 at 05:20:31PM +0200, Peter Zijlstra wrote:
> On Fri, Apr 07, 2017 at 06:47:43PM +0800, Jin Yao wrote:
> > Perf already has support for disassembling the branch instruction
> > and using the branch type for filtering. The patch just records
> > the branch type in perf_branch_ent
> > It's a somewhat common situation with partially JITed code, if you
> > don't have an agent. You can still do a lot of useful things.
>
> Like what? How can you say anything about code you don't have?
For example if you combine the PMU topdown measurement, and see if it's
frontend bound, and t
Laurent Dufour writes:
> From: Peter Zijlstra
>
> One of the side effects of speculating on faults (without holding
> mmap_sem) is that we can race with free_pgtables() and therefore we
> cannot assume the page-tables will stick around.
>
> Remove the reliance on the pte pointer.
This needs a l
On Mon, Mar 26, 2018 at 02:44:48PM -0700, David Rientjes wrote:
> On Tue, 13 Mar 2018, Laurent Dufour wrote:
>
> > Add support for the new speculative faults event.
> >
> > Signed-off-by: Laurent Dufour
>
> Acked-by: David Rientjes
>
> Aside: should there be a new spec_flt field for struct ta
On Mon, Sep 26, 2016 at 12:03:43PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Sep 26, 2016 at 10:35:33AM +0200, Jiri Olsa escreveu:
> > ping.. is that working for you? IMO we can include this
> > as additional patch to the set..
>
> No, it doesn't fails to build on the first cross env I trie
> I don't understand what led Andi Kleen to also move .text.hot and
> .text.unlikely together with .text [2], but this may have
> been a related issue.
The goal was just to move .hot and .unlikely all together, so that
they are clustered and use the minimum amount of cache. On x86
On Wed, Aug 10, 2016 at 12:29:29AM +0200, Arnd Bergmann wrote:
> On Monday, August 8, 2016 8:16:05 PM CEST Andi Kleen wrote:
> > > I don't understand what led Andi Kleen to also move .text.hot and
> > > .text.unlikely together with .text [2], but this may have
> hi,
> I had discussion with Ingo about the state of this patchset
> and there's one more requirement from his side - to split
> event files into per topic files
Thanks Jiri.
>
> I made some initial changes over latest Sukadev's branch
> and came up with something like this:
Did you just split
> >
> > >
> > > I've already made some changes in pmu-events/* to support
> > > this hierarchy to see how bad the change would be.. and
> > > it's not that bad ;-)
> >
> > Everything has to be automated, please no manual changes.
>
> sure
>
> so, if you're ok with the layout, how do you want t
> That makes me extremely nervous... there could be all sort of
> assumptions esp. in arch code about the fact that we never populate the
> tree without the mm sem.
>
> We'd have to audit archs closely. Things like the page walk cache
> flushing on power etc...
Yes the whole thing is quite risky.