On 04/24/2014 03:14 AM, Mel Gorman wrote:
> On Mon, Apr 21, 2014 at 11:24:25AM -0700, Dave Hansen wrote:
>> @@ -105,9 +108,10 @@ static void flush_tlb_func(void *info)
>>  
>>      count_vm_tlb_event(NR_TLB_REMOTE_FLUSH_RECEIVED);
>>      if (this_cpu_read(cpu_tlbstate.state) == TLBSTATE_OK) {
>> -            if (f->flush_end == TLB_FLUSH_ALL)
>> +            if (f->flush_end == TLB_FLUSH_ALL) {
>>                      local_flush_tlb();
>> -            else if (!f->flush_end)
>> +                    trace_tlb_flush(TLB_REMOTE_SHOOTDOWN, TLB_FLUSH_ALL);
>> +            } else if (!f->flush_end)
>>                      __flush_tlb_single(f->flush_start);
>>              else {
>>                      unsigned long addr;
> 
> Why is only the TLB_FLUSH_ALL case traced here and not the single flush
> or range of flushes? __native_flush_tlb_single() doesn't have a trace
> point so I worry we are missing visibility on this part in particular
> this part.
> 
>                         while (addr < f->flush_end) {
>                                 __flush_tlb_single(addr);
>                                 addr += PAGE_SIZE;
>                         }

You're right, I missed that bit.  I've corrected in a later version of
the patch.

>> @@ -152,7 +156,9 @@ void flush_tlb_current_task(void)
>>      preempt_disable();
>>  
>>      count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
>> +    trace_tlb_flush(TLB_LOCAL_SHOOTDOWN, TLB_FLUSH_ALL);
>>      local_flush_tlb();
>> +    trace_tlb_flush(TLB_LOCAL_SHOOTDOWN_DONE, TLB_FLUSH_ALL);
>>      if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
>>              flush_tlb_others(mm_cpumask(mm), mm, 0UL, TLB_FLUSH_ALL);
>>      preempt_enable();
> 
> Are the two tracepoints really useful? Are they fine enough to measure
> the cost of the TLB flush? It misses the refill obviously but not much
> we can do there.

It's fine enough, but I did realize over time that the cost of the
tracepoint is about 3x the cost of a 1-page tlb flush itself, so these
are unusable for detailed measurements.  I'll remove it for now.

>>  #endif /* _LINUX_MM_TYPES_H */
>> diff -puN /dev/null include/trace/events/tlb.h
>> --- /dev/null        2014-04-10 11:28:14.066815724 -0700
>> +++ b/include/trace/events/tlb.h     2014-04-21 11:10:35.529868198 -0700
>> @@ -0,0 +1,37 @@
>> +#undef TRACE_SYSTEM
>> +#define TRACE_SYSTEM tlb
>> +
>> +#if !defined(_TRACE_TLB_H) || defined(TRACE_HEADER_MULTI_READ)
>> +#define _TRACE_TLB_H
>> +
>> +#include <linux/mm_types.h>
>> +#include <linux/tracepoint.h>
>> +
>> +extern const char * const tlb_flush_reason_desc[];
>> +
>> +TRACE_EVENT(tlb_flush,
>> +
>> +    TP_PROTO(int reason, unsigned long pages),
>> +    TP_ARGS(reason, pages),
>> +
>> +    TP_STRUCT__entry(
>> +            __field(          int, reason)
>> +            __field(unsigned long,  pages)
>> +    ),
>> +
>> +    TP_fast_assign(
>> +            __entry->reason = reason;
>> +            __entry->pages  = pages;
>> +    ),
>> +
>> +    TP_printk("pages: %ld reason: %d (%s)",
>> +            __entry->pages,
>> +            __entry->reason,
>> +            tlb_flush_reason_desc[__entry->reason])
>> +);
>> +
> 
> I would also suggest you match the output formatting with writeback.h
> which would look like
> 
> pages:%lu reason:%s
> 
> The raw format should still have the integer while the string formatting
> would have something human readable.

I can do that.  The only bummer with the human-readable strings is
turning them back in to something that the filters can take.  I think
I'll just do:

+       TP_printk("pages:%ld reason:%s (%d)",
+               __entry->pages,
+               __print_symbolic(__entry->reason, TLB_FLUSH_REASON),
+               __entry->reason)
+);

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to