On Sun, Nov 11, 2018 at 01:07:55PM +0800, Leo Yan wrote:
> The perf sample data contains flags that indicate which type of branch
> instruction the hardware trace data belongs to, so they can be used to
> print a human-readable string.  Arm CoreSight ETM sample data never
> sets these flags and always leaves them as zero, so the perf tool
> skips printing the string for instruction types.
> 
> Arm CoreSight ETM supports the A64, A32 and T32 instruction sets;
> this patch sets the branch instruction flags in the packet for these
> ISAs.
> 
> The implementation is briefly described below:
> 
> - An element of type OCSD_GEN_TRC_ELEM_TRACE_ON is taken as the trace
>   beginning packet; elements of type OCSD_GEN_TRC_ELEM_NO_SYNC or
>   OCSD_GEN_TRC_ELEM_EO_TRACE are used to mark the trace end;
> 
>   As Mike suggested, the packet stream might have two TRACE_ON
>   packets: the first TRACE_ON packet indicates trace end and the
>   second one is taken as trace restarting.  We will handle this
>   special case in the upper layer with packet queue handling, which has
>   more context and so is a more suitable place to fix it up.  This
>   will be done in a subsequent patch.
> 
> - For instruction range packets, the branch instruction type is
>   decided mainly from three fields:
> 
>   elem->last_i_type
>   elem->last_i_subtype
>   elem->last_instr_cond
> 
>   If the instruction is an immediate branch without link or return
>   flag, we consider it a function-internal branch.  In fact an
>   immediate branch can also be used to jump to a function entry;
>   usually this is only done in assembly code that directly calls a
>   symbol and doesn't expect to return.  After reviewing normal kernel
>   functions and user space programs, both very seldom use an
>   immediate branch for a function call.  On the other hand, to decide
>   whether an immediate branch is a jump within a function or a
>   function call, we would need to rely on the start address of the
>   next packet and check the symbol offset for that address, which
>   would add much complexity to the implementation.  So for this
>   version we simply treat an immediate branch as a function-internal
>   branch.  Moreover, we rely on 'elem->last_instr_cond' to decide
>   whether the branch instruction is conditional.
> 
>   If the instruction is an immediate branch with link, it is the 'BL'
>   instruction, which is used for a function call.
> 
>   If the instruction is an indirect branch with subtype
>   OCSD_S_INSTR_V7_IMPLIED_RET, the decoder hints at a function return
>   for the A32/T32 instructions below; set this branch flag as function
>   return (thanks to Al for the suggestion).
> 
>     BX R14
>     MOV PC, LR
>     POP {…, PC}
>     LDR PC, [SP], #offset
> 
>   If the instruction is an indirect branch without link, it
>   corresponds to the 'BR' instruction.  This instruction is usually
>   used by dynamically linked libraries as shown below, so we treat it
>   as a return instruction.
> 
>     0000000000000680 <.plt>:
>      680:   a9bf7bf0        stp     x16, x30, [sp, #-16]!
>      684:   90000090        adrp    x16, 10000 <__FRAME_END__+0xf630>
>      688:   f947fe11        ldr     x17, [x16, #4088]
>      68c:   913fe210        add     x16, x16, #0xff8
>      690:   d61f0220        br      x17
> 
>   If the instruction is an indirect branch with link, e.g. BLR, we
>   treat it as a function call.
> 
>   For function return, ARMv8 introduces a dedicated instruction 'ret',
>   which has the subtype OCSD_S_INSTR_V8_RET.
> 
> - For exception packets, this patch divides them into three types:
> 
>   The first type of exception is caused by external logic such as the
>   bus, interrupt controller, debug module, or PE reset or halt; this
>   corresponds to the flags "bcyi" defined in the doc perf-script.txt;
> 
>   The second type is for system calls; this is set as "bcs" following
>   the definition in the doc;
> 
>   The third type is for CPU traps, data and instruction prefetch
>   aborts, and alignment aborts; these exceptions are usually
>   synchronous to the CPU, so set them as "bci" type.

This is too long and needs to be broken down into pieces.  I would split this
patch in three: one for NO_SYNC and TRACE_ON, one for INSTR_RANGE and one for
ELEM_EXCEPTION/ELEM_EXCEPTION_RET.

> 
> Cc: Mathieu Poirier <mathieu.poir...@linaro.org>
> Cc: Mike Leach <mike.le...@linaro.org>
> Cc: Robert Walker <robert.wal...@arm.com>
> Cc: Al Grant <al.gr...@arm.com>
> Cc: Andi Kleen <a...@firstfloor.org>
> Cc: Adrian Hunter <adrian.hun...@intel.com>
> Cc: Arnaldo Carvalho de Melo <a...@redhat.com>
> Signed-off-by: Leo Yan <leo....@linaro.org>
> ---
>  tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 168 ++++++++++++++++++++++++
>  tools/perf/util/cs-etm-decoder/cs-etm-decoder.h |   1 +
>  2 files changed, 169 insertions(+)
> 
> diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
> index d1a6cbc..0e50c52 100644
> --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
> +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
> @@ -303,6 +303,7 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder,
>       decoder->packet_buffer[et].instr_count = 0;
>       decoder->packet_buffer[et].last_instr_taken_branch = false;
>       decoder->packet_buffer[et].last_instr_size = 0;
> +     decoder->packet_buffer[et].flags = 0;

Since PERF_IP_FLAG_BRANCH is '0', I would set this to UINT32_MAX.
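
Roughly what I have in mind, as a minimal sketch only.  The struct, the
CS_ETM_FLAGS_UNSET name and both helpers are hypothetical stand-ins, not
existing perf code; the only point is that a dedicated "never set" value
stays distinguishable from any real flag word, including zero:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel: no valid combination of sample flags uses all bits. */
#define CS_ETM_FLAGS_UNSET	UINT32_MAX

/* Hypothetical stand-in for the 'flags' member added to struct cs_etm_packet. */
struct sketch_packet {
	uint32_t flags;
};

/* Mark the packet as "flags not decided yet" when it is (re)initialised. */
static void sketch_packet_reset(struct sketch_packet *packet)
{
	packet->flags = CS_ETM_FLAGS_UNSET;
}

/* True once some decoder path has written a real flag word. */
static bool sketch_packet_has_flags(const struct sketch_packet *packet)
{
	return packet->flags != CS_ETM_FLAGS_UNSET;
}
```

A consumer would then check sketch_packet_has_flags() before copying the
value into the sample.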

>  
>       if (decoder->packet_count == MAX_BUFFER - 1)
>               return OCSD_RESP_WAIT;
> @@ -437,6 +438,171 @@ cs_etm_decoder__buffer_exception_ret(struct cs_etm_decoder *decoder,
>                                            CS_ETM_EXCEPTION_RET);
>  }
>  
> +static void cs_etm_decoder__set_sample_flags(
> +                             const void *context,
> +                             const ocsd_generic_trace_elem *elem)
> +{
> +     struct cs_etm_decoder *decoder = (struct cs_etm_decoder *) context;
> +     struct cs_etm_packet *packet;
> +     u32 exc_num;
> +
> +     packet = &decoder->packet_buffer[decoder->tail];
> +
> +     switch (elem->elem_type) {
> +     case OCSD_GEN_TRC_ELEM_TRACE_ON:
> +             packet->flags = PERF_IP_FLAG_BRANCH |
> +                             PERF_IP_FLAG_TRACE_BEGIN;
> +             break;
> +
> +     case OCSD_GEN_TRC_ELEM_NO_SYNC:
> +     case OCSD_GEN_TRC_ELEM_EO_TRACE:
> +             packet->flags = PERF_IP_FLAG_BRANCH |
> +                             PERF_IP_FLAG_TRACE_END;
> +             break;
> +
> +     case OCSD_GEN_TRC_ELEM_INSTR_RANGE:
> +             /*
> +              * Immediate branch instruction without neither link nor
> +              * return flag, it's normal branch instruction within
> +              * the function.
> +              */
> +             if (elem->last_i_type == OCSD_INSTR_BR &&
> +                 elem->last_i_subtype == OCSD_S_INSTR_NONE) {
> +                     packet->flags = PERF_IP_FLAG_BRANCH;
> +
> +                     if (elem->last_instr_cond)
> +                             packet->flags |= PERF_IP_FLAG_CONDITIONAL;
> +             }
> +
> +             /*
> +              * Immediate branch instruction with link (e.g. BL), this is
> +              * branch instruction for function call.
> +              */
> +             if (elem->last_i_type == OCSD_INSTR_BR &&
> +                 elem->last_i_subtype == OCSD_S_INSTR_BR_LINK)
> +                     packet->flags = PERF_IP_FLAG_BRANCH |
> +                                     PERF_IP_FLAG_CALL;
> +
> +             /*
> +              * Indirect branch instruction with subtype of
> +              * OCSD_S_INSTR_V7_IMPLIED_RET, this is explicit hint for
> +              * function return for A32/T32.
> +              */
> +             if (elem->last_i_type == OCSD_INSTR_BR_INDIRECT &&
> +                 elem->last_i_subtype == OCSD_S_INSTR_V7_IMPLIED_RET)
> +                     packet->flags = PERF_IP_FLAG_BRANCH |
> +                                     PERF_IP_FLAG_RETURN;
> +
> +             /*
> +              * Indirect branch instruction without link (e.g. BR), usually
> +              * this is used for function return, especially for functions
> +              * within dynamic link lib.
> +              */
> +             if (elem->last_i_type == OCSD_INSTR_BR_INDIRECT &&
> +                 elem->last_i_subtype == OCSD_S_INSTR_NONE)
> +                     packet->flags = PERF_IP_FLAG_BRANCH |
> +                                     PERF_IP_FLAG_RETURN;
> +
> +             /*
> +              * Indirect branch instruction with link (e.g. BLR), this is
> +              * branch instruction for function call.
> +              */
> +             if (elem->last_i_type == OCSD_INSTR_BR_INDIRECT &&
> +                 elem->last_i_subtype == OCSD_S_INSTR_BR_LINK)
> +                     packet->flags = PERF_IP_FLAG_BRANCH |
> +                                     PERF_IP_FLAG_CALL;
> +
> +             /* Return instruction for function return. */
> +             if (elem->last_i_type == OCSD_INSTR_BR_INDIRECT &&
> +                 elem->last_i_subtype == OCSD_S_INSTR_V8_RET)
> +                     packet->flags = PERF_IP_FLAG_BRANCH |
> +                                     PERF_IP_FLAG_RETURN;

I would swap the last two if() conditions so that the (BRANCH | RETURN) flags
are all in the same place.
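
To show the ordering I mean, here is a self-contained sketch that groups
the three return cases together.  The enums and SK_* names are
hypothetical stand-ins for the OpenCSD and perf definitions, and the bit
positions are my assumption of how the perf flags are laid out, so take
it as an illustration rather than code to apply as-is:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the OpenCSD instruction type and subtype. */
enum sk_instr_type { SK_INSTR_OTHER, SK_INSTR_BR, SK_INSTR_BR_INDIRECT };
enum sk_instr_subtype {
	SK_S_INSTR_NONE,
	SK_S_INSTR_BR_LINK,
	SK_S_INSTR_V8_RET,
	SK_S_INSTR_V7_IMPLIED_RET,
};

/* Assumed bit layout, mirroring the perf sample flags used by the patch. */
#define SK_FLAG_BRANCH		(1u << 0)
#define SK_FLAG_CALL		(1u << 1)
#define SK_FLAG_RETURN		(1u << 2)
#define SK_FLAG_CONDITIONAL	(1u << 3)

/* Same decisions as the INSTR_RANGE case, with all returns in one place. */
static uint32_t sk_range_flags(enum sk_instr_type type,
			       enum sk_instr_subtype subtype, bool cond)
{
	if (type == SK_INSTR_BR) {
		switch (subtype) {
		case SK_S_INSTR_NONE:		/* branch inside a function */
			return SK_FLAG_BRANCH |
			       (cond ? SK_FLAG_CONDITIONAL : 0);
		case SK_S_INSTR_BR_LINK:	/* e.g. BL */
			return SK_FLAG_BRANCH | SK_FLAG_CALL;
		default:
			return 0;
		}
	}

	if (type == SK_INSTR_BR_INDIRECT) {
		switch (subtype) {
		case SK_S_INSTR_BR_LINK:	/* e.g. BLR */
			return SK_FLAG_BRANCH | SK_FLAG_CALL;
		case SK_S_INSTR_NONE:		/* e.g. BR */
		case SK_S_INSTR_V7_IMPLIED_RET:	/* A32/T32 implied return */
		case SK_S_INSTR_V8_RET:		/* RET */
			return SK_FLAG_BRANCH | SK_FLAG_RETURN;
		default:
			return 0;
		}
	}

	return 0;	/* not a branch this sketch knows about */
}
```

A switch also makes it obvious that each type/subtype pair ends up in
exactly one bucket, which the chain of independent if() statements does
not.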

> +
> +             break;
> +
> +     case OCSD_GEN_TRC_ELEM_EXCEPTION:
> +
> +#define OCSD_EXC_RESET                       0
> +#define OCSD_EXC_DEBUG_HALT          1
> +#define OCSD_EXC_CALL                        2
> +#define OCSD_EXC_TRAP                        3
> +#define OCSD_EXC_SYSTEM_ERROR                4
> +#define OCSD_EXC_INST_DEBUG          6
> +#define OCSD_EXC_DATA_DEBUG          7
> +#define OCSD_EXC_ALIGNMENT           10
> +#define OCSD_EXC_INST_FAULT          11
> +#define OCSD_EXC_DATA_FAULT          12
> +#define OCSD_EXC_IRQ                 14
> +#define OCSD_EXC_FIQ                 15

Where did you get the above?  To me this is something that should come from the
library.
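
Wherever the numbers end up living, a named enum plus one switch would
read better than a block of #defines in the middle of a case.  The
sketch below reuses the values from your patch unchanged; the EX_* names
and the flag bit positions are hypothetical stand-ins on my side, not
library or perf definitions:

```c
#include <stdint.h>

/* Exception numbers as listed in the patch; the names are stand-ins. */
enum ex_num {
	EX_RESET	= 0,
	EX_DEBUG_HALT	= 1,
	EX_CALL		= 2,
	EX_TRAP		= 3,
	EX_SYSTEM_ERROR	= 4,
	EX_INST_DEBUG	= 6,
	EX_DATA_DEBUG	= 7,
	EX_ALIGNMENT	= 10,
	EX_INST_FAULT	= 11,
	EX_DATA_FAULT	= 12,
	EX_IRQ		= 14,
	EX_FIQ		= 15,
};

/* Assumed bit layout, mirroring the perf sample flags used by the patch. */
#define EX_FLAG_BRANCH		(1u << 0)
#define EX_FLAG_CALL		(1u << 1)
#define EX_FLAG_SYSCALLRET	(1u << 4)
#define EX_FLAG_ASYNC		(1u << 5)
#define EX_FLAG_INTERRUPT	(1u << 6)

/* Map an exception number to the three groups from the commit message. */
static uint32_t ex_entry_flags(uint32_t exc_num)
{
	switch (exc_num) {
	case EX_RESET:
	case EX_DEBUG_HALT:
	case EX_SYSTEM_ERROR:
	case EX_INST_DEBUG:
	case EX_DATA_DEBUG:
	case EX_IRQ:
	case EX_FIQ:
		/* Triggered by external signals: asynchronous interrupt. */
		return EX_FLAG_BRANCH | EX_FLAG_CALL |
		       EX_FLAG_ASYNC | EX_FLAG_INTERRUPT;
	case EX_CALL:
		/* System call. */
		return EX_FLAG_BRANCH | EX_FLAG_CALL | EX_FLAG_SYSCALLRET;
	case EX_TRAP:
	case EX_ALIGNMENT:
	case EX_INST_FAULT:
	case EX_DATA_FAULT:
		/* Synchronous trap or fault. */
		return EX_FLAG_BRANCH | EX_FLAG_CALL | EX_FLAG_INTERRUPT;
	default:
		/* Unlisted numbers leave the flags untouched in the patch. */
		return 0;
	}
}
```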

> +
> +             exc_num = decoder->exc_num[packet->cpu];
> +
> +             /*
> +              * The exceptions are triggered by external signals
> +              * from bus, interrupt controller, debug module,
> +              * PE reset or halt.
> +              */
> +             if (exc_num == OCSD_EXC_RESET ||
> +                 exc_num == OCSD_EXC_DEBUG_HALT ||
> +                 exc_num == OCSD_EXC_SYSTEM_ERROR ||
> +                 exc_num == OCSD_EXC_INST_DEBUG ||
> +                 exc_num == OCSD_EXC_DATA_DEBUG ||
> +                 exc_num == OCSD_EXC_IRQ ||
> +                 exc_num == OCSD_EXC_FIQ)
> +                     packet->flags = PERF_IP_FLAG_BRANCH |
> +                                     PERF_IP_FLAG_CALL |
> +                                     PERF_IP_FLAG_ASYNC |
> +                                     PERF_IP_FLAG_INTERRUPT;
> +
> +             /* The exception is for system call. */
> +             if (exc_num == OCSD_EXC_CALL)
> +                     packet->flags = PERF_IP_FLAG_BRANCH |
> +                                     PERF_IP_FLAG_CALL |
> +                                     PERF_IP_FLAG_SYSCALLRET;
> +
> +             /*
> +              * The exception is introduced by trap, instruction &
> +              * data fault or alignment errors.
> +              */
> +             if (exc_num == OCSD_EXC_TRAP ||
> +                 exc_num == OCSD_EXC_ALIGNMENT ||
> +                 exc_num == OCSD_EXC_INST_FAULT ||
> +                 exc_num == OCSD_EXC_DATA_FAULT)
> +                     packet->flags = PERF_IP_FLAG_BRANCH |
> +                                     PERF_IP_FLAG_CALL |
> +                                     PERF_IP_FLAG_INTERRUPT;
> +
> +             break;
> +
> +     case OCSD_GEN_TRC_ELEM_EXCEPTION_RET:
> +
> +             exc_num = decoder->exc_num[packet->cpu];
> +
> +             if (exc_num == OCSD_EXC_CALL)
> +                     packet->flags = PERF_IP_FLAG_BRANCH |
> +                                     PERF_IP_FLAG_RETURN |
> +                                     PERF_IP_FLAG_SYSCALLRET;
> +             else
> +                     packet->flags = PERF_IP_FLAG_BRANCH |
> +                                     PERF_IP_FLAG_RETURN |
> +                                     PERF_IP_FLAG_INTERRUPT;
> +
> +             break;
> +
> +     case OCSD_GEN_TRC_ELEM_UNKNOWN:
> +     case OCSD_GEN_TRC_ELEM_PE_CONTEXT:
> +     case OCSD_GEN_TRC_ELEM_ADDR_NACC:
> +     case OCSD_GEN_TRC_ELEM_TIMESTAMP:
> +     case OCSD_GEN_TRC_ELEM_CYCLE_COUNT:
> +     case OCSD_GEN_TRC_ELEM_ADDR_UNKNOWN:
> +     case OCSD_GEN_TRC_ELEM_EVENT:
> +     case OCSD_GEN_TRC_ELEM_SWTRACE:
> +     case OCSD_GEN_TRC_ELEM_CUSTOM:
> +     default:
> +             break;
> +     }
> +}
> +
>  static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
>                               const void *context,
>                               const ocsd_trc_index_t indx __maybe_unused,
> @@ -484,6 +650,8 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
>               break;
>       }
>  
> +     cs_etm_decoder__set_sample_flags(context, elem);
> +

I was toying with the idea of setting the flags in each of the case statements
found in cs_etm_decoder__gen_trace_elem_printer().  But that would move more
code around and the end result would be the same, so let's keep it that way
until we have a good reason to split it.

Mathieu

>       return resp;
>  }
>  
> diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
> index 0d1c18d..71df908 100644
> --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
> +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
> @@ -47,6 +47,7 @@ struct cs_etm_packet {
>       u8 last_instr_taken_branch;
>       u8 last_instr_size;
>       int cpu;
> +     u32 flags;
>  };
>  
>  struct cs_etm_queue;
> -- 
> 2.7.4
> 
