Hi, Pierrick,

October 23, 2024 at 5:16 PM, "Pierrick Bouvier" wrote:
> 
> Hi Julian,
> 
> On 10/23/24 05:56, Julian Ganz wrote:
> 
> >  October 22, 2024 at 11:15 PM, "Pierrick Bouvier" wrote:
> > 
> > > 
> > > On 10/22/24 01:21, Julian Ganz wrote:
> > > 
> > 
> >  Ok, I'll introduce an enum and combine the three callbacks in the next
> >  iteration then.
> >  typedef struct {
> >      enum qemu_plugin_cf_event_type ev;
> >      union {
> >          data_for_interrupt interrupt;
> >          data_for_trap trap;
> >          data_for_semihosting semihosting;
> >      };
> >  } qemu_plugin_cf_event;
> >  /* data_for_... could contain things like from/to addresses, interrupt id, ... */
> > 
> >  I don't think this is a good idea.
> >  Traps are just too diverse, imo there is too little overlap between
> >  different architectures, with the sole exception maybe being the PC
> >  prior to the trap. "Interrupt id" sounds like a reasonably common
> >  concept, but then you would need to define a mapping for each and every
> >  architecture. What integer type do you use? In RISC-V, for example,
> >  exceptions and interrupt "ids" are differentiated via the most
> >  significant bit. Do you keep that or do you zero it? And then there's
> >  ring/privilege mode, cause (sometimes for each mode), ...
> > 
> > > 
> > > I didn't want to open the per-architecture Pandora's box :).
> > >  I don't think the plugin API itself should deal with per architecture
> > >  details like meaning of a given id. I was just thinking to push this 
> > > "raw" information to the plugin, that may/may not use architecture 
> > > specific knowledge to do its work. We already have plugins that have 
> > > similar per architecture knowledge (contrib/plugins/howvec.c) and it's ok 
> > > in some specific cases.
> > > 
> >  But how would such an interface look? The last PC aside, what would you
> >  include, and how? A GArray with named items that are themselves just
> >  opaque blobs?
> > 
> I was not thinking about a new interface for this. Having the "raw" interrupt 
> id is enough for a plugin to do useful things, by having knowledge of which 
> architecture it's instrumenting.

But what would the "raw" interrupt id even be for a given
architecture? I don't think you can answer this question with "obviously
this _one_ integer" for all of them.
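
To make the RISC-V case concrete, here is a rough sketch of what a plugin
would have to do with such a "raw" value. The type and function names are
made up for illustration and are not part of any existing API; the only
architectural fact assumed is that the most significant bit of the cause
register (bit XLEN-1) flags an interrupt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type, not part of the QEMU plugin API. */
typedef struct {
    bool is_interrupt; /* most significant bit of the cause value */
    uint64_t code;     /* remaining bits: interrupt or exception code */
} riscv_cause;

/*
 * Split a raw RISC-V cause value. The position of the interrupt flag
 * depends on XLEN (32 or 64), so the caller has to know the target.
 */
static riscv_cause riscv_split_cause(uint64_t raw, unsigned xlen)
{
    uint64_t msb = UINT64_C(1) << (xlen - 1);
    riscv_cause c = {
        .is_interrupt = (raw & msb) != 0,
        .code = raw & (msb - 1),
    };
    return c;
}
```

The same integer 0x80000007 decodes as "interrupt, code 7" with xlen 32
but as a plain (non-interrupt) value with xlen 64, which is exactly the
kind of ambiguity I mean.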

> > 
> > And what would be the benefit compared to just querying the respective
> >  target specific registers through qemu_plugin_read_register? Which btw.
> >  is what we were going to do for our use-case. Even the example you
> >  brought up (howvec) uses querying functions and doesn't expect to get
> >  all the info via parameters.
> > 
> You're right, but it's because it's querying instruction data.
> I may be wrong on that, but at translation time, we may or may not be 
> interested in accessing tb/insn data.
> 
> However, for control flow analysis, beyond a simple counting plugin, we 
> probably want to access further data almost every time.
> 
> I see it as closer to syscall instrumentation, which pushes the syscall id 
> and all register values instead of letting the user poke for them. Does it 
> make more sense compared to that?

Yes, but then you are in "GArray of named, potentially complex values"
territory again. And the comparison with syscalls also falls apart when
you consider that syscalls are well defined and enumerated identically
for at least a variety of targets, while the equivalent "enumeration"
for traps, if it even exists, is in a completely different order for
every architecture.

> > 
> > > 
> > > But having something like from/to address seems useful to start. Even if 
> > > we don't provide it for all events yet, it's ok.
> > > 
> >  Yes, I certainly see the advantages of having either the last PC or the
> >  would-be-next PC as they are sufficiently universal. You can usually
> >  retrieve them from target-specific registers, but that may be more
> >  complicated in practice. In the case of RISC-V for example, the value
> >  of the EPC differs between interrupts and exceptions.
> > 
> Unlike an interrupt id, a PC is something universal by definition, with a 
> single meaning across architectures. However, accessing it by name varies 
> per architecture, and even per sub-event, as you are stating for 
> RISC-V.

Yes. And for that very reason I would not pass "the EPC" to a callback
but a clearly defined, target-agnostic value such as:

| The PC of the instruction that would have been executed next, were it
| not for that event

or

| The PC of the instruction that was executed before the event occurred

And unlike interrupt ids, the plugin API already has a precedent for
what type to use: uint64_t
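
As a rough sketch of what I have in mind (all names below are hypothetical
and only meant for illustration, this is not an existing or proposed API),
a callback receiving such target-agnostic PCs could look like this. It
passes both values, but either one could be dropped:

```c
#include <stdint.h>

/* Hypothetical event kinds, mirroring the three cases discussed here. */
enum cf_event_type {
    CF_EVENT_INTERRUPT,
    CF_EVENT_TRAP,
    CF_EVENT_SEMIHOSTING,
    CF_EVENT_TYPE_COUNT,
};

/*
 * Hypothetical callback type:
 *   from_pc: PC of the instruction executed before the event occurred
 *   next_pc: PC of the instruction that would have been executed next,
 *            were it not for the event
 */
typedef void (*cf_event_cb_t)(unsigned int vcpu_index,
                              enum cf_event_type type,
                              uint64_t from_pc,
                              uint64_t next_pc);

/* Example consumer: count events per type and remember the last from_pc. */
static uint64_t event_count[CF_EVENT_TYPE_COUNT];
static uint64_t last_from_pc;

static void count_cf_event(unsigned int vcpu_index,
                           enum cf_event_type type,
                           uint64_t from_pc,
                           uint64_t next_pc)
{
    (void)vcpu_index;
    (void)next_pc;
    event_count[type]++;
    last_from_pc = from_pc;
}
```

Anything beyond that (cause, privilege mode, ...) would still be queried
by the plugin through target-specific registers.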

Regards,
Julian
