On Wed, Feb 28, 2024 at 4:14 PM David Malcolm <dmalc...@redhat.com> wrote:
>
> On Wed, 2024-02-28 at 08:58 +0100, Richard Biener wrote:
> > On Tue, Feb 27, 2024 at 10:20 PM Robert Dubner <rdub...@symas.com>
> > wrote:
> > >
> > > Richard,
> > >
> > > Thank you very much for your comments.
> > >
> > > When I set out to create the capability, I had a "specification" in
> > > mind.
> > >
> > > I didn't have a clue how to create a GENERIC tree that could be fed
> > > to the
> > > middle end in a way that would successfully result in an
> > > executable.  And I
> > > needed to be able to do that in order to proceed with the project
> > > of
> > > creating a COBOL front end.
> > >
> > > So, I came up with the idea of using GCC to compile simple
> > > programs, and to
> > > hook into the compiler to examine the trees fed to the middle end,
> > > and to
> > > display those trees in the human-readable format I needed to
> > > understand
> > > them.  And that's what I did.
> > >
> > > My first incarnation generated pure text files, and I used that to
> > > get
> > > going.
> > >
> > > After a while I realized that when I used the output file, I was
> > > spending a
> > > lot of time searching through the text files.  And I had the
> > > brainstorm!
> > > Hyperlinks!  HTML files!  We have the technology!  So, I created
> > > the .HTML
> > > files as well.
> > >
> > > I found this useful to the point of necessity in order to learn how
> > > to
> > > generate the GENERIC trees.  I believe it would be equally useful
> > > to the
> > > next developer who, for whatever reason, needs to understand, on a
> > > "You need
> > > to learn the alphabet before you can learn how to read" level, what
> > > the
> > > middle end requires from a GENERIC tree generated by a front end.
> > >
> > > But I've never used it on a complex program. I've used it only to
> > > learn how
> > > to create the GENERIC nodes for very particular things, and so I
> > > would use
> > > the -fdump-generic-nodes feature on a very simple C program that
> > > demonstrated, in isolation, the feature I needed.  Once I figured
> > > it out, I
> > > would create front end C routines or macros that used the
> > > tree.h/tree.cc
> > > features to build those GENERIC trees, and then I would move on.
> > >
> > > I decided to offer it up here, in order to learn how to create
> > > patches
> > > and to get
> > > to know the people and the process, as well as from the desire to
> > > share it.
> > > And instantly I got the "How about a machine-readable format?"
> > > comments.
> > > Which are reasonable.  So, because it wasn't hard, I hacked at the
> > > existing
> > > code to create a JSON output.  (But I remind you that up until now,
> > > nobody
> > > seems to have needed a JSON representation.)
> > >
> > > And your observation that the human readable representation could
> > > be made
> > > from the JSON representation is totally accurate.
> > >
> > > But that wasn't my specification.  My specification was "A tool so
> > > that a
> > > human being can examine a simple GENERIC tree to learn how it's
> > > done."
> > >
> > > But it seems to me that we are now moving into the realm of a new
> > > specification.
> > >
> > > Said another way:  To go from "A human readable representation of a
> > > simple
> > > GENERIC tree" to "A machine readable JSON representation of an
> > > arbitrarily
> > > complex GENERIC tree, from which a human readable representation
> > > can be
> > > created" means, in effect, starting over on a different project
> > > that I don't
> > > need.  I already *have* a project that I am working on -- the COBOL
> > > front
> > > end.
> > >
> > > The complexity of GENERIC trees is, in my experienced opinion, an
> > > obstacle
> > > for the creation of front ends.  The GCC Internals document has a
> > > lot of
> > > information, but to go from it to a front end is like using the
> > > maintenance
> > > manual for an F16 fighter to try to learn to fly the aircraft.
> > >
> > > The program "main(){}" generates a tree with over seventy nodes.  I
> > > see no
> > > way to document why that's true; it's all arbitrary in the sense
> > > that "this
> > > is how GCC works".  -fdump-generic-nodes made it possible for me to
> > > figure
> > > out how those nodes are connected and, thus, how to create a new
> > > front end.
> > > I figure that other developers might find it useful, as well.
> > >
> > > I guess I am saying that I am not, at this time, able to work on a
> > > whole
> > > different tool.  I think what I have done so far does something
> > > useful that
> > > doesn't seem to otherwise exist in GCC.
> > >
> > > I suppose the question for you is, "Is it useful enough?"
> > >
> > > I won't be offended if the answer is "No" and I hope you won't be
> > > offended
> > > by my not having the bandwidth to address your very thoughtful and
> > > valid
> > > observations about how it could be better.
> >
> > No offense taken - I did realize how useful this was to you (and
> > specifically
> > the hyper-linking looked very useful even to me!).  I often lament
> > the lack
> > of domain-specific visualization tools for the various data
> > structures GCC
> > has - having something for GENERIC would be very welcome.
> >
> > We have for example ways to dump graphviz .dot format graphs of the
> > CFG
> > and some other data structures and do that natively, not via JSON
> > indirection.
>
> FWIW for GCC 15 I've been experimenting with adding a
> text_art::tree_widget class; with that, the analyzer can visualize an
> ana::program_state instance like this (potentially with colorization in
> suitable terminals):
>
> State
> ├─ Region Model
> │  ├─ Current Frame: frame: ‘test_7’@1
> │  ├─ Store
> │  │  ╰─ root region
> │  │     ╰─ (*INIT_VAL(a_10(D)))
> │  │        ╰─ bytes 12-15: ‘int’ {(int)42}
> │  ╰─ Constraints
> │     ├─ Equivalence class ec0
> │     │  ├─ (void *)0B
> │     │  ╰─ ‘0B’
> │     ├─ Equivalence class ec1
> │     │  ╰─ INIT_VAL(a_10(D))
> │     ├─ Equivalence class ec2
> │     │  ╰─ (INIT_VAL(a_10(D))+(sizetype)12)
> │     ├─ ec1: {INIT_VAL(a_10(D))} != ec0: {(void *)0B == [m_constant]‘0B’}
> │     ╰─ ec2: {(INIT_VAL(a_10(D))+(sizetype)12)} != ec0: {(void *)0B == [m_constant]‘0B’}
> ╰─ ‘malloc’ state machine
>    ╰─ 0x62082e0: (INIT_VAL(a_10(D))+(sizetype)12): assumed-non-null (in frame: ‘test_7’@1)
>
> and such visualizations could be added for other hierarchical data
> structures.  Also, because it uses text_art::widget, the content at a
> tree node doesn't have to be purely textual, and we could do things
> like the following (which is a mockup):
>
> State
> ├─ Region Model           Bound value │ Effective value
> │  ├─ Stack                           │
> │  │  ├─ frame@0 'foo'                │
> │  │  │  ├─ 'i'           (int)42     │
> │  │  │  ╰─ 'j'                       │ UNINIT(int)
> │  │  ╰─ frame@1 'bar'                │
> │  │     ╰─ 'k'                       │
> │  │        ╰─ [3]                    │
> │  │           ├─ .x      INIT_VAL(p) │
> │  │           ╰─ .y      INIT_VAL(q) │
> │  ╰─ Globals                         │
> │     ╰─ 'baz'                        │
> ├─ Constraints
> │  ╰─ (etc)
> ╰─ 'malloc' state
>    ╰─ CONJURED_VALUE('ptr') unchecked('free')
>
>
> That said, our "tree" structure is arguably a directed graph rather
> than a tree (consider e.g. types)
>
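To illustrate the tree-versus-graph point: below is a minimal sketch of
a box-drawing tree renderer that survives shared and self-referencing
nodes by expanding each node only once.  It does not use text_art or
any GCC API; `toy_node`, `render` and `render_tree` are hypothetical
names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a node; this is NOT GCC's `tree` type.
struct toy_node
{
  std::string label;
  std::vector<const toy_node *> children;
};

// Append the children of N to OUT, one line each.  A node that was
// already printed is mentioned again but not expanded, so shared
// nodes and cycles cannot cause unbounded recursion.
static void
render (const toy_node &n, const std::string &prefix,
        std::unordered_set<const toy_node *> &seen, std::string &out)
{
  for (std::size_t i = 0; i < n.children.size (); ++i)
    {
      const toy_node *child = n.children[i];
      assert (child != nullptr);
      bool last = (i + 1 == n.children.size ());
      out += prefix + (last ? "╰─ " : "├─ ") + child->label;
      if (!seen.insert (child).second)
        {
          out += " (shown above)\n";
          continue;
        }
      out += "\n";
      render (*child, prefix + (last ? "   " : "│  "), seen, out);
    }
}

// Render the graph reachable from ROOT as an indented tree.
std::string
render_tree (const toy_node &root)
{
  std::unordered_set<const toy_node *> seen {&root};
  std::string out = root.label + "\n";
  render (root, "", seen, out);
  return out;
}
```

A type whose canonical type is itself then shows up once in full and
afterwards only as a "(shown above)" stub.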
> >
> > Incidentally this looks like something fit for a Google Summer of
> > Code project.
> > Ideally it would hook into print-tree.cc providing an alternate
> > structured output.
> > It currently prints in the style
> >
> >  <function_decl 0x7ffff71bc600 bswap16
> >     type <function_type 0x7ffff71ba5e8
> >         type <integer_type 0x7ffff702b540 short unsigned int public unsigned HI
> >             size <integer_cst 0x7ffff702d108 constant 16>
> >             unit-size <integer_cst 0x7ffff702d120 constant 2>
> >             align:16 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff702b540 precision:16 min <integer_cst 0x7ffff702d138 0> max <integer_cst 0x7ffff702d0f0 65535>>
> >         QI
> >         size <integer_cst 0x7ffff702d048 constant 8>
> >         unit-size <integer_cst 0x7ffff702d060 constant 1>
> > ...
> >
> > where you can see it follows tree -> tree edges up to some depth
> > (and avoids repeated expansion).  When debugging that's all I have
> > and I have to follow edges by matching up the raw addresses printed,
> > re-dumping those that didn't get expanded.  HTML would be indeed
> > so much nicer here (and a more complete output).
> >
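As a rough sketch of what the hyperlinked form could look like: every
reachable node is emitted exactly once as an anchored <div>, and every
edge becomes a link to the target's anchor, which replaces matching up
raw addresses by hand.  This does not touch print-tree.cc; `toy_node`,
`html_escape` and `dump_html` are hypothetical names, and the node
layout is invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GENERIC node; this is NOT GCC's `tree` type.
struct toy_node
{
  std::string code;  // e.g. "function_decl"
  std::string name;  // may be empty
  // Outgoing edges: (field name, target node).
  std::vector<std::pair<std::string, const toy_node *>> edges;
};

// Escape the characters that are special in HTML text.
static std::string
html_escape (const std::string &s)
{
  std::string out;
  for (char c : s)
    switch (c)
      {
      case '<': out += "&lt;"; break;
      case '>': out += "&gt;"; break;
      case '&': out += "&amp;"; break;
      default: out += c; break;
      }
  return out;
}

// Emit one <div> per reachable node, exactly once, with every edge
// rendered as a link to the target's anchor.
std::string
dump_html (const toy_node *root)
{
  assert (root != nullptr);
  std::unordered_map<const toy_node *, std::size_t> ids;
  std::vector<const toy_node *> order;  // doubles as the worklist

  // Return the ID of N, assigning the next free one on first sight.
  auto id_for = [&] (const toy_node *n) -> std::size_t {
    auto it = ids.find (n);
    if (it != ids.end ())
      return it->second;
    std::size_t id = order.size ();
    ids.emplace (n, id);
    order.push_back (n);
    return id;
  };

  id_for (root);
  std::ostringstream out;
  for (std::size_t i = 0; i < order.size (); ++i)
    {
      const toy_node *n = order[i];  // copy: `order` may grow below
      out << "<div id=\"n" << i << "\"><b>" << html_escape (n->code)
          << "</b>";
      if (!n->name.empty ())
        out << " " << html_escape (n->name);
      for (const auto &e : n->edges)
        {
          if (e.second == nullptr)
            continue;  // skip unset fields
          std::size_t target = id_for (e.second);
          out << " " << html_escape (e.first) << ": <a href=\"#n"
              << target << "\">n" << target << "</a>";
        }
      out << "</div>\n";
    }
  return out.str ();
}
```

Because IDs are assigned in discovery order, the output is stable
between runs, unlike raw pointer values.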
> > From a maintenance point of view I think it's important to have "dump a tree
> > node"
> > once, so when fields are added or deemed useful for presenting in a
> > dump
> > you don't have to chase down more than one place.  Maintenance is
> > also
> > the reason to not simply accept your contribution as-is.
>
> Presumably we'd want some kind of "visitor" code that captures visiting
> the fields of each type in one place, allowing for different consumers
> of this data (HTML generation, JSON generation etc).  Though I'm not
> sure how best to express this double-dispatch (by templates, vfunc, or
> whatnot).
>
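One possible shape for such a visitor, purely as a sketch: a single
function knows which fields each kind of node has, and consumers
implement a small virtual interface.  Virtual functions are chosen here
only for illustration; templates would work too.  Everything below
(`kind`, `toy_node`, `field_sink`, `visit_fields`, the two sinks) is
hypothetical and unrelated to GCC's real tree layout, and references
are expanded recursively with no cycle handling.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical node kinds and layout; NOT GCC's real tree codes or fields.
enum class kind { integer_cst, integer_type };

struct toy_node
{
  kind k;
  long value;            // used by integer_cst
  int precision;         // used by integer_type
  const toy_node *size;  // used by integer_type
};

// Consumer interface: one virtual function per category of field.
struct field_sink
{
  virtual ~field_sink () = default;
  virtual void on_int (const char *name, long v) = 0;
  virtual void on_ref (const char *name, const toy_node &target) = 0;
};

// The single place that knows which fields each kind of node has.
void
visit_fields (const toy_node &n, field_sink &sink)
{
  switch (n.k)
    {
    case kind::integer_cst:
      sink.on_int ("value", n.value);
      break;
    case kind::integer_type:
      sink.on_int ("precision", n.precision);
      assert (n.size != nullptr);  // toy invariant: every type has a size
      sink.on_ref ("size", *n.size);
      break;
    }
}

// Consumer 1: compact text, loosely in the style of "name:value".
class text_sink : public field_sink
{
public:
  void on_int (const char *name, long v) override
  {
    sep ();
    m_out << name << ":" << v;
  }
  void on_ref (const char *name, const toy_node &target) override
  {
    sep ();
    m_out << name << " <";
    bool saved = m_first;
    m_first = true;  // the nested node starts a fresh field list
    visit_fields (target, *this);
    m_first = saved;
    m_out << ">";
  }
  std::string str () const { return m_out.str (); }
private:
  void sep () { if (!m_first) m_out << " "; m_first = false; }
  std::ostringstream m_out;
  bool m_first = true;
};

// Consumer 2: JSON objects, with references as nested objects.
class json_sink : public field_sink
{
public:
  void on_int (const char *name, long v) override
  {
    key (name);
    m_out << v;
  }
  void on_ref (const char *name, const toy_node &target) override
  {
    key (name);
    m_out << "{";
    bool saved = m_first;
    m_first = true;  // the nested object starts a fresh key list
    visit_fields (target, *this);
    m_first = saved;
    m_out << "}";
  }
  std::string str () const { return "{" + m_out.str () + "}"; }
private:
  void key (const char *name)
  {
    if (!m_first)
      m_out << ", ";
    m_first = false;
    m_out << "\"" << name << "\": ";
  }
  std::ostringstream m_out;
  bool m_first = true;
};
```

Adding a field then means touching only visit_fields; both output
formats pick it up without further changes.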
> We already have some of this in the form of gengtype and what it
> generates, but I suspect we don't want to add further reliance on
> gengtype.
>
> >
> > I do hope this eventually gets picked up.  I've added a project idea
> > to https://gcc.gnu.org/wiki/SummerOfCode
> > and would be willing to mentor it.
>
> FWIW you added it to the "Other projects and project ideas" below
> "Other Project Ideas", where it's much less likely to get noticed than
> under the "Selected Project Ideas".

Yeah, I didn't feel it was "Selected", but maybe that doesn't mean
anything?

> >
> > Oh, and I'm looking forward to the actual Cobol work!
>
> Likewise
>
> [...snip...]
>
> Dave
>
