On Fri, 2016-10-07 at 12:38 +0200, Bernd Schmidt wrote: > On 10/06/2016 09:53 PM, David Malcolm wrote: > > A benefit of keeping the INSN_UIDs is that if you've spent half an > > hour > > single-stepping through RTL modification and know that INSN_UID > > 1045 is > > the one that gets corrupted, you can have that insn have UID 1045 > > in > > the testcase. Otherwise the UIDs get determined by the parser, and > > can > > change if other insns get inserted. > > That's true, but you wouldn't construct a testcase until you're done > debugging, and this reason is lost for any testcase that has been > committed to the repository for more than say a week. I think we want > a > format that is lean and abstracts away unnecessary details. > > > > Here's a revised version of that example > > (gcc/testsuite/rtl.dg/aarch64/pr71779.rtl), with the things you > > mention > > removed, and with some indentation to aid readability: > > > > ;; { dg-do compile { target aarch64-*-* } } > > ;; { dg-options "-fsingle-pass=rtl-cse1 -fdump-rtl-cse1" } > > > > (function "fragment" > > (insn-chain > > ;; MEM[(struct isl_obj *)&obj1] = &isl_obj_map_vtable; > > (insn 2 (set (reg:SI 480) > > (high:SI (symbol_ref:SI ("isl_obj_map_vtable") > > [flags 0xc0] <var_decl > > isl_obj_map_vtable>))) > > y.c:12702) > > (insn 2 (set (reg/f:SI 479) > > (lo_sum:SI (reg:SI 480) > > (symbol_ref:SI ("isl_obj_map_vtable") > > [flags 0xc0] > > <var_decl 0x7fa0363ea240 > > isl_obj_map_vtable>))) > > y.c:12702 > > (expr_list:REG_EQUAL (symbol_ref:SI ("isl_obj_map_vtable") > > [flags 0xc0] > > <var_decl isl_obj_map_vtable>))) > > (insn 2 (set (reg:DI 481) > > (subreg:DI (reg/f:SI 479) 0)) y.c:12702) > > (insn 2 (set (zero_extract:DI (reg/v:DI 191 [ obj1D.17368 ]) > > (const_int 32 [0x20]) > > (const_int 0 [0])) > > (reg:DI 481)) y.c:12702) > > ;; Extra insn, to avoid all of the above from being deleted by > > DCE > > (insn 2 (set (mem:DI (reg:DI 191) [1 i+0 S4 A32]) > > (const_int 1 [0x1]))) > > > > ) ;; insn-chain > > ) ;; function > > > > This is with the optional cfg and crtl directives omitted. > > > > Does the above look acceptable? > > One thing this doesn't show is CODE_LABELs, which will need some sort > of > unique identifier.
Note that all of my examples are hand-edited, so there may be mistakes, inconsistencies, etc. We could simply print the INSN_UID for CODE_LABELs; something like this (see the "(code_label 16" below): (jump_insn (set (pc) (if_then_else (ge (reg:CCGC 17 flags) (const_int 0 [0])) (label_ref 16) (pc))) "test.c":3 -> 16) (note [bb 4] NOTE_INSN_BASIC_BLOCK) (insn (set (reg:SI 90) (mem/c:SI (plus:DI (reg/f:DI virtual-stack-vars) (const_int -12 [0xfffffffffffffff4])) [1 k+0 S4 A32])) "test.c":4 ) (insn (parallel [ (set (reg:SI 87 [ _1 ]) (plus:SI (reg:SI 90) (const_int 4 [0x4]))) (clobber (reg:CC 17 flags)) ]) "test.c":4 (expr_list:REG_EQUAL (plus:SI (mem/c:SI (plus:DI (reg/f:DI virtual-stack-vars) (const_int -12 [0xfffffffffffffff4])) [1 k+0 S4 A32]) (const_int 4 [0x4])) )) (jump_insn (set (pc) (label_ref 20)) "test.c":4 -> 20) (barrier) (code_label 16 [1 uses]) (note [bb 5] NOTE_INSN_BASIC_BLOCK) Should we drop the "[1 uses]" in the code_label? > There's still the 0x7fa0363ea240 for the var-decl. You want the > dump-noaddr flag. (indeed; I'm hand-editing the examples) > > The "2" values after each "insn" are the basic block indices. > > Should > > these be omitted also? > > I think they probably don't buy us much if we have block notes. Unfortunately, the (code_label 16) above is in basic block *5*, not basic_block 4 i.e. as you traverse the insn chaing the change of basic block can happen before the NOTE_INSN_BASIC_BLOCK. There is a barrier and a code_label, but even so, this seems rather counter-intuitive to me. You appear to have trimmed the idea of enclosing the insns with (basic-block) directives without commenting on it. Did you like this idea? It would make the above look like: (basic-block 2 ;; insns snipped (jump_insn (set (pc) (if_then_else (ge (reg:CCGC 17 flags) (const_int 0 [0])) (label_ref 16) (pc))) "test.c":3 -> 16) ) ;; basic-block 2 (basic-block 4 (note [bb 4] NOTE_INSN_BASIC_BLOCK) ;; insns snipped (jump_insn (set (pc) (label_ref 20)) "test.c":4 -> 20) ) ;; basic-block 4 (barrier) (basic-block 5 (code_label 16 [1 uses]) (note [bb 5] NOTE_INSN_BASIC_BLOCK) ;; etc ) ;; basic-block 5 Note how the above format expresses clearly that: * the (barrier) is part of the insn chain, but not in a basic block, and * some insns can happen in a basic block Taking this idea further: if we have (basic-block) directives integrated into the insn-chain like this, we could express the CFG by adding (edge) directives. Here's a suggestion for doing it with (edge-from) and (edge-to) directives, expressing the predecessor and successor edges in the CFG, along with : (basic-block 2 (edge-from 0) ;; insns snipped (jump_insn (set (pc) (if_then_else (ge (reg:CCGC 17 flags) (const_int 0 [0])) (label_ref 16) (pc))) "test.c":3 -> 16) (edge-to 4 (flags "FALLTHRU")) (edge-to 5 (flags "")) ) ;; basic-block 2 (basic-block 4 (edge-from 2 (flags "FALLTHRU")) (note [bb 4] NOTE_INSN_BASIC_BLOCK) ;; some insns snipped (jump_insn (set (pc) (label_ref 20)) "test.c":4 -> 20) (edge-to 6 (flags "")) ) ;; basic-block 4 (barrier) (basic-block 5 (edge-from 2 (flags "")) (code_label 16 [1 uses]) (note [bb 5] NOTE_INSN_BASIC_BLOCK) ;; some insns snipped (edge-to 6 (flags "")) ) ;; basic-block 5 Do you like this approach? It could also support insns that are on edges (though I don't know if that's something we need to support in dumps). Should we spell "0" and "1" as "entry" and "exit" when parsing/dumping basic block indices? e.g.: (basic-block 2 (edge-from entry) > What about file names/line numbers? We'll probably want some tests > for > these, but for most others they will be irrelevant. Yeah. We cover this below. > I think maybe you want a separate compact form of insns and notes > (cinsn/cnote maybe), with a flag selecting which gets written out in > the > dumper. The reader could then read them in like any other rtx code, > and > there'd be a postprocessing stage to reconstruct the insn chain. By this separate compact form, do you mean the form we've been discussing above, with no INSN_UID/PREV/NEXT, etc? Or something else? As for "cinsn", "cnote", how about just "insn" and "note", and having the compactness be expressed at the top of the dump e.g. implicitly by the presence of a "(function" directive. Could even have a format version specifier in the function directive, to give us some future -proofing e.g. (function (format 20161007) or somesuch. > > My hope here is to that the input format is something that existing > > gcc > > developers will be comfortable reading, which is why I opted for > > parsing the existing output - and I'm nervous about moving away too > > much from the existing format. > > Hey, I'm as existing a gcc developer as they come, and I don't much > like > staring at unpruned rtl dumps. They've gotten worse over the years > too. Fair enough :) Do you want to want to try hand-edited a test case, using some of the format ideas we've been discussing? That might suggest further improvements to the format. > > Some other possible changes: locations could do with a wrapper, > > and/or > > to quote the filename, since otherwise we can't cope with spaces in > > filenames). So instead of: > > > > (insn 2 (set (reg:DI 481) > > (subreg:DI (reg/f:SI 479) > > 0)) y.c:12702) > > > > we could have: > > > > (insn 2 (set (reg:DI 481) > > (subreg:DI (reg/f:SI 479) 0)) "y.c":12702) > > > > or: > > > > (insn 2 (set (reg:DI 481) > > (subreg:DI (reg/f:SI 479) > > 0)) (srcloc "y.c" 12702)) > > > > Any preferences on these? > > I suspect just putting it in quotes is good enough. Might be worth > doing > that in general. Good. Let's quote the filenames. > > > There was also a testcase where virtual regs still occurred with > > > register number, I think it would be best to disallow this and > > > add > > > the > > > ability to parse "(reg:DI virtual-outgoing-args)". > > > > When we discussing pseudos you expressed a need to keep printing > > the > > regno for REGs, so that the dump format is usable in the debugger, > > for > > array indices etc. Hence I'd prefer to keep printing the regno for > > all > > regnos, including virtuals. > > This comes back to different needs for long-lived testcases vs. > debugging something. As suggested above, I think you need a flag for > different output formats, and for vregs the ability to parse "(reg:M > virtual-something)". OK. If so, do we need to print the regno for hard registers? Or should we just print the name for those, for consistency with virtual regs? (i.e. have it be controlled by the same flag?) > > > Also I think I'd prefer testcases submitted separately from the > > > RTL > > > frontend. It seems we have two different frontend approaches > > > here, > > > one > > > RTL frontend and one modification for the C frontend - which do > > > you > > > prefer? > > > > Is it acceptable to do both? I'd like to have an actual RTL > > frontend, > > though this punts on handling structs etc, whereas Richi seems to > > prefer the modification to the C frontend, since it better supports > > structs, aliasing, etc. At this point, whichever gets me past > > patch > > review... > > Does the C one have an advantage in terms of meaningful decls in > MEM_ATTRs/SYMBOL_REFs? Probably. Thanks Dave