On Thu, May 12, 2016 at 4:29 PM, David Malcolm <dmalc...@redhat.com> wrote: > On Wed, 2016-05-11 at 11:57 +0200, Richard Biener wrote: >> On Wed, May 11, 2016 at 3:31 AM, Trevor Saunders < >> tbsau...@tbsaunde.org> wrote: >> > On Tue, May 10, 2016 at 05:01:00PM -0400, David Malcolm wrote: >> > > [CCing Prasad since this may be useful for his gimple FE work, by >> > > replacing "rtl" with "gimple" in the patch] >> > > >> > > On Mon, 2016-05-09 at 11:44 +0200, Richard Biener wrote: >> > > > On Wed, May 4, 2016 at 10:49 PM, David Malcolm < >> > > > dmalc...@redhat.com> >> > > > wrote: >> > > >> > > > > This patch kit introduces an RTL frontend, for the purpose >> > > > > of unit testing: primarly for unit testing of RTL passes, and >> > > > > possibly for unit testing of .md files. >> > > > > >> > > > > It's very much a work-in-progress; I'm posting it now to get >> > > > > feedback. >> > > >> > > [...snip...] >> > > >> > > > > * The RTL frontend doesn't have any knowledge of the name of >> > > > > the >> > > > > function, >> > > > > of parameters, types, locals, globals, etc. It creates a >> > > > > single >> > > > > function. >> > > > > The function is currently hardcoded to have this signature: >> > > > > >> > > > > int test_1 (int, int, int); >> > > > > >> > > > > since there's no syntax for specify otherwise, and we need >> > > > > to >> > > > > provide >> > > > > a FUNCTION_DECL tree when building a function object (by >> > > > > calling >> > > > > allocate_struct_function). >> > > > > >> > > > > * Similarly, there are no types beyond the built-in ones; all >> > > > > expressions >> > > > > are treated as being of type int. I suspect that this >> > > > > approach >> > > > > will be too simplistic when it comes to e.g. aliasing. >> > > > >> > > > To address this and the previous issue I suggest to implement >> > > > the RTL >> > > > FE >> > > > similar to how I proposed the GIMPLE FE - piggy-back on the C >> > > > FE and >> > > > thus >> > > > allow >> > > > >> > > > int __RTL foo (int a, int b) // gets you function decl and >> > > > param >> > > > decls >> > > > { >> > > > (insn ...) >> > > > ... >> > > > >> > > > } >> > > > >> > > > int main() >> > > > { >> > > > if (foo (1) != 0) >> > > > abort (); >> > > > } >> > > > >> > > > That would also allow dg-do run testcases and not rely solely >> > > > on >> > > > scanning >> > > > RTL dumps. >> > > >> > > The following is an attempt at implementing this, by adding a new >> > > "__RTL" keyword, and detecting it in the C frontend, switching >> > > to a custom parser for the function body. >> > > >> > > Does this look like the kind of thing you had in mind for the >> > > RTL and gimple "frontends"? >> >> Yes! >> >> > > Wiring this up to the existing RTL parser might be awkward: the >> > > existing RTL parser in read-md.c etc works at the level of >> > > characters (read_char and unread_char from a FILE *), whereas the >> > > C frontend is in terms of tokens. >> > > >> > > I have a patch that ports the RTL parser to using libcpp for >> > > location-tracking, and another that updates it to use libcpp >> > > for diagnostics. This adds more dependency on a build-time >> > > libcpp to the gen* tools. Both patches are currently messy. >> > > Potentially I could build on them and attempt to update the RTL >> > > parser further, to use libcpp's tokenizer. >> > > >> > > Does that general approach sound sane? In particular: >> > > - is it sane to eliminate errors.c in favor of building >> > > diagnostics*.c for the build machine as well as the host machine? >> > > - is it sane to rewrite the read-md.c/read-rtl.c code to >> > > a token-based approach, using libcpp? >> > > >> > > Alternatively, the shorter term approach would be to kludge >> > > in reading from a FILE * in read-md.c based on where the >> > > C parser has got to, with a hybrid of the two approaches >> > > (character-based vs token-based). >> > >> > Another option is to make read-md.c use tokens, but instead of >> > building >> > libcpp for the build machine write a new token parser that is text >> > compatible with the libcpp one, but just enough to do what read >> > -md.c >> > needs. However that seems silly, and suggests just using libcpp is >> > the >> > sane thing to do :) >> > >> > > Thoughts? >> > >> > I'm not aware of any pitfalls in using libcpp in build tools, >> > though it >> > does seem slightly unfortunate to need to build so much build tool >> > stuff. >> > >> > Thinking about this I wonder if libcpp would be useful in gengtype >> > to >> > get around some of the wierdness with headers (and maybe even >> > languages?) but that doesn't need to be thought about now. >> >> genmatch also uses libcpp, it's really convenient for the diagnostics >> as well. >> >> That said, another kludge would be to simply use cpp_token_as_text >> (see genmatch.c:c_expr::gen_transform), write the whole function >> to a temporary file and parse that back in with read_md ;) >> >> In my mind sth to continue prototyping is more important than to >> clean >> this piece up righ now. > > Thanks. > > Given that I have a semi-working "rtl1" and "make check-rtl", I've > moved the parser out of the gcc/rtl directory and into a new > gcc/read-rtl-function.c > so that it can be shared between cc1 and rtl1, using "rtl1" as a > testbed whilst I bring up the __RTL idea within cc1. > > I tried the idea of writing the tokens out to a tempfile, but it didn't > work well: the existing parser I have is very sensitive to whitespace > (for dealing with all of the non-standard syntax in print-rtl.c) and > once we just have tokens we've lost that whitespace information.
Ah, indeed. But you know whether there was whitespace before a token at least... > So I tried a different approach: record the locations of the opening > and closing brace of the __RTL function body, consume all of the > tokens, and then reopen the file as a FILE * and run the RTL parser on > it, filtering the results of read_char () so it returns only whitespace > until the open brace, and sees only EOF from the closing brace onwards. I guess that works as well unless you get input from stdin or the parseable code results from macro expansion. > This seems to work well enough for now, and my prototype is able to > have cc1 read in an existing RTL dump (with the C function wrapper) and > then (as a test) dump it out again; an example input file can be seen > at: > https://dmalcolm.fedorapeople.org/gcc/2016-05-12/gcc-git-rtl-frontend-v13/0022-FIXME-add-test-case.patch > > The prototype parses the RTL as the C is being parsed. Another way to > do it might be to delay parsing the RTL until the function reaches a > desired RTL pass: have a dummy tree for the body of the function, and > compile the function normally (perhaps marking it noinline noclone or > somesuch). On reaching the pass we wish to test, we could then parse > the RTL, replacing whatever RTL we already had. (I don't know if this > is a good idea, but it seems worth mentioning here). What would be the benefit of doing this though? I guess parsing might rely on all declarations parsed (so you have more freedom in placing the __RTL)? For the GIMPLE FE I'd like to annotate functions with an attribute telling the pass manager where to start/end compilation. For the RTL FE you could tell it to start at a random RTL pass and make a missing PROP_rtl start parsing... > One wart I ran into is that system.h has this: > > /* Front ends should never have to include middle-end headers. Enforce > this by poisoning the header double-include protection defines. */ > #ifdef IN_GCC_FRONTEND > #pragma GCC poison GCC_RTL_H GCC_EXCEPT_H GCC_EXPR_H > #endif > > i.e. the idea of running RTL code from inside the C frontend seems to > be banned. > > For now I simply removed that pragma. Presumably I can avoid it > properly by having c/c-parser.c call into something in the gcc > directory that isn't compiled with -DIN_GCC_FRONTEND. Yeah, or do the delayed parsing (though that makes you need to implement your own symbol lookup for parsing things like MEM_EXPRs). > Next steps: > * I plan to try to get single passes running, and some form of round > -trip dump test. > * I'll also try to split out some of the work into generic cleanups of > read-md.c that I hope can go into trunk independently of this work. > > Hope this sounds sane yeah. Thanks, Richard. > Dave > > FWIW I've backed up the latest version of the (very messy) patch kit > to: > https://dmalcolm.fedorapeople.org/gcc/2016-05-12/gcc-git-rtl-frontend-v13/ > It's on top of r235783, and has various unrelated stuff in it. >