On Sat, 2019-12-07 at 08:01 -0700, Jeff Law wrote: > On Fri, 2019-11-15 at 20:23 -0500, David Malcolm wrote: [...] > > diff --git a/gcc/analyzer/analyzer.cc b/gcc/analyzer/analyzer.cc > > new file mode 100644 > > index 0000000..399925c > > --- /dev/null > > +++ b/gcc/analyzer/analyzer.cc [...] > > + > > +/* Helper function for checkers. Is the CALL to the given > > function > > name? */ > > + > > +bool > > +is_named_call_p (const gcall *call, const char *funcname) > > +{ > > + gcc_assert (funcname); > > + > > + tree fndecl = gimple_call_fndecl (call); > > + if (!fndecl) > > + return false; > > + > > + return 0 == strcmp (IDENTIFIER_POINTER (DECL_NAME (fndecl)), > > funcname); > > +} > > + > > +/* Helper function for checkers. Is the CALL to the given > > function > > name, > > + and with the given number of arguments? */ > > + > > +bool > > +is_named_call_p (const gcall *call, const char *funcname, > > + unsigned int num_args) > > +{ > > + gcc_assert (funcname); > > + > > + if (!is_named_call_p (call, funcname)) > > + return false; > > + > > + if (gimple_call_num_args (call) != num_args) > > + return false; > > + > > + return true; > These seem generic enough to live elsewhere.
There's a potential can of worms here: these functions are used by the checkers to detect fndecls by name, so that checkers can associate behavior with them. Examples: * sm-malloc.c recognizes "malloc" and "free" as being special * sm-signal.c knows that fprintf is async-signal-unsafe (but currently doesn't know about any other fns; it's a proof-of-concept) * sm-file.c currently doesn't know anything about "__fsetlocking", which is one of the reasons the analyzer currently doesn't detect the leak in intl/localealias.c reported as PR 47170, as it assumes that the FILE * might have been closed and stops tracking state for it. etc. So we're going to have a lot more "recognize this function by name" just to finish the existing proof-of-concept checkers. In a perfect world, perhaps we'd have attributes for all of this, and the user's code and their system headers would have dutifully marked the pertinent decls with attributes, and the use of is_named_call_p could be thought of as a bug (or wart)... but we don't live in that perfect world, and a good static analyzer shouldn't need to rely on the user marking their code. There's also integration and chicken-and-egg issues with attributes where if we rely e.g. on the user's libc headers having attributes for the checker, then we need to coordinate with e.g. glibc to add the attributes, and implement them, and then the checker doesn't work if someone is using a different libc, etc, etc. So I think we want a concept of "known functions" in the analyzer, where the analyzer can have its own "on the side" model of APIs - perhaps a mixture of baked-in (e.g. for malloc/free), perhaps from an overridable config file, but I'm not quite sure what form it should take. Maybe even a pragma that lets us tag named functions with attributes, or somesuch. For now, however, I want to take the pragmatic approach, and use these functions as needed (and to review this as we gain experience with the analyzer). So I think I prefer to keep them in the analyzer subdir (but I'm happy to move them if you'd prefer that) Does the above sound sane? > > +} > > + > > +/* Return true if stmt is a setjmp call. */ > > + > > +bool > > +is_setjmp_call_p (const gimple *stmt) > > +{ > > + /* TODO: is there a less hacky way to check for "setjmp"? */ > > + if (const gcall *call = dyn_cast <const gcall *> (stmt)) > > + if (is_named_call_p (call, "_setjmp", 1)) > > + return true; > > + > > + return false; > > +} > > + > > +/* Return true if stmt is a longjmp call. */ > > + > > +bool > > +is_longjmp_call_p (const gcall *call) > > +{ > > + /* TODO: is there a less hacky way to check for "longjmp"? */ > > + if (is_named_call_p (call, "longjmp", 2)) > > + return true; > > + > > + return false; > > +} > I thought we already had routines to detect these. We certainly have > *code* to detect them. If it's the former we really just want one > implementation (that hands the various permutations we've seen > through > the years). If it's the latter, then we probably want to use these > routines to simplify those implementations. We have several: * calls.c has: * setjmp_call_p (const_tree fndecl) * special_function_p which does name matching on "setjmp", "sigsetjmp", "savectx", "vfork", "getcontext" and sets ECF_RETURNS_TWICE (dropping leading single and double underscores) * except.c has "tree setjmp_fn;" used with #ifdef DONT_USE_BUILTIN_SETJMP * omp-low.c has: setjmp_or_longjmp_p (const_tree fndecl) (and maybe more). I reviewed where I'm using the two the patch proposed adding above: * I use is_setjmp_call_p in diagnostic_manager::add_events_for_eedge for PK_BEFORE_STMT on the gcall to capture recording the setjmp buf (so that the event can be cross-referenced at the longjmp call), and in engine.cc for identifying that initial "store" call * I use is_longjmp_call_p in only one place, in engine.cc, for detecting the fn and handling the non-standard control flow As for the other functions identified in calls.c special_function_p: * "sigsetjmp": looks like I can generalize my code to handle this * "savectx": https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71876 suggests it's Solaris-specific. I have no idea of the signature or semantics, so I'm inclined not to handle it in the analyzer. * "vfork": appears to me to be an old performance hack from before COW pages. Semantics appear sufficiently different from setjmp/longjmp that it would need its own special-casing (and isn't a priority for me) * "getcontext"/"setcontext": have slightly different semantics to setjmp/longjmp (e.g. no val is passed like for longjmp; they can fail and set errno); again, would need their own special-casing, and again, these are not a priority for me) I think the requirements for the analyzer here differ from that of the rest of the compiler: the analyzer needs to capture semantics of the fns in terms of the effects on states and paths, which is rather different to that needed by code generation. For example the analyzer's special-casing captures subtleties like "Passing 0 as the val to longjmp leads to setjmp returning 1". So I'm thinking that for the next iteration of the kit I'll drop these two functions in favor of: setjmp_like_p longjmp_like_p to detect setjmp/sigsetjmp and longjmp/siglongjmp, specifically for the analyzer, so that it can trigger the special-casing for these. Does that sound reasonable? > > +/* Generate a label_text instance by formatting FMT, using a > > + temporary clone of the global_dc's printer (thus using its > > + formatting callbacks). > > + > > + Colorize if the global_dc supports colorization and > > CAN_COLORIZE > > is > > + true. */ > > + > > +label_text > > +make_label_text (bool can_colorize, const char *fmt, ...) > > +{ > > + pretty_printer *pp = global_dc->printer->clone (); > > + pp_clear_output_area (pp); > > + > > + if (!can_colorize) > > + pp_show_color (pp) = false; > > + > > + text_info ti; > > + rich_location rich_loc (line_table, UNKNOWN_LOCATION); > > + > > + va_list ap; > > + > > + va_start (ap, fmt); > > + > > + ti.format_spec = _(fmt); > > + ti.args_ptr = ≈ > > + ti.err_no = 0; > > + ti.x_data = NULL; > > + ti.m_richloc = &rich_loc; > > + > > + pp_format (pp, &ti); > > + pp_output_formatted_text (pp); > > + > > + va_end (ap); > > + > > + label_text result = label_text::take (xstrdup (pp_formatted_text > > (pp))); > > + delete pp; > > + return result; > > +} > Is this better in the pretty-printer infrastructure? This touches on another can of worms... it's rather clunky to be cloning the printer and using it to generate a temporary string. I've been thinking that maybe diagnostic_event::get_desc should instead print to a pp, which might avoid a lot of printer cloning, but I've attempted that before, unsuccessfully. I guess I'll try again; it feels cleaner (I think it required cloning the printer if we're going to use it from within a rich_location label, since otherwise the printing of the description interferes with that of diagnostic_show_locus, and we end up with garbled text). Dave