gamesh411 added inline comments.
================ Comment at: clang/docs/analyzer/checkers.rst:2358 Default sources defined by ``GenericTaintChecker``: -``fdopen``, ``fopen``, ``freopen``, ``getch``, ``getchar``, ``getchar_unlocked``, ``gets``, ``scanf``, ``socket``, ``wgetch`` + ``_IO_getc``, ``fdopen``, ``fopen``, ``freopen``, ``get_current_dir_name``, ``getch``, ``getchar``, ``getchar_unlocked``, ``getcw``, ``getcwd``, ``getgroups``, ``gethostname``, ``getlogin``, ``getlogin_r``, ``getnameinfo``, ``getopt``, ``getopt_long``, ``getopt_only``, ``gets``, ``getseuserbyname``, ``readlink``, ``scanf``, ``scanf_s``, ``socket``, ``wgetch`` ---------------- steakhal wrote: > typo/dup? > I cannot recognize the `getcw()` call. Could you please refer to the > specification or an instance where it was defined? `getwd` is the right one instead of `getcw` ================ Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:546-548 {{"gets"}, TR::Source({{0}, ReturnValueIndex})}, {{"scanf"}, TR::Source({{}, 1})}, + {{"scanf_s"}, TR::Source({{}, {1}})}, ---------------- steakhal wrote: > If we handle `gets`, `scanf`, we should also model the `*_s` versions as well. > ```lang=C > char *gets_s(char *s, rsize_t n); > int scanf_s(const char *restrict format, ...); > int fscanf_s(FILE *restrict stream, const char *restrict format, ...); > int sscanf_s(const char *restrict s, const char *restrict format, ...); > int vscanf_s(const char *restrict format, va_list arg); > int vfscanf_s(FILE *restrict stream, const char *restrict format, va_list > arg); > int vsscanf_s(const char *restrict s, const char *restrict format, va_list > arg); > ``` I have added gets_s, and will add the _s variants for the others in the other patch that deals with the propagatorsj. ================ Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:550-552 + {{"getopt"}, TR::Source({{ReturnValueIndex}})}, + {{"getopt_long"}, TR::Source({{ReturnValueIndex}})}, + {{"getopt_long_only"}, TR::Source({{ReturnValueIndex}})}, ---------------- steakhal wrote: > IMO these functions are highly domain-specific. > On errors, they return specific/well-defined values e.g. `-1`, `'?'` or `':'`. > That being said, the analyzer does not have this knowledge, thus it will > model these as `conjured` symbols. > If these values were modeled as tainted, we would likely introduce the number > of false-positives regarding our limited capabilities of modeling the > function accurately. > > tldr; I'm against these three rules; or alternatively prove that my concerns > are not issues on real code bases. I agree that the handling of these should be in another checker, I remember some false positives ( mainly uninteresting, typical "just won't fix" errors ) relating to `getopt`, and a domain-specific checker could be more appropriate here. Removed them. ================ Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:556 + {{"getwd"}, TR::Source({{0, ReturnValueIndex}})}, + {{"readlink"}, TR::Source({{1, ReturnValueIndex}})}, + {{"get_current_dir_name"}, TR::Source({{ReturnValueIndex}})}, ---------------- steakhal wrote: > We should check `readlinkat` as well. Added ================ Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:559 + {{"gethostname"}, TR::Source({{0}})}, + {{"getnameinfo"}, TR::Source({{2, 4}})}, + {{"getseuserbyname"}, TR::Source({{1, 2}})}, ---------------- steakhal wrote: > In what cases can this function introduce taint? The getnameinfo converts from ``` struct sockaddr_in { sa_family_t sin_family; /* address family: AF_INET */ in_port_t sin_port; /* port in network byte order */ struct in_addr sin_addr; /* internet address */ }; /* Internet address */ struct in_addr { uint32_t s_addr; /* address in network byte order */ }; ``` to hostname and servername strings. One could argue that by crafting a specific IP address, that is known to resolve to a specific hostname in the running environment could lead an attacker injecting a chosen (in some circumstances arbitrary) string into the code at the point of this function. I know this is a bit contrived, and more on the cybersecurity side of things, so I am not sure whether to add this here, or add this in a specific checker, or just leave altogether. Please share your opinion about this. ================ Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:561 + {{"getseuserbyname"}, TR::Source({{1, 2}})}, + {{"getgroups"}, TR::Source({{1}})}, + {{"getlogin"}, TR::Source({{ReturnValueIndex}})}, ---------------- steakhal wrote: > >On success, `getgroups()` returns the number of supplementary group IDs. On > >error, -1 is returned, and `errno` is set appropriately. > > According to this, the return value index should be also tainted. Added ================ Comment at: clang/test/Analysis/taint-generic.c:385 + struct option long_opts[] = {{0, 0, 0, 0}}; + int opt = getopt_long_only(argc, argv, "a:b:02", long_opts, &option_index); + return 1 / opt; // expected-warning {{Division by a tainted value, possibly zero}} ---------------- steakhal wrote: > Well, this can never return zero. > What we should do is to do a state-split for the failure case; since the > application should definitely handle a failure in this part; thus the split > would be justified. Removed ================ Comment at: clang/test/Analysis/taint-generic.c:392 +int underscore_IO_getc_is_source(_IO_FILE *fp) { + char c = _IO_getc(fp); + return 1 / c; // expected-warning {{Division by a tainted value, possibly zero}} ---------------- steakhal wrote: > Sometimes we taint the `fd` and propagate based on that, and othertimes, we > simply just return taint. > However, I think it still looks like a better tradeoff this way. > I just wanted to highlight this. Maybe a comment on the CallDescription would > be beneficial in describing this discrepancy. Added a comment ================ Comment at: clang/test/Analysis/taint-generic.c:417 +char *get_current_dir_name(void); +int get_current_dir_name_is_source() { + char *d = get_current_dir_name(); ---------------- steakhal wrote: > Please avoid spelling the given function in the name of the test directly in > a verbatim manner. > If we would use the `CDF_MaybeBuiltin` matching mode, the > `get_current_dir_name_is_source` would match for the `CallDescription > {"get_current_dir_name"}` due to the way we fuzzy match for builtins. > Prefixing with `Test` like `testGet_current_dir_name` would be fine though, > only the underscores are handled differently. > This applies to the rest of the test cases as well. fixed the test names Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D120236/new/ https://reviews.llvm.org/D120236 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits