steakhal added a comment. Huh, this was a long one. 😅 🚀
================ Comment at: clang/docs/analyzer/checkers.rst:2361 Default propagations defined by ``GenericTaintChecker``: -``atoi``, ``atol``, ``atoll``, ``fgetc``, ``fgetln``, ``fgets``, ``fscanf``, ``sscanf``, ``getc``, ``getc_unlocked``, ``getdelim``, ``getline``, ``getw``, ``pread``, ``read``, ``strchr``, ``strrchr``, ``tolower``, ``toupper`` +``atoi``, ``atol``, ``atoll``, ``basename``, ``dirname``, ``fgetc``, ``fgetln``, ``fgets``, ``fnmatch``, ``fread``, ``fscanf``, ``fscanf_s``, ``index``, ``inflate``, ``isalnum``, ``isalpha``, ``isascii``, ``isblank``, ``iscntrl``, ``isdigit``, ``isgraph``, ``islower``, ``isprint``, ``ispunct``, ``isspace``, ``isupper``, ``isxdigit``, ``memchr``, ``memrchr``, ``sscanf``, ``getc``, ``getc_unlocked``, ``getdelim``, ``getline``, ``getw``, ``memcmp``, ``memcpy``, ``memmem``, ``memmove``, ``mbtowc``, ``pread``, ``qsort``, ``qsort_r``, ``rawmemchr``, ``read``, ``readv``, ``recv``, ``recvfrom``, ``rindex``, ``strcasestr``, ``strchr``, ``strchrnul``, ``strcasecmp``, ``strcmp``, ``strcspn``, ``strlen``, ``strncasecmp``, ``strncmp``, ``strndup``, ``strndupa``, ``strpbrk``, ``strrchr``, ``strsep``, ``strspn``, ``strstr``, ``strtol``, ``strtoll``, ``strtoul``, ``strtoull``, ``tolower``, ``toupper``, ``ttyname``, ``ttyname_r``, ``vfscanf``, ``vfscanf``, ``wctomb``, ``wcwidth`` ---------------- I cannot see the corresponding propagation rule. That being said, it would be handy to mention that this is for `zlib` decompression and this should be probably a taint source anyway. `vfscanf` occurs two times. `vscanf` is not mentioned here; and there are probably a couple others like this. ================ Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:561 + {{"vscanf"}, TR::Prop({{0}}, {{}, 1})}, + {{"vfscanf"}, TR::Prop({{0}}, {{}, 2})}, + ---------------- This function has exactly 3 arguments. I'm also puzzled how tainting `va_list` will work out. That should be modeled separately IMO. This comment applies to all of the similar `va_list` accepting functions, such as `vscanf`, `vfscanf`, and possibly others. That being said, I think `vscanf` is more like `scanf`, so it should be modeled as a taint source instead of a propagator. ================ Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:575 + {{"fread"}, TR::Prop({{3}}, {{0, ReturnValueIndex}})}, + {{"readv"}, TR::Prop({{0}}, {{1, ReturnValueIndex}})}, + {{"recv"}, TR::Prop({{0}}, {{1, ReturnValueIndex}})}, ---------------- I'm on board with marking read operations as props/sources. Let's look at the declaration: `ssize_t readv(int fd, const struct iovec *iov, int iovcnt);` I'm not sure how could the pointee of `iov` be modified by this call, as its `const`. Additionally, I doubt the effectiveness of the rule, since I don't think it would be too likely to have a path leading to a taint sink with an `iov` pointer. That being said, let it be, but don't expect much from this rule :D ================ Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:577 + {{"recv"}, TR::Prop({{0}}, {{1, ReturnValueIndex}})}, + {{"recvfrom"}, TR::Prop({{0}}, {{1, ReturnValueIndex}})}, + ---------------- It's sad that we don't model `recvmsg`, but it would be quite difficult to model. I can see why you left that out. ================ Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:579-580 + + {{"ttyname"}, TR::Prop({{0}}, {{ReturnValueIndex}})}, + {{"ttyname_r"}, TR::Prop({{0}}, {{1, ReturnValueIndex}})}, + ---------------- I'm not sure how could these be used as gadgets, but let it be. ================ Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:582-583 + + {{"dirname"}, TR::Prop({{0}}, {{ReturnValueIndex}})}, + {{"basename"}, TR::Prop({{0}}, {{ReturnValueIndex}})}, + {{"fnmatch"}, TR::Prop({{0}}, {{ReturnValueIndex}})}, ---------------- These should be sorted, isn't it? ================ Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:584 + {{"basename"}, TR::Prop({{0}}, {{ReturnValueIndex}})}, + {{"fnmatch"}, TR::Prop({{0}}, {{ReturnValueIndex}})}, + {{"memchr"}, TR::Prop({{0}}, {{ReturnValueIndex}})}, ---------------- > `int fnmatch(const char *pattern, const char *string, int flags);` >The `fnmatch()` function checks whether the string argument matches the >pattern argument, which is a shell wildcard pattern (see `glob(7)`). From this //man// entry I would think that we should propagate from the **second** argument. ================ Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:596 + {{"memmove"}, TR::Prop({{1}}, {{0, ReturnValueIndex}})}, + {{"memmem"}, TR::Prop({{0}}, {{ReturnValueIndex}})}, + ---------------- One could say that if `memmem` was called with a tainted //needle// and actually finds something in the //haystack//, that would mean essentially that the value pointed by the return value has the same content as the //needle//. However, I still agree with the current propagation rule, depending only on `arg 0`. This might reasoning might deserve a comment though. `strstr` also shares this property. ================ Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:606 + + // FIXME: in case of arrays ,only the first element of the array gets + // tainted. ---------------- ================ Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:615-616 + {{"strncasecmp"}, TR::Prop({{0, 1, 2}}, {{ReturnValueIndex}})}, + {{"strspn"}, TR::Prop({{0}}, {{ReturnValueIndex}})}, + {{"strcspn"}, TR::Prop({{0}}, {{ReturnValueIndex}})}, + {{"strpbrk"}, TR::Prop({{0}}, {{ReturnValueIndex}})}, ---------------- IMO these should propagate from `{0,1}`, similarly how the `strncmp` does propagate from its last argument. ================ Comment at: clang/test/Analysis/taint-generic.c:372-378 +int testVscanf(int *d) { + char format[10]; + scanf("%9s", format); // fake a tainted a file descriptor + + vscanf(format, &d); + return 1 / *d; // expected-warning {{Division by a tainted value, possibly zero}} +} ---------------- This test case definitely needs to be reworked. 1) We don't have FDs in this context, unlike the comment suggests. 2) `vscanf` should be an unconditional taint source, instead of being a propagator. ================ Comment at: clang/test/Analysis/taint-generic.c:392-395 + if (some_global_flag_to_branch_on) // just to have 2 branches, and assert 2 division by zero messages + return 1 / *buffer; // expected-warning {{Division by a tainted value, possibly zero}} + + return 1 / read; // expected-warning {{Division by a tainted value, possibly zero}} ---------------- For this type of things I can recommend using the `clang_analyzer_isTainted(T)` `debug.ExprInspection` introspection function. It will do the right thing, without sinking the execution path. I've seen in some other test cases that you used an extra top-level fn parameter for the same. That's slightly better than using a global, but I would still recommend the debug function. ================ Comment at: clang/test/Analysis/taint-generic.c:399-400 +struct iovec { + void *iov_base; /* Starting address */ + size_t iov_len; /* Number of bytes to transfer */ +}; ---------------- Please use single-line comments. It makes debugging test files easier in some cases. ================ Comment at: clang/test/Analysis/taint-generic.c:408 + size_t read = readv(fd, iov, iovcnt); + // FIXME: should be able to assert that iov is also tainted + return 1 / read; // expected-warning {{Division by a tainted value, possibly zero}} ---------------- `clang_analyzer_isTainted(*iov)` ================ Comment at: clang/test/Analysis/taint-generic.c:905 +int isspace(int c); +int testIsspace() { + char c; ---------------- You should group these all into a single test case. This way the setup code for the test dominates compared to the actual content. ================ Comment at: clang/test/Analysis/taint-generic.c:927-928 +int cmp_less(const void *lhs, const void *rhs) { + return *(int *)lhs < *(int *)rhs ? -1 : *(int *)lhs > *(int *)rhs ? 1 + : 0; +} ---------------- >The comparison function must return an integer less than, equal to, or greater >than zero if the first argument is considered to be respectively less than, >equal to, or greater than the second. ================ Comment at: clang/test/Analysis/taint-generic.c:939-942 +int cmp_less_than(const void *lhs, const void *rhs, void *baseline) { + return *(int *)lhs < *(int *)baseline ? -1 : *(int *)lhs > *(int *)baseline ? 1 + : 0; +} ---------------- Just use the `cmp_less` instead. You can also fuse the qsort test cases into a single function. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D120369/new/ https://reviews.llvm.org/D120369 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits