[PATCH] D141389: [DFSAN] Add support for strnlen, strncat, strsep, sscanf and _tolower

Tomasz Kuchta via Phabricator via cfe-commits Thu, 02 Feb 2023 06:08:47 -0800

tkuchta added inline comments.


================
Comment at: compiler-rt/lib/dfsan/dfsan_custom.cpp:213
+  char *res = strsep(s, delim);
+  s_label = dfsan_read_label(base, strlen(base));
+  if (res && (res != base)) {
----------------
browneee wrote:
> tkuchta wrote:
> > browneee wrote:
> > > tkuchta wrote:
> > > > browneee wrote:
> > > > > The `s_label` represents the taint label for `s` (the pointer).
> > > > > 
> > > > > This line would clobber the taint label of the pointer (`s`) with a 
> > > > > taint label from `s[0][0..n]`.
> > > > > 
> > > > > I think this line should be deleted.
> > > > Agree, s_label represents the taint associated with the **s pointer. 
> > > > However I am now wondering if that is the taint wich we would like to 
> > > > return.
> > > > For example, if we have
> > > > if (flags().strict_data_dependencies) {
> > > >     *ret_label = res ? s_label : 0;
> > > > 
> > > > We would taint the return value with the value of the pointer, not the 
> > > > data. It means that if we operate on a string for which the characters 
> > > > are tainted, but the pointer itself isn't, we are likely going to 
> > > > return label 0. My understanding was that we are more concerned with 
> > > > the taint of the data, not the pointer, am I missing something?
> > > > 
> > > Yes, we are usually more concerned with the taint of the data, not the 
> > > pointer.
> > > 
> > > With strict dependencies:
> > > // If the input pointer is tainted, the output pointer would be tainted 
> > > (because it is derived from the input pointer - maybe the same value).
> > > taint(s[0]) == dfsan_read_label(s, sizeof(s)) ====> taint(ret) == 
> > > ret_label[0]
> > > 
> > > // If the input data is tainted, the output data would be tainted 
> > > (because it is derived from the input data).
> > > taint(s[0][0]) == MEM_TO_SHADOW(s[0])[0] ====> taint(ret[0]) == 
> > > MEM_TO_SHADOW(ret)[0]
> > > 
> > > Because s[0] == ret  (or ret==null), (for the non-null case) the output 
> > > shadow bytes are the same bytes as input shadow bytes and so these taint 
> > > labels for the string data in shadow memory do not need to be explicitly 
> > > propagated in this function. 
> > > 
> > > I think the only case actually changing/copying string data is writing a 
> > > delimiter byte to NULL, which you handled.
> > I am working on the changes and I came across a strange behavior that I 
> > would like to ask about.
> > 
> > It turned out that if we do
> > 
> > char *s = " ... ";
> > dfsan_set_label(label, &p_s, sizeof(&p_s));
> > 
> > Then, the s_label within the handler is 0, not "label". This is unexpected, 
> > as we would like the pointer itself to be labeled here.
> > I believe this has something to do with the fact that the input string in 
> > strsep is a double pointer. For example this works as expected for the 
> > delimiter string, which is a single pointer. 
> > It's either I'm doing something incorrectly or there is some issue with 
> > propagating labels for double pointers.
> > Have you perhaps come across this behavior before?
> I'm not sure what p_s is in your example. Could you provide a more complete 
> example?
> (and maybe all in one function, not half in __dfsw_strsep and half in another 
> function)
> 
> Here is an example showing taint labels at different levels of indirection:
> 
> ```
> #include <assert.h>
> #include <string.h>
> #include <sanitizer/dfsan_interface.h>
> 
> int main(void) {
>   char *s = " ... ";
>   char **p_s = &s;
>   char ***pp_s = &p_s;
> 
>   dfsan_label i_label = 1 << 0;
>   dfsan_label j_label = 1 << 1;
>   dfsan_label k_label = 1 << 2;
>   dfsan_label m_label = 1 << 3;
> 
>   // string data
>   dfsan_set_label(i_label, s, strlen(s));
>   // pointer s
>   dfsan_set_label(j_label, &s, sizeof(s));
>   // pointer to pointer s
>   dfsan_set_label(k_label, &p_s, sizeof(p_s));
>   // pointer to pointer to pointer s
>   dfsan_set_label(m_label, &pp_s, sizeof(pp_s));
> 
>   assert(pp_s[0][0] == s);
> 
>   // string data
>   assert(dfsan_read_label(s, strlen(s)) == i_label);
>   // pointer s
>   assert(dfsan_read_label(&s, sizeof(s)) == j_label);
>   // pointer to pointer s
>   assert(dfsan_read_label(&p_s, sizeof(p_s)) == k_label);
>   // pointer to pointer to pointer s
>   assert(dfsan_read_label(&pp_s, sizeof(pp_s)) == m_label);
> 
>   return 0;
> }
> ```
Hello,

Thank you for the comment.

I should have provided a more complete example.
What I meant is a behavior I've found while working on the tests.
In the test file I have something like that:

```
char *s = "Hello world/";
char *delim = " /";
dfsan_set_label(label, &s, sizeof(&s));
char *rv = strep(&s, delim);
 ```

If I get this right, the line 
```
dfsan_set_label(label, &s, sizeof(&s));

``` 
should have applied a taint to the `s` pointer.
However, when inside `strsep`, if we check the `s_label`, it's `0`, not `label`

```
SANITIZER_INTERFACE_ATTRIBUTE char *__dfsw_strsep(char **s, const char *delim,
                                                  dfsan_label s_label,
                                                  dfsan_label delim_label,
                                                  dfsan_label *ret_label) {
    fprintf(stderr, "s_label = %d\n", s_label); // -> this prints out "0"
```

If we do exactly the same trick with the `delim` pointer, the associated 
`delim_label` will be equal to `label`, as expected. 

It seems to me that either I am missing some important point here or the 
`s_label` is not propagated correctly by the compiler runtime for a double 
pointer to the user-provided handler (in my case `__dfsw_strsep`).

I believe that strsep is currently the first function within dfsan_custom.cpp 
in which we operate on or check the label of the double pointer.

I would like to ask if you maybe came across this behavior. It might be that I 
misunderstood how this should work and would be grateful for your help.
Thank you


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141389/new/

https://reviews.llvm.org/D141389

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D141389: [DFSAN] Add support for strnlen, strncat, strsep, sscanf and _tolower

Reply via email to