browneee added inline comments.

================
Comment at: compiler-rt/lib/dfsan/dfsan_custom.cpp:213
+  char *res = strsep(s, delim);
+  s_label = dfsan_read_label(base, strlen(base));
+  if (res && (res != base)) {
----------------
tkuchta wrote:
> browneee wrote:
> > tkuchta wrote:
> > > browneee wrote:
> > > > tkuchta wrote:
> > > > > browneee wrote:
> > > > > > The `s_label` represents the taint label for `s` (the pointer).
> > > > > > 
> > > > > > This line would clobber the taint label of the pointer (`s`) with a 
> > > > > > taint label from `s[0][0..n]`.
> > > > > > 
> > > > > > I think this line should be deleted.
> > > > > Agree, s_label represents the taint associated with the **s pointer. 
> > > > > However I am now wondering if that is the taint wich we would like to 
> > > > > return.
> > > > > For example, if we have
> > > > > if (flags().strict_data_dependencies) {
> > > > >     *ret_label = res ? s_label : 0;
> > > > > 
> > > > > We would taint the return value with the value of the pointer, not 
> > > > > the data. It means that if we operate on a string for which the 
> > > > > characters are tainted, but the pointer itself isn't, we are likely 
> > > > > going to return label 0. My understanding was that we are more 
> > > > > concerned with the taint of the data, not the pointer, am I missing 
> > > > > something?
> > > > > 
> > > > Yes, we are usually more concerned with the taint of the data, not the 
> > > > pointer.
> > > > 
> > > > With strict dependencies:
> > > > // If the input pointer is tainted, the output pointer would be tainted 
> > > > (because it is derived from the input pointer - maybe the same value).
> > > > taint(s[0]) == dfsan_read_label(s, sizeof(s)) ====> taint(ret) == 
> > > > ret_label[0]
> > > > 
> > > > // If the input data is tainted, the output data would be tainted 
> > > > (because it is derived from the input data).
> > > > taint(s[0][0]) == MEM_TO_SHADOW(s[0])[0] ====> taint(ret[0]) == 
> > > > MEM_TO_SHADOW(ret)[0]
> > > > 
> > > > Because s[0] == ret  (or ret==null), (for the non-null case) the output 
> > > > shadow bytes are the same bytes as input shadow bytes and so these 
> > > > taint labels for the string data in shadow memory do not need to be 
> > > > explicitly propagated in this function. 
> > > > 
> > > > I think the only case actually changing/copying string data is writing 
> > > > a delimiter byte to NULL, which you handled.
> > > I am working on the changes and I came across a strange behavior that I 
> > > would like to ask about.
> > > 
> > > It turned out that if we do
> > > 
> > > char *s = " ... ";
> > > dfsan_set_label(label, &p_s, sizeof(&p_s));
> > > 
> > > Then, the s_label within the handler is 0, not "label". This is 
> > > unexpected, as we would like the pointer itself to be labeled here.
> > > I believe this has something to do with the fact that the input string in 
> > > strsep is a double pointer. For example this works as expected for the 
> > > delimiter string, which is a single pointer. 
> > > It's either I'm doing something incorrectly or there is some issue with 
> > > propagating labels for double pointers.
> > > Have you perhaps come across this behavior before?
> > I'm not sure what p_s is in your example. Could you provide a more complete 
> > example?
> > (and maybe all in one function, not half in __dfsw_strsep and half in 
> > another function)
> > 
> > Here is an example showing taint labels at different levels of indirection:
> > 
> > ```
> > #include <assert.h>
> > #include <string.h>
> > #include <sanitizer/dfsan_interface.h>
> > 
> > int main(void) {
> >   char *s = " ... ";
> >   char **p_s = &s;
> >   char ***pp_s = &p_s;
> > 
> >   dfsan_label i_label = 1 << 0;
> >   dfsan_label j_label = 1 << 1;
> >   dfsan_label k_label = 1 << 2;
> >   dfsan_label m_label = 1 << 3;
> > 
> >   // string data
> >   dfsan_set_label(i_label, s, strlen(s));
> >   // pointer s
> >   dfsan_set_label(j_label, &s, sizeof(s));
> >   // pointer to pointer s
> >   dfsan_set_label(k_label, &p_s, sizeof(p_s));
> >   // pointer to pointer to pointer s
> >   dfsan_set_label(m_label, &pp_s, sizeof(pp_s));
> > 
> >   assert(pp_s[0][0] == s);
> > 
> >   // string data
> >   assert(dfsan_read_label(s, strlen(s)) == i_label);
> >   // pointer s
> >   assert(dfsan_read_label(&s, sizeof(s)) == j_label);
> >   // pointer to pointer s
> >   assert(dfsan_read_label(&p_s, sizeof(p_s)) == k_label);
> >   // pointer to pointer to pointer s
> >   assert(dfsan_read_label(&pp_s, sizeof(pp_s)) == m_label);
> > 
> >   return 0;
> > }
> > ```
> Hello,
> 
> Thank you for the comment.
> 
> I should have provided a more complete example.
> What I meant is a behavior I've found while working on the tests.
> In the test file I have something like that:
> 
> ```
> char *s = "Hello world/";
> char *delim = " /";
> dfsan_set_label(label, &s, sizeof(&s));
> char *rv = strep(&s, delim);
>  ```
> 
> If I get this right, the line 
> ```
> dfsan_set_label(label, &s, sizeof(&s));
> 
> ``` 
> should have applied a taint to the `s` pointer.
> However, when inside `strsep`, if we check the `s_label`, it's `0`, not 
> `label`
> 
> ```
> SANITIZER_INTERFACE_ATTRIBUTE char *__dfsw_strsep(char **s, const char *delim,
>                                                   dfsan_label s_label,
>                                                   dfsan_label delim_label,
>                                                   dfsan_label *ret_label) {
>     fprintf(stderr, "s_label = %d\n", s_label); // -> this prints out "0"
> ```
> 
> If we do exactly the same trick with the `delim` pointer, the associated 
> `delim_label` will be equal to `label`, as expected. 
> 
> It seems to me that either I am missing some important point here or the 
> `s_label` is not propagated correctly by the compiler runtime for a double 
> pointer to the user-provided handler (in my case `__dfsw_strsep`).
> 
> I believe that strsep is currently the first function within dfsan_custom.cpp 
> in which we operate on or check the label of the double pointer.
> 
> I would like to ask if you maybe came across this behavior. It might be that 
> I misunderstood how this should work and would be grateful for your help.
> Thank you
> If we do exactly the same trick with the delim pointer, the associated 
> delim_label will be equal to label, as expected.

> `char *rv = strep(&s, delim);`

The difference is `&s` adds another level of indirection, `delim` does not

I haven't tested, but I expect this example would work...

```
char *s = "Hello world/";
char **ps = &s;
char *delim = " /";
dfsan_set_label(label, &ps, sizeof(&ps));
char *rv = strep(ps, delim);

// Inside __dfsw_strsep ...
    fprintf(stderr, "s_label = %d\n", s_label); // should now print nonzero 
label
```


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141389/new/

https://reviews.llvm.org/D141389

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to