tkuchta added inline comments.
================ Comment at: compiler-rt/lib/dfsan/dfsan_custom.cpp:213 + char *res = strsep(s, delim); + s_label = dfsan_read_label(base, strlen(base)); + if (res && (res != base)) { ---------------- browneee wrote: > tkuchta wrote: > > browneee wrote: > > > tkuchta wrote: > > > > browneee wrote: > > > > > The `s_label` represents the taint label for `s` (the pointer). > > > > > > > > > > This line would clobber the taint label of the pointer (`s`) with a > > > > > taint label from `s[0][0..n]`. > > > > > > > > > > I think this line should be deleted. > > > > Agree, s_label represents the taint associated with the **s pointer. > > > > However I am now wondering if that is the taint wich we would like to > > > > return. > > > > For example, if we have > > > > if (flags().strict_data_dependencies) { > > > > *ret_label = res ? s_label : 0; > > > > > > > > We would taint the return value with the value of the pointer, not the > > > > data. It means that if we operate on a string for which the characters > > > > are tainted, but the pointer itself isn't, we are likely going to > > > > return label 0. My understanding was that we are more concerned with > > > > the taint of the data, not the pointer, am I missing something? > > > > > > > Yes, we are usually more concerned with the taint of the data, not the > > > pointer. > > > > > > With strict dependencies: > > > // If the input pointer is tainted, the output pointer would be tainted > > > (because it is derived from the input pointer - maybe the same value). > > > taint(s[0]) == dfsan_read_label(s, sizeof(s)) ====> taint(ret) == > > > ret_label[0] > > > > > > // If the input data is tainted, the output data would be tainted > > > (because it is derived from the input data). > > > taint(s[0][0]) == MEM_TO_SHADOW(s[0])[0] ====> taint(ret[0]) == > > > MEM_TO_SHADOW(ret)[0] > > > > > > Because s[0] == ret (or ret==null), (for the non-null case) the output > > > shadow bytes are the same bytes as input shadow bytes and so these taint > > > labels for the string data in shadow memory do not need to be explicitly > > > propagated in this function. > > > > > > I think the only case actually changing/copying string data is writing a > > > delimiter byte to NULL, which you handled. > > I am working on the changes and I came across a strange behavior that I > > would like to ask about. > > > > It turned out that if we do > > > > char *s = " ... "; > > dfsan_set_label(label, &p_s, sizeof(&p_s)); > > > > Then, the s_label within the handler is 0, not "label". This is unexpected, > > as we would like the pointer itself to be labeled here. > > I believe this has something to do with the fact that the input string in > > strsep is a double pointer. For example this works as expected for the > > delimiter string, which is a single pointer. > > It's either I'm doing something incorrectly or there is some issue with > > propagating labels for double pointers. > > Have you perhaps come across this behavior before? > I'm not sure what p_s is in your example. Could you provide a more complete > example? > (and maybe all in one function, not half in __dfsw_strsep and half in another > function) > > Here is an example showing taint labels at different levels of indirection: > > ``` > #include <assert.h> > #include <string.h> > #include <sanitizer/dfsan_interface.h> > > int main(void) { > char *s = " ... "; > char **p_s = &s; > char ***pp_s = &p_s; > > dfsan_label i_label = 1 << 0; > dfsan_label j_label = 1 << 1; > dfsan_label k_label = 1 << 2; > dfsan_label m_label = 1 << 3; > > // string data > dfsan_set_label(i_label, s, strlen(s)); > // pointer s > dfsan_set_label(j_label, &s, sizeof(s)); > // pointer to pointer s > dfsan_set_label(k_label, &p_s, sizeof(p_s)); > // pointer to pointer to pointer s > dfsan_set_label(m_label, &pp_s, sizeof(pp_s)); > > assert(pp_s[0][0] == s); > > // string data > assert(dfsan_read_label(s, strlen(s)) == i_label); > // pointer s > assert(dfsan_read_label(&s, sizeof(s)) == j_label); > // pointer to pointer s > assert(dfsan_read_label(&p_s, sizeof(p_s)) == k_label); > // pointer to pointer to pointer s > assert(dfsan_read_label(&pp_s, sizeof(pp_s)) == m_label); > > return 0; > } > ``` Hello, Thank you for the comment. I should have provided a more complete example. What I meant is a behavior I've found while working on the tests. In the test file I have something like that: ``` char *s = "Hello world/"; char *delim = " /"; dfsan_set_label(label, &s, sizeof(&s)); char *rv = strep(&s, delim); ``` If I get this right, the line ``` dfsan_set_label(label, &s, sizeof(&s)); ``` should have applied a taint to the `s` pointer. However, when inside `strsep`, if we check the `s_label`, it's `0`, not `label` ``` SANITIZER_INTERFACE_ATTRIBUTE char *__dfsw_strsep(char **s, const char *delim, dfsan_label s_label, dfsan_label delim_label, dfsan_label *ret_label) { fprintf(stderr, "s_label = %d\n", s_label); // -> this prints out "0" ``` If we do exactly the same trick with the `delim` pointer, the associated `delim_label` will be equal to `label`, as expected. It seems to me that either I am missing some important point here or the `s_label` is not propagated correctly by the compiler runtime for a double pointer to the user-provided handler (in my case `__dfsw_strsep`). I believe that strsep is currently the first function within dfsan_custom.cpp in which we operate on or check the label of the double pointer. I would like to ask if you maybe came across this behavior. It might be that I misunderstood how this should work and would be grateful for your help. Thank you CHANGES SINCE LAST ACTION https://reviews.llvm.org/D141389/new/ https://reviews.llvm.org/D141389 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits