[PATCH] D106833: [dfsan][NFC] Add compile flags and environment variables to doc

2021-07-26 Thread George Balatsouras via Phabricator via cfe-commits
gbalats accepted this revision.
gbalats added inline comments.
This revision is now accepted and ready to land.



Comment at: clang/docs/DataFlowSanitizer.rst:153
+If the flag is true, the label of ``v`` is the union of the label of ``p`` and
+the label of ``*p``. If the flag is false, the label of ``v`` is the label of
+``*p``.





Comment at: clang/docs/DataFlowSanitizer.rst:163
+If the flag is true, the label of ``*p`` is the union of the label of ``p`` and
+the label of ``v``. If the flag is false, the label of ``*p`` is the label of
+``v``.




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106833/new/

https://reviews.llvm.org/D106833

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D106903: [dfsan][NFC] Describe how origin trace tracking works

2021-07-27 Thread George Balatsouras via Phabricator via cfe-commits
gbalats added inline comments.



Comment at: clang/docs/DataFlowSanitizerDesign.rst:142
+Every four 4-bytes aligned application bytes share a 4-byte origin value. A
+4-byte origin contains a 4-bit depth and a 28-bit hash ID of a chain.
+

What is a chain? It's not yet explained at this point.



Comment at: clang/docs/DataFlowSanitizerDesign.rst:152
+
+A chain starts by `dfsan_set_label` with non-zero labels. A new chain is added
+at stores or memory-transfer when ``-dfsan-track-origins`` is 1. Memory 
transfers

Do you mean that the chain is extended?



Comment at: clang/docs/DataFlowSanitizerDesign.rst:155
+include LLVM memory transfer instructions and wrapped glibc memcpy and memmove.
+When ``-dfsan-track-origins`` is 2, a new chain is also added at loads.
+

same here



Comment at: clang/docs/DataFlowSanitizerDesign.rst:157
+
+Other instructions do not create new chains, but simply propagate origin 
values.
+If an instruction has more than one operands with non-zero labels, the origin

I'm not sure I understand the definition of a chain. What are the links? When 
you say, "create a new chain" you mean adding a new link to an existing chain? 
I think you might be using this expression to reflect the implementation (e.g., 
chain has a pointer to its tail) but it doesn't help when explaining.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106903/new/

https://reviews.llvm.org/D106903

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D103745: [dfsan] Add full fast8 support

2021-06-07 Thread George Balatsouras via Phabricator via cfe-commits
gbalats marked an inline comment as done.
gbalats added inline comments.



Comment at: clang/docs/DataFlowSanitizer.rst:172-178
 assert(ij_label == 3);  // Verifies all of the above
 
+// Or, equivalently:
+assert(dfsan_has_label(ij_label, i_label));
+assert(dfsan_has_label(ij_label, j_label));
+assert(!dfsan_has_label(ij_label, k_label));
+

stephan.yichao.zhao wrote:
> If we swap assert(ij_label == 3) with the 3 dfsan_has_label, the two 
> equivalent blocks are close to each other.
I think this way is better to demonstrate the differences:
- the first block exposes dfsan internals (the integer representation) and 
makes explicit statements about label values
- the second block uses dfsan_has_label to abstract the internals, not exposing 
the integer representation (or the fact that labels are OR'd).



Comment at: clang/docs/DataFlowSanitizerDesign.rst:60
 
-As stated above, the tool must track a large number of taint
-labels. This poses an implementation challenge, as most multiple-label
-tainting systems assign one label per bit to shadow storage, and
-union taint labels using a bitwise or operation. This will not scale
-to clients which use hundreds or thousands of taint labels, as the
-label union operation becomes O(n) in the number of supported labels,
-and data associated with it will quickly dominate the live variable
-set, causing register spills and hampering performance.
-
-Instead, a low overhead approach is proposed which is best-case O(log\
-:sub:`2` n) during execution. The underlying assumption is that
-the required space of label unions is sparse, which is a reasonable
-assumption to make given that we are optimizing for the case where
-applications mostly copy data from one place to another, without often
-invoking the need for an actual union operation. The representation
-of a taint label is a 16-bit integer, and new labels are allocated
-sequentially from a pool. The label identifier 0 is special, and means
-that the data item is unlabelled.
-
-When a label union operation is requested at a join point (any
-arithmetic or logical operation with two or more operands, such as
-addition), the code checks whether a union is required, whether the
-same union has been requested before, and whether one union label
-subsumes the other. If so, it returns the previously allocated union
-label. If not, it allocates a new union label from the same pool used
-for new labels.
-
-Specifically, the instrumentation pass will insert code like this
-to decide the union label ``lu`` for a pair of labels ``l1``
-and ``l2``:
-
-.. code-block:: c
-
-  if (l1 == l2)
-lu = l1;
-  else
-lu = __dfsan_union(l1, l2);
-
-The equality comparison is outlined, to provide an early exit in
-the common cases where the program is processing unlabelled data, or
-where the two data items have the same label.  ``__dfsan_union`` is
-a runtime library function which performs all other union computation.
+We use an 8-bit unsigned integers for the representation of a
+label. The label identifier 0 is special, and means that the data item

stephan.yichao.zhao wrote:
> integer
Done.



Comment at: clang/docs/DataFlowSanitizerDesign.rst:65
+join point (any arithmetic or logical operation with two or more
+operands, such as addition), we can simply OR the two labels in O(1).
 

stephan.yichao.zhao wrote:
> the labels, and each OR is in O(1).
Not sure how this changes the meaning, since the two labels would need one OR 
instruction.



Comment at: clang/docs/DataFlowSanitizerDesign.rst:68
+Users are responsible for managing the 8 integer labels (i.e., keeping
+track of what labels they have used so far, pick one that is yet
+unused, etc).

stephan.yichao.zhao wrote:
> picking
Done.



Comment at: clang/docs/DataFlowSanitizerDesign.rst:74
 
 The following is the current memory layout for Linux/x86\_64:
 

stephan.yichao.zhao wrote:
> memory layout
Done.



Comment at: clang/docs/DataFlowSanitizerDesign.rst:99
 associated directly with registers.  Loads will result in a union of
-all shadow labels corresponding to bytes loaded (which most of the
-time will be short circuited by the initial comparison) and stores will
-result in a copy of the label to the shadow of all bytes stored to.
+all shadow labels corresponding to bytes loaded and stores will result
+in a copy of the label to the shadow of all bytes stored to.

stephan.yichao.zhao wrote:
> , and
Done



Comment at: clang/docs/DataFlowSanitizerDesign.rst:100
+all shadow labels corresponding to bytes loaded and stores will result
+in a copy of the label to the shadow of all bytes stored to.
 

stephan.yichao.zhao wrote:
> the label of a stored value
Done.



Comment at: compiler-rt/lib/dfsan/dfsan.cpp:209
 
-// Like 

[PATCH] D103745: [dfsan] Add full fast8 support

2021-06-07 Thread George Balatsouras via Phabricator via cfe-commits
gbalats added inline comments.



Comment at: llvm/test/Instrumentation/DataFlowSanitizer/external_mask.ll:6
 define i32 @test(i32 %a, i32* nocapture readonly %b) #0 {
-; CHECK: @"dfs$test"
-; CHECK: %[[RV:.*]] load{{.*}}__dfsan_shadow_ptr_mask
-; CHECK: ptrtoint i32* {{.*}} to i64
-; CHECK: and {{.*}}%[[RV:.*]]
-; CHECK16: mul i64
+  ; CHECK: @"dfs$test"
+  ; CHECK: %[[RV:.*]] load{{.*}}__dfsan_shadow_ptr_mask

browneee wrote:
> Fix whitespace
Acknowledged. Not a tab; just how phabricator indicates whitespace changes.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103745/new/

https://reviews.llvm.org/D103745

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D104494: [dfsan] Replace dfs$ prefix with .dfsan suffix

2021-06-17 Thread George Balatsouras via Phabricator via cfe-commits
gbalats marked an inline comment as done.
gbalats added inline comments.



Comment at: llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp:1134
+  Asm.replace(Pos, 1, Suffix + "@");
+}
 GV->getParent()->setModuleInlineAsm(Asm);

stephan.yichao.zhao wrote:
> Based on http://web.mit.edu/rhel-doc/3/rhel-as-en-3/symver.html, there must 
> be a @ in the .symver line after the first match.
> Please change  Pos != std::string::npos to be like
> ```
> Pos = Asm.find("@", Pos);
> assert(Pos != std::string::npos);
> ```
Done. Used `report_fatal_error` instead of `assert` since this can be triggered 
by user input. Ideally, I think we should be using some recoverable error 
mechanism, but since we're not doing it elsewhere, I'm not going to introduce 
this with this change.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104494/new/

https://reviews.llvm.org/D104494

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D104494: [dfsan] Replace dfs$ prefix with .dfsan suffix

2021-06-18 Thread George Balatsouras via Phabricator via cfe-commits
gbalats added a comment.

In D104494#2827639 , @vitalybuka 
wrote:

> https://lab.llvm.org/buildbot/#/builders/37/builds/4620 looks broken by this 
> patch

Looking into this. I missed (due to the "\\" in the expression) updating this:
https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/fuzzer/dataflow/DataFlow.cpp#L93-L94

I will try to get `ninja check-fuzzer` working again and either do a small fix 
forward or revert (if it turns out to be more complicated than just updating 
this expression).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104494/new/

https://reviews.llvm.org/D104494

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D104494: [dfsan] Replace dfs$ prefix with .dfsan suffix

2021-06-18 Thread George Balatsouras via Phabricator via cfe-commits
gbalats added a comment.

In D104494#2827708 , @gbalats wrote:

> In D104494#2827639 , @vitalybuka 
> wrote:
>
>> https://lab.llvm.org/buildbot/#/builders/37/builds/4620 looks broken by this 
>> patch
>
> Looking into this. I missed (due to the "\\" in the expression) updating this:
> https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/fuzzer/dataflow/DataFlow.cpp#L93-L94
>
> I will try to get `ninja check-fuzzer` working again and either do a small 
> fix forward or revert (if it turns out to be more complicated than just 
> updating this expression).

https://reviews.llvm.org/D104568


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104494/new/

https://reviews.llvm.org/D104494

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits