[PATCH] D100161: Redistribute energy for Corpus

2021-07-02 Thread taotao gu via Phabricator via cfe-commits
gtt1995 added a comment.



  I redesigned the algorithm, ran a complete long-term evaluation myself, 
and got very good results. It performs very well with both -entropic=0 and 
-entropic=1, and -fork mode now performs better than parallel fuzzing mode.
  Please move to D105084, which has detailed data. Thanks a lot.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100161/new/

https://reviews.llvm.org/D100161

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100161: Redistribute energy for Corpus

2021-04-08 Thread taotao gu via Phabricator via cfe-commits
gtt1995 created this revision.
gtt1995 added reviewers: 01alchemist, 0b01.
gtt1995 requested review of this revision.
Herald added projects: clang, Sanitizers.
Herald added subscribers: Sanitizers, cfe-commits.

Divide the corpus into N parts according to size. Jobs take turns over the 
parts: job 1 executes the part with the smallest seeds, job 2 the next larger 
part, ..., job N the part with the largest seeds, and then the cycle repeats, 
i.e. successive jobs choose seeds from corpus 1, corpus 2, ..., corpus N, 
corpus 1, corpus 2, ..., corpus N.
That is, more energy is allocated to small seeds, common paths are triggered 
earlier, and small seeds are preferentially kept.
In my experiments, bugs were found much faster, coverage increased greatly 
(comparable to the improvement from entropic), and the newly generated 
interesting seeds were very small.
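The size-based split described above can be sketched in isolation. This is a hypothetical helper, not code from the patch or from libFuzzer: `GroupRange` and `GroupBounds` are illustrative names, the seeds are assumed to be sorted by ascending size already, and the group width reuses the patch's `Files.size()/NumCorpuses + 1`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper type: half-open index range [Begin, End) of one group.
struct GroupRange {
  size_t Begin;
  size_t End;
};

// Returns the index range of group `Group` (0-based) when `NumFiles` seeds,
// assumed sorted by ascending size, are split into `NumCorpuses` groups.
// The group width matches the patch: NumFiles / NumCorpuses + 1.
GroupRange GroupBounds(size_t NumFiles, size_t NumCorpuses, size_t Group) {
  assert(NumCorpuses > 0 && Group < NumCorpuses);
  size_t AverageSize = NumFiles / NumCorpuses + 1;
  // Clamp both ends so groups past the end of the corpus come out empty.
  size_t Begin = std::min(Group * AverageSize, NumFiles);
  size_t End = std::min(Begin + AverageSize, NumFiles);
  return {Begin, End};
}
```

With 10 seeds and `NumCorpuses=3` the groups are [0,4), [4,8) and [8,10); because of the `+ 1`, the last group is short, and when there are fewer seeds than groups the trailing groups come out empty.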


Files:
  clang/tools/clang-format/clang-format.py
  compiler-rt/lib/fuzzer/FuzzerDriver.cpp
  compiler-rt/lib/fuzzer/FuzzerFlags.def
  compiler-rt/lib/fuzzer/FuzzerFork.cpp
  compiler-rt/lib/fuzzer/FuzzerFork.h

Index: compiler-rt/lib/fuzzer/FuzzerFork.h
===
--- compiler-rt/lib/fuzzer/FuzzerFork.h
+++ compiler-rt/lib/fuzzer/FuzzerFork.h
@@ -18,7 +18,7 @@
 namespace fuzzer {
 void FuzzWithFork(Random &Rand, const FuzzingOptions &Options,
  const Vector<std::string> &Args,
-  const Vector<std::string> &CorpusDirs, int NumJobs);
+  const Vector<std::string> &CorpusDirs, int NumJobs, int NumCorpuses);
 } // namespace fuzzer
 
 #endif // LLVM_FUZZER_FORK_H
Index: compiler-rt/lib/fuzzer/FuzzerFork.cpp
===
--- compiler-rt/lib/fuzzer/FuzzerFork.cpp
+++ compiler-rt/lib/fuzzer/FuzzerFork.cpp
@@ -114,7 +114,7 @@
 .count();
   }
 
-  FuzzJob *CreateNewJob(size_t JobId) {
+  FuzzJob *CreateNewJob(size_t JobId, int NumCorpuses) {
 Command Cmd(Args);
 Cmd.removeFlag("fork");
 Cmd.removeFlag("runs");
@@ -135,11 +135,23 @@
 std::string Seeds;
 if (size_t CorpusSubsetSize =
 std::min(Files.size(), (size_t)sqrt(Files.size() + 2))) {
+  size_t AverageSize = Files.size()/NumCorpuses +1;
   auto Time1 = std::chrono::system_clock::now();
+  size_t StartIndex = ((JobId-1)%NumCorpuses) *  AverageSize;
+  printf("\n Job %zu Choose Corpus %zu ", JobId, (JobId - 1) % NumCorpuses);
   for (size_t i = 0; i < CorpusSubsetSize; i++) {
-auto &SF = Files[Rand->SkewTowardsLast(Files.size())];
-Seeds += (Seeds.empty() ? "" : ",") + SF;
-CollectDFT(SF);
+size_t j = Rand->SkewTowardsLast(AverageSize);
+size_t m = j + StartIndex;
+if (m < Files.size()) {
+auto &SF = Files[m];
+Seeds += (Seeds.empty() ? "" : ",") + SF;
+CollectDFT(SF);
+}
+else  {
+auto &SF = Files[Rand->SkewTowardsLast(Files.size())];
+Seeds += (Seeds.empty() ? "" : ",") + SF;
+CollectDFT(SF);
+}
   }
   auto Time2 = std::chrono::system_clock::now();
  auto DftTimeInSeconds = duration_cast<seconds>(Time2 - Time1).count();
@@ -284,7 +296,7 @@
 // This is just a skeleton of an experimental -fork=1 feature.
 void FuzzWithFork(Random &Rand, const FuzzingOptions &Options,
  const Vector<std::string> &Args,
-  const Vector<std::string> &CorpusDirs, int NumJobs) {
+  const Vector<std::string> &CorpusDirs, int NumJobs, int NumCorpuses) {
   Printf("INFO: -fork=%d: fuzzing in separate process(s)\n", NumJobs);
 
   GlobalEnv Env;
@@ -341,8 +353,9 @@
  Vector<std::thread> Threads;
   for (int t = 0; t < NumJobs; t++) {
 Threads.push_back(std::thread(WorkerThread, &FuzzQ, &MergeQ));
-FuzzQ.Push(Env.CreateNewJob(JobId++));
+FuzzQ.Push(Env.CreateNewJob(JobId++, NumCorpuses));
   }
+  //printf("\n Created %d jobs\n",NumJobs);
 
   while (true) {
 std::unique_ptr<FuzzJob> Job(MergeQ.Pop());
@@ -399,7 +412,7 @@
   break;
 }
 
-FuzzQ.Push(Env.CreateNewJob(JobId++));
+FuzzQ.Push(Env.CreateNewJob(JobId++, NumCorpuses));
   }
 
   for (auto &T : Threads)
Index: compiler-rt/lib/fuzzer/FuzzerFlags.def
===
--- compiler-rt/lib/fuzzer/FuzzerFlags.def
+++ compiler-rt/lib/fuzzer/FuzzerFlags.def
@@ -56,6 +56,7 @@
 FUZZER_FLAG_INT(max_total_time, 0, "If positive, indicates the maximal total "
"time in seconds to run the fuzzer.")
 FUZZER_FLAG_INT(help, 0, "Print help.")
+FUZZER_FLAG_INT(NumCorpuses, 1, "Divide the corpus into N parts according to size.")
 FUZZER_FLAG_INT(fork, 0, "Experimental mode where fuzzing happens "
 "in a subprocess")
 FUZZER_FLAG_INT(ignore_timeouts, 1, "Ignore timeouts in fork mode")
Index: compiler-rt/lib/fuzzer/FuzzerDriver
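The seed-selection step that the patch adds to `CreateNewJob` can be modeled outside libFuzzer. This is a hypothetical, self-contained sketch: `SkewTowardsLast` here is a stand-in modeled on libFuzzer's `Random::SkewTowardsLast` (integer square root of a uniform draw from [0, n*n)), not the real class, and `PickSeedIndex` is an illustrative name. It keeps the patch's group index `(JobId - 1) % NumCorpuses` and its fallback to the whole corpus.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>

// Hypothetical stand-in for libFuzzer's Random::SkewTowardsLast(n): draws T
// uniformly from [0, n*n) and returns floor(sqrt(T)), a value in [0, n) that
// favors larger indices. Assumes n*n does not overflow size_t.
size_t SkewTowardsLast(std::mt19937_64 &Rng, size_t N) {
  assert(N > 0);
  std::uniform_int_distribution<size_t> Dist(0, N * N - 1);
  size_t Res = static_cast<size_t>(std::sqrt(static_cast<double>(Dist(Rng))));
  return Res < N ? Res : N - 1; // guard against floating-point rounding
}

// Picks one seed index for job `JobId` (1-based), following the patch: a
// skewed offset inside the job's group, falling back to a skewed pick over
// the whole corpus when the index runs past the end of the file list.
size_t PickSeedIndex(std::mt19937_64 &Rng, size_t NumFiles, size_t JobId,
                     size_t NumCorpuses) {
  assert(NumFiles > 0 && JobId > 0 && NumCorpuses > 0);
  size_t AverageSize = NumFiles / NumCorpuses + 1;
  size_t StartIndex = ((JobId - 1) % NumCorpuses) * AverageSize;
  size_t M = StartIndex + SkewTowardsLast(Rng, AverageSize);
  if (M < NumFiles)
    return M;
  return SkewTowardsLast(Rng, NumFiles);
}
```

Because the fallback draws from the whole corpus, a job whose group is empty (more groups than seeds), or whose skewed index overshoots the end of the list, behaves like unpatched fork mode for that draw.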

[PATCH] D100161: Redistribute energy for Corpus

2021-04-10 Thread taotao gu via Phabricator via cfe-commits
gtt1995 added a comment.

In D100161#2679922, @morehouse wrote:

> Thanks for the patch!  Would you mind sharing the experimental data/results 
> you obtained for this patch?
>
> Additionally, could you submit this patch to FuzzBench for an independent 
> evaluation?
>
> Thanks,
> Matt

Hello, how can I share the experiment data?
It seems that FuzzBench does not accept evaluations in this parallel mode.


[PATCH] D100161: Redistribute energy for Corpus

2021-04-10 Thread taotao gu via Phabricator via cfe-commits
gtt1995 added a comment.

F16230733: experiment.tar 

This is part of the raw data; the fuzz targets come from OSS-Fuzz projects.


[PATCH] D100161: Redistribute energy for Corpus

2021-04-10 Thread taotao gu via Phabricator via cfe-commits
gtt1995 added a comment.

In D100161#2681045, @gtt1995 wrote:

> F16230733: experiment.tar 
>
> This is part of raw data , the object from oss-fuzz project.

The data for the patched version is in the "average" directory; the data in 
the "libfuzzer" directory is from the original libFuzzer.
Thanks.


[PATCH] D100161: Redistribute energy for Corpus

2021-04-12 Thread taotao gu via Phabricator via cfe-commits
gtt1995 added a comment.

In D100161#2683632, @morehouse wrote:

> Also, the description states:
>
>> Divide the corpus into n parts according to size.
>
> Is it really according to size?  IIUC when there are multiple worker 
> processes, any new coverage they have simply gets appended to `Files`.  So 
> `Files` is not necessarily sorted by size.

Yes. Grouping the corpus has two advantages: it reduces the repetitive work 
done by each process, and small seeds trigger the same paths earlier. 
Triggering new paths earlier is the key reason for the efficiency gain, so 
`Files` should be sorted by size.


[PATCH] D100161: Redistribute energy for Corpus

2021-04-12 Thread taotao gu via Phabricator via cfe-commits
gtt1995 added a comment.

In D100161#2683622, @morehouse wrote:

> Thanks for sharing your data.  Took a quick look and seems promising.
>
> I would like to try this on FuzzBench before accepting the patch though.  
> FuzzBench has a very nice experimental framework for evaluating changes like 
> this.
>
>> It seems that FuzzBench does not accept this parallel mode evaluation.
>
> I talked to @metzman who manages FuzzBench.  Sounds like you're correct, 
> FuzzBench uses only one worker process in fork mode.  @metzman said we could 
> probably run a special experiment with more workers to evaluate this patch.
>
> Another approach that might be worth doing, is to make the patch effective 
> even for a single worker.  For example, maybe we randomly pick from a subset 
> of the corpus for that single worker.
>
> Also, I'm curious how the number of fork-mode workers affects efficacy.  I 
> can imagine with lots of workers that this patch could perform much worse.  
> Specifically if we have a small number of corpus elements per wOorker, the 
> crossover mutation becomes quite limited.

OK, thanks for your work.
Here are some tips to consider:

1. On some targets, the effect of distributing energy by corpus group may be 
similar to that of entropic (which distributes energy to individual seeds), but 
there are also differences. So you should set -entropic=0 when you evaluate 
them. Of course, enabling -entropic and -NumCorpuses together also has some 
effect. If you are interested, you can test four configurations:

(1) -entropic=0, -NumCorpuses=1
(2) -entropic=1, -NumCorpuses=1
(3) -entropic=0, -NumCorpuses=N (I set 30; other values are also possible. I 
think N should depend on the total number of seeds and change dynamically.)
(4) -entropic=1, -NumCorpuses=30

2. I set -fork=30, -NumCorpuses=30, -entropic=0 in my evaluation. However, the 
-fork value does not need to equal the -NumCorpuses value, because the jobs 
cycle through the corpus groups in turn, from small to large.
3. Following from point 2, it should also work in single-core mode, where a 
single process executes the corpus groups in turn from small to large. However, 
for in-process libFuzzer, frequent interaction with the file system adds 
overhead. Therefore, this energy scheduling is still best suited to parallel 
fuzzing, where each child process keeps the same coverage bitmap up to date.
4. The degree of parallelism depends on how soon you want results. I think 
more groups, with -fork equal to -NumCorpuses, will work much better: this can 
be regarded as traversing all corpus groups in one round. Otherwise, the groups 
toward the back will not be fully tested, because the merged results of earlier 
jobs are written back to the file system (for example, when a job on a small 
group produces a large seed) and are drawn again, which has some negative 
effects.
5. Sorry, I haven't tried it with -workers.
6. Could you share the official results with me?
7. Thanks again for your work!
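The round-robin behind point 4 follows from the patch's group index `(JobId - 1) % NumCorpuses`. The hypothetical sketch below only counts how many distinct groups the first `NumJobs` jobs draw from (in `FuzzWithFork`, one job is pushed per worker thread before the main loop); it ignores job timing and merging, and `GroupForJob` and `GroupsInFirstWave` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Group index the patch uses for a 1-based job id.
size_t GroupForJob(size_t JobId, size_t NumCorpuses) {
  assert(JobId > 0 && NumCorpuses > 0);
  return (JobId - 1) % NumCorpuses;
}

// Number of distinct groups touched by jobs 1..NumJobs, i.e. by the first
// wave of jobs created in fork mode.
size_t GroupsInFirstWave(size_t NumJobs, size_t NumCorpuses) {
  std::set<size_t> Groups;
  for (size_t JobId = 1; JobId <= NumJobs; ++JobId)
    Groups.insert(GroupForJob(JobId, NumCorpuses));
  return Groups.size();
}
```

For example, `GroupsInFirstWave(30, 30)` is 30, while `GroupsInFirstWave(10, 30)` is 10: with `-fork=10` and `-NumCorpuses=30`, groups 10 through 29 are reached only by later jobs.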


[PATCH] D100161: Redistribute energy for Corpus

2021-04-12 Thread taotao gu via Phabricator via cfe-commits
gtt1995 added a comment.

In D100161#2683632, @morehouse wrote:

> Also, the description states:
>
>> Divide the corpus into n parts according to size.
>
> Is it really according to size?  IIUC when there are multiple worker 
> processes, any new coverage they have simply gets appended to `Files`.  So 
> `Files` is not necessarily sorted by size.

1. Sorry, I would like to know whether `Files` is necessarily sorted by size in 
the trunk version, and whether that differs from 
https://github.com/Dor1s/libfuzzer-workshop. The data I shared was collected 
with https://github.com/Dor1s/libfuzzer-workshop; the new evaluation data was 
also collected on the trunk version, and it seems promising.

2. I have a question. When the jobs are created for the first time, `Files` is 
sorted strictly by size. Newly covered seeds are then appended to `Files`, which 
means later seeds are not necessarily in size order, so my theory may not hold. 
Yet the experimental data show that the average seed size in the main corpus is 
very small and coverage is higher. This method may, in some sense, give small 
seeds more energy, but I cannot explain this phenomenon. Can you help me?
3. Wouldn't it be better if `Files` were always sorted strictly by size? Corpus 
grouping is equivalent to adding N more positions from which seeds are drawn 
across the entire corpus, so Rand->SkewTowardsLast(Files.size()), which tends 
to pick newly added seeds, may not be appropriate.
4. How can I keep `Files` strictly sorted by seed size? I only started working 
with the libFuzzer code two weeks ago and am not very familiar with it.
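Points 3 and 4 ask how `Files` could stay sorted by size. One option is to insert each new seed at its sorted position instead of appending it. This is a hypothetical sketch: as the diff shows, `Files` holds file names, so the `SizedFile` struct with a cached byte size is an assumption made here, and `InsertSortedBySize` is not a libFuzzer function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record pairing a seed path with its size in bytes.
struct SizedFile {
  std::string Path;
  size_t Size;
};

// Returns true if `Files` is ordered by non-decreasing size.
bool IsSortedBySize(const std::vector<SizedFile> &Files) {
  return std::is_sorted(Files.begin(), Files.end(),
                        [](const SizedFile &A, const SizedFile &B) {
                          return A.Size < B.Size;
                        });
}

// Inserts `F` so that `Files` stays sorted by ascending size. upper_bound
// places it after existing seeds of equal size, so ties keep arrival order.
void InsertSortedBySize(std::vector<SizedFile> &Files, SizedFile F) {
  assert(IsSortedBySize(Files) && "precondition: Files is already sorted");
  auto Pos = std::upper_bound(
      Files.begin(), Files.end(), F.Size,
      [](size_t S, const SizedFile &Elem) { return S < Elem.Size; });
  Files.insert(Pos, std::move(F));
}
```

Each insertion is O(n) because of the vector shift. Note also that with a size-sorted list, `SkewTowardsLast(Files.size())` would favor the largest seeds instead of the most recently added ones.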


[PATCH] D100161: Redistribute energy for Corpus

2021-04-13 Thread taotao gu via Phabricator via cfe-commits
gtt1995 added a comment.

If `Files` is not sorted by size, simply grouping the corpus has an effect 
similar to entropic.


[PATCH] D100161: Redistribute energy for Corpus

2021-04-13 Thread taotao gu via Phabricator via cfe-commits
gtt1995 added a comment.

Maybe a uniform-random approach would change the efficacy!


[PATCH] D100161: Redistribute energy for Corpus

2021-04-13 Thread taotao gu via Phabricator via cfe-commits
gtt1995 added a comment.

In D100161#2686130, @morehouse wrote:

> If the effect is similar to entropic, why do we need this patch as well?

They only share some similarities; the results are better with the patch 
applied.


[PATCH] D100161: Redistribute energy for Corpus

2021-04-14 Thread taotao gu via Phabricator via cfe-commits
gtt1995 added a comment.

Hello.
Due to the time zone difference, I think our communication is a bit 
inefficient. Could we arrange a time that is convenient for you to focus on the 
discussion?
We are in CST.
Thanks.


[PATCH] D100161: Redistribute energy for Corpus

2021-04-14 Thread taotao gu via Phabricator via cfe-commits
gtt1995 added a comment.

In D100161#2689156, @morehouse wrote:

> At this point I am not convinced this patch will provide benefit for the 
> default use case when `-entropic=1`.  I am hesitant to add complexity to the 
> code for unsure benefit.
>
> If you request a FuzzBench experiment to get some data on this, and the 
> results look good, then I'll be willing to invest more time into reviewing 
> this patch.
>
> Please CC me on the FuzzBench pull request, so I can make sure we are 
> evaluating this properly.

Hello,
FuzzBench does not accept parallel-mode testing.
I will share my complete experiment data in the future.

