On Sat, Jan 7, 2017 at 9:29 AM, Paul Eggert <egg...@cs.ucla.edu> wrote:
>
> Could you remind us about the latest status of your proposal compared to
> Zev Weiss's? Does <http://bugs.gnu.org/24689> contain the latest thing
> you have? Zev Weiss's latest version is at <https://github.com/zevweiss/grep>.
> Comparing the two was the thing Jim Meyering asked for at
> <http://bugs.gnu.org/22239#8>, and you can follow up by sending email to
> 22...@debbugs.gnu.org.
Yes, that GitHub link is the latest version; I haven't changed it since last September.

Basically, the main thread traverses the file tree and assigns each file to be searched to a worker thread. There is also a dynamic buffer, so that the output is identical to that of the original grep program.

I tested the program on a server. Compared with the original grep, grep -r was 4 times faster on a directory containing 4 files, 6 times faster on a directory containing 8 files, and 8.5 times faster on a directory containing 12 files.

I think the multithreaded case is essentially different from the sequential one, and grep will not use multithreading all the time. When it does not, i.e. when other options are passed to grep, more functions call the functions whose signatures we changed. This is hard to keep track of, because the program is fairly complicated. If C had overloading as C++ does, I would overload those functions. Since it doesn't, I made it very clear in the code which functions are the counterparts of the original versions. I did this to contain potential problems: if there is a bug in the multithreaded path, it does not affect the sequential program, whereas if we interleave the two scenarios we might lose track of what's going on. At least, that is what I initially thought.

I saw that there were some recent commits by Zev together with Jim, for example commit 9365ed6536d4fabf42ec17fef1bbe5d78884f950:

    * src/grep.c (compile_fp_t): Now returns an opaque pointer (the
    compiled pattern).
    (execute_fp_t): Now passed the pointer returned by a compile_fp_t.
    All call sites updated accordingly.
    (compiled_pattern): New static variable.
    * src/dfasearch.c (GEAcompile): Return a void pointer (dummy NULL).
    (EGexecute): Receive a void pointer argument (unused).
    * src/kwsearch.c (Fcompile): Return a void pointer (dummy NULL).
    (Fexecute): Receive a void pointer argument (unused).
    * src/pcresearch.c (Pcompile): Return a void pointer (dummy NULL).
    (Pexecute): Receive a void pointer argument (unused).
    * src/search.h: Update compile/execute function prototypes.

So we have different approaches. They are adding an extra pointer argument for the multithreaded case, and that pointer would be NULL when multithreading is not in effect. My approach instead replicates the functions, so that counterparts of the original functions are used in the multithreaded scenario. I did this to reduce the complexity of each function and make the program less monolithic.

I'll leave the decision to you.