Paul Smith-20 wrote:
> On Sat, 2017-03-18 at 22:49 -0700, brenorg wrote:
>> > I'd prefer to investigate improving the existing parser, rather than
>> > create a completely separate parser path.
>>
>> I could agree if the difference were 10x or more. But I believe 2x is a
>> reasonable gain from removing so many features. From what I saw in the
>> code, the main hassle comes from target-specific variable assignment.
>
> The thing is, the parser has to walk through all the characters on each
> line at least once. It can know, after that walk, whether there's
> anything special about a given word or complete line. There's no reason
> the parser should have to do a lot of extra work if it already knows
> that there isn't anything interesting here to work on.
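For concreteness, a single-pass check along those lines might look roughly
like the sketch below. This is illustrative only, not make's actual reader
code, and the names (line_info, classify_line, line_is_simple) are invented
for the example:

/* Illustrative sketch only -- not GNU Make's actual parser code.
 * Idea: during the one pass the reader already makes over each line,
 * record whether the line contains anything that needs the full
 * expansion/parsing machinery, so later stages can take a fast path
 * on plain "target: prereq prereq ..." lines like those in .d files. */
#include <stdbool.h>
#include <string.h>

struct line_info {
  bool has_variable_ref;   /* contains '$'                                  */
  bool has_semicolon;      /* inline recipe after the rule                  */
  bool has_equals;         /* could be an assignment (incl. target-specific)*/
  int  colon_count;        /* number of ':' separators                      */
};

/* One pass over the line, filling in the flags above. */
static void classify_line (const char *line, struct line_info *info)
{
  memset (info, 0, sizeof *info);
  for (const char *p = line; *p != '\0'; ++p)
    switch (*p)
      {
      case '$': info->has_variable_ref = true; break;
      case ';': info->has_semicolon = true;    break;
      case '=': info->has_equals = true;       break;
      case ':': info->colon_count++;           break;
      }
}

/* A line is "simple" if it is just targets, one ':', and literal
 * prerequisites: nothing to expand, no assignment, no recipe. */
static bool line_is_simple (const struct line_info *info)
{
  return !info->has_variable_ref
         && !info->has_semicolon
         && !info->has_equals
         && info->colon_count == 1;
}

With a flag like that attached to each line, the plain dependency lines that
dominate generated .d files could skip the expansion machinery entirely.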
Yes, that was the hope I had before seeing the code. Unfortunately, the
code is not well structured enough to make this optimization simple to
implement. That's why I followed the "simpler scanner" path.

Paul Smith-20 wrote:
>> So to sum up:
>> 0 - I will get back with results for a newer version.
>> 1 - How crazy would it be to make it multi-threaded?
>
> The question we have to ask is what is the upside. If your disk is
> slow, then you're waiting for your disk: a HD has only one head, so you
> can only read one file at a time from the disk no matter how many
> threads you have. Waiting for MORE disk IO isn't going to speed things
> up appreciably if the time spent actually parsing files is small
> compared to the wait time for more content to parse.
>
> If the parse time is roughly equal to the disk IO time, then you might
> get some benefit from having some amount of lookahead, either by async
> IO or one extra thread.
>
> The question is, do you REALLY get performance gains for this added
> complexity? I'm not convinced it's a no-brainer. I'd need to see some
> analysis showing exactly where the time is going during the parsing.

I don't think disk plays much into this. If the OS file cache is hot,
most of the time should be spent in the parser - and that is what I see.
I ran perf on the actual code parsing a large number of files, and 80% of
the time goes to eval_makefile/eval.

Paul Smith-20 wrote:
>> 2 - This should be configurable with a very strong disclaimer. The
>> alternative scanner wouldn't do any sanity checks, so it could be
>> dangerous.
>> 3 - Another option could involve creating a separate tool to collect a
>> bunch of "simple files" and pre-process them into a compact database.
>> That resulting file could then be read into the makefile. By doing
>> that, Make would have to understand this internal compact database
>> format. Still, it would probably need a lot of code, even more than
>> the simple scanner.
>
> It's quite possible something like this could be done via an extension,
> either Guile or a shared library, that maintained a database. To make
> it really efficient we'd need a new API that allowed extensions to
> define new rules, or at least prerequisite definitions, but even
> without that, condensing the values to a single instance (as you've
> discovered) could be helpful.
>
> I mean something like defining a new function that would parse a .d
> file and add content into some kind of database.

I love the idea. A generic callback API would be nice and easy to
support. I don't know much about Guile; I will take a look at it.

Next steps are to see how far "condensing" the values takes me, and I'll
get back here if I think we can do better.
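For reference, a loaded object along the lines Paul describes could look
roughly like the sketch below, written against the documented gnumake.h
plugin API. The object name deps_db.so, the function name eval-deps-file,
and the trivial "just call gmk_eval" body are placeholder assumptions for
illustration, not an existing feature; a real version would do the
condensing or database work discussed above instead.

/* deps_db.c -- minimal sketch of a make loaded object.
 *
 * Build (roughly):  gcc -shared -fPIC -o deps_db.so deps_db.c
 * Use in a makefile:
 *     load ./deps_db.so
 *     $(eval-deps-file objs/foo.d)
 */
#include <stdio.h>
#include <gnumake.h>

/* Required by make for every loadable object. */
int plugin_is_GPL_compatible;

/* $(eval-deps-file FILE): read FILE and hand its text to make's
 * evaluator.  A real version of the idea in this thread would parse
 * the .d file itself and intern/condense prerequisite strings (or
 * store them in some database) instead of just calling gmk_eval.  */
static char *
eval_deps_file (const char *name, unsigned int argc, char **argv)
{
  (void) name;
  (void) argc;

  FILE *fp = fopen (argv[0], "rb");
  if (fp == NULL)
    return NULL;                      /* expand to nothing on failure */

  fseek (fp, 0, SEEK_END);
  long size = ftell (fp);
  if (size < 0)
    {
      fclose (fp);
      return NULL;
    }
  rewind (fp);

  char *buf = gmk_alloc (size + 1);   /* make-managed allocation */
  size_t got = fread (buf, 1, size, fp);
  buf[got] = '\0';
  fclose (fp);

  gmk_eval (buf, NULL);               /* let make parse the rules */
  gmk_free (buf);
  return NULL;                        /* the function expands to "" */
}

/* Setup hook, called when the object is loaded.  (Older make releases
 * call this with no argument; don't rely on flocp there.)  */
int
deps_db_gmk_setup (const gmk_floc *flocp)
{
  (void) flocp;
  /* name, handler, min args, max args, flags (0 = expand arguments) */
  gmk_add_function ("eval-deps-file", eval_deps_file, 1, 1, 0);
  return 1;
}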