Paul Smith-20 wrote:
> On Sat, 2017-03-18 at 22:49 -0700, brenorg wrote:
>> > I'd prefer to investigate improving the existing parser, rather than
>> > create a completely separate parser path.
>>
>> I could agree if the difference were 10x or more. But I believe 2x is a
>> reasonable gain from removing so many features. From what I saw in the
>> code, the main hassle comes from target-specific variable assignment.
>
> The thing is, the parser has to walk through all the characters on each
> line at least once. It can know, after that walk, whether there's
> anything special about a given word or complete line. There's no reason
> the parser should have to do a lot of extra work if it already knows
> that there isn't anything interesting here to work on.
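For concreteness, a single-pass check along those lines might look roughly
like the sketch below. This is illustrative only, not make's actual reader
code, and the names (line_info, classify_line, line_is_simple) are invented
for the example:

/* Illustrative sketch only -- not GNU Make's actual parser code.
 * Idea: during the one pass the reader already makes over each line,
 * record whether the line contains anything that needs the full
 * expansion/parsing machinery, so later stages can take a fast path
 * on plain "target: prereq prereq ..." lines like those in .d files. */
#include <stdbool.h>
#include <string.h>

struct line_info {
  bool has_variable_ref;   /* contains '$'                                  */
  bool has_semicolon;      /* inline recipe after the rule                  */
  bool has_equals;         /* could be an assignment (incl. target-specific)*/
  int  colon_count;        /* number of ':' separators                      */
};

/* One pass over the line, filling in the flags above. */
static void classify_line (const char *line, struct line_info *info)
{
  memset (info, 0, sizeof *info);
  for (const char *p = line; *p != '\0'; ++p)
    switch (*p)
      {
      case '$': info->has_variable_ref = true; break;
      case ';': info->has_semicolon = true;    break;
      case '=': info->has_equals = true;       break;
      case ':': info->colon_count++;           break;
      }
}

/* A line is "simple" if it is just targets, one ':', and literal
 * prerequisites: nothing to expand, no assignment, no recipe. */
static bool line_is_simple (const struct line_info *info)
{
  return !info->has_variable_ref
         && !info->has_semicolon
         && !info->has_equals
         && info->colon_count == 1;
}

With a flag like that attached to each line, the plain dependency lines that
dominate generated .d files could skip the expansion machinery entirely.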
Yes, that was the hope I had before seeing the code. Unfortunately, the
code is not well structured enough to make this optimization simple to
implement. That's why I followed the "simpler scanner" path.

Paul Smith-20 wrote:
>> So to sum up:
>> 0 - I will get back with results for a newer version.
>> 1 - How crazy would it be to make it multi-threaded?
>
> The question we have to ask is what is the upside. If your disk is
> slow, then you're waiting for your disk: a HD has only one head, so you
> can only read one file at a time from the disk no matter how many
> threads you have. Waiting for MORE disk IO isn't going to speed things
> up appreciably if the time spent actually parsing files is small
> compared to the wait time for more content to parse.
>
> If the parse time is roughly equal to the disk IO time, then you might
> get some benefit from having some amount of lookahead, either by async
> IO or one extra thread.
>
> The question is, do you REALLY get performance gains for this added
> complexity? I'm not convinced it's a no-brainer. I'd need to see some
> analysis showing exactly where the time is going during the parsing.

I don't think disk plays much into this. If the OS file cache is hot,
most of the time should be spent in the parser - and that is what I see.
I ran perf on the actual code parsing a large number of files, and 80% of
the time goes to eval_makefile/eval.

Paul Smith-20 wrote:
>> 2 - This should be configurable with a very strong disclaimer. The
>> alternative scanner wouldn't do any sanity checks, so it could be
>> dangerous.
>> 3 - Another option could involve creating a separate tool to collect a
>> bunch of "simple files" and pre-process them into a compact database.
>> That resulting file could then be read into the makefile. By doing
>> that, Make would have to understand this internal compact database
>> format. Still, it would probably need a lot of code, even more than
>> the simple scanner.
>
> It's quite possible something like this could be done via an extension,
> either Guile or a shared library, that maintained a database. To make
> it really efficient we'd need a new API that allowed extensions to
> define new rules, or at least prerequisite definitions, but even
> without that, condensing the values to a single instance (as you've
> discovered) could be helpful.
>
> I mean something like defining a new function that would parse a .d
> file and add content into some kind of database.

I love the idea. A generic callback API would be nice and easy to
support. I don't know much about Guile; I will take a look at it.

Next steps are to see how far "condensing" the values takes me, and I'll
get back here if I think we can do better.
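For reference, a loaded object along the lines Paul describes could look
roughly like the sketch below, written against the documented gnumake.h
plugin API. The object name deps_db.so, the function name eval-deps-file,
and the trivial "just call gmk_eval" body are placeholder assumptions for
illustration, not an existing feature; a real version would do the
condensing or database work discussed above instead.

/* deps_db.c -- minimal sketch of a make loaded object.
 *
 * Build (roughly):  gcc -shared -fPIC -o deps_db.so deps_db.c
 * Use in a makefile:
 *     load ./deps_db.so
 *     $(eval-deps-file objs/foo.d)
 */
#include <stdio.h>
#include <gnumake.h>

/* Required by make for every loadable object. */
int plugin_is_GPL_compatible;

/* $(eval-deps-file FILE): read FILE and hand its text to make's
 * evaluator.  A real version of the idea in this thread would parse
 * the .d file itself and intern/condense prerequisite strings (or
 * store them in some database) instead of just calling gmk_eval.  */
static char *
eval_deps_file (const char *name, unsigned int argc, char **argv)
{
  (void) name;
  (void) argc;

  FILE *fp = fopen (argv[0], "rb");
  if (fp == NULL)
    return NULL;                      /* expand to nothing on failure */

  fseek (fp, 0, SEEK_END);
  long size = ftell (fp);
  if (size < 0)
    {
      fclose (fp);
      return NULL;
    }
  rewind (fp);

  char *buf = gmk_alloc (size + 1);   /* make-managed allocation */
  size_t got = fread (buf, 1, size, fp);
  buf[got] = '\0';
  fclose (fp);

  gmk_eval (buf, NULL);               /* let make parse the rules */
  gmk_free (buf);
  return NULL;                        /* the function expands to "" */
}

/* Setup hook, called when the object is loaded.  (Older make releases
 * call this with no argument; don't rely on flocp there.)  */
int
deps_db_gmk_setup (const gmk_floc *flocp)
{
  (void) flocp;
  /* name, handler, min args, max args, flags (0 = expand arguments) */
  gmk_add_function ("eval-deps-file", eval_deps_file, 1, 1, 0);
  return 1;
}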