Hello Paul

Sorry to take so long to reply. I wanted to think your input over, and
I've had a pretty heavy load lately.

Signing over the copyright, and any other legal steps, won't be a
problem. My company has no rights to work I do in my own time. I'm
mainly worried about the technical issues, and finding the time to do
the work. Until now I've been pretty happy to let Make run in the
background, and haven't put a lot of thought into how it works.
Obviously that will have to change.

I'd like to thank you for your thoughtful response. I'm gratified that
you took the time to engage in a technical analysis, and start the ball
rolling on the design discussion. The points you raised merit thought
and discussion. After reading over your mail a couple of times, I
realized that I hadn't thought things through very well. In fact,
rather than saying "hash instead of time", I should have said "optional
additional hash check when timestamp has changed". I think this fixes
all the performance concerns, and opens the door to adding additional
checks (like the is-a-comment-only check), which I think is an exciting
idea.

Here are my additional thoughts:

1 Maintaining the state.
========================
Your point about Make not maintaining any external state beyond what
the filesystem tracks is well made. I'm reluctant to add the extra
complexity of tracking extra state, and it's clear to me that this will
likely be the source of some "Oh, I hadn't thought of that" moments.
But in this case I think the benefit is worth the cost.

2 Adding additional "is-changed" checks.
========================================
You asked "what if people want to define their own "out-of-date-ness"
test?". I found that a really exciting idea. As I thought about this, I
realized that what I really want is not to replace Make's current
behavior, but to add an additional check to the existing timestamp
check. My thinking is that the timestamp is in fact an overly
conservative test. We never have the case that the timestamp indicates
something *has not* been changed when in fact it has (i.e.
we always build if something has changed), but we do have the issue
that building is sometimes performed unnecessarily, causing an undue
performance penalty -- the cost of building the target and its
dependants. Thus we get a big build-time win whenever the additional
test rules out a rebuild and takes less time than building the target
and its dependants. I think it's very important that Make remain
reliable from the point of view that if something *should* be built, it
*will* be built. Unnecessarily rebuilding something is less of a
failure than failing to rebuild something which should be.

So I propose modifying Make to accept a tool to perform additional
checks, the first being a hash checker. Any additional checkers should
have the property that while they may return a false positive, they
never return a false negative (they never incorrectly say no, nothing
important was changed). We need only specify the interface of that
tool, and people can write tools which satisfy their needs -- I'm
interested in exploring the hash tool first, but might be interested in
making further such 'plugins', and projects with special needs could
specify their own. Very exciting.

As I see it, the project then becomes a way of simplifying the syntax
of Yukimasa Sugizaki's suggestion, and officially supporting that
workflow.

My off-the-cuff suggestion for the interface of the external tool would
be a simple executable, returning 0 if no rebuild is needed, 1 if one
is needed, and perhaps another number (or numbers) for error cases.
This strikes me as having several advantages -- the biggest being the
flexibility it offers Make users. For the case where users want to
apply multiple additional criteria requiring state, this could be done
in a single file. The only downside I see is the performance cost of
starting and terminating the executable, but I'm assuming this will be
small in comparison to the file-access operations, and non-existent
compared to the cost of unnecessary builds.
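To make the interface concrete, here is a minimal sketch of what a hash
checker along these lines could look like, in Python. Everything in it
is an assumption for discussion: the names check, record and main, the
SOURCE STATE_FILE argument order, and the choice of 2 as the error code
are all hypothetical. Note that the checker only reads the stored hash;
I'm assuming the hash gets recorded separately after a successful
build, so that a failed build cannot turn into a false negative on the
next run.

```python
import hashlib
from pathlib import Path

# 0 and 1 follow the proposal above; 2 for errors is an assumption.
NO_REBUILD, REBUILD, ERROR = 0, 1, 2


def sha1_of(path: Path) -> str:
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check(source: Path, state: Path) -> int:
    """Compare the current hash of source with the one stored in state.

    A missing state file counts as "rebuild needed", so the checker can
    return a false positive but never a false negative.
    """
    try:
        current = sha1_of(source)
        previous = state.read_text().strip() if state.exists() else None
    except OSError:
        return ERROR
    return NO_REBUILD if current == previous else REBUILD


def record(source: Path, state: Path) -> None:
    """Store the current hash; meant to run after a successful build."""
    state.write_text(sha1_of(source) + "\n")


def main(argv: list) -> int:
    """Hypothetical command line: checker SOURCE STATE_FILE."""
    if len(argv) != 2:
        return ERROR
    return check(Path(argv[0]), Path(argv[1]))
```

A standalone tool would pass its command-line arguments to main and
exit with the returned value. I would expect Make to treat any non-zero
result, including the error case, as "rebuild", which keeps the
behavior on the safe side.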
I guess the relevant benchmark will be the increase in clean build
time, which I imagine will be negligible for most real cases.

3 One file per target
=====================
- The issues you raised regarding one-file-per-directory are tricky and
  would significantly slow development. I especially think the
  concurrency issues would be nice to avoid, at least in a first
  iteration.
- One file per target would mean approximately a factor of 2 increase
  in the number of build targets. Not beautiful, but only systems which
  are already approaching their limits would be affected. These systems
  could continue using the default Make (timestamp-based) behavior.
- This somehow seems more consistent with Make's current behavior to
  me, which in turn seems lower risk.
- I don't have any better ideas.
- For projects or teams where 2n build targets is impractical, they can
  use the default, timestamp-only behavior.

4 What kind of state?
=====================
Based on the performance and reliability of Git, I'm inclined to
suggest using SHA-1 stored on a one-file-per-target basis. To start
with I think making it a text file is reasonable. I'm unfamiliar with
xxhash, but I'm open to trying anything. With the right implementation
it should be trivial to evaluate a few possibilities.

5 Performance implications
==========================
As mentioned earlier, if we change the goal from replacing the
timestamp to supplementing the timestamp, I think a lot of the
performance implications fall away. The 'nothing-to-do' build will
remain unchanged. The worst-case scenario, I'm thinking, is a full
build where no hashes have yet been written. As long as hash generation
and file saving are negligible compared to build time, that should be
no problem. In the use cases I deal with on a daily basis (building big
ugly C++ files), this will be easily satisfied. If you can think of
some good test cases where this might not be satisfied, let me know,
and I'll run some benchmarks.
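For reference, the "supplement the timestamp" logic I have in mind
could be sketched like this in Python. It is only an illustration, not
how Make is implemented: needs_rebuild and says_unchanged are
hypothetical names, and a checker is modelled as a function that takes
a prerequisite path and returns True when a rebuild is needed. I'm also
assuming that, since no checker may return a false negative, a single
"no rebuild needed" answer is enough to skip, and that a checker error
counts as "rebuild".

```python
import os


def says_unchanged(checker, prereq):
    """True only if the checker positively reports 'no rebuild needed'."""
    try:
        return not checker(prereq)
    except Exception:
        # A failing checker must never suppress a rebuild.
        return False


def needs_rebuild(target, prereqs, checkers=()):
    """Timestamp test first; extra checkers only when it says 'changed'."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    for prereq in prereqs:
        if os.path.getmtime(prereq) <= target_mtime:
            # Timestamp says up to date: checkers are never consulted,
            # so the nothing-to-do build costs the same as today.
            continue
        if not any(says_unchanged(c, prereq) for c in checkers):
            # No checker ruled the change out (or none are configured).
            return True
    return False
```

With no checkers configured this reduces to the plain timestamp
comparison. One thing the sketch glosses over: a prerequisite with a
newer timestamp but unchanged content would be re-checked on every run
unless the stored state also accounts for that.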
Again though, if we keep the timestamp as default, projects can decide
based on their circumstances whether the tradeoff is worthwhile.

Per block sounds like a good idea as a later optimization, if we or
someone else determine it would be valuable. To start with I would keep
it simple.

6 Next steps
============
My tentative suggestion, depending on your next feedback, is to do
something like the following:

- Determine a syntax for makefiles to specify which additional checks
  (and perhaps in what order) should be performed. I think this should
  be easy to use for one additional test, but open to adding additional
  tests later. It should be easy for Makefile generators (like
  autotools and cmake) to take advantage of. I could see using an
  environment variable, but I could also imagine being able to steer
  the behavior on a Makefile-by-Makefile or target-by-target basis. I
  ask for input from the experts here.
- Hash out in rough strokes how the call would be made -- my ad-hoc
  approach would be a separate executable with an integer return value
  indicating needs-rebuild, doesn't-need, or error, but again I ask for
  input from the experts.

If that sounds reasonable, I should probably start poking around the
Make codebase so I can get started at some point.

Again, many thanks for your time,

Glen Stark

On Fri, 2015-03-27 at 11:48 -0400, Paul Smith wrote:
> On Fri, 2015-03-27 at 11:45 -0400, Paul Smith wrote:
> >       * Do we really need to hash the file?  Maybe simply expanding the
> >         current checking is sufficient.  For example, if in addition to
> >         mod time we also considered the size of the file (and maybe
> >         other things maintained by the filesystem like inode, for tools
> >         which don't just overwrite the same file) we could increase our
> >         accuracy WITHOUT resorting to a separate state file.  Is that
> >         good enough?
>
> Actually I typed faster than my brain: we still need a state file of
> course to compare sizes.  But at least it's still based on filesystem
> metadata and doesn't require make to hash the contents of every file in
> the build.
>
_______________________________________________
Bug-make mailing list
Bug-make@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-make