On Wed, Sep 10, 2014 at 04:38:32PM -0400, David Malcolm wrote:
> > So when last I played in this area, I wanted a command line tool that
> > would bin-pack from the command line.  I would then grab the seconds
> > for each .exp, and bin-pack to a fixed N, where N was the core count
> > or related to it (like N+1, N*1.1+1, N*2, ceil(N*1.1)).  Then I would
> > just have 60-100 bins, and that -j64 run would be nicer.  The only
> > reason why I didn't push that patch up was I didn't know of any such
> > program.  :-(  I mention this in case someone knows of such a tool
> > that is open source, hopefully GNU software.  The idea being, if a
> > user has 64 cores or wants the .exp files to be more balanced on
> > their target, they can be bothered to download the tool; if they
> > don't have it, you get something a little more static.
> >
> > Another way is to just make the buckets 60 seconds apiece.  This way,
> > on a nice box it's 60 seconds to test; otherwise, the test time is at
> > most 1 minute unbalanced.
>
> Perhaps this is a silly question, but has anyone tried going the whole
> way and not having buckets, going to an extremely fine-grained approach:
> split out all of the dejagnu work into three phases:
> (A) test discovery; write out a fine-grained Makefile in which *every*
>     testcase is its own make target (to the extreme limit of
>     parallelizability, e.g. at the per-input-file level)
> (B) invoke the Makefile with -jN; each make target invokes dejagnu for
>     an individual testcase and gets its own .log file
> (C) combine the results
>
> That way all parallelization in (B) relies on "make" to do the right
> thing in terms of total number of running jobs, available cores, load
> average etc., albeit with a performance hit for all of the extra
> reinvocations of "expect" (and a reordering of the results, but we can
> impose a stable sort in phase (C), I guess).
>
> Has anyone tried this?
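[For illustration only, not part of the thread: the bin-packing tool Mike wishes existed could be a simple greedy "longest processing time first" scheduler.  This is a hypothetical sketch; the (name, seconds) input pairs and the bin count are assumptions, not anything the thread specifies.]

```python
# Hedged sketch of a greedy LPT bin-packer: sort .exp files by measured
# runtime, longest first, and always place the next file into the
# currently lightest bin.  Input format is an assumption.
import heapq

def pack(durations, nbins):
    """Assign (name, seconds) items to nbins bins, longest first,
    always into the bin with the smallest running total."""
    # Min-heap of (total_seconds, bin_index); bins[i] collects names.
    heap = [(0.0, i) for i in range(nbins)]
    bins = [[] for _ in range(nbins)]
    for name, secs in sorted(durations, key=lambda t: -t[1]):
        total, i = heapq.heappop(heap)
        bins[i].append(name)
        heapq.heappush(heap, (total + secs, i))
    return bins

# Example: five .exp files packed into two bins.
print(pack([("a.exp", 30), ("b.exp", 20), ("c.exp", 20),
            ("d.exp", 10), ("e.exp", 5)], 2))
```

With the example input this yields bin totals of 45 and 40 seconds, i.e. the at-most-one-slow-test imbalance the quoted text is after.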
I fear that is going to be too expensive, because e.g. all the caching
that dejagnu and our tcl stuff does would be gone; all the tests for
lp64 etc. would need to be repeated for each test.

A perhaps better approach might be to have some way to synchronize
among multiple expect processes and spawn only as many expects (per
check target, of course) as there are CPUs.  E.g., if mkdir is atomic
on all hosts/filesystems we care about, we could have a shared
directory that make would clear before spawning all the expects.
After checking runtest_file_p, each expect would attempt to mkdir
something in that shared directory (e.g. the testcase filename with
the $(srcdir) part removed, or the *.exp filename plus a counter of
which test we are considering, or something similar).  If that
succeeded, it would tell us that we are the process that should run
the test; if it failed, we'd know some other runtest did that.  Or
perhaps not for every single test, but for every 10 or 100 tests or
something.  E.g., we could just override runtest_file_p itself, so
that it would first call the original dejagnu version and then do this
check.

	Jakub
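[For illustration only, not part of the thread: the mkdir-based claiming Jakub describes can be sketched as below.  In the real patch this would live in Tcl inside an overridden runtest_file_p; the function name and directory layout here are hypothetical.]

```python
# Hedged sketch of the mkdir-as-lock idea: each runtest process tries
# to mkdir a per-test marker inside a shared directory that make
# cleared beforehand.  POSIX mkdir is atomic, so exactly one
# concurrent caller succeeds and becomes the one to run the test.
import os

def claim(shared_dir, test_id):
    """Return True iff this process should run test_id."""
    marker = os.path.join(shared_dir, test_id.replace("/", "_"))
    try:
        os.mkdir(marker)   # atomic: succeeds for exactly one process
        return True
    except FileExistsError:
        return False       # some other runtest already claimed it
```

Claiming every 10th or 100th test instead, as suggested, would just mean calling claim() once per batch with a batch counter in the marker name.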