Performance loss when trying to use grouped targets

Stephen Touset via Users list for the GNU implementation of make Mon, 04 Mar 2024 11:27:25 -0800

I have a large project whose build process consists of running some
independent sub-build steps within several hundred directories. As a
preparatory step, we copy each of these directories into a temporary build
location:


# collect a list of all source files
sources := $(shell find src -mindepth 2 -type f)

define build_component
# drop a flag that indicates a successful build
build/.$(1): build/$(1)
	[actual build process goes here]
	touch $$@

# the build directory depends upon each of the individual files within it
build/$(1): $$(patsubst src/%,build/%,$(filter src/$(1)/%,$(sources)))

# every file in each build subdirectory depends on its corresponding file
# in the src directory, but they can all be copied into place in one step
$$(patsubst src/%,build/%,$(filter src/$(1)/%,$(sources))) &: $$(filter src/$(1)/%,$(sources))
	rm -rf build/$(1)
	cp -a src/$(1) build/$(1)

.SECONDARY: build/.$(1) build/$(1)
.SECONDARY: $$(patsubst src/%,build/%,$(filter src/$(1)/%,$(sources)))
endef

# generate the build steps for each component
$(foreach component,$(components),$(eval $(call build_component,$(component))))

We jump through all these hoops for some important reasons. The build
process often moves and/or removes files inside of the directories it’s
working with. The above rules ensure that if the build is interrupted or
fails at some point, and any files are missing for some component of the
build, we’ll blow away that directory and start from scratch. It also
ensures that each component is rebuilt if any of its individual sources
changes.

This works correctly, but it doesn’t perform particularly well thanks to
hundreds of independent calls to cp that could theoretically be merged into
one. This is where I’m running into problems. I rewrote the patsubst rule
and its corresponding entry in .SECONDARY:

# every file in the build directory depends on its corresponding file in
# the src directory, but they can all be copied into place in one step
$$(patsubst src/%,build/%,$(sources)) &: $(sources)
	rm -rf build
	cp -a src build

.SECONDARY: $$(patsubst src/%,build/%,$(sources))

When run in one thread, this works as expected and significantly cuts down
the overall build time. When run in parallel, though, it unfortunately
causes a massive slowdown. It seems that make isn’t efficiently sharing
analysis info amongst parallel workers. It does function as expected, but
debug output shows that tens of millions of “Pruning file” steps are
generated before any work is performed.

I think the fundamental issue is that this rule doesn’t quite express what
I want. The above rule says, roughly, that “each file in the build
directory depends upon *all* the files in the source directory”, and so a
freshness check for each individual file in the build fans out to a
freshness check on every file in the source. This needs to be done over and
over and over. Is it possible to have our cake and eat it too? Is there a
way to write a rule that expresses something closer to “each file in the
build directory depends upon its corresponding file in the source
directory” but still uses grouped targets?
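
To make the fan-out described above concrete, here is a minimal sketch of
the single grouped rule. It assumes a hypothetical three-file source list
standing in for the real find output, and the $(info) lines are only there
for illustration; none of this is taken from our actual build system.

```make
# Hypothetical stand-in for: $(shell find src -mindepth 2 -type f)
sources := src/a/1.txt src/a/2.txt src/b/1.txt
targets := $(patsubst src/%,build/%,$(sources))

# Printed while the makefile is parsed: every target in the group
# shares the full prerequisite list.
$(info grouped targets: $(words $(targets)))
$(info prerequisites listed for each target: $(words $(sources)))

# One grouped rule: each build file depends on *all* source files.
$(targets) &: $(sources)
	rm -rf build
	cp -a src build
```

With N source files, each of the N targets lists all N sources as
prerequisites, so the number of target/prerequisite pairs grows as N
squared. The per-component version keeps each group small, which is
presumably why it does not show the same behavior. Note that the &: syntax
requires GNU make 4.3 or later.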

Note: The above rules were simplified for clarity from what’s in our actual
build system. I may have typo’d something in the process. I apologize
preemptively if there winds up being any confusion caused by this.
