kmod's blog

25Mar/144

Troubles of using GNU Make

There seem to be lots of posts these days about people "discovering" how using build process automation can be a good thing.  I've always felt like the proliferation of new build tools is largely a result of peoples' excitement at discovering something new; I've always used GNU Make and have always loved it.

As I use Make more and more, I feel like I'm getting more familiar with some of its warts.  I wouldn't say they're mistakes or problems with Make, but simply consequences of the assumptions it makes.  These assumptions are also what make it so easy to reason about and use, so I'm not saying they should be changed, but they're things I've been running into lately.

Issue #1: Make is only designed for build tasks

Despite Make's purpose as a build manager, I tend to use it for everything in a project.  For instance, I use a makefile target to program microcontrollers, where the "program" target depends on the final build product, like this:

program.bin: $(SOURCES)
    ./build.py

.PHONY: program
program: program.bin
    ./program.py program.bin

 

This is a pretty natural usage of Make; typing "make program" will rebuild what needs to be remade, and then calls a hypothetical program.py to program the device.

 

Making the outcome more complicated, though, quickly makes the required setup much more complicated.  Let's say that I also want to use Make to control my actual program -- let's call it run.py -- which communicates with the device. I want to be able to change my source files, type "make run", and have Make recompile the program, program the microcontroller, and then call run.py.  The attractive way to write this would be:

.PHONY: run
run: program other_run_input.bin
    ./run.py other_run_input.bin

 

This has a big issue, however: because "program" is defined as a phony target, Make will execute it every time, regardless of whether its prerequisites have changed.  This is the only logical thing for Make to do in this situation, but it means that we'll be programming the microcontroller every time we want to run our program.

How can we avoid this?  One way is to have "program" be an actual file that gets touched, so that program is no longer a phony target, with the result that we track the last time the microcontroller was programmed and will only reprogram if the binary is newer.  This is pretty workable, although ugly, and for more complicated examples it can get very messy.

 

Issue #2: Make assumes that it has no overhead

There are two main ways to structure a large Makefile project: using included Makefiles, or to use recursive Makefiles.  While the "included Makefiles" approach seems to often be touted as better, many projects tend to use a recursive Make setup.  I can't speak for other projects for why they choose to do that, but one thing I've noticed is that Make can itself take a long time to execute, even if there are no recipes that are executed.  It seems not too surprising: with a large project with hundreds or thousands of source files, and many many rules (which can themselves spawn exponentially more implicit search paths), it can take a long time to determine if anything needs to be done or not.

This often isn't an issue, but for my current project it is: I have a source dependency on a large third-party project, LLVM, which is large enough that it's expensive to even check to see if there is anything that needs to be rebuilt.  Fortunately, I very rarely modify my LLVM checkout, so most of the time I just skip checking if I need to rebuild it.  But sometimes I do need to dive into the LLVM source code and make some modifications, in which case I want to have my builds depend on the LLVM build.

This, as you might guess, is not as easy as it sounds.  The problem is that a recursive make invocation is not understood by Make as a build rule, but just as an arbitrary command to run, and thus my solution to this problem runs into issue #1.

 

My first idea was to have two build targets, a normal one called "build", and one called "build_with_llvm" which checks LLVM.  Simple enough, but it'd be nice to reduce duplication between them, and have a third target called "build_internal" which has all the rules for building my project, and then let "build" and "build_with_llvm" determine how to use that.  We might have a Makefile like this:

.PHONY: build build_internal build_with_llvm llvm
build_internal: $(SOURCES)
    ./build_stuff.py

build: build_internal
build_with_llvm: build_internal llvm

 

This mostly works; typing "make build" will rebuild just my stuff, and typing "make build_with_llvm" will build both my stuff and LLVM.  The problem, though, is that build_with_llvm does not understand that there's a dependency of build_internal on llvm.  The natural way to express this would be by adding llvm to the list of build_internal dependencies, but this will have the effect of making "build" also depend on llvm.

Enter "order-only dependencies": these are dependencies that are similar to normal dependencies, but slightly different: it won't trigger the dependency to get rebuilt, but if the dependency will be rebuilt anyway, the target won't be rebuilt until the dependency is finished.  Order-only dependencies sound like the thing we want, but they unfortunately don't work with phony targets (I consider this a bug): phony order-only dependencies will always get rebuilt, and behave exactly the same as normal phony dependencies.  So that's out.

The only two solutions I've found are to either 1) use dummy files to break the phony-ness, or 2) use recursive make invocations like this:

build_with_llvm: llvm
    $(MAKE) build_internal

This latter pattern solves the problem nicely, but Make no longer understands the dependence of build_with_llvm on build_internal, so if there's another target that depends on build_internal, you can end up doing duplicate work (or in the case of a parallel make, simultaneous work).

Issue #3: Make assumes that all build steps result in exactly one modified file

I suppose this is more-or-less the same thing as issue #1, but feels different in a different context: I'm using a makefile to control the building and programming of some CPLDs I have.  The Makefile looks somewhat like this:

# Converts my input file (in a dsl) into multiple cpld source files:
cpld1.v: source.dsl
    ./process.py source.dsl # generates cpld1.v and cpld2.v

# Compile a cpld source file into a programming file (in reality this is much more complicated):
cpld%.svf: cpld1.v
    ./compile.py cpld%.v

program: cpld1.svf cpld2.svf
    ./program.py cpld1.svf cpld2.svf

 

I have a single input file, "source.dsl", which I process into two Verilog sources, cpld1.v and cpld2.v.  I then use the CPLD tools to compile that to a SVF (programming) file, and then program that to the devices.  Let's ignore for the fact that we might want to be smart about knowing when to program the cplds, and just say we only call "make program" as the target.

The first oddity is that I had to choose a single file to represent the output of processing the source.dsl file.  Make could definitely represent that both files depended on processing that file, but I don't know of any other way of telling it that they can both use the same execution of that recipe, ie that it generates both files.  We could also make both cpld1.v and cpld2.v depend on a third phony target, maybe called "process_source", but this has the same issue with phony targets that it will always get run.  We'll need to make sure that process.py spits out another file that we can use as a build marker, or perhaps make it ourselves in the Makefile.

In reality, I'm actually handling this using a generated Makefile.  When you include another Makefile, by default Make will check to see if the candidate Makefile needs to be rebuilt, either because it is out of date or because it doesn't exist.  This is interesting because every rule in the generated makefile implicitly becomes dependent on the the rule used to generate the Makefile.

 

Another issue, which is actually what I originally meant to talk about, is that in fact process.py doesn't always generate new cpld files!  It's common that in modifying the source file, only one of the cpld.v outputs will get changed; process.py will not update the timestamp of the file that doesn't change.  This is because compiling CPLD files is actually quite expensive, with about 45 seconds of overhead (darn you Xilinx and your prioritization of large projects over small ones), and I like to avoid it whenever possible.  This is another situation that took quite a bit of hacking to figure out.

 

 

Conclusion

Well this post has gotten quite a bit more meandering than I was originally intending, and I think my original point got lost (or maybe I didn't realize I didn't have one), but it was supposed to be this: despite Make's limitations, the fact that it has a straightforward, easy to understand execution model, it's always possible to work around the issues.  If you work with a more contained build system this might not be possible, which is my guess as to why people branch off and build new ones: they run into something that can't be worked around within their tool, so they have no choice but to build another tool.  I think this is really a testament to the Unix philosophy of making tools simple and straightforward, because that directly leads to adaptability, and then longevity.

Comments (4) Trackbacks (0)
  1. With regards to point #3, make doesn’t care how many files a target produces, only that it’s deterministic and that your chosen target represents the overall up-to-dateness of the rule. If none of the real output files are suitable, you can just use a fake “marker” file and touch it when the target is up to date.

    Frankly, tools that dump many output files in a single step are a design smell and actively work against usability (“where did all these files come from?”). LaTeX builds are notoriously hard to encapsulate in a Makefile, but they’re also very annoying for humans too, in that you have to visually scan the output to see how many more times you need to run the tools, 1970s style. I don’t think this can be blamed on make — it’s a hard problem to solve in a general-purpose tool and would be much better solved by the tools in questions having incremental output.

  2. Your point #1 can usually be solved with a “marker” too. Just use the timestamp of an empty file to remember when you last ran a target.

    I use this approach for storing GNOME GSettings preferences in INI files. The files are parsed (by a little AWK script) and passed as arguments to the gsettings tool, which doesn’t produce any output files (well it does, but the tool is opaque about how it’s done). So I use a marker file as the target, and the work is done as a side-effect of the recipe. The DAG and the timestamps are still correct — it’s just a little more indirect than usual.

  3. Make can indeed be told that multiple outputs are produced by the same execution of a rule (your point #3). This has to be done via implicit rules and sometimes requires the evil SECONDEXPANSION. For something simple, consider the following makefile:

    FLAVORS = 1 2 3

    FOO = $(foreach flavor,$(FLAVORS),foo$(flavor).out)
    BAR = $(foreach flavor,$(FLAVORS),bar$(flavor).out)

    $(foreach flavor,$(FLAVORS),foo$(flavor).%) : program
    @echo “$*”
    @echo “$(@F)”

    $(BAR): %.out : program
    @echo “$*”
    @echo “$(@F)”

    .PHONY : foo bar

    foo: $(FOO)

    bar: $(BAR)

    Try
    make -n foo
    make -n bar

    though I may have misunderstood you.


Leave a comment

No trackbacks yet.