CMake Functions and Macros

Feb 1, 2021 | CMake

Every scripting language needs to provide some facilities for code reuse, and CMake is no different. It lets users define both functions and macros to encapsulate repetitive tasks. However, they can be somewhat inconvenient to use at first, especially functions that are expected to return a value. Because of CMake’s scope rules, returning a value requires some special handling and is rather unintuitive, which often causes people to resort to macros altogether.

In this post I’ll discuss the basic use of functions and macros in CMake. Once the basics are covered, I’ll move on to demonstrating the uses of cmake_parse_arguments.

Functions and Macros basics

At first glance, CMake functions and macros behave much as anyone with experience in the C/C++ family of languages would expect. The most important difference between the two is that functions introduce a new scope, whereas macros don’t: their body is essentially copy-pasted into the call site.

Functions and macros can be defined as follows:

function(myFunction foo bar)
    message("myFunction: ${foo}, ${bar}")
endfunction()
macro(myMacro foo bar)
    message("myMacro: ${foo}, ${bar}")
endmacro()

The parameters foo and bar are regular variables in case of the function. In the macro’s case they are string-substitutions, but for day-to-day use this shouldn’t make much difference to the user – they’re still accessed the same way within the macro’s body.
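The distinction shows up when a parameter’s name is used where CMake expects a variable name rather than a value. The following is a minimal sketch that can be run in script mode with cmake -P. The names checkInFunction and checkInMacro are made up for illustration, and the sketch assumes the caller has no variable named foo of its own.

```cmake
# Sketch: save as params.cmake (arbitrary name) and run `cmake -P params.cmake`.
function(checkInFunction foo)
    # In a function, foo is a real variable in the function's scope.
    if(DEFINED foo)
        message("function: foo is a variable")
    else()
        message("function: foo is not a variable")
    endif()
endfunction()

macro(checkInMacro foo)
    # In a macro, only the text ${foo} is substituted; no variable named foo is created.
    if(DEFINED foo)
        message("macro: foo is a variable")
    else()
        message("macro: foo is not a variable")
    endif()
endmacro()

checkInFunction("hello")  # prints "function: foo is a variable"
checkInMacro("hello")     # prints "macro: foo is not a variable"
```

As long as parameters are only read through ${foo}, the two forms behave the same.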

Once defined, both can be called just like builtin CMake commands:

myFunction("hello" "functions")
myMacro("hello" "macros")

Executing this code results in the expected output:

$ cmake -S . -B build
myFunction: hello, functions
myMacro: hello, macros

Nothing surprising there. Before we move on to more interesting examples let me touch on one important point.

Because macros don’t introduce a new scope, they’re often chosen over functions for their ease of use: it’s much more intuitive to “return” a value by defining or mutating a variable in the caller’s scope than it is to use a function. I’d advise against it for the same reason one would avoid macros in C/C++: it’s best not to pollute the calling scope with whatever variables are defined within the macro’s body. Accidentally overwriting existing variables is also likely. I’d suggest sticking with functions whenever possible. The little extra effort is worth it, as it could prevent some debugging sessions as the project grows.
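To illustrate the kind of accident described above, here is a small sketch that can be run with cmake -P. The macro joinWithDashMacro and the variable names are hypothetical; the point is only that a temporary variable set inside a macro silently replaces a variable of the same name in the caller.

```cmake
# Sketch: a macro's "temporary" variable overwrites the caller's variable.
macro(joinWithDashMacro out a b)
    set(tmp "${a}-${b}")   # tmp is set directly in the caller's scope
    set(${out} "${tmp}")
endmacro()

set(tmp "important value")
joinWithDashMacro(result "x" "y")

message("result: ${result}")  # prints "result: x-y"
message("tmp: ${tmp}")        # prints "tmp: x-y", the original value is gone
```

Had joinWithDashMacro been a function, tmp would have stayed local to it and the caller’s value would have survived.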

Scope and returning values

As I already mentioned, functions introduce a new scope, whereas macros don’t. This makes macros easier to use when returning a value is necessary. Neither functions nor macros in CMake have a dedicated return channel; returning is done via an output variable. With macros, this is obvious and is often done implicitly. A simple set command will define a variable in the caller’s scope:

macro(setFoo value)
    set(Foo ${value})
    message("setFoo: ${Foo}")
endmacro()

setFoo("Assign this value to Foo")
if (DEFINED Foo)
    message("Foo: ${Foo}")
else()
    message("Foo is undefined")
endif()

$ cmake build
setFoo: Assign this value to Foo
Foo: Assign this value to Foo

With functions it’s not so simple:

function(setBar value)
    set(Bar ${value})
    message("setBar: ${Bar}")
endfunction()

setBar("Assign this value to Bar")

if (DEFINED Bar)
    message("Bar: ${Bar}")
else()
    message("Bar is undefined")
endif()

$ cmake build
setBar: Assign this value to Bar
Bar is undefined

The variable Bar is set, but only within the function’s scope. It is no longer accessible once the function returns.

There’s a simple solution to this. The set command takes an optional argument, PARENT_SCOPE. It causes the variable to be set in the parent (calling) scope, rather than in the current scope. Note, however, that it sets the variable ONLY in the parent scope, not both locally AND in the parent scope. It’s best to see an example. Simply adding PARENT_SCOPE to the setBar function will result in the following:

function(setBar value)
    set(Bar ${value} PARENT_SCOPE)
    message("setBar: ${Bar}")
endfunction()
# Call site is the same as before

$ cmake build
setBar: 
Bar: Assign this value to Bar

The Bar variable is now defined in the caller’s scope but undefined in the function scope. This is most often what’s needed when returning a value. If defining the variable in both scopes is required, simply call set twice:

function(setBar value)
    set(Bar ${value})
    set(Bar ${Bar} PARENT_SCOPE)
    message("setBar: ${Bar}")
endfunction()
# Call site is the same as before

$ cmake build
setBar: Assign this value to Bar
Bar: Assign this value to Bar

This simple mechanism makes it easy to return values from CMake functions and greatly reduces the use cases for macros. Functions give much greater control over which variables we expose to the calling scope.

One more thing. In the above example the name of the variable defined in the calling scope was implicit (it should be at least documented in real code). As an alternative, and common good practice, the name of the output variable(s) can be specified as the arguments to the function, like so:

function(setVariable varName value)
    set(${varName} ${value})
    set(${varName} ${${varName}} PARENT_SCOPE)
    message("setVariable: ${varName} = ${${varName}}")
endfunction()

setVariable(Foobar "Assign this value to Foobar")
message("Foobar = ${Foobar}")

$ cmake build
setVariable: Foobar = Assign this value to Foobar
Foobar = Assign this value to Foobar

Here the variable varName holds the name of the variable to be set in the parent scope. This means that it needs to be dereferenced: set(${varName} ${value}). Neglecting to do so would overwrite the variable varName itself rather than the variable whose name it holds. One more caveat: if accessing the value of that variable within the function scope is necessary, varName needs to be dereferenced twice, as in message("setVariable: ${varName} = ${${varName}}").
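The same output-parameter pattern works for functions that compute something rather than just forward a value. Below is a minimal sketch, runnable with cmake -P. The function toUpperSnake and its parameter names are invented for this example; only string() and set() are real CMake commands.

```cmake
# Sketch: compute a value locally, then hand it to the caller via an output parameter.
function(toUpperSnake input outVar)
    string(TOUPPER "${input}" upper)           # upper is local to the function
    string(REPLACE "-" "_" upper "${upper}")
    set(${outVar} "${upper}" PARENT_SCOPE)     # only the named output reaches the caller
endfunction()

toUpperSnake("my-lib" result)
message("result: ${result}")   # prints "result: MY_LIB"

if(NOT DEFINED upper)
    message("upper is not visible to the caller")
endif()
```

Only the variable the caller asked for is created in its scope; the intermediate variable stays inside the function.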

For the remainder of the post, I’ll discuss everything in terms of functions. Know that macros could be used in exactly the same way; the only significant difference is the lack of an introduced scope and all the consequences it carries.

Functions in detail

Let’s take a closer look at the trivial function defined in the beginning of this post.

function(myFunction foo bar)
    message("myFunction: ${foo}, ${bar}")
endfunction()

We’ve given it two parameters – foo and bar. These are the required parameters. However, every CMake function accepts an arbitrary number of arguments. The remaining arguments are accessible via automatically defined variables:

  • ARGV – A list of all arguments passed to the function (including the required ones).
  • ARGN – A list of non-required arguments passed to the function.
  • ARGC – The total number of arguments passed to the function.
  • ARGVx – Where x = index of the argument. This gives access to every argument in the function.

Let’s see an example of how these could be used.

function(myFunction foo bar)
    message("myFunction:")
    message("  foo: ${foo}, bar: ${bar}")
    message("  ARGC: ${ARGC}")
    message("  ARGV: ${ARGV}")
    message("  ARGN: ${ARGN}")

    math(EXPR indices "${ARGC} - 1")
    foreach(index RANGE ${indices})
        message("  ARGV${index}: ${ARGV${index}}")
    endforeach()
endfunction()

Again, nothing really surprising. The only difficulty may be remembering which is which, with the ARGV and ARGN variables. I used to always get these mixed up.

Note that when looping over arguments using the ARGVx variables it is necessary to decrement the value of ARGC by 1, because the foreach(… RANGE …) loop is inclusive. Neglecting to decrement the count would result in an attempt to access a nonexistent argument, which is technically undefined behavior.
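When the index itself isn’t needed, iterating over the list directly sidesteps the off-by-one question altogether. The sketch below (runnable with cmake -P) uses a hypothetical function printExtras to loop over ARGN, and then shows the inclusive behavior of RANGE on its own.

```cmake
# Sketch: iterate over the optional arguments without computing indices.
function(printExtras foo bar)
    # ARGN holds only the arguments that follow foo and bar.
    foreach(arg IN LISTS ARGN)
        message("extra: ${arg}")
    endforeach()
endfunction()

printExtras(a b c d)
# prints:
#   extra: c
#   extra: d

# RANGE includes its upper bound, so this loop runs three times.
foreach(index RANGE 2)
    message("index: ${index}")
endforeach()
# prints:
#   index: 0
#   index: 1
#   index: 2
```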

As a more practical example, we’ll implement a function that encapsulates the code defining a test target. Using just CMake builtins this could be done as follows:

add_executable(testFoo test_foo.cpp)
target_link_libraries(testFoo PRIVATE Bar)
add_test(NAME    testFoo
         COMMAND testFoo
        )

This code should be self-explanatory. If you’re unfamiliar with these commands check out my CMake Fundamentals series [link].

This process gets repetitive very quickly. Let’s wrap it in a function.

function(AddTest targetName dependency)
    add_executable(${targetName} ${ARGN})

    target_link_libraries(${targetName} PRIVATE ${dependency})

    add_test(NAME ${targetName}
             COMMAND ${targetName}
            )
endfunction()

The AddTest function takes required targetName and dependency arguments. The remaining arguments are passed in as sources to the test target. The call site would look as follows:

AddTest(testFoo Bar test_foo.cpp)

This gets the job done, however the interface is a little awkward. The name of the test is passed in first, then a dependency, then source files for the test. That’s somewhat arbitrary, and not very readable. Additionally, only a single dependency can be given, which is a serious limitation.

CMake provides facilities that allow us to do better. It’s possible to implement functions that behave and look just like built-in CMake commands.

cmake_parse_arguments

CMake provides a builtin command for parsing function and macro arguments: cmake_parse_arguments. With its help, functions that look just like native CMake commands can be implemented without too much effort.

Let’s consider what we’d like the interface of AddTest to look like. Specifying the name of the test as the first required argument is probably fine, so let’s leave it as is. The remainder of the interface needs to be reworked. Both the test’s sources and its dependencies could potentially be arbitrarily long lists. Moreover, the dependencies should be optional, but at least one source file is required. The call site could look as follows:

AddTest(testFoo
        SOURCES test_foo.cpp
        DEPENDENCIES Bar
        )

Much more readable than before. This exact interface can be implemented quite easily with cmake_parse_arguments. Let’s see how.

function(AddTest targetName)
    set(flags)
    set(args)
    set(listArgs SOURCES DEPENDENCIES)

    cmake_parse_arguments(arg "${flags}" "${args}" "${listArgs}" ${ARGN})
    
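    # Check for a missing value first: when SOURCES is given without values,
    # arg_SOURCES is undefined as well, so the generic check would fire instead.
    if (SOURCES IN_LIST arg_KEYWORDS_MISSING_VALUES)
        message(FATAL_ERROR "[AddTest]: SOURCES requires at least one value")
    endif()
    if (NOT arg_SOURCES)
        message(FATAL_ERROR "[AddTest]: SOURCES is a required argument")
    endif()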

    add_executable(${targetName} ${arg_SOURCES})

    target_link_libraries(${targetName} PRIVATE ${arg_DEPENDENCIES})

    add_test(NAME    ${targetName}
             COMMAND ${targetName}
            )

endfunction()

Same as before, targetName is a required argument; it specifies the name of both the executable and the test target.

The boilerplate code necessary to set up the cmake_parse_arguments call follows. Three variables are defined: flags, args and listArgs. The naming of these variables doesn’t matter. All that matters is the order in which they’re passed into the cmake_parse_arguments command:

cmake_parse_arguments(arg "${flags}" "${args}" "${listArgs}" ${ARGN})

The cmake_parse_arguments command takes four required positional arguments, plus an arbitrary number of arguments to parse (here ${ARGN}). The first one, arg, is the prefix that will be prepended to the name of every output variable. This can be any string of our choosing. See how in the body of the function SOURCES and DEPENDENCIES are used with “arg” prepended, as arg_SOURCES and arg_DEPENDENCIES? That’s how the prefix is used: the output variables of cmake_parse_arguments are named <prefix>_<keyword>. With functions it doesn’t matter that much what prefix is used. Since functions define a scope of their own, all we need to be careful about is not clashing with any other variables within the function’s scope. With macros a little more care is required, since the macro is messing with the caller’s scope. A common practice is to use the macro’s name in all uppercase as the prefix. Or, you know, just use a function instead.
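The following sketch, runnable with cmake -P, shows why the prefix matters more in a macro. The macro describeThing and its NAME keyword are hypothetical; the uppercase prefix follows the convention mentioned above.

```cmake
# Sketch: in a macro, the parsed variables are created in the caller's scope.
macro(describeThing)
    # No flags, one single-value keyword (NAME), no list keywords.
    cmake_parse_arguments(DESCRIBETHING "" "NAME" "" ${ARGN})
    message("name: ${DESCRIBETHING_NAME}")
endmacro()

describeThing(NAME widget)                                 # prints "name: widget"
message("visible after the call: ${DESCRIBETHING_NAME}")   # prints "visible after the call: widget"
```

A distinctive prefix at least makes it unlikely that these leftover variables collide with anything the caller already uses.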

The second argument – flags variable – is a list of boolean flags, or options, that the function accepts.

The third argument – args variable – is a list of single-value keyword arguments. A good example of this would be the NAME keyword of the built-in add_test command.

The fourth argument – listArgs variable – is a list of keyword arguments that accept lists of values. In this example both SOURCES and DEPENDENCIES can take an arbitrary number of values. These arguments are used exactly the same way as, for example, the PRIVATE specifier in target_link_libraries: everything following the keyword, up to another keyword or the closing parenthesis, is appended to the list.

The remaining arguments passed to cmake_parse_arguments are treated as the arguments to parse. Here ${ARGN} is passed in, which is the most common use of cmake_parse_arguments. There are other forms that accept all arguments – ${ARGV} – but they are not required most of the time, so I won’t discuss them here.
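To make the three keyword categories concrete, here is a small standalone sketch that can be run with cmake -P. The function demoKeywords and the keywords VERBOSE, NAME and FILES are made up for illustration; the variable names flags, args and listArgs mirror the ones used in AddTest.

```cmake
# Sketch: one flag, one single-value keyword, one list keyword.
function(demoKeywords)
    set(flags VERBOSE)
    set(args NAME)
    set(listArgs FILES)
    cmake_parse_arguments(arg "${flags}" "${args}" "${listArgs}" ${ARGN})

    message("VERBOSE: ${arg_VERBOSE}")  # flags are always TRUE or FALSE
    message("NAME: ${arg_NAME}")        # empty when NAME is not given
    message("FILES: ${arg_FILES}")      # a semicolon-separated list
endfunction()

demoKeywords(NAME foo FILES a.cpp b.cpp VERBOSE)
# prints:
#   VERBOSE: TRUE
#   NAME: foo
#   FILES: a.cpp;b.cpp

demoKeywords(FILES a.cpp)
# prints:
#   VERBOSE: FALSE
#   NAME: 
#   FILES: a.cpp
```

Note that the keywords can appear in any order at the call site, just like with built-in commands.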

As already mentioned, cmake_parse_arguments returns the processed arguments in variables named with the specified prefix. We then do some error handling. The SOURCES argument is required, so we check that it has been specified and that at least one value has been given:

if (SOURCES IN_LIST arg_KEYWORDS_MISSING_VALUES)
    message(FATAL_ERROR "[AddTest]: SOURCES requires at least one value")
endif()
if (NOT arg_SOURCES)
    message(FATAL_ERROR "[AddTest]: SOURCES is a required argument")
endif()

The NOT arg_SOURCES check should be obvious: the arg_SOURCES variable will be defined only if SOURCES is specified, with at least one value, as one of the arguments to AddTest. The cmake_parse_arguments command also defines a set of variables purely for error handling. One of the more useful ones is <prefix>_KEYWORDS_MISSING_VALUES (available since CMake 3.15), a list containing any keyword arguments that were passed to the function without any value assigned to them. Here we check it to ensure that SOURCES is given at least one value. Because arg_SOURCES is also undefined in that case, this check has to come first; otherwise the more generic “required argument” error would be reported instead.
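Another of these error-handling variables is <prefix>_UNPARSED_ARGUMENTS, which collects anything that did not match a keyword. The sketch below, runnable with cmake -P, prints both variables instead of raising errors so that both calls can be observed. The function strictDemo and its keywords are hypothetical.

```cmake
cmake_minimum_required(VERSION 3.15)  # <prefix>_KEYWORDS_MISSING_VALUES needs CMake 3.15 or newer

# Sketch: report arguments that could not be parsed and keywords left without values.
function(strictDemo)
    cmake_parse_arguments(arg "" "NAME" "FILES" ${ARGN})

    # Both variables are left undefined when there is nothing to report.
    if(DEFINED arg_UNPARSED_ARGUMENTS)
        message("unexpected arguments: ${arg_UNPARSED_ARGUMENTS}")
    endif()
    if(DEFINED arg_KEYWORDS_MISSING_VALUES)
        message("keywords without values: ${arg_KEYWORDS_MISSING_VALUES}")
    endif()
endfunction()

strictDemo(oops NAME foo FILES a.cpp)  # prints "unexpected arguments: oops"
strictDemo(NAME foo FILES)             # prints "keywords without values: FILES"
```

In real code these messages would typically be FATAL_ERROR or WARNING, depending on how strict the interface should be.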

The remainder of the function body is pretty much the same as before. Only the names of the variables have changed.

This newly defined function could be called exactly as we specified before:

AddTest(testFoo
        SOURCES test_foo.cpp
        DEPENDENCIES Bar
        )

If the caller neglects to pass a value to the SOURCES keyword argument, or neglects to specify it altogether, an error message will be printed:

AddTest(testFoo
        SOURCES
        DEPENDENCIES Bar
        )

$ cmake build
CMake Error at CMakeLists.txt:50 (message):
  [AddTest]: SOURCES requires at least one value
Call Stack (most recent call first):
  CMakeLists.txt:77 (AddTest)

The interface of the AddTest function could be expanded further. The builtin add_test allows for specifying the working directory of the test – this is convenient when the test code depends on some runtime data – it’s easy to access it relative to the working directory. This argument could be exposed by using the single-value arguments list.

It can also be useful to specify that a test is expected to fail – either when writing negative test cases, or otherwise when a failing test should be temporarily silenced. This is normally done with the WILL_FAIL test property. Here, the AddTest function could expose the same functionality using the flags variable.

The extended interface can be implemented like so:

function(AddTest targetName)
    set(flags WILL_FAIL)
    set(args WORKING_DIRECTORY)
    set(listArgs SOURCES DEPENDENCIES)

    cmake_parse_arguments(arg "${flags}" "${args}" "${listArgs}" ${ARGN})
    
    # ...
    if (WORKING_DIRECTORY IN_LIST arg_KEYWORDS_MISSING_VALUES)
        message(FATAL_ERROR "[AddTest]: WORKING_DIRECTORY is missing a value")
    endif()

    add_test(NAME    ${targetName}
             COMMAND ${targetName}
             WORKING_DIRECTORY ${arg_WORKING_DIRECTORY}
            )

    set_tests_properties(${targetName} PROPERTIES WILL_FAIL ${arg_WILL_FAIL})

endfunction()

The call to cmake_parse_arguments itself is unchanged. The WILL_FAIL and WORKING_DIRECTORY keywords were simply added to the variables defined before, which is why it might be worth always defining all three. The logic itself is straightforward: the passed-in values are propagated to the appropriate builtin facilities. There’s some extra error checking for the WORKING_DIRECTORY argument, which should be clear by now.

The updated function can be called like so:

AddTest(testFoo
        SOURCES test_foo.cpp
        DEPENDENCIES Bar
        WORKING_DIRECTORY ${CMAKE_BINARY_DIR}
        WILL_FAIL
        )

And that’s it. Knowing how to use cmake_parse_arguments gives you the ability to write almost arbitrarily complex functions that look exactly the same as builtin CMake commands. It should now be easy to encapsulate repetitive tasks and clean up your CMakeLists files.

Summary

Any code base relies on some form of abstraction to ensure maintainability. A build system is no different. Recognizing and encapsulating common patterns, and abstracting away implementation details quickly pays off. Using such basic abstractions as functions and macros in combination with cmake_parse_arguments should be sufficient for most use cases.

Have you been using functions or macros at all, or are you fine with relying strictly on built-in CMake commands? Do you see any use for cmake_parse_arguments or encapsulation in your own CMake scripts?
