CMake Lists

Mar 15, 2021 | CMake

So, let me warn you right away: I rant a little in this one. I generally like CMake and I think it’s currently the best tool out there for building cross-platform C++ projects, but it does suffer from what I assume are legacy design decisions. One of them is that virtually everything is a string, including the main subject of this post: lists. Let’s get right into it.

Just a string, but also a list.

What is a list? In the broadest sense, it’s a sequential collection, or container, of elements. We may want to be more precise about it and consider implementation details, like whether it uses a node-based or a contiguous memory layout. We could also be all functional and say that a list is a data structure that consists of a head and a tail, where the tail is itself a list that may be empty. But let’s move on before we get too philosophical.

CMake takes a different approach. A concrete definition could be formulated as follows:

A CMake list is a semicolon-separated sequence of elements.

And since everything in CMake is a string, a list is a semicolon-separated sequence of strings, which makes the list itself a string. Because who needs a type system, right? The reverse is looser: any string can be treated as a list, but one without semicolons is merely a single-element list, which is rarely what you mean by “list”. Kind of like every square is a rectangle, but not every rectangle is a square. You get the point. Maybe. If not, here’s an example of how a list might be declared:

set(imaList1 "This;is;a;list")
set(imaList2 This is a list as well)

set(notaList "This is just a string")

message("imaList1: ${imaList1}")
message("imaList2: ${imaList2}")
message("notaList: ${notaList}")

The above results in the following output:

$ cmake -S . -B build
imaList1: This;is;a;list
imaList2: This;is;a;list;as;well
notaList: This is just a string

As you can see, a list may be declared using the set command. This may be done explicitly, by assigning the variable a string containing semicolons, or implicitly, by passing each element as a separate whitespace-separated argument. In the second case set joins the arguments with semicolons before storing them. The caveat is that if whitespace is supposed to be part of a string, the entire string must be quoted so that it is passed as a single argument.
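To make the quoting rules a bit more tangible, here is a minimal sketch that counts how many arguments a command actually receives. The count_args function is a hypothetical helper I made up for this illustration; it is not part of CMake. The snippet can be run in script mode with cmake -P.

```cmake
# count_args is a hypothetical helper: it stores the number of
# arguments passed after out_var into out_var.
function(count_args out_var)
    math(EXPR extra "${ARGC} - 1")      # ARGC also counts out_var itself
    set(${out_var} ${extra} PARENT_SCOPE)
endfunction()

set(items "a;b c;d")                    # three elements, one contains a space

count_args(n_unquoted ${items})         # unquoted: split on semicolons -> 3
count_args(n_quoted "${items}")         # quoted: passed as one argument -> 1

list(LENGTH items n_items)              # 3
set(sentence "This is just a string")
list(LENGTH sentence n_sentence)        # 1, no semicolons means one element

message("unquoted: ${n_unquoted}, quoted: ${n_quoted}")
message("items: ${n_items}, sentence: ${n_sentence}")
```

Note that the space inside the second element survives the unquoted expansion: only semicolons split an expanded variable into separate arguments.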

So we have a list, let’s look at what can be done with it.

Operations on lists

CMake lists can be iterated, searched, sorted, reversed, transformed. Recent versions of CMake support quite a rich set of operations – pretty much everything you’d expect is there. I’ve already shown how to define a list, let’s move on to something equally basic – appending.

set(foo A B C)
set(bar 2 3 4)

set(foo ${foo} D)
set(bar "1;${bar}")
set(foobar ${foo} ${bar})

This results in the following:

foo: A;B;C;D
bar: 1;2;3;4
foobar: A;B;C;D;1;2;3;4

As you can see, appending (and prepending, actually) can be done with the set command and variable expansion. Nothing surprising there, really; it just relies on the basic properties of a list described in the previous section. This may not be the most readable way to do it though, so allow me to introduce the list command, which supports a wide range of operations.

set(foo A B C)
set(bar 2 3 4)

list(APPEND foo D)
list(PREPEND bar 1)
list(APPEND foobar ${foo} ${bar})

list(POP_BACK foobar four three)
list(POP_FRONT foobar a b)
list(INSERT foobar 2 ${four} ${a} ${three} ${b})

This definitely improves readability by making the intent explicit. The foobar concatenation may be a little iffy: there’s no “concatenate” mode for the list command, so the way to achieve that is to expand the lists you wish to concatenate and append their elements to the resulting variable. I’d probably use set here; it’s shorter and just as readable, if not more so, in my opinion. There’s also element removal at both back and front (POP_BACK and POP_FRONT) and insertion at an arbitrary offset (INSERT). All of these can operate on multiple elements at once. Note that PREPEND, POP_BACK, and POP_FRONT require CMake 3.15 or newer. Anyway, here’s the result:

foo: A;B;C;D
bar: 1;2;3;4
foobar: C;D;4;A;3;B;1;2
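One practical difference between the two ways of concatenating is worth a quick sketch: set overwrites the destination, while list(APPEND) adds to whatever is already there. The variable names via_set and via_append are made up for this illustration, and the snippet assumes via_append is not set beforehand.

```cmake
set(foo A B)
set(bar 1 2)

# First pass: both approaches produce the same list.
set(via_set ${foo} ${bar})               # A;B;1;2
list(APPEND via_append ${foo} ${bar})    # A;B;1;2 (via_append was unset)

# Second pass: set replaces the value, APPEND keeps growing it.
set(via_set ${foo} ${bar})               # still A;B;1;2
list(APPEND via_append ${foo} ${bar})    # A;B;1;2;A;B;1;2

message("via_set: ${via_set}")
message("via_append: ${via_append}")
```

So if the destination variable may already hold something, the choice between the two is not purely cosmetic.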

Index-based operations

As you may have gleaned from the last example, the list command also supports a set of index-based operation modes. I have already shown INSERT. There’s also LENGTH, GET, SUBLIST, and REMOVE_AT, plus FIND, which searches for a value and returns its index.

set(foobar 1 a 2 b 3 c 4 d)

list(LENGTH foobar foobar_len)
list(FIND foobar 2 index_2)
list(SUBLIST foobar ${index_2} 4 middle)
list(GET foobar 1 3 5 7 alpha)
list(REMOVE_AT foobar 1 3 5 7)

The output if printed would be:

foobar_len: 8
foobar: 1;2;3;4
middle: 2;b;3;c
alpha: a;b;c;d

It’s all quite obvious, really. The indices are 0-based, as one would expect. GET and REMOVE_AT accept multiple indices in one call.
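Two details of the index-based modes that the example above doesn’t exercise are negative indices, which count back from the end of the list, and the -1 that FIND returns when the value isn’t present. A small sketch, with variable names chosen just for this illustration:

```cmake
set(letters a b c d)

list(GET letters -1 last)          # negative index counts from the end: d
list(GET letters 0 -1 ends)        # first and last element: a;d

list(FIND letters z index_z)       # z is not in the list, so index_z is -1
if(index_z EQUAL -1)
    message("z is not in the list")
endif()

message("last: ${last}")
message("ends: ${ends}")
```

Checking the FIND result before feeding it to SUBLIST or GET is probably a good habit, since -1 is itself a valid index there and would silently refer to the last element.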

Mutating operations

Technically, most of the examples shown so far have been mutating (modifying) the lists, but I didn’t know what to call this subsection, so here we are.

There are a few more modes available: REMOVE_ITEM, REMOVE_DUPLICATES, REVERSE, SORT, TRANSFORM, and FILTER. I won’t cover all of these since most do exactly what they advertise and are absolutely straightforward to use. Transform supports regular expressions so it is quite powerful. A super simple useless example follows.

set(foobar 1 " 2A" 3 "4B " 5 6C)
list(TRANSFORM foobar STRIP)        # remove whitespace
list(TRANSFORM foobar TOLOWER)      # to lower case
list(TRANSFORM foobar REPLACE "([0-9])([a-z])" "\\2\\1\\2" REGEX "[0-9][a-z]")

This results in the following:

foobar: 1;a2a;3;b4b;5;c6c
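For completeness, here is a quick sketch of the modes I skipped. The values and sources lists are made-up sample data; the comments show the list contents after each call.

```cmake
set(values c a b a d c)

list(REMOVE_DUPLICATES values)     # keeps the first occurrence: c;a;b;d
list(SORT values)                  # ascending string order: a;b;c;d
list(REVERSE values)               # d;c;b;a
list(REMOVE_ITEM values b)         # removes every element equal to b: d;c;a

set(sources main.cpp util.cpp util.h readme.md)
# Keep only the elements matching the regular expression.
list(FILTER sources INCLUDE REGEX "\\.cpp$")   # main.cpp;util.cpp

message("values: ${values}")
message("sources: ${sources}")
```

FILTER also has an EXCLUDE variant, which drops the matching elements instead of keeping them.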

Iteration

When you need to perform operations on each element of a list and the job is too complex for what the list command has to offer, you can always fall back on manually iterating over the elements of the list using foreach. This may also be more readable in some cases. This isn’t a post about foreach, so I won’t go into too much detail. But the command offers some choices.

The most obvious one is to just iterate element-wise over the list:

foreach(elem ${my_list})
    # use elem here
endforeach()

or

foreach(elem IN LISTS my_list other_list)
    # use elem here
endforeach()

This is straightforward. The second form allows for easy iteration over multiple lists in one go.

If you need to know the offset of each element for some reason, you can use the range (index) based foreach. Note that it iterates up to and including the given end of the range, so you’ll need the math command to subtract one from the length; otherwise the last iteration would use an index one past the end of the list:

set(foobar 1 a 2 b 3)

list(LENGTH foobar foobar_len)
# subtract 1 from foobar_len and place the result back in foobar_len
math(EXPR foobar_len "${foobar_len} - 1")

foreach(i RANGE ${foobar_len})
    list(SUBLIST foobar ${i} 2 subbar)
    message("${i}: ${subbar}")
endforeach()

This results in the following:

0: 1;a
1: a;2
2: 2;b
3: b;3
4: 3

Note that it’s actually safe to specify a fixed SUBLIST length – it always truncates to the valid range.
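If all you need is the offset alongside each element, one alternative is to keep a counter by hand while iterating with IN LISTS, which sidesteps the length arithmetic entirely. This is just a sketch of that idea, not something foreach offers out of the box; the index variable is my own bookkeeping.

```cmake
set(foobar 1 a 2 b 3)

set(index 0)
foreach(elem IN LISTS foobar)
    message("${index}: ${elem}")
    # advance the hand-maintained counter
    math(EXPR index "${index} + 1")
endforeach()
```

This prints 0: 1 through 4: 3, and the loop body simply never runs when the list is empty.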

A more recent addition to the foreach family, available since CMake 3.17, is the IN ZIP_LISTS form:

set(foo 1 2 3)
set(bar a b c)
foreach(num al IN ZIP_LISTS foo bar)
    message("${num}: ${al}")
endforeach()

$ cmake -S . -B build
1: a
2: b
3: c
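ZIP_LISTS can also be used with a single loop variable, in which case the elements are exposed as numbered variables derived from its name. A minimal sketch, assuming CMake 3.17 or newer; the names nums, letters, pair, and pairs are made up for this example:

```cmake
set(nums 1 2 3)
set(letters a b c)

set(pairs "")
# With one loop variable, element N of each iteration is in pair_N.
foreach(pair IN ZIP_LISTS nums letters)
    list(APPEND pairs "${pair_0}${pair_1}")
endforeach()

message("pairs: ${pairs}")         # 1a;2b;3c
```

This form is handy when the number of zipped lists makes naming every loop variable tedious.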

This may further reduce the number of use cases for index-based iteration in your CMake code. My personal preference is to stay away from explicit loops unless absolutely necessary. I find the list command operations much more succinct and more readable.

Summary

Once you get past the fact that CMake has no type system of any kind and lists are actually just strings, they can be really useful and rather easy to use. Just keep in mind that the semicolons and whitespace are special in CMake and you’ll be fine.

 
