Oleksandr Koval’s blog

Multi-version Doxygen documentation with GitHub Pages

2024-06-05T08:05:00+00:00

Introduction
Problems with mono-version documentation
Welcome multi-version documentation
Prerequisites
Overall design
Version switch mechanics
Automation
- Actions
- Workflows
Upgrading from a mono to multi-version documentation
Wrap-up

Introduction

In this article I’ll show how to host a multi-version Doxygen documentation on GitHub Pages and automate its generation using GitHub Actions.

Problems with mono-version documentation

I have a small hobby project that generates documentation using Doxygen and hosts it on GitHub Pages. Nothing special, but as the project began to evolve, I realized that having a single set of documentation is inconvenient for several reasons:

- it requires special notes like “available/deprecated since version N” around the API.
- it requires similar notes for non-API things, like describing overall design, recommended practices, examples.
- the above notes are not generally useful because changes are usually driven by users who are supposed to migrate to the latest version.

Basically, over time, the documentation gets polluted with those notes to cover all the existing versions. Of course, another approach is to remove older documentation altogether, but that’s not friendly to users who, for some reason, can’t migrate to the latest version immediately.

Welcome multi-version documentation

All of the above problems can be solved when a project has dedicated documentation for each release. Pretty much every large project, like Boost or Python, uses this approach. Unfortunately, Doxygen doesn’t support this out of the box. It generates a completely standalone set of HTML pages for a given version but has no functionality to combine multiple such sets and allow the user to switch between them.

I’ve found no existing tutorial to achieve this, but ChatGPT suggested an approach and guided me through most of its steps. Note that I’m neither a front-end nor a DevOps expert, so there’s likely room for further improvement, but overall it should be a solid start for anyone with a similar goal.

Prerequisites

This tutorial uses CMake 3.29.3 and Doxygen 1.10. It also assumes that the project version is specified using the CMake project() command and is available via the ${PROJECT_VERSION} variable.

The very basic CMake + Doxygen setup looks like this:

find_package(Doxygen REQUIRED)

# Doxygen options in the form of `DOXYGEN_`...

doxygen_add_docs(
    doc                      # target name
    "${PROJECT_SOURCE_DIR}"  # sources to scan for docs
)

All the following changes are made on top of it. It’s not a problem if you’re using another approach, just apply similar settings in a way you prefer.

The final demo repository is here, it contains all the parts described below. Its docs are here.

Overall design

Here’s the directory structure we need:

/           # docs root
    1.0.0/  # version-specific docs
    2.0.0/

These dirs will be located in a separate branch (e.g. gh-pages) that contains nothing but the docs themselves. On each release (or another event), new documentation is generated using Doxygen and pushed to that branch into a version-specific directory under its root. As mentioned above, generated docs know nothing about each other, so our two main problems are:

- version switch mechanics
- automated docs generation and population of the docs branch

Version switch mechanics

Adjusting folder names

By default, Doxygen generates docs into the html directory. To have them in a version-specific directory, we can use the HTML_OUTPUT option:

set(DOXYGEN_HTML_OUTPUT "${PROJECT_VERSION}")

It will produce the docs into a path like build_dir/doc/1.0.0, where doc is the Doxygen target name. But there’s a minor problem here. When we build the doc target, we don’t know the actual version and hence the path to generated docs. Without this knowledge it’s pretty hard to automate the process. To solve it, let’s add another directory into that path using Doxygen OUTPUT_DIRECTORY setting:

set(DOXYGEN_OUTPUT_DIRECTORY "docs")

Now, docs are located in build_dir/doc/docs/1.0.0 and the new docs directory holds nothing but our version-specific docs. To determine the generated version, we can simply enumerate the directories in docs; there will always be exactly one. This trick will be used later in the building documentation section.

Main page

OK, now we can have docs in separate version-specific directories, but what should be the main page? When docs are hosted on a standalone server over which you have full control, you can try to update the path to the index page on each new release. GitHub Pages is not so flexible: it always looks for index.html in the root directory. To make it work, we have to implement an HTML redirect:

<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <meta http-equiv="refresh" content="0; url=2.0.0/index.html">
    <title>Redirecting...</title>
</head>
<body>
    <p>If you are not redirected automatically, <a href="2.0.0/index.html">click here</a>.</p>
</body>
</html>

The target URL changes with each release, so the generation of this file will be automated later. Here’s what the root directory will look like at this stage:

/           # root of our documentation
    1.0.0/  # version-specific directories
        index.html # version-specific main page generated by Doxygen
        ...
    2.0.0/
        index.html
        ...

    index.html # redirects to the latest release docs, e.g. `2.0.0/index.html`

Version selector

Now comes the interesting part: we need a dropdown selector element that does the actual switch between versions. Doxygen has a PROJECT_NUMBER option; when set, it displays the project version next to the project name. doxygen_add_docs automatically sets it to ${PROJECT_VERSION}. If you’re using a standalone Doxygen config, be sure to set it by hand. That static number has to be replaced with a version selector. Doxygen allows customization of the header HTML part that’s common to all pages. First, we need to generate the default one:

doxygen -w html header.html footerFile styleSheetFile Doxyfile.doc

Here, Doxyfile.doc is the Doxygen configuration file generated by doxygen_add_docs; it can be found in the binary directory of the doc target (e.g. build_dir/doc/Doxyfile.doc). If you’re using a standalone configuration file, use it here instead of Doxyfile.doc. Of the three files generated by this command, we need only header.html, which we are about to customize. Opening it, we can see where that version text is located:

<span id="projectnumber">&#160;$projectnumber</span>

During docs generation, Doxygen replaces that $projectnumber with the PROJECT_NUMBER value but we don’t need that. Instead, we cut it down to just:

<span id="projectnumber"></span>

Then, we add a version_selector_handler.js script that does 3 things:

- loads version_selector.html and injects its content into the #projectnumber element
- navigates to the chosen version’s index.html when the selection changes
- marks the version of the current page as selected

Here’s the script:

// version_selector_handler.js
$(function () {
    var repoName = window.location.pathname.split('/')[1];
    $.get('/' + repoName + '/version_selector.html', function (data) {
        // Inject version selector HTML into the page
        $('#projectnumber').html(data);

        // Event listener to handle version selection
        document.getElementById('versionSelector').addEventListener('change', function () {
            var selectedVersion = this.value;
            window.location.href = '/' + repoName + '/' + selectedVersion + '/index.html';
        });

        // Set the selected option based on the current version
        var currentVersion = window.location.pathname.split('/')[2];
        $('#versionSelector').val(currentVersion);
    });
});

Note that its URL is in the form of /<repoName>/version_selector.html. That’s because when a project is published on GitHub Pages, all its URLs have the <repoName> prefix. It’s not needed if docs are hosted on a fully standalone domain.

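This script has to be included at the bottom of the <head> section of header.html.

The actual version list, the versionSelector element that the script looks up, lives in a separate version_selector.html file.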

Here’s the full structure of our documentation directory:

/           # root of our documentation
    1.0.0/  # version-specific directories
        version_selector_handler.js # loads `/version_selector.html`
        index.html # version-specific main page generated by Doxygen
        ...
    2.0.0/
        version_selector_handler.js
        index.html
        ...

    index.html # redirects to the latest release, e.g. `2.0.0/index.html`
    version_selector.html # holds the actual version list

Note that while every generated documentation has its own version_selector_handler.js copy, there’s only one instance of version_selector.html. There’s no need to regenerate existing docs to introduce a new version, only version_selector.html has to be updated.

We need to adjust Doxygen settings to bring the above changes together:

# specify custom header path
set(DOXYGEN_HTML_HEADER "header.html")

# extra files that will be copied alongside generated HTML files
set(DOXYGEN_HTML_EXTRA_FILES "version_selector_handler.js")

And that’s it, now each HTML page has that version selector.

' > $/$
for dir in $dirs; do
  if [[ "$(basename "$dir")" != .* ]]; then
    version=$(basename "$dir")
    echo " " >> $/$
  fi
done
echo '' >> $/$

# deploy step...

Updating redirect page

This one is trivial: just take the target URL from the parameter and generate the standard HTML redirect:

# .github/actions/update-redirect-page/action.yaml

steps:
  - name: Generate redirect HTML
    shell: bash
    run: |
      mkdir $
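      cat << EOF > $/$
      <!DOCTYPE html>
      <html>
      <head>
          <meta charset="utf-8">
          <meta http-equiv="refresh" content="0; url=$">
          <title>Redirecting...</title>
      </head>
      <body>
          <p>If you are not redirected automatically, <a href="$">click here</a>.</p>
      </body>
      </html>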
      EOF

  # deploy step...

Workflows

The above three actions are enough to implement almost any strategy for docs generation. Let’s see a couple of common ones.

Generating `git-main` docs

It’s useful to have not only release-specific docs but also docs for the latest, not yet released version of a project. This can be achieved by generating docs from the main branch whenever new commits are pushed into it. We’ll give such docs a git-main “version”:

# .github/workflows/create-git-main-docs.yml

on:
  push:
    branches:
      - main

jobs:
  create-git-main-docs:
    runs-on: ubuntu-22.04

    steps:
      - uses: actions/checkout@v4

      - name: Build docs
        id: build-docs
        uses: ./.github/actions/build-docs
        with:
          cmake_target: 'doc'
          docs_dir: 'doc/docs'
          destination_dir: git-main
          github_token: $

      - name: Update version selector
        id: update-version-selector
        uses: ./.github/actions/update-version-selector
        with:
          github_token: $

      - name: Create redirect page if there are no releases
        if: $
        uses: ./.github/actions/update-redirect-page
        with:
          github_token: $
          target_url: git-main/index.html

The first two steps are self-explanatory; the only thing to notice is the destination_dir: git-main argument to the build-docs action. It forces build-docs to deploy documentation to the git-main directory, not to a version-specific one because those are reserved for releases. The last step is required to create the redirect page when your documentation branch has no release-specific docs yet. It’s useful when you begin to play with this stuff and want to test how everything works without making new releases.

Generating release docs

Generating release-specific docs is the main goal of this tutorial and its workflow is even simpler than the above. This time destination_dir is not set so the docs are published into a version-specific directory, e.g. 1.0.0:

on:
  release:
    types: [released]

jobs:
  create-release-docs:
    runs-on: ubuntu-22.04

    steps:
      - uses: actions/checkout@v4

      - name: Build docs
        id: build-docs
        uses: ./.github/actions/build-docs
        with:
          cmake_target: 'doc'
          docs_dir: 'doc/docs'
          github_token: $

      - name: Update redirect HTML
        uses: ./.github/actions/update-redirect-page
        with:
          github_token: $
          target_url: $/index.html

      - name: Update version selector
        uses: ./.github/actions/update-version-selector
        with:
          github_token: $

Generating PR docs

The above two workflows are a good basis for generating docs for a project. As an example of something “extra”, let’s generate docs from a pull request. It can be useful for projects with many contributors to check that docs for a new feature are correct. Generating them from every PR makes no sense, so we need some condition here. GitHub has different ways to control when to run such a workflow; I’ve chosen the simplest one, which is to run it when a PR is labeled documentation. It’s very similar to the git-main workflow but now destination_dir is set to PR-$ and the redirect page is never touched:

# .github/workflows/create-pr-docs.yml

on:
  pull_request:
    types: [labeled, synchronize]
    branches:
      - main

jobs:
  create-pr-docs:
    if: $
    runs-on: ubuntu-22.04

    steps:
      - uses: actions/checkout@v4
        with:
          ref: $

      - name: Build docs
        id: build-docs
        uses: ./.github/actions/build-docs
        with:
          cmake_target: 'doc'
          docs_dir: 'doc/docs'
          destination_dir: PR-$
          github_token: $

      - name: Update version selector
        uses: ./.github/actions/update-version-selector
        with:
          github_token: $

Removing PR docs

Unlike git-main and release docs, which stay there forever, PR docs should be removed when the PR is closed:

# .github/workflows/remove-pr-docs.yml

on:
  pull_request:
    types: [closed]
    branches:
      - main

jobs:
  remove-pr-docs:
    if: $
    runs-on: ubuntu-22.04

    steps:
      - name: Remove PR docs
        uses: peaceiris/actions-gh-pages@v4
        with:
          github_token: $
          publish_dir: $
          destination_dir: PR-$

      - uses: actions/checkout@v4
        with:
          sparse-checkout: .github

      - name: Update version selector
        uses: ./.github/actions/update-version-selector
        with:
          github_token: $

Here, peaceiris/actions-gh-pages removes destination_dir before pushing new files to it and since $ is empty, this effectively results in removing PR docs.

Upgrading from a mono to multi-version documentation

Switching from mono-version to multi-version documentation requires a bit of manual intervention. git-main docs will be generated automatically once the new functionality is merged into main; the same applies to new releases but not to the previous ones. For my project, I’ve decided to integrate only the latest release available at the time into the new multi-version docs branch. But that release generates docs without the new version selector functionality and wouldn’t work as is. Here’s how I’ve done it:

- pulled the release tag locally
- applied Doxygen-specific changes on top of it
- generated its documentation, now it has version selector functionality
- manually pushed it to my gh-pages branch into the corresponding version-specific directory
- manually added that version to version_selector.html
- updated the index.html redirect page

That’s it, now old release docs are fully integrated. Not a lot of work for a single release but if one needs this for more releases, it makes sense to automate the process.

Wrap-up

We’re done. At this point we have a fully automated system that generates and publishes documentation for a project, allowing users to switch between versions. The presented approach is not the only one and many other customizations are possible, but it should be a good place to start.

From range projections to projected ranges

2021-10-11T16:22:00+00:00

Introduction
What a projection is
Problems with existing design
Projected ranges to the rescue
Implementation story
Other use-cases
The role of std::views::transform
Demo
Wrap-up

Introduction

When I first watched range-related talks, I liked the idea of projections. I played with them a bit and still liked them. However, after trying to write range-based algorithms I found them not good enough and not pleasant to work with. In this post I’ll explain why I don’t like range projections in their current form and how I propose to fix them (demo implementation is provided).

Update. After this article was published, I received some feedback and realized that the proposed design has one problem. I described it in the section Major flaw and left the rest of the article untouched. Please keep that in mind while reading it. Thanks to all the people who shared their feedback and thoughts.

What a projection is

If you are not familiar with projections, here’s a brief explanation. A projection is an invocable entity which is applied to a range element before the algorithm’s logic uses it. It can be a lambda, a pointer-to-member (either data or function) or just a function pointer. Throughout this article I will use these two structures for examples:

struct Y
{
    int a;
    int b;
    auto operator<=>(const Y&) const = default;
};

struct X
{
    int x;
    Y y;
    auto operator<=>(const X&) const = default;
};

If you don’t know what operator<=> is, don’t worry, in the context of this article you only need to know that it provides all the comparison operations ( ==, !=, <, <=, >, >=) for both X and Y, they operate in a member-wise fashion. Ok, back to the subject, imagine that we want to sort a vector of X based on X::x. Here’s how this can be done with the pre-ranges STL:

std::vector<X> v;

std::sort(std::begin(v), std::end(v), [](const auto& lhs, const auto& rhs){
    return lhs.x < rhs.x;
});

And here’s how it can be done using projection and range-based algorithm:

std::ranges::sort(v, std::less{}, &X::x);

Now, std::less operates on X::x values but it’s important to understand that the algorithm itself sorts original X elements, not just their X::x parts. Roughly:

auto sort(auto range, auto compare, auto projection){
    // `it1` and `it2` are iterators from `range`
    // comparator is invoked on projected values
    if(compare(std::invoke(projection, *it1), std::invoke(projection, *it2))){
        // but moving/swapping is done on non-projected values
        std::ranges::iter_swap(it1, it2);
    }
    // ...
}

Projection provides a clear separation of comparison logic from element manipulation. These things are really orthogonal, and it’s nice that we can now keep them separate. And while the idea behind projections is great, its implementation has unpleasant side effects which sometimes make developers’ lives harder.

Problems with existing design

Several months ago I became involved in the P1708 Simple Statistical Functions proposal. I needed those functions for my hobby project and started implementing them. This was my first experience in writing range-based API and that’s how I got most of my unpleasant experience working with current projections design.

Projections uglify function signatures

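In a range-based API you usually have at least one range, and for each of them you need to support, and hence accept, a projection. For example, here’s a simplified signature of copy_if with the return type and O type requirements removed:

template<
    ranges::input_range R,
    class O,
    class Proj = std::identity,
    std::indirect_unary_predicate<
        std::projected<ranges::iterator_t<R>, Proj>> Pred>
constexpr auto copy_if(R&& r, O result, Pred pred, Proj proj = {});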

All range-based algorithms must have this additional function and template parameter that defaults to a no-op std::identity projection. Looks innocent? In P1708 we have weighted statistics so we use two ranges: one for values and one for weights, thus, we need one projection per range:

template<
    typename Values,
    typename ValuesProj = std::identity,
    typename Weights,
    typename WeightsProj = std::identity>
constexpr auto mean(
    Values&& v, Weights&& w, ValuesProj proj1 = {}, WeightsProj proj2 = {});

Add to it more algorithm specific parameters like comparators and you’ll get something like std::ranges::merge():

template<
    ranges::input_range R1,
    ranges::input_range R2,
    std::weakly_incrementable O,
    class Comp = ranges::less,
    class Proj1 = std::identity,
    class Proj2 = std::identity>
constexpr auto merge(R1&& r1, R2&& r2, O result,
        Comp comp = {}, Proj1 proj1 = {}, Proj2 proj2 = {});

I believe that in a good API default function arguments should be rare and their number should be small. Here, we have 6 parameters and 3 of them have default arguments. This signature is not good at all; we’ll also discuss the usability of such an API in the following sections.

Another issue, though not so critical, is access to the projected value type in order, for example, to constrain it. Recall the constraint from copy_if(): std::indirect_unary_predicate<std::projected<ranges::iterator_t<R>, Proj>> Pred. It ensures that the predicate Pred can be called with the result of applying the projection Proj to the value of the iterator of a range R. It’s understandable but still quite complex. In P1708R5 functions are supposed to work only on standard arithmetic types; here’s a way to achieve that:

template<std::ranges::input_range Range, typename Proj = std::identity>
requires std::is_arithmetic_v<
    std::remove_cvref_t<
        std::indirect_result_t<Proj&, std::ranges::iterator_t<Range>>>>
double mean(Range&& r, Proj = {});

I mean, OK, it works and with some effort you can do it properly. But I don’t like its complexity. Writing your own algorithms in the classic STL style was simple, writing them for ranges is not if you want to support projections properly.

Projections are not easily composable

Imagine that you’re implementing a range-based algorithm and you need to call another algorithm but with one more additional projection. For example, geometric mean is usually implemented in terms of arithmetic mean of logarithms and final std::exp() of it. This requires combination of two projections, original one and std::log():

template<typename R, typename P = std::identity>
constexpr double geometric_mean(R&& r, P proj = {})
{
    const auto logs_mean = mean(
        std::forward<R>(r),
        [&](const auto& value)
        {
            return std::log(std::invoke(proj, value));
        });

    return std::exp(logs_mean);
}

It would be nice to move std::log() part to a separate independent projection but such a projection wouldn’t be really independent because it needs to know about the preceding one:

// to make it a reusable function-like object we again need this additional
// parameter everywhere
template<typename P = std::identity>
class log_proj{
public:
    explicit log_proj(P proj = {}): p{std::move(proj)}{}

    auto operator()(const auto& value){
        return std::log(std::invoke(p, value));
    }
private:
    P p;
};
// with the above it's possible to write:
// const auto logs_mean = mean(r, log_proj{proj});

// and this is what I want as a client:
struct nice_log_proj{
    auto operator()(const auto& value){
        return std::log(value);
    }
};

Of course, it’s possible to create another utility to chain projections together, like mean(r, chain(std::move(proj), nice_log_proj{})), but at the moment there’s no standard tool for that. This problem also occurs when you want to sort a std::vector<X> by the Y::a member of X::y. In C++ it’s not possible to get a pointer to a member of a member: &X::y::a doesn’t work, so something like chain(&X::y, &Y::a) is needed.

Projections complicate caller’s code

Imagine a function with several default arguments:

template<typename R, typename P = std::identity>
void f(R&& range, int x = 1, int y = 2, P p = {});

Because the projection is usually placed at the end of signature, if you need to use it, you have to specify all the default arguments by hand:

f(v);               // without projection
f(v, 1, 2, &X::x);  // with projection

What if the author of f() decides to change default arguments? Clients will be forced to rewrite the code to preserve “default” behavior. It’s less painful with something like std::ranges::sort():

std::ranges::sort(v);   // without projection
std::ranges::sort(v, std::less{}, &X::x);   // with projection
std::ranges::sort(v, {}, &X::x);    // less verbose but less readable too

But now it’s either too verbose with explicit std::less{} or less readable with {}. So the client is either forced to explicitly write arguments by hand or use less readable constructions if that’s possible at all. Going back to weighted stats:

template<
    typename Values,
    typename Weights,
    typename ValuesProj = std::identity,
    typename WeightsProj = std::identity>
constexpr auto mean(
    Values&& v, Weights&& w, ValuesProj proj1 = {}, WeightsProj proj2 = {});

mean(values, weights);    // no projections, great
mean(values, weights, &X::x); // only value projection, OK
mean(values, weights, {}, &X::x); // only weight projection, ugly :(

Root cause of all the problems

From the interface point of view it’s pretty simple, the problem is that range and projection represent a logically single entity but are passed to functions separately via distinct parameters. It’s the same as for error-prone f(const char* str, std::size_t len); and we all know it’s a bad way of doing things. Clients are forced to separate things, developers are forced to combine them back together, I want something better.

Projected ranges to the rescue

I had quite a simple idea: range and projection should be combined into a single thing using some kind of view, e.g., views::projection. This would make all those projection-related parameters redundant, algorithms wouldn’t care about them at all, they would only operate on a range itself, just like in classic STL. Here’s what I wanted:

// no projection-related parameters
auto sort(auto&& range, auto cmp = std::ranges::less{});

// sorts elements by `X::x` member, analog of current sort(v, {}, &X::x);
sort(v | projection(&X::x));

const auto log_proj = [](const auto value)
{
    return std::log(value);
};

// nested projections, actually, it's a nested range now
constexpr double geometric_mean(auto&& r)
{
    const auto logs_mean = mean(r | projection(log_proj));
    return std::exp(logs_mean);
}

Isn’t it great? No more projection-related parameters, signatures are clean, everything is perfectly composable. It simplifies projections in the same way as ranges simplified usage and composition of iterator-based algorithms.

I call it a projected range because it combines a range and a projection. Such a range has a very important property: its operator*() returns the projected value, while copy/move/swap/assign operations should be performed on the whole underlying object. Immediately, another type of projection came to my mind, the so-called narrow_projection. All of its operations are performed on the projected part only. It’s narrow in the sense that it represents only a narrow part of the object, while a wide projection represents the wider object behind it:

std::vector<X> v{{3, {30, 300}}, {2, {20, 200}}, {1, {10, 100}}};

// sorts the whole X objects using &X::x member
std::sort(v | projection(&X::x));
// {{1, {10, 100}}, {2, {20, 200}}, {3, {30, 300}}}

// sorts only X::x
std::sort(v | narrow_projection(&X::x), std::ranges::greater{});
// {{3, {10, 100}}, {2, {20, 200}}, {1, {30, 300}}}

// sorts X::y by Y::a
std::sort(
    v | narrow_projection(&X::y) | projection(&Y::a), std::ranges::greater{});
// {{3, {30, 300}}, {2, {20, 200}}, {1, {10, 100}}}

Delighted, I started to think how to implement it and it turned out to be a bit harder than I expected.

Implementation story

If you don’t know how range views work, here’s the basic idea: all the work is done inside custom “smart” iterators. For example, the iterator of views::transform, the view most relevant to projections, has an operator*() which looks like this:

class transform_view_iterator{
private:
    Iterator it;    // underlying iterator
    F f;            // transform function

public:
    decltype(auto) operator*(){
        return std::invoke(f, *it);
    }
    // ...
};

Other operations mostly take care of moving it properly. To implement views::projection we will need a custom iterator, so first we need to understand how iterators work in C++20.

C++20 iterators overview

Here’s the brief overview of iterator-related types and operations. It’s heavily based on articles/papers by Eric Niebler (0, 1, 2, 3, 4). Read them if you want more details and reasoning behind current design.

iter_value_t/value_type - the type of a value which the iterator represents. The value of this type can be copied/moved from the iterator.

iter_reference_t operator*() - dereference operator, usually returns lvalue reference to value_type (but not required), must be convertible to iter_value_t.

iter_rvalue_reference_t iter_move(it) - customization point for moving value out of iterator, usually returns rvalue reference to value_type (but not required). Also must be convertible to iter_value_t. If not defined by iterator, std::move(*it) is used.

void iter_swap(it1, it2) - customization point for swapping values between two iterators. If not defined, performs std::ranges::swap(*it1, *it2) if possible, otherwise uses iter_move() to swap elements “by-hand”.

common_reference requirements for readable iterators. Now comes the tricky part. As you might have noticed, iter_value_t, iter_reference_t, iter_rvalue_reference_t are not required to be as simple as int, int& and int&& correspondingly. But there must be pairwise common_references to represent relationships between them. Basically, common_reference is a type to which both T and U can be converted or bound, it’s not required to be a true reference type.

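static_assert(std::same_as<std::common_reference_t<int&, const int&>, const int&>);
static_assert(std::same_as<std::common_reference_t<int&&, int&&>, int&&>);
static_assert(std::same_as<std::common_reference_t<int&&, const int&>, const int&>);
static_assert(std::same_as<std::common_reference_t<int&, int>, int>);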

You can find these requirements in the std::indirectly_readable concept:

template<class In>
concept __IndirectlyReadableImpl = // exposition only
requires(const In in) {
    typename std::iter_value_t<In>;
    typename std::iter_reference_t<In>;
    typename std::iter_rvalue_reference_t<In>;
    { *in } -> std::same_as<std::iter_reference_t<In>>;
    { ranges::iter_move(in) } -> std::same_as<std::iter_rvalue_reference_t<In>>;
} &&
std::common_reference_with<
    std::iter_reference_t<In>&&, std::iter_value_t<In>&> &&
std::common_reference_with<
    std::iter_reference_t<In>&&, std::iter_rvalue_reference_t<In>&&> &&
std::common_reference_with<
    std::iter_rvalue_reference_t<In>&&, const std::iter_value_t<In>&>;

Interestingly, there’s no requirement that all these common_references must be the same type. In fact, they are not even required to be used and hence defined but they must be declared. Eric shows one example when common_reference might be useful, unique_copy() comparator parameter types. unique_copy() needs to copy value_type and then call comparator with this copy and the result of operator*() which is iter_reference_t. But the order of arguments is not specified. If for whatever reason your comparator cannot have templated parameters, you need to use common_reference for parameter types:

auto unique_copy(Iterator first, Iterator last, auto d_first, auto comparator){
    // somewhere inside `unique_copy()`
    Iterator it = first;
    std::iter_value_t<Iterator> copy = *it;   // copy current element
    ++it;
    comparator(copy, *it);  // compare it to the next one, it can be one way
    comparator(*it, copy);  // or the other
}

// client's code
auto generic_comparator = [](auto& lhs, auto& rhs){};   // no problems

// but if you need specific types, use common_reference
template<typename It>
using iter_common_reference_t = std::common_reference_t<
    std::iter_reference_t<It>, std::iter_value_t<It>&>;

auto non_generic_comparator = [](
    iter_common_reference_t<Iterator> lhs, iter_common_reference_t<Iterator> rhs){};

Note that since C++20, iter_reference_t is not required to be a true reference for any kind of iterator which effectively allows random-access proxy iterators.

Range-based versions of existing algorithms must be changed like this (current libstdc++ still doesn’t use iter_move()/iter_swap() in its range algorithms):

Iterator it1, it2;

// pre C++20 algorithms:
using value_type = typename std::iterator_traits<Iterator>::value_type;
value_type copied = *it1;             // copy
value_type moved = std::move(*it1);   // move
std::iter_swap(it1, it2);             // swap

// C++20 algorithms:
using value_type = std::iter_value_t<Iterator>;
value_type copied = *it1;                        // copy
value_type moved = std::ranges::iter_move(it1);  // move
std::ranges::iter_swap(it1, it2);                // swap

The main purpose of this design (as I understand it) is to allow proxy iterators of any kind, which, in theory, allows more “indirect” iterators and their usage with standard algorithms. A hypothetical proxy iterator must implement:

the functions corresponding to its category (operator++(), operator[], etc.)
a custom proxy-reference type which supports reads, writes and conversions to/from value_type, itself and iter_rvalue_reference_t
custom iter_move() and iter_swap()
the necessary basic_common_reference specializations (a helper for common_reference described above) between its value_type, iter_reference_t and iter_rvalue_reference_t, resolving to a type which is at least declared

Need for a better design

Now that you have a basic idea of what iterators can do in C++20, we can start to think about how views::projection should work. Recall the usage example:

sort(v | projection(&X::x));    // sorts `v` by `X::x` member

For this to work, operator*() must return a reference-like thing which points to the projected value (X::x member in this case) so that the comparator will use it instead of the whole object. On the other hand, copy/move/swap/assign operations must operate on the non-projected object (X). In other words, we have two distinct types:

value_type - the type exposed through operator*(), projected type
iter_root_t - the type of underlying object, root type

and there’s no logical relationship between them, i.e., there’s no connection between the int x; and struct X; types. One can argue that in fact we have the X and &X::x types and there is a member-of relation, but in reality a projection can also be a pure transformation, e.g., from std::string to int, so any kind of relationship doesn’t make sense here.

In contrast, the existing design doesn’t leave space for a second type (iter_root_t). It allows a proxy reference as iter_reference_t but it enforces strict relationships between it and value_type in terms of common_reference requirements. At most, it allows representing a logically single value_type with two different types, like an advanced form of pointer. That’s why the related concepts are named indirectly_readable/writable/etc.: it’s all about indirection mechanics, not true abstraction from one type to another.

And even this indirection mechanism is over-complicated; I’d say it’s an expert-~~friendly~~only utility. I mean, when Eric Niebler says it’s hard (you can check his implementation here), how can you expect people to write their own iterators using it? It’s hard because, if you need a proxy reference, you first have to check and understand which operations and conversions the algorithms require from the proxy reference (iter_reference_t), value_type and iter_rvalue_reference_t, and only then try to implement it.

To summarize, there are two main problems: over-complicated design and its inability to support true abstraction between two unrelated types. Now, let’s fix it.

The next iteration of iterators

In the Projected ranges to the rescue section I said that algorithms must operate on iter_root_t values only; value_type should be used only when it’s passed to customizable logic like comparators. Thus, we need to separate the value_type API from the iter_root_t API. Let’s summarize what we have so far:

operator*() to get an lvalue reference or copy of value_type
operator*() to write value_type
iter_move(it) to get rvalue reference to value_type
iter_swap(it1, it2) to swap whatever we want, in our case it’s iter_root_t

Now we need similar functions for iter_root_t:

iter_copy_root(it) to get an lvalue reference to iter_root_t, iter_root_reference_t
iter_move_root(it) to get an rvalue reference to iter_root_t, iter_root_rvalue_reference_t

And to simplify assignment:

iter_assign_from(it, value) to assign whatever is needed

All these new functions are customization point objects (CPOs), which means an iterator doesn’t have to implement them if it is happy with the default behavior. One of my goals was to preserve backward compatibility with all existing iterators, so the default implementations mostly forward to the old API. If you are not familiar with a typical CPO implementation, the idea is quite simple: the CPO calls either a function customized for the specific type or the default implementation. The presence of a customized function is detected via an ADL check (has_adl_[cpo_name] below). The implementation lives inside a struct, which is why the code below uses operator()(...) instead of a plain function. stdf is the name of the namespace where I put all the new stuff, not a typo.

iter_copy_root()

Returns lvalue reference to iter_root_t. Default behavior is to return the result of operator*(). I deliberately omit return type, noexcept-ness and constraint specifications since they are trivial, interested readers can find them in demo implementation.

template<typename From>
constexpr decltype(auto) operator()(From&& from) const
{
    if constexpr(has_adl_iter_copy_root<From>)
    {
        return iter_copy_root(static_cast<From&&>(from));
    }
    else
    {
        return *from;
    }
}

// helper aliases
template<typename T>
using iter_root_t =
    std::remove_cvref_t<decltype(stdf::iter_copy_root(std::declval<T&>()))>;

template<typename T>
using iter_root_reference_t = decltype(stdf::iter_copy_root(std::declval<T&>()));

// usage example:
auto& ref = stdf::iter_copy_root(it);
auto copy = stdf::iter_copy_root(it);

iter_move_root()

Returns rvalue reference to underlying object. When not customized, can forward to iter_move() or to iter_copy_root(). Reason for this is simple: iter_move_root() is supposed to return rvalue reference to root type, if iter_copy_root() is not customized, it operates in terms of value type and iter_move() is responsible for moving it. This also preserves backward compatibility, for existing iterators iter_copy_root() is forwarded to operator*() and iter_move_root() to iter_move().

template<typename From>
constexpr decltype(auto) operator()(From&& from) const
{
    if constexpr(has_adl_iter_move_root<From>)
    {
        return iter_move_root(static_cast<From&&>(from));
    }
    else if constexpr(
        iter_move_cpo::has_adl_iter_move<From> &&
        !iter_copy_root_cpo::has_adl_iter_copy_root<From>)
    {
        return stdf::iter_move(static_cast<From&&>(from));
    }
    else if constexpr(std::is_lvalue_reference_v<
                            iter_root_reference_t<From>>)
    {
        return std::move(
            stdf::iter_copy_root(static_cast<From&&>(from)));
    }
    else
    {
        return stdf::iter_copy_root(static_cast<From&&>(from));
    }
}

template<typename T>
using iter_root_rvalue_reference_t =
    decltype(stdf::iter_move_root(std::declval<T&>()));

// usage example:
auto moved = stdf::iter_move_root(it);

iter_assign_from()

It is responsible for assignment. Developer has full control over supported types. It’s possible to introduce iter_assign_value and iter_assign_root but I don’t know any use-case where it might be useful. Default behavior assigns to root:

template<typename To, typename From>
constexpr void operator()(To&& to, From&& from) const
{
    if constexpr(has_adl_iter_assign_from<To, From>)
    {
        iter_assign_from(static_cast<To&&>(to), static_cast<From&&>(from));
    }
    else
    {
        stdf::iter_copy_root(static_cast<To&&>(to)) = static_cast<From&&>(from);
    }
}

// helper concept
template<typename To, typename From>
concept iter_assignable_from = requires(To&& to, From&& from)
{
    stdf::iter_assign_from(static_cast<To&&>(to), static_cast<From&&>(from));
};

// usage example:
stdf::iter_assign_from(it, T{});

iter_swap()

iter_swap() behaves almost like std::ranges::iter_swap() but it operates on root values, i.e., it uses iter_copy_root() instead of operator*() and iter_move_root()/iter_assign_from() instead of iter_move()/operator=(). I don’t show implementation here because it’s not so short, you can find it in the demo.

views::projection

Now, when we have full control over the iterator’s behavior, we can finally implement views::projection and views::narrow_projection and see how the new API simplifies custom iterator implementation. I will show only core parts of the iterator. We need to store current underlying iterator and a pointer to a parent view where projection function is stored:

class Iterator
{
private:
    BaseIter current{};
    ParentView* parent{};
};

Core parts:

constexpr decltype(auto) operator*() const
{
    return std::invoke(parent->fun, *current);
}

friend constexpr decltype(auto) iter_copy_root(const Iterator& it)
{
    return stdf::iter_copy_root(it.current);
}

// enabled only if `BaseIter` has custom `iter_move_root`
friend constexpr decltype(auto) iter_move_root(const Iterator& it)
{
    return stdf::iter_move_root(it.current);
}

// enabled only if `BaseIter` has custom `iter_swap`
friend constexpr void iter_swap(const Iterator& x, const Iterator& y)
{
    return stdf::iter_swap(x.current, y.current);
}

// enabled only if `BaseIter` has custom `iter_assign_from`
template<typename T>
friend constexpr void iter_assign_from(const Iterator& it, T&& val)
{
    stdf::iter_assign_from(it.current, std::forward<T>(val));
}

As you can see, it’s trivial, all of them are one-liners. operator*() returns the result of applying projection to the iterator’s value. We don’t need custom iter_move() because default implementation operates on the result of operator*(). Since we want copy/move/assign/swap operations to operate on the root value, we simply forward these calls to it. Note that the last three functions are enabled (using requires-clause) only in case when the underlying iterator customizes them. Otherwise, their default versions will operate on the basis of iter_copy_root() which is exactly what’s needed. There’s another reason why it’s better to avoid customized versions of CPO-s when possible, it’s described later in section Reducing number of dereferences.

views::narrow_projection

It’s even simpler, all we need is:

constexpr decltype(auto) operator*() const
{
    return std::invoke(parent->fun, *current);
}

Because we don’t want to expose the underlying root value to copy/move/swap/assign operations, everything else works by default.

Impact on algorithms

Just like it was with C++20 iterator API, this one also requires algorithm authors to update implementations. Their requirements have to be updated to reflect usage of the new API. Changes to algorithms code are trivial, operator*() is still used for customizable logic like comparators but copy/move/swap/assign must be replaced with new functions. In the demo I implemented a couple of simple algorithms to test how the new design fits in and found no major problems.

// read projected value
auto v1 = std::invoke(proj, *it);   // before
auto v2 = *it;                      // after

// copy underlying value, now copies `iter_root_t` (`It` is the type of `it`)
iter_value_t<It> copy1 = *it;               // before
iter_root_t<It> copy2 = iter_copy_root(it); // after

// move underlying value, now moves `iter_root_t`
iter_value_t<It> moved1 = iter_move(it);     // before
iter_root_t<It> moved2 = iter_move_root(it); // after

// assign to iterator
*it = val;                  // before
iter_assign_from(it, val);  // after

Iterator-based versions of algorithms

For some reason, all range-based algorithms also have iterator-based counterparts, e.g., copy_if(Range r, Out o, Pred pred, Proj proj) and copy_if(I begin, S end, Out o, Pred pred, Proj proj). I don’t know why they are needed at all when a pair of iterators can be converted into a range using std::ranges::subrange, but they are here. The described projection/narrow_projection combine a projection and a range. To remove projections from iterator-based signatures we need something like projection_iterator. It should work just like projection_view::iterator, with the addition of comparison functions with its root iterator to support cases like std::ranges::sort(make_projection_iterator(std::begin(r), some_projection), std::end(r)). Alternatively, this issue can be ignored altogether: projections can be removed without introducing projection_iterator, which will force the use of std::ranges::subrange and a projection on the resulting range.

Reducing number of dereferences

While implementing algorithms, I found one interesting issue. Consider copy_if() algorithm. Here’s libstdc++ implementation:

void copy_if(auto first, auto last, auto result, auto pred, auto proj)
{
    for (; first != last; ++first)
    {
        if (std::invoke(pred, std::invoke(proj, *first)))   // #1
        {
            *result = *first;   // #2
            ++result;
        }
    }
}

The subtle issue here is that first is dereferenced twice per iteration, first, to call the predicate, second, to copy its value to the output result iterator. As I told you before, libstdc++ still uses old implementations for range-based algorithms and probably this version is OK for old-school iterators. But in a ranges world operator*() might do non-trivial things. For example, it might be a range which uses views::transform with int -> string transformation. range-v3 handles it better, whenever possible it stores and reuses the result of dereference:

void copy_if(auto first, auto last, auto result, auto pred, auto proj)
{
    for (; first != last; ++first)
    {
        auto&& x = *first;     // dereference is done only once now
        if (std::invoke(pred, std::invoke(proj, x)))
        {
            *result = (decltype(x) &&)x;    // analog of std::forward<...>(x)
            ++result;
        }
    }
}

With that in mind, I wrote initial implementation using the new API:

constexpr void copy_if(auto&& in, auto out, auto pred)
{
    auto first = std::ranges::begin(in);
    auto last = std::ranges::end(in);
    for(; first != last; ++first)
    {
        if(std::invoke(pred, *first))
        {
            iter_assign_from(out, iter_copy_root(first));
            ++out;
        }
    }
}

Explicit dereference is done only once here but recall that when iter_copy_root() is not customized by client, it falls back to operator*() so the above code transforms to:

if(std::invoke(pred, *first))       // first dereference
{
    iter_assign_from(out, *first);  // second dereference
    ++out;
}

Taking into account that now operator*() might contain projection, I want to avoid calling it whenever possible. Also, standard algorithms guarantee a specific number of projection calls, any approach which cannot fulfill them would be useless. The fixed version would be:

constexpr void copy_if(auto&& in, auto out, auto pred)
{
    auto first = std::ranges::begin(in);
    auto last = std::ranges::end(in);
    for(; first != last; ++first)
    {
        auto&& x = *first;
        if(std::invoke(pred, x))
        {
            if constexpr(has_adl_iter_copy_root<decltype(first)>)
            {
                // call customization point
                iter_assign_from(out, iter_copy_root(first));
            }
            else
            {
                // reuse `x`
                iter_assign_from(out, std::forward<decltype(x)>(x));
            }
            ++out;
        }
    }
}

Now when there’s no customized iter_copy_root(), the dereferenced value can safely be reused. Obviously, having such an if statement in all algorithms for each call of iter_copy_root, iter_move_root and iter_assign_from would be too verbose. To simplify it, I added a second version for each CPO with additional dereferenced parameter at the end. Now copy_if() is shorter and dereferences only once:

// second version of iter_copy_root()
template<typename From>
constexpr decltype(auto) operator()(
    From&& from, std::iter_reference_t<From>& dereferenced) const
{
    if constexpr(has_adl_iter_copy_root<From>)
    {
        return iter_copy_root(static_cast<From&&>(from));
    }
    else if constexpr(std::is_lvalue_reference_v<std::iter_reference_t<From>>)
    {
        return dereferenced;
    }
    else
    {
        return std::move(dereferenced);
    }
}

constexpr void copy_if(auto&& in, auto out, auto pred)
{
    auto first = std::ranges::begin(in);
    auto last = std::ranges::end(in);
    for(; first != last; ++first)
    {
        auto&& x = *first;
        if(std::invoke(pred, x))
        {
            stdf::iter_assign_from(out, stdf::iter_copy_root(first, x));
            ++out;
        }
    }
}

As a side-effect, it reduces the number of dereferences for all currently existing iterators because they don’t customize new CPOs. Usually, an optimizer is able to eliminate them but I like that now it’s guaranteed by design with or without optimizations. The same problem exists for iter_move() and iter_swap() because when not customized, they dereference. At the end, I added a second version for them too. That’s why in views::projection it’s important to enable custom iter_move_root(), iter_swap(), iter_assign_from() only if they are customized by the underlying iterator. Customizing them unconditionally prevents reuse of dereferenced value.

root() method

Sometimes we need to use the result of a generic algorithm with member function of a container. One such example is remove_if(). It returns a range of removed elements which are then erase()d using member function. The signature in std::vector is constexpr iterator erase(const_iterator first, const_iterator last);. The problem is that it takes std::vector::const_iterator and when we do:

auto pv = v | projection(&X::x);
auto removed = stdf::remove_if(pv, less_than{});

removed contains a range of projection_view::iterator so we need a way to get the underlying iterator from it. It’s possible to provide an implicit conversion for it but implicit conversions are always dangerous so for now I made it a normal member function. While the existing base() method of view iterators returns the last wrapped iterator, the new root() method returns the very first iterator in the projection chain. To make it generic, I added a stdf::root(it) free function which falls back to it.root() or just returns it. Now we can do:

auto pv = v | projection(&X::x);
auto removed = stdf::remove_if(pv, less_than{});
v.erase(stdf::root(removed.begin()), stdf::root(removed.end()));

Major flaw

Unfortunately, after this article was published I received some feedback and realized that this design cannot replace projections when the algorithm operates on an input range/iterator. The value represented by an input iterator is valid only until the iterator is incremented; all copies of the iterator may be invalidated afterward. This restriction allows only single-pass algorithms. Consider one of the simplest algorithms, max:

template<typename R, typename P = std::identity, typename C = std::less<>>
auto max(R&& r, C pred = {}, P proj = {})
{
    auto first = std::ranges::begin(r);
    auto last = std::ranges::end(r);

    std::ranges::range_value_t<R> result = *first;
    while(++first != last)
    {
        auto&& tmp = *first;
        if(std::invoke(pred, std::invoke(proj, result), std::invoke(proj, tmp))){
            result = (decltype(tmp) &&)tmp;
        }
    }

    return result;
}

Here, we need to store a copy of the current max element in the result variable. The projection proj is later applied to that copied element and that’s the problem. In the proposed design, I assumed that projected value is always accessed through iterator, not through the root value. Here’s the implementation using new design:

template<typename Rng, typename Cmp = std::less<>>
stdf::iter_root_t<std::ranges::iterator_t<Rng>> max(Rng&& rng, Cmp pred = {})
{
    auto first = std::ranges::begin(rng);
    auto last = std::ranges::end(rng);
    using iterator_t = std::ranges::iterator_t<Rng>;

    std::iter_value_t<iterator_t> maxValue = *first;
    stdf::iter_root_t<iterator_t> root = stdf::iter_copy_root(first);
    while(++first != last)
    {
        auto&& tmp = *first;
        if(std::invoke(pred, maxValue, tmp)){
            maxValue = (decltype(tmp) &&)tmp;
            root = stdf::iter_copy_root(first);
        }
    }
    return root;
}

As you can see, the only option is to copy both root and projected value which is not acceptable from the performance point of view. This problem exists only for input ranges and vanishes with forward ranges because for them it’s safe to copy the iterator and call its operator*() to get the projected value. However, there are still plenty of algorithms which require only input_range so the proposed design cannot be used in its current form. Any potential projection replacement should be able to retrieve projected value from the root value, not from the iterator.

Other use-cases

The introduced design significantly simplifies the creation of non-trivial iterators. Because each aspect is handled separately, there’s no need for tricky proxy reference objects. The common_reference requirements are still there, now for both value_type and iter_root_t, but it’s almost impossible to break them, so clients shouldn’t care or even know about their existence. For example, here’s how the infamous std::vector<bool>::iterator can be implemented:

class Iterator
{
public:
    bool operator*();   // no need for proxy reference type
    // swaps bits
    friend void iter_swap(const Iterator& lhs, const Iterator& rhs);
    // assigns bit from bool value
    friend void iter_assign_from(const Iterator& lhs, bool val);
};

Because there’s no sense in true copy/move of a single bit, iter_move(), iter_copy_root(), iter_move_root() work in terms of bool value returned by operator*(). But iter_swap() needs to actually swap bit values and iter_assign_from() should assign bool to a specific bit, thus, they are customized. It’s possible to achieve it with the existing design but it requires a custom proxy reference type and basic_common_reference specializations.

Another use-case might be various wrappers. Once I wanted to write a wrapper for rapidjson library. The main part was a wrapper class around rapidjson::Value and rapidjson::Document::AllocatorType which provided a more convenient interface similar to nlohmann/json. Writing the wrapper itself was easy but I failed at the point when I needed to provide a random-access iterator which returns my wrapper by-value. In C++17 it was impossible to achieve simply because operator*() returns value instead of true reference and such iterator couldn’t be a random-access one. In C++20 it should be possible, but again, requires a good understanding of proxy reference and common_reference requirements to implement it. With the proposed design it’s straightforward: root type is rapidjson::Value, value type is a wrapper:

class MyWrapper{
public:
    // interface methods...
private:
    rapidjson::Value* value;
    rapidjson::Document::AllocatorType* allocator;  // required for write ops
};

class MyIterator{
public:
    auto operator*(){
        return MyWrapper{*origIterator, allocator};
    }

    // hidden friend, so that the CPO can find it via ADL
    friend decltype(auto) iter_copy_root(const MyIterator& it){
        return *it.origIterator;
    }

    // other iterator methods...

private:
    rapidjson::Value::ValueIterator origIterator;
    rapidjson::Document::AllocatorType* allocator;
};

The role of std::views::transform

Someone might think that std::views::transform can be used to combine projection and a range but currently it’s mostly useless for that purpose. Its iter_move() operates on transformed value while iter_swap operates on the underlying non-transformed value so you can’t use it with any algorithm that might use them both (like sort(), see the issue 3520). The proposed fix is to remove customized iter_swap() so that the default version will operate on transformed value. With that fix, views::transform will become almost the same as narrow_projection (the only difference is that, for unknown reason, views::transform has customized iter_move() which behaves exactly like the default one). But should it be used instead of narrow_projection? It’s more like a naming question, I think that the name transform corresponds to a case when that’s the real purpose of the code, just like classic std::transform(). The name projection better fits cases when you don’t want to transform a range but only change its representation for an algorithm. Of course such a thing can still be called a transformation, it’s hard to get a clear answer here.

Demo

You can find the implementation of new CPO-s, projection, narrow_projection and a few test algorithms here. It’s just a single file which you can copy-paste to godbolt, currently it works only with GCC-11 because Clang hasn’t implemented ranges yet.

Wrap-up

The main benefit of the introduced design is its support of true abstraction behind the iterator. Projection is only one of its use cases, and I believe that it can fully replace and enhance them while at the same time simplifying the creation of new iterators. An important point is that it’s backward compatible: there’s no need to change existing iterators, only the algorithms. Let me know what you think about it. Do you like this design? Would you like to see it in the standard? Can it solve some of your problems or will it create new ones instead? Have I missed something else? Any meaningful feedback is welcome.

All C++20 core language features with examples

2021-04-02T11:12:00+00:00

Introduction

The story behind this article is very simple, I wanted to learn about new C++20 language features and to have a brief summary for all of them on a single page. So, I decided to read all proposals and create this “cheat sheet” that explains and demonstrates each feature. This is not a “best practices” kind of article, it serves only demonstrational purpose. Most examples were inspired or directly taken from corresponding proposals, all credit goes to their authors and to members of ISO C++ committee for their work. Enjoy!

Concepts
Modules
Coroutines
Three-way comparison
Lambda expressions
- Allow lambda-capture [=, this]
- Template parameter list for generic lambdas
- Lambdas in unevaluated contexts
- Default constructible and assignable stateless lambdas
- Pack expansion in lambda init-capture
Constant expressions
- Immediate functions(consteval)
- constexpr virtual function
- constexpr try-catch blocks
- constexpr dynamic_cast and polymorphic typeid
- Changing the active member of a union inside constexpr
- constexpr allocations
- Trivial default initialization in constexpr functions
- Unevaluated asm-declaration in constexpr functions
- std::is_constant_evaluated()
Aggregates
- Prohibit aggregates with user-declared constructors
- Class template argument deduction for aggregates
- Parenthesized initialization of aggregates
Non-type template parameters
- Class types in non-type template parameters
- Generalized non-type template parameters
Structured bindings
- Lambda capture and storage class specifiers for structured bindings
- Relaxing the structured bindings customization point finding rules
- Allow structured bindings to accessible members
Range-based for loop
- init-statements for range-based for loop
- Relaxing the range-based for loop customization point finding rules
Attributes
- [[likely]] and [[unlikely]]
- [[no_unique_address]]
- [[nodiscard]] with message
- [[nodiscard]] for constructors
Character encoding
- char8_t
- Stronger Unicode requirements
Sugar
- Designated initializers
- Default member initializers for bit-fields
- More optional typename
- Nested inline namespaces
- using enum
- Array size deduction in new-expressions
- Class template argument deduction for alias templates
constinit
Signed integers are two’s complement
__VA_OPT__ for variadic macros
Explicitly defaulted functions with different exception specifications
Destroying operator delete
Conditionally explicit constructors
Feature-test macros
Known-to-unknown bound array conversions
Implicit move for more local objects and rvalue references
Conversion from T* to bool is narrowing
Deprecate some uses of volatile
Deprecate comma operator in subscripts
Fixes
- Initializer list constructors in class template argument deduction
- const&-qualified pointers to members
- Simplifying implicit lambda capture
- const mismatch with defaulted copy constructor
- Access checking on specializations
- ADL and function templates that are not visible
- Specify when constexpr function definitions are needed for constant evaluation
- Implicit creation of objects for low-level object manipulation

Concepts

The basic idea behind concepts is to specify what’s needed from a template argument so the compiler can check it before instantiation. Before C++20 it was possible to use tricky enable_if constructions or just fail during template instantiation with cryptic error messages. With concepts, failure happens early and the error message, if any, is much cleaner, something like constraint X was not satisfied.

Requires expression

Let’s start with requires-expression. It’s an expression that contains actual requirements for template arguments, it evaluates to true if they are satisfied and false otherwise.

template<typename T> /*...*/
requires (T x) // optional set of fictional parameter(s)
{
    // simple requirement: expression must be valid
    x++;    // expression must be valid

    // type requirement: `typename T`, T type must be a valid type
    typename T::value_type;
    typename S<T>;

    // compound requirement: {expression}[noexcept][-> Concept];
    // {expression} -> Concept<A1, A2, ...> is equivalent to
    // requires Concept<decltype((expression)), A1, A2, ...>
    {*x};  // dereference must be valid
    {*x} noexcept;  // dereference must be noexcept
    // dereference must return T::value_type
    {*x} noexcept -> std::same_as<typename T::value_type>;

    // nested requirement: requires ConceptName<...>;
    requires Addable<T>; // constraint Addable<T> must be satisfied
};

Concept

A concept is simply a named set of such constraints or their logical combination. Both a concept and a requires-expression evaluate to a compile-time bool value and can be used as a normal value, for example in if constexpr.

template<typename T>
concept Addable = requires(T a, T b)
{
    a + b;
};

template<typename T>
concept Dividable = requires(T a, T b)
{
    a/b;
};

template<typename T>
concept DivAddable = Addable<T> && Dividable<T>;

template<typename T>
void f(T x)
{
    if constexpr(Addable<T>){ /*...*/ }
    else if constexpr(requires(T a, T b) { a + b; }){ /*...*/ }
}

Requires clause

To actually constrain something we need requires-clause. It may appear right after template<> block or as the last element of a function declaration, or even at both places at once, lambdas included:

template<typename T>
requires Addable<T>
auto f1(T a, T b) requires Subtractable<T>; // Addable<T> && Subtractable<T>

auto l = []<typename T> requires Addable<T>
    (T a, T b) requires Subtractable<T>{};

template<typename T>
requires Addable<T>
class C;

// infamous `requires requires`. First `requires` is requires-clause,
// second one is requires-expression. Useful if you don't want to introduce new
// concept.
template<typename T>
requires requires(T a, T b) {a + b;}
auto f4(T x);

A much cleaner way is to use the concept name instead of the class/typename keyword in the template parameter list:

template<Addable T>
void f();

Template template parameters can also be constrained. In this case argument must be less or equally constrained than parameter. Unconstrained template template parameters still can accept constrained templates as arguments:

template<typename T>
concept Integral = std::integral<T>;

template<typename T>
concept Integral4 = std::integral<T> && sizeof(T) == 4;

// requires-clause also works here
template<template<typename T1> requires Integral<T1> typename T>
void f2(){}

// f() and f2() forms are equal
template<template<Integral T1> typename T>
void f(){
    f2<T>();
}

// unconstrained template template parameter can accept constrained arguments
template<template<typename T1> typename T>
void f3(){}

template<typename T>
struct S1{};

template<Integral T>
struct S2{};

template<Integral4 T>
struct S3{};

void test(){
    f<S1>();    // OK
    f<S2>();    // OK
    // error, S3 is constrained by Integral4 which is more constrained than
    // f()'s Integral
    f<S3>();

    // all are OK
    f3<S1>();
    f3<S2>();
    f3<S3>();
}

Functions with unsatisfied constraints become “invisible”:

template<typename T>
struct X{
    void f() requires std::integral<T>
    {}
};

void f(){
    X<double> x;
    x.f();  // error
    auto pf = &X<double>::f;    // error
}

Constrained `auto`

auto parameters are now allowed for normal functions to make them generic, just like generic lambdas. Concepts can be used to constrain placeholder types (auto/decltype(auto)) in various contexts. For parameter packs, MyConcept... Ts requires MyConcept to be true for each element of the pack, not for the whole pack at once, e.g. requires<T1> && requires<T2> && ... && requires<TLast>.

template
concept is_sortable = true;

auto l = [](auto x){};
void f1(auto x){}               // unconstrained template
void f2(is_sortable auto x){}   // constrained template

template<is_sortable auto NonTypeParameter, is_sortable TypeParameter>
is_sortable auto f3(is_sortable auto x, auto y)
{
    // notice that nothing is allowed between constraint name and `auto`
    is_sortable auto z = 0;
    return 0;
}

template<is_sortable auto... NonTypePack, is_sortable... TypePack>
void f4(TypePack... args){}

int f();

// takes two parameters
template<typename T1, typename T2>
concept C = true;
// binds second parameter
C<double> auto v = f(); // means C<int, double>

struct X{
    operator is_sortable auto() {
        return 0;
    }
};

auto f5() -> is_sortable decltype(auto){
    f4<1,2,3>(1,2,3);
    return new is_sortable auto(1);
}

Partial ordering by constraints

This section was inspired by the article Ordering by constraints by Andrzej Krzemieński. Check it out for a more thorough explanation.

Aside from specifying requirements for a single declaration, constraints can be used to select the best alternative for a normal function, a function template or a class template. To do so, constraints have a notion of partial ordering, that is, one constraint can be at least as constrained or more constrained than another, or they can be unordered(unrelated). The compiler decomposes(the Standard uses the term normalization but for me decomposition sounds better) a constraint into a conjunction/disjunction of atomic constraints. Intuitively, C1 && C2 is more constrained than C1, C1 is more constrained than C1 || C2 and any constraint is more constrained than the unconstrained declaration. When more than one candidate with satisfied constraints is present, the most constrained one is chosen. If the constraints are unordered, the usage is ambiguous.

template<typename T>
concept integral_or_floating = std::integral<T> || std::floating_point<T>;

template<typename T>
concept integral_and_char = std::integral<T> && std::same_as<T, char>;

void f(std::integral auto){}        // #1
void f(integral_or_floating auto){} // #2
void f(std::same_as<char> auto){}   // #3

// calls #1 because std::integral is more constrained
// than integral_or_floating(#2)
f(int{});
// calls #2 because it's the only one whose constraint is satisfied
f(double{});
// error, #1, #2 and #3's constraints are satisfied but unordered
// because std::same_as<char> appears only in #3
f(char{});

void f(integral_and_char auto){}    // #4

// calls #4 because integral_and_char is more
// constrained than std::same_as<char>(#3) and std::integral(#1)
f(char{});

It’s important to understand how the compiler decomposes constraints and when it can see that they have a common atomic constraint and deduce the order between them. During decomposition, a concept name is replaced with its definition but a requires-expression is not further decomposed. Two atomic constraints are identical only if they are represented by the same expression at the same location. For example, concept C = C1 && C2 is decomposed to a conjunction of C1 and C2 but concept C = requires{...} becomes concept C = Expression-Location-Pair and its body is not further decomposed. If two concepts have common or even the same requirements in their requires-expressions, they will always be unordered because either their requires-expressions are not equal or they are equal but at different source locations. The same happens with duplicated usage of naked type traits: they always represent different atomic constraints because of different locations and thus cannot be used for ordering.

template<typename T>
requires std::is_integral_v<T>  // uses type traits instead of concepts
void f1(T){}  // #1

template<typename T>
requires std::is_integral_v<T> || std::is_floating_point_v<T>
void f1(T){}  // #2

// error, #1 and #2 have common `std::is_integral_v` expression
// but at different locations(line 2 vs. line 6), thus, #1 and #2 constraints
// are unordered and the call is ambiguous
f1(int{});

template<typename T>
concept C1 = requires{      // requires-expression is not decomposed
    requires std::integral<T>;
};

template<typename T>
concept C2 = requires{      // requires-expression is not decomposed
    requires (std::integral<T> || std::floating_point<T>);
};

void f2(C1 auto){}  // #3
void f2(C2 auto){}  // #4

// error, since requires-expressions are not decomposed, #3 and #4 have
// completely unrelated and hence unordered constraints and the call is
// ambiguous
f2(int{});

Conditionally trivial special member functions

For wrapper types like std::optional or std::variant it’s useful to propagate triviality from the types they wrap. For example, std::optional<int> should be trivial but std::optional<std::string> shouldn’t. In C++17 this can be achieved only with pretty cumbersome machinery. Concepts provide a natural solution: we can create multiple versions of the same special member function with different constraints, and the compiler will choose the best one and ignore the others. In this particular case, we need a trivial set of functions when the wrapped type is trivial and a non-trivial set of functions when it’s not. For this to work, some updates have been made to the definition of a trivial type. In C++17, a trivially copyable class is required to have all of its copy and move operations either deleted or trivial. To take concepts into account, the notion of an eligible special member function was introduced. It is a function that’s not deleted, whose constraints(if any) are satisfied and for which no other special member function of the same kind, with the same first parameter type(if any), is more constrained. Simply put, it’s the function(s) with the most constrained satisfied constraints(if any). All existing destructors(yes, now you can have more than one) are now called prospective destructors. Only one “active” destructor is allowed, it’s selected using normal overload resolution.
A trivially copyable class is now a class that has a trivial non-deleted destructor, at least one eligible copy/move operation and whose all such eligible operations are trivial. A trivial class is a trivially copyable class that has one or more eligible default constructors, all of which are trivial.
Here’s the skeleton of this technique:

template<typename T>
class optional{
public:
    optional() = default;

    // trivial copy-constructor
    optional(const optional&) = default;

    // non-trivial copy-constructor
    optional(const optional& rhs)
        requires(!std::is_trivially_copy_constructible_v<T>){
        // ...
    }

    // trivial destructor
    ~optional() = default;

    // non-trivial destructor
    ~optional() requires(!std::is_trivial_v<T>){
        // ...
    }
    // ...
private:
    T value;
};

static_assert(std::is_trivial_v<optional<int>>);
static_assert(!std::is_trivial_v<optional<std::string>>);

Modules

Modules are a new way to organize C++ code into logical components. Historically, C++ used the C model, which is based on the preprocessor and repetitive textual inclusion. It has a lot of problems such as macros leaking in and out of headers, inclusion-order-dependent headers, repetitive compilation of the same code, cyclic dependencies, poor encapsulation of implementation details and so on. Modules are meant to solve them, but not so fast: we won’t be able to use their full power until compilers and build tools, such as CMake, support them too. A full description of Modules is well beyond the scope of this article, I will only show the basic ideas and use cases. For more details you can read a series of articles by vector-of-bool or just google for other blog posts or talks.

The main idea behind modules is to restrict what’s accessible(exported) when a module is used(imported) by its clients. This allows true hiding of implementation details.

// module.cpp
// dots in module name are for readability purpose, they have no special meaning
export module my.tool;  // module declaration

export void f(){}       // export f()
void g(){}              // but not g()

// client.cpp
import my.tool;

f();    // OK
g();    // error, not exported

Modules are macro-unfriendly: you can’t pass manually #defined macros to a module(the compiler’s built-in and command-line macros are still visible) and you can import macros from a module only in one special case. Modules can’t have cyclic dependencies. A module is a self-contained entity, so the compiler can precompile each module exactly once, which greatly improves overall compilation time. Import order doesn’t matter for modules.

Module units

A module unit can be either an interface or an implementation unit. Only interface units can contribute to the module’s interface, that’s why they have export in their declaration. A module can be a single file or scattered across partitions. Each partition is named in the form module_name:partition_name. Partitions are importable only within the same module, and a client can import only the module as a whole. This provides much better encapsulation than header files.

// tool.cpp
export module tool; // primary module interface unit
export import :helpers; // re-export(see below) helpers partition

export void f();
export void g();

// tool.internals.cpp
module tool:internals;  // implementation partition
void utility();

// tool.impl.cpp
module tool;    // implementation unit, implicitly imports primary module unit
import :internals;

void utility(){}

void f(){
    utility();
}

// tool.impl2.cpp
module tool;    // another implementation unit
void g(){}

// tool.helpers.cpp
export module tool:helpers; // module interface partition
import :internals;

export void h(){
    utility();
}

// client.cpp
import tool;

f();
g();
h();

Note that partitions are imported without specifying the module name. This prohibits importing other modules’ partitions. Multiple implementation units(module tool;) are allowed, all other units and partitions of any kind must be unique. All interface partitions must be re-exported by the module via export import.

Export

Here are various forms of export, the general rule is that you can’t export names with internal linkage:

// tool.cpp
export module tool; // only an interface unit can export declarations
export import :helpers; // import and re-export helpers interface partition

export int x{}; // export single declaration

export{         // export multiple declarations
    int y{};
    void f(){};
}

export namespace A{ // export the whole namespace
    void f();
    void g();
}

namespace B{
    export void f();// export a single declaration within a namespace
    void g();
}

namespace{
    export int x;   // error, x has internal linkage
    export void f();// error, f() has internal linkage
}

export class C; // export as incomplete type
class C{};
export C get_c();

// client.cpp
import tool;

C c1;    // error, C is incomplete
auto c2 = get_c();  // OK

Import

Import declarations should precede any other “non-module” declarations, it allows quick dependency analysis. Otherwise, it’s pretty intuitive:

// tool.cpp
export module tool;
export import :helpers;  // import and re-export helpers interface partition

export void f(){}

// tool.helpers.cpp
export module tool:helpers;

export void g(){}

// client.cpp
import tool;

f();
g();

Header units

There’s one special import form that allows importing importable headers: import <header.h> or import "header.h". The compiler creates a synthesized header unit and makes all declarations implicitly exported. Which headers are actually importable is implementation-defined, but all C++ library headers are. Perhaps there will be a way to tell the compiler which user-provided headers are importable; such headers should not contain non-inline function definitions or variables with external linkage. It’s the only import form that allows importing macros from headers(but you still can’t re-export them via export import "header.h"). Don’t use it to import a random legacy header if you’re not sure about its content.

Global module fragment

If you need to use old-school headers within a module, there’s a special place to put #includes safely: global module fragment:

// header.h
#pragma once
class A{};
void g(){}

// tool.cpp
module;             // global module fragment
#include "header.h"
export module tool; // ends here

export void f(){    // uses declarations from header.h
    g();
    A a;
}

It must appear before the named module declaration and it can contain only preprocessor directives. All declarations from all global module fragments and non-modular translation units are attached to a single global module. Thus, all rules for normal headers apply here.

Private module fragment

The final strange beast is a private module fragment. Its intent is to hide implementation details in a single-file module(it’s not allowed elsewhere). In theory, clients might not need to recompile when things in a private module fragment change:

export module tool; // interface

export void f();    // declared here

module :private;    // implementation details

void f(){}          // defined here

No more implicit `inline`

There’s also an interesting change regarding inline. Member functions defined within the class definition are not implicitly inline if that class is attached to a named module. inline functions in a named module can use only names that are visible to a client.

// header.h
struct C{
    void f(){}  // still inline because attached to a global module
};

// tool.cpp
module;
#include "header.h"

export module tool;

class A{};  // not exported

export struct B{// B is attached to module "tool"
    void f(){   // not implicitly inline anymore
        A a;    // can safely use non-exported name
    }

    inline void g(){
        A a;    // oops, uses non-exported name
    }

    inline void h(){
        f();    // fine, f() is not inline
    }
};

// client.cpp
import tool;

B b;
b.f();  // OK
b.g();  // error, A is undefined
b.h();  // OK

Coroutines

Finally, we have stackless(their state is stored on the heap, not on the stack) coroutines in C++. C++20 provides nearly the lowest-level API possible and leaves the rest up to the user. We’ve got the co_await, co_yield and co_return keywords and rules for interaction between the caller and the callee. Those rules are so low-level that I see no point in explaining them here. You can find more details on Lewis Baker’s blog. Hopefully, C++23 will fill this gap with some library utilities. Until then, we can use third-party libraries, here’s an example that uses cppcoro:

cppcoro::task<int> someAsyncTask()
{
    int result;
    // get the result somehow
    co_return result;
}

// task<> is analog of void for normal function
cppcoro::task<> usageExample()
{
    // creates a new task but doesn't start executing the coroutine yet
    cppcoro::task<int> myTask = someAsyncTask();
    // ...
    // Coroutine is only started when we later co_await the task.
    auto result = co_await myTask;
}

// will lazily generate numbers from 0 to 9
cppcoro::generator<std::size_t> getTenNumbers()
{
    std::size_t n{0};
    while (n != 10)
    {
        co_yield n++;
    }
}

void printNumbers()
{
    for(const auto n : getTenNumbers())
    {
        std::cout << n;    
    }
}

Three-way comparison

Before C++20, to provide comparison operations for a class, implementations of 6 operators are needed: ==, !=, <, <=, >, >=. Usually, four of them contain boiler-plate code that works in terms of == and < which contain the real comparison logic. Common practice is to implement them as free functions taking const T& to allow comparison of convertible types. If you want to support non-convertible types, you need to add two sets of 6 functions, op(const T1&, const T2&) and op(const T2&, const T1&) and now you have 18 comparison operators(check out std::optional). C++20 gives us a better way to handle and think about comparisons. Now you need to focus on operator<=>() and sometimes on operator==(). New operator<=>(spaceship operator) implements three-way comparison, it tells whether a is less, equal or greater than b in a single call, just like strcmp(). It returns a comparison category(see below) that could be compared to zero. Having this, compiler can replace calls to <, <=, >, >= with call to operator<=>() and check its result(a < b becomes a <=> b < 0), and calls to ==, != to operator==()(a != b becomes !(a == b)). Due to new lookup rules they can handle asymmetric comparisons, e.g. when you provide a single T1::operator==(const T2&), you get both T1 == T2 and T2 == T1, the same applies to operator<=>(). Now you need to write at most 2 functions to get all 6 comparisons between convertible types, and 2 functions to get all 12 comparisons between non-convertible types.

Comparison categories

The Standard provides three comparison categories(which doesn’t prevent you from having your own one). strong_ordering implies that exactly one of a < b, a > b, a == b must be true and if a == b then f(a) == f(b). weak_ordering implies that exactly one of a < b, a > b, a == b must be true and if a == b then f(a) can be not equal to f(b). Such elements are equivalent but not equal. partial_ordering means that it’s possible that none of a < b, a > b, a == b is true and if a == b then f(a) can be not equal to f(b). That is, some elements may be incomparable. An important note here is that f() denotes a function that accesses only salient attributes. For example, std::vector is strongly ordered despite the fact that two vectors with the same values can have different capacities. Here, capacity is not a salient attribute. An example of a weakly ordered type is CaseInsensitiveString, it can store the original string as-is but compare in a case-insensitive way. An example of a partially ordered type is float/double because NaN is not comparable to any other value. These categories form a hierarchy, i.e., strong_ordering can be converted to weak_ordering and partial_ordering, and weak_ordering can be converted to partial_ordering.

Defaulted comparisons

Comparisons could be defaulted just like special member functions. In such case they operate in a member-wise fashion by comparing all underlying non-static data members with their corresponding operators. Defaulted operator<=>() also declares defaulted operator==()(if there was none), so you can write auto operator<=>(const T&) const = default; and get all six comparison operations with member-wise semantics.

template<typename T1, typename T2>
void TestComparisons(T1 a, T2 b)
{
    (a < b), (a <= b), (a > b), (a >= b), (a == b), (a != b);
}

struct S2
{
    int a;
    int b;
};

struct S1
{
    int x;
    int y;
    // support homogeneous comparisons
    auto operator<=>(const S1&) const = default;
    // this is required because there's operator==(const S2&) which prevents
    // implicit declaration of defaulted operator==()
    bool operator==(const S1&) const = default;

    // support heterogeneous comparisons
    std::strong_ordering operator<=>(const S2& other) const
    {
        if (auto cmp = x <=> other.a; cmp != 0)
            return cmp;
        return y <=> other.b;
    }

    bool operator==(const S2& other) const
    {
        return (*this <=> other) == 0;
    }
};

TestComparisons(S1{}, S1{});
TestComparisons(S1{}, S2{});
TestComparisons(S2{}, S1{});

Implicitly declared operator==() has the same signature as operator<=>() except that return type is bool.

template<typename T>
struct X
{
    friend constexpr std::partial_ordering operator<=>(X, X) requires(sizeof(T) != 1) = default;
    // implicitly declares:
    // friend constexpr bool operator==(X, X) requires(sizeof(T) != 1) = default;

    [[nodiscard]] virtual std::strong_ordering operator<=>(const X&) const = default;
    // implicitly declares:
    //[[nodiscard]] virtual bool operator==(const X&) const = default; 
};

Deduced comparison category is the weakest one of type’s members.

struct S3{
    int x;      // int-s are strongly ordered
    double d;   // but double-s are partially ordered
    // thus, the resulting category is std::partial_ordering
    auto operator<=>(const S3&) const = default;
};
static_assert(std::is_same_v<decltype(S3{} <=> S3{}), std::partial_ordering>);

Defaulted comparison operators must be members or friends, and only friends can take their arguments by value.

struct S4
{
    int x;
    int y;
    // member version must have op(const T&) const; form
    auto operator<=>(const S4&) const = default;

    // friend version can take arguments by const-reference or by-value
    // friend auto operator<=>(const S4&, const S4&) = default;
    // friend auto operator<=>(S4, S4) = default;
};

Comparisons can be defaulted out of class, just like special member functions.

struct S5
{
    int x;
    std::strong_ordering operator<=>(const S5&) const;
    bool operator==(const S5&) const;
};

std::strong_ordering S5::operator<=>(const S5&) const = default;
bool S5::operator==(const S5&) const = default;

Defaulted operator<=>() uses operator<=>() of class members or their ordering can be synthesized using existing Member::operator==() and Member::operator<(). Note that it works only for members and not for the class itself, existing T::operator<() is never used in defaulted T::operator<=>().

// not in our immediate control
struct Legacy
{
    bool operator==(Legacy const&) const;
    bool operator<(Legacy const&) const;
};

struct S6
{
    int x;
    Legacy l;
    // deleted because Legacy doesn't have operator<=>(), comparison category
    // can't be deduced
    auto operator<=>(const S6&) const = default;
};

struct S7
{
    int x;
    Legacy l;

    std::strong_ordering operator<=>(const S7& rhs) const = default;
    /*
    Since comparison category is provided explicitly, ordering can be
    synthesized using operator<() and operator==(). They must return exactly
    `bool` for this to work. It will work for weak and partial ordering as well.
    
    Here's an example of synthesized operator<=>():
    std::strong_ordering operator<=>(const S7& rhs) const
    {
        // use operator<=>() for int
        if(auto cmp = x <=> rhs.x; cmp != 0) return cmp;

        // synthesize ordering for Legacy using operator<() and operator==()
        if(l == rhs.l) return std::strong_ordering::equal;
        if(l < rhs.l) return std::strong_ordering::less;
        return std::strong_ordering::greater;
    }
    */
};

struct NoEqual
{
    bool operator<(const NoEqual&) const = default;
};

struct S8
{
    NoEqual n;
    // deleted, NoEqual doesn't have operator<=>()
    // auto operator<=>(const S8&) const = default;

    // deleted as well because NoEqual doesn't have operator==()
    std::strong_ordering operator<=>(const S8&) const = default;
};

struct W
{
    std::weak_ordering operator<=>(const W&) const = default;
};

struct S9
{
    W w;
    // ask for strong_ordering but W can provide only weak_ordering, this will
    // yield an error during instantiation
    std::strong_ordering operator<=>(const S9&) const = default;
    void f()
    {
        (S9{} <=> S9{});    // error
    }
};

union and reference members are not supported.

struct S4
{
    int& r;
    // deleted because of reference member
    auto operator<=>(const S4&) const = default;
};

Lambda expressions

Allow lambda-capture `[=, this]`

When captured implicitly, this is always captured by-reference, even with [=]. To remove this confusion, C++20 deprecates such behavior and allows more explicit [=, this]:

struct S{
    void f(){
        [=]{};          // captures this by reference, deprecated since C++20
        [=, *this]{};   // OK since C++17, captures this by value
        [=, this]{};    // OK since C++20, captures this by reference
    }
};

Template parameter list for generic lambdas

Sometimes generic lambdas are too generic. C++20 allows using the familiar function template syntax to introduce type names directly.

// lambda that expects std::vector<T>
// until C++20:
[](auto vector){
    using T = typename decltype(vector)::value_type;
    // use T
};
// since C++20:
[]<typename T>(std::vector<T> vector){
    // use T
};

// access argument type
// until C++20
[](const auto& x){
    using T = std::decay_t<decltype(x)>;
    // using T = decltype(x); // without decay_t<> it would be const T&, so
    T copy = x;               // copy would be a reference type
    T::static_function();     // and these wouldn't work at all
    using Iterator = typename T::iterator;
};
// since C++20
[]<typename T>(const T& x){
    T copy = x;
    T::static_function();
    using Iterator = typename T::iterator;
};

// perfect forwarding
// until C++20:
[](auto&&... args){
    return f(std::forward<decltype(args)>(args)...);
};
// since C++20:
[]<typename... Ts>(Ts&&... args){
    return f(std::forward<Ts>(args)...);
};

// and of course you can mix them with auto-parameters
[]<typename T>(const T& a, auto b){};

Lambdas in unevaluated contexts

Lambda expressions can be used in unevaluated contexts, such as sizeof(), typeid(), decltype(), etc. Here are some key points for this feature, for a more real-world example see Default constructible and assignable stateless lambdas.

The main principle is that lambdas have a unique unknown type, two lambdas and their types are never equal.

using L = decltype([]{});   // lambdas have no linkage
L PublicApi();              // L can't be used for external linkage

// in templates, these are two different declarations
template <class T> void f(decltype([]{}) (*s)[sizeof(T)]);
template <class T> void f(decltype([]{}) (*s)[sizeof(T)]);

// again, lambda types are never equivalent
static decltype([]{}) f();
static decltype([]{}) f(); // error, return type mismatch

static decltype([]{}) g();
static decltype(g()) g(); // okay, redeclaration

// each specialization has its own lambda with unique type
template <typename T>
using R = decltype([]{});

static_assert(!std::is_same_v<R<int>, R<char>>);

// Lambda-based SFINAE and constraints are not supported, it just fails
template <class T>
auto f(T) -> decltype([]() { T::invalid; } ());
void f(...);

template <class T>
void g(T) requires requires{
    [](){typename T::invalid x;}; }
{}
void g(...){}

f(0);  // error
g(0);  // error

In the following example, f() increments the same counter in both translation units because inline function behaves as if there’s only one definition of it. However, g_s violates ODR because despite that there’s only one definition of it, there are still multiple declarations which are different because there are two different lambdas in a.cpp and b.cpp, thus, S has different non-type template argument:

// a.h
template <typename T>
int counter(){
    static int value{};
    return value++;
}

inline int f(){
    return counter<decltype([]{})>();
}

template<auto> struct S{ void call(){} };
// cast lambda to pointer
inline S<+[]{}> g_s;

// a.cpp
#include "a.h"
auto v = f();
g_s.call();

// b.cpp
#include "a.h"
auto v = f();
g_s.call();

Default constructible and assignable stateless lambdas

In C++20, stateless lambdas are default constructible and assignable, which allows using the type of a lambda to construct/assign it later. With Lambdas in unevaluated contexts we can get the type of a lambda with decltype() and create a variable of that type later:

auto greater = [](auto x,auto y)
{
    return x > y;
};
// requires default constructible type
std::map<std::string, int, decltype(greater)> map;
auto map2 = map;    // requires default assignable type

Here, std::map takes a comparator type to instantiate it later. While we could get a lambda type in C++17, it was not possible to instantiate it because lambdas were not default constructible.

Pack expansion in lambda init-capture

C++20 simplifies capturing parameter packs in lambdas. Until C++20, a pack could be captured by value or by reference, or we had to do some tricks with std::tuple if we wanted to move it. Now it’s much easier: we can create an init-capture pack and initialize it with the pack we want to capture. It’s not limited to std::move or std::forward, any function can be applied to pack elements.

void g(int, int){}

// C++17
template <class F, class... Args>
auto delay_apply(F&& f, Args&&... args) {
    return [f=std::forward<F>(f), tup=std::make_tuple(std::forward<Args>(args)...)]()
            -> decltype(auto) {
        return std::apply(f, tup);
    };
}

// C++20
template <typename F, typename... Args>
auto delay_call(F&& f, Args&&... args) {
    return [f = std::forward<F>(f), ...f_args=std::forward<Args>(args)]()
            -> decltype(auto) {
        return f(f_args...);
    };
}

void f(){
    delay_call(g, 1, 2)();
}
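
The initializer of an init-capture pack is an arbitrary expression, not just a forwarding call. Here’s a small sketch where each element is transformed before it’s stored; make_doubled_sum is a hypothetical name for this example.

```cpp
// Hypothetical helper: captures every argument multiplied by two
template<typename... Args>
constexpr auto make_doubled_sum(Args... args){
    // `doubled` is a pack of captures, one per element of `args`
    return [...doubled = args * 2]{
        return (doubled + ... + 0);
    };
}

// (1 * 2) + (2 * 2) + (3 * 2)
static_assert(make_doubled_sum(1, 2, 3)() == 12);
```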

Constant expressions

Immediate functions(`consteval`)

While constexpr implies that a function can be evaluated at compile-time, consteval specifies that a function must be evaluated at compile-time(only). virtual functions are allowed to be consteval but they can override and be overridden only by another consteval function, i.e., a mix of consteval and non-consteval is not allowed. Destructors and allocation/deallocation functions can’t be consteval.

consteval int GetInt(int x){
    return x;
}

constexpr void f(){
    auto x1 = GetInt(1);
    constexpr auto x2 = GetInt(x1); // error x1 is not a constant-expression
}

`constexpr` virtual function

Virtual functions can now be constexpr. A constexpr function can override a non-constexpr one and vice versa.

struct Base{
    constexpr virtual ~Base() = default;
    virtual int Get() const = 0;    // non-constexpr
};

struct Derived1 : Base{
    constexpr int Get() const override {
        return 1;
    }
};

struct Derived2 : Base{
    constexpr int Get() const override {
        return 2;
    }
};

constexpr auto GetSum(){
    const Derived1 d1;
    const Derived2 d2;
    const Base* pb1 = &d1;
    const Base* pb2 = &d2;

    return pb1->Get() + pb2->Get();
}

static_assert(GetSum() == 1 + 2);   // evaluated at compile-time

`constexpr` try-catch blocks

try-catch blocks are now allowed inside constexpr functions but throw is not, so, the catch block is simply ignored. This can be useful, for example, in combination with constexpr new, we can have single function that works at run/compile time:

constexpr void f(){
    try{
        auto p = new int;
        // ...
        delete p;
    }
    catch(...){     // ignored at compile-time
        // ...
    }
}

`constexpr` `dynamic_cast` and polymorphic `typeid`

Since virtual functions can now be constexpr, there’s no reason not to allow dynamic_cast and polymorphic typeid in constexpr. Unfortunately, std::type_info has no constexpr members yet so there’s little use for it now(thanks to Peter Dimov for clarifying this for me).

struct Base1{
    virtual ~Base1() = default;
    constexpr virtual int get() const = 0;
};

struct Derived1 : Base1{
    constexpr int get() const override {
        return 1;
    }
};

struct Base2{
    virtual ~Base2() = default;
    constexpr virtual int get() const = 0;
};

struct Derived2 : Base2{
    constexpr int get() const override {
        return 2;
    }
};

template<typename Base, typename Derived>
constexpr auto downcasted_get(){
    const Derived d;
    const Base& upcasted = d;
    const auto& downcasted = dynamic_cast<const Derived&>(upcasted);

    return downcasted.get();
}

static_assert(downcasted_get<Base1, Derived1>() == 1);
static_assert(downcasted_get<Base2, Derived2>() == 2);

// compile-time error, cannot cast Derived1 to Base2
static_assert(downcasted_get<Base2, Derived1>() == 1);

Changing the active member of a `union` inside `constexpr`

Another relaxation for constant expressions. One can change an active member of a union but can’t read an inactive member since it’s UB and UB is not allowed in constexpr context.

union Foo {
  int i;
  float f;
};

constexpr int f() {
  Foo foo{};
  foo.i = 3;    // i is an active member
  foo.f = 1.2f; // valid since C++20, f becomes an active member

//   return foo.i;  // error, reading inactive union member
  return foo.f;
}

`constexpr` allocations

C++20 lays foundation for constexpr containers. First, it allows constexpr and even virtual constexpr destructors for literal types(types that can be used as a constexpr variable). Second, it allows calls to std::allocator::allocate() and new-expression which results in a call to one of the global operator new if allocated storage is deallocated at compile time. That is, memory can be allocated at compile-time but it must be freed at compile-time also. This creates a bit of friction if final data has to be used at run-time. There’s no choice but to store it in some non-allocating container like std::array and get compile-time value twice: first, to get its size, and second, to actually copy it(thanks to arthur-odwyer, beached and luke from cpplang slack for explaining this to me):

constexpr auto get_str()
{
    std::string s1{"hello "};
    std::string s2{"world"};
    std::string s3 = s1 + s2;
    return s3;
}

constexpr auto get_array()
{
    constexpr auto N = get_str().size();
    std::array<char, N> arr{};
    std::copy_n(get_str().data(), N, std::begin(arr));
    return arr;
}

static_assert(!get_str().empty());

// error because it holds data allocated at compile-time
constexpr auto str = get_str();

// OK, string is stored in std::array
constexpr auto result = get_array();

Trivial default initialization in `constexpr` functions

In C++17 constexpr constructor, among other requirements, must initialize all non-static data members. This rule has been removed in C++20. But, because UB is not allowed in constexpr context, you can’t read from such uninitialized members, only write to them:

struct NonTrivial{
    bool b = false;
};

struct Trivial{
    bool b;
};

template <typename T>
constexpr T f1(const T& other) {
    T t;        // default initialization
    t = other;
    return t;
}

template <typename T>
constexpr auto f2(const T& other) {
    T t;
    return t.b;
}

void test(){
    constexpr auto a = f1(Trivial{});   // error in C++17, OK in C++20
    constexpr auto b = f1(NonTrivial{});// OK

    constexpr auto c = f2(Trivial{}); // error, uninitialized Trivial::b is used
    constexpr auto d = f2(NonTrivial{}); // OK
}

Unevaluated `asm`-declaration in `constexpr` functions

An asm-declaration can now appear inside a constexpr function as long as it’s not evaluated at compile time. This allows a single function to contain both compile-time code and run-time code (now including asm):

constexpr int add(int a, int b){
    if (std::is_constant_evaluated()){
        return a + b;
    }
    else{
        asm("asm magic here");
        //...
    }
}

std::is_constant_evaluated()

With std::is_constant_evaluated() you can check whether current invocation occurs within a constant-evaluated context. I would like to say “during compile-time” but, as the authors said, “C++ doesn’t make a clear distinction between compile-time and run-time”. Instead, C++20 declares a list of expressions that are manifestly constant-evaluated and this function returns true during their evaluation and false otherwise.
Be careful not to use this function directly in such manifestly constant-evaluated expressions(e.g. if constexpr, array size, template arguments, etc.). By definition, in such cases std::is_constant_evaluated() returns true even if the enclosing function is not constant evaluated. Thanks to user destroyerrocket from /r/cpp for bringing up this issue.

constexpr int GetNumber(){
    if(std::is_constant_evaluated()){   // should not be `if constexpr`
        return 1;
    }
    return 2;
}

constexpr int GetNumber(int x){
    if(std::is_constant_evaluated()){   // should not be `if constexpr`
        return x;
    }
    return x+1;
}

void f(){
    constexpr auto v1 = GetNumber();
    const auto v2 = GetNumber();

    // initialization of a non-const variable, not constant-evaluated
    auto v3 = GetNumber();

    assert(v1 == 1);
    assert(v2 == 1);
    assert(v3 == 2);

    constexpr auto v4 = GetNumber(1);
    int x = 1;

    // x is not a constant-expression, not constant-evaluated
    const auto v5 = GetNumber(x);

    assert(v4 == 1);
    assert(v5 == 2);    
}

// pathological examples
// always returns `true`
constexpr bool IsInConstexpr(int){
    if constexpr(std::is_constant_evaluated()){ // always `true`
        return true;
    }
    return false;
}

// always returns `sizeof(int)`
constexpr std::size_t GetArraySize(int){
    int arr[std::is_constant_evaluated()];  // always int arr[1];
    return sizeof(arr);
}

// always returns `1`
constexpr std::size_t GetStdArraySize(int){
    std::array<int, std::is_constant_evaluated()> arr;  // std::array<int, 1>
    return arr.size();
}

Aggregates

Prohibit aggregates with user-declared constructors

Now aggregate types can’t have user-declared constructors. Previously, aggregates were allowed to have only deleted or defaulted constructors. That resulted in a weird behavior for aggregates with defaulted/deleted constructors (they’re user-declared but not user-provided).

// none of the types below are an aggregate in C++20
struct S{
    int x{2};
    S(int) = delete; // user-declared ctor
};

struct X{
    int x;
    X() = default;  // user-declared ctor
};

struct Y{
    int x;
    Y();            // user-provided ctor
};

Y::Y() = default;

void f(){
    S s(1);     // always an error
    S s2{1};    // OK in C++17, error in C++20, S is not an aggregate now
    X x{1};     // OK in C++17, error in C++20
    Y y{2};     // always an error
}

Class template argument deduction for aggregates

In C++17 to use aggregates with CTAD we need explicit deduction guides, that’s unnecessary now:

template<typename T, typename U>
struct S{
    T t;
    U u;
};
// deduction guide was needed in C++17
// template<typename T, typename U>
// S(T, U) -> S<T,U>;

S s{1, 2.0};    // S<int, double>

CTAD isn’t involved when there are user-provided deduction guides:

template<typename T>
struct MyData{
    T data;
};
MyData(const char*) -> MyData<std::string>;

MyData s1{"abc"};   // OK, MyData<std::string> using deduction guide
MyData<int> s2{1};  // OK, explicit template argument
MyData s3{1};       // Error, CTAD isn't involved

Can deduce array types:

template<typename T, std::size_t N>
struct Array{
    T data[N];
};

Array a{{1, 2, 3}}; // Array<int, 3>, notice additional braces
Array str{"hello"}; // Array<char, 6>

Brace elision doesn’t work for dependent non-array types or array types of dependent bound.

template<typename T, typename U>
struct Pair{
    T first;
    U second;
};

template<typename T, std::size_t N>
struct A1{
    T data[N];
    T oneMore;
    Pair<T, T> p;
};

template<typename T>
struct A2{
    T data[3];
    T oneMore;
    Pair<int, int> p;
};

// A1::data is an array of dependent bound and A1::p is a dependent type, thus,
// no brace elision for them
A1 a1{{1,2,3}, 4, {5, 6}};  // A1<int, 3>
// A2::data is an array of non-dependent bound and A2::p is a non-dependent type,
// thus, brace elision works
A2 a2{1, 2, 3, 4, 5, 6};    // A2<int>

Works with pack expansions. Trailing aggregate element that is a pack expansion corresponds to all remaining elements:

template<typename... Ts>
struct Overload : Ts...{
    using Ts::operator()...;
};
// no need for deduction guide anymore

Overload p{[](int){
        std::cout << "called with int";
    }, [](char){
        std::cout << "called with char";
    }
};     // Overload<lambda(int), lambda(char)>
p(1);   // called with int
p('c'); // called with char

A non-trailing element that is a pack expansion corresponds to no elements:

template<typename T, typename...Ts>
struct Pack : Ts... {
    T x;
};

// can deduce only the first element
Pack p1{1};         // Pack<int>
Pack p2{[]{}};      // Pack<lambda()>
Pack p3{1, []{}};   // error

Number of elements in the pack is deduced only once but types should match exactly if repeated:

struct A{};
struct B{};
struct C{};
struct D{
    operator C(){return C{};}
};

template<typename... Ts>
struct P : std::tuple<Ts...>, Ts...{
};

P{std::tuple<A, B, C>{}, A{}, B{}, C{}}; // P<A, B, C>

// equivalent to the above, since pack elements were deduced for
// std::tuple<A, B, C> there's no need to repeat their types
P{std::tuple<A, B, C>{}, {}, {}, {}}; // P<A, B, C>

// since we know the whole P<A, B, C> type after std::tuple initializer, we can
// omit trailing initializers, elements will be value-initialized as usual
P{std::tuple<A, B, C>{}, {}, {}}; // P<A, B, C>

// error, pack deduced from first initializer is <A, B, C> but got <A, B, D> for
// the trailing pack, implicit conversions are not considered
P{std::tuple<A, B, C>{}, {}, {}, D{}};

Parenthesized initialization of aggregates

Parenthesized initialization of aggregates now works in the same way as braced initialization except that narrowing conversions are permitted, designated initializers are not allowed, no lifetime extension for temporaries and no brace elision. Elements without initializer are value-initialized. This allows seamless usage of factory functions like std::make_unique<>()/emplace() with aggregates.

struct S{
    int a;
    int b = 2;
    struct S2{
        int d;
    } c;
};

struct Ref{
    const int& r;
};

int GetInt(){
    return 21;
}

S{0.1}; // error, narrowing
S(0.1); // OK

S{.a=1}; // OK
S(.a=1); // error, no designated initializers

Ref r1{GetInt()}; // OK, lifetime is extended
Ref r2(GetInt()); // dangling, lifetime is not extended

S{1, 2, 3}; // OK, brace elision, same as S{1,2,{3}}
S(1, 2, 3); // error, no brace elision

// values without initializers take default values or value-initialized(T{})
S{1}; // {1, 2, 0}
S(1); // {1, 2, 0}

// make_unique works now
auto ps = std::make_unique<S>(1, 2, S::S2{3});

// arrays are also supported
int arr1[](1, 2, 3);
int arr2[2](1); // {1, 0}

Non-type template parameters

Class types in non-type template parameters

Non-type template parameters can now be of literal class types (types that can be used as a constexpr variable) with all bases and non-static members being public and non-mutable (literally, there should be no mutable specifier). Instances of such classes are stored as const objects and you can even call their member functions. There’s a new kind of non-type template parameter: a placeholder for a deduced class type. In the example below, fixed_string is a template name, not a type name, but we can use it to declare the template parameter template<fixed_string S>. In such a case, the compiler will deduce template arguments for fixed_string before instantiating f<>() using an invented declaration in the form of T x = template-argument;. Here’s how it can be used to create a simple compile-time string class:

template<std::size_t N>
struct fixed_string{
    constexpr fixed_string(const char (&s)[N+1]) {
        std::copy_n(s, N + 1, str);
    }
    constexpr const char* data() const {
        return str;
    }
    constexpr std::size_t size() const {
        return N;
    }

    char str[N+1];
};

template<std::size_t N>
fixed_string(const char (&)[N])->fixed_string<N-1>;

// user-defined literals are also supported
template<fixed_string S>
constexpr auto operator""_cts(){
    return S;
}

// N for `S` will be deduced
template<fixed_string S>
void f(){
    std::cout << S.data() << ", " << S.size() << '\n';
}

f<"abc">(); // abc, 3
constexpr auto s = "def"_cts;
f<s>();     // def, 3

Generalized non-type template parameters

Non-type template parameters are generalized to so-called structural types. Structural type is one of:

scalar type(arithmetic, pointer, pointer-to-member, enumeration, std::nullptr_t)

lvalue reference

literal class type with the following properties: all base classes and non-static data members are public and non-mutable, and their types are structural or array types.

This allows usage of floating-point and class types as template parameters:

template<auto T>    // placeholder for any non-type template parameter
struct X{};

template<typename T, std::size_t N>
struct Arr{
    T data[N];
};

X<5> x1;
X<'c'> x2;
X<1.2> x3;
// with the help of CTAD for aggregates
X<Arr{{1,2,3}}> x4; // X<Arr<int, 3>>
X<Arr{"hi"}> x5;    // X<Arr<char, 3>>

Interesting moment here is that non-type template arguments are compared not with their operator==() but in a bitwise-like manner(the exact rules are here). That is, their bit representation is used for comparison. unions are exceptions because the compiler can track their active members. Two unions are equal if they both have no active member or have the same active member with equal value.

template<auto T>
struct S{};

union U{
    int a;
    int b;
};

enum class E{
    A = 0,
    B = 0
};

struct C{
    int x;
    bool operator==(const C&) const{    // never equal
        return false;
    }
};

constexpr C c1{1};
constexpr C c2{1};
assert(c1 != c2);                           // not equal using operator==()
assert(memcmp(&c1, &c2, sizeof(C)) == 0);   // but equal bitwise
// thus, equal at compile-time, operator==() is not used
static_assert(std::is_same_v<S<c1>, S<c2>>);

constexpr E e1{E::A};
constexpr E e2{E::B};
// equal bitwise, enum's identity isn't taken into account
assert(memcmp(&e1, &e2, sizeof(E)) == 0);
static_assert(std::is_same_v<S<e1>, S<e2>>);    // thus, equal at compile-time

constexpr U u1{.a=1};
constexpr U u2{.b=1};
// equal bitwise but have different active members(a vs. b)
assert(memcmp(&u1, &u2, sizeof(U)) == 0);
// thus, not equal at compile-time
static_assert(!std::is_same_v<S<u1>, S<u2>>);

Structured bindings

Lambda capture and storage class specifiers for structured bindings

Structured bindings are allowed to have [[maybe_unused]] attribute, static and thread_local specifiers. Also, it’s possible now to capture them by-value or by-reference in lambdas. Note that bound bit-fields can be captured only by-value.

struct S{
    int a: 1;
    int b: 1;
    int c;
};

static auto [A,B,C] = S{};

void f(){
    [[maybe_unused]] thread_local auto [a,b,c] = S{};
    auto l = [=](){
        return a + b + c;
    };

    auto m = [&](){
        // error, can't capture bit-fields 'a' and 'b' by-reference
        // return a + b + c;
        return c;
    };
}

Relaxing the structured bindings customization point finding rules

One of the ways for a type to be decomposed for structured bindings is through a tuple-like API. It consists of three “functions”: std::tuple_element, std::tuple_size and two options for get: e.get<I>() or get<I>(e), where the first has priority over the second. That is, the member get() is preferred over the non-member one. Imagine a type that has get() but it’s not for a tuple-like API, for example std::shared_ptr::get(). Such a type can’t be decomposed because the compiler will try to use the member get() and it won’t work. Now this rule has been fixed in a way that the member version is preferred only if it’s a template and its first template parameter is a non-type template parameter.

struct X : private std::shared_ptr<int>{
    std::string payload;
};

// due to new rules, this function is used instead of std::shared_ptr<int>::get
template<int N>
std::string& get(X& x) {
    if constexpr(N==0) return x.payload;
}

namespace std {
    template<>
    class tuple_size<X>
        : public std::integral_constant<int, 1>
    {};

    template<>
    class tuple_element<0, X> {
    public:
        using type = std::string;
    };
}

void f(){
    X x;
    auto& [payload] = x;
}

Allow structured bindings to accessible members

This fix allows structured bindings not only to public members but to accessible members in the context of structured binding declaration.

struct A {
    friend void foo();
private:
    int i;
};

void foo() {
    A a;
    auto x = a.i;   // OK
    auto [y] = a;   // Ill-formed until C++20, now OK
}

Range-based for-loop

init-statements for range-based for-loop

Similar to if-statement, range-based for-loop now can have init-statement. It can be used to avoid dangling references:

class Obj{
public:
    std::vector<int>& GetCollection();
};

Obj GetObj();

// dangling reference, lifetime of Obj returned by GetObj() is not extended
for(auto x : GetObj().GetCollection()){
    // ...
}

// OK
for(auto obj = GetObj(); auto item : obj.GetCollection()){
    // ...
}

// also can be used to maintain index
for(std::size_t i = 0; auto& v : collection){
    // use v...
    i++;
}

Relaxing the range-based for-loop customization point finding rules

This one is similar to structured bindings customization point fix. To iterate over a range, range-based for-loop needs either free or member begin/end functions. Old rules worked in a way that if any member(function or variable) named begin/end was found then the compiler would try to use member functions. This creates a problem for types that have a member begin but no end or vice versa. Now member functions are used only if both names exist, otherwise free functions are used.

struct X : std::stringstream {
    // ...
};

std::istream_iterator<char> begin(X& x){
    return std::istream_iterator<char>(x);
}

std::istream_iterator<char> end(X& x){
    return std::istream_iterator<char>();
}

void f(){
    X x;
    // X has member with name `end` inherited from std::stringstream
    // but due to new rules free begin()/end() are used
    for (auto&& i : x) {
        // ...
    }
}

Attributes

[[likely]] and [[unlikely]]

[[likely]] and [[unlikely]] attributes give a hint to the compiler about likeliness of execution path so it can better optimize the code. They can be applied to statements(e.g. if/else-statements, loops) or labels(case/default).

int f(bool b){
    if(b) [[likely]] {
        return 12;
    }
    else{
        return 10;
    }
}
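
The attributes can also be placed on labels. A small sketch (the function handle and its status codes are made up for illustration):

```cpp
// Hypothetical status-code handler; the codes are invented for this example.
constexpr int handle(int code) {
    switch (code) {
    [[likely]] case 0:      // the common, successful path
        return 1;
    [[unlikely]] case -1:   // rare failure
        return -1;
    default:
        return 0;
    }
}

static_assert(handle(0) == 1);
```

The attributes are only hints, so the returned values are the same with or without them.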

[[no_unique_address]]

[[no_unique_address]] can be applied to a non-static non-bitfield data member to indicate that it doesn’t need a unique address. In practice, it’s applied to a potentially empty data member and the compiler can optimize it to occupy no space(like empty base optimization for members). Such a member can share the address of another member or base class.

struct Empty{};

template<typename T>
struct Cpp17Widget{
    int i;
    T t;
};

template<typename T>
struct Cpp20Widget{
    int i;
    [[no_unique_address]] T t;
};

static_assert(sizeof(Cpp17Widget<Empty>) > sizeof(int));
static_assert(sizeof(Cpp20Widget<Empty>) == sizeof(int));

[[nodiscard]] with message

Like [[deprecated("reason")]], nodiscard now can have a reason too.

// test whether it's supported
static_assert(__has_cpp_attribute(nodiscard) == 201907L);

[[nodiscard("Don't leave me alone")]]
int get();

void f(){
    get(); // warning: ignoring return value of function declared with
           // 'nodiscard' attribute: Don't leave me alone
}

[[nodiscard]] for constructors

This fix explicitly allows applying [[nodiscard]] to constructors(compilers were not required to support it prior to C++20).

struct resource{
    // empty resource, no harm if discarded
    resource() = default;

    [[nodiscard("don't discard non-empty resource")]]
    resource(int fd);
};

void f(){
    resource{};     // OK
    resource{1};    // warning
}

Character encoding

char8_t

C++11 introduced the u8 prefix for UTF-8 string literals, and C++17 extended it to character literals, but their type was plain char. The inability to distinguish the encoding by type resulted in code that had to use various tricks to handle different encodings. A new char8_t type was introduced to represent UTF-8 characters. It has the same size, signedness, alignment, etc., as unsigned char but it’s a distinct type, not an alias.

void HandleString(const char*){}
// distinct function name is required to handle UTF-8 in C++17
void HandleStringUTF8(const char*){}
// now it can be done using convenient overload
void HandleString(const char8_t*){}

void Cpp17(){
    HandleString("abc");        // char[4]
    HandleStringUTF8(u8"abc");  // C++17: char[4] but UTF-8,
                                // C++20: error, type is char8_t[4]
}

void Cpp20(){
    HandleString("abc");    // char
    HandleString(u8"abc");  // char8_t
}

Stronger Unicode requirements

Types char16_t and char32_t are now explicitly required to represent UTF-16 and UTF-32 string literals respectively. Universal character names (\Unnnnnnnn and \uNNNN) must correspond to ISO/IEC 10646 code points (0x0 - 0x10FFFF inclusive) and must not be surrogate code points (0xD800 - 0xDFFF inclusive), otherwise the program is ill-formed.

char32_t c{'\U00110000'}; // error: invalid universal character

Sugar

Designated initializers

Now it’s possible to initialize specific(designated) aggregate members and skip others. Unlike C, initialization order must be the same as in aggregate declaration:

struct S{
    int x;
    int y{2};
    std::string s;
};
S s1{.y = 3};   // {0, 3, {}}
S s2 = {.x = 1, .s = "abc"};    // {1, 2, {"abc"}}
S s3{.y = 1, .x = 2};   // Error, x should be initialized before y

Default member initializers for bit-fields

Until C++20, to provide default value for a bit-field one had to create a default constructor, now that can be achieved using convenient default member initialization syntax:

// until C++20:
struct S{
    int a : 1;
    int b : 1;
    S() : a{0}, b{1}{}
};

// since C++20:
struct S{
    int a : 1 {0},
        b : 1 = 1;
};

More optional typename

typename can be omitted in contexts where nothing but a type name can appear(type in casts, return type, type aliases, member type, argument type of a member function, etc.):

template <class T>
T::R f();       // OK, return type of a function declaration at global scope

template <class T>
void f(T::R);   // Ill-formed (no diagnostic required), attempt to declare a
                // void variable template

template<typename T>
struct PtrTraits{
    using Ptr = void*;
};

template <class T>
struct S {
    using Ptr = PtrTraits<T>::Ptr;  // OK, in a defining-type-id
    T::R f(T::P p) {                // OK, class scope
        return static_cast<T::R>(p);// OK, type-id of a static_cast
    }
    auto g() -> S<T*>::Ptr;         // OK, trailing-return-type
    T::SubType t;
};

template <typename T>
void f() {
    void (*pf)(T::X);   // Variable pf of type void* initialized with T::X
    void g(T::X);       // Error: T::X at block scope does not denote a type
                        // (attempt to declare a void variable)
}

Nested inline namespaces

inline keyword is allowed to appear in nested namespace definitions:

// C++20
namespace A::B::inline C{
    void f(){}
}
// C++17
namespace A::B{
    inline namespace C{
        void f(){}
    }
}

using enum

Scoped enumerations are great, the only problem with them is their verbose usage (e.g. my_enum::enum_value). For example, in a switch-statement that checks every possible enum value, my_enum:: part should be repeated for each case-label. Using enum declaration introduces all enumeration’s names into the current scope so they are visible as unqualified names and my_enum:: part can be omitted. It can be applied to unscoped enumerations and even to a single enumerator.

namespace my_lib {
enum class color { red, green, blue };
enum COLOR {RED, GREEN, BLUE};
enum class side {left, right};
}

void f(my_lib::color c1, my_lib::COLOR c2){
    using enum my_lib::color;   // introduce scoped enum
    using enum my_lib::COLOR;   // introduce unscoped enum
    using my_lib::side::left;   // introduce single enumerator id

    // C++17
    if(c1 == my_lib::color::red){/*...*/}

    // C++20
    if(c1 == green){/*...*/}
    if(c2 == BLUE){/*...*/}

    auto r = my_lib::side::right;   // qualified id is required for `right`
    auto l = left;                  // but not for `left`
}

Array size deduction in new-expressions

This fix allows the compiler to deduce array size in new-expressions just like it does for local variables.

// before C++20
int p0[]{1, 2, 3};
int* p1 = new int[3]{1, 2, 3};  // explicit size is required

// since C++20
int* p2 = new int[]{1, 2, 3};
int* p3 = new int[]{};  // empty
char* p4 = new char[]{"hi"};
// works with parenthesized initialization of aggregates
int p5[](1, 2, 3);
int* p6 = new int[](1, 2, 3);

Class template argument deduction for alias templates

CTAD works with type aliases now:

template<typename T>
using IntPair = std::pair<int, T>;

double d{};
IntPair<double> p0{1, d};   // C++17
IntPair p1{1, d};   // std::pair<int, double>
IntPair p2{1, p1};  // std::pair<int, std::pair<int, double>>

constinit

C++ has the infamous “static initialization order fiasco”: the order of initialization of static storage duration variables from different translation units is not specified. Variables with zero/constant initialization avoid this problem because they are initialized at compile time. constinit enforces that a variable is initialized at compile time and, unlike constexpr, it allows non-trivial destructors. The second use case for constinit is with non-initializing thread_local declarations. In such a case, it tells the compiler that the variable is already initialized; otherwise the compiler usually adds code to check and initialize it, if required, on each usage.

struct S {
    constexpr S(int) {}
    ~S(){}; // non-trivial
};

constinit S s1{42};  // OK
constexpr S s2{42};  // error because destructor is not trivial

// tls_definitions.cpp
thread_local constinit int tls1{1};
thread_local int tls2{2};

// main.cpp
extern thread_local constinit int tls1;
extern thread_local int tls2;

int get_tls1() {
    return tls1;  // pure TLS access
}

int get_tls2() {
    return tls2;  // has implicit TLS initialization code
}

Signed integers are two’s complement

That is, signed integers are now guaranteed to be two’s complement. This removes some undefined and implementation-defined behavior because the binary representation is fixed. Overflow for signed integers is still UB but these are well-defined now:

int i1 = -1;
// left-shift for signed negative integers(previously undefined behavior)
i1 <<= 1; // -2

int i2 = INT_MAX;
// "unrepresentable" left-shift for signed integers(previously undefined behavior)
i2 <<= 1; // -2

int i3 = -1;
// right shift for signed negative integers, performs sign-extension(previously
// implementation-defined)
i3 >>= 1; // -1
int i4 = 1;
i4 >>= 1; // 0

// "unrepresentable" conversions to signed integers(previously implementation-defined)
int i5 = UINT_MAX; // -1

__VA_OPT__ for variadic macros

Allows simpler handling of variadic macros. __VA_OPT__(content) expands to nothing if __VA_ARGS__ is empty and to its content otherwise. It’s especially useful when a macro calls a function with some predefined argument(s) followed by optional __VA_ARGS__. In such a case, __VA_OPT__ allows omitting the trailing comma when __VA_ARGS__ is empty (thanks to Jérôme Marsaguet for bringing up this issue).

#define LOG1(...)                           \
    __VA_OPT__(std::printf(__VA_ARGS__);)   \
    std::printf("\n");

LOG1();                      // std::printf("\n");
LOG1("number is %d", 12);    // std::printf("number is %d", 12); std::printf("\n");

#define LOG2(msg, ...) \
    std::printf("[" __FILE__ ":%d] " msg, __LINE__, __VA_ARGS__)
#define LOG3(msg, ...) \
    std::printf("[" __FILE__ ":%d] " msg, __LINE__ __VA_OPT__(,) __VA_ARGS__)

// OK, std::printf("[" "file.cpp" ":%d] " "%d errors.\n", 14, 0);
LOG2("%d errors\n", 0);

// Error, std::printf("[" "file.cpp" ":%d] " "No errors\n", 17, );
LOG2("No errors\n");

// OK, std::printf("[" "file.cpp" ":%d] " "No errors\n", 20);
LOG3("No errors\n");
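
The same trailing-comma trick can be checked at compile time. In this sketch sum and SUM_FROM are hypothetical names; the macro forwards one mandatory argument plus any optional ones:

```cpp
// Hypothetical variadic function: adds `first` and all remaining arguments.
template <typename... Ts>
constexpr int sum(int first, Ts... rest) {
    return (first + ... + rest);    // with an empty pack this is just `first`
}

// The comma is emitted only when variadic arguments are present.
#define SUM_FROM(first, ...) sum(first __VA_OPT__(,) __VA_ARGS__)

static_assert(SUM_FROM(1) == 1);        // expands to sum(1)
static_assert(SUM_FROM(1, 2, 3) == 6);  // expands to sum(1, 2, 3)
```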

Explicitly defaulted functions with different exception specifications

This fix allows exception specification of an explicitly defaulted function to differ from such specification of implicitly declared function. Until C++20 such declarations made the program ill-formed. Now it’s allowed and, of course, the provided exception specification is the actual one. This is useful when you want to enforce noexcept-ness of some operations. For example, due to strong exception guarantee, std::vector moves its elements into a new storage only if their move constructors are noexcept, otherwise elements are copied. Sometimes it’s desirable to allow this faster implementation even if elements can actually throw during move. As usual, when a function marked noexcept throws, std::terminate() is called.

struct S1{
    // ill-formed until C++20 because implicit constructor is noexcept(true)
    S1(S1&&)noexcept(false) = default; // can throw
};

struct S2{
    S2(S2&&) noexcept = default;
    // implicitly generated move constructor would be `noexcept(false)`
    // because of `s1`, now it's enforced to be `noexcept(true)`
    S1 s1;
};

static_assert(std::is_nothrow_move_constructible_v<S1> == false);
static_assert(std::is_nothrow_move_constructible_v<S2> == true);

struct X1{
    X1(X1&&) noexcept = default;
    std::map<int, int> m;   // `std::map(std::map&&)` can throw
};

struct X2{
    // same as implicitly generated, it's `noexcept(false)` because of `std::map`
    X2(X2&&) = default;
    std::map<int, int> m;   // `std::map(std::map&&)` can throw
};

std::vector<X1> v1;
std::vector<X2> v2;
// ... at some point, `push_back()` needs to reallocate storage

// efficiently uses `X1(X1&&)` to move the elements to a new storage,
// calls `std::terminate()` if it throws
v1.push_back(X1{});

// uses `X2(const X2&)`, thus, copies, not moves elements to a new storage
v2.push_back(X2{});

Destroying operator delete

C++20 introduces a class-specific operator delete() that takes a special std::destroying_delete_t tag. In such a case, the compiler will not call the object’s destructor before calling operator delete(), it should be called manually. This might be useful if object members should be used to extract information needed to free memory it occupies, for example to extract its valid size and call sized version of delete.

struct TrickyObject{
    void operator delete(TrickyObject *ptr, std::destroying_delete_t){
        // without destroying_delete_t object would have been destroyed here
        const std::size_t realSize = ptr->GetRealSizeSomehow();
        // now we need to call the destructor by-hand
        ptr->~TrickyObject();
        // and free storage it occupies
        ::operator delete(ptr, realSize);
    }
    // ...
};

Conditionally explicit constructors

Just like noexcept(bool) we now have explicit(bool) to make constructor/conversion conditionally explicit.

template<typename T>
struct S{
    explicit(!std::is_convertible_v<T, int>) S(T){}
};

void f(){
    S<char> sc = 'x';           // OK
    S<std::string> ss1 = "x";   // Error, constructor is explicit
    S<std::string> ss2{"x"};    // OK
}

Feature-test macros

C++20 defines a set of preprocessor macros for testing various language and library features, the full list is here.

#ifdef __has_cpp_attribute  // check __has_cpp_attribute itself before using it
#   if __has_cpp_attribute(no_unique_address) >= 201803L
#       define CXX20_NO_UNIQUE_ADDR [[no_unique_address]]
#   endif
#endif

#ifndef CXX20_NO_UNIQUE_ADDR
#   define CXX20_NO_UNIQUE_ADDR
#endif

template<typename T>
class Widget{
    int x;
    CXX20_NO_UNIQUE_ADDR T obj;
};

Known-to-unknown bound array conversions

Allows conversion from array of known bound to the reference to array of unknown bound. Overload resolution rules have also been updated so that overload with matching size is better than overload with unknown or non-matching size.

void f(int (&&)[]){};
void f(int (&)[1]){};

void g() {
    int arr[1];

    f(arr);     // calls `f(int (&)[1])`
    f({1, 2});  // calls `f(int (&&)[])`
    int(&r)[] = arr;
}

Implicit move for more local objects and rvalue references

In certain cases the compiler is allowed to replace copy with move. But it turned out that rules were too restrictive. C++17 didn’t allow to move rvalue references in return statements, function parameters in throw expressions, and various forms of conversions unreasonably prevented moving. C++20 fixed these issues but some problems are still here, see P2266R0 Simpler implicit move.

std::unique_ptr<T> f0(std::unique_ptr<T> && ptr) {
    return ptr; // copied in C++17(thus, error), moved in C++20, OK
}

std::string f1(std::string && x) {
    return x;   // copied in C++17, moved in C++20
}

struct Widget{};

void f2(Widget w){
    throw w;    // copied in C++17, moved in C++20
}

struct From {
    From(Widget const &);
    From(Widget&&);
};

struct To {
    operator Widget() const &;
    operator Widget() &&;
};

From f3() {
    Widget w;
    return w;  // moved (no NRVO because of different types)
}

Widget f4() {
    To t;
    return t;// copied in C++17(conversions were not considered), moved in C++20
}

struct A{
    A(const Widget&);
    A(Widget&&);
};

struct B{
    B(Widget);
};

A f5() {
    Widget w;
    return w;  // moved
}

B f6() {
    Widget w;
    return w; // copied in C++17(because there's no B(Widget&&)), moved in C++20
}

struct Derived : Widget{};

std::shared_ptr<Widget> f7() {
    std::shared_ptr<Derived> result;
    return result;  // moved
}

Widget f8() {
    Derived result;
    // copied in C++17(because there's no Base(Derived)), moved in C++20
    return result;
}

Conversion from T* to bool is narrowing

Conversions from pointer or pointer-to-member types to bool are narrowing now and can’t be used in places where such conversions are not allowed. nullptr is OK when used with direct initialization.

struct S{
    int i;
    bool b;
};

void f(){
    void* p;
    S s{1, p};          // error
    bool b1{p};         // error
    bool b2 = p;        // OK
    bool b3{nullptr};   // OK
    bool b4 = nullptr;  // error
    bool b5 = {nullptr};// error
    if(p){/*...*/}      // OK
}

Deprecate some uses of volatile

Deprecates volatile in various contexts:

built-in prefix/postfix increment/decrement operators on volatile-qualified variables

usage of the result of an assignment to volatile-qualified object

built-in compound assignments in form of E1 op= E2(e.g. a += b) when E1 is volatile-qualified

volatile-qualified return/parameter type

volatile-qualified structured binding declarations

Note that volatile-qualified means top-level qualification, not just any volatile in a type. Something like volatile int* px is actually pointer-to-volatile-int, thus, not volatile-qualified.

volatile int x{};
x++;            // deprecated
int y = x = 1;  // deprecated
x = 1;          // OK
y = x;          // OK
x += 2;         // deprecated

volatile int    //deprecated
    f(volatile int);    //deprecated

Deprecate comma operator in subscripts

Comma operator inside subscripts is deprecated to allow a multidimensional (variadic) subscript operator in the future. Current approach for this is to have a custom path_type with overloaded path_type::operator,() and operator[](path_type). Variadic operator[] will eliminate the need for such dirty tricks.

// current approach
struct SPath{
    SPath(int);
    SPath operator,(const SPath&); // store path somehow
};

struct S1{
    int operator[](SPath); // use path
};

S1 s1;
auto x1 = s1[1,2,3];   // deprecated
auto x2 = s1[(1,2,3)]; // OK

// future approach
struct S2{
    int operator[](int, int, int);
    // or, as a variadic template
    template<typename... IndexType>
    int operator[](IndexType...);
};

S2 s2;
auto x3 = s2[1,2,3];

Fixes

Here I put minor fixes. Some of them have been implemented by compilers for a while but were not reflected in the Standard. Perhaps, you won’t notice any major changes in practice.

Initializer list constructors in class template argument deduction

// C++17
std::tuple t{std::tuple{1, 2}};     // std::tuple<int, int>
std::vector v{std::vector{1,2,3}};  // std::vector<std::vector<int>>

In this example, two syntactically similar initializations result in surprisingly different CTAD-deduced types. That’s because std::vector has a std::initializer_list constructor and prefers it, while std::tuple doesn’t have one, so it uses the copy constructor.
With this fix, the copy constructor is preferred to the list constructor when initializing from a single element whose type is a specialization of the class template under construction, or a class derived from such a specialization.

// C++20
std::tuple t{std::tuple{1, 2}};     // std::tuple<int, int>
std::vector v{std::vector{1,2,3}};  // std::vector<int>

// this example is from "C++17" book by N. Josuttis, section 9.1.1
// now it has consistent behavior across compilers
template<typename... Args>
auto make_vector(const Args&... elems) {
    return std::vector{elems...};
}

auto v2 = make_vector(std::vector{1,2,3});  // std::vector<int>

const&-qualified pointers to members

The problem was that using .* on an rvalue with a pointer to a const&-qualified member function was not allowed. Now it’s fine.

struct S {
    void f() const& {}
};

S{}.f();            // OK
(S{}.*&S::f)();     // could be an error on some old compilers

Simplifying implicit lambda capture

This simplifies wording for lambda capture. Lambdas within default member initializers now officially can have capture list, their enclosing scope is the class scope:

struct S{
    int x{1};
    int y{[&]{ return x + 1; }()};  // OK, captures 'this'
};

Entities are implicitly captured even within discarded statements and typeid:

template<bool B>
void f1() {
    std::unique_ptr<int> p;
    [=]() {
        if constexpr (B) {
            (void)p;    // always captures p
        }
    }();
}
f1<false>();    // error, can't capture unique_ptr by-value

void f2() {
    std::unique_ptr<int> p;
    [=]() {
        typeid(p);  // error, can't capture unique_ptr by-value
    }();
}

void f3() {
    std::unique_ptr<int> p;
    [=]() {
        sizeof(p);  // OK, unevaluated operand
    }();
}

const mismatch with defaulted copy constructor

This fix allows a type to have a defaulted copy constructor that takes its argument by const reference even if some of its members or base classes have a copy constructor that takes its argument by non-const reference. It becomes an error only when that constructor is actually needed:

struct NonConstCopyable{
    NonConstCopyable() = default;
    NonConstCopyable(NonConstCopyable&){}   // takes by non-const reference
    NonConstCopyable(NonConstCopyable&&){}
};

// std::tuple(const std::tuple& other) = default;   // takes by const reference

void f(){
    std::tuple<NonConstCopyable> t; // error in C++17, OK in C++20
    auto t2 = t;                    // always an error
    auto t3 = std::move(t);         // OK, move-ctor is used
}

Access checking on specializations

Allows protected/private types to be used as template arguments for partial specialization, explicit specialization and explicit instantiation.

template<typename T>
void f(){}

template<typename T>
struct Trait{};

class C{
    class Impl;     // private
};

template<>
struct Trait<C::Impl>{};            // OK

template struct Trait<C::Impl>;     // OK

class C2{
    template<typename T>
    struct Impl;    // private
};

template<typename T>
struct Trait<C2::Impl<T>>;          // OK

ADL and function templates that are not visible

Unqualified-id that is followed by a < and for which name lookup finds nothing or finds a function is treated as a template-name in order to potentially cause argument dependent lookup to be performed.

int h;
void g();

namespace N {
    struct A {};
    template<class T> int f(T);
    template<class T> int g(T);
    template<class T> int h(T);
}

// OK: lookup of `f` finds nothing, `f` treated as a template name
auto a = f<N::A>(N::A{});
// OK: lookup of `g` finds a function, `g` treated as a template name
auto b = g<N::A>(N::A{});
// error: `h` is a variable, not a template function
auto c = h<N::A>(N::A{});
// OK, `N::h` is qualified-id
auto d = N::h<N::A>(N::A{});

In rare cases, this can break existing code if there’s operator<() for functions but it was considered as a pathological case by committee:

struct A {};
bool operator <(void (*fp)(), A);
void f(){}

int main() {
    A a;
    f < a;      // OK until C++20, now error
    (f) < a;    // OK
}

Specify when constexpr function definitions are needed for constant evaluation

This fix specifies when constexpr functions are instantiated. These rules are pretty tricky but most of the time everything works as expected. Instead of copy-pasting them here I will only show a couple of examples to demonstrate the problem.

struct duration {
    constexpr duration() {}
    constexpr operator int() const { return 0; }
};

// duration d = duration(); // #1
int n = sizeof(short{duration(duration())});    // always OK since C++20

Remember that special member functions are defined only when they are used. In C++17 terms move constructor is not used and not defined here so the program should be ill-formed. But, if line #1 would be uncommented, move constructor would become used and defined so the program would be OK. It makes no sense and rules have been changed to reflect this.

Another example:

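template<typename T> constexpr int f() { return T::value; }

template<bool B, typename T> void g(decltype(B ? f<T>() : 0));
template<bool B, typename T> void g(...);

template<bool B, typename T> void h(decltype(int{B ? f<T>() : 0}));
template<bool B, typename T> void h(...);

void x() {
    g<false, int>(0);   // OK
    h<false, int>(0);   // error
}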

Here we have a constexpr function template that will potentially be instantiated with type int, which should lead to an error because int::value is wrong. Then there are two functions that use B ? f<T>() : 0 where B is always false, so f<T>() is never called. The question is: should f<int> be instantiated here?
The new rules clarify what’s needed for constant evaluation: template variables or functions in such expressions are always instantiated even if they are not required to evaluate an expression. One of such cases is a braced initializer list, thus, in the expression int{B ? f<T>() : 0} f<int> is always instantiated, which leads to an error.

Implicit creation of objects for low-level object manipulation

In C++17 an object can be created by a definition, by a new-expression or by changing the active member of a union. Now, consider this example:

struct X { int a, b; };

X *make_x() {
    X* p = (X*)malloc(sizeof(struct X));
    p->a = 1;   // UB in C++17, OK in C++20
    return p;
}

Although it looks natural, in C++17 this code has undefined behavior because X is not created according to the language rules, and writing to a member of a nonexistent object is UB. Rules for such cases have been clarified by specifying what types can be created implicitly and what operations can create such objects implicitly. Types that can be created implicitly(implicit-lifetime types):

scalar types

aggregate types

class types with any eligible trivial constructor and trivial destructor

Operations that can create implicit-lifetime objects implicitly:

operations that begin the lifetime of an array of char, unsigned char, std::byte

operator new and operator new[]

std::allocator::allocate(std::size_t n)

C library allocation functions: aligned_alloc, calloc, malloc, and realloc

memcpy and memmove

std::bit_cast
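A minimal sketch of the malloc/memcpy case from the list above. The helper name clone_x is made up for this example; the code compiles under older standards too, but it is only the C++20 rules that guarantee an X object exists in the allocated storage.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

struct X { int a, b; };  // aggregate, hence an implicit-lifetime type

// Hypothetical helper: copies the bytes of `src` into malloc'ed storage.
// Under the C++20 rules malloc/memcpy implicitly create the X object there,
// so reading p->a and p->b afterwards is well-defined.
X* clone_x(const X& src) {
    void* raw = std::malloc(sizeof(X));
    assert(raw != nullptr);
    std::memcpy(raw, &src, sizeof(X));
    return static_cast<X*>(raw);
}
```

The caller owns the returned storage and releases it with std::free().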

Also, the rule for pseudo-destructor(destructor for built-in types) has been changed. Until C++20 it had no effect, now it ends the object’s lifetime:

int f(){
    using T = int;
    T n{1};
    n.~T();     // no effect in C++17, ends n's lifetime in C++20
    return n;   // OK in C++17, UB in C++20, n is dead now
}

You can find more detailed explanation in this post: Objects, their lifetimes and pointers by Dawid Pilarski.

References

C++20 feature list
Complete and grouped list of all papers for each feature
C++ Weekly
CppCon 2019: Jonathan Müller “Using C++20’s Three-way Comparison ＜=＞”
CppCon 2019: Timur Doumler “C++20: The small things”
C++ standard draft

Allow only pure data structs with clang-tidy

2020-11-04T16:33:00+00:00

Introduction

struct is for data, class is for invariants. This is what the guides tell us(Core guidelines, Google C++ style guide, Fluent C++ blog). Sounds like a good candidate for another simple clang-tidy check. While the implementation of such a check is more or less simple, defining the actual constraints for it is not. In this article I’ll discuss what a pure data struct could mean and what flavors it could have. Then I’ll briefly show my implementation of a corresponding clang-tidy check.

Pure data struct

Naively, pure data struct means that there are only data members and nothing more, just like C struct:

struct Point{ int x; int y; };

But in the C++ world we can have much more in it and still consider it data. It turned out that there can be several levels of strictness, that’s why I decided to make this tool configurable instead of hardcoding a single definition.

Static members

Because static members are not part of an object’s state, I decided to completely ignore them. Also, I don’t care whether they are public or not.

Stateless structs

structs are often used for various TMP tricks and for simple callables:

struct less{
    bool operator()(const Widget&, const Widget&){
        //...
    }
};

In such context they are simply a shorthand for a stateless class with all members being public. However, in some codebases, especially old ones, you can find this trick(which I also consider as a non-data struct):

struct WidgetList : std::vector<Widget>{};

Thus, instead of always skipping stateless structs, I added an option SkipStateless to control it. By default it’s true.

Data members

Obviously, only public data members are allowed. What about default member initialization, do you think it violates pure data notion? Consider this simple example:

struct Point{ int x{}; int y{}; };

It doesn’t have invariants but it has a kind of a contract, specifically postcondition:

Point p; assert((p.x == 0) && (p.y == 0));

It binds magic values(0s in this particular case) to its members. As opposed to class, struct should model a collection of values without any control or logic over them. Not everyone will share this view, so I made an AllowDefaultMemberInit option for it, which is false by default.
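One way to see that default member initializers are more than cosmetics is to ask the type traits. A small sketch; the names PlainPoint, ZeroedPoint and origin are made up for this illustration:

```cpp
#include <cassert>
#include <type_traits>

struct PlainPoint { int x; int y; };        // no default member initializers
struct ZeroedPoint { int x{}; int y{}; };   // carries the "starts at 0" postcondition

// The initializers make the default constructor non-trivial.
static_assert(std::is_trivially_default_constructible<PlainPoint>::value, "");
static_assert(!std::is_trivially_default_constructible<ZeroedPoint>::value, "");

// With a plain struct the caller spells out the values instead.
inline PlainPoint origin() { return PlainPoint{0, 0}; }

// Small usage example.
inline void demo() {
    ZeroedPoint p;
    assert(p.x == 0 && p.y == 0);
}
```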

Member functions

At first sight no member functions should be allowed. Since all data is public there’s no need for them. How about special members? Google guide says:

Constructors, destructors, and helper methods may be present; however, these methods must not require or enforce any invariants.

I feel that struct should not add behavior, only collect related data. Thus, this tool doesn’t allow destructors and helper methods(whatever that means). Hand-written copy/move constructors are also not allowed but you can still =default or =delete them. What’s left I called primary constructors, that is, non-copy non-move constructors.
A default constructor can be used instead of default member initialization(before C++20 default values for bit-fields could be set only in constructors), and it can also have a body. Having a body means that the constructor does something beyond trivial initialization and should be considered suspicious. Nevertheless, it’s not uncommon, hence another option: AllowNonEmptyCtorBody, which is false by default.
The first kind of non-default constructor is what I called a memberwise constructor: it enforces the initialization of all members at once and looks like a good practice:

struct Point{
    int x;
    int y;
    Point(int x, int y) : x{x}, y{y} {}
};

Point p1;       // error
Point p2{1};    // error
Point p3{1, 2}; // OK

Although there’s no invariant nor contract, I consider this a poor interface. It guarantees that initial Point contains valid and related coordinate values but there’s no guarantee that this relation will always be respected. The better way is to have two separate types, one for coordinate values and another one to manipulate them:

// pure data
struct Coordinate{
    int x;
    int y;
};

// manipulation interface
class Point{
public:
    Point(Coordinate c) : c{c}{}
    // or Point(int x, int y);
    void Update(Coordinate c) { this->c = c; }
    //...
private:
    Coordinate c;
};

Second kind of a non-default constructor is purely custom one. It can have any kind of parameters, related or not to struct’s data members. I consider this a bad design because it either implies invariant or works as a conversion from another set of values, which should be implemented as a free function.
With that in mind, I added another option: AllowedCtors which can have three values: none - no constructors are allowed, default - only default constructors are allowed, primary - all non-copy non-move constructors are allowed.
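The interplay of AllowedCtors and AllowNonEmptyCtorBody can be modeled as a plain function, independent of Clang. This is only a sketch of the decision table as I read it from the description: CtorInfo and isCtorAllowed are hypothetical names, and CtorInfo stands in for the information the real check takes from the AST.

```cpp
#include <cassert>

enum class AllowedCtorKind { None, Default, Primary };

// Hypothetical plain-data description of a hand-written constructor.
struct CtorInfo {
    bool isCopyOrMove;
    bool isDefault;
    bool hasEmptyBody;
};

// Returns true when a hand-written constructor is acceptable in a data struct.
bool isCtorAllowed(const CtorInfo& ctor, AllowedCtorKind allowed,
                   bool allowNonEmptyBody) {
    if (ctor.isCopyOrMove || allowed == AllowedCtorKind::None) {
        return false;   // hand-written copy/move is never allowed
    }
    if (!allowNonEmptyBody && !ctor.hasEmptyBody) {
        return false;   // a body that does real work is suspicious
    }
    if (allowed == AllowedCtorKind::Primary) {
        return true;    // any non-copy non-move constructor
    }
    return ctor.isDefault;  // AllowedCtorKind::Default
}

// Small usage example: a memberwise constructor with an empty body.
inline void demo() {
    const CtorInfo memberwise{false, false, true};
    assert(isCtorAllowed(memberwise, AllowedCtorKind::Primary, false));
    assert(!isCtorAllowed(memberwise, AllowedCtorKind::Default, false));
}
```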

Inheritance

Things are simple here: only public inheritance is allowed and only another struct could be used as a base.

Conclusion

So what’s a pure data struct? It’s a struct that implies no invariant nor contract on its members beyond their own, and doesn’t add any kind of behavior to them. It should be used just to pack a set of logically related values. It should not be used to model a new type. In practice it means good old C-structs with rare static members and inheritance.

Implementation

Let’s summarize options introduced above:

SkipStateless - allows skipping(1) or not(0) the check of structs without direct data members

AllowDefaultMemberInit - allows(1) or not(0) default member initialization

AllowNonEmptyCtorBody - allows(1) or not(0) non-empty body of allowed constructors

AllowedCtors - specifies what kind of constructors are allowed. Possible values: none - no constructors are allowed, default - only default constructors are allowed, primary - all non-copy non-move constructors are allowed

Our matcher should detect bad structs finding some bad parts in them. The top-level structure of a matcher is:

cxxRecordDecl(
    isStruct(),
    // proceed only if SkipStateless == false or has any data member
    anyOf(boolean(!SkipStateless), has(fieldDecl())),
    anyOf(
        // bad member method
        // OR bad data member
        // OR bad base specifier
    ))

We want to check only user-provided methods(hand-written, non-defaulted, non-deleted), and it’s easier to specify what’s allowed(static methods and several kinds of constructors):

// helper to check whether given constructor matches AllowedCtors requirements
AST_MATCHER_P(CXXConstructorDecl, shouldAllowCtor,
              NonDataStructsCheck::AllowedCtorKind, AllowedCtors) {
    if (Node.isCopyOrMoveConstructor() ||
        (AllowedCtors == NonDataStructsCheck::AllowedCtorKind::None)) {
        return false;
    } else if (AllowedCtors == NonDataStructsCheck::AllowedCtorKind::Primary) {
        return true;
    } else {
        return Node.isDefaultConstructor();
    }
}

// it should have trivial(empty) body or be explicitly allowed through config
const auto ShouldAllowNonEmptyCtorBody =
    anyOf(boolean(AllowNonEmptyCtorBody), hasTrivialBody());

cxxMethodDecl(isUserProvided(),
              unless(anyOf(isStaticStorageClass(),
                           cxxConstructorDecl(
                               shouldAllowCtor(AllowedCtors),
                               ShouldAllowNonEmptyCtorBody))))

Bad data members are either non-public or the ones with default member initializers(in case they are not allowed):

// returns true when either AllowDefaultMemberInit is set or there's no default
// member initializer
const auto ShouldAllowDefaultMemberInit =
    anyOf(boolean(AllowDefaultMemberInit), unless(has(initListExpr())));

fieldDecl(
    unless(allOf(isPublic(), ShouldAllowDefaultMemberInit)))

And finally, bad base specifier is the one that’s non-public or non-struct:

anyOf(hasNonPublicBase(cxxRecordDecl()),
      hasDirectBase(
          cxxRecordDecl(unless(isStruct()))))

The full matcher:

cxxRecordDecl(
    isStruct(),
    anyOf(boolean(!SkipStateless), has(fieldDecl())),
    anyOf(
        has(cxxMethodDecl(isUserProvided(),
                          unless(anyOf(isStaticStorageClass(),
                                       cxxConstructorDecl(
                                           shouldAllowCtor(AllowedCtors),
                                           ShouldAllowNonEmptyCtorBody))))
                .bind("method")),
        has(fieldDecl(
                unless(allOf(isPublic(), ShouldAllowDefaultMemberInit)))
                .bind("field")),
        anyOf(hasNonPublicBase(cxxRecordDecl().bind("np_base")),
              hasDirectBase(
                  cxxRecordDecl(unless(isStruct())).bind("ns_base")))))
    .bind("record"),

That’s it, you can find the full source code here. Maybe it doesn’t cover all possible cases but upgrading it to meet specific guide requirements should not be hard.

Enforce explicit/implicit ‘this’ with custom clang-tidy check

2020-10-16T18:34:00+00:00

Introduction

Recently, I’ve discovered an interesting topic: clang-tidy-based tools. The idea is that you get an AST representing all the details of your C++ code, and what you can do with it is limited mostly by your imagination: detect bugs, calculate some code metrics, refactor, etc. You can take your old legacy codebase and convert it into a modern one. This idea of making changes at scale really fascinates me. To learn something you have to use it in the real world. As I’ve recently made several contributions to the CMake codebase, the idea for a refactoring tool quickly popped up in my mind. CMake has one strange part in its coding conventions which I’ve never seen in any other C++ codebase: explicit this usage. That is, like in JS or Python:

class Widget{ void Increment(){ this->x++; } int x; };

Because there’s no way to check it automatically, there are places where it’s not respected. I don’t know how this style was adopted, maybe it’s just some old artifact that is hard to get rid of by hand. So, I decided to write a tool that can do both:

add explicit this if it’s missed

remove explicit this wherever possible

Workflow outline

Writing this clang-tidy tool involves several steps:

understand what C++ code you want to fix

find out how to detect it using Clang AST API

fix it

I will describe the explicit/implicit parts separately after describing the common things. I used clang-tidy-standalone as a base for this tool, you can build it without building LLVM itself, more information in my previous article.
Notice that this is not a complete Clang AST tutorial, you can find more information in the official documentation, various YouTube talks, and clang-tidy sources.

Templates handling

Since a template is only an outline for generated code, templates are represented differently from non-templated code. In non-templated code all the types, variables and functions are known and checked. In a template that’s not always possible before the actual instantiation. As a result, they are represented with different AST nodes. You have a choice: deal with the template definition, which contains unknown or type-dependent entities, or deal with instantiations, where everything is known. You should know whether your check can produce different results for different template instantiations. Thankfully, that’s not the case for this tool, so I will deal only with instantiations. This in turn requires that all of your templates are used somewhere in a project or in a test suite so they are actually instantiated.

Macros handling

Since macros are just text replacements, they can have very different meaning in different contexts. Final AST represents code after preprocessing, so your tool can detect things that were composed from macro expansions. In most cases you can just skip such code, macro usage should be rare nowadays. But there’s at least one macro which I want to handle - assert(). It naturally contains things that I want to fix, for example:

void Widget::Reset(){ assert(this->ptr); *ptr = 0; }

For this reason, I’ve added simple regex-based macro filter:

if (ThisLocation.isMacroID()) {
    const auto MacroName =
        Lexer::getImmediateMacroName(ThisLocation, SM, getLangOpts());
    if (!llvm::Regex(AllowedMacroRegexp).match(MacroName)) {
        return false; // skip
    }
}
// continue...

Other examples of widely used macros are various loggers; my final macro filter for the CMake codebase looks like this: ^(assert|cm.*Log|cm.*Logger)$. Keep in mind that we can only handle things that are present after preprocessing; code in eliminated #ifdef blocks won’t be there, so run your tool on various configurations.
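To get a feeling for which macro names pass this filter, here is a small standalone sketch. It uses std::regex instead of llvm::Regex so that it can be compiled without LLVM; for this particular pattern both should behave the same. isAllowedMacro is a made-up name, and cmSomethingLogger is just a hypothetical macro name for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Same pattern as the CMake filter quoted above.
bool isAllowedMacro(const std::string& macroName) {
    static const std::regex allowed("^(assert|cm.*Log|cm.*Logger)$");
    // Search rather than full match; the ^ and $ anchors do the rest.
    return std::regex_search(macroName, allowed);
}

// Small usage example.
inline void demo() {
    assert(isAllowedMacro("assert"));
    assert(!isAllowedMacro("static_assert"));
}
```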

Enforce explicit this

Target C++ code

Let’s start with the easier case, enforcing explicit this. Here’s our test case:

class Widget{
    void Do(){
        DoConst();  // should become this->DoConst();
        x++;        // should become this->x++;
    }
    void DoConst() const{}
    int x{};
};

That is, every access to member should become explicit. Since we’re dealing with already valid code and adding explicit this wouldn’t change its meaning, there’s nothing more to consider, we just need to find such places, check whether they have explicit this or not, and add it if missed.

Detecting it with Clang AST API

clang-query is a useful tool to examine generated AST, I left only important parts:

clang-check-10 --ast-dump example.cpp --

|-CXXMethodDecl 0x1ef3a88 line:2:10 Do 'void ()'
| `-CompoundStmt 0x1ef3e18
|   |-CXXMemberCallExpr 0x1ef3d88 'void'
|   | `-MemberExpr 0x1ef3d58 '<bound member function type>' ->DoConst 0x1ef3ba8
|   |   `-ImplicitCastExpr 0x1ef3da8 'const Widget *'
|   |     `-CXXThisExpr 0x1ef3d48 'Widget *' implicit this
|   `-UnaryOperator 0x1ef3e00 'int' postfix '++'
|     `-MemberExpr 0x1ef3dd0 'int' lvalue ->x 0x1ef3c60
|       `-CXXThisExpr 0x1ef3dc0 'Widget *' implicit this

You can see that our target parts are represented as MemberExpr and CXXThisExpr with optional ImplicitCastExpr. Cast is there because we’re calling const function from non-const one, hence, casting Widget* to const Widget*. AST matcher for it is straightforward:

memberExpr(has( ignoringImpCasts( cxxThisExpr().bind("thisExpr")))) .bind("memberExpr")

bind() is needed to get access to the matched node, in our case we need MemberExpr and CXXThisExpr, thus, we bind them to names. In the CXXThisExpr documentation we can see isImplicit() method that does exactly what we need:

void EnforceThisStyleCheck::check(const MatchFinder::MatchResult &Result) {
    const auto ThisExpr = Result.Nodes.getNodeAs<CXXThisExpr>("thisExpr");
    const auto MembExpr = Result.Nodes.getNodeAs<MemberExpr>("memberExpr");
    // ...
    if (ThisExpr->isImplicit()) {
        addExplicitThis(*MembExpr);
    }
}

Fix

Fixing is really simple, clang-tidy has a lot of examples of it, we have to provide hint, location, and text for our fix:

void EnforceThisStyleCheck::addExplicitThis(const MemberExpr &MembExpr) {
    const auto ThisLocation = MembExpr.getBeginLoc();
    diag(ThisLocation, "insert 'this->'")
        << FixItHint::CreateInsertion(ThisLocation, "this->");
}

We use MemberExpr’s location instead of CXXThisExpr’s because in case of qualified names(Base::Method();) CXXThisExpr::getBeginLoc() points to the start of Method, not the start of a namespace.

Enforce implicit this

Target C++ code

This case is a bit harder because in some cases removing explicit this could change the meaning of code due to name lookup rules, in other cases it could result in a compilation error.

Special members

We can’t remove explicit this from special member functions like destructors or operators:

void Widget::Do(){ this->~Widget(); this->operator=(Widget{}); }

This can happen only when the member expression refers to a method, not to a variable. Thus, we need to get the member declaration, check whether it’s a method, and then check its name:

static bool isNonSpecialMember(const MemberExpr &MembExpr) {
    const auto MemberDecl = MembExpr.getMemberDecl();
    assert(MemberDecl);
    const auto MethodDecl = dyn_cast<CXXMethodDecl>(MemberDecl);
    // CXXMethodDecl::getIdentifier() returns nullptr for special members
    return !MethodDecl || MethodDecl->getIdentifier();
}

Name conflicts

Consider this case:

void Widget::Do(int x){
    this->x++;  // increment member
    x++;        // increment argument
}

If we remove explicit this from the expression at line 2, it will increment the argument instead of the data member. Generally, any visible local name hides the class member name during the lookup. Unfortunately, Clang doesn’t have an API to detect such conflicts, so I chose a less precise but easier to implement way(thanks to Nicolás Alvarez for this idea):

static bool hasVariableWithName(const CXXMethodDecl &Function,
                                ASTContext &Context, const StringRef Name) {
    const auto Matches = match(decl(hasDescendant(varDecl(hasName(Name)))),
                               Function, Context);
    return !Matches.empty();
}

This method enumerates all declared variables(including arguments) in the function, ignoring their visibility. It means that this code will be untouched even if it’s safe:

void Widget::Do(){
    this->x++;  // increment member
    x++;        // still increment member but confusing
    int x;
    x++;        // increment local variable
}
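The shadowing that makes this check necessary can be observed in a tiny standalone program. A minimal sketch; Counter, WithThis and WithoutThis are made-up names:

```cpp
#include <cassert>

struct Counter {
    int x = 0;

    void WithThis(int x) {
        this->x++;  // increments the member
        (void)x;    // the parameter is intentionally unused
    }

    void WithoutThis(int x) {
        x++;        // increments the parameter copy, the member is untouched
        (void)x;
    }
};

// Small usage example.
inline void demo() {
    Counter c;
    c.WithThis(10);
    assert(c.x == 1);
}
```

So dropping this-> in WithThis() would silently turn it into WithoutThis(), which is exactly the change the tool must not make.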

Dependent names

template class Derived : public Base{ void Do(){ this->baseCounter++; // baseCounter is defined somewhere in Base } };

C++ requires dependent member names to be prepended with explicit this, thus, removing it here will yield a compile-time error. In our case it means that if a name is provided by the base class, explicit this is required. So, removing explicit this from a name is safe when this name is a direct(non-inherited) member of a class:

static bool hasDirectMember(const CXXRecordDecl &Class, ASTContext &Context,
                            const StringRef Name) {
    const auto Matches =
        match(cxxRecordDecl(has(namedDecl(hasName(Name)))), Class, Context);
    return !Matches.empty();
}
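For reference, here is a self-contained version of the dependent-name situation that actually compiles. It assumes the base class is itself a template (Base<T>), which is my own choice for this sketch, and all names are illustrative:

```cpp
#include <cassert>

template <typename T>
struct Base {
    int baseCounter = 0;
};

template <typename T>
struct Derived : Base<T> {
    int Do() {
        // A bare `baseCounter++;` is rejected by a conforming compiler:
        // unqualified lookup does not look into the dependent base Base<T>.
        this->baseCounter++;
        return this->baseCounter;
    }
};

// Small usage example.
inline void demo() {
    Derived<int> d;
    assert(d.Do() == 1);
}
```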

Now, we can create our final isRedundantExplicitThis() function:

static bool isRedundantExplicitThis(const MemberExpr &MembExpr,
                                    const CXXMethodDecl &MethodDecl,
                                    ASTContext &Context) {
    return (isNonSpecialMember(MembExpr) &&
            !hasVariableWithName(MethodDecl, Context,
                                 MembExpr.getMemberDecl()->getName()) &&
            !isDependentName(MethodDecl, MembExpr, Context));
}

And, because we need access to the corresponding CXXMethodDecl, our final matcher for both cases becomes:

cxxMethodDecl(
    isDefinition(), isUserProvided(),
    forEachDescendant(
        memberExpr(has(ignoringImpCasts(cxxThisExpr().bind("thisExpr"))))
            .bind("memberExpr")))
    .bind("methodDecl")

isUserProvided() is self-explanatory: we’re interested only in user-provided functions, not in compiler-generated ones.

Fix

Again, fixing is mostly simple. We have to provide hint and range for removal. Qualified names require special handling because we don’t want to remove namespace part.

void EnforceThisStyleCheck::removeExplicitThis(const SourceManager &SM,
                                               const MemberExpr &MembExpr) {
    const auto ThisStart = MembExpr.getBeginLoc();
    auto ThisEnd = MembExpr.getMemberLoc();
    if (MembExpr.hasQualifier()) {
        ThisEnd = MembExpr.getQualifierLoc().getBeginLoc();
    }
    const auto ThisRange = Lexer::makeFileCharRange(
        CharSourceRange::getCharRange(ThisStart, ThisEnd), SM, getLangOpts());
    diag(ThisStart, "remove 'this->'") << FixItHint::CreateRemoval(ThisRange);
}

Results

Applying to CMake codebase:

enforce explicit this: 129 files changed, 4689 insertions

enforce implicit this: 406 files changed, 23237 insertions

Full source code is here.
CMake branch with explicit this is here.
CMake branch with implicit this is here.

I’m pretty satisfied with the result. The whole tool takes <170 lines of code. Hope that in future there will be more good tutorials to make this framework more available to more people.

Creating your own clang-tidy checks without building LLVM

2020-10-06T14:10:00+00:00

What is clang-tidy

clang-tidy is a static analysis tool based on Clang’s LibTooling library. It can find and sometimes fix subtle problems in your code or just make it look better. The list of checks is pretty extensive.

Create your own tool

What’s more interesting is that you can create your own tool that can detect/fix some problems with your code, enforce your custom coding style, refactor it and so on.

There are two options for that:

use LibTooling

create custom clang-tidy check

When I started, I chose the LibTooling way. But it turned out that it’s pretty low-level, you need to understand a lot more things, and write more boiler-plate code to create useful tool. Custom clang-tidy check is a much better option, it’s actively supported, you can use existing checks as an example for your own ones. It has the whole infrastructure like diagnostic messages, deduplication of fix-it replacements, run-clang-tidy.py to run your check in parallel and so on.

Little problem

The only problem with a clang-tidy check is that, according to the official manual, you have to build it from sources, which means you have to build the whole LLVM. Why should someone who wants to play with simple checks have to build LLVM? And I’m afraid to imagine how long it will take on my 2013 2-core MBP laptop.

Solution

So, I wanted to have full clang-tidy infrastructure without building LLVM. Thankfully, LLVM has Debian/Ubuntu prebuilt packages which include all required libraries for clang-tidy. The only remaining thing is to link clang-tidy sources with it. It turned out to be pretty simple. We need to edit three CMakeLists.txts, replace parts responsible for building libraries from sources with parts that do link to static libraries. I’ve removed all checks to make it as light as possible. Now you can follow the official manual, use add_new_check.py to create a new check, build it with prebuilt packages, and run it with run-clang-tidy.py in parallel.

The resulting repository is here. It requires LLVM 10 packages. Porting it to next versions should be simple but it’s not in my plans right now.

Reducing CMake heap usage part 2: Know your tools

2020-09-06T18:14:00+00:00

Introduction

At the end of the previous post, after all those optimizations I stated:

For more complex configurations the economy is of course lower. Partly because there’s an old parsing routine that allocates a lot and becomes the major memory consumer.

After a while, I decided to investigate why so much memory is used and found a surprisingly easy way to fix it.

Previous results

Here’re the overall results of previous optimizations(total allocated bytes (number of allocations)):

empty project: 65 MB (394k) -> 39 MB (280k)

google benchmark: 233 MB (1344k) -> 196 MB (1190k)

heaptrack: 305 MB (1308k) -> 268 MB (1148k)

As you can see, not a big improvement for the last two projects. Let’s take a look at heaptrack report of heaptrack itself:

We can see that the top consumer is cmCommandArgument_yyalloc()(1), it stems from cmMakefile::ExpandVariablesInStringOld()(2) that in turn stems from cmMakefile::ExpandArguments()(3). Also, notice the huge difference between the first(1) and second(4) heap consumers: 119 MB vs 1 MB correspondingly.

What’s going on?

Looking at that report, I had several questions:

What is ExpandVariablesInStringOld()?

Why does it eat so much memory?

Why doesn’t it exist in the report of the empty project?

I’ll answer the (1) and (3) first, then the (2).

Argument expansion

Argument expansion is a process of replacing variable reference(${var}) with variable’s value(var_value) and, for unquoted arguments, replacing list(a;b;c) with its elements as separate arguments(a, b, c). CMake does this for every argument of every command call using ExpandArguments(), so this function is called pretty frequently.

Variable references can be nested and mixed with plain strings(ab_${cd_${ef}}_$ENV{env_var}), so the algorithm for their expansion is not so trivial. CMake uses a Flex scanner and a Bison parser to do this, the driver function to run them is called ExpandVariablesInStringOld(). Why old? Because that was the case before CMake 3.1. Back then, the Flex/Bison implementation was considered to be slow and inefficient(which is strange because these tools have been used for decades) and the whole thing was replaced with a hand-crafted implementation(cmMakefile::ExpandVariablesInStringNew()).
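As a rough illustration of why nesting makes the expansion non-trivial, here is a minimal recursive sketch in C++. It is not CMake’s code: the names Vars, expand and expandVariables are made up, undefined variables simply expand to an empty string, and $ENV{}, escapes and list splitting are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

using Vars = std::map<std::string, std::string>;

// Expands `in` starting at `pos`. When `nested` is true, stops after the
// '}' that closes the reference currently being parsed.
std::string expand(const std::string& in, std::size_t& pos, const Vars& vars,
                   bool nested) {
    std::string out;
    while (pos < in.size()) {
        if (in.compare(pos, 2, "${") == 0) {
            pos += 2;
            // The name itself may contain references, so expand it first.
            const std::string name = expand(in, pos, vars, true);
            const auto it = vars.find(name);
            if (it != vars.end()) {
                out += it->second;
            }
        } else if (nested && in[pos] == '}') {
            ++pos;
            return out;
        } else {
            out += in[pos++];
        }
    }
    return out;
}

std::string expandVariables(const std::string& in, const Vars& vars) {
    std::size_t pos = 0;
    return expand(in, pos, vars, false);
}

// Small usage example.
inline void demo() {
    const Vars vars{{"ef", "x"}, {"cd_x", "y"}};
    assert(expandVariables("ab_${cd_${ef}}_z", vars) == "ab_y_z");
}
```

Here ${ef} is expanded to x first, which turns the outer reference into ${cd_x}, which in turn expands to y.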

Old vs. new

If it was replaced in CMake 3.1 then why was it used in CMake 3.18 when I configured google benchmark and heaptrack? Because of the cmake_minimum_required() command. Roughly speaking, when you set a minimum required version for your project, CMake adjusts its behavior to that version. So, if you call it with anything below 3.1, CMake will use ExpandVariablesInStringOld(). Heaptrack calls cmake_minimum_required(VERSION 2.8.12) and google benchmark includes google test which calls cmake_minimum_required(VERSION 2.8.8).

The problem

Now, we can reproduce that enormous heap consumption with empty project:

cmake_minimum_required(VERSION 2.9)
project(empty)

Heaptrack report is quite similar to the above one:

Here and further I’ll omit cmCommandArgument_ part of function names, that’s just a prefix to avoid name clashes. We can see that yylex()(1) allocates a lot of memory using yy_create_buffer()(2). FYI, yylex()(Flex part) is responsible for reading the input and returning the token to yyparse()(Bison part). Here’s the troublesome part of the yylex() code:

#define YY_BUF_SIZE 16384

int yylex() {
    if(!init) {
        yy_create_buffer(YY_BUF_SIZE);
    }
    //...
}

We can see that Flex allocates 16 KB of memory for whatever purpose. Recall that ExpandVariablesInStringOld() is called for every argument, thus, Flex allocates 16 KB of RAM thousands of times, and it doesn’t depend on argument structure nor its size. Looks pretty bad, huh?

Flex input management

Why does Flex need that buffer? Flex can be configured to take its input from the file handle(the default is stdin, i.e., the terminal) or from the provided buffer. When it’s a file handle, obviously, it needs a buffer for the data to be read, so it allocates 16 KB for that purpose. When it’s configured to read from the buffer it doesn’t need that additional 16 KB because all data is already provided. Flex needs mutable buffer, client has a choice: provide a mutable buffer or allow Flex to make a copy of immutable one. Sounds reasonable?

Arguments are of course located in string buffers, not in a file, so why was that 16 KB allocated after all? Well, because for whatever reason CMake uses a third, kinda tricky, way: it doesn’t configure Flex to read from the buffer; instead it replaces the file reading routine through the Flex macro YY_INPUT with code that does the actual read from the buffer. Thus, Flex thinks it’s going to read from a file, allocates 16 KB for the file buffer, and calls the overridden file reading routine. Arguments are usually small strings, much less than 16 KB, hence we got that huge overconsumption.
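To get a feeling for the numbers, here is a rough back-of-the-envelope sketch of both strategies. It is not CMake code: the function names are made up for illustration, and the "+ 2" assumes that Flex, when it copies a string itself, allocates the string length plus two end-of-buffer bytes (bookkeeping structures are ignored).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// YY_BUF_SIZE from the yylex() snippet above.
constexpr std::size_t kFileBufferSize = 16384;

// Hypothetical helper: bytes allocated when every argument gets a
// fixed-size "file" buffer, regardless of the argument length.
std::size_t bytesWithFileBuffer(const std::vector<std::string>& args) {
  return args.size() * kFileBufferSize;
}

// Hypothetical helper: bytes allocated when the scanner copies each
// argument instead (assumption: length + 2 end-of-buffer bytes).
std::size_t bytesWithStringCopy(const std::vector<std::string>& args) {
  std::size_t total = 0;
  for (const std::string& arg : args) {
    total += arg.size() + 2;
  }
  return total;
}
```

With the fixed buffer the total grows only with the number of arguments, which is exactly why short arguments are so wasteful here.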

The fix

The fix was fairly simple. I replaced all that hackery with the call to public API yy_scan_string() and everything just worked. Let’s check heaptrack report:

Now, yyalloc()(1) has allocated 1.1 MB instead of 83 MB :)

Overall results(total allocated bytes (number of allocations)):

empty project: 120 MB (354k) -> 38 MB (351k)

google benchmark: 233 MB (1344k) -> 137 MB (1153k)

heaptrack: 305 MB (1308k) -> 140 MB (1113k)

Benchmark

Like I said before, I was surprised that the Flex/Bison solution was replaced by a hand-crafted parser due to the poor performance of the former. So I decided to do a little benchmark of three methods: the old ExpandVariablesInStringOld(), the new(fixed) one, and the hand-written ExpandVariablesInStringNew(). I used this simple file:

# cmake_policy(SET CMP0053 OLD) # use ExpandVariablesInStringOld()
cmake_policy(SET CMP0053 NEW) # use ExpandVariablesInStringNew()

function(sink)
endfunction()

foreach(i RANGE 1000000)
  sink(simple_var ${simple_var_ref} ${nested_${var_${ref}}})
endforeach()

Its main purpose is to generate a lot of argument expansion calls. Measurement was done with

/usr/bin/time -v cmake -P bench.cmake

Results:

hand-written ExpandVariablesInStringNew(): 4.53 sec

new ExpandVariablesInStringOld(): 5.38 sec

old ExpandVariablesInStringOld(): 12.71 sec

So yes, the old version was really slow and had to be replaced. But was it slow due to Flex or Bison problems? Of course not. Actually, I suppose that if the old version had been done right, nobody would have thought about replacing it. Yes, it’s still slower by ~20% than the hand-written version, but this difference isn’t noticeable in real-world cases. Not to mention how much easier it is to understand and maintain Flex/Bison specs compared to hand-written code.

Conclusion

Know your tools, don’t reinvent the wheel. Widely used tools usually have all the needed APIs for nearly all common use-cases. If you find that a tool doesn’t fit your needs, consider that you might be doing something wrong. Think hard to comprehend the problem, then learn what the tool provides. Only when you understand them both should you resort to hacks or custom solutions.

Reducing CMake heap usage with Heaptrack

2020-08-30T20:02:00+00:00

Introduction

While working on previous CMake experiment I noticed several places that looked suspiciously suboptimal. As in case of any performance considerations everything should be measured so I decided to run it through a Heaptrack and check what’s going on.

Clean run

I used Heaptrack 1.2.80, CMake 3.18.1 with debug symbols and this barely empty CMakeLists.txt:

cmake_minimum_required(VERSION 3.17)
project(empty)

Let’s take a look at what heaptrack gave us:

I didn’t expand all entries but you can see that the top two heap consumers(1 and 2) are related to std::vector<cmListFileFunction> and std::vector<cmListFileArgument> operations, which in turn stem from cmFunctionBlocker::IsFunctionBlocked()(3 and 4). This is exactly what I expected. Now let’s look at these types and their role.

Representation

These types are used to represent parsed code. Here are their simplified definitions:

// command argument representation
struct cmListFileArgument
{
  std::string value; // argument itself
  Delimiter delim;   // argument's type: quoted/unquoted/bracket
  long line;         // argument's position
};

// command representation
struct cmListFileFunction
{
  std::string nameLower;
  std::string nameOriginal;
  long line;
  std::vector<cmListFileArgument> arguments;
};

// file representation
struct cmListFile
{
  std::vector<cmListFileFunction> functions;
};

For example, representation for this code

Find_Package(Boost 1.41.0 COMPONENTS system filesystem iostreams)

would be

cmListFile {
  std::vector<cmListFileFunction> {
    cmListFileFunction {
      std::string{"find_package"},
      std::string{"Find_Package"},
      long{1},
      std::vector<cmListFileArgument> {
        {std::string{"Boost"}, Unquoted, long{1}},
        {std::string{"1.41.0"}, Unquoted, long{1}},
        {std::string{"COMPONENTS"}, Unquoted, long{1}},
        {std::string{"system"}, Unquoted, long{1}},
        {std::string{"filesystem"}, Unquoted, long{1}},
        {std::string{"iostreams"}, Unquoted, long{1}},
      }
    }
  }
};

When the file is parsed, cmListFile contains all its commands. Afterwards, they are executed one by one.

Function blocker

Everything in CMake is a command, including things like functions, conditionals and loops. Hence, it needs a way to represent a scope, e.g. it can’t just execute a function body on its first occurrence. Instead, it needs to collect the function’s inner commands as its body and create a new function definition. To achieve that, CMake has a function blocker with that IsFunctionBlocked() function:

class cmFunctionBlocker
{
public:
  bool IsFunctionBlocked(cmListFileFunction const& function)
  {
    //...
    functions.push_back(function); // copy!
    return true;
  }
  //...
private:
  std::vector<cmListFileFunction> functions;
};

When starting command(e.g. function(), if()) is executed, it starts to collect scope body using IsFunctionBlocked() and when ending command(endfunction(), endif()) is met, it does something with its body. The important moment here: once the body is collected, it will never be modified.

The problem

As you can see it copies functions inside. Consider how this simple code would be represented:

function(f)
  # function() block
  if(${var})
    # if() block
    message("hello world")
  endif()
endfunction()

f()

Here’s how it’s stored in memory:

cmListFile - stores commands at lines [1; 9] (ranges are closed)

function() blocker - stores commands at lines [2; 6]

if() blocker - stores commands at lines [4; 5]

That’s the problem: blockers copy the same commands multiple times depending on the code structure. It’s quite expensive because each command contains two strings(for its name) and a vector of strings(for arguments). Moreover, it happens repeatedly: inner blockers are populated each time the outer one is executed. In the above example, the if-blocker is recreated on each f() call.

Solution

My first thought was to store raw pointers in blockers but it doesn’t work. When it reads dependent file(via include()) it really needs to copy cmListFileFunction because corresponding cmListFile is destroyed after parsing but we still need to use included functions. So the actual solution is to store each command in a std::shared_ptr to make copy cheap:

// old cmListFileFunction
struct cmListFileFunctionImpl
{
  std::string nameLower;
  std::string nameOriginal;
  long line;
  std::vector<cmListFileArgument> arguments;
};

using cmListFileFunction = std::shared_ptr<cmListFileFunctionImpl>;
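The difference between the two storage strategies can be sketched outside of CMake. Everything below is hypothetical and heavily simplified: CommandImpl, ValueBlocker and SharedBlocker are illustration-only names, and a global counter records how many deep copies each approach makes when several blockers collect the same command.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

int g_deepCopies = 0; // incremented by every CommandImpl copy construction

// Simplified stand-in for a parsed command (not the real CMake type).
struct CommandImpl {
  std::string name;
  std::vector<std::string> arguments;

  CommandImpl(std::string n, std::vector<std::string> a)
    : name(std::move(n)), arguments(std::move(a)) {}
  CommandImpl(const CommandImpl& other)
    : name(other.name), arguments(other.arguments) { ++g_deepCopies; }
  CommandImpl(CommandImpl&&) noexcept = default;
};

using CommandPtr = std::shared_ptr<const CommandImpl>;

// Blocker that stores commands by value: every collect() is a deep copy.
struct ValueBlocker {
  std::vector<CommandImpl> body;
  void collect(const CommandImpl& cmd) { body.push_back(cmd); }
};

// Blocker that stores shared pointers: collect() only bumps a ref count.
struct SharedBlocker {
  std::vector<CommandPtr> body;
  void collect(const CommandPtr& cmd) { body.push_back(cmd); }
};

// Deep copies made when `blockers` value blockers collect the same command.
int copiesWithValueBlockers(const CommandImpl& cmd, int blockers) {
  const int before = g_deepCopies;
  for (int i = 0; i < blockers; ++i) {
    ValueBlocker blocker;
    blocker.collect(cmd);
  }
  return g_deepCopies - before;
}

// Same experiment with shared pointers.
int copiesWithSharedBlockers(const CommandPtr& cmd, int blockers) {
  assert(cmd && "expects a non-null command");
  const int before = g_deepCopies;
  for (int i = 0; i < blockers; ++i) {
    SharedBlocker blocker;
    blocker.collect(cmd);
  }
  return g_deepCopies - before;
}
```

Since a collected body is never modified, sharing it through a pointer to const is safe, and that is what makes the cheap copy possible.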

Another little problem

Second place where it does unnecessary copy of commands is in ExecuteCommand() function. Although it’s not critical, I still think obviously unneeded copy operations should be avoided.

using Command = std::function<bool(std::vector<cmListFileArgument> const&)>;
std::map<std::string, Command> RegisteredCommands;

Command GetCommandByExactName(std::string const& name);

bool ExecuteCommand(const cmListFileFunction& function)
{
  //...
  if (auto command = GetCommandByExactName(function.nameLower)) // copy!
  {
    // execute
    command(function.arguments);
  }
  //...
}

As you can see, commands are stored in a std::function and during execution that object is copied. Why? Because commands can be redefined, so the currently executing function object might be reassigned and the rest of its execution would be UB unless a copy is kept somewhere. Copying a std::function usually involves copying its internal storage where the actual Callable lives. Most built-in commands in CMake are stored as raw function pointers so their copy is relatively cheap. But a user-defined function(function()/endfunction()) contains a vector of commands in its blocker and its copy is not cheap. Again, the solution is to use std::shared_ptr:

using CommandPtr = std::shared_ptr<Command>;
std::map<std::string, CommandPtr> RegisteredCommands;

CommandPtr GetCommandByExactName(std::string const& name);

No more excessive copies of vectors of strings. Both solutions could be even better with something like boost::local_shared_ptr which avoids synchronization overhead.
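Here is a minimal sketch of why the shared pointer is enough to survive redefinition. The Registry class and its methods are hypothetical and only mimic the idea; they are not CMake's actual interfaces.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using Args = std::vector<std::string>;
using Command = std::function<bool(const Args&)>;
using CommandPtr = std::shared_ptr<const Command>;

// Hypothetical command registry that allows redefinition during execution.
class Registry {
public:
  void define(const std::string& name, Command command) {
    commands_[name] = std::make_shared<const Command>(std::move(command));
  }

  bool execute(const std::string& name, const Args& args) const {
    auto it = commands_.find(name);
    if (it == commands_.end()) {
      return false; // unknown command
    }
    // Only the reference count is bumped here, the callable is not copied.
    // The local pointer keeps the old callable alive even if the command
    // redefines itself while it is running.
    CommandPtr keepAlive = it->second;
    return (*keepAlive)(args);
  }

private:
  std::map<std::string, CommandPtr> commands_;
};
```

A command that calls define() on its own name replaces the map entry, but the callable that is currently running stays valid until execute() returns.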

Optimized run

Let’s measure those changes.

Now top heap consumers(1 and 2) are related to std::string operations. Some of them also could be fixed but it requires more effort to get significant improvement. Total bytes allocation decreased from 65MB to 39MB. Number of allocations decreased from 394k to 280k.

For more complex configurations the savings are of course lower, partly because there’s an old parsing routine that allocates a lot and becomes the major memory consumer(I’ve fixed it in the next post). Here are the results for the configuration step of Heaptrack itself and Google Benchmark(total allocated bytes (number of allocations)).

Heaptrack:

before: 305 MB (1308k)

after: 268 MB (1148k)

Google benchmark:

before: 233 MB (1344k)

after: 196 MB (1190k)

Conclusion

I’m not a fan of deep optimizations in a project like CMake where resources are not scarce and actual execution is rare. However, in this case it’s not really an optimization, just avoidance of unneeded copy operations, just the C++ way of doing things.

Allowing CMake functions to return(value)

2020-08-09T10:55:09+00:00

Introduction

It’s a story of implementing CMake feature that I call command reference (similar to existing variable reference), i.e., using result of command invocation as an argument. Having this idea for a long time I never had enough time to dig into it. Now, being unemployed I decided at least to try it before looking for a next job. It was not as easy as I expected but I’m pretty satisfied with the result.

It consists of two parts:

First part contains motivation, design and results.

Second part explains some implementation details, such as why new lexer and parser is needed.

Motivation

For most of my career I used Visual Studio, and when I switched to Linux I was slightly shocked. Compared to MSVS, makefiles felt like bows and arrows against a machine gun. Then I discovered CMake and it felt much better: instead of cryptic makefiles we got a distinct language with commands and variables. And since it’s just another language, the same rules apply to its code: meaningful names, small functions, separation of abstractions, etc. Unfortunately, many CMake files look like one very big function that mixes everything in it. CMake allows us to handle almost all those things right except one: it doesn’t have return values, which limits the usefulness of function abstraction. As a result, some parts of your CMakeLists look bad.

Let’s look at some examples.

if(${CMAKE_CURRENT_LIST_DIR} STREQUAL ${CMAKE_SOURCE_DIR}) # is top level list?
if(WIN32) # surprisingly short name comparing to other CMake vars
if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
if((CMAKE_CXX_COMPILER_ID STREQUAL "Clang") OR (CMAKE_CXX_COMPILER_ID STREQUAL "AppleClang"))
if(CMAKE_SIZEOF_VOID_P EQUAL 8) # my favorite one, x64 check

The problem with such code is that it doesn’t express logic, only implementation details. It requires you to remember all those long and tricky variable names, magic values and relation between them.

We can move that into a function but it doesn’t solve the problem. We are lazy, nobody wants to write two lines instead of one:

check_is_top_level_list(is_top_level_list)
if(is_top_level_list)...
# is it better than one-liner?
if(${CMAKE_CURRENT_LIST_DIR} STREQUAL ${CMAKE_SOURCE_DIR})

Almost all feature tests become:

check_feature_available(RESULT is_feature_available)
if(is_feature_available)

It’s better than direct values manipulation but it’s ugly. Now you need to check the documentation for the name of this output argument; intuitive candidates are RESULT, RESULT_VAR, OUTPUT, or just the absence of such an argument at all: check_feature_available(is_feature_available)(or check_feature_available() with FATAL_ERROR). It also forces you to think about names for variables, many of which are used only once. In any popular language we can write all of the above in a clear manner:

if(IsTopLevelProject()){}
if(IsWindowsBuild()){}
if(IsClang() || IsGcc()){}
if(IsX64Build()){}
if(IsFeatureAvailable()){}

Why should I know how those checks are performed?

Again, you can handle all that stuff, CMake has been successfully used for years. But why not make it simpler? CMake is a big part of C++ so why shouldn’t we make it easier for new people as we do with C++ itself?

Design

Why f(h()) isn’t possible

Initial idea was to allow something like get_name_by_id(get_id()) but it quickly turned out to be wrong because of two reasons:

CMake syntax is too simple, it doesn’t have keywords, command names are not restricted, anything is a string(including parens). For example expression if(x AND (y OR z)) means call if_impl("x", "AND", "(", "y", "OR", "z", ")"); where AND, OR and parens are just plain strings that are handled in a specific way by if_impl. The only requirement here is that parens should match, e.g. if(x AND (y)))) isn’t allowed. Because of that this is ambiguous:

function(AND a b c)
endfunction()

if(x AND (y OR z)) # if_impl(x, AND, (, y, OR, z, )) or
                   # if_impl(x, AND_impl(y, OR, z)) ?

You can extend this case to named arguments which, unlike bool operations, can have non-trivial names.

But the main reason is that the above form isn’t flexible enough. How to use it within the quoted argument or to mix it with plain strings?

function(h)
endfunction()

f(a_h() "b h()_c") # not possible

Meet the command reference

Syntax mimics variable reference: ${command_name( args... )}(notice, there are no spaces before the command name and after the final paren). It works just like you expect:

function(get_name)
  return("Alex")
endfunction()

message(${get_name()}) # prints "Alex"

More generally:

function(f)
  return(return-value-expr)
endfunction()

use_f(${f()})
# is equal to
set(__ret_var_name return-value-expr)
use_f(${__ret_var_name})

It can be used wherever variable reference can. Comments, nested calls and lists are also allowed, let’s mix it all together:

function(format_name first last)
  return("First: ${first}, last: ${last}")
endfunction()

function(get_first_name)
  return("John") # return quoted
endfunction()

function(get_last_name)
  return(Doe) # return unquoted
endfunction()

function(get_first_and_last)
  return([[John]] Doe) # return list
endfunction()

message(
  ${format_name( # pass separate args
    ${get_first_name()} # comments
    ${get_last_name()} #[[ inside command reference ]]
  )}
) # First: John, last: Doe

message(
  ${format_name( # pass as a list, expands in two arguments
    ${get_first_and_last()}
  )}
) # First: John, last: Doe

# return() becomes a function that returns its arguments
message(${CMAKE_${return("VERSION")}}) # 3.18.1-...

Downloads

The current implementation is based on the CMake 3.18.1 release. You can build it from sources or use pre-built binaries:

cmake-3.18.1-win64-x64.zip

cmake-3.18.1-Linux-x86_64.tar.gz

Warning

Some CMake features or policies, especially related to syntax or variable expansion, might not work. One such policy I’m aware of is OLD part of CMP0053. Syntax related error messages are also slightly different. All other things should work, I’ve successfully built Google Test, Google Benchmark and fmt, using it. I’m not a CMake developer, integration is quite dirty in some places so don’t expect it to be production ready right away.

Part 2. Implementation details

At the beginning I naively supposed that if CMake can already parse a single command invocation, it would be enough just to call that function recursively on every argument :) But it turned out to be way more complex and required a completely new lexer and parser. I’ve created it using Flex&Bison, you can find a separate project that does parsing and pseudo-evaluation here.

Existing CMake parser and why it’s not enough

The current implementation is relatively simple(but not its code). It consists of a Flex-based scanner and a hand-written parser. The scanner detects separated arguments and their kinds; it’s easy since we know how each argument starts and ends. The current parser mostly verifies basic syntax rules like valid separations, parens matching etc. For example command(a "${b}") is parsed as call("command").with_args(unquoted_arg{"a"}, quoted_arg{"${b}"}). Notice that the variable reference ${b} is passed as plain text. During command execution each argument is parsed again with another parser that can detect, verify and evaluate variable references. If such a command appears in a loop, this additional parsing is done on every iteration. Also, if you make a mistake inside a reference, it won’t be detected until the expression is evaluated:

if(${ALWAYS_TRUE_IN_YOUR_ENV}) # no errors or warnings on your machine
  message("hello world")
else() # syntax error at run-time on another machine
  message(${@:-:@})
endif()

Now, when we allow another command appear inside argument, argument separation is not so easy:

command("result: ${get_result("a" b)}")

You can see that the highlighter marks “a” in black because it thinks that the arguments are "result: ${get_result(", a, " b)}". The existing CMake parser sees it in the same way. To separate arguments correctly we have to be able to parse recursively when we meet a command reference.

In terms of BNF existing syntax looks like(simplified):

command_invocation ::= identifier '(' argument* ')'

with command reference we got:

command_invocation ::= identifier '(' (argument | command_invocation)* ')'

with only difference that command reference might appear inside argument, not only as a separate one.

As you can see, now we need to parse much deeper than the existing parser does, so there’s no sense in trying to extend it. Writing a parser for recursive rules by hand is also not trivial, so I had no choice but to write both the scanner and the parser from scratch. Flex and Bison were chosen because they’re already used in CMake.

BNF for a new syntax

Let’s slightly update the official BNF according to the new syntax:

command_invocation ::= identifier space* '(' arguments ')'
quoted_argument    ::= '"' (quoted_element | reference)* '"'
unquoted_argument  ::= (unquoted_element | reference)+
reference          ::= var_reference | command_reference
var_reference      ::= var_ref_open (variable_name | reference)* ref_close
command_reference  ::= cmd_ref_open command_invocation ref_close
var_ref_open       ::= "${" | "$ENV{" | "$CACHE{"
cmd_ref_open       ::= "${"
ref_close          ::= "}"
quoted_element     ::=
unquoted_element   ::=
variable_name      ::=

Unlike existing implementation, I want to avoid parsing during execution and get all details in one pass. Now, each quoted/unquoted argument consists of string(quoted/unquoted_element+) and reference. To get its real value at run-time we need to evaluate and concatenate all its parts. For example, a_${b}_c has 3 elements: string("a_"), var_ref("b"), string("_c"). At run-time we get the value of b and concatenate them together: a_B_VALUE_c.

Expression representation and evaluation

Here’s brief overview of key expressions:

call expression is a list of arguments.

quoted/unquoted argument expression is a list of strings and references.

variable reference expression is a list of strings and references

command reference expression is similar to call expression.

Now we need a good representation that can store and evaluate such expressions efficiently.

AST

First approach was to use classic Interpreter pattern and compose expressions into a tree. Since each expression is list-like we can represent them all as a std::vector>. It works but even simple command becomes quite involved, command(a b) is represented roughly with

vector{ // vector of arguments
  "command", // command name
  vector{ // each argument is a vector itself
    "a"
  },
  vector{
    "b"
  }
}

Things got worse when we add reference, command(a ${b}_c):

vector{
  "command",
  vector{
    "a"
  },
  vector{
    vector{ // reference is also a vector
      "b"
    },
    "_c"
  }
}

Too many vectors %)

RPN

Reverse Polish(or postfix) Notation is a notation where arguments come before the operator. It shines when you need to represent a “linear” expression without branches; it also doesn’t need parens to express precedence:

Normal(infix) notation: a + b
RPN: a b +

Normal: (a + b) * c
RPN: a b + c *
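For plain arithmetic, evaluating such an expression takes a single stack. The sketch below is only an illustration (evalRpn is a made-up name and it supports just + and * on integers): every operand is pushed, and every operator pops its two arguments and pushes the result back.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Evaluates an integer RPN expression with the binary operators + and *.
int evalRpn(const std::vector<std::string>& tokens) {
  std::vector<int> stack;
  for (const std::string& token : tokens) {
    if (token == "+" || token == "*") {
      assert(stack.size() >= 2 && "an operator needs two operands");
      const int rhs = stack.back();
      stack.pop_back();
      const int lhs = stack.back();
      stack.pop_back();
      stack.push_back(token == "+" ? lhs + rhs : lhs * rhs);
    } else {
      stack.push_back(std::stoi(token)); // operand
    }
  }
  assert(stack.size() == 1 && "a well-formed expression leaves one value");
  return stack.back();
}
```

For example, the tokens 1 2 + 3 * stand for (1 + 2) * 3 and no parens are needed to get the precedence right.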

Now command(a ${b}_c) is represented with:

vector{
  StringExpr{"command"}, // command name
  StringExpr{"a"},
  UnquotedArg{1}, // 1 means number of subexpressions to concat
  StringExpr{"b"},
  VarRefExpr{1}, // same for VarRefExpr
  StringExpr{"_c"},
  UnquotedArg{2}, // the reference and "_c" are concatenated
  CallExpr{3} // 3 means number of arguments including name
}

One vector instead of four with AST approach, regardless how complex expression is, win :)

It also fits nicely a bottom-up parser like Bison because of the order in which symbols are discovered. In example above, Bison will discover symbols exactly in their order in that vector, you can just push expressions without any knowledge about previous symbols or other context.

Evaluation

RPN is evaluated using stack. Each expression knows its arity (number of arguments), it pops them from stack and pushes back the result. But there’s a little problem here. CMake expands list strings into multiple arguments:

set(my_list a;b;c)
command(${my_list}) # called with 3 args: a, b, c

It means that if our CallExpr has arity = 1, at run-time it might become any number including zero. Classical RPN evaluation doesn’t work here. To overcome this we need to adjust definition of arity: now arity means number of expressions whose results should be taken as arguments. And we need additional stack to track this results count. Consider RPN representation of the above example:

{
  StringExpr{"command"},
  StringExpr{"my_list"},
  VarRefExpr{1},
  UnquotedArgExpr{1},
  CallExpr{2}
}

Take a look at both stacks before CallExpr evaluation for two cases:

my_list expands into 3 arguments
results: {"command", "a", "b", "c"} results_count: {1, 3}

CallExpr arity is 2, thus the actual arity is the sum of the last two elements in the results_count stack, and that is the final number of its arguments: 1 + 3 = 4.

my_list expands into 0 arguments
results: {"command"} results_count: {1, 0}

Here, actual arity is 1 + 0 = 1.
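The two-stack evaluation can be sketched as follows. This is a simplified illustration, not the code from my implementation: the Expr and Invocation types, evaluate() and splitList() are made up for the example, only unquoted arguments are handled, escaping is ignored, and the input is assumed to be well-formed.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class Kind { String, VarRef, UnquotedArg, Call };

struct Expr {
  Kind kind;
  std::string text;  // used by String only
  std::size_t arity; // number of preceding expressions to consume
};

struct Invocation {
  std::string name;
  std::vector<std::string> args;
};

// Splits "a;b;c" into {"a", "b", "c"}; an empty string yields no items.
std::vector<std::string> splitList(const std::string& value) {
  std::vector<std::string> items;
  std::string current;
  for (char c : value) {
    if (c != ';') { current += c; continue; }
    if (!current.empty()) items.push_back(current);
    current.clear();
  }
  if (!current.empty()) items.push_back(current);
  return items;
}

Invocation evaluate(const std::vector<Expr>& rpn,
                    const std::map<std::string, std::string>& vars) {
  std::vector<std::string> results;
  std::vector<std::size_t> counts; // results produced by each expression

  // Pops `arity` counts and returns the results they cover.
  auto take = [&](std::size_t arity) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < arity; ++i) {
      n += counts.back();
      counts.pop_back();
    }
    const auto first = results.end() - static_cast<std::ptrdiff_t>(n);
    std::vector<std::string> taken(first, results.end());
    results.erase(first, results.end());
    return taken;
  };
  auto join = [](const std::vector<std::string>& parts) {
    std::string joined;
    for (const std::string& part : parts) joined += part;
    return joined;
  };

  Invocation call;
  for (const Expr& expr : rpn) {
    switch (expr.kind) {
      case Kind::String:
        results.push_back(expr.text);
        counts.push_back(1);
        break;
      case Kind::VarRef: {
        const auto it = vars.find(join(take(expr.arity)));
        results.push_back(it == vars.end() ? std::string() : it->second);
        counts.push_back(1);
        break;
      }
      case Kind::UnquotedArg: {
        std::vector<std::string> items = splitList(join(take(expr.arity)));
        counts.push_back(items.size()); // may be 0, 1 or more
        for (std::string& item : items) results.push_back(std::move(item));
        break;
      }
      case Kind::Call: {
        const std::vector<std::string> all = take(expr.arity);
        assert(!all.empty() && "a call always has a command name");
        call.name = all.front();
        call.args.assign(all.begin() + 1, all.end());
        break;
      }
    }
  }
  return call;
}
```

Feeding it the five expressions from the my_list example gives three arguments when my_list is a;b;c and none when it is unset, matching the two cases above.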

Another small benefits of this implementation

Easy to change

Writing syntax rules in Bison makes them much easier to change, understand, review and support than a hand-written parser.

Symbol locations

Bison makes symbol locations tracking almost automatic. With simple action you only need to track lines manually.

Error messages

Bison’s out-of-the-box error messages are pretty good:

f(${@})
# 1.5 : syntax error, unexpected invalid token, expecting command name or
# reference opening or reference closing or variable name

BOM and line breaks handling

CMake supports BOM header but only UTF-8 is allowed. Instead of reading it by hand we can handle it easily with another rule in parser.

CMake converts all \r\n into \n during file reading by replacing Flex’s input routine. Honestly, I can’t fully understand that code. Supposedly it just replaces \r\n with \n and memcpy()s the rest, but I wanted something better. In many places we can just use \r?\n regexp endings in scanner rules. In theory it’s possible that a string literal might contain \r\r\n which should become \r\n(I’m talking about raw bytes 0x0D 0x0A, not escapes). To handle this I remove the trailing \r (if any) on the fly in the scanner when \n is met in a string literal. Since rules are written to take input line-by-line it doesn’t involve much overhead. These simple solutions eliminate custom reading routines and tons of memcpy() calls.
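The rule for string literals can be sketched in isolation. The function below is only an illustration of the idea (normalizeLineEndings is a made-up name, the real handling lives inside scanner actions): when \n is met and the previous byte is \r, that single \r is dropped.

```cpp
#include <string>

// Converts "\r\n" to "\n" by dropping one '\r' right before each '\n'.
// A lone '\r' that is not followed by '\n' is kept as is.
std::string normalizeLineEndings(const std::string& input) {
  std::string output;
  output.reserve(input.size());
  for (char c : input) {
    if (c == '\n' && !output.empty() && output.back() == '\r') {
      output.back() = '\n'; // overwrite the trailing '\r'
    } else {
      output += c;
    }
  }
  return output;
}
```

With this rule the bytes \r\r\n turn into \r\n, as described above, because only the \r directly before \n is removed.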

Afternotes

It’s not an official CMake feature of course. If you like it, let me or CMake devs know to increase chances of having it in future CMake versions.

Looking for a good C++ highlighter for your Jekyll blog? Try Prism.js

2020-08-09T10:48:07+00:00

What a nice title for the first blog post :) Almost as good as “How to start your blog”. Anyway, I faced this problem and want to share the solution with others.

Starting a blog with GitHub Pages and Jekyll is easy, however I found that built-in highlighter(Rouge) has very poor C++ support. You can see how it looks in the nice post by Lewis Baker C++ Coroutines: Understanding the promise type. At most it makes some parts bold and others dark blue(which looks almost black for me). Thus, I looked for another option. There were two candidates: highlight.js and Prism.js. Both do pretty decent job but latter looks slightly better for me.

Here’s how definition of my_promise_type looks now:

struct my_promise_type
{
  void* operator new(std::size_t size)
  {
    void* ptr = my_custom_allocate(size);
    if (!ptr) throw std::bad_alloc{};
    return ptr;
  }

  void operator delete(void* ptr, std::size_t size)
  {
    my_custom_free(ptr, size);
  }

  ...
};

Installation

Disable built-in highlighter in your _config.yml:
kramdown:
  syntax_highlighter_opts:
    disable : true

Take links to prism.js scripts and themes from cdnjs. You need prism.min.css, prism-line-numbers.min.css, prism-core.min.js, prism-autoloader.min.js(it will download highlighters for languages used on your page on-the-fly), prism-line-numbers.min.js. You can omit prism-line-numbers.* if line numbers aren’t required.

Add each theme to the <head> section of your blog with a <link> tag. Add each script to the bottom of <body> with a <script> tag. You can read how to edit and where to find those files here.

That’s it. If, like me, you are a little crazy about the don’t pay for what you don’t use principle, you can set it up only for posts. For this you need to apply the above steps to the post layout. Check my blog repository for an example.

The final tip

The C++ language name in prism.js is cpp, not c++; thus, you should use ```cpp code blocks, not ```c++.
