Introduction

Recently, I’ve discovered an interesting topic: clang-tidy-based tools. The idea is that you get an AST representing all the details of your C++ code, what you can do with it is limited mostly by your imagination: detect bugs, calculate some code metrics, refactor, etc. You can take your old legacy codebase and convert it into a modern one. This idea of making changes at scale really fascinates me. To learn something you have to use it in a real-world. As I’ve recently made several contributions to CMake codebase, the idea quickly popped up in my mind for a refactoring tool. CMake has one strange part in its coding conventions which I’ve never seen in any other C++ codebase: explicit this usage. That is, like in JS or Python:

class Widget{
    void Increment(){
        this->x++;
    }
    int x;
};

Because there’s no way to check it automatically, there are places where it’s not respected. I don’t know how this style was adopted, maybe it’s just some old artifact that is hard to get rid of by hand. So, I decided to write a tool that can do the both:

  • add explicit this if it’s missed
  • remove explicit this wherever possible

Workflow outline

Writing this clang-tidy tool involves several steps:

  • understand what C++ code you want to fix
  • find out how to detect it using Clang AST API
  • fix it

I will describe explicit/implicit parts separately after describing common things. I used clang-tidy-standalone as a base for this tool, you can build it without build LLVM itself, more information in my previous article.
Notice that this is not a complete Clast AST tutorial, you can find more information in the official documentation , various youtube talks, and clang-tidy sources.

Templates handling

Since template is only an outline for generated code, they are represented differently from non-templated code. In non-templated code all the types, variables and functions are known and checked. In template it’s not always possible before the actual instantiation. As a result, they are represented with different AST nodes. You have a choice: deal with template definition which contains unknown or type-dependent entities, or deal with instantiations where everything is known. You should know whether your check can produce different results for different template instantiations. Thankfully, that’s not the case for this tool, so I will deal only with instantiations. This in turn requires that all of your templates are used somewhere in a project or in a test suite so they are actually instantiated.

Macros handling

Since macros are just text replacements, they can have very different meaning in different contexts. Final AST represents code after preprocessing, so your tool can detect things that were composed from macro expansions. In most cases you can just skip such code, macro usage should be rare nowadays. But there’s at least one macro which I want to handle - assert(). It naturally contains things that I want to fix, for example:

void Widget::Reset(){
    assert(this->ptr);
    *ptr = 0;
}

For this reason, I’ve added simple regex-based macro filter:

  if (ThisLocation.isMacroID()) {
    const auto MacroName =
        Lexer::getImmediateMacroName(ThisLocation, SM, getLangOpts());
    if (!llvm::Regex(AllowedMacroRegexp).match(MacroName)) {
      return false; // skip
    }
  }
  // continue...

Another example of widely used macro is various loggers, my final macro-filter for CMake codebase looks like this: ^(assert|cm.*Log|cm.*Logger)$. Keep in mind that we only can handle things that are present after preprocessing, eliminated #ifdef blocks wouldn’t be there, so run your tool on various configurations.

Enforce explicit this

Target C++ code

Let’s start with the easier case, enforcing explicit this. Here’s our test case:

class Widget{
    void Do(){
        DoConst();      // should become this->DoConst();
        x++;            // should become this->x++;
    }
    void DoConst() const{}
    int x{};
};

That is, every access to member should become explicit. Since we’re dealing with already valid code and adding explicit this wouldn’t change its meaning, there’s nothing more to consider, we just need to find such places, check whether they have explicit this or not, and add it if missed.

Detecting it with Clang AST API

clang-query is a useful tool to examine generated AST, I left only important parts:

clang-check-10 --ast-dump example.cpp --

  |-CXXMethodDecl 0x1ef3a88 <line:2:5, line:5:5> line:2:10 Do 'void ()'
  | `-CompoundStmt 0x1ef3e18 <col:14, line:5:5>
  |   |-CXXMemberCallExpr 0x1ef3d88 <line:3:9, col:17> 'void'
  |   | `-MemberExpr 0x1ef3d58 <col:9> '<bound member function type>' ->DoConst 0x1ef3ba8
  |   |   `-ImplicitCastExpr 0x1ef3da8 <col:9> 'const Widget *' <NoOp>
  |   |     `-CXXThisExpr 0x1ef3d48 <col:9> 'Widget *' implicit this
  |   `-UnaryOperator 0x1ef3e00 <line:4:9, col:10> 'int' postfix '++'
  |     `-MemberExpr 0x1ef3dd0 <col:9> 'int' lvalue ->x 0x1ef3c60
  |       `-CXXThisExpr 0x1ef3dc0 <col:9> 'Widget *' implicit this

You can see that our target parts are represented as MemberExpr and CXXThisExpr with optional ImplicitCastExpr. Cast is there because we’re calling const function from non-const one, hence, casting Widget* to const Widget*. AST matcher for it is straightforward:

memberExpr(has(
    ignoringImpCasts(
        cxxThisExpr().bind("thisExpr"))))
.bind("memberExpr")

bind() is needed to get access to the matched node, in our case we need MemberExpr and CXXThisExpr, thus, we bind them to names. In the CXXThisExpr documentation we can see isImplicit() method that does exactly what we need:

void EnforceThisStyleCheck::check(const MatchFinder::MatchResult &Result) {
  const auto ThisExpr = Result.Nodes.getNodeAs<CXXThisExpr>("thisExpr");
  const auto MembExpr = Result.Nodes.getNodeAs<MemberExpr>("memberExpr");
  // ...
  if (ThisExpr->isImplicit()) {
    addExplicitThis(*MembExpr);
  }
}

Fix

Fixing is really simple, clang-tidy has a lot of examples of it, we have to provide hint, location, and text for our fix:

void EnforceThisStyleCheck::addExplicitThis(const MemberExpr &MembExpr) {
  const auto ThisLocation = MembExpr.getBeginLoc();
  diag(ThisLocation, "insert 'this->'")
      << FixItHint::CreateInsertion(ThisLocation, "this->");
}

We use MemberExpr’s location instead of CXXThisExpr’s because in case of qualified names(Base::Method();) CXXThisExpr::getBeginLoc() points to the start of Method, not the start of a namespace.

Enforce implicit this

Target C++ code

This case is a bit harder because in some cases removing explicit this could change the meaning of code due to name lookup rules, in other cases it could result in a compilation error.

Special members

We can’t remove explicit this from a special member functions like destructors or operators:

void Widget::Do(){
    this->~Widget();
    this->operator=(Widget{});
}

This can happen only when member expression refers to a method, not to a variable. Thus, we need to get member declaration, check whether it’s a method, and then check it’s name:

static bool isNonSpecialMember(const MemberExpr &MembExpr) {
  const auto MemberDecl = MembExpr.getMemberDecl();
  assert(MemberDecl);

  const auto MethodDecl = dyn_cast<CXXMethodDecl>(MemberDecl);
  // CXXMethodDecl::getIdentifier() returns nullptr for special members
  return !MethodDecl || MethodDecl->getIdentifier();
}

Name conflicts

Consider this case:

void Widget::Do(int x){
    this->x++;  // increment member
    x++;        // increment argument
}

If we remove explicit this from the expression at line 2, it will increment argument instead of data member. Generally, any visible local name hides class member name during the lookup. Unfortunately, Clang doesn’t have the API to detect such conflicts, so I choose less precise but easier to implement way(thanks to Nicolás Alvarez for this idea):

static bool hasVariableWithName(const CXXMethodDecl &Function,
                                ASTContext &Context, const StringRef Name) {
  const auto Matches =
      match(decl(hasDescendant(varDecl(hasName(Name)))), Function, Context);

  return !Matches.empty();
}

This method enumerates all declared variables(including arguments) in the function, ignoring their visibility. It means that this code will be untouched even if it’s safe:

void Widget::Do(){
    this->x++;  // increment member
    x++;        // still increment member but confusing
    int x;
    x++;        // increment local variable
}

Dependent names

template<typename Base>
class Derived : public Base{
    void Do(){
        this->baseCounter++;    // baseCounter is defined somewhere in Base
    }
};

C++ requires dependent member names to be prepended with explicit this, thus, removing it here will yield a compile-time error. In our case it means is that if name is provided by the base class, explicit this is required. So, removing explicit this from a name is safe when this name is a direct(non-inherited) member of a class:

static bool hasDirectMember(const CXXRecordDecl &Class, ASTContext &Context,
                            const StringRef Name) {
  const auto Matches =
      match(cxxRecordDecl(has(namedDecl(hasName(Name)))), Class, Context);

  return !Matches.empty();
}

Now, we can create our final isRedundantExplicitThis() function:

static bool isRedundantExplicitThis(const MemberExpr &MembExpr,
                                    const CXXMethodDecl &MethodDecl,
                                    ASTContext &Context) {
  return (isNonSpecialMember(MembExpr) &&
          !hasVariableWithName(MethodDecl, Context,
                               MembExpr.getMemberDecl()->getName()) &&
          !isDependentName(MethodDecl, MembExpr, Context));
}

And, because we need access to the corresponding CXXMethodDecl, our final matcher for both cases becomes:

cxxMethodDecl(
    isDefinition(), isUserProvided(),
    forEachDescendant(
        memberExpr(has(ignoringImpCasts(cxxThisExpr().bind("thisExpr"))))
            .bind("memberExpr")))
    .bind("methodDecl")

isUserProvided() is self-explainable, we’re interested only in user-provided functions, not in compiler-generated ones.

Fix

Again, fixing is mostly simple. We have to provide hint and range for removal. Qualified names require special handling because we don’t want to remove namespace part.

void EnforceThisStyleCheck::removeExplicitThis(const SourceManager &SM,
                                               const MemberExpr &MembExpr) {
  const auto ThisStart = MembExpr.getBeginLoc();
  auto ThisEnd = MembExpr.getMemberLoc();
  if (MembExpr.hasQualifier()) {
    ThisEnd = MembExpr.getQualifierLoc().getBeginLoc();
  }

  const auto ThisRange = Lexer::makeFileCharRange(
      CharSourceRange::getCharRange(ThisStart, ThisEnd), SM, getLangOpts());

  diag(ThisStart, "remove 'this->'") << FixItHint::CreateRemoval(ThisRange);
}

Results

Applying to CMake codebase:

  • enforce explicit this: 129 files changed, 4689 insertions
  • enforce implicit this: 406 files changed, 23237 insertions

Full source code is here.
CMake branch with explicit this is here.
CMake branch with implicit this is here.

I’m pretty satisfied with the result. The whole tool takes <170 lines of code. Hope that in future there will be more good tutorials to make this framework more available to more people.