Saturday, July 23, 2011

Could Clang displace GCC among PostgreSQL developers? Part I: Intro and compile times

Back in February, I attended FOSDEM, the free and open source software developers' European meeting. Most of my time was spent manning the PostgreSQL stand and networking with fellow members of the community; I often find the “hall track” of conferences is of most interest.

On this occasion though, there was one talk in particular that I really wanted to see: Chris Lattner's “LLVM and Clang: Advancing Compiler Technology”. I was certainly not alone in having the subject pique my interest, as it filled what was apparently Belgium's largest auditorium to capacity, and I was joined by a number of other PostgreSQL people, including some senior community members. Chris' slides are available from here:

http://www.scribd.com/doc/48921683/LLVM-Clang-Advancing-Compiler-Technology



For those of you who haven't heard of Clang, it is a compiler front-end for C, C++ and Objective-C that uses the Low Level Virtual Machine (LLVM) as its back-end. It aims for compatibility with GCC wherever reasonable, so that it can be used as a drop-in replacement in many cases. In the last couple of years of development, Clang has reached various milestones, including:
  • Building a working Linux Kernel and FreeBSD Kernel
  • Becoming self-hosting (i.e. capable of building itself)
  • Building the Boost C++ libraries, which could be considered a litmus test for C++ standards conformance.
Better Clang support is coming as of PostgreSQL 9.1, as noted in the release notes, and I think Clang shows a lot of promise as an alternative to GCC. Clang may ultimately displace GCC for GCC’s most important use-cases, but in the meantime it offers some immediate benefits to at least some kinds of compiler users, such as developers.

Why might you want to replace GCC as your day-to-day compiler when hacking on Postgres, you ask? To my mind, the main advantage is that Clang’s diagnostic messages are just great - in fact, the entire user experience is better than GCC's. Arcane error messages are of much greater concern to C++ programmers, and that is where the difference is most dramatic, but there are still plenty of cases where Clang's diagnostics shine that C programmers will appreciate.

If the subject of the talk particularly interested me as a PostgreSQL contributor, the talk itself really hooked me, particularly because PostgreSQL was unexpectedly cited by name as a medium-sized C program that Clang built in significantly less time than GCC with the same set of flags.

I tried to recreate Chris’s impressive compile times, but was unsuccessful. I’m not sure why that may be, but Chris is welcome to comment below. In any case, I’m not hugely concerned about compile times; that’s not the real attraction for me, as long as Clang at least holds its own, as it now does thanks to the Clang community's response to a problem that I reported.

I discovered an issue that caused the compilation of a number of PostgreSQL grammar translation units (such as gram.c) to be significantly slower with Clang than with GCC. After I submitted a testcase, the Clang developers produced a fix that brought the compile time for gram.c down from over 80 seconds to a little over one second on my system - approximately the same time as with GCC. With this fix, the total compile time on my system was brought down to a number approaching GCC’s time. On my laptop, Clang tip’s time was now 2 minutes and 13 seconds, to GCC 4.6’s 2 minutes and 10 seconds.

I asked Dave Page to have a go on his MacBook at the recent Char(11) conference, and he reported better times there; not surprising, considering that Clang principally targets that platform due to Apple’s stewardship of the LLVM/Clang project. Apparently, there is a known issue that causes linking to occur a little slower on Linux. On Dave’s rather powerful MacBook, Clang builds Postgres in marginally less time than Apple’s fork of GCC 4.2, though the difference of about a second that he saw could be explained entirely by noise. Dave timed two runs of both, the first of which lasted 1m16.410s vs 1m17.834s in Clang's favour and the second of which was about the same.

I took it upon myself to complain loudly on the Clang developers' mailing list about what I described as an over-zealous warning that occurred when building certain parts of Postgres. The warning was seen whenever Clang statically detected code that appeared to assign past the end of a one-element array at the end of a struct, because an int rvalue of 1 or greater was used to index the array, when in fact a popular idiom was being used. I successfully argued that the warning was invalid because this is such a common pattern, and it will no longer be seen once the fix is committed, provided the indexing occurs under circumstances that are exactly consistent with the use of the idiom. A patch is in the works. Progress! I suppose that it makes sense for a compiler project like Clang to initially err on the side of being overly conservative about warnings, and then roll back some of them in response to feedback from ordinary end-users.

As I’ve said, diagnostics (compiler errors and warnings) are where Clang really shines. One feature that I think is particularly relevant to Postgres is “typedef preservation and selective unwrapping”. Consider this:

typedef struct RelationData *Relation;


This is a common idiom within the Postgres source code. This is a declaration of a typedef for a RelationData pointer, named “Relation”. In fact, this particular typedef will invariably be used by relcache client code throughout the PostgreSQL source – the RelationData struct itself will not be used.

In my personal opinion, these typedefs are not particularly helpful, and may even be harmful - they are leaky abstractions, in that they don’t usefully abstract away the fact that the underlying type is a pointer to a RelationData. Some might say that they save the programmer keystrokes, but I just think that they obfuscate things. This issue was recently raised on a thread on pgsql-hackers, where concern was expressed about how this idiom interacted with constness, potentially giving a false sense of security. This declaration:

const Relation foo;


is actually a declaration of "struct RelationData * const foo" - a const pointer to a RelationData, as opposed to a pointer to a const RelationData. Only the pointer itself is const, which is perhaps counter-intuitive.

Let’s take a look at compiler diagnostics given by both GCC and Clang when we incorrectly treat Relation as a struct that is passed by value, which in many places it appears to be at first blush. We’re assigning to a member of that struct:

With GCC:
relcache.c: In function ‘RelationDestroyRelation’:
relcache.c:1756:10: error: request for member ‘BackendId’ in something not a structure or union


With Clang:
relcache.c:1756:10: error: member reference type 'Relation' (aka 'struct RelationData *') is a pointer; maybe you meant to use '->'?
      relation.BackendId = 5;
      ~~~~~~~~^

Interestingly, Clang has heuristics for when it should and shouldn’t expand typedefs like this, and they correspond fairly well to whether they will be useful or unhelpful to programmers in practice. There’s a real focus on the programmer’s user-experience, and I think that’s just great.

Clang usefully produces warnings and errors across different phases of translation. It is particularly adept at giving sound diagnostics for deeply nested macros. Its automatic Macro Expansion provides for warnings and errors that intelligently expand potentially heavily nested macros, and indicate the exact column that is of interest. Here's a contrived example:


relcache.c:1759:2: warning: incompatible pointer to integer conversion passing 'void *' to parameter of type 'Size' (aka 'unsigned long')
        palloc((void*) 0);
        ^~~~~~~~~~~~~~~~~
../../../../src/include/utils/palloc.h:52:61: note: expanded from:
#define palloc(sz)      MemoryContextAlloc(CurrentMemoryContext, (sz))


In general, there are lots of small things that add up pretty quickly.

In my next blog post, PostgreSQL performance expert and 2ndQuadrant colleague Greg Smith and I take a look at the performance of Clang-built binaries, with some interesting results.

8 comments:

  1. Very good one Peter. Worth waiting for it since yesterday ;)

  2. Thanks Greg. Unfortunately, by the time moderators approved my blog, a couple of days had passed, and so I'm going to slide off postgresql.org earlier than I should. Oh well.

  3. Peter how about tests on Windows environment?

  4. @pgolub,

    If you're talking about benchmarking performance of the compiler or the binaries it generates on Windows, that isn't a priority for us right now, since the vast majority of large Postgres installations are based on Linux or something else that's Unix-like. Also, there are plenty of good reasons to prefer MSVC to Clang or GCC on Windows at the moment, so that isn't really an interesting case to my mind.

    I believe that a big change is a real possibility on Unix-like platforms.

  5. Interesting. I see Clang mentioned on the FreeBSD lists as they work on improving it, but this is the first time I've seen it mentioned from an app developer's point of view. Good to see that you're testing not only how quickly it compiles, but how the resulting code performs too.

  6. @hmallett

    Yes, Clang is now an integral part of FreeBSD (i.e. you don't have to get it from ports). I think that the licensing issues surrounding GCC have something to do with their apparent intent to move towards Clang.

  7. Very much looking forward to your performance test results. Thanks for the hard work Peter.

  8. Peter, to be honest the method with the [] to find out size or content of a structure is quite a hack. So no wonder clang developers were having issues with it.

    Another thing that clang shows very well, is that c++ isn't slow. That myth is long gone. Someone should maybe rewrite postgresql in c++, to increase clarity and sustainability :)
