A Closer Look at LLVM

LLVM (originally an acronym for Low Level Virtual Machine) is a set of open-source compiler technologies, combined with related toolchain libraries, that has become a widely used alternative to GCC (GNU Compiler Collection).

LLVM started life as a research project at the University of Illinois, where it was used primarily to explore compilation techniques for both dynamic and static programming languages. It is written in C++, and languages with compilers that rely on LLVM include ActionScript, Ada, C#, Common Lisp, Crystal, D, Delphi, Fortran, OpenGL Shading Language, Halide, Haskell, Java bytecode, Julia, Lua, Objective-C, Pony, Python, R, Ruby, Rust, Scala and Swift.

LLVM is useful when it comes to developing compiler front and back ends; for example, it has become an integral development tool for the Sony PlayStation 4. In addition to Sony, LLVM has found its way into commercial products from Apple, Adobe, Google, Intel, and many others (every commercial implementation of OpenCL from AMD, Apple, Intel and NVIDIA is based on LLVM). Xcode, the Apple IDE for Mac and iOS development, replaced GCC with LLVM.

On the academic front, researchers use LLVM for projects such as building link-time inter-procedural optimizers, just-in-time compilers, secure browser extensions, language virtual machines, static analysis tools, automatic vectorization, GPU programming, software verification, hardware synthesis tools, embedded code generators, and numerous language implementations.

LLVM differs from traditional compilers, which take source code as input and compile it to object code suitable for linking into an executable. LLVM is primarily a compiler back end: its input is an intermediate representation (IR) modeled on a RISC-like instruction set, which it optimizes and can then compile to machine code. LLVM IR is strongly typed, and comes with a good deal of documentation.

What’s the Intermediate Representation Like?

On my Ubuntu box I installed LLVM and Clang (we’ll come to Clang shortly). The following command installs both on Ubuntu with a 247 MB download:

sudo apt install llvm clang

Here’s a simple C function to sum up 1..n, where n is the value passed in:

int sum(int n) {
    int i=0;
    int total=0;
    for (i=1;i<=n;i++) {
        total += i;
    }
    return total;
}

Compiling it with the following command emits LLVM IR into a .ll file; the result reads much like assembly language:

clang -Os -S -emit-llvm test.c -o test.ll

Here’s the output:

define i32 @sum(i32 %n) #0 {
  %1 = icmp slt i32 %n, 1
  br i1 %1, label %._crit_edge, label %.lr.ph.preheader

.lr.ph.preheader:                                 ; preds = %0
  %2 = shl i32 %n, 1
  %3 = add i32 %n, -1
  %4 = zext i32 %3 to i33
  %5 = add i32 %n, -2
  %6 = zext i32 %5 to i33
  %7 = mul i33 %4, %6
  %8 = lshr i33 %7, 1
  %9 = trunc i33 %8 to i32
  %10 = add i32 %2, %9
  %11 = add i32 %10, -1
  br label %._crit_edge

._crit_edge:                                      ; preds = %.lr.ph.preheader, %0
  %total.0.lcssa = phi i32 [ 0, %0 ], [ %11, %.lr.ph.preheader ]
  ret i32 %total.0.lcssa
}

We’ve used Clang to compile the C into the intermediate representation, which is the input to LLVM. Clang is an open-source compiler front end for C, C++, Objective-C, and Objective-C++, with support for the OpenMP, OpenCL, and CUDA frameworks. It is written in C++ and can compile up to C++14, along with some of C++17.
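One detail worth noticing in the IR above is that it contains no loop: at -Os the optimizer has replaced the for loop with a closed-form expression, 2n + (n-1)(n-2)/2 - 1, which simplifies to n(n+1)/2. The following is a minimal sketch that mirrors those IR instructions in C so the two versions can be compared. The function names sum_loop and sum_closed_form are hypothetical, and a 64-bit integer stands in for the i33 type as an assumption of this sketch (only the low 33 bits matter for the result).

```c
#include <stdint.h>

/* The loop from the article, kept as the reference implementation. */
int sum_loop(int n) {
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}

/* Hand-written mirror of the optimized IR (hypothetical helper).
   uint64_t stands in for the i33 type used by the IR. */
int sum_closed_form(int n) {
    if (n < 1) {
        return 0;                              /* %1: branch straight to ._crit_edge */
    }
    uint32_t un = (uint32_t)n;
    uint64_t a = (uint64_t)(un - 1u);          /* %3, %4: zext(n - 1) */
    uint64_t b = (uint64_t)(un - 2u);          /* %5, %6: zext(n - 2) */
    uint32_t half = (uint32_t)((a * b) >> 1);  /* %7, %8, %9: mul, lshr, trunc */
    return (int)((un << 1) + half - 1u);       /* %2, %10, %11 */
}
```

For example, sum_closed_form(10) evaluates to 20 + 36 - 1 = 55, the same value the loop produces.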

Backend

Since LLVM’s job is to compile .ll code into machine code, let’s see what the llc compiler produces with the command below. To generate 32-bit instead of 64-bit code, change -march=x86-64 to -march=x86:

llc -O3 test.ll -march=x86-64 -o test.s

The resulting 25-line file contains the 64-bit assembly:

        .text
        .file   "test.ll"
        .globl  sum
        .type   sum,@function
sum:                                    # @sum
        .cfi_startproc
# BB#0:
        xorl    %eax, %eax
        testl   %edi, %edi
        jle     .LBB0_2
# BB#1:                                 # %.lr.ph.preheader
        leal    -1(%rdi), %eax
        leal    -2(%rdi), %ecx
        imulq   %rax, %rcx
        shrq    %rcx
        leal    -1(%rcx,%rdi,2), %eax
.LBB0_2:                                # %._crit_edge
        retq
.Lfunc_end0:
        .size   sum, .Lfunc_end0-sum
        .cfi_endproc

        .ident  "clang version 3.8.0-2ubuntu3 (tags/RELEASE_380/final)"
        .section        ".note.GNU-stack","",@progbits

Using lli

Besides compiling intermediate code to assembly, you can execute it directly with the lli tool. To do this, you’ll need to add a main() function, along with the usual includes, to turn the function into a complete program:

#include <stdio.h>
#include <stdlib.h>

/* Sum the integers 1..n */
int sum(int n) {
    int i=0;
    int total=0;
    for (i=1;i<=n;i++) {
        total += i;
    }
    return total;
}

int main(int argc,char ** argv) {
    int total = sum(10);
    printf("%d\n",total);
    return 0;
}

Compile with Clang as before; then you can run it with lli:

lli test.ll

The output is as you’d expect: 55.
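To try values other than 10 without recompiling, the program could take n from the command line. The sketch below is one way to do that; parse_count is a hypothetical helper, and the cap of 65535 is an assumption chosen so that the total still fits in a 32-bit int.

```c
#include <errno.h>
#include <stdlib.h>

/* Same loop as in the article. */
int sum(int n) {
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}

/* Hypothetical helper: parse a decimal count from text, returning
   `fallback` when the text is missing, malformed, or out of range.
   65535 is the largest n whose sum fits in a 32-bit int. */
int parse_count(const char *text, int fallback) {
    if (text == NULL) {
        return fallback;
    }
    char *end = NULL;
    errno = 0;
    long value = strtol(text, &end, 10);
    if (end == text || *end != '\0' || errno == ERANGE ||
        value < 0 || value > 65535) {
        return fallback;
    }
    return (int)value;
}
```

In main(), the call would then become sum(parse_count(argc > 1 ? argv[1] : NULL, 10)). Assuming lli forwards trailing arguments to the program, as its documentation describes, running lli test.ll 100 should print 5050.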

Conclusion

LLVM, along with Clang and associated utilities, makes for a powerful system. The use of the intermediate representation code gives it considerable flexibility, and its modularity makes it easier to use than GCC.
