Computational models, programming languages and runtime libraries

Theory, Definitions and Algorithms

These are some entry points to the underlying fundamentals of compiler theory, in no particular order.

Some useful sites and articles with state-of-the-art information.

Programming languages around interesting concepts


Description                                         | Language              | Implementation | License
GCC, GCC Interactive Compiler                       | Ada, C, C++, Java, Go | C              | GPL
LLVM, Clang: Clang Language Extensions, CXX status  | C++                   | C++            | BSD
Open64, AMD Open64 page                             | C, C++, Fortran       | C++            | GPL
Path64, Path64 open source announcement             | C, C++, Fortran       | C/C++          | GPL (v2, v3 depending on files)

NVIDIA also released its LLVM-based CUDA compiler as open source in 2011.

The autoparallelizing compiler for shared-memory computers here

Pointers inside compiler source bases

LLVM Hello

LLVM can emit machine-independent code (bitcode), which is useful if you want to start experimenting with object translation tools outside the LLVM source tree.

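# The argument to emit machine-independent code (LLVM bitcode) is -emit-llvm;
# -c stops before linking so that hello.o holds bitcode rather than an executable.
$ llvm-gcc -emit-llvm -c -o hello.o hello.c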

The bitcode libraries to link against are libLLVMBitReader.a and libLLVMBitWriter.a, and the headers can be found in include/llvm/Bitcode. A useful companion tool is the disassembler llvm-dis.
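
Before handing a file to llvm-dis or to the bitcode reader, it can help to confirm that it really contains bitcode. Raw LLVM bitcode starts with the four magic bytes 'B', 'C', 0xC0, 0xDE. The sketch below only checks for that raw magic; bitcode stored inside a wrapper header is deliberately not handled, and both helper names are made up for illustration.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Returns true when the buffer starts with the raw LLVM bitcode magic
// 'B' 'C' 0xC0 0xDE. Bitcode stored behind a wrapper header is not
// recognized by this simplified check.
bool looksLikeRawBitcode(const unsigned char *data, std::size_t size) {
  return size >= 4 && data[0] == 'B' && data[1] == 'C' &&
         data[2] == 0xC0 && data[3] == 0xDE;
}

// Hypothetical helper: reads the first four bytes of a file and applies
// the check above. A missing or too-short file yields false.
bool fileLooksLikeRawBitcode(const std::string &path) {
  std::ifstream in(path.c_str(), std::ios::binary);
  unsigned char head[4] = {0, 0, 0, 0};
  in.read(reinterpret_cast<char *>(head), sizeof(head));
  return in.gcount() == 4 && looksLikeRawBitcode(head, 4);
}
```

Calling fileLooksLikeRawBitcode("hello.o") on the output of the command above is a quick sanity check that -emit-llvm took effect.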

LLVM adding a target

The first step in figuring out what is going on while building LLVM is to enable verbose mode:

cd llvm-3.0-obj/obj && make VERBOSE=1

Each target machine requires a subdirectory in lib/Target that will build a static library linked into different executables. Once you have created your own target subdirectory using a sibling as a template, you will also need to patch the following files to get the new target recognized by the system.


The way dependencies and link command lines are generated is quite complex. It all revolves around llvm-config (llvm-3.0-obj/Release/bin/llvm-config, llvm-3.0.src/Makefile.rules:LLVMConfigLibs). The idea is to generate the deep dependency tree (required to build the ld command line) from direct prerequisites. The problem is that exported symbols are part of the dependency computation: if your source files do not cross-reference symbols in each other's libraries, one library might be dropped from the ld command line and you end up with an error like:

llvm[3]: Linking Release executable llc (without symbols)
Undefined symbols for architecture x86_64:
  "_LLVMInitializeNameTargetMC", referenced from:
      _main in llc.o
ld: symbol(s) not found for architecture x86_64
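
The failure mode above can be modelled with a small sketch. This is a rough illustration of expanding direct prerequisites into a full link set, not the actual llvm-config logic; the graph type, the function name and the library names used with it are all hypothetical.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Direct prerequisites: library name -> libraries whose symbols it references.
typedef std::map<std::string, std::set<std::string> > DepGraph;

// Expands the direct prerequisites of `roots` into the full set of libraries
// that would end up on the link line (worklist traversal with a visited set).
std::set<std::string> linkClosure(const DepGraph &direct,
                                  const std::vector<std::string> &roots) {
  std::set<std::string> seen;
  std::vector<std::string> work(roots.begin(), roots.end());
  while (!work.empty()) {
    std::string lib = work.back();
    work.pop_back();
    if (!seen.insert(lib).second)
      continue;  // already visited
    DepGraph::const_iterator it = direct.find(lib);
    if (it == direct.end())
      continue;  // leaf library, no prerequisites recorded
    work.insert(work.end(), it->second.begin(), it->second.end());
  }
  return seen;
}
```

With a graph in which a hypothetical NameCodeGen library references only NameInfo, a NameDesc library is never reached from the NameCodeGen root and silently drops out of the result. Adding a symbol reference from NameCodeGen to NameDesc brings it back, which matches the behaviour described above.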

LLVM Understanding the Intermediate Representation

The best way to start understanding the IR is to look for the code that dumps it to stdout or to a file. Such an entry point can be found in lib/VMCore/AsmWriter.cpp.

void Module::print(raw_ostream &ROS, AssemblyAnnotationWriter *AAW) const { ...
void AssemblyWriter::printModule(const Module *M) { ...
void AssemblyWriter::printFunction(const Function *F) { ...
void AssemblyWriter::printBasicBlock(const BasicBlock *BB) { ...
void AssemblyWriter::printInstruction(const Instruction &I) { ... 
void AssemblyWriter::writeOperand(const Value *Operand, bool PrintType) { ...
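
The nesting of these print functions follows the containment of the data they walk. As a toy illustration of that shape, the sketch below uses plain structs and free functions; these are stand-ins invented for this example, not the LLVM classes, and the text format it prints is simplified rather than real LLVM assembly.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Toy stand-ins for the IR containers; these are NOT the LLVM classes.
struct Instruction { std::string text; };
struct BasicBlock  { std::string label; std::vector<Instruction> insts; };
struct Function    { std::string name;  std::vector<BasicBlock> blocks; };
struct Module      { std::string id;    std::vector<Function> functions; };

// Each printer emits its own header and delegates to the level below.
void printInstruction(std::ostream &os, const Instruction &i) {
  os << "  " << i.text << "\n";
}
void printBasicBlock(std::ostream &os, const BasicBlock &bb) {
  os << bb.label << ":\n";
  for (std::size_t k = 0; k < bb.insts.size(); ++k)
    printInstruction(os, bb.insts[k]);
}
void printFunction(std::ostream &os, const Function &f) {
  os << "define @" << f.name << " {\n";
  for (std::size_t k = 0; k < f.blocks.size(); ++k)
    printBasicBlock(os, f.blocks[k]);
  os << "}\n";
}
void printModule(std::ostream &os, const Module &m) {
  os << "; ModuleID = '" << m.id << "'\n";
  for (std::size_t k = 0; k < m.functions.size(); ++k)
    printFunction(os, m.functions[k]);
}

// Convenience wrapper so the dump can be inspected as a string.
std::string dump(const Module &m) {
  std::ostringstream os;
  printModule(os, m);
  return os.str();
}
```

Reading the real printModule / printFunction / printBasicBlock / printInstruction chain with this picture in mind makes it easier to see which container owns which piece of the output.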

LLVM IR is a classic module / function / basic block / instruction decomposition implemented as structures and pointers. The definitions are in:


I tried to compile the Open64 source base on OSX 10.6.8 with gcc 4.2.1, but that did not go well. It was a little worrying that the i386-apple-darwin10.8.0 default target picked up by configure is not supported. After looking through the configure script, the only patterns that do not result in an "open64 is not supported on" error are:

  • x86_64*-*-linux*
  • i*86*-*-linux*
  • ia64*-*-linux*

So, as the documentation suggests, I forced the target to x86_64-unknown-linux-gnu.

$ ${srcTop}/contrib/compilers/open64/configure --prefix=${installTop} --with-build-optimize=debug --build=x86_64-unknown-linux-gnu
$ make
open64/osprey/driver/table.c:46:20: warning: malloc.h: No such file or directory
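
To check quickly which triples would pass the configure test described above, the three patterns can be fed to POSIX fnmatch, whose globbing is close to that of a shell case statement. This is only an approximation of the script's behaviour, and the function name is made up for illustration.

```cpp
#include <fnmatch.h>  // POSIX glob-style matching
#include <string>

// Returns true when `triple` matches one of the three patterns that the
// Open64 configure script accepts (as listed above).
bool isSupportedTriple(const std::string &triple) {
  static const char *const patterns[] = {
      "x86_64*-*-linux*", "i*86*-*-linux*", "ia64*-*-linux*"};
  for (unsigned k = 0; k < sizeof(patterns) / sizeof(patterns[0]); ++k) {
    // fnmatch returns 0 on a successful match.
    if (fnmatch(patterns[k], triple.c_str(), 0) == 0)
      return true;
  }
  return false;
}
```

The default i386-apple-darwin10.8.0 triple fails all three patterns, while x86_64-unknown-linux-gnu matches the first one.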

An interesting entry point to follow is the Lnoptimizer() function in osprey/be/lno/lnopt_main.cxx. Searching for Start_Timer/Stop_Timer calls is a good way to locate specific optimizer phases and algorithms. The list of opcodes is defined in osprey/common/com/opcode_gen_core.h.


Browsing through the directory structure and the source, it seems that a lot of the code originally came out of the Open64 project.

I tried to compile the Path64 source base on OSX 10.6.8 with gcc 4.2.1 but that did not go well either.

$ cmake -DCMAKE_BUILD_TYPE=Debug \
	  -DPSC_CRT_PATH_x86_64=/usr/lib \
	  -DPSC_LIBSUPCPP_PATH_x86_64=/usr/lib \
	  -DPSC_LIBSTDCPP_PATH_x86_64=/usr/lib \
	  -DPSC_LIBGCC_PATH_x86_64=/usr/lib/gcc/i686-apple-darwin10/4.2.1 \
	  -DPSC_LIBGCC_EH_PATH_x86_64=/usr/lib/gcc/i686-apple-darwin10/4.2.1 \
	  -DPSC_LIBGCC_S_PATH_x86_64=/usr/lib \
	  -DCMAKE_INSTALL_PREFIX=${installTop} \
$ make
path64/src/csu/elf-x86_64/crtbegin.S:30:unknown section type: @progbits

Runtime libraries

C/C++ standard runtime libraries

Here is a list of popular C/C++ standard runtime library implementations.