Overview

This document guides you through building a GCC cross compiler on FreeBSD. The cross compiler runs on a modern AMD64 machine but emits code that runs on a DEC PDP-11. In addition to the compiler, these instructions build the associated tooling, such as the assembler and linker.

In this manner, modern programming tools like make, git, vi, and more can be used to write modern C in your usual style while targeting the PDP-11.

Installation

These instructions were tested on FreeBSD 12 with GCC 7.3.0 from ports as the host compiler. The cross compiler was built from the GCC 10.2.0 and Binutils 2.35.1 source code.

Building GCC requires GNU Make. On FreeBSD, either install it with pkg install gmake or build it from ports under devel/gmake. On Linux, your make command is probably GNU Make already: run make --version and check that the first line reads something like GNU Make 4.2.1. If so, substitute make for gmake in the commands below.

In addition to GCC, we will also need to compile GNU Binutils since it contains the assembler, linker, and other necessary tools.

Obtain source code tarballs for both GCC and Binutils from the GNU project. This guide uses gcc-10.2.0.tar.gz and binutils-2.35.1.tar.gz.

I like to build all my cross compilers under one folder in my home directory, each with a version specific sub-folder.

setenv PREFIX "$HOME/cross-compiler/pdp11-gcc10.2.0"

Remember to make any $PATH changes permanent. For tcsh on FreeBSD, this means editing ~/.cshrc. To set the $PATH for this session, execute the following.

setenv PATH "$PREFIX/bin:$PATH"

The $TARGET environment variable is critical as it tells GCC what kind of cross compiler we desire. In our case, this target triplet is requesting code for the PDP-11 architecture, wrapped in an a.out container, with no hosted environment. That means this is a bare-metal target. There will be no C standard library, only the C language itself.

setenv TARGET pdp11-aout

Both GCC and binutils are best built from outside the source tree. Make two directories to hold the build detritus. Use a clean build directory each time you reconfigure or rebuild.

cd $HOME/cross-compiler/pdp11-gcc10.2.0
mkdir workdir-binutils
mkdir workdir-gcc

Build binutils first. Assuming you saved the source code in ~/cross-compiler/pdp11-gcc10.2.0/, simply do the following.

cd $HOME/cross-compiler/pdp11-gcc10.2.0
tar xzf binutils-2.35.1.tar.gz
cd workdir-binutils

Now configure, build and install binutils.

../binutils-2.35.1/configure --target=$TARGET --prefix="$PREFIX" \
        --with-sysroot --disable-nls --disable-werror
gmake
gmake install

Verify that you can access a series of files in your $PATH named pdp11-aout-* (e.g. pdp11-aout-as), and that checking their version with pdp11-aout-as --version results in something like GNU Binutils 2.35.1.

With binutils built and installed, now it’s time to build GCC.

Follow a similar process to unpack the source code, but note the new requirement to download dependencies. In older versions of GCC this command was ./contrib/download-dependencies instead of ./contrib/download_prerequisites.

cd $HOME/cross-compiler/pdp11-gcc10.2.0
tar xzf gcc-10.2.0.tar.gz
cd gcc-10.2.0
./contrib/download_prerequisites
cd ../workdir-gcc

Configuring GCC proceeds similarly to binutils. Both GNU as and GNU ld are part of binutils, hence the --with-gnu-as and --with-gnu-ld options telling GCC to use them.

../gcc-10.2.0/configure --target=$TARGET --prefix="$PREFIX" \
        --disable-nls --enable-languages=c --without-headers \
        --with-gnu-as --with-gnu-ld --disable-libssp
gmake all-gcc
gmake install-gcc

Verify that pdp11-aout-gcc --version from your $PATH reports something like pdp11-aout-gcc 10.2.0.

That’s it, you’re done. You now have a cross compiler that will run on your workstation and output PDP-11 compatible binaries in a.out format.

At this point you can skip ahead to the next section or continue reading about some potential pitfalls of the cross compiler we’ve just built.

Potential Pitfalls

Below are a few problems I ran into while using my cross compiler, some of which may apply when compiling your own code for the PDP-11. I hope that by mentioning the problems here, along with symptoms and workarounds, you might be saved some time when encountering them.

Compiling libgcc

Our newly built cross compiler expects libgcc to exist at link time, but we didn’t build it. So what is libgcc anyway? Quoting from the GCC manual:

GCC provides a low-level runtime library, libgcc.a or libgcc_s.so.1 on some
platforms. GCC generates calls to routines in this library automatically,
whenever it needs to perform some operation that is too complicated to emit
inline code for.

Most of the routines in libgcc handle arithmetic operations that the target
processor cannot perform directly. This includes integer multiply and divide on
some machines, and all floating-point and fixed-point operations on other
machines. libgcc also includes routines for exception handling, and a handful
of miscellaneous operations.

Some of these routines can be defined in mostly machine-independent C. Others
must be hand-written in assembly language for each processor that needs them.

Why didn’t we build libgcc? Because attempting to build it produced an error.

Problem

Consider the following C code which performs division and modulus operations on 16-bit unsigned integers.

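#include "pdp11.h"   /* project header; assumed to declare printf() */
#include <stdint.h>

void example(void)
{
    uint16_t a = 8, b = 64;
    printf("b %% a = %o\n", b % a);   /* %% prints a literal percent sign */
    printf("b / a = %o\n", b / a);
}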

If we try to compile this code, we receive two errors from the linker.

pdp11-aout-ld: example.o:example.o:(.text+0x8e): undefined reference to `__umodhi3'
pdp11-aout-ld: example.o:example.o:(.text+0xac): undefined reference to `__udivhi3'

The two functions referenced, __umodhi3 and __udivhi3, are part of libgcc. The names denote unsigned modulo and unsigned division on half-integer (HI) operands. Per the GCC manual, the half-integer mode uses a two-byte integer.

Solution

There are two ways around this problem.

The first (and superior) option is figuring out how to build libgcc. The command to initiate the build is gmake all-target-libgcc, executed under the same environment in which gmake all-gcc was executed earlier in this guide. If you figure out what I’m doing wrong, let me know.

The second option is to implement your own functions for __umodhi3(), __udivhi3(), and whatever else might come up. It’s not hard to make something functional, though catching all the edge cases could be challenging.
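As a rough illustration of that second option, here is a minimal sketch of 16-bit unsigned division and modulus using the classic shift-and-subtract method. The names udivmod16, udiv16, and umod16 are my own placeholders so the code can be tried on any host; on the target the two entry points would need to be named __udivhi3 and __umodhi3 for the linker to find them. The division-by-zero result is an arbitrary choice, not something taken from libgcc.

```c
#include <stdint.h>

/* Shift-and-subtract division of two 16-bit unsigned values.
   Uses only single-bit shifts, compares, and subtraction, so it never
   needs a divide helper itself. Stores the remainder through rem when
   rem is non-null and returns the quotient. */
static uint16_t udivmod16(uint16_t num, uint16_t den, uint16_t *rem)
{
    uint16_t quot = 0;
    uint16_t r = 0;
    int i;

    if (den == 0) {                 /* arbitrary choice for divide by zero */
        if (rem)
            *rem = num;
        return 0xFFFFu;
    }

    for (i = 0; i < 16; i++) {
        uint16_t top = (num & 0x8000u) ? 1 : 0;   /* next dividend bit */
        r = (uint16_t)((r << 1) | top);
        num = (uint16_t)(num << 1);
        quot = (uint16_t)(quot << 1);
        if (r >= den) {
            r = (uint16_t)(r - den);
            quot |= 1;
        }
    }

    if (rem)
        *rem = r;
    return quot;
}

/* Placeholder for __udivhi3 on the target. */
uint16_t udiv16(uint16_t a, uint16_t b)
{
    return udivmod16(a, b, 0);
}

/* Placeholder for __umodhi3 on the target. */
uint16_t umod16(uint16_t a, uint16_t b)
{
    uint16_t r;
    udivmod16(a, b, &r);
    return r;
}
```

With the values from the example above, udiv16(64, 8) gives 8 and umod16(64, 8) gives 0.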

Using uint32

Although the PDP-11 utilizes a 16-bit word, GCC is clever enough to allow operations on 32-bit words by breaking them up into smaller operations. For example, in the following assembly code generated by GCC, note how the 32-bit word is pushed onto the stack as two separate words.

uint32_t a=0710004010;         uint16_t a=010;

add     $-4, sp                add     $-2, sp
mov     $3440, (sp)            mov     $10, (sp)
mov     $4010, 2(sp)
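To see where the two constants in the left-hand listing come from, the split can be reproduced by hand. The sketch below is only an illustration: the function names high_word and low_word are mine, and all constants are octal to match the listing.

```c
#include <stdint.h>

/* Upper 16 bits of a 32-bit value. */
uint16_t high_word(uint32_t v)
{
    return (uint16_t)(v >> 16);
}

/* Lower 16 bits of a 32-bit value. */
uint16_t low_word(uint32_t v)
{
    return (uint16_t)(v & 0xFFFFu);
}
```

For 0710004010 the high word is 03440 and the low word is 04010, which are the values moved to (sp) and 2(sp) respectively.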

Problem

Whenever I try to make real use of code with uint32_t, I encounter internal compiler errors like the following.

memtest.c:119:1: error: insn does not satisfy its constraints:
 }
 ^
(insn 95 44 45 (set (reg:HI 1 r1)
        (reg/f:HI 16 virtual-incoming-args)) "memtest.c":114 14 {movhi}
     (nil))
memtest.c:119:1: internal compiler error: in extract_constrain_insn_cached, at recog.c:2225
no stack trace because unwind library not available
Please submit a full bug report,
with preprocessed source if appropriate.
See <https://gcc.gnu.org/bugs/> for instructions.
*** Error code 1

In each case, adding a single uint32_t operation in one spot in the code resulted in a compiler error in a completely different part of the code. Removing the offending uint32_t line caused the program to again compile and execute normally. In each case, I already had uint32_t related code working elsewhere in the program.

Solution

Until I track down the bug causing these errors, I’ve been using structs containing pairs of uint16_t words and writing helper functions to perform operations on them.
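The workaround looks roughly like the following sketch. The struct name u32pair and both helper names are hypothetical, chosen for this illustration rather than copied from my code, and passing the struct by value is just the simplest form to show here.

```c
#include <stdint.h>

/* A 32-bit unsigned value held as two 16-bit words. */
struct u32pair {
    uint16_t hi;
    uint16_t lo;
};

/* Returns a + b, carrying out of the low word into the high word. */
struct u32pair u32pair_add(struct u32pair a, struct u32pair b)
{
    struct u32pair r;

    r.lo = (uint16_t)(a.lo + b.lo);
    /* The low word wrapped around exactly when the sum is below a.lo. */
    r.hi = (uint16_t)(a.hi + b.hi + (r.lo < a.lo ? 1 : 0));
    return r;
}

/* Returns -1, 0, or 1 when a is below, equal to, or above b. */
int u32pair_cmp(struct u32pair a, struct u32pair b)
{
    if (a.hi != b.hi)
        return a.hi < b.hi ? -1 : 1;
    if (a.lo != b.lo)
        return a.lo < b.lo ? -1 : 1;
    return 0;
}
```

Every operation in these helpers is on a 16-bit quantity, so the compiler never sees a 32-bit type.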

GNU Assembler Bug

If you’re stuck using an older version of GNU binutils, as I was while cross compiling from a SPARCstation 20, there is a bug in the GNU assembler that crops up whenever GCC emits double indirection. It was present until at least GNU Binutils 2.28 but appears to be fixed no later than 2.32, per the following code snippet in binutils-2.32/gas/config/tc-pdp11.c.

if (*str == '@' || *str == '*')
{
    /* @(Rn) == @0(Rn): Mode 7, Indexed deferred.
    Check for auto-increment deferred.  */
    if ( ...

Problem

One of the addressing modes supported by the PDP-11 is ‘index deferred’, represented by @X(Rn). This operand indicates that X is added to the contents of Rn to form the address of a pointer, and that pointer is then dereferenced to reach the final location. For example, consider the following four values, one stored in a register and the other three in memory. Then @2(R1) is the value 222.

Similarly, @0(R1) is the value 111. In most PDP-11 assemblers, including DEC’s MACRO-11 assembler, the string @(Rn) is an alias for @0(Rn). But when the GNU assembler encounters @(Rn), it assembles it as though it were (Rn): a single level of indirection instead of two!
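The difference between one and two levels of indirection can be modeled in a few lines of C. This is only a sketch: the memory array, the function names, and the addresses 01000, 01002, 02000, and 03000 are all made up for the illustration, and the index deferred lookup follows the PDP-11 handbook definition in which X is added to the register contents before the pointer is fetched.

```c
#include <stdint.h>

#define MEM_WORDS 32768u             /* 64 KiB of byte-addressed memory */

static uint16_t memory[MEM_WORDS];

/* Reads the word at an even byte address. */
static uint16_t load_word(const uint16_t *mem, uint16_t addr)
{
    return mem[addr >> 1];
}

/* (Rn), register deferred: the register holds the operand's address. */
uint16_t operand_reg_deferred(const uint16_t *mem, uint16_t rn)
{
    return load_word(mem, rn);
}

/* @X(Rn), index deferred: X + Rn is the address of a pointer to the operand. */
uint16_t operand_index_deferred(const uint16_t *mem, uint16_t rn, uint16_t x)
{
    uint16_t pointer = load_word(mem, (uint16_t)(rn + x));
    return load_word(mem, pointer);
}

/* Fills in a small hypothetical memory image and returns it. */
const uint16_t *sample_memory(void)
{
    memory[01000 >> 1] = 02000;      /* pointer to the first value  */
    memory[01002 >> 1] = 03000;      /* pointer to the second value */
    memory[02000 >> 1] = 111;
    memory[03000 >> 1] = 222;
    return memory;
}
```

With R1 holding 01000 in this made-up image, @0(R1) yields 111 and @2(R1) yields 222, whereas (R1) yields the pointer 02000 itself. That last result is what the buggy assembler produces for @(R1).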

If we’re only writing assembly then we can work around this bug by always using the form @0(Rn). But what if we’re writing C and using GCC to compile it? Consider the following C code example, taken directly from some stack-based debugger code written for the PDP-11.

uint16_t ** csp = (uint16_t **) 070000;
*csp = (uint16_t *) 060000;
**csp = 0;

When GCC compiles this to assembly, it generates code of the form @(Rn) for the assignment to **csp. If GNU as is used to assemble the code, the value 0 then overwrites the value 060000 at *csp.
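The effect on memory can be traced with a small host-side simulation. This is a sketch under stated assumptions: the ram array, the helper names, and the sentinel value 0177777 are mine, and the model simply treats the final store as going through either @0(Rn) or (Rn) with csp held in a register.

```c
#include <stdint.h>

static uint16_t ram[32768];          /* 64 KiB modeled as 16-bit words */

static void store(uint16_t addr, uint16_t value)
{
    ram[addr >> 1] = value;
}

uint16_t load(uint16_t addr)
{
    return ram[addr >> 1];
}

/* Runs the three assignments with csp held in a register. When
   assembler_bug is nonzero the last store uses (Rn) in place of @0(Rn).
   Returns the word left at 070000, which should still be the pointer. */
uint16_t run_csp_example(int assembler_bug)
{
    uint16_t rn = 070000;            /* csp = (uint16_t **) 070000 */

    store(060000, 0177777);          /* sentinel so the last store is visible */
    store(rn, 060000);               /* *csp = (uint16_t *) 060000 */

    if (assembler_bug)
        store(rn, 0);                /* (Rn): one level of indirection  */
    else
        store(load(rn), 0);          /* @0(Rn): two levels of indirection */

    return load(070000);
}
```

Assembled correctly, the pointer at 070000 survives and the word at 060000 becomes 0. With the bug, the pointer at 070000 is replaced by 0 and the word at 060000 is never touched.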

Solution

The following patch, tested on GNU binutils 2.28, fixes the bug. It’s a little hacky since it overloads the operand->code variable to pass unrelated state information to parse_reg().

--- tc-pdp11.c  2017-06-24 22:33:00.260210000 -0700
+++ tc-pdp11.c.fixed    2017-06-24 22:32:12.455205000 -0700
@@ -431,6 +431,9 @@
 {
   LITTLENUM_TYPE literal_float[2];

+  /* Store the value (if any) passed by parse_op_noreg() before parse_reg() overwrites it. */
+  int deferred = operand->code;
+
   str = skip_whitespace (str);

   switch (*str)
@@ -451,6 +454,15 @@
      operand->code |= 020;
      str++;
    }
+      /*
+       * This catches the case where @(Rn) is interpreted as (Rn) rather than @0(Rn)
+       */
+      else if (deferred)
+        {
+          operand->additional = 1;
+          operand->word = 0;
+          operand->code |= 060;
+        }
       else
    {
      operand->code |= 010;
@@ -581,6 +593,12 @@

   if (*str == '@' || *str == '*')
     {
+      /*
+       * operand->code is overwritten by parse_reg() inside parse_op_no_deferred()
+       * We use it to temporarily catch the alias @(Rn) -> @0(Rn) since
+       *   parse_op_no_deferred() starts at str+1 and thus misses the '@'.
+       */
+      operand->code |= 010;
       str = parse_op_no_deferred (str + 1, operand);
       if (operand->error)
        return str;