Overview
This document guides you through building a cross compiler using GCC on FreeBSD. This cross compiler will run on a modern AMD64 machine but emit code which runs on a DEC PDP-11. In addition to the compiler, these instructions also build associated tooling like an assembler, linker, etc.
In this manner, modern programming tools like make, git, vi, and more can
be used to write modern C in your usual style while targeting the PDP-11.
Installation
These instructions were tested on FreeBSD 12 with GCC 7.3.0 from ports as the host compiler. The cross compiler was built from the GCC 10.2.0 and Binutils 2.35.1 source code.
Building GCC requires GNU Make. On FreeBSD either install via pkg install
gmake or build from ports under devel/gmake. On Linux your make command is
probably gmake in disguise. Run make --version and see if the first line is
something like GNU Make 4.2.1.
In addition to GCC, we will also need to compile GNU Binutils since it contains the assembler, linker, and other necessary tools.
Obtain suitable source code tarballs from these links.
I like to build all my cross compilers under one folder in my home directory, each with a version specific sub-folder.
setenv PREFIX "$HOME/cross-compiler/pdp11-gcc10.2.0"
Remember to make any $PATH changes permanent. For tcsh on FreeBSD, this
means editing ~/.cshrc. To set the $PATH for this session, execute the
following.
setenv PATH "$PREFIX/bin:$PATH"
The $TARGET environment variable is critical as it tells GCC what kind of
cross compiler we desire. In our case, this target
triplet is requesting code for the
PDP-11 architecture, wrapped in an a.out container, with no hosted
environment. That means this is a bare-metal target. There will be no C
standard library, only the C language itself.
setenv TARGET pdp11-aout
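To make the "no C standard library" point concrete, here is a minimal sketch of the kind of helper you end up writing yourself on a bare-metal target. The names my_strlen and my_memset are hypothetical, chosen so they don't collide with anything; the only header used is <stddef.h>, which a freestanding C implementation is required to provide.

```c
#include <stddef.h>

/* Hypothetical stand-in for strlen(): count bytes up to the terminating NUL. */
size_t my_strlen(const char *s)
{
    size_t n = 0;

    while (s[n] != '\0')
        n++;
    return n;
}

/* Hypothetical stand-in for memset(): fill n bytes at dst with the byte c. */
void *my_memset(void *dst, int c, size_t n)
{
    unsigned char *p = dst;

    while (n-- > 0)
        *p++ = (unsigned char)c;
    return dst;
}
```

Nothing here is specific to the PDP-11, so the same file can be compiled and tested with the host compiler before cross compiling it.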
Both GCC and binutils are best built from outside the source tree. Make two directories to hold the build detritus. Use a clean build directory each time you reconfigure or rebuild.
cd $HOME/cross-compiler/pdp11-gcc10.2.0
mkdir workdir-binutils
mkdir workdir-gcc
Build binutils first. Assuming you saved the source code in
~/cross-compiler/pdp11-gcc10.2.0/, simply do the following.
cd $HOME/cross-compiler/pdp11-gcc10.2.0
tar xzf binutils-2.35.1.tar.gz
cd workdir-binutils
Now configure, build and install binutils.
../binutils-2.35.1/configure --target=$TARGET --prefix="$PREFIX" \
--with-sysroot --disable-nls --disable-werror
gmake
gmake install
Verify that you can access a series of files in your $PATH named
pdp11-aout-* (e.g. pdp11-aout-as), and that checking their version with
pdp11-aout-as --version results in something like GNU Binutils 2.35.1.
With binutils built and installed, now it’s time to build GCC.
Follow a similar process to unpack the source code, but note the new
requirement to download dependencies. In older versions of GCC this command was
./contrib/download-dependencies instead of
./contrib/download_prerequisites (note the underscore in the current name).
cd $HOME/cross-compiler/pdp11-gcc10.2.0
tar xzf gcc-10.2.0.tar.gz
cd gcc-10.2.0
./contrib/download_prerequisites
cd ../workdir-gcc
Configuring GCC proceeds similarly to binutils. Both GNU as and GNU ld are
part of binutils, hence the --with-gnu-as and --with-gnu-ld directives informing GCC to use them.
../gcc-10.2.0/configure --target=$TARGET --prefix="$PREFIX" \
--disable-nls --enable-languages=c --without-headers \
--with-gnu-as --with-gnu-ld --disable-libssp
gmake all-gcc
gmake install-gcc
Verify that pdp11-aout-gcc --version from your $PATH reports something like
pdp11-aout-gcc 10.2.0.
That’s it, you’re done. You now have a cross compiler that will run on your
workstation and output PDP-11 compatible binaries in a.out format.
At this point you can skip ahead to the next section or continue reading about some potential pitfalls of the cross compiler we’ve just built.
Potential Pitfalls
Below are a few problems I ran into while using my cross compiler, some of which may apply when compiling your own code for the PDP-11. I hope that by mentioning the problems here, along with symptoms and workarounds, you might be saved some time when encountering them.
Compiling libgcc
Our newly built cross compiler expects libgcc to exist at link time, but we
didn’t build it. So what is libgcc anyway? Quoting from the GCC
manual:
GCC provides a low-level runtime library, libgcc.a or libgcc_s.so.1 on some
platforms. GCC generates calls to routines in this library automatically,
whenever it needs to perform some operation that is too complicated to emit
inline code for.
Most of the routines in libgcc handle arithmetic operations that the target
processor cannot perform directly. This includes integer multiply and divide on
some machines, and all floating-point and fixed-point operations on other
machines. libgcc also includes routines for exception handling, and a handful
of miscellaneous operations.
Some of these routines can be defined in mostly machine-independent C. Others
must be hand-written in assembly language for each processor that needs them.
Why didn’t we build libgcc? Because we encountered this error
message.
Problem
Consider the following C code which performs division and modulus operations on 16-bit unsigned integers.
#include "pdp11.h"
#include <stdint.h>
uint16_t a=8, b=64;
printf("b %% a = %o\n", b % a);
printf("b / a = %o\n", b / a);
If we try to compile this code, we receive two errors from the linker.
pdp11-aout-ld: example.o:example.o:(.text+0x8e): undefined reference to `__umodhi3'
pdp11-aout-ld: example.o:example.o:(.text+0xac): undefined reference to `__udivhi3'
The two functions referenced, __umodhi3 and __udivhi3, are part of libgcc.
The names denote unsigned modulo and unsigned division on
half-integer types. Per the GCC
manual,
the half-integer mode uses a two-byte integer.
Solution
There are two ways around this problem.
The first (and superior) option is figuring out how to build libgcc. The
command to initiate the build is gmake all-target-libgcc, executed under the
same environment in which gmake all-gcc was executed earlier in this guide.
If you figure out what I’m doing wrong, let me know.
The second option is to implement your own functions for __umodhi3(),
__udivhi3(), and whatever else might come up. It’s not hard to make something
functional, though catching all the edge cases could be challenging.
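As a rough illustration of the second option, here is a minimal sketch of both routines built on a classic shift-and-subtract loop. The helper name udivmod16 is my own invention, and the uint16_t signatures are an assumption about what GCC expects for these HImode routines on this target, so check the generated assembly before trusting it. The loop uses only one-bit shifts, comparisons, and subtraction, so it should not call back into the routines it replaces. Division by zero is not handled.

```c
#include <stdint.h>

/* Hypothetical shared helper: restoring (shift-and-subtract) division.
 * Produces one quotient bit per pass, most significant bit first. */
static uint16_t udivmod16(uint16_t num, uint16_t den, uint16_t *rem)
{
    uint16_t quot = 0;
    uint16_t r = 0;
    int i;

    for (i = 0; i < 16; i++) {
        /* Shift the next bit of the numerator into the running remainder. */
        r = (uint16_t)(r << 1);
        if (num & 0x8000u)
            r |= 1u;
        num = (uint16_t)(num << 1);

        /* If the divisor fits, subtract it and record a 1 in the quotient. */
        quot = (uint16_t)(quot << 1);
        if (r >= den) {
            r = (uint16_t)(r - den);
            quot |= 1u;
        }
    }
    *rem = r;
    return quot;
}

/* Assumed signature: unsigned 16-bit division. */
uint16_t __udivhi3(uint16_t a, uint16_t b)
{
    uint16_t rem;

    return udivmod16(a, b, &rem);
}

/* Assumed signature: unsigned 16-bit modulo. */
uint16_t __umodhi3(uint16_t a, uint16_t b)
{
    uint16_t rem;

    (void)udivmod16(a, b, &rem);
    return rem;
}
```

Linking an object file like this alongside example.o should satisfy the two undefined references shown earlier. Signed variants and wider types would need their own routines.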
Using uint32
Although the PDP-11 utilizes a 16-bit word, GCC is clever enough to allow operations on 32-bit words by breaking them up into smaller operations. For example, in the following assembly code generated by GCC, note how the 32-bit word is pushed onto the stack as two separate words.
uint32_t a=0710004010;        uint16_t a=010;

add $-4, sp                   add $-2, sp
mov $3440, (sp)               mov $10, (sp)
mov $4010, 2(sp)
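The two immediates in the 32-bit case are simply the upper and lower 16 bits of the constant. A small host-side sketch makes the arithmetic explicit; the function names high_word and low_word are hypothetical, and this is meant to be run on the workstation rather than the PDP-11.

```c
#include <stdint.h>

/* Upper 16 bits of a 32-bit value: the word GCC stores at (sp). */
static uint16_t high_word(uint32_t v)
{
    return (uint16_t)(v >> 16);
}

/* Lower 16 bits of a 32-bit value: the word GCC stores at 2(sp). */
static uint16_t low_word(uint32_t v)
{
    return (uint16_t)(v & 0xFFFFu);
}
```

For the constant 0710004010, high_word() yields 03440 and low_word() yields 04010, matching the two mov instructions.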
Problem
Whenever I try to make real use of code with uint32_t, I encounter internal
compiler errors like the following.
memtest.c:119:1: error: insn does not satisfy its constraints:
}
^
(insn 95 44 45 (set (reg:HI 1 r1)
(reg/f:HI 16 virtual-incoming-args)) "memtest.c":114 14 {movhi}
(nil))
memtest.c:119:1: internal compiler error: in extract_constrain_insn_cached, at recog.c:2225
no stack trace because unwind library not available
Please submit a full bug report,
with preprocessed source if appropriate.
See <https://gcc.gnu.org/bugs/> for instructions.
*** Error code 1
In each case, adding a single uint32_t operation in one spot in the code
resulted in a compiler error in a completely different part of the code.
Removing the offending uint32_t line caused the program to again compile and
execute normally. In each case, I already had uint32_t related code working
elsewhere in the program.
Solution
Until I track down the bug causing these errors, I’ve been using structs
containing pairs of uint16_t words and writing helper functions to perform
operations on them.
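A minimal sketch of that workaround follows. The type name u32pair and the helper names are hypothetical, and only addition and equality are shown; the idea is to add the low words first and carry into the high words by hand.

```c
#include <stdint.h>

/* Hypothetical 32-bit value stored as two 16-bit words. */
typedef struct {
    uint16_t hi;
    uint16_t lo;
} u32pair;

/* Add two pairs, propagating the carry out of the low word. */
static u32pair u32pair_add(u32pair a, u32pair b)
{
    u32pair sum;

    sum.lo = (uint16_t)(a.lo + b.lo);
    /* If the low word wrapped around, it is now smaller than either input. */
    sum.hi = (uint16_t)(a.hi + b.hi + (sum.lo < a.lo));
    return sum;
}

/* Return 1 if both words match, 0 otherwise. */
static int u32pair_equal(u32pair a, u32pair b)
{
    return a.hi == b.hi && a.lo == b.lo;
}
```

Passing the structs through pointers instead of by value may produce tighter code on the PDP-11; I have not measured the difference.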
GNU Assembler Bug
If you’re stuck using an older version of GNU binutils, as I was while cross
compiling from a SPARCstation 20, there is a bug in the GNU assembler that
crops up whenever double-indirection is used in GCC. It was present until at
least GNU Binutils 2.28 but appears to be fixed no later than 2.32 per the
following code snippet in binutils-2.32/gas/config/tc-pdp11.c.
if (*str == '@' || *str == '*')
{
/* @(Rn) == @0(Rn): Mode 7, Indexed deferred.
Check for auto-increment deferred. */
if ( ...
Problem
One of the addressing modes supported by the PDP-11 is ‘index deferred’,
represented by @X(Rn). This operand indicates that the sum of X and the
contents of Rn is the address of a pointer, and that pointer in turn holds the
address of the final location. For example, consider the following five values,
one stored in a register and the other four in memory. Then @2(R1) is the
value 222.
R1: 1000
1000: 2000
1002: 3000
2000: 111
3000: 222
Similarly, @0(R1) is the value 111. In most PDP-11 assemblers, including
DEC’s MACRO-11 assembler, the string @(Rn) is an alias to @0(Rn). But when
the GNU assembler encounters @(Rn) it assembles it as though it were (Rn),
a single level of indirection instead of two levels!
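The difference between the two encodings can be modeled on the host in a few lines of C. This is only a sketch: the tiny word array, the helper names, and the decision to treat the example addresses as octal are all my own assumptions. It follows the processor's definition of index deferred mode, where X plus the register addresses a pointer to the operand.

```c
#include <stdint.h>

#define MEM_WORDS 02000              /* hypothetical memory: byte addresses 0..03776 */
static uint16_t mem[MEM_WORDS];      /* one entry per word; index = byte address / 2 */

static uint16_t load(uint16_t addr)
{
    return mem[addr >> 1];
}

static void store(uint16_t addr, uint16_t value)
{
    mem[addr >> 1] = value;
}

/* (Rn): register deferred. The register holds the operand's address. */
static uint16_t reg_deferred(uint16_t rn)
{
    return load(rn);
}

/* @X(Rn): index deferred. X + Rn is the address of a pointer to the operand. */
static uint16_t index_deferred(uint16_t x, uint16_t rn)
{
    return load(load((uint16_t)(x + rn)));
}

/* Load the two memory words needed for the @0(R1) case. */
static void setup_example(void)
{
    store(01000, 02000);
    store(02000, 0111);
}
```

With R1 holding 01000, index_deferred(0, r1) follows both levels and yields 0111, while reg_deferred(r1) stops one level short and yields 02000, which is what the buggy assembler encodes for @(Rn).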
If we’re only writing assembly then we can work around this bug by always using
the form @0(Rn). But what if we’re writing C and using GCC to compile it?
Consider the following C code example, taken directly from some stack-based
debugger code written for the PDP-11.
uint16_t ** csp = (uint16_t **) 070000;
*csp = (uint16_t *) 060000;
**csp = 0;
When GCC compiles this to assembly, it generates code of the form @(Rn) when
assigning a value to **csp. If GNU as is used to assemble the code, the value 0
overwrites the pointer 060000 stored in *csp rather than the word that pointer refers to.
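To see the damage concretely, the store can be simulated on the host. In this sketch the 64 KB word array and both function names are hypothetical; one function performs the store the C code asks for, the other performs the store the buggy assembler actually encodes.

```c
#include <stdint.h>

static uint16_t mem[0100000];   /* 64 KB as 32768 words; index = byte address / 2 */

enum { CSP = 070000 };          /* the address held in the csp variable */

/* Intended behaviour of "**csp = value": two levels of indirection. */
static void store_index_deferred(uint16_t rn, uint16_t value)
{
    uint16_t target = mem[rn >> 1];   /* first level: fetch the pointer at Rn */

    mem[target >> 1] = value;         /* second level: write through that pointer */
}

/* Buggy encoding: (Rn), a single level of indirection. */
static void store_reg_deferred(uint16_t rn, uint16_t value)
{
    mem[rn >> 1] = value;             /* clobbers the pointer itself */
}
```

After setting mem[CSP >> 1] to 060000, the first function leaves that pointer intact and zeroes the word at 060000. The second zeroes the pointer and leaves the word at 060000 untouched.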
Solution
The following patch, tested on GNU binutils 2.28, fixes the bug. It’s a little
hacky since it overloads the operand->code variable to pass unrelated state
information to parse_reg().
--- tc-pdp11.c 2017-06-24 22:33:00.260210000 -0700
+++ tc-pdp11.c.fixed 2017-06-24 22:32:12.455205000 -0700
@@ -431,6 +431,9 @@
{
LITTLENUM_TYPE literal_float[2];
+ /* Store the value (if any) passed by parse_op_noreg() before parse_reg() overwrites it. */
+ int deferred = operand->code;
+
str = skip_whitespace (str);
switch (*str)
@@ -451,6 +454,15 @@
operand->code |= 020;
str++;
}
+ /*
+ * This catches the case where @(Rn) is interpreted as (Rn) rather than @0(Rn)
+ */
+ else if (deferred)
+ {
+ operand->additional = 1;
+ operand->word = 0;
+ operand->code |= 060;
+ }
else
{
operand->code |= 010;
@@ -581,6 +593,12 @@
if (*str == '@' || *str == '*')
{
+ /*
+ * operand->code is overwritten by parse_reg() inside parse_op_no_deferred()
+ * We use it to temporarily catch the alias @(Rn) -> @0(Rn) since
+ * parse_op_no_deferred() starts at str+1 and thus misses the '@'.
+ */
+ operand->code |= 010;
str = parse_op_no_deferred (str + 1, operand);
if (operand->error)
return str;