Overview
This document guides you through building a cross compiler using GCC on FreeBSD. This cross compiler will run on a modern AMD64 machine but emit code which runs on a DEC PDP-11. In addition to the compiler, these instructions also build associated tooling like an assembler, linker, etc.
In this manner, modern programming tools like make, git, vi, and more can be used to write modern C in your usual style while targeting the PDP-11.
Installation
These instructions were tested on FreeBSD 12 with GCC 7.3.0 from ports as the host compiler. The cross compiler was built from the GCC 10.2.0 and Binutils 2.35.1 source code.
Building GCC requires GNU Make. On FreeBSD either install via pkg install gmake or build from ports under devel/gmake. On Linux your make command is probably gmake in disguise. Run make --version and see if the first line is something like GNU Make 4.2.1.
In addition to GCC, we will also need to compile GNU Binutils since it contains the assembler, linker, and other necessary tools.
Obtain suitable source code tarballs from these links.
I like to build all my cross compilers under one folder in my home directory, each with a version specific sub-folder.
setenv PREFIX "$HOME/cross-compiler/pdp11-gcc10.2.0"
Remember to make any $PATH changes permanent. For tcsh on FreeBSD, this means editing ~/.cshrc. To set the $PATH for this session, execute the following.
setenv PATH "$PREFIX/bin:$PATH"
The $TARGET environment variable is critical as it tells GCC what kind of cross compiler we desire. In our case, this target triplet is requesting code for the PDP-11 architecture, wrapped in an a.out container, with no hosted environment. That means this is a bare-metal target. There will be no C standard library, only the C language itself.
setenv TARGET pdp11-aout
Both GCC and binutils are best built from outside the source tree. Make two directories to hold the build detritus. Use a clean build directory each time you reconfigure or rebuild.
cd $HOME/cross-compiler/pdp11-gcc10.2.0
mkdir workdir-binutils
mkdir workdir-gcc
Build binutils first. Assuming you saved the source code in ~/cross-compiler/pdp11-gcc10.2.0/, simply do the following.
cd $HOME/cross-compiler/pdp11-gcc10.2.0
tar xzf binutils-2.35.1.tar.gz
cd workdir-binutils
Now configure, build and install binutils.
../binutils-2.35.1/configure --target=$TARGET --prefix="$PREFIX" \
--with-sysroot --disable-nls --disable-werror
gmake
gmake install
Verify that you can access a series of files in your $PATH named pdp11-aout-* (e.g. pdp11-aout-as), and that checking their version with pdp11-aout-as --version results in something like GNU Binutils 2.35.1.
With binutils built and installed, now it’s time to build GCC.
Follow a similar process to unpack the source code, but note the new requirement to download dependencies. In older versions of GCC this command was ./contrib/download-dependencies instead of ./contrib/download_prerequisites. Note that the script name is spelled with an underscore.
cd $HOME/cross-compiler/pdp11-gcc10.2.0
tar xzf gcc-10.2.0.tar.gz
cd gcc-10.2.0
./contrib/download_prerequisites
cd ../workdir-gcc
Configuring GCC proceeds similarly to binutils. Both GNU as and GNU ld are part of binutils, hence the --with-gnu-as and --with-gnu-ld directives informing GCC to use them.
../gcc-10.2.0/configure --target=$TARGET --prefix="$PREFIX" \
--disable-nls --enable-languages=c --without-headers \
--with-gnu-as --with-gnu-ld --disable-libssp
gmake all-gcc
gmake install-gcc
Verify that pdp11-aout-gcc --version from your $PATH reports something like pdp11-aout-gcc 10.2.0.
That’s it, you’re done. You now have a cross compiler that will run on your workstation and output PDP-11 compatible binaries in a.out format.
At this point you can skip ahead to the next section or continue reading about some potential pitfalls of the cross compiler we’ve just built.
Potential Pitfalls
Below are a few problems I ran into while using my cross compiler, some of which may apply when compiling your own code for the PDP-11. I hope that by mentioning the problems here, along with symptoms and workarounds, you might be saved some time when encountering them.
Compiling libgcc
Our newly built cross compiler expects libgcc to exist at link time, but we didn’t build it. So what is libgcc anyway? Quoting from the GCC manual:
GCC provides a low-level runtime library, libgcc.a or libgcc_s.so.1 on some
platforms. GCC generates calls to routines in this library automatically,
whenever it needs to perform some operation that is too complicated to emit
inline code for.
Most of the routines in libgcc handle arithmetic operations that the target
processor cannot perform directly. This includes integer multiply and divide on
some machines, and all floating-point and fixed-point operations on other
machines. libgcc also includes routines for exception handling, and a handful
of miscellaneous operations.
Some of these routines can be defined in mostly machine-independent C. Others
must be hand-written in assembly language for each processor that needs them.
Why didn’t we build libgcc? Because we encountered this error message.
Problem
Consider the following C code which performs division and modulus operations on 16-bit unsigned integers.
#include "pdp11.h"
#include <stdint.h>
uint16_t a=8, b=64;
printf("b %% a = %o\n", b % a);
printf("b / a = %o\n", b / a);
If we try to compile this code, we receive two errors from the linker.
pdp11-aout-ld: example.o:example.o:(.text+0x8e): undefined reference to `__umodhi3'
pdp11-aout-ld: example.o:example.o:(.text+0xac): undefined reference to `__udivhi3'
The two functions referenced, __umodhi3 and __udivhi3, are part of libgcc. The names refer to unsigned modulo and unsigned division on half-integer (HI) types. Per the GCC manual, the half-integer mode represents a two-byte integer.
Solution
There are two ways around this problem.
The first (and superior) option is figuring out how to build libgcc. The command to initiate the build is gmake all-target-libgcc, executed under the same environment in which gmake all-gcc was executed earlier in this guide. If you figure out what I’m doing wrong, let me know.
The second option is to implement your own functions for __umodhi3(), __udivhi3(), and whatever else might come up. It’s not hard to make something functional, though catching all the edge cases could be challenging.
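As a starting point for the second option, here is a minimal sketch of both routines built on shift-and-subtract division. The helper name udivmod16 is hypothetical, the uint16_t signatures are an assumption about how GCC passes half-integer operands on this target, and the result returned for division by zero is an arbitrary choice. Only one-bit shifts, comparisons, and subtraction are used, in the hope that the routines do not pull in further libgcc calls themselves.

```c
#include <stdint.h>

/*
 * Shift-and-subtract (restoring) division of two 16-bit unsigned values.
 * Returns the quotient and stores the remainder through rem.
 * udivmod16 is a hypothetical helper name, not part of libgcc.
 */
uint16_t udivmod16(uint16_t num, uint16_t den, uint16_t *rem)
{
    uint16_t quot = 0;
    uint16_t r = 0;
    int i;

    if (den == 0) {
        /* Division by zero is undefined in C; this result is arbitrary. */
        *rem = num;
        return 0xFFFFu;
    }

    for (i = 0; i < 16; i++) {
        /*
         * Shift the next dividend bit (most significant first) into the
         * remainder. Before iteration i the remainder is below 2^i, so
         * this shift cannot overflow 16 bits.
         */
        r = (uint16_t)(r << 1);
        if (num & 0x8000u)
            r |= 1u;
        num = (uint16_t)(num << 1);
        quot = (uint16_t)(quot << 1);

        /* Subtract the divisor whenever it fits and record a 1 bit. */
        if (r >= den) {
            r = (uint16_t)(r - den);
            quot |= 1u;
        }
    }

    *rem = r;
    return quot;
}

/* Names taken from the linker errors; the signatures are assumed. */
uint16_t __udivhi3(uint16_t a, uint16_t b)
{
    uint16_t rem;
    return udivmod16(a, b, &rem);
}

uint16_t __umodhi3(uint16_t a, uint16_t b)
{
    uint16_t rem;
    (void)udivmod16(a, b, &rem);
    return rem;
}
```

Because the arithmetic is plain C, the routines can be checked on the workstation with the host compiler before linking them into PDP-11 code.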
Using uint32
Although the PDP-11 utilizes a 16-bit word, GCC is clever enough to allow operations on 32-bit words by breaking them up into smaller operations. For example, in the following assembly code generated by GCC, note how the 32-bit word is pushed onto the stack as two separate words.
uint32_t a=0710004010;      uint16_t a=010;
add $-4, sp                 add $-2, sp
mov $3440, (sp)             mov $10, (sp)
mov $4010, 2(sp)
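The two immediates in the 32-bit case are the high and low halves of the constant. A minimal sketch of that split in C, where split32 and struct words are hypothetical names used only for illustration:

```c
#include <stdint.h>

/* The two 16-bit words that together hold one 32-bit value. */
struct words {
    uint16_t hi;   /* the word stored at (sp) in the listing */
    uint16_t lo;   /* the word stored at 2(sp) in the listing */
};

/* Split a 32-bit value into its high and low 16-bit words. */
struct words split32(uint32_t value)
{
    struct words w;
    w.hi = (uint16_t)(value >> 16);
    w.lo = (uint16_t)(value & 0xFFFFu);
    return w;
}
```

For the constant 0710004010 this yields hi = 03440 and lo = 04010, matching the two mov instructions.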
Problem
Whenever I try to make real use of code with uint32_t, I encounter internal compiler errors like the following.
memtest.c:119:1: error: insn does not satisfy its constraints:
}
^
(insn 95 44 45 (set (reg:HI 1 r1)
(reg/f:HI 16 virtual-incoming-args)) "memtest.c":114 14 {movhi}
(nil))
memtest.c:119:1: internal compiler error: in extract_constrain_insn_cached, at recog.c:2225
no stack trace because unwind library not available
Please submit a full bug report,
with preprocessed source if appropriate.
See <https://gcc.gnu.org/bugs/> for instructions.
*** Error code 1
In each case, adding a single uint32_t operation in one spot in the code resulted in a compiler error in a completely different part of the code. Removing the offending uint32_t line caused the program to again compile and execute normally. In each case, I already had uint32_t related code working elsewhere in the program.
Solution
Until I track down the bug causing these errors, I’ve been using structs containing pairs of uint16_t words and writing helper functions to perform operations on them.
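A minimal sketch of what such a workaround can look like, showing only addition. The u32pair type and the u32pair_add function are hypothetical names chosen for illustration, and the high-word-first layout is an assumption.

```c
#include <stdint.h>

/* A 32-bit quantity stored as two 16-bit words, high word first. */
typedef struct {
    uint16_t hi;
    uint16_t lo;
} u32pair;

/* dst = a + b, modulo 2^32. dst may alias a or b. */
void u32pair_add(u32pair *dst, const u32pair *a, const u32pair *b)
{
    uint16_t lo = (uint16_t)(a->lo + b->lo);
    /* The low-word sum wrapped around exactly when it is smaller than an addend. */
    uint16_t carry = (lo < a->lo) ? 1u : 0u;
    uint16_t hi = (uint16_t)(a->hi + b->hi + carry);

    dst->hi = hi;
    dst->lo = lo;
}
```

Passing pointers rather than structs by value keeps every operation the compiler sees 16 bits wide, which is the point of the workaround.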
GNU Assembler Bug
If you’re stuck using an older version of GNU binutils, as I was while cross compiling from a SPARCstation 20, there is a bug in the GNU assembler that crops up whenever double-indirection is used in GCC. It was present until at least GNU Binutils 2.28 but appears to be fixed no later than 2.32 per the following code snippet in binutils-2.32/gas/config/tc-pdp11.c.
if (*str == '@' || *str == '*')
{
/* @(Rn) == @0(Rn): Mode 7, Indexed deferred.
Check for auto-increment deferred. */
if ( ...
Problem
One of the addressing modes supported by the PDP-11 is ‘index deferred’, represented by @X(Rn). This operand indicates that X is added to the contents of Rn to form the address of a word in memory, and that word in turn holds the address of the final location. For example, consider the following five values, one stored in a register and the other four in memory. Then @2(R1) is the value 222.
R1: 1000
1000: 2000
1002: 2002
2000: 111
2002: 222
Similarly, @0(R1) is the value 111. In most PDP-11 assemblers, including DEC’s MACRO-11 assembler, the string @(Rn) is an alias to @0(Rn). But when the GNU assembler encounters @(Rn) it assembles it as though it were (Rn), a single level of indirection instead of two levels!
If we’re only writing assembly then we can work around this bug by always using the form @0(Rn). But what if we’re writing C and using GCC to compile it? Consider the following C code example, taken directly from some stack-based debugger code written for the PDP-11.
uint16_t ** csp = (uint16_t **) 070000;
*csp = (uint16_t *) 060000;
**csp = 0;
When GCC compiles this to assembly it generates code of the form @(Rn) when assigning a value to **csp. If GNU as is used to assemble the code, the value 0 is therefore written to address 070000, overwriting the pointer 060000 stored in *csp, instead of being written to address 060000.
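To make the difference concrete, here is a small host-side model of the two stores. It is only a sketch: the mem array, word_at, and the two store functions are hypothetical names that simulate PDP-11 memory on the workstation, and only the X = 0 case of index deferred addressing is modelled.

```c
#include <stdint.h>

/* Simulated 64 KiB address space, one array entry per 16-bit word. */
uint16_t mem[32768];

/* Pointer to the simulated word at an (even) byte address. */
uint16_t *word_at(uint16_t addr)
{
    return &mem[addr >> 1];
}

/* Store through @0(Rn): index deferred with X = 0, two levels of indirection. */
void store_index_deferred0(uint16_t rn, uint16_t value)
{
    uint16_t pointer = *word_at(rn);   /* first dereference: fetch the pointer */
    *word_at(pointer) = value;         /* second dereference: store through it */
}

/* Store through (Rn): register deferred, one level of indirection. */
void store_reg_deferred(uint16_t rn, uint16_t value)
{
    *word_at(rn) = value;
}
```

With 060000 stored at address 070000 and Rn = 070000, store_index_deferred0 writes the 0 to address 060000 as the C code intends, while store_reg_deferred writes it to address 070000 and destroys the pointer.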
Solution
The following patch, tested on GNU binutils 2.28, fixes the bug. It’s a little hacky since it overloads the operand->code variable to pass unrelated state information to parse_reg().
--- tc-pdp11.c 2017-06-24 22:33:00.260210000 -0700
+++ tc-pdp11.c.fixed 2017-06-24 22:32:12.455205000 -0700
@@ -431,6 +431,9 @@
{
LITTLENUM_TYPE literal_float[2];
+ /* Store the value (if any) passed by parse_op_noreg() before parse_reg() overwrites it. */
+ int deferred = operand->code;
+
str = skip_whitespace (str);
switch (*str)
@@ -451,6 +454,15 @@
operand->code |= 020;
str++;
}
+ /*
+ * This catches the case where @(Rn) is interpreted as (Rn) rather than @0(Rn)
+ */
+ else if (deferred)
+ {
+ operand->additional = 1;
+ operand->word = 0;
+ operand->code |= 060;
+ }
else
{
operand->code |= 010;
@@ -581,6 +593,12 @@
if (*str == '@' || *str == '*')
{
+ /*
+ * operand->code is overwritten by parse_reg() inside parse_op_no_deferred()
+ * We use it to temporarily catch the alias @(Rn) -> @0(Rn) since
+ * parse_op_no_deferred() starts at str+1 and thus misses the '@'.
+ */
+ operand->code |= 010;
str = parse_op_no_deferred (str + 1, operand);
if (operand->error)
return str;