Xeon Phi Setup/Build Notes
These notes cover the creation of a Debian 10.9 (buster) server with ZFS root which serves as host to Knights Corner Xeon Phi coprocessor cards.
Each of these coprocessor cards is built around P54C-derived cores extended to support the x86-64 instruction set and 4-way SMT, with a beefy 512-bit vector processor bolted alongside each core. Sixty of these cores are connected on a roughly 1 terabit/s bi-directional ring bus. In addition to the card's 8GB of GDDR5 RAM, each core has 512kB of local cache and, via the ring bus and a distributed tag store, all caches are coherent and quickly accessible from remote cores. This hardware is packaged up on a PCIe card which presents a virtual network interface to the host. The coprocessor card runs Linux+BusyBox, allowing SSH access to a traditional Linux environment on a familiar 60-core x86-64 architecture.
The hostname frostburg.subgeniuskitty.com stems from the original FROSTBURG, a CM-5 designed by Thinking Machines. Although the fundamental connection topology of a fat tree was different from the ring used in this Xeon Phi, the systems are somewhat similar. Both feature a NUMA cluster of repackaged and extended commercial processor cores operating on independent instruction streams in MIMD fashion, focused on small local data stores. By coincidence, both also feature similar core counts and total memory size.
The information on this page includes:
Hardware compatibility notes for Xeon Phi and Xeon host.
Installation of Debian 10.9 (buster) root on encrypted ZFS mirror with automated snapshots and scrubs.
Porting the Xeon Phi kernel module to newer versions of the Linux kernel.
(TODO) Installing MPSS toolkit on Debian (or CentOS VM).
(TODO) Building GCC toolchain for Xeon Phi.
(TODO) Installing Intel toolchain for Xeon Phi.
These notes are a high-level checklist for my reference rather than a step-by-step installation guide for the public. That means they make no attempt to explain all options at each step; rather, they mention only the options I use on my servers. It also means they use my domains, my file system paths, etc. in the examples. Don’t blindly copy and paste.
Hardware
The host system was kept low-power, both figuratively and literally. It primarily serves as a host for the Phi coprocessors and a bridge to the network.
Chassis: Supermicro 2027GR-TR2
Motherboard: Supermicro X9DRG-HF+II
CPU: 2x Xeon E5-2637
RAM: 8x 4GB DDR3 RDIMM
Storage: 2x Intel 160GB X-25M SSD
Payload: 4x Intel Xeon Phi 5110P
To enter the BIOS, use the DEL key. Similarly, a boot device selection menu is obtained by pressing F11. The system displays two-character status codes in the bottom right corner of the display.
Support files are stored under hw_support/Intel Xeon Phi/supermicro/.
Memory
Using eight identical sticks of MT36JSZF51272PZ-1G4 RAM. These are ECC DDR3 2Rx4 PC3-10600 RDIMMs operating at 1.5V. Per page 2-12 of the manual (MNL_1502.pdf), DIMMs are installed in all blue memory slots.
Processors & Heatsinks
Xeon E5-2637 CPUs were selected for low power, high frequency, low price, and a ‘full’ PCIe lane count; they only need to host the real show. Per page 5-7 of the chassis manual (MNL-1564.pdf), CPU1 requires heatsink SNK-P0048PS and CPU2 requires heatsink SNK-P0047PS.
SAS Backplane & Motherboard SATA
The SAS backplane is a little odd. The first eight drive bays connect via a pair of SFF-8087 connectors and the last two drive bays connect via standard 7-pin SATA connectors.
Since the motherboard provides ten 7-pin SATA connectors, two cables breaking out SFF-8087 to quad SATA will be required. I tried using just such a cable, but had no luck. There doesn’t appear to be anything configurable on the backplane itself. The backplane manual is stored at BPN-SAS-218A.pdf. My cable was of unknown origin. Per photos on some eBay auctions, the proper Supermicro cable appears to be part number 672042095704. In addition to the four SATA connectors, this cable also bundles some sort of 4-pin header, presumably the SGPIO connection.
In the meantime, since I only intend to use two small drives in a ZFS mirror for the OS and home directories, with all other storage on network shares, simply use the last two slots and connect with normal 30"+ SATA cables.
These last two drive bay slots are connected to the two white SATA ports on the motherboard, with the lowest numbered drive slot connected to the rear-most white SATA port. When SFF-8087 connectors are eventually used to increase local storage, relocate the boot drives to drive slots 0 and 1, and connect these slots to the white SATA ports.
On the motherboard, the white ports are SATA3 and the black ports are SATA2. The line of 2x white and 4x black SATA ports is part of the primary SATA controller, I_SATA. The other line of 4x black SATA ports is part of the secondary controller, S_SATA. Put any boot drives on the I_SATA ports.
Xeon Phi
Section 5.1 of the Intel Xeon Phi Coprocessor Datasheet (DocID 328209-004EN) mentions that connecting the card via both the 2x4 and 2x3 power connectors enables a higher sustained power draw of up to 245 watts, versus 225 watts for other power cable configurations. This chassis will easily support the higher power draw and heat dissipation.
The Xeon Phi coprocessor cards reserve PCIe MMIO address space sufficient to map the entire coprocessor card’s RAM. Since this is >4GB, PCIe Base Address Registers (BARs) of greater than 32-bit size are required. This should be enabled in the BIOS of this particular motherboard under PCIe/PCI/PnP Configuration -> Above 4G Decoding.
In general, motherboards with chipsets equal to or newer than the C602 should work. This includes most Supermicro motherboards from the X9xxx generation or later. None of the Supermicro X8xxx generation motherboards appear to be compatible.
The Xeon Phi 5110P, per the suffix, is passively cooled. Section 3 of the Intel Xeon Phi Coprocessor Datasheet (DocID 328209-004EN) details the cooling and mounting requirements.
Optional Fans
There are a number of optional fans for this chassis, all detailed in the chassis manual (MNL-1564.pdf). My machine includes the optional fan for another double-height, full-length PCIe card with backpanel IO slots, intended to support something like a GPU driving monitors. Since the optional fan is installed, and since the power budget easily supports it, a fifth Xeon Phi card could be installed, albeit with a slower PCIe connection.
Regardless, since this fan is installed, whenever fewer than four Xeon Phi cards are installed, preferentially locate them on the left-hand side of the chassis, near the lower-numbered drive bays.
Power Supply
The system contains dual redundant power supplies. Each is capable of supplying 1600 watts, but only when connected to a 240 volt source. When connected to a 120 volt source, maximum power output is 1000 watts.
Rackmount
The chassis is over 30" long and protrudes from the rear of the rack by approximately ½". To avoid the rear cables snagging passing carts and elbows, the chassis was mounted at the top of the rack (after an empty 1U). The Supermicro rails required cutting four notches in the vertical posts, so this is a semi-permanent home.
Inserting or extracting the server from the rack at that height requires an extraordinary amount of free space in front of the rack and some advance planning. Where possible, try to do hardware modifications in-rack. The rails are extremely solid even when the server is fully extended. The grey OS-114/WQM-4 sonar test set chassis makes a solid step stool at the ideal height for working on the server while installed in the rack.
USB Ports
There are only two USB ports, both located on the rear of the chassis. During OS installation, if a mouse is required in addition to the keyboard and USB install drive, then a USB hub is required.
Debian Buster Installation
These installation instructions use the following XFCE Debian live image.
debian-live-10.9.0-amd64-xfce.iso
Both the Gnome and XFCE live images were unusably slow in GUI mode. The text installer was fast and responsive, as were VTYs (Ctrl+Alt+F2) from within the live environment. Only the GUIs were slow, but they were slow to the point of being unusable, with single keypresses registering over a dozen times. Once Debian was installed on the SSD and booting normally, the GUI was perfectly usable. Since the local terminal is only used to install and start an OpenSSH daemon, and since this can be done from a VTY, the issue was not investigated further.
The root on ZFS portion of this installation process is derived from the OpenZFS root-on-ZFS guide for Debian Buster.
Remote Access
From the F11 BIOS boot menu, select the UEFI entry for the USB live image.
Lacking a mouse, press Ctrl+Alt+F2 after X is running in order to access a text-only VTY, already logged in as the user user. Install an SSH server so the remaining install can be done over the network.
apt-get update
apt-get install openssh-server
systemctl enable ssh
From wherever you intend to complete the install, SSH into the live Debian environment as user user with password live.
ZFS Configuration
Edit /etc/apt/sources.list to include the following entries.
deb http://deb.debian.org/debian/ buster main contrib
deb http://deb.debian.org/debian/ buster-backports main contrib
deb-src http://deb.debian.org/debian/ buster main contrib
Install the ZFS kernel module. Specify --no-install-recommends to avoid picking up zfsutils-linux since it will fail at this point. See https://github.com/openzfs/zfs/issues/9599 for more details.
apt-get install -t buster-backports --no-install-recommends zfs-dkms
modprobe zfs
With the kernel module successfully loaded, proceed to install ZFS.
apt-get install -t buster-backports zfsutils-linux
After using dd to eliminate any existing partition tables, partition the disks for use with UEFI and ZFS.
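A minimal sketch of that wipe, using the first SSD's device ID from below as a stand-in for each disk; note this only clears the start of the drive, whereas sgdisk --zap-all would also clear the backup GPT at the end.
dd if=/dev/zero of=/dev/disk/by-id/ata-INTEL_SSDSA2M160G2GN_BTPO1252011L160AGN bs=1M count=16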
First, create a UEFI partition on each disk.
sgdisk -n2:1M:+512M -t2:EF00 /dev/disk/by-id/ata-INTEL_SSDSA2M160G2GN_BTPO1252011L160AGN
Next, create a partition for the boot pool.
sgdisk -n3:0:+1G -t3:BF01 /dev/disk/by-id/ata-INTEL_SSDSA2M160G2GN_BTPO1252011L160AGN
Finally, create a partition for the encrypted pool.
sgdisk -n4:0:0 -t4:BF00 /dev/disk/by-id/ata-INTEL_SSDSA2M160G2GN_BTPO1252011L160AGN
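The second SSD in the mirror needs an identical layout, so repeat the same three commands against the other drive; its partitions are referenced by the pool-creation commands below.
sgdisk -n2:1M:+512M -t2:EF00 /dev/disk/by-id/ata-INTEL_SSDSA2M160G2GN_BTHC72250AKD480MGN
sgdisk -n3:0:+1G -t3:BF01 /dev/disk/by-id/ata-INTEL_SSDSA2M160G2GN_BTHC72250AKD480MGN
sgdisk -n4:0:0 -t4:BF00 /dev/disk/by-id/ata-INTEL_SSDSA2M160G2GN_BTHC72250AKD480MGN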
Now that partitioning is complete, create the boot and root pools.
The boot pool uses only ZFS options supported by GRUB.
zpool create \
-o cachefile=/etc/zfs/zpool.cache \
-o ashift=12 -d \
-o feature@async_destroy=enabled \
-o feature@bookmarks=enabled \
-o feature@embedded_data=enabled \
-o feature@empty_bpobj=enabled \
-o feature@enabled_txg=enabled \
-o feature@extensible_dataset=enabled \
-o feature@filesystem_limits=enabled \
-o feature@hole_birth=enabled \
-o feature@large_blocks=enabled \
-o feature@lz4_compress=enabled \
-o feature@spacemap_histogram=enabled \
-o feature@zpool_checkpoint=enabled \
-O acltype=posixacl -O canmount=off -O compression=lz4 \
-O devices=off -O normalization=formD -O relatime=on -O xattr=sa \
-O mountpoint=/boot -R /mnt \
bpool mirror \
/dev/disk/by-id/ata-INTEL_SSDSA2M160G2GN_BTPO1252011L160AGN-part3 \
/dev/disk/by-id/ata-INTEL_SSDSA2M160G2GN_BTHC72250AKD480MGN-part3
Now create the root pool with ZFS encryption.
zpool create \
-o ashift=12 \
-O encryption=aes-256-gcm \
-O keylocation=prompt -O keyformat=passphrase \
-O acltype=posixacl -O canmount=off -O compression=lz4 \
-O dnodesize=auto -O normalization=formD -O relatime=on \
-O xattr=sa -O mountpoint=/ -R /mnt \
rpool mirror \
/dev/disk/by-id/ata-INTEL_SSDSA2M160G2GN_BTPO1252011L160AGN-part4 \
/dev/disk/by-id/ata-INTEL_SSDSA2M160G2GN_BTHC72250AKD480MGN-part4
All the pools are created, so now it’s time to set up filesystems. Start with some containers.
zfs create -o canmount=off -o mountpoint=none rpool/ROOT
zfs create -o canmount=off -o mountpoint=none bpool/BOOT
Now add filesystems for boot and root.
zfs create -o canmount=noauto -o mountpoint=/ rpool/ROOT/debian
zfs mount rpool/ROOT/debian
zfs create -o mountpoint=/boot bpool/BOOT/debian
Create a filesystem to contain home directories and mount root’s homedir in the correct location.
zfs create rpool/home
zfs create -o mountpoint=/root rpool/home/root
chmod 700 /mnt/root
Create filesystems under /var and exclude temporary files from snapshots.
zfs create -o canmount=off rpool/var
zfs create -o canmount=off rpool/var/lib
zfs create rpool/var/log
zfs create rpool/var/spool
zfs create -o com.sun:auto-snapshot=false rpool/var/cache
zfs create -o com.sun:auto-snapshot=false rpool/var/tmp
chmod 1777 /mnt/var/tmp
zfs create rpool/var/mail
Create a few other misc filesystems.
zfs create rpool/srv
zfs create -o canmount=off rpool/usr
zfs create rpool/usr/local
Temporarily mount a tmpfs at /run.
mkdir /mnt/run
mount -t tmpfs tmpfs /mnt/run
mkdir /mnt/run/lock
Debian Configuration
Install a minimal Debian system.
apt-get install debootstrap
debootstrap buster /mnt
Copy the zpool cache into the new system.
mkdir /mnt/etc/zfs
cp /etc/zfs/zpool.cache /mnt/etc/zfs
Set the hostname.
echo frostburg > /mnt/etc/hostname
echo "127.0.1.1 frostburg.subgeniuskitty.com frostburg" >> /mnt/etc/hosts
Configure networking.
vi /mnt/etc/network/interfaces.d/enp129s0f0
auto enp129s0f0
iface enp129s0f0 inet static
address 192.168.1.7/24
gateway 192.168.1.1
vi /mnt/etc/resolv.conf
search subgeniuskitty.com
nameserver 192.168.1.1
Configure package sources.
vi /mnt/etc/apt/sources.list
deb http://deb.debian.org/debian buster main contrib
deb-src http://deb.debian.org/debian buster main contrib
deb http://security.debian.org/debian-security buster/updates main contrib
deb-src http://security.debian.org/debian-security buster/updates main contrib
deb http://deb.debian.org/debian buster-updates main contrib
deb-src http://deb.debian.org/debian buster-updates main contrib
vi /mnt/etc/apt/sources.list.d/buster-backports.list
deb http://deb.debian.org/debian buster-backports main contrib
deb-src http://deb.debian.org/debian buster-backports main contrib
vi /mnt/etc/apt/preferences.d/90_zfs
Package: libnvpair1linux libuutil1linux libzfs2linux libzfslinux-dev libzpool2linux python3-pyzfs pyzfs-doc spl spl-dkms zfs-dkms zfs-dracut zfs-initramfs zfs-test zfsutils-linux zfsutils-linux-dev zfs-zed
Pin: release n=buster-backports
Pin-Priority: 990
apt-get update
Chroot into the new environment.
mount --rbind /dev /mnt/dev
mount --rbind /proc /mnt/proc
mount --rbind /sys /mnt/sys
chroot /mnt
Configure the new environment as a basic system.
ln -s /proc/self/mounts /etc/mtab
apt-get update
export TERM=vt100
apt-get install console-setup locales
dpkg-reconfigure locales tzdata keyboard-configuration console-setup
Install ZFS on the new system.
apt-get install dpkg-dev linux-headers-amd64 linux-image-amd64
apt-get install zfs-initramfs
echo REMAKE_INITRD=yes > /etc/dkms/zfs.conf
Install GRUB and configure UEFI boot partition.
apt-get install dosfstools
mkdosfs -F 32 -s 1 -n EFI /dev/disk/by-id/ata-INTEL_SSDSA2M160G2GN_BTPO1252011L160AGN-part2
mkdir /boot/efi
echo "/dev/disk/by-id/ata-INTEL_SSDSA2M160G2GN_BTPO1252011L160AGN-part2 /boot/efi vfat defaults 0 0" >> /etc/fstab
mount /boot/efi
apt-get install grub-efi-amd64 shim-signed
apt-get remove --purge os-prober
Ensure the bpool is always imported, even if /etc/zfs/zpool.cache doesn’t exist or doesn’t include a relevant entry.
vi /etc/systemd/system/zfs-import-bpool.service
[Unit]
DefaultDependencies=no
Before=zfs-import-scan.service
Before=zfs-import-cache.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zpool import -N -o cachefile=none bpool
# Work-around to preserve zpool cache:
ExecStartPre=-/bin/mv /etc/zfs/zpool.cache /etc/zfs/preboot_zpool.cache
ExecStartPost=-/bin/mv /etc/zfs/preboot_zpool.cache /etc/zfs/zpool.cache
[Install]
WantedBy=zfs-import.target
systemctl enable zfs-import-bpool.service
Create a tmpfs mounted at /tmp.
cp /usr/share/systemd/tmp.mount /etc/systemd/system/
systemctl enable tmp.mount
Bootloader Configuration
Verify that the ZFS boot filesystem is recognized.
grub-probe /boot
Refresh initrd.
update-initramfs -c -k all
Configure GRUB by editing /etc/default/grub. Remove the quiet option from GRUB_CMDLINE_LINUX_DEFAULT and set the following two variables.
GRUB_CMDLINE_LINUX="root=ZFS=rpool/ROOT/debian"
GRUB_TERMINAL=console
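After saving the file, regenerate the GRUB configuration so the new kernel command line takes effect.
update-grub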
Install GRUB to the UEFI boot partition.
grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=debian-1 --recheck --no-floppy
Install GRUB on the other hard drives, incrementing -2 to -N as necessary and substituting this machine’s actual /dev/disk/by-id device names for the scsi-SATA_diskN placeholders shown below.
umount /boot/efi
dd if=/dev/disk/by-id/scsi-SATA_disk1-part2 \
of=/dev/disk/by-id/scsi-SATA_disk2-part2
efibootmgr -c -g -d /dev/disk/by-id/scsi-SATA_disk2 \
-p 2 -L "debian-2" -l '\EFI\debian\grubx64.efi'
mount /boot/efi
Fix filesystem mount ordering. Quoting from the install reference, “We need to activate zfs-mount-generator. This makes systemd aware of the separate mountpoints, which is important for things like /var/log and /var/tmp. In turn, rsyslog.service depends on var-log.mount by way of local-fs.target and services using the PrivateTmp feature of systemd automatically use After=var-tmp.mount.”
mkdir /etc/zfs/zfs-list.cache
touch /etc/zfs/zfs-list.cache/bpool
touch /etc/zfs/zfs-list.cache/rpool
zed -F
From another SSH session, verify that zed updated the cache by making sure the previously created empty files are not empty.
cat /etc/zfs/zfs-list.cache/bpool
cat /etc/zfs/zfs-list.cache/rpool
If all is well, return to the previous SSH session and terminate zed with Ctrl+C.
Fix the paths to eliminate /mnt.
sed -Ei "s|/mnt/?|/|" /etc/zfs/zfs-list.cache/*
Reboot
The Debian install is almost ready for use without the live Debian host environment. Only a few steps remain.
Do a final system update.
apt-get dist-upgrade
Disable log compression since ZFS is already compressing at the block level.
for file in /etc/logrotate.d/* ; do
if grep -Eq "(^|[^#y])compress" "$file" ; then
sed -i -r "s/(^|[^#y])(compress)/\1#\2/" "$file"
fi
done
Install an SSH server so we can login again after rebooting.
apt-get install openssh-server
Set a root password.
passwd
Create a user account.
zfs create rpool/home/ataylor
adduser ataylor
mkdir /etc/skel/.ssh && chmod 700 /etc/skel/.ssh
cp -a /etc/skel/. /home/ataylor/
scp ataylor@lagavulin:/usr/home/ataylor/.ssh/id_rsa.pub /home/ataylor/.ssh/authorized_keys
chown -R ataylor:ataylor /home/ataylor
usermod -a -G audio,cdrom,dip,floppy,netdev,plugdev,sudo,video ataylor
Snapshot the install.
zfs snapshot bpool/BOOT/debian@install
zfs snapshot rpool/ROOT/debian@install
Exit the chroot and unmount all filesystems.
exit
mount | grep -v zfs | tac | awk '/\/mnt/ {print $3}' | xargs -i{} umount -lf {}
zpool export -a
Reboot the computer and remove the USB stick. Installation is complete.
UNIX Userland
Install various no-config-required userland packages before continuing.
apt-get install net-tools bzip2 zip ntp htop xterm screen git \
    build-essential pciutils smartmontools gdb valgrind wget \
    texlive texlive-latex-extra graphviz firefox-esr sysfsutils
X Window Manager
Install X and dwm to ensure all dependencies are met for running my dwm-derived window manager.
apt-get install xorg dwm numlockx
Install dependencies for building my window manager.
apt-get install libx11-dev libxft-dev libxinerama-dev
Copy the Hophib Modern Desktop git repo to the new server. Make the following changes:
hhmd/src/mk.conf: Change the installation prefix from /hh to /home/ataylor/bin.
hhmd/src/window_manager/Makefile: Change library and include paths from /usr/local/... to /usr/...
hhmd/src/window_manager/dwm-status.c: Change #include <sys/time.h> to #include <time.h> and add #define _GNU_SOURCE as well as #define _DEFAULT_SOURCE to the top of the file.
hhmd/src/window_manager/dwm.c: Add #define _POSIX_C_SOURCE 2 to the top of the file.
hhmd/src/window_manager/dwm-watchdog.sh: Change paths and executable names from /hh/... to /home/ataylor/bin/... and from wm to dwm.
Execute make clean install. Verify that dwm, dwm-status, and dwm-watchdog.sh all ended up in /home/ataylor/bin with appropriate permissions. Delete the man pages that were installed in ataylor’s homedir.
Create ~/.xinitrc with the following contents.
/usr/bin/numlockx &
/home/ataylor/bin/dwm-status &
/home/ataylor/bin/dwm-watchdog.sh
Verify X and my window manager start successfully and that dwm-watchdog.sh keeps X and X applications alive during a window manager live restart.
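With the stock xinit setup this just means running startx from a VTY as ataylor.
startx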
VIM
Install gvim, which buster packages as vim-gtk3.
apt-get install vim-gtk3
Create ~/.vimrc with the following contents.
set nocompatible
filetype off
set mouse=r
set number
syntax on
set tabstop=4
set expandtab
"Folding
"http://vim.wikia.com/wiki/Folding_for_plain_text_files_based_on_indentation
"set foldmethod=expr
"set foldexpr=(getline(v:lnum)=~'^$')?-1:((indent(v:lnum)<indent(v:lnum+1))?('>'.indent(v:lnum+1)):indent(v:lnum))
"set foldtext=getline(v:foldstart)
"set fillchars=fold:\ "(there's a space after that \)
"highlight Folded ctermfg=DarkGreen ctermbg=Black
"set foldcolumn=6
" Color the 100th column.
set colorcolumn=100
highlight ColorColumn ctermbg=darkgray
TCSH
Install tcsh.
apt-get install tcsh
Change the default shell for new users by editing /etc/adduser.conf, setting the DSHELL variable to /bin/tcsh. Then use the chsh command to change the shell for root and ataylor. Create ~/.cshrc in ataylor’s and root’s homedir with the following contents. Remember to also copy it to /etc/skel and set permissions so it’s used for any future users on the system.
# .cshrc - csh resource script, read at beginning of execution by each shell
alias h history 25
alias j jobs -l
alias la ls -aF
alias lf ls -FA
alias ll ls -lF --color
alias ls ls --color
# These are normally set through /etc/login.conf. You may override them here
# if wanted.
set path = (/sbin /bin /usr/sbin /usr/bin /usr/local/sbin /usr/local/bin $HOME/bin)
setenv EDITOR vim
setenv PAGER more
if ($?prompt) then
# An interactive shell -- set some stuff up
set prompt = "%N@%m:%~ %# "
set promptchars = "%#"
set filec
set history = 1000
set savehist = (1000 merge)
set autolist = ambiguous
# Use history to aid expansion
set autoexpand
set autorehash
set mail = (/var/mail/$USER)
if ( $?tcsh ) then
bindkey "^W" backward-delete-word
bindkey -k up history-search-backward
bindkey -k down history-search-forward
endif
endif
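For the /etc/skel copy, something along these lines works (paths assume the ataylor account from above).
cp /home/ataylor/.cshrc /etc/skel/.cshrc
chmod 644 /etc/skel/.cshrc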
XScreensaver
Install Xscreensaver and configure screen locking.
apt-get install xscreensaver xscreensaver-data
Run xscreensaver-demo and select some screensavers. If inspiration doesn’t strike, use single-screensaver mode with the abstractile hack; it looks good on pretty much any hardware. Remember to enable screen locking.
Add the following line to ~/.xinitrc.
/bin/xscreensaver -nosplash &
Go Toolchain
The version of Go provided via apt-get is always out of date, so all Go installs on this server are done via tarball from the https://golang.org website. Go 1.16.3 is used for this example, but the newest version of Go may be found at https://golang.org/dl/.
Previous versions of Go are installed entirely under /usr/local/go. Delete the entire /usr/local/go directory before proceeding.
wget https://golang.org/dl/go1.16.3.linux-amd64.tar.gz
tar -C /usr/local -xzf go1.16.3.linux-amd64.tar.gz
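A quick sanity check that the toolchain unpacked correctly:
/usr/local/go/bin/go version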
If this is the first time installing Go on the system, update everyone’s $PATH to include /usr/local/go/bin. Remember to update files under /etc/skel at the same time.
ZFS Snapshots
In order to configure automatic ZFS snapshots, use the zfs-auto-snapshot package.
apt-get install zfs-auto-snapshot
In addition to the snapshot script itself, this package includes automatically enabled cron entries, but it will only snapshot filesystems with the com.sun:auto-snapshot property set to true. Since we already manually set that property to false for /var/cache and /var/tmp, simply set it to true for the two parent pools and allow filesystems to inherit wherever possible.
zfs set com.sun:auto-snapshot=true rpool
zfs set com.sun:auto-snapshot=true bpool
Verify that relevant filesystems inherited the property.
zfs get com.sun:auto-snapshot
After waiting 15+ minutes, verify that snapshots begin to appear.
zfs list -t snapshot
ZFS Scrubs
Automate monthly ZFS scrubs by creating /etc/cron.d/zfs-scrubs with the following contents.
PATH=/etc:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin
0 0 1 * * root /sbin/zpool scrub rpool
0 0 1 * * root /sbin/zpool scrub bpool
Status Updates
In order to receive status updates like failed drive notifications, we must first configure the system to send email through the SGK mail server. Rather than use exim4 as provided by the base system, instead use msmtp.
apt-get install msmtp-mta
Create the file /etc/msmtprc with the following contents.
# Set default values for all following accounts.
defaults
auth on
tls on
tls_trust_file /etc/ssl/certs/ca-certificates.crt
tls_starttls off
# Account: subgeniuskitty
account default
host mail.subgeniuskitty.com
port 465
from ataylor@subgeniuskitty.com
user ataylor@subgeniuskitty.com
password <plaintext-password>
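Since this file contains a plaintext password, restrict it to root.
chmod 600 /etc/msmtprc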
Create the file /etc/cron.d/status-emails with the following contents.
PATH=/etc:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin
SHELL=/bin/bash
0 0 * * 0 root /sbin/zpool status | echo -e "Subject:FROSTBURG: zpool status\n\n $(cat -)" | msmtp ataylor@subgeniuskitty.com
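To confirm delivery works without waiting a week for cron, a one-off test in the same style can be sent by hand.
printf "Subject: FROSTBURG: msmtp test\n\ntest\n" | msmtp ataylor@subgeniuskitty.com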
IRC Environment
IRC is used for collaboration on the server. First install the daemon and client.
apt-get install ngircd irssi
Configure the server by editing /etc/ngircd/ngircd.conf. The defaults are mostly acceptable, but the server must be given a name and restricted to listen only for local connections. While we’re at it, the max nick length is only 9 by default and should be increased. Note that these values need to be inserted under the appropriate categories, as shown below; the categories already exist in the config file.
[Global]
Name = frostburg.subgeniuskitty.com
Info = Frostburg - Private IRC Server
Listen = 127.0.0.1
[Limits]
MaxNickLength = 32
Restart the server and verify it listens on the correct addresses.
# systemctl restart ngircd
# netstat -an | grep LISTEN
tcp 0 0 127.0.0.1:6667 0.0.0.0:* LISTEN
Start a client in screen for each user.
screen -dR irc
irssi
/connect localhost
/join #channel
Public SSH Access
Although frostburg is on a private subnet, I want public SSH access. The easiest way to set this up is via a reverse SSH tunnel to one of the public subgeniuskitty.com servers.
This section refers to three machines:
The server is frostburg.subgeniuskitty.com, a machine which we desire to access across the internet despite residing on a private subnet.
The endpoint is a server with a public IP address which will serve as an access portal for the server.
The client is the human user’s workstation, the machine which is attempting to login to the server via the endpoint.
First, set up appropriate login credentials on the server, which in this case is frostburg.subgeniuskitty.com. Ignore any warnings about /home/username already existing or not being owned by the correct user. These are simply a side effect of using ZFS: we must create the homedir before adding the user, but we can’t change ownership until after the new user exists.
server:~ # zfs create rpool/home/username
server:~ # adduser username
server:~ # cp -a /etc/skel/. /home/username
server:~ # chown -R username:username /home/username
server:~ # zfs snapshot rpool/home/username@account_creation
If necessary for the intended tasks, add the user to any relevant groups with something like the following command.
server:~ # usermod -a -G netdev,plugdev,sudo,video username
The user will also need login credentials on the endpoint. These credentials don’t need to allow anything other than simply SSHing through to the server.
endpoint:~ # adduser username
With appropriate credentials successfully created, move on to setting up a reverse SSH tunnel from server to endpoint.
First, create an SSH key on the server with no passphrase and authorize it for logins on the endpoint. This will be used to bring the tunnel up when the machine boots. If a non-empty passphrase is specified, you will need to type it during the boot process.
server:~ # ssh-keygen
server:~ # scp /root/.ssh/id_rsa.pub username@endpoint:/home/username/temp_key_file
server:~ # ssh username@endpoint
(login requires password)
endpoint:~ % mkdir -p /home/username/.ssh
endpoint:~ % mv /home/username/temp_key_file /home/username/.ssh/authorized_keys
endpoint:~ % logout
server:~ # ssh username@endpoint
(login does not require password)
endpoint:~ % logout
server:~ # mv /root/.ssh/id_rsa /root/.ssh/rtunnel_nopwd
server:~ # mv /root/.ssh/id_rsa.pub /root/.ssh/rtunnel_nopwd.pub
Next, create the tunnel using AutoSSH to maintain a long-term connection.
server:~ # apt-get install autossh
server:~ # vi /etc/systemd/system/autossh-tunnel.service
[Unit]
Description=AutoSSH tunnel between frostburg.SGK and www.SGK
After=network-online.target
[Service]
Environment="AUTOSSH_GATETIME=0"
ExecStart=/bin/autossh -N -M 0 -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" -i /root/.ssh/rtunnel_nopwd -R 4242:localhost:22 username@endpoint
[Install]
WantedBy=multi-user.target
server:~ # systemctl daemon-reload
server:~ # systemctl start autossh-tunnel.service
server:~ # systemctl enable autossh-tunnel.service
At this point the SSH tunnel is operational. Let’s make things a little easier for the user by storing most of the config options in an SSH config file.
endpoint:~ # su - username
endpoint:~ % vi /home/username/.ssh/config
Host server
Hostname localhost
User username
Port 4242
Now, when we execute ssh server, it is equivalent to the command ssh -p 4242 username@localhost, which is much easier to remember.
It’s time to test everything out. Starting from the client, you should now be able to login to the server via the endpoint.
client:~ % ssh username@endpoint
endpoint:~ % ssh server
server:~ %
Xeon Phi Kernel Module
It appears that Linux kernel version 4.19.181 included with Debian 10.9 already has some sort of in-tree kernel support for these Xeon Phi coprocessor cards as seen in the final lines of the following diagnostic output. Also note that the card allocated an 8GB PCIe MMIO region, indicating that the 64-bit BAR setting in the BIOS is working as intended.
root@frostburg:~ # lspci | grep -i Co-processor
02:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 5100 series (rev 11)
root@frostburg:~ # lspci -s 02:00.0 -vv
02:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 5100 series (rev 11)
<snip>
Region 0: Memory at 21c00000000 (64-bit, prefetchable) [size=8G]
<snip>
Kernel driver in use: mic
Kernel modules: mic_host
However, since the Intel manuals are plastered with warnings about using exact, sanctioned combinations of kernel module, MPSS software, and Phi firmware, I decided to avoid the kernel module included with the system and instead attempt porting the kernel module source code included with MPSS onto a newer Linux kernel. Once I have everything operational and understand how it should work, then I can try the open-source driver.
I have updated the Intel kernel driver to work with newer Linux kernels. My work is based upon the kernel source included with MPSS 3.8.6, the latest/last release from Intel. Since the Xeon Phi x100 series is EOL, I don’t think Intel intends to release any more versions of MPSS. Check README.md in my xeon-phi-kernel-module git repo for up-to-date information regarding kernel version compatibility.
Before compiling the kernel module, verify that relevant kernel headers are installed.
% uname -a
Linux frostburg 4.19.0-16-amd64 #1 SMP Debian 4.19.181-1 (2021-03-19) x86_64 GNU/Linux
% dpkg -l | grep linux-header
ii linux-headers-4.19.0-16-amd64 4.19.181-1 amd64 Header files for Linux 4.19.0-16-amd64
ii linux-headers-4.19.0-16-common 4.19.181-1 all Common header files for Linux 4.19.0-16
ii linux-headers-amd64 4.19+105+deb10u11 amd64 Header files for Linux amd64 configuration (meta-package)
Download and compile my updated version of the Intel kernel driver. Sample compilation output is included below.
% git clone git://git.subgeniuskitty.com/xeon-phi-kernel-module/
% cd xeon-phi-kernel-module/
% make clean all
make -C /lib/modules/4.19.0-16-amd64/build M=xeon-phi-kernel-module modules \
INSTALL_MOD_PATH=
make[1]: Entering directory '/usr/src/linux-headers-4.19.0-16-amd64'
CC [M] xeon-phi-kernel-module/dma/mic_dma_lib.o
CC [M] xeon-phi-kernel-module/dma/mic_dma_md.o
CC [M] xeon-phi-kernel-module/host/acptboot.o
CC [M] xeon-phi-kernel-module/host/ioctl.o
CC [M] xeon-phi-kernel-module/host/linpm.o
CC [M] xeon-phi-kernel-module/host/linpsmi.o
CC [M] xeon-phi-kernel-module/host/linscif_host.o
CC [M] xeon-phi-kernel-module/host/linsysfs.o
CC [M] xeon-phi-kernel-module/host/linux.o
CC [M] xeon-phi-kernel-module/host/linvcons.o
CC [M] xeon-phi-kernel-module/host/linvnet.o
CC [M] xeon-phi-kernel-module/host/micpsmi.o
CC [M] xeon-phi-kernel-module/host/micscif_pm.o
CC [M] xeon-phi-kernel-module/host/pm_ioctl.o
CC [M] xeon-phi-kernel-module/host/pm_pcstate.o
CC [M] xeon-phi-kernel-module/host/tools_support.o
CC [M] xeon-phi-kernel-module/host/uos_download.o
CC [M] xeon-phi-kernel-module/host/vhost/mic_vhost.o
CC [M] xeon-phi-kernel-module/host/vhost/mic_blk.o
CC [M] xeon-phi-kernel-module/host/vmcore.o
CC [M] xeon-phi-kernel-module/micscif/micscif_api.o
CC [M] xeon-phi-kernel-module/micscif/micscif_debug.o
CC [M] xeon-phi-kernel-module/micscif/micscif_fd.o
CC [M] xeon-phi-kernel-module/micscif/micscif_intr.o
CC [M] xeon-phi-kernel-module/micscif/micscif_nm.o
CC [M] xeon-phi-kernel-module/micscif/micscif_nodeqp.o
CC [M] xeon-phi-kernel-module/micscif/micscif_ports.o
CC [M] xeon-phi-kernel-module/micscif/micscif_rb.o
CC [M] xeon-phi-kernel-module/micscif/micscif_rma_dma.o
CC [M] xeon-phi-kernel-module/micscif/micscif_rma_list.o
CC [M] xeon-phi-kernel-module/micscif/micscif_rma.o
CC [M] xeon-phi-kernel-module/micscif/micscif_select.o
CC [M] xeon-phi-kernel-module/micscif/micscif_smpt.o
CC [M] xeon-phi-kernel-module/micscif/micscif_sysfs.o
CC [M] xeon-phi-kernel-module/micscif/micscif_va_gen.o
CC [M] xeon-phi-kernel-module/micscif/micscif_va_node.o
CC [M] xeon-phi-kernel-module/vnet/micveth_dma.o
CC [M] xeon-phi-kernel-module/vnet/micveth_param.o
LD [M] xeon-phi-kernel-module/mic.o
Building modules, stage 2.
MODPOST 1 modules
CC xeon-phi-kernel-module/mic.mod.o
LD [M] xeon-phi-kernel-module/mic.ko
make[1]: Leaving directory '/usr/src/linux-headers-4.19.0-16-amd64'
At this point you can manually load/install the new kernel module (mic.ko), which is found in the current directory, or execute make install. The latter command also installs the SCIF header file, as well as putting some config files under /usr/local/etc/. The information in those config files won’t be picked up by the system (we will install configs in the correct location in a moment), but it is useful as a reference. Sample make install output is shown below.
# make install
make -C /lib/modules/4.19.0-16-amd64/build M=/home/ataylor/xeon-phi-kernel-module modules_install \
INSTALL_MOD_PATH=
make[1]: Entering directory '/usr/src/linux-headers-4.19.0-16-amd64'
INSTALL /home/ataylor/xeon-phi-kernel-module/mic.ko
DEPMOD 4.19.0-16-amd64
Warning: modules_install: missing 'System.map' file. Skipping depmod.
make[1]: Leaving directory '/usr/src/linux-headers-4.19.0-16-amd64'
install -d /usr/local/etc/sysconfig/modules
install mic.modules /usr/local/etc/sysconfig/modules
install -d /usr/local/etc/modprobe.d
install -m644 mic.conf /usr/local/etc/modprobe.d
install -d /usr/local/etc/udev/rules.d
install -m644 udev-mic.rules /usr/local/etc/udev/rules.d/50-udev-mic.rules
install -d /lib/modules/4.19.0-16-amd64
install -m644 Module.symvers /lib/modules/4.19.0-16-amd64/scif.symvers
install -d /usr/src/linux-headers-4.19.0-16-amd64/include/modules
install -m644 include/scif.h /usr/src/linux-headers-4.19.0-16-amd64/include/modules
Create the file /etc/modprobe.d/mic.conf with the following contents, intended to accomplish two things. First, blacklist the in-tree MIC kernel module that shipped with our kernel, including all associated modules, and second, configure the Intel MIC kernel module which we just built and installed. The options shown are drawn from the defaults in /usr/local/etc/modprobe.d/mic.conf.
# Blacklist the in-tree kernel modules associated with the Knights Corner Xeon
# Phi so that we can load the Intel kernel module.
# These two modules depend on the various bus modules that follow.
blacklist mic_host
blacklist mic_x100_dma
blacklist cosm_bus
blacklist vop_bus
blacklist scif_bus
blacklist mic_bus
# ^^^------ Blacklisting the in-tree MIC kernel module.
# ==============================================================================
# vvv------ Configuring the Intel MIC kernel module.
# The following options apply to the Intel Many Integrated Core (MIC) driver.
# Unless otherwise noted, the value "1" enables the feature and "0" disables
# it.
#
# Option: p2p
# Description: Enables use of SCIF interface peer to peer communication.
#
# Option: p2p_proxy
# Description: Enables use of SCIF P2P Proxy DMA which converts DMA
# reads into DMA writes for performance on certain Intel
# platforms.
#
# Option: reg_cache
# Description: Enables SCIF Registration Caching.
#
# Option: huge_page
# Description: Enables SCIF Huge Page Support.
#
# Option: watchdog
# Description: Enables SCIF watchdog for Lost Node detection.
#
# Option: watchdog_auto_reboot
# Description: Configures behavior of MIC host driver upon detection of a lost
# node. This option is a nop if watchdog=0. Setting value "1"
# allows host driver to reboot node back to "online" state,
# whereas value "0" only allows the host driver to reset the node
# back to "ready" state, leaving the user responsible for rebooting
# the node (or not).
#
# Option: crash_dump
# Description: Enables uOS Kernel Crash Dump Captures.
#
# Option: ulimit
# Description: Enables ulimit checks on max locked memory for scif_register.
#
options mic reg_cache=1 huge_page=1 watchdog=1 watchdog_auto_reboot=1 crash_dump=1 p2p=1 p2p_proxy=1 ulimit=0
options mic_host reg_cache=1 huge_page=1 watchdog=1 watchdog_auto_reboot=1 crash_dump=1 p2p=1 p2p_proxy=1 ulimit=0
Finally, add the line mic to the file /etc/modules-load.d/modules.conf, instructing the system to load this kernel module on boot, then run depmod to ensure the system is aware of the new kernel module, followed by a reboot to verify everything works.
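In concrete terms, something like the following.
echo mic >> /etc/modules-load.d/modules.conf
depmod -a
reboot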
After the system comes back up, verify that the module loaded with your desired options using the systool command; sample output is below.
# systool -v -m mic
Module = "mic"
Attributes:
coresize = "741376"
initsize = "0"
initstate = "live"
refcnt = "0"
taint = "OE"
uevent = <store method only>
Parameters:
crash_dump = "Y"
huge_page = "Y"
msi = "Y"
p2p_proxy = "Y"
p2p = "Y"
pm_qos_cpu_dma_lat = "-1"
psmi = "N"
ramoops_count = "4"
reg_cache = "Y"
ulimit = "N"
vnet = "dma"
vnet_addr = "0"
vnet_num_buffers = "62"
watchdog_auto_reboot= "Y"
watchdog = "Y"
Sections:
<snip>