UMX virtual machine @ macos9
Intro
This is a follow-up to the First Computer ‘Challenge’ post. At the end of December I thought it would be fun to program something small as a recreational activity on a computer from the period when I had my first PC; inspired by jcs, I waited for the Clamshell iBook and thought about what would be something unusual to make.
After my 2021 AoC journey I became a tiny bit less scared of approaching ICFPC 2006, so my choice of task to tackle was made. Working through the actual puzzles still seems quite complex and time-consuming, but implementing the virtual machine described there seemed like fun; also, making an interpreter for one of the AoC 2021 days was surprisingly nice, so there is that.
Hardware
The clamshell iBook I got is in very good condition; I managed to “rock” the 20-year-old battery so it actually holds a charge for an hour or two (depending on load), and the 12.1" 800x600 screen is still quite comfortable to work with. We have long New Year holidays here and I got out to chill in a small village up north; it was actually quite a good experience in the absence of a bunch of everyday distractions.
It has an IBM PowerPC 750cx (G3) CPU clocked at 366MHz - the same 750 series that is being used on Mars. USB 1.1, 100mbit ethernet and a 24X CD-ROM proved to be invaluable for transferring data in and out, for I do not have an AirPort card for this model, and even if I had one it would not support modern wireless network standards.
One of the things this laptop has tenfold better than modern macbooks is a trackpad. It is tiny and it is usable; I would gladly give up 2 finger scroll on my M1 to have this cursed touch thing at least two times smaller.
Another killer-feature is the innate handle. For a laptop of this weight (3 kg) and size having something to carry it with is awesome. I wish modern “heavyweights” had something like that.
Some time later I swapped the HDD for a CF card via an adapter; the pain of replacing the hard drive is substantial, as you have to take the whole laptop apart, including the display/lid.
But despite that, the laptop is absolutely amazing repair-wise: everything is modular, there is no “heat”-based adhesive anywhere, including the battery - yes, you can literally take the battery module apart without breaking it, and it even has plain 18650 li-ion cells inside! Needless to say I am super impressed by the whole thing, especially in comparison to modern Apple hardware that has everything glued or soldered down.
Some things were quite commonly soldered back then as well though, namely RAM: this machine has 64 MB on the motherboard and another 64 MB as a SODIMM module. Somewhere in this century I was promised a 128 MB module for an upgrade, but since it was sent via regular mail I bear no hope.
Programming Environment
Brief research showed that the most popular and effective programming environment for MacOS 9 was the Metrowerks CodeWarrior suite, which supports C, C++ and some more languages. The last version available for the “classic” Mac OS is CodeWarrior Pro 8.3; after that, even with PowerPC architecture support, they moved on to MacOS X.
Even though it was released in 2002, the C99 standard wasn’t fully supported by the MW compiler, and some #pragma tricks are needed to make things less decrepit.
Overall, apart from being unable to declare variables in control structures (most prominently for loops), I was not hindered much. I had to dig through the official compiler documentation in search of settings that weren’t available in the common project/language settings, particularly #pragma gcc_extensions on - it unlocked some useful improvements, namely initialization of automatic struct variables with non-const values, void * arithmetic and so on.
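As a rough illustration of these constraints, here is a minimal sketch. It assumes the pragma is guarded with __MWERKS__ (CodeWarrior's predefined macro) so that other compilers skip it; the struct and function names are hypothetical and not taken from the actual project.

```c
#include <assert.h>
#include <stddef.h>

/* Enable the extensions only under CodeWarrior; other compilers skip this. */
#ifdef __MWERKS__
#pragma gcc_extensions on
#endif

/* Hypothetical struct, for illustration only. */
struct span {
    unsigned char *start;
    unsigned long  length;
};

/* Sum the bytes of a buffer passed as void *. */
static unsigned long sum_bytes(void *buf, unsigned long len)
{
    /* Automatic struct initialized with non-constant values. */
    struct span s = { (unsigned char *)buf, len };
    unsigned long i;            /* declared up front, not inside the for statement */
    unsigned long total = 0;

    assert(buf != NULL);
    for (i = 0; i < s.length; i++) {
        total += s.start[i];
    }
    return total;
}

/* Step over `offset` bytes. With the extensions on, plain `buf + offset`
   on a void * is reportedly accepted too; the cast keeps this portable. */
static void *skip_bytes(void *buf, unsigned long offset)
{
    return (unsigned char *)buf + offset;
}
```

For example, sum_bytes(skip_bytes(data, 1), 3) adds up the three bytes that follow the first one in a four-byte buffer.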
There is no terminal, no Unix pipes and not even preemptive multitasking in MacOS9 - some of my unsuccessful filesystem interaction attempts or accidental infinite loops could (and did) completely hang the system! The last time I saw such a fragile OS was in high school programming classes; we had Windows 98 and Turbo Pascal and stuff like that, but most importantly it was possible to kill the whole system with a short piece of inline assembly:
cli
@:
jmp @
That rendered the whole operating system completely unusable, and nothing but a hard reset would help; the teachers weren’t very knowledgeable in the topic, so it felt fun to mess with things like that.
Version Control
I wanted to do some proper version control - not only locally, but saving stuff remotely as well. CW8 has some built-in support for CVS but it requires external plugins and programs. I started tinkering with a cvs server on my “base” iMac (though in retrospect I should’ve just done that on any OpenBSD machine), but then, on an unrelated occasion, one of the #cyberpals IRC channel denizens, @jjuran, suggested an absolutely awesome and handy program to me: MacRelix (of his authorship no less). It is a Unix-like environment that runs on classic Macs, and among the commands and utilities included it has an old but totally functional Git!
So the repository for my UMIX implementation was fully formed and worked on from that old Mac; I used an intermediate repository shared r/w via the standard git:// protocol on my Mac and pushed it afterwards to the “public” origin. An important detail is that since the repository was created with git v1.x, any commit from v2.x can instantly and irreversibly break it for the older version[1].
Quirks
One of the first peculiarities I was reminded of is byte order, aka endianness. Macppc is big-endian, while most modern architectures (ARM64, AMD64) are little-endian by default[2]; the virtual machine data is essentially big-endian and needs to be processed accordingly on the aforementioned platforms.
Big-endian is also called “network byte order”, hence the naming of the handy POSIX functions ntohl/ntohs/htonl/htons (“network-to-host-long”, “host-to-network-long” etc.). Since they’re POSIX, the CodeWarrior compiler does not support them - that and the whole UNIX deal came to Macs somewhat later, with the arrival of OS X. I honestly didn’t bother with endianness at all while writing and trying out the program, and was met with “broken” behavior when I first tried to run the VM on my desktop. Ideally ntohl() does nothing if the host is already big-endian, but since these functions are not available at all I simply wrapped their usage into a compiler-dependent macro[3].
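A wrapper of that kind might look roughly like the sketch below. The macro name UM_NTOHL and the helper functions are hypothetical, not taken from the repository; __MWERKS__ is CodeWarrior's predefined macro, and the other branch assumes a POSIX host that provides <arpa/inet.h>. The sketch also includes a runtime endianness probe and a byte-wise loader that does not depend on the host byte order at all.

```c
#include <assert.h>
#include <string.h>

/* Compiler-dependent wrapper: CodeWarrior here targets big-endian PowerPC,
   so the conversion is a no-op; elsewhere defer to POSIX ntohl(). */
#ifdef __MWERKS__
#define UM_NTOHL(x) (x)
#else
#include <arpa/inet.h>
#define UM_NTOHL(x) ntohl(x)
#endif

/* Runtime probe: returns 1 on a big-endian host, 0 on a little-endian one. */
static int host_is_big_endian(void)
{
    unsigned long probe = 1;
    /* On a little-endian host the least significant byte is stored first. */
    return *(unsigned char *)&probe == 0;
}

/* Assemble a big-endian 32-bit word byte by byte; works on any host. */
static unsigned long load_be32(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) |
           ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] <<  8) |
            (unsigned long)p[3];
}

/* Same result via the macro: copy the raw bytes into an integer, then convert. */
static unsigned long load_via_macro(const unsigned char *p)
{
    unsigned int raw;   /* assumed to be 32 bits wide on the hosts in question */

    assert(sizeof raw == 4);
    memcpy(&raw, p, sizeof raw);
    return (unsigned long)UM_NTOHL(raw);
}
```

The byte-wise loader sidesteps the question of which compiler or architecture is in use, at the cost of a few shifts per word; the macro keeps the code closer to the usual ntohl() idiom.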
The biggest hurdle I faced, though, ended up being quite simple in nature but still caught me by surprise. OS-9 uses \r line endings for text files; modern Unix and derivatives go with \n, and Windows (supposedly) uses \r\n. I accounted for that when I started using a repository, keeping Unix EOL by default in the source code (thankfully the CW IDE supports that properly). None of that raised any flags in my mind about input file processing though, and at some point I stumbled upon what I thought at the moment was one of the strangest bugs: memory corruption.
Once the instructions were complete I tried some of them on OS-9 and they worked just fine; even quick tests on the “big” OpenBSD/Mac machines went alright. But sandmark.umz (the VM implementation validator/benchmark) kept dying on me on OS9 while being completely fine on every other machine I have. My first thought was that I had somehow corrupted the heap with overlapping chunks or badly formed pointer arithmetic, but running the whole deal with memory sanitizers didn’t reveal anything of the sort[4]. Here is a tiny chunk of the data as seen with xxd -b:
00000012: 00000000 00110101 11010000 00000000 00000000 00001101 .5....
And that’s what I got upon bytecode execution:
00000000 00110101 11010000 00000000 00000000 00001010
Notice how the last 1101 morphs into 1010; that was before I tinkered with the editor and line endings, so honestly I didn’t think of that at this point. I rechecked the file for corruption, rechecked my interpreter, rechecked the byte order twice, but couldn’t figure it out. I was knee-deep in per-bit processing for the instruction/operand parsing and I guess that framed my thought process a lot.
Several days later I showed my code to the awesome lads on #lobsters-advent and (I think) on a completely unrelated note got tipped about EOL, and it suddenly clicked: I only saw the “corrupted” data as 1101->1010 or 0xD->0xA but completely forgot that those are the codes for \r and \n! Instantly it all came together: I opened the file with fopen(filename, "r"), and since the default mode for fopen() is text, it replaces what it takes to be \r with \n on reading. And the bug didn’t reproduce on any other machine because those are all Unix or derivatives, where the line ending is \n already.
In the end the whole problem was solved by changing the mode argument of fopen() to "rb" (read, binary). No more “corruption”, yay :)
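A minimal sketch of the binary-mode read, assuming hypothetical helper names (read_binary and write_binary are for illustration only, not the code from the repository). On Unix-like systems "r" and "rb" behave the same, which is why the difference only shows up on platforms whose C runtime translates line endings in text mode.

```c
#include <assert.h>
#include <stdio.h>

/* Read up to `cap` bytes of `path` in binary mode.
   Returns the number of bytes read, or 0 if the file cannot be opened. */
static size_t read_binary(const char *path, unsigned char *out, size_t cap)
{
    FILE *f;
    size_t n;

    assert(path != NULL && out != NULL);
    f = fopen(path, "rb");      /* "rb": no line-ending translation */
    if (f == NULL) {
        return 0;
    }
    n = fread(out, 1, cap, f);
    fclose(f);
    return n;
}

/* Write `len` bytes to `path` in binary mode; returns the number written. */
static size_t write_binary(const char *path, const unsigned char *data, size_t len)
{
    FILE *f;
    size_t n;

    assert(path != NULL && data != NULL);
    f = fopen(path, "wb");      /* "wb": bytes are stored exactly as given */
    if (f == NULL) {
        return 0;
    }
    n = fwrite(data, 1, len, f);
    fclose(f);
    return n;
}
```

Writing the six bytes from the dump above and reading them back with these helpers should leave the trailing 0x0D untouched.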
Another interesting part of OS9 is the executable file format and memory layout; setting up the stack size is fairly common nowadays, but with such an old system it is necessary to set up the heap size as well. Quite an experience, as I had to try out several values just to run the VM at all with different input files.
Result
Overall I had tons of fun, and even figuring out the bugs was amusing. Working outside of the usual “comfort zone” was, eh, for lack of a better word - “enlightening” :D Since the machine is orders of magnitude slower than any of my other computers, it is quite necessary to “scale down” the thought process in general and, roughly speaking, adjust to kilobytes instead of megabytes. Another thing I realized is how much I got used to the standard Unix environment with pipes and POSIX functions; it is quite difficult to wrap one’s head around a completely different operating system approach.
I will most certainly try to implement some more stuff there!
Source code: Repository
[1] I’ve briefly tried to look for any options for git commit/init that would allow v2 to be backward compatible but sadly found nothing.
[2] “By default” means that it is possible to switch the CPU mode on architectures such as ARM64, Power, SPARC and some more; interestingly enough, x86-64 is LE-only.
[3] Looking back, it is better to properly validate not only the compiler but the architecture as well. Checking the endianness at runtime/startup is quite simple too.
[4] I spent a couple of evenings trying to debug that on OS9, but staring at endless walls of binary and hexadecimal numbers wasn’t fruitful after all, so I shamefully went to use modern clang/lldb on modern computers.