---
title: Dissecting m8trix
date: 2024-07-31
---
[m8trix](https://www.pouet.net/prod.php?which=63126)
by
[HellMood](https://www.pouet.net/user.php?who=97586)
is one of my favorite demos.
It packs a pretty cool Matrix-style effect in only 8 bytes:
*(Animated GIF of the effect; epilepsy warning.)*
```
C:\M8TRIX>debug M8TRIX.COM
-U ; disassemble the beginning of the program
073D:0100 C41C LES BX,[SI]
073D:0102 9F LAHF
073D:0103 AB STOSW
073D:0104 47 INC DI
073D:0105 47 INC DI
073D:0106 EBF9 JMP 0101
-U 101 ; disassemble the loop body
073D:0101 1C9F SBB AL,9F
073D:0103 AB STOSW
073D:0104 47 INC DI
073D:0105 47 INC DI
073D:0106 EBF9 JMP 0101
-R ; look at the registers
AX=FFFF BX=0000 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0000 DI=0000
DS=073D ES=073D SS=073D CS=073D IP=0100 NV UP EI PL ZR NA PE NC
073D:0100 C41C LES BX,[SI] DS:0000=20CD
```
Let's step through this.
### LES BX,[SI]
```
AX=FFFF BX=0000 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0000 DI=0000
DS=073D ES=073D SS=073D CS=073D IP=0100 NV UP EI PL ZR NA PE NC
073D:0100 C41C LES BX,[SI] DS:0000=20CD
-T ; single step and show register state
AX=FFFF BX=20CD CX=0008 DX=0000 SP=FFFE BP=0000 SI=0000 DI=0000
DS=073D ES=9FFF SS=073D CS=073D IP=0102 NV UP EI PL ZR NA PE NC
```
`LES` loads a far pointer from memory.
The first two bytes of `[SI]` will be loaded into `BX`,
and the next two bytes will be loaded into `ES`.
We're implicitly using the `DS` segment here, which is where DOS loaded our program into.
To be more exact --
our program was loaded into `DS:0100`,
whereas `DS:0000` (which `[SI]` points at) contains the
[Program Segment Prefix](https://en.wikipedia.org/wiki/Program_Segment_Prefix).
Let's take a look at it:
```
-d 0000
073D:0000 CD 20 FF 9F 00 EA FF FF-AD DE BD 1D 94 01 00 00 . ..............
[...]
-u 0000
073D:0000 CD20 INT 20
073D:0002 FF9F00EA CALL FAR [BX+EA00]
[...]
```
The first two bytes always contain `INT 20`, the instruction that quits your program.
This means that you can quit your program by jumping to `CS:0000` (`CS` = `DS` = `SS`).
DOS also ensures that the word on top of the stack is `0000`, so you can quit with a `RET`.
Nifty.
It also means that `BX` will always be set to `20CD`, but we don't really care about that.
The next two bytes hold the segment just past the memory DOS allocated to our program, which for a `.COM` file is effectively the top of conventional memory.
So, by loading them into `ES`, we make it point to the end of our program's memory.
On most systems that will be `9FFF`.
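To make the byte order concrete, here is a minimal Python sketch of what `LES BX,[SI]` does with the first four PSP bytes from the dump above.
It only models the memory read, and `les` is a hypothetical helper written for illustration, not part of any emulator.

```python
import struct
from typing import Tuple


def les(memory: bytes, offset: int) -> Tuple[int, int]:
    """Model LES BX,[offset]: load a little-endian far pointer.

    The first word (the offset half) ends up in BX and the second word
    (the segment half) ends up in ES.
    """
    bx, es = struct.unpack_from("<HH", memory, offset)
    return bx, es


# First four bytes of the PSP, copied from the DEBUG dump: CD 20 FF 9F.
psp = bytes([0xCD, 0x20, 0xFF, 0x9F])
bx, es = les(psp, 0)
print(f"BX={bx:04X} ES={es:04X}")  # BX=20CD ES=9FFF
```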
This is very convenient, as the
[mode 13](https://en.wikipedia.org/wiki/Mode_13h)
framebuffer begins at `A0000`, or `9FFF:0010`.
This is a
[well known sizecoding trick](http://www.sizecoding.org/wiki/General_Coding_Tricks#A_smaller_way_to_point_to_Mode_13.27s_screen_segment).
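The arithmetic behind this is the usual real-mode rule, linear address = segment * 16 + offset.
A quick Python check, with `linear` as an illustrative helper and the mode 13 framebuffer assumed to be the standard 320x200 bytes:

```python
def linear(segment: int, offset: int) -> int:
    """Real-mode address translation: segment * 16 + offset."""
    return (segment << 4) + offset


MODE13_FRAMEBUFFER = 0xA0000
MODE13_SIZE = 320 * 200  # one byte per pixel

print(f"{linear(0x9FFF, 0x0010):05X}")  # A0000
# ES=9FFF reaches up to linear address AFFEF (ES:FFFF) ...
print(f"{linear(0x9FFF, 0xFFFF):05X}")  # AFFEF
# ... so the whole framebuffer (A0000-AF9FF) is addressable through it.
print(MODE13_FRAMEBUFFER + MODE13_SIZE - 1 <= linear(0x9FFF, 0xFFFF))  # True
```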
...except mode 13 is a graphics mode.
We're in mode 3[^mode3],
a text mode,
and the text buffer is located at `B800`,
completely out of reach of `ES`.
What?
[^mode3]: [`MOV AH, 0F; INT 10`](https://en.wikipedia.org/wiki/INT_10H), and look at the registers. `AL` is the current mode.
Well, DEBUG fooled us.
When you start a program under DOS, `SI=0100`.
[Usually.](https://www.fysnet.net/yourhelp.htm)
However, for whatever reason, DEBUG zeroes it out instead.
You can fix it by running `RSI 0100`[^rsi] before the first instruction.
This is also why the page I've linked to uses `[BX]`, as you can count on it actually being zero.
[^rsi]: No, `RSI` doesn't stand for the 64-bit register. `R` is the register command, which accepts `SI` as the argument.
But let's get back to m8trix.
If `SI=0100`, then `[SI]` points to the beginning of our program!
```
-RSI 0100
-R
AX=FFFF BX=0000 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
DS=073D ES=073D SS=073D CS=073D IP=0100 NV UP EI PL ZR NA PE NC
073D:0100 C41C LES BX,[SI] DS:0100=1CC4
-T
AX=FFFF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
DS=073D ES=AB9F SS=073D CS=073D IP=0102 NV UP EI PL ZR NA PE NC
073D:0102 9F LAHF
-U 100
073D:0100 C41C LES BX,[SI]
073D:0102 9F LAHF
073D:0103 AB STOSW
```
As you can see, this means that `BX=1CC4` (the `LES` instruction itself), and `ES=AB9F`.
So `ES` now spans `AB9F0-BB9EF`, which includes the entire visible text screen (`B8000-B8F9F`)!
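Applying the real-mode rule (linear address = segment * 16 + offset) to both candidate values of `ES` shows the difference.
This is a small Python sketch; it assumes the standard 80x25 mode 3 page of 4000 bytes at `B8000`, and `linear` and `covers_screen` are illustrative helpers:

```python
def linear(segment: int, offset: int) -> int:
    """Real-mode address translation: segment * 16 + offset."""
    return (segment << 4) + offset


TEXT_START = 0xB8000                     # top-left character cell in mode 3
TEXT_END = TEXT_START + 80 * 25 * 2 - 1  # last byte of the visible 80x25 page


def covers_screen(es: int) -> bool:
    """True if ES:0000..ES:FFFF includes the whole visible text page."""
    return linear(es, 0x0000) <= TEXT_START and TEXT_END <= linear(es, 0xFFFF)


for es in (0x9FFF, 0xAB9F):  # ES from the PSP vs. ES from our own code bytes
    lo, hi = linear(es, 0x0000), linear(es, 0xFFFF)
    print(f"ES={es:04X}: {lo:05X}-{hi:05X}, covers screen: {covers_screen(es)}")

# Offset inside ES=AB9F at which the top-left character lives.
print(f"{TEXT_START - linear(0xAB9F, 0):04X}")  # C610
```

In other words, a `STOSW` with `DI=C610` lands in the top-left corner of the screen.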
### LAHF
```
AX=FFFF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
DS=073D ES=AB9F SS=073D CS=073D IP=0102 NV UP EI PL ZR NA PE NC
073D:0102 9F LAHF
-t
AX=46FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI PL ZR NA PE NC
```
`LAHF` is pretty straightforward: it loads the low byte of `FLAGS` (the sign, zero, auxiliary carry, parity and carry flags) into `AH`.
Except, once again, `DEBUG` doesn't initialize the `FLAGS` register the way DOS does.
If we were to run m8trix
[outside of DEBUG](https://www.fysnet.net/yourhelp.htm),
the low byte of `FLAGS` would be `02` (just the always-set reserved bit 1), and thus this instruction would set `AH=02`.
This can be fixed in the debugger by running `RAX 02FF`.
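To see where DEBUG's `46` comes from: at this point the register line shows `ZR` and `PE`, i.e. the zero and parity flags are set.
A minimal Python sketch of the bit layout that `LAHF` copies (`lahf` is a hypothetical helper; the bit positions are the standard x86 ones):

```python
# Bit positions within the low byte of FLAGS, which is what LAHF copies to AH.
CF, PF, AF, ZF, SF = 0, 2, 4, 6, 7
RESERVED_BIT1 = 1 << 1  # bit 1 of FLAGS always reads as 1


def lahf(*set_flags: int) -> int:
    """Return the value LAHF would load into AH, given which flag bits are set."""
    ah = RESERVED_BIT1
    for bit in set_flags:
        ah |= 1 << bit
    return ah


print(f"{lahf(ZF, PF):02X}")  # 46: DEBUG's initial flags (ZR, PE)
print(f"{lahf():02X}")        # 02: all five flags clear
```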
### STOSW
```
-rax 02FF
-r
AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI PL ZR NA PE NC
073D:0103 AB STOSW
-t
AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0002
DS=073D ES=AB9F SS=073D CS=073D IP=0104 NV UP EI PL ZR NA PE NC
```
`STOSW`, "store (word) string", is a bit more complex.
It writes the word in `AX` to `ES:DI` and increments[^df] `DI` by two,
the number of bytes written.
This instruction will be run over and over again, with `DI` climbing
until it wraps around to `0000` every once in a while,
so it sweeps across all of `ES`, text buffer included, again and again.
Each character in the text buffer is represented by a word:
the low byte (`AL`, stored first) is the character code and the high byte (`AH`) is its color attribute,
so each `STOSW` writes a complete character to the screen.
`AH=02` sets the color to dark green,
and `AL` (which changes each iteration) chooses the character.
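A minimal Python model of `STOSW` with the direction flag clear, treating the 64 KiB addressed by `ES` as a `bytearray` (that representation and the `stosw` helper are assumptions of this sketch).
It shows the little-endian store putting the character (`AL`) first and the attribute (`AH`) second:

```python
def stosw(es: bytearray, di: int, ax: int) -> int:
    """Model STOSW with DF=0: store AX at ES:DI, return the new DI.

    x86 is little-endian, so AL (the character) lands at DI and
    AH (the colour attribute) at DI+1. DI is always even in m8trix,
    so the word never straddles the end of the segment.
    """
    es[di:di + 2] = ax.to_bytes(2, "little")
    return (di + 2) & 0xFFFF  # DI is a 16-bit register and wraps around


es = bytearray(0x10000)  # stand-in for the 64 KiB segment ES points at
di = stosw(es, 0x0000, 0x02FF)
print(es[0:2].hex(), f"DI={di:04X}")  # ff02 DI=0002
```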
[^df]: If the direction flag was set, it would instead decrement it.
### skipping a column, misaligned jump
```
AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0002
DS=073D ES=AB9F SS=073D CS=073D IP=0104 NV UP EI PL ZR NA PE NC
073D:0104 47 INC DI
-t
AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0003
DS=073D ES=AB9F SS=073D CS=073D IP=0105 NV UP EI PL NZ NA PE NC
073D:0105 47 INC DI
-t
AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004
DS=073D ES=AB9F SS=073D CS=073D IP=0106 NV UP EI PL NZ NA PO NC
073D:0106 EBF9 JMP 0101
-t
AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004
DS=073D ES=AB9F SS=073D CS=073D IP=0101 NV UP EI PL NZ NA PO NC
073D:0101 1C9F SBB AL,9F
```
We don't want the columns to be packed too tightly together,
so we skip every other character by adding two more to `DI`.
We then jump to `0101`, into the middle of the `LES` instruction,
where the bytes `1C 9F` decode as a hidden `SBB AL,9F`.
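With the screen starting at offset `C610` inside `ES=AB9F` (that is, `B8000 - AB9F0`), the column skipping is easy to check in a Python sketch: since `DI` only ever advances in steps of four, every write lands on an even column.
`screen_pos` is a hypothetical helper that assumes the 80x25 mode 3 layout:

```python
TOP_LEFT = 0xC610  # B800:0000 expressed as an offset into ES=AB9F
COLUMNS, ROWS = 80, 25


def screen_pos(di: int):
    """Map an offset into ES=AB9F to (row, column), or None if it isn't a character cell."""
    cell, odd = divmod((di - TOP_LEFT) & 0xFFFF, 2)
    if odd or cell >= COLUMNS * ROWS:
        return None  # an attribute byte, or outside the visible page
    return divmod(cell, COLUMNS)


# Each loop iteration advances DI by 4: +2 from STOSW, +1 from each INC DI.
written = {screen_pos(di) for di in range(0, 0x10000, 4)} - {None}
print(len(written), all(col % 2 == 0 for _, col in written))  # 1000 True
```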
### misaligned jump, SBB
```
AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004
DS=073D ES=AB9F SS=073D CS=073D IP=0101 NV UP EI PL NZ NA PO NC
073D:0101 1C9F SBB AL,9F
-t
AX=0260 BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004
DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI PL NZ NA PE NC
```
This is the last instruction, and it's the one that modifies `AL` to animate the character.
It subtracts `9F` from `AL` with borrow, which is pretty much the grade-school approach.
That is --
if it underflows,
it will "borrow" a bit from the next byte by setting the carry flag.
The next `SBB` will see that the carry flag is set,
subtract an additional `1`,
and unset the carry flag (unless it also underflowed).
Let's see that in practice:
```
-rax 028F
-r
AX=028F BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0008
DS=073D ES=AB9F SS=073D CS=073D IP=0101 NV UP EI PL NZ NA PO NC
073D:0101 1C9F SBB AL,9F
-t
AX=02F0 BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0008
DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI NG NZ NA PE CY
-rip 0101 ; i don't care about the rest of the loop, just run the SBB again
-t
AX=0250 BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0008
DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI PL NZ AC PE NC
```
Notice how the second `SBB` subtracted `A0` instead of `9F` because of the carry flag.
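The same behaviour fits in a few lines of Python.
This is a sketch: `sbb8` is a hypothetical helper that ignores every flag except carry, checked against the three `SBB AL,9F` steps traced in DEBUG above.

```python
def sbb8(al: int, operand: int, cf: int):
    """Model 8-bit SBB: AL - operand - CF, returning (new AL, new CF).

    CF comes out set exactly when the true result dropped below zero,
    i.e. when the subtraction had to borrow.
    """
    result = al - operand - cf
    return result & 0xFF, int(result < 0)


# (AL, CF) before each traced SBB AL,9F, and the (AL, CF) DEBUG showed after it.
traced = [((0xFF, 0), (0x60, 0)), ((0x8F, 0), (0xF0, 1)), ((0xF0, 1), (0x50, 0))]
for (al, cf), expected in traced:
    print(f"AL={al:02X} CF={cf}: {sbb8(al, 0x9F, cf) == expected}")  # True each time
```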
Why does that matter?
Let's imagine this was a regular `SUB` instead, without a borrow.
`9F` is odd (coprime to `100`),
so it would take `100` iterations for `AL` to loop around
(remember, we're working with hexadecimal here).
The loop runs for `10000/4=4000` iterations before `DI` repeats (each iteration adds four to `DI`),
and `4000` is divisible by `100`,
so each pass would have the exact same `AL` values for each character.
Instead of an animation we'd get a much less impressive static screen.
With `SBB`, however, `AL` repeats every `55` (decimal 85) `SBB` calls,
which is coprime to `100`,
so the `AL` values will differ from pass to pass.
There's probably a way to determine the period by hand, but I just used Python.
Not all operands work for this, but `9F` seems to be one of the good ones.
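For illustration, here is a minimal Python sketch of that kind of period measurement (not the author's script; `sbb_step`, `sub_step` and `cycle_length` are hypothetical names).
It treats `(AL, CF)` as the state, measures the cycle each variant settles into, and reports how far one full pass of `DI` (`4000` hex iterations) shifts the animation along that cycle:

```python
def sbb_step(state):
    """SBB AL,9F on the (AL, CF) state."""
    al, cf = state
    result = al - 0x9F - cf
    return result & 0xFF, int(result < 0)


def sub_step(state):
    """SUB AL,9F: sets CF like SBB does, but never reads it."""
    al, _ = state
    result = al - 0x9F
    return result & 0xFF, int(result < 0)


def cycle_length(step, state, warmup=1000):
    """Length of the cycle that repeatedly applying `step` settles into."""
    for _ in range(warmup):  # get past any transient states first
        state = step(state)
    start, n = state, 0
    while True:
        state, n = step(state), n + 1
        if state == start:
            return n


ITERATIONS_PER_PASS = 0x10000 // 4  # DI grows by 4 per loop iteration

for name, step in (("SUB", sub_step), ("SBB", sbb_step)):
    period = cycle_length(step, (0xFF, 0))
    print(name, period, ITERATIONS_PER_PASS % period)
# SUB 256 0  -> every pass repeats the previous one: a static screen
# SBB 85 64  -> every pass is shifted 64 steps along the cycle: an animation
```

One way to get the 85 by hand: `AL - CF`, taken modulo `FF`, goes down by exactly `9F` with every `SBB` (a borrow adds `100` to `AL` but also sets `CF`, a net change of `FF`).
Since gcd(`9F`, `FF`) = 3, it takes `FF/3 = 55` (decimal 85) steps to come back around.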
To quote the author,
"without CR flag, there would be no animation :)".
### ending remarks
I think I've explained every aspect of how m8trix works by now.
I don't think I need to tell you how brilliant it is.
Notice how the third byte has three different meanings!
At first it's read as the low byte of the segment loaded into `ES`,
then it's the `LAHF` instruction,
and then it's the operand for the `SBB`.
`STOSW` is not only the perfect instruction for writing characters in text mode,
it also works as the high byte of the segment that you need to write those
characters in the first place.
Everything fits together so nicely :)
## m7trix
Soon after m8trix was published,
several people tried coming up with ideas to shrink it down even more.
What follows is the final version HellMood published:
```
C:\M8TRIX>debug M7TRIX.COM
-U
073D:0100 C41C LES BX,[SI]
073D:0102 9F LAHF
073D:0103 AB STOSW
073D:0104 91 XCHG AX,CX
073D:0105 EBFA JMP 0101
-U 101
073D:0101 1C9F SBB AL,9F
073D:0103 AB STOSW
073D:0104 91 XCHG AX,CX
073D:0105 EBFA JMP 0101
```
Not only is this version smaller, it also looks better, as it clears the screen!
It's also simple enough that I won't bother tracing through it again.
In short -- instead of skipping over every other column,
we swap `AX` and `CX` back and forth.
Both are running the same character animation, but, as `CH=00`,
every other column is rendered as black on black,
so the characters are invisible.
This takes care both of skipping columns AND clearing the screen.
The character cycle is apparently[^apparently] different
because the carry flag gets reused between odd and even columns,
but the period still works out to be 85 --
which I find interesting but I don't really feel like researching why that is.
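For what it's worth, here is a Python sketch suggesting why the period survives (the helper names are hypothetical).
Tracking both character bytes plus the shared carry, one trip through both columns is two chained `SBB AL,9F`s, which the code checks is exactly a single 16-bit subtract-with-borrow of `9F9F` on the pair:

```python
def pair_step(state):
    """SBB on one character byte, XCHG AX,CX, then SBB on the other byte."""
    lo, hi, cf = state
    r = lo - 0x9F - cf
    lo, cf = r & 0xFF, int(r < 0)
    r = hi - 0x9F - cf  # the carry out of the first SBB feeds the second
    hi, cf = r & 0xFF, int(r < 0)
    return lo, hi, cf


def word_step(state):
    """The same thing as one 16-bit SBB of 9F9F on the value hi:lo."""
    lo, hi, cf = state
    r = (hi << 8 | lo) - 0x9F9F - cf
    word, cf = r & 0xFFFF, int(r < 0)
    return word & 0xFF, word >> 8, cf


def cycle_length(step, state, warmup=1000):
    """Length of the cycle that repeatedly applying `step` settles into."""
    for _ in range(warmup):
        state = step(state)
    start, n = state, 0
    while True:
        state, n = step(state), n + 1
        if state == start:
            return n


same = all(pair_step((lo, hi, cf)) == word_step((lo, hi, cf))
           for lo in range(256) for hi in range(256) for cf in (0, 1))
print(same, cycle_length(pair_step, (0xFF, 0x08, 0)))  # True 85
```

Taken modulo `FFFF`, the pair's value minus the carry drops by `9F9F` per trip, and since gcd(`9F9F`, `FFFF`) = `303`, it comes back around after `FFFF/303 = 55` (decimal 85) trips.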
[^apparently]: The Python script I'm using for testing says so, but I can't really tell if that's true by just looking at the output.
## bonus: simplified version
This is a slightly modified version
that works under DEBUG and doesn't use misaligned jumps.
It's easy to experiment with
as you can just load it into DEBUG,
use the assembler to change a single instruction,
and see what happens.
```
073D:0100 BB9FAB MOV BX,AB9F
073D:0103 8EC3 MOV ES,BX
073D:0105 B402 MOV AH,02
073D:0107 AB STOSW
073D:0108 47 INC DI
073D:0109 47 INC DI
073D:010A 1C9F SBB AL,9F
073D:010C EBF9 JMP 0107
```
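Since the loop is so short, it's also easy to model outside of DEBUG.
Below is a rough Python sketch of the listing above, not the author's code; `run_simplified` is a hypothetical name.
It assumes the memory behind `ES=AB9F` starts zero-filled, tracks only `CF` (`STOSW`, `INC`, `MOV` and `JMP` don't touch it), and starts `AL` at `FF`, as DEBUG leaves it after `MOV AH,02`:

```python
def run_simplified(passes: int, al: int = 0xFF):
    """Run the bonus loop for `passes` full sweeps of DI.

    Returns the visible 80x25 text page as 25 rows of (char, attr) pairs.
    Assumptions: ES=AB9F memory starts zeroed, CF starts clear, and only
    CF is modelled because nothing else in the loop reads a flag.
    """
    es = bytearray(0x10000)
    ax, di, cf = 0x0200 | al, 0x0000, 0           # MOV AH,02 keeps AL
    for _ in range(passes * 0x4000):              # 0x4000 iterations per sweep
        es[di:di + 2] = ax.to_bytes(2, "little")  # STOSW (DI += 2) ...
        di = (di + 4) & 0xFFFF                    # ... then INC DI twice
        r = (ax & 0xFF) - 0x9F - cf               # SBB AL,9F
        ax, cf = (ax & 0xFF00) | (r & 0xFF), int(r < 0)
    top_left = 0xC610                             # B800:0000 within ES=AB9F
    page = es[top_left:top_left + 80 * 25 * 2]
    return [[(page[i], page[i + 1]) for i in range(row * 160, row * 160 + 160, 2)]
            for row in range(25)]


first, second = run_simplified(1), run_simplified(2)
print(first[0][:3])     # top-left three cells after one sweep
print(first != second)  # True: the second sweep draws different characters
```

Because the buffer starts zeroed, the skipped odd columns come out as `(0, 0)` here, whereas on real hardware they keep whatever was on screen before.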