---
title: Dissecting m8trix
date: 2024-07-31
---

[m8trix](https://www.pouet.net/prod.php?which=63126) by [HellMood](https://www.pouet.net/user.php?who=97586) is one of my favorite demos. It packs a pretty cool Matrix-style effect in only 8 bytes:
animated gif (epilepsy warning)
The author even provided the source with some comments:

```asm
org 100h
S:
les bx,[si]     ; sets ES to the screen, assume si = 0x100
                ; 0x101 is SBB AL,9F and changes the char
                ; without CR flag, there would be
                ; no animation ;)
lahf            ; gets 0x02 (green) in the first run
                ; afterwards, it is not called again
                ; because of alignment ;)
stosw           ; print the green char ...
                ; (is also 0xAB9F and works as segment)
inc di          ; and skip one row
inc di          ;
jmp short S+1   ; repeat on 0x101
```

...yeah, I didn't really get it at first either. Let's try to actually understand how it works (and learn some stuff about DOS along the way).

Note that **I'll be using hexadecimal numbers "by default"** (without 0x) throughout this article to be consistent with DEBUG's output.

## DEBUG

The only tool I'll be using on DOS's side will be DEBUG. It's a [delightful](https://tilde.zone/@dzwdz/112866746891156279) little tool that ships with MS-DOS. I've personally used the FreeDOS version under DOSBox, as that's what I had handy.

There's built-in help if you type in `?`. You can also check out [this more in-depth guide](https://montcs.bloomu.edu/Information/LowLevel/DOS-Debug.html), or [this video of someone using it to assemble new binaries](https://www.youtube.com/watch?v=zc-W8xq7L5Q).

There's a small issue, though. m8trix doesn't actually work as-is under DEBUG, for reasons I'll explain later.

## a bad explanation of segmentation

If you're a bit rusty on how real mode segmentation works, then here's a quick reminder.

There are a few 16-bit segment registers (`CS`, `DS`, `SS`, `ES`). When you reference memory in real mode you always[^ithink] use one of those registers, even if it's implicit. If you reference `ES:BX`, the real address this maps to is computed as `ES * 0x10 + BX`. This means that there are multiple ways to reference one physical memory location (even if that is only slightly relevant here). As another example, `B800:1234` points to `B9234`.

[^ithink]: At least I think so, but I'm not sure.
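The address formula is easy to check by hand, but here's a minimal Python sketch of it anyway. The `physical` helper is a name I made up for illustration, and it ignores anything the hardware might do beyond the plain `segment * 0x10 + offset` sum.

```python
def physical(segment, offset):
    """Real-mode address translation as described above: segment * 0x10 + offset."""
    return segment * 0x10 + offset


# The example from the text.
print(f"{physical(0xB800, 0x1234):05X}")  # B9234

# A different segment:offset pair that lands on the same physical byte.
print(f"{physical(0xB923, 0x0004):05X}")  # B9234
```

The second call is the "multiple ways to reference one location" part: shifting the segment up by one moves the base by `10`, so the offset just has to shrink by the same amount.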
## the first look

My comments are prefixed with a semicolon. As mentioned, all numbers shown are in hexadecimal.
C:\M8TRIX>debug M8TRIX.COM
-U ; disassemble the beginning of the program
073D:0100 C41C              LES     BX,[SI]
073D:0102 9F                LAHF
073D:0103 AB                STOSW
073D:0104 47                INC     DI
073D:0105 47                INC     DI
073D:0106 EBF9              JMP     0101
-U 101 ; disassemble the loop body
073D:0101 1C9F              SBB     AL,9F
073D:0103 AB                STOSW
073D:0104 47                INC     DI
073D:0105 47                INC     DI
073D:0106 EBF9              JMP     0101
-R ; look at the registers
AX=FFFF BX=0000 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0000 DI=0000
DS=073D ES=073D SS=073D CS=073D IP=0100 NV UP EI PL ZR NA PE NC
073D:0100 C41C              LES     BX,[SI]                        DS:0000=20CD
Let's step through this.

### LES BX,[SI]
AX=FFFF BX=0000 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0000 DI=0000
DS=073D ES=073D SS=073D CS=073D IP=0100 NV UP EI PL ZR NA PE NC
073D:0100 C41C              LES     BX,[SI]                        DS:0000=20CD
-T ; single step and show register state
AX=FFFF BX=20CD CX=0008 DX=0000 SP=FFFE BP=0000 SI=0000 DI=0000
DS=073D ES=9FFF SS=073D CS=073D IP=0102 NV UP EI PL ZR NA PE NC
`LES` loads a far pointer from memory. The first two bytes at `[SI]` will be loaded into `BX`, and the next two bytes will be loaded into `ES`. We're implicitly using the `DS` segment here, which is the segment DOS loaded our program into.

To be more exact -- our program was loaded at `DS:0100`, whereas `DS:0000` (which `[SI]` points at) contains the [Program Segment Prefix](https://en.wikipedia.org/wiki/Program_Segment_Prefix). Let's take a look at it:
-d 0000
073D:0000  CD 20 FF 9F 00 EA FF FF-AD DE BD 1D 94 01 00 00 . ..............
[...]
-u 0000
073D:0000 CD20              INT     20
073D:0002 FF9F00EA          CALL    FAR [BX+EA00]
[...]
The first two bytes always contain `INT 20`, the instruction that quits your program. This means that you can quit your program by jumping to `CS:0000` (`CS` = `DS` = `SS`). DOS also ensures that the word on top of the stack is `0000`, so you can quit with a `RET`. Nifty. It also means that `BX` will always be set to `20CD`, but we don't actually really care about that.

The next two bytes hold the segment of the first free byte in memory. So, by loading them into `ES`, we make it point to the first free area in memory. On most systems that will be `9FFF`. This is very convenient, as the [mode 13](https://en.wikipedia.org/wiki/Mode_13h) framebuffer begins at `A0000`, or `9FFF:0010`. This is a [well known sizecoding trick](http://www.sizecoding.org/wiki/General_Coding_Tricks#A_smaller_way_to_point_to_Mode_13.27s_screen_segment).

...except mode 13 is a graphics mode. We're in mode 3[^mode3], a text mode, and the text buffer is located at `B800:0000`, completely out of reach of `ES`. What?

[^mode3]: Run [`MOV AH, 0F; INT 10`](https://en.wikipedia.org/wiki/INT_10H), and look at the registers. `AL` is the current mode.

Well, DEBUG fooled us. When you start a program under DOS, `SI=0100`. [Usually.](https://www.fysnet.net/yourhelp.htm) However, for whatever reason, DEBUG zeroes it out instead. You can fix it by running `RSI 0100`[^rsi] before the first instruction. This is also why the page I've linked to uses `[BX]`, as you can count on it actually being zero.

[^rsi]: No, `RSI` doesn't stand for the 64-bit register. `R` is the register command, which accepts `SI` as the argument.

But let's get back to m8trix. If `SI=0100`, then `[SI]` points to the beginning of our program!
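To make the difference between the two `SI` values concrete, here's a small Python sketch of what `LES` reads in each case. It's only a model: `les` and `reaches` are names I made up, and each `bytes` object stands in for the memory at `DS:0000` (the first four PSP bytes from the dump above) or `DS:0100` (the eight bytes of the program, taken from the disassembly).

```python
import struct


def les(memory, offset):
    """Model LES reg,[offset]: two little-endian words become (reg, ES)."""
    reg, es = struct.unpack_from("<HH", memory, offset)
    return reg, es


def reaches(es, physical):
    """True if some ES:offset with offset in 0..FFFF maps to `physical`."""
    base = es * 0x10
    return base <= physical <= base + 0xFFFF


PSP_START = bytes.fromhex("CD20FF9F")          # what [SI] sees when SI=0000
PROGRAM = bytes.fromhex("C41C9FAB4747EBF9")    # what [SI] sees when SI=0100
TEXT_BUFFER = 0xB8000                          # physical address of B800:0000

for name, memory in (("SI=0000", PSP_START), ("SI=0100", PROGRAM)):
    bx, es = les(memory, 0)
    print(f"{name}: BX={bx:04X} ES={es:04X} "
          f"text buffer reachable: {reaches(es, TEXT_BUFFER)}")
```

With `SI=0000` the segment stops well short of `B8000`; with `SI=0100` the program's own third and fourth bytes become a segment that covers it.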
```
-RSI 0100
-R
AX=FFFF BX=0000 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
DS=073D ES=073D SS=073D CS=073D IP=0100 NV UP EI PL ZR NA PE NC
073D:0100 C41C              LES     BX,[SI]                        DS:0100=1CC4
-T
AX=FFFF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
DS=073D ES=AB9F SS=073D CS=073D IP=0102 NV UP EI PL ZR NA PE NC
073D:0102 9F                LAHF
-U 100
073D:0100 C41C              LES     BX,[SI]
073D:0102 9F                LAHF
073D:0103 AB                STOSW
```

As you can see, this means that `BX=1CC4` (the `LES` instruction itself), and `ES=AB9F`. This means that `ES` spans `AB9F0-BB9EF`, which includes the entire text buffer (`B8000-B8F9F`)!

### LAHF
AX=FFFF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
DS=073D ES=AB9F SS=073D CS=073D IP=0102 NV UP EI PL ZR NA PE NC
073D:0102 9F                LAHF
-t
AX=46FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI PL ZR NA PE NC
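The `46` that just landed in `AH` can be rebuilt from the flag mnemonics in the register dump. `LAHF` copies the low byte of `FLAGS`, which holds the sign, zero, auxiliary, parity and carry flags, plus a reserved bit that always reads as 1. Here's a minimal Python sketch of that packing; the `lahf` helper and the constant names are mine.

```python
# Bit masks within the low byte of FLAGS, which is what LAHF copies into AH.
CF, PF, AF, ZF, SF = 0x01, 0x04, 0x10, 0x40, 0x80
ALWAYS_SET = 0x02  # bit 1 is reserved and always reads as 1


def lahf(*flags):
    """Return the AH value LAHF would produce with the given flags set."""
    ah = ALWAYS_SET
    for flag in flags:
        ah |= flag
    return ah


print(f"{lahf(ZF, PF):02X}")  # DEBUG's dump shows ZR and PE set: 46
print(f"{lahf():02X}")        # none of the five flags set: 02
```

So getting `02` here only requires that all five of those flags start out clear.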
`LAHF` is pretty straightforward: it just loads the low byte of `FLAGS` into `AH`. Except, once again, DEBUG doesn't set the `FLAGS` register the way DOS does. If we were to run m8trix [outside of DEBUG](https://www.fysnet.net/yourhelp.htm), the low byte of `FLAGS` would be `02`, and thus this instruction would set `AH=02`. This can be fixed in the debugger by running `RAX 02FF`.

### STOSW
-rax 02FF
-r
AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI PL ZR NA PE NC
073D:0103 AB                STOSW
-t
AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0002
DS=073D ES=AB9F SS=073D CS=073D IP=0104 NV UP EI PL ZR NA PE NC
`STOSW` -- "Store (word) string" -- is a bit more complex. It writes the word in `AX` to `ES:DI`, and increments[^df] `DI` by two -- the number of bytes written. This instruction will be run over and over again, with `DI` advancing by four on each trip through the loop (two from `STOSW`, two from the `INC`s that follow) and wrapping around every once in a while, overwriting every other word in `ES` -- including the text buffer -- over and over again.

Each character in the text buffer is represented by a word, so each `STOSW` writes a complete character to the screen. `AH=02` sets the color to dark green, and `AL` (which changes each iteration) chooses the character.

[^df]: If the direction flag was set, it would instead decrement it.

### skipping a column, misaligned jump
AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0002
DS=073D ES=AB9F SS=073D CS=073D IP=0104 NV UP EI PL ZR NA PE NC
073D:0104 47                INC     DI
-t
AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0003
DS=073D ES=AB9F SS=073D CS=073D IP=0105 NV UP EI PL NZ NA PE NC
073D:0105 47                INC     DI
-t
AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004
DS=073D ES=AB9F SS=073D CS=073D IP=0106 NV UP EI PL NZ NA PO NC
073D:0106 EBF9              JMP     0101
-t
AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004
DS=073D ES=AB9F SS=073D CS=073D IP=0101 NV UP EI PL NZ NA PO NC
073D:0101 1C9F              SBB     AL,9F
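In the trace `DI` goes from `0000` to `0004` over one trip through the loop. Here's a rough Python sketch of where those writes land on the 80x25 text screen over one full wrap of `DI`. It assumes `DI` starts at `0000` as it does in this DEBUG session (a different starting value would shift which columns get hit), and `cells_written` is a name I made up.

```python
ES = 0xAB9F
TEXT_BASE = 0xB8000            # physical address of the mode 3 text buffer
COLS, ROWS = 80, 25            # decimal
TEXT_BYTES = COLS * ROWS * 2   # one word (character + attribute) per cell


def cells_written(di=0):
    """Yield (row, column) for every STOSW of one pass that lands on screen."""
    first_offset = TEXT_BASE - ES * 0x10   # where the text buffer sits in ES
    for _ in range(0x10000 // 4):          # DI += 4 per trip, wraps at 10000
        rel = di - first_offset
        if 0 <= rel < TEXT_BYTES:
            yield divmod(rel // 2, COLS)
        di = (di + 4) & 0xFFFF


cells = list(cells_written())
print(len(cells))                          # cells touched per pass
print(sorted({col for _, col in cells}))   # which columns they fall in
```

Only half of the 2000 cells get written, and they all share the same column parity, which is what leaves the gaps between the columns of characters.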
We don't want the columns to be packed too tightly together, so we skip every other character by adding another two to `DI`. We then jump to `0101`, uncovering a hidden `SBB`.

### misaligned jump, SBB
AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004
DS=073D ES=AB9F SS=073D CS=073D IP=0101 NV UP EI PL NZ NA PO NC
073D:0101 1C9F              SBB     AL,9F
-t
AX=0260 BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004
DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI PL NZ NA PE NC
This is the last instruction, and it's the one that modifies `AL` to animate the character. It subtracts `9F` from `AL` with borrow, which is pretty much the grade-school approach. That is -- if it underflows, it will "borrow" a bit from the next byte by setting the carry flag. The next `SBB` will see that the carry flag is set, subtract an additional `1`, and unset the carry flag (unless it also underflowed). Let's see that in practice:
-rax 028F
-r
AX=028F BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0008
DS=073D ES=AB9F SS=073D CS=073D IP=0101 NV UP EI PL NZ NA PO NC
073D:0101 1C9F              SBB     AL,9F
-t
AX=02F0 BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0008
DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI NG NZ NA PE CY
-rip 0101 ; i don't care about the rest of the loop, just run the SBB again
-t
AX=0250 BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0008
DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI PL NZ AC PE NC
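The two steps above can be reproduced with a tiny model of the instruction. This is just a sketch of 8-bit subtract-with-borrow in Python; `sbb_al` is a made-up helper and it only tracks `AL` and the carry flag, not the other flags.

```python
def sbb_al(al, carry, operand=0x9F):
    """Return (new AL, new carry flag) after SBB AL,operand."""
    result = al - operand - carry
    # A negative result means the subtraction borrowed, which sets carry.
    return result & 0xFF, int(result < 0)


al, cf = 0x8F, 0
for _ in range(2):
    al, cf = sbb_al(al, cf)
    print(f"AL={al:02X} CF={cf}")
# AL=F0 CF=1
# AL=50 CF=0
```

The first call underflows and sets the carry; the second one pays that borrow back by subtracting one extra.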
Notice how the second `SBB` subtracted `A0` instead of `9F` because of the carry flag. Why does that matter?

Let's imagine this was a regular `SUB` instead, without a borrow. `9F` is odd (coprime to `100`), so it would take `100` iterations for `AL` to loop around (remember, we're working with hexadecimal here). `DI` advances by 4 per iteration, so the loop runs for `10000/4=4000` iterations before `DI` repeats, and `4000` is divisible by `100`, so each pass would have the exact same `AL` values for each character. Instead of an animation we'd get a much less impressive static screen.

With the borrow, `AL` instead repeats every `55` (decimal 85) `SBB` calls. `55` is coprime to `100`, and `4000` isn't a multiple of it, so the `AL` values will differ from pass to pass. There's probably a way to determine the period by hand but I just used Python. Not all operands work for this, but `9F` seems to be one of the good ones. To quote the author, "without CR flag, there would be no animation ;)".

### ending remarks

I think I've explained every aspect of how m8trix works by now. I don't think I need to tell you how brilliant it is.

Notice how the third byte has three different meanings! At first it's read as the low byte of the segment, then it's executed as the `LAHF` instruction, and then it's the operand for the `SBB`. `STOSW` is not only the perfect instruction for writing characters in text mode, its opcode also works as the high byte of the segment that you need to write those characters in the first place. Everything fits together so nicely :)

## m7trix

Soon after m8trix was published, several people tried coming up with ideas to shrink it down even more. What follows is the final version HellMood published:
C:\M8TRIX>debug M7TRIX.COM
-U
073D:0100 C41C              LES     BX,[SI]
073D:0102 9F                LAHF
073D:0103 AB                STOSW
073D:0104 91                XCHG    AX,CX
073D:0105 EBFA              JMP     0101
-U 101
073D:0101 1C9F              SBB     AL,9F
073D:0103 AB                STOSW
073D:0104 91                XCHG    AX,CX
073D:0105 EBFA              JMP     0101
Not only is this version smaller, it also looks better, as it clears the screen! It's also simple enough that I won't bother tracing through it again.

In short -- instead of skipping over every other column, we swap `AX` and `CX` back and forth. Both are running the same character animation, but, as `CH=00`, every other column is rendered as black on black, so the characters are invisible. This takes care both of skipping columns AND clearing the screen.

The character cycle is apparently[^apparently] different because the carry flag gets reused between odd and even columns, but the period still works out to be 85 -- which I find interesting but I don't really feel like researching why that is.

[^apparently]: The Python script I'm using for testing says so, but I can't really tell if that's true by just looking at the output.

## bonus: simplified version

This is a slightly modified version that works under DEBUG and doesn't use misaligned jumps. It's easy to experiment with as you can just load it into DEBUG, use the assembler to change a single instruction, and see what happens.

```
073D:0100 BB9FAB            MOV     BX,AB9F
073D:0103 8EC3              MOV     ES,BX
073D:0105 B402              MOV     AH,02
073D:0107 AB                STOSW
073D:0108 47                INC     DI
073D:0109 47                INC     DI
073D:010A 1C9F              SBB     AL,9F
073D:010C EBF9              JMP     0107
```
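If you'd rather experiment without DOS at all, the arithmetic of this loop is also easy to model in Python. The sketch below is not the script mentioned earlier, just a minimal reconstruction of the same idea with made-up helper names. It tracks only `AL` and the carry flag (`INC` leaves the carry flag alone, so it survives from one `SBB` to the next) and compares how long `SUB` and `SBB` take to repeat against the number of trips `DI` needs to wrap around.

```python
PASS_LENGTH = 0x10000 // 4   # trips through the loop before DI wraps (DI += 4)


def sbb_step(state):
    """One SBB AL,9F on an (AL, carry) pair."""
    al, cf = state
    result = al - 0x9F - cf
    return result & 0xFF, int(result < 0)


def sub_step(state):
    """One SUB AL,9F; the incoming carry is ignored, so it is pinned to 0."""
    al, _ = state
    return (al - 0x9F) & 0xFF, 0


def cycle_length(step, state):
    """Length of the cycle that `state` eventually falls into under `step`."""
    seen = {}
    n = 0
    while state not in seen:
        seen[state] = n
        state = step(state)
        n += 1
    return n - seen[state]


for name, step in (("SUB", sub_step), ("SBB", sbb_step)):
    period = cycle_length(step, (0xFF, 0))
    print(f"{name}: period {period} (decimal), "
          f"pass length mod period = {PASS_LENGTH % period}")
```

A remainder of zero means every pass over the screen starts at the same point in the character cycle, so nothing moves. A non-zero remainder means each pass is shifted relative to the previous one. Swapping `0x9F` for another operand is a quick way to look for values that behave differently.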