Diffstat (limited to 'src/m8trix.md')
-rw-r--r-- src/m8trix.md 363
1 files changed, 363 insertions, 0 deletions
diff --git a/src/m8trix.md b/src/m8trix.md
new file mode 100644
index 0000000..d8b0cd9
--- /dev/null
+++ b/src/m8trix.md
@@ -0,0 +1,363 @@
+---
+title: Dissecting m8trix
+date: 2024-07-31
+---
+
+[m8trix](https://www.pouet.net/prod.php?which=63126)
+by
+[HellMood](https://www.pouet.net/user.php?who=97586)
+is one of my favorite demos.
+It packs a pretty cool Matrix-style effect in only 8 bytes:
+
+<details>
+<summary>animated gif (epilepsy warning)</summary>
+<img style="width: 100%" src="//tilde.town/~dzwdz/m8trix3.gif" />
+</details>
+
+The author even provided the source with some comments:
+```asm
+org 100h
+
+S: 
+les bx,[si]		; sets ES to the screen, assume si = 0x100
+				; 0x101 is SBB AL,9F and changes the char
+				; without CR flag, there would be
+				; no animation ;)
+lahf			; gets 0x02 (green) in the first run
+				; afterwards, it is not called again
+				; because of alignment ;)
+stosw			; print the green char ...
+				; (is also 0xAB9F and works as segment)
+inc di			; and skip one row
+inc di			;
+jmp short S+1   ; repeat on 0x101 
+```
+
+...yeah, I didn't really get it at first either.
+Let's try to actually understand how it works
+(and learn some stuff about DOS along the way).
+
+Note that **I'll be using hexadecimal numbers "by default"** (without 0x)
+throughout this article to be consistent with DEBUG's output.
+
+## DEBUG
+The only tool I'll be using on DOS's side will be DEBUG.
+It's a
+[delightful](https://tilde.zone/@dzwdz/112866746891156279)
+little tool that ships with MS-DOS.
+I've personally used the FreeDOS version under DOSBox, as that's what I had handy.
+
+There's built-in help if you type `?`. You can also check out
+[this more in-depth guide](https://montcs.bloomu.edu/Information/LowLevel/DOS-Debug.html),
+or
+[this video of someone using it to assemble new binaries](https://www.youtube.com/watch?v=zc-W8xq7L5Q).
+
+There's a small issue, though.
+m8trix doesn't actually work as-is under DEBUG,
+for reasons I'll explain later.
+
+## a bad explanation of segmentation
+If you're a bit rusty on how real mode segmentation works, then here's a quick reminder.
+There are a few 16-bit segment registers (`CS`, `DS`, `SS`, `ES`).
+When you reference memory in real mode you always[^ithink] use one of those registers,
+even if it's implicit.
+
+If you reference `ES:BX`, the real address this maps to is computed as `ES * 0x10 + BX`.
+This means that there are multiple ways to reference one physical memory location
+(even if that is only slightly relevant here).
+
+As another example, `B800:1234` points to `B9234`.
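+
+To make the arithmetic concrete, here's a tiny Python helper (`linear` is just my name for it) that turns a `segment:offset` pair into a physical address.
+It ignores what happens past `FFFFF`, which doesn't matter here.
+
+```python
+def linear(segment, offset):
+    """Physical address of segment:offset in real mode: segment * 0x10 + offset."""
+    return segment * 0x10 + offset
+
+print(hex(linear(0xB800, 0x1234)))  # 0xb9234
+# A different segment:offset pair that aliases the very same byte:
+print(linear(0xB923, 0x0004) == linear(0xB800, 0x1234))  # True
+```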
+
+[^ithink]: At least I think so, but I'm not sure.
+
+## the first look
+My comments are prefixed with a semicolon.
+As mentioned, all numbers shown are in hexadecimal.
+<pre><code>C:\M8TRIX>debug M8TRIX.COM
+-U <i>; disassemble the beginning of the program</i>
+073D:0100 C41C              <a href="//www.felixcloutier.com/x86/lds:les:lfs:lgs:lss">LES</a>     BX,[SI]
+073D:0102 9F                <a href="//www.felixcloutier.com/x86/lahf">LAHF</a>
+073D:0103 AB                <a href="//www.felixcloutier.com/x86/stos:stosb:stosw:stosd:stosq">STOSW</a>
+073D:0104 47                <a href="//www.felixcloutier.com/x86/inc">INC</a>     DI
+073D:0105 47                <a href="//www.felixcloutier.com/x86/inc">INC</a>     DI
+073D:0106 EBF9              <a href="//www.felixcloutier.com/x86/jmp">JMP</a>     0101
+-U 101 <i>; disassemble the loop body</i>
+073D:0101 1C9F              <a href="//www.felixcloutier.com/x86/sbb">SBB</a>     AL,9F
+073D:0103 AB                <a href="//www.felixcloutier.com/x86/stos:stosb:stosw:stosd:stosq">STOSW</a>
+073D:0104 47                <a href="//www.felixcloutier.com/x86/inc">INC</a>     DI
+073D:0105 47                <a href="//www.felixcloutier.com/x86/inc">INC</a>     DI
+073D:0106 EBF9              <a href="//www.felixcloutier.com/x86/jmp">JMP</a>     0101
+-R <i>; look at the registers</i>
+AX=FFFF BX=0000 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0000 DI=0000
+DS=073D ES=073D SS=073D CS=073D IP=0100 NV UP EI PL ZR NA PE NC
+073D:0100 C41C              LES     BX,[SI]                        DS:0000=20CD
+</code></pre>
+
+Let's step through this.
+
+### LES BX,[SI]
+<pre><code>AX=FFFF BX=0000 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0000 DI=0000
+DS=073D ES=073D SS=073D CS=073D IP=0100 NV UP EI PL ZR NA PE NC
+073D:0100 C41C              LES     BX,[SI]                        DS:0000=20CD
+-T <i>; single step and show register state</i>
+AX=FFFF BX=<b>20CD</b> CX=0008 DX=0000 SP=FFFE BP=0000 SI=0000 DI=0000
+DS=073D ES=<b>9FFF</b> SS=073D CS=073D IP=0102 NV UP EI PL ZR NA PE NC
+</code></pre>
+
+`LES` loads a far pointer from memory.
+The first two bytes of `[SI]` will be loaded into `BX`,
+and the next two bytes will be loaded into `ES`.
+
+We're implicitly using the `DS` segment here, which is the segment DOS loaded our program into.
+To be more exact --
+our program was loaded into `DS:0100`,
+whereas `DS:0000` (which `[SI]` points at) contains the
+[Program Segment Prefix](https://en.wikipedia.org/wiki/Program_Segment_Prefix).
+Let's take a look at it:
+
+<pre><code>-d 0000
+073D:0000  CD 20 FF 9F 00 EA FF FF-AD DE BD 1D 94 01 00 00 . ..............
+[...]
+-u 0000
+073D:0000 CD20              INT     20
+073D:0002 FF9F00EA          CALL    FAR [BX+EA00]
+[...]
+</code></pre>
+
+The first two bytes always contain `INT 20`, the instruction that terminates the program.
+This means that you can quit by jumping to `CS:0000` (`CS` = `DS` = `SS`).
+DOS also ensures that the word on top of the stack is `0000`, so you can quit with a `RET`.
+Nifty.
+It also means that `BX` will always be set to `20CD`, but we don't really care about that.
+
+The next two bytes hold the segment of the first byte past the memory DOS allocated to our program,
+which is usually the top of conventional memory.
+So, by loading them into `ES`, we make it point to the end of our memory block.
+On most systems that will be `9FFF`.
+This is very convenient, as the
+[mode 13](https://en.wikipedia.org/wiki/Mode_13h)
+framebuffer begins at `A0000`, or `9FFF:0010`.
+This is a
+[well known sizecoding trick](http://www.sizecoding.org/wiki/General_Coding_Tricks#A_smaller_way_to_point_to_Mode_13.27s_screen_segment).
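+
+x86 is little-endian, so the byte order is easy to trip over.
+Here's a small Python model of the load `LES BX,[SI]` performs (the `les` helper is my own illustration, not anything DEBUG provides).
+It is fed the first four bytes of the PSP dump above:
+
+```python
+import struct
+
+def les(memory, offset):
+    """Model LES BX,[offset]: read two little-endian words, returned as (BX, ES)."""
+    bx, es = struct.unpack_from("<HH", memory, offset)
+    return bx, es
+
+psp = bytes([0xCD, 0x20, 0xFF, 0x9F])   # CD 20 FF 9F, from the dump
+bx, es = les(psp, 0)
+print(f"BX={bx:04X} ES={es:04X}")       # BX=20CD ES=9FFF
+print(hex(es * 0x10 + 0x10))            # 0xa0000, mode 13's framebuffer
+```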
+
+...except mode 13 is a graphics mode.
+We're in mode 3[^mode3],
+a text mode,
+and the text buffer is located at segment `B800` (`B8000`),
+completely out of reach of `ES`.
+What?
+
+[^mode3]: Run [`MOV AH, 0F; INT 10`](https://en.wikipedia.org/wiki/INT_10H) and look at the registers. `AL` is the current mode.
+
+Well, DEBUG fooled us.
+When you start a program under DOS, `SI=0100`.
+[Usually.](https://www.fysnet.net/yourhelp.htm)
+However, for whatever reason, DEBUG zeroes it out instead.
+You can fix it by running `RSI 0100`[^rsi] before the first instruction.
+This is also why the sizecoding page uses `[BX]`, as you can count on `BX` actually being zero.
+
+[^rsi]: No, `RSI` doesn't stand for the 64-bit register. `R` is the register command, which accepts `SI` as the argument.
+
+But let's get back to m8trix.
+If `SI=0100`, then `[SI]` points to the beginning of our program!
+```
+-RSI 0100
+-R
+AX=FFFF BX=0000 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
+DS=073D ES=073D SS=073D CS=073D IP=0100 NV UP EI PL ZR NA PE NC
+073D:0100 C41C              LES     BX,[SI]                        DS:0100=1CC4
+-T
+AX=FFFF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
+DS=073D ES=AB9F SS=073D CS=073D IP=0102 NV UP EI PL ZR NA PE NC
+073D:0102 9F                LAHF
+-U 100
+073D:0100 C41C              LES     BX,[SI]
+073D:0102 9F                LAHF
+073D:0103 AB                STOSW
+```
+As you can see, this means that `BX=1CC4` (the `LES` instruction itself), and `ES=AB9F`.
+This means that `ES` spans `AB9F0`-`BB9EF`, which includes the entire text buffer!
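+
+Here's that reachability check as a Python sketch, assuming the standard 80x25 text buffer at `B8000` (the helper name is mine).
+The buffer is out of reach from `9FFF`, but from `AB9F` it sits at offset `C610` and fits entirely within the segment:
+
+```python
+TEXT_BUFFER = 0xB8000      # linear address of the mode 3 text buffer (segment B800)
+TEXT_SIZE = 80 * 25 * 2    # 80x25 cells, 2 bytes each
+
+def offset_in_segment(segment, linear):
+    """Offset that reaches `linear` from `segment`, or None if it's outside the 64K window."""
+    offset = linear - segment * 0x10
+    return offset if 0 <= offset <= 0xFFFF else None
+
+print(offset_in_segment(0x9FFF, TEXT_BUFFER))   # None
+start = offset_in_segment(0xAB9F, TEXT_BUFFER)
+print(f"{start:04X}")                           # C610
+print(start + TEXT_SIZE - 1 <= 0xFFFF)          # True: the whole screen fits
+```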
+
+### LAHF
+<pre><code>AX=FFFF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
+DS=073D ES=AB9F SS=073D CS=073D IP=0102 NV UP EI PL ZR NA PE NC
+073D:0102 9F                LAHF
+-t
+AX=<b>46</b>FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
+DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI PL ZR NA PE NC
+</code></pre>
+`LAHF` is pretty straightforward: it just loads the low byte of `FLAGS` into `AH`.
+Except, once again, DEBUG doesn't set the `FLAGS` register the way DOS does.
+If we were to run m8trix
+[outside of DEBUG](https://www.fysnet.net/yourhelp.htm),
+the low byte of `FLAGS` would be `02`, and thus this instruction would set `AH=02`.
+(Under DEBUG, `ZF` and `PF` are also set, which is where the `46` above comes from.)
+This can be fixed in the debugger by running `RAX 02FF`.
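+
+Here's a tiny Python sketch of what `LAHF` sees, using the standard bit positions in the low byte of `FLAGS`.
+The `0202` value for a normal DOS start is an assumption; only its low byte, `02`, matters here:
+
+```python
+# Low byte of FLAGS (bit 1 is reserved and always reads as 1):
+CF, PF, AF, ZF, SF = 0x01, 0x04, 0x10, 0x40, 0x80
+IF = 0x0200  # interrupt flag lives in the high byte, so LAHF never sees it
+
+def lahf(flags):
+    """Value LAHF puts into AH: the low byte of FLAGS."""
+    return flags & 0xFF
+
+# DEBUG's initial "NV UP EI PL ZR NA PE NC": IF, ZF and PF set.
+print(f"{lahf(IF | ZF | PF | 0x02):02X}")  # 46
+# Assumed startup value outside DEBUG: FLAGS=0202 (IF plus the reserved bit).
+print(f"{lahf(IF | 0x02):02X}")            # 02
+```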
+
+### STOSW
+<pre><code>-rax 02FF
+-r
+AX=<b>02FF</b> BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=<b>0000</b>
+DS=073D ES=<b>AB9F</b> SS=073D CS=073D IP=0103 NV UP EI PL ZR NA PE NC
+073D:0103 AB                STOSW
+-t
+AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=<b>0002</b>
+DS=073D ES=AB9F SS=073D CS=073D IP=0104 NV UP EI PL ZR NA PE NC
+</code></pre>
+`STOSW` -- "Store (word) string" -- is a bit more complex.
+It writes the word in `AX` to `ES:DI`, and increments[^df] `DI` by two --
+the number of bytes written.
+
+This instruction will be run over and over again, with `DI` taking on
+every multiple of 4 (the `INC DI`s that follow add another 2 each iteration)
+and overflowing every once in a while,
+writing to every other word of `ES` -- including the text buffer -- over and over again.
+
+Each character in the text buffer is represented by a word,
+so each `STOSW` writes a complete character to the screen.
+`AH=02` sets the color to dark green,
+and `AL` (which changes each iteration) chooses the character.
+[^df]: If the direction flag was set, it would instead decrement it.
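+
+For reference, here's a small Python sketch of how one `STOSW` maps onto a text-mode cell.
+It assumes the usual attribute layout: the low nibble is the foreground colour, bits 4-6 are the background, and bit 7 is blink.
+The helper names are mine.
+
+```python
+def cell_bytes(ax):
+    """The two bytes STOSW stores for AX: little-endian, so character, then attribute."""
+    return bytes([ax & 0xFF, ax >> 8])
+
+def split_attribute(attr):
+    """(background, foreground) colour numbers of a text-mode attribute byte.
+
+    Assumes the default where bit 7 means blink, so the background only uses bits 4-6.
+    """
+    return (attr >> 4) & 0x07, attr & 0x0F
+
+print(cell_bytes(0x02FF).hex())   # ff02
+print(split_attribute(0x02))      # (0, 2): black background, green foreground
+```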
+
+### skipping a column, misaligned jump
+<pre><code>AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0002
+DS=073D ES=AB9F SS=073D CS=073D IP=0104 NV UP EI PL ZR NA PE NC
+073D:0104 47                INC     DI
+-t
+AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=<b>0003</b>
+DS=073D ES=AB9F SS=073D CS=073D IP=0105 NV UP EI PL NZ NA PE NC
+073D:0105 47                INC     DI
+-t
+AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004
+DS=073D ES=AB9F SS=073D CS=073D IP=0106 NV UP EI PL NZ NA PO NC
+073D:0106 EBF9              JMP     0101
+-t
+AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004
+DS=073D ES=AB9F SS=073D CS=073D IP=0101 NV UP EI PL NZ NA PO NC
+073D:0101 1C9F              SBB     AL,9F
+</code></pre>
+We don't want the columns to be packed too tightly together,
+so we skip every other character by adding two bytes to `DI`.
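+
+Here's a Python sketch of that stepping. It counts the iterations per pass and lists which screen columns get written.
+It assumes the text buffer starts at offset `C610` of `ES` (that's `B8000 - AB9F0`) and is 80x25:
+
+```python
+TEXT_START = 0xC610        # offset of the text buffer (B8000) within ES=AB9F
+TEXT_SIZE = 80 * 25 * 2    # 80x25 cells, 2 bytes per cell
+
+def loop_offsets():
+    """DI at each STOSW of the m8trix loop, for one full pass (DI wraps at 10000)."""
+    di, offsets = 0, []
+    while True:
+        offsets.append(di)
+        di = (di + 2 + 1 + 1) & 0xFFFF   # STOSW adds 2, then two INC DIs
+        if di == 0:
+            return offsets
+
+offsets = loop_offsets()
+print(hex(len(offsets)))   # 0x4000 iterations per pass
+# (DI - TEXT_START) / 2 is the cell index; modulo 80 gives the column.
+columns = sorted({(di - TEXT_START) // 2 % 80
+                  for di in offsets if TEXT_START <= di < TEXT_START + TEXT_SIZE})
+print(columns[:5], len(columns))   # [0, 2, 4, 6, 8] 40
+```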
+
+We then jump to `0101`, uncovering a hidden `SBB`.
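+
+A short `JMP` stores a signed displacement byte, relative to the end of the jump instruction.
+This small Python sketch (helper name mine) shows how `EB F9` at `0106` ends up at `0101`, one byte into the `LES`:
+
+```python
+def short_jump_target(address, disp):
+    """Where a 2-byte `JMP short` (EB disp) at `address` lands.
+
+    disp is a signed byte, relative to the end of the jump instruction.
+    """
+    if disp >= 0x80:      # two's complement: F9 means -7
+        disp -= 0x100
+    return (address + 2 + disp) & 0xFFFF
+
+print(f"{short_jump_target(0x0106, 0xF9):04X}")  # 0101, the second byte of LES
+```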
+
+### misaligned jump, SBB
+<pre><code>AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004
+DS=073D ES=AB9F SS=073D CS=073D IP=0101 NV UP EI PL NZ NA PO NC
+073D:0101 1C9F              SBB     AL,9F
+-t
+AX=02<b>60</b> BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004
+DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI PL NZ NA PE NC
+</code></pre>
+
+This is the last instruction, and it's the one that modifies `AL` to animate the character.
+It subtracts `9F` from `AL` with borrow, which is pretty much the grade-school approach.
+That is --
+if it underflows,
+it will "borrow" a bit from the next byte by setting the carry flag.
+The next `SBB` will see that the carry flag is set,
+subtract an additional `1`,
+and unset the carry flag (unless it also underflowed).
+
+Let's see that in practice:
+<pre><code>-rax 028F
+-r
+AX=02<b>8F</b> BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0008
+DS=073D ES=AB9F SS=073D CS=073D IP=0101 NV UP EI PL NZ NA PO <b>NC</b>
+073D:0101 1C9F              SBB     AL,9F
+-t
+AX=02<b>F0</b> BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0008
+DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI NG NZ NA PE <b>CY</b>
+-rip 0101 <i>; i don't care about the rest of the loop, just run the SBB again</i>
+-t
+AX=02<b>50</b> BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0008
+DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI PL NZ AC PE <b>NC</b>
+</code></pre>
+Notice how the second `SBB` subtracted `A0` instead of `9F` because of the carry flag.
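+
+If you'd rather poke at this outside of DEBUG, here's a minimal Python model of an 8-bit `SBB`.
+It's my own sketch and only tracks `AL` and the carry flag, ignoring the other flags.
+It reproduces the two steps traced above:
+
+```python
+def sbb8(al, operand, cf):
+    """SBB AL,operand on 8-bit values; returns (AL, CF) afterwards."""
+    result = al - operand - cf
+    return result & 0xFF, int(result < 0)   # CF = 1 iff the subtraction borrowed
+
+al, cf = sbb8(0x8F, 0x9F, 0)
+print(f"AL={al:02X} CF={cf}")   # AL=F0 CF=1
+al, cf = sbb8(al, 0x9F, cf)
+print(f"AL={al:02X} CF={cf}")   # AL=50 CF=0
+```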
+
+Why does that matter?
+Let's imagine this was a regular `SUB` instead, without a borrow.
+`9F` is odd (coprime to `100`),
+so it would take `100` iterations for `AL` to loop around
+(remember, we're working with hexadecimal here).
+The loop runs for `10000/4=4000` iterations before `DI` repeats
+(each iteration adds 4 to `DI`: 2 from `STOSW` and 1 from each `INC DI`),
+and `4000` is divisible by `100`,
+so each pass would have the exact same `AL` values for each character.
+Instead of an animation we'd get a much less impressive static screen.
+
+Instead, `AL` repeats every `55` (decimal 85) `SBB` calls.
+`55` is odd, so it can't divide the pass length (a power of two),
+and the `AL` values will differ from pass to pass.
+There's probably a way to determine the period by hand but I just used Python.
+Not all operands work for this, but `9F` seems to be one of the good ones.
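+
+A minimal Python sketch of such a check might look like this. It's a reconstruction, not the original script.
+It follows the `(AL, CF)` pair until a state repeats, and compares `SBB` against a plain `SUB`.
+It assumes 4 added to `DI` per iteration, so `10000/4` iterations per pass:
+
+```python
+def sbb8(al, operand, cf):
+    result = al - operand - cf
+    return result & 0xFF, int(result < 0)
+
+def sub8(al, operand, cf):
+    """Plain SUB for comparison: ignores the incoming CF."""
+    result = al - operand
+    return result & 0xFF, int(result < 0)
+
+def period(step, al=0xFF, cf=0, operand=0x9F):
+    """Length of the cycle that (AL, CF) eventually falls into under `step`."""
+    seen, state, i = {}, (al, cf), 0
+    while state not in seen:
+        seen[state] = i
+        state = step(state[0], operand, state[1])
+        i += 1
+    return i - seen[state]
+
+PASS = 0x10000 // 4   # loop iterations before DI repeats
+for name, step in (("SUB", sub8), ("SBB", sbb8)):
+    p = period(step)
+    print(name, hex(p), "static" if PASS % p == 0 else "animated")
+# SUB 0x100 static
+# SBB 0x55 animated
+```
+
+One way to see the `55` by hand: after the first step, `AL - CF` always stays within `0..FE`, and every `SBB` lowers it by `9F` modulo `FF`.
+`9F` and `FF` share a factor of 3, so it takes `FF/3 = 55` steps to come back around.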
+
+To quote the author,
+"without CR flag, there would be no animation ;)".
+
+### ending remarks
+I think I've explained every aspect of how m8trix works by now.
+I don't think I need to tell you how brilliant it is.
+
+Notice how the third byte has three different meanings!
+At first it's read as the low byte of the segment loaded into `ES`,
+then it's the `LAHF` instruction itself,
+and then it's the operand for the `SBB`.
+
+`STOSW` is not only the perfect instruction for writing characters in text mode,
+it also works as the high byte of the segment that you need to write those
+characters in the first place.
+
+Everything fits together so nicely :)
+
+## m7trix
+Soon after m8trix was published,
+several people tried coming up with ideas to shrink it down even more.
+What follows is the final version HellMood published:
+
+<pre><code>C:\M8TRIX>debug M7TRIX.COM
+-U
+073D:0100 C41C              <a href="//www.felixcloutier.com/x86/lds:les:lfs:lgs:lss">LES</a>     BX,[SI]
+073D:0102 9F                <a href="//www.felixcloutier.com/x86/lahf">LAHF</a>
+073D:0103 AB                <a href="//www.felixcloutier.com/x86/stos:stosb:stosw:stosd:stosq">STOSW</a>
+073D:0104 91                <a href="//www.felixcloutier.com/x86/xchg">XCHG</a>    AX,CX
+073D:0105 EBFA              <a href="//www.felixcloutier.com/x86/jmp">JMP</a>     0101
+-U 101
+073D:0101 1C9F              <a href="//www.felixcloutier.com/x86/sbb">SBB</a>     AL,9F
+073D:0103 AB                <a href="//www.felixcloutier.com/x86/stos:stosb:stosw:stosd:stosq">STOSW</a>
+073D:0104 91                <a href="//www.felixcloutier.com/x86/xchg">XCHG</a>    AX,CX
+073D:0105 EBFA              <a href="//www.felixcloutier.com/x86/jmp">JMP</a>     0101
+</code></pre>
+
+Not only is this version smaller, it also looks better, as it clears the screen!
+It's also simple enough that I won't bother tracing through it again.
+
+In short -- instead of skipping over every other column,
+we swap `AX` and `CX` back and forth.
+Both are running the same character animation, but, as `CH=00`,
+every other column is rendered as black on black,
+so the characters are invisible.
+This takes care both of skipping columns AND clearing the screen.
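+
+To check the alternating colours, here's a small Python sketch of the m7trix loop that only tracks the attribute bytes.
+The starting `AX` and `CX` are assumptions (what matters is `CH=00`), and so is a clear carry flag at start.
+The function name is mine.
+
+```python
+def m7trix_attributes(ax, cx, count):
+    """Attribute bytes (high byte of each STOSW word) for the first `count` writes."""
+    attrs, cf = [], 0                  # assumption: CF starts clear
+    for i in range(count):
+        if i > 0:                      # every write after the first is preceded by SBB AL,9F
+            result = (ax & 0xFF) - 0x9F - cf
+            cf = int(result < 0)
+            ax = (ax & 0xFF00) | (result & 0xFF)
+        attrs.append(ax >> 8)          # STOSW: AH is the attribute
+        ax, cx = cx, ax                # XCHG AX,CX
+    return attrs
+
+# Hypothetical start: AX=02FF after LAHF, CX=0007 as under DEBUG.
+print([f"{a:02X}" for a in m7trix_attributes(0x02FF, 0x0007, 6)])
+# ['02', '00', '02', '00', '02', '00']
+```
+
+Since `DI` now only advances by 2 per iteration, a pass is `8000` iterations, an even number, so the green and invisible columns stay put from pass to pass.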
+
+The character cycle is apparently[^apparently] different
+because the carry flag gets reused between odd and even columns,
+but the period still works out to be 85 --
+which I find interesting but I don't really feel like researching why that is.
+
+[^apparently]: The Python script I'm using for testing says so, but I can't really tell if that's true by just looking at the output.
+
+## bonus: simplified version
+This is a slightly modified version
+that works under DEBUG and doesn't use misaligned jumps.
+It's easy to experiment with
+as you can just load it into DEBUG,
+use the assembler to change a single instruction,
+and see what happens.
+```
+073D:0100 BB9FAB            MOV     BX,AB9F
+073D:0103 8EC3              MOV     ES,BX
+073D:0105 B402              MOV     AH,02
+073D:0107 AB                STOSW
+073D:0108 47                INC     DI
+073D:0109 47                INC     DI
+073D:010A 1C9F              SBB     AL,9F
+073D:010C EBF9              JMP     0107
+```