-rw-r--r-- | Makefile | 1
-rw-r--r-- | src/feed.ass | 1
-rw-r--r-- | src/m8trix.md | 363
3 files changed, 365 insertions, 0 deletions
diff --git a/Makefile b/Makefile index 33a705b..ff0be70 100644 --- a/Makefile +++ b/Makefile @@ -6,6 +6,7 @@ out/feeds.html \ out/our.html \ out/tui.html \ out/python_recursion.html \ +out/m8trix.html \ # posts end out/index.html: src/index.sh src/feed.ass diff --git a/src/feed.ass b/src/feed.ass index c058348..bbd578e 100644 --- a/src/feed.ass +++ b/src/feed.ass @@ -1,5 +1,6 @@ # actually simple syndication # https://tilde.town/~dzwdz/ass/ +2024-07-31 https://tilde.town/~dzwdz/blog/m8trix.html Dissecting m8trix 2023-08-19 https://tilde.town/~dzwdz/blog/tui.html On TUIs 2023-07-23 https://tilde.town/~dzwdz/blog/our.html /town/our, a tildebrained irc bot 2023-05-25 https://tilde.town/~dzwdz/blog/feeds.html Linear feeds are a dark pattern diff --git a/src/m8trix.md b/src/m8trix.md new file mode 100644 index 0000000..d8b0cd9 --- /dev/null +++ b/src/m8trix.md @@ -0,0 +1,363 @@ +--- +title: Dissecting m8trix +date: 2024-07-31 +--- + +[m8trix](https://www.pouet.net/prod.php?which=63126) +by +[HellMood](https://www.pouet.net/user.php?who=97586) +is one of my favorite demos. +It packs a pretty cool Matrix-style effect in only 8 bytes: + +<details> +<summary>animated gif (epilepsy warning)</summary> +<img style="width: 100%" src="//tilde.town/~dzwdz/m8trix3.gif" /> +</details> + +The author even provided the source with some comments: +```asm +org 100h + +S: +les bx,[si] ; sets ES to the screen, assume si = 0x100 + ; 0x101 is SBB AL,9F and changes the char + ; without CR flag, there would be + ; no animation ;) +lahf ; gets 0x02 (green) in the first run + ; afterwards, it is not called again + ; because of alignment ;) +stosw ; print the green char ... + ; (is also 0xAB9F and works as segment) +inc di ; and skip one row +inc di ; +jmp short S+1 ; repeat on 0x101 +``` + +...yeah, I didn't really get it at first either. +Let's try to actually understand how it works +(and learn some stuff about DOS along the way). 
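The `jmp short S+1` at the end of the listing is worth checking by hand before going further. The following is a minimal Python sketch, assuming the eight bytes are the ones DEBUG shows in the disassembly later in this post; `short_jump_target` and `write_com` are hypothetical helper names, not part of any tool used here.

```python
# The eight bytes of M8TRIX.COM, as shown by DEBUG's disassembly later in this post.
M8TRIX = bytes.fromhex("C41C9FAB4747EBF9")

LOAD_OFFSET = 0x100  # DOS loads .COM programs at offset 0100 of their segment


def short_jump_target(code: bytes, load_offset: int = LOAD_OFFSET) -> int:
    """Target of a trailing short JMP (opcode EB + signed 8-bit displacement)."""
    assert code[-2] == 0xEB, "expected the program to end with a short JMP"
    displacement = int.from_bytes(code[-1:], "little", signed=True)
    next_ip = load_offset + len(code)  # IP already points past the JMP
    return (next_ip + displacement) & 0xFFFF


def write_com(path: str, code: bytes = M8TRIX) -> None:
    """Write the raw bytes out as a .COM file (the path is up to you)."""
    with open(path, "wb") as f:
        f.write(code)


print(f"{len(M8TRIX)} bytes, loop jumps to {short_jump_target(M8TRIX):04X}")
```

The displacement `F9` is -7, counted from the byte after the jump (`0108`), which lands on `0101`: one byte into the `LES` instruction rather than at its start.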
+ +Note that **I'll be using hexadecimal numbers "by default"** (without 0x) +throughout this article to be consistent with DEBUG's output. + +## DEBUG +The only tool I'll be using on DOS's side will be DEBUG. +It's a +[delightful](https://tilde.zone/@dzwdz/112866746891156279) +little tool that ships with MS-DOS. +I've personally used the FreeDOS version under DOSBox, as that's what I had handy. + +There's builtin help if you type in `?`, you can also check out +[this more in-depth guide](https://montcs.bloomu.edu/Information/LowLevel/DOS-Debug.html), +or +[this video of someone using it to assemble new binaries](https://www.youtube.com/watch?v=zc-W8xq7L5Q). + +There's a small issue, though. +m8trix doesn't actually work as-is under DEBUG, +for reasons I'll explain later. + +## a bad explanation of segmentation +If you're a bit rusty on how real mode segmentation works, then here's a quick reminder. +There are a few 16-bit segment registers (`CS`, `DS`, `SS`, `ES`). +When you reference memory in real mode you always[^ithink] use one of those registers, +even if it's implicit. + +If you reference `ES:BX`, the real address this maps to is computed as `ES * 0x10 + BX`. +This means that there are multiple ways to reference one physical memory location +(even if that is only slightly relevant here). + +As another example, `B800:1234` points to `B9234`. + +[^ithink]: At least I think so, but I'm not sure. + +## the first look +My comments are prefixed with a semicolon. +As mentioned, all numbers shown are in hexadecimal. 
+<pre><code>C:\M8TRIX>debug M8TRIX.COM +-U <i>; disassemble the beginning of the program</i> +073D:0100 C41C <a href="//www.felixcloutier.com/x86/lds:les:lfs:lgs:lss">LES</a> BX,[SI] +073D:0102 9F <a href="//www.felixcloutier.com/x86/lahf">LAHF</a> +073D:0103 AB <a href="//www.felixcloutier.com/x86/stos:stosb:stosw:stosd:stosq">STOSW</a> +073D:0104 47 <a href="//www.felixcloutier.com/x86/inc">INC</a> DI +073D:0105 47 <a href="//www.felixcloutier.com/x86/inc">INC</a> DI +073D:0106 EBF9 <a href="//www.felixcloutier.com/x86/jmp">JMP</a> 0101 +-U 101 <i>; disassemble the loop body</i> +073D:0101 1C9F <a href="//www.felixcloutier.com/x86/sbb">SBB</a> AL,9F +073D:0103 AB <a href="//www.felixcloutier.com/x86/stos:stosb:stosw:stosd:stosq">STOSW</a> +073D:0104 47 <a href="//www.felixcloutier.com/x86/inc">INC</a> DI +073D:0105 47 <a href="//www.felixcloutier.com/x86/inc">INC</a> DI +073D:0106 EBF9 <a href="//www.felixcloutier.com/x86/jmp">JMP</a> 0101 +-R <i>; look at the registers</i> +AX=FFFF BX=0000 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0000 DI=0000 +DS=073D ES=073D SS=073D CS=073D IP=0100 NV UP EI PL ZR NA PE NC +073D:0100 C41C LES BX,[SI] DS:0000=20CD +</code></pre> + +Let's step through this. + +### LES BX,[SI] +<pre><code>AX=FFFF BX=0000 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0000 DI=0000 +DS=073D ES=073D SS=073D CS=073D IP=0100 NV UP EI PL ZR NA PE NC +073D:0100 C41C LES BX,[SI] DS:0000=20CD +-T <i>; single step and show register state</i> +AX=FFFF BX=<b>20CD</b> CX=0008 DX=0000 SP=FFFE BP=0000 SI=0000 DI=0000 +DS=073D ES=<b>9FFF</b> SS=073D CS=073D IP=0102 NV UP EI PL ZR NA PE NC +</code></pre> + +`LES` loads a far pointer from memory. +The first two bytes at `[SI]` will be loaded into `BX`, +and the next two bytes will be loaded into `ES`. + +We're implicitly using the `DS` segment here, which is where DOS loaded our program. 
+To be more exact -- +our program was loaded into `DS:0100`, +whereas `DS:0000` (which `[SI]` points at) contains the +[Program Segment Prefix](https://en.wikipedia.org/wiki/Program_Segment_Prefix). +Let's take a look at it: + +<pre><code>-d 0000 +073D:0000 CD 20 FF 9F 00 EA FF FF-AD DE BD 1D 94 01 00 00 . .............. +[...] +-u 0000 +073D:0000 CD20 INT 20 +073D:0002 FF9F00EA CALL FAR [BX+EA00] +[...] +</code></pre> + +The first two bytes always contain `INT 20`, the instruction that quits your program. +This means that you can quit your program by jumping to `CS:0000` (`CS` = `DS` = `SS`). +DOS also ensures that the word on top of the stack is `0000`, so you can quit with a `RET`. +Nifty. +It also means that `BX` will always be set to `20CD`, but we don't really care about that. + +The next two bytes point to the segment of the first free byte in memory. +So, by loading them into `ES`, we make it point to the first free area in memory. +On most systems that will be `9FFF`. +This is very convenient, as the +[mode 13](https://en.wikipedia.org/wiki/Mode_13h) +framebuffer begins at `A0000`, or `9FFF:0010`. +This is a +[well known sizecoding trick](http://www.sizecoding.org/wiki/General_Coding_Tricks#A_smaller_way_to_point_to_Mode_13.27s_screen_segment). + +...except mode 13 is a graphics mode. +We're in mode 3[^mode3], +a text mode, +and the text buffer is located at `B800`, +completely out of reach of `ES`. +What? + +[^mode3]: [`MOV AH, 0F; INT 10`](https://en.wikipedia.org/wiki/INT_10H), and look at the registers. `AL` is the current mode. + +Well, DEBUG fooled us. +When you start a program under DOS, `SI=0100`. +[Usually.](https://www.fysnet.net/yourhelp.htm) +However, for whatever reason, DEBUG zeroes it out instead. +You can fix it by running `RSI 0100`[^rsi] before the first instruction. +This is also why the page I've linked to uses `[BX]`, as you can count on it actually being zero. + +[^rsi]: No, `RSI` doesn't stand for the 64-bit register. 
`R` is the register command, which accepts `SI` as the argument. + +But let's get back to m8trix. +If `SI=0100`, then `[SI]` points to the beginning of our program! +``` +-RSI 0100 +-R +AX=FFFF BX=0000 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000 +DS=073D ES=073D SS=073D CS=073D IP=0100 NV UP EI PL ZR NA PE NC +073D:0100 C41C LES BX,[SI] DS:0100=1CC4 +-T +AX=FFFF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000 +DS=073D ES=AB9F SS=073D CS=073D IP=0102 NV UP EI PL ZR NA PE NC +073D:0102 9F LAHF +-U 100 +073D:0100 C41C LES BX,[SI] +073D:0102 9F LAHF +073D:0103 AB STOSW +``` +As you can see, this means that `BX=1CC4` (the `LES` instruction itself), and `ES=AB9F`. +This means that `ES` spans `AB9F0-BB9EF`, which includes the entire text buffer! + +### LAHF +<pre><code>AX=FFFF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000 +DS=073D ES=AB9F SS=073D CS=073D IP=0102 NV UP EI PL ZR NA PE NC +073D:0102 9F LAHF +-t +AX=<b>46</b>FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000 +DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI PL ZR NA PE NC +</code></pre> +`LAHF` is pretty straightforward, it just loads the low byte of `FLAGS` into `AH`. +Except, once again, `DEBUG` doesn't set the `FLAGS` register correctly. +If we were to run m8trix +[outside of DEBUG](https://www.fysnet.net/yourhelp.htm), +the low byte of flags would be `02`, and thus this instruction would set `AH=02`. +This can be fixed in the debugger by running `RAX 02FF`. + +### STOSW +<pre><code>-rax 02FF +-r +AX=<b>02FF</b> BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=<b>0000</b> +DS=073D ES=<b>AB9F</b> SS=073D CS=073D IP=0103 NV UP EI PL ZR NA PE NC +073D:0103 AB STOSW +-t +AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=<b>0002</b> +DS=073D ES=AB9F SS=073D CS=073D IP=0104 NV UP EI PL ZR NA PE NC +</code></pre> +`STOSW` -- "Store (word) string" -- is a bit more complex. 
+It writes the word in `AX` to `ES:DI`, and increments[^df] `DI` by two -- +the number of bytes written. + +This instruction will be run over and over again, with `DI` taking on +every fourth value and wrapping around every once in a while, +overwriting everything in `ES` -- including the text buffer -- over and over again. + +Each character in the text buffer is represented by a word, +so each `STOSW` writes a complete character to the screen. +`AH=02` sets the color to dark green, +and `AL` (which changes each iteration) chooses the character. + +[^df]: If the direction flag was set, it would instead decrement it. + +### skipping a column, misaligned jump +<pre><code>AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0002 +DS=073D ES=AB9F SS=073D CS=073D IP=0104 NV UP EI PL ZR NA PE NC +073D:0104 47 INC DI +-t +AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=<b>0003</b> +DS=073D ES=AB9F SS=073D CS=073D IP=0105 NV UP EI PL NZ NA PE NC +073D:0105 47 INC DI +-t +AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004 +DS=073D ES=AB9F SS=073D CS=073D IP=0106 NV UP EI PL NZ NA PO NC +073D:0106 EBF9 JMP 0101 +-t +AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004 +DS=073D ES=AB9F SS=073D CS=073D IP=0101 NV UP EI PL NZ NA PO NC +073D:0101 1C9F SBB AL,9F +</code></pre> +We don't want the columns to be packed too tightly together, +so we skip every other character by adding two bytes to `DI`. + +We then jump to `0101`, uncovering a hidden `SBB`. + +### misaligned jump, SBB +<pre><code>AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004 +DS=073D ES=AB9F SS=073D CS=073D IP=0101 NV UP EI PL NZ NA PO NC +073D:0101 1C9F SBB AL,9F +-t +AX=02<b>60</b> BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004 +DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI PL NZ NA PE NC +</code></pre> + +This is the last instruction, and it's the one that modifies `AL` to animate the character. 
+It subtracts `9F` from `AL` with borrow, which is pretty much the grade-school approach. +That is -- +if it underflows, +it will "borrow" a bit from the next byte by setting the carry flag. +The next `SBB` will see that the carry flag is set, +subtract an additional `1`, +and unset the carry flag (unless it also underflowed). + +Let's see that in practice: +<pre><code>-rax 028F +-r +AX=02<b>8F</b> BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0008 +DS=073D ES=AB9F SS=073D CS=073D IP=0101 NV UP EI PL NZ NA PO <b>NC</b> +073D:0101 1C9F SBB AL,9F +-t +AX=02<b>F0</b> BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0008 +DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI NG NZ NA PE <b>CY</b> +-rip 0101 <i>; i don't care about the rest of the loop, just run the SBB again</i> +-t +AX=02<b>50</b> BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0008 +DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI PL NZ AC PE <b>NC</b> +</code></pre> +Notice how the second `SBB` subtracted `A0` instead of `9F` because of the carry flag. + +Why does that matter? +Let's imagine this was a regular `SUB` instead, without a borrow. +`9F` is odd (coprime to `100`), +so it would take `100` iterations for `AL` to loop around +(remember, we're working with hexadecimal here). +Each iteration advances `DI` by four (two from the `STOSW`, two from the `INC`s), +so the loop runs for `10000/4=4000` iterations before `DI` repeats, +and `4000` is divisible by `100`, +so each pass would have the exact same `AL` values for each character. +Instead of an animation we'd get a much less impressive static screen. + +Instead, `AL` repeats every `55` (decimal 85) `SBB` calls, +which is coprime to `100`, +so the `AL` values will differ from pass to pass. +There's probably a way to determine the period by hand but I just used Python. +Not all operands work for this, but `9F` seems to be one of the good ones. + +To quote the author, +"without CR flag, there would be no animation ;)". + +### ending remarks +I think I've explained every aspect of how m8trix works by now. 
+I don't think I need to tell you how brilliant it is. + +Notice how the third byte has three different meanings! +At first it's read as the low byte of the segment loaded into `ES`, +then it's the `LAHF` instruction, +and then it's the operand for the `SBB`. + +`STOSW` is not only the perfect instruction for writing characters in text mode, +it also works as the high byte of the segment that you need to write those +characters in the first place. + +Everything fits together so nicely :) + +## m7trix +Soon after m8trix was published, +several people tried coming up with ideas to shrink it down even more. +What follows is the final version HellMood published: + +<pre><code>C:\M8TRIX>debug M7TRIX.COM +-U +073D:0100 C41C <a href="//www.felixcloutier.com/x86/lds:les:lfs:lgs:lss">LES</a> BX,[SI] +073D:0102 9F <a href="//www.felixcloutier.com/x86/lahf">LAHF</a> +073D:0103 AB <a href="//www.felixcloutier.com/x86/stos:stosb:stosw:stosd:stosq">STOSW</a> +073D:0104 91 <a href="//www.felixcloutier.com/x86/xchg">XCHG</a> AX,CX +073D:0105 EBFA <a href="//www.felixcloutier.com/x86/jmp">JMP</a> 0101 +-U 101 +073D:0101 1C9F <a href="//www.felixcloutier.com/x86/sbb">SBB</a> AL,9F +073D:0103 AB <a href="//www.felixcloutier.com/x86/stos:stosb:stosw:stosd:stosq">STOSW</a> +073D:0104 91 <a href="//www.felixcloutier.com/x86/xchg">XCHG</a> AX,CX +073D:0105 EBFA <a href="//www.felixcloutier.com/x86/jmp">JMP</a> 0101 +</code></pre> + +Not only is this version smaller, it also looks better, as it clears the screen! +It's also simple enough that I won't bother tracing through it again. + +In short -- instead of skipping over every other column, +we swap `AX` and `CX` back and forth. +Both are running the same character animation, but, as `CH=00`, +every other column is rendered as black on black, +so the characters are invisible. +This takes care both of skipping columns AND clearing the screen. 
+ +The character cycle is apparently[^apparently] different +because the carry flag gets reused between odd and even columns, +but the period still works out to be 85 -- +which I find interesting but I don't really feel like researching why that is. + +[^apparently]: The Python script I'm using for testing says so, but I can't really tell if that's true by just looking at the output. + +## bonus: simplified version +This is a slightly modified version +that works under DEBUG and doesn't use misaligned jumps. +It's easy to experiment with +as you can just load it into DEBUG, +use the assembler to change a single instruction, +and see what happens. +``` +073D:0100 BB9FAB MOV BX,AB9F +073D:0103 8EC3 MOV ES,BX +073D:0105 B402 MOV AH,02 +073D:0107 AB STOSW +073D:0108 47 INC DI +073D:0109 47 INC DI +073D:010A 1C9F SBB AL,9F +073D:010C EBF9 JMP 0107 +```
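If you'd rather experiment with the `SBB` operand outside of DOS, the character cycle of this simplified loop can be modelled in a few lines of Python. This is only a sketch under some assumptions: it tracks nothing but `AL` and the carry flag, starts from `AL=FF` with the carry clear (as in the DEBUG traces above), and relies on `STOSW`, `INC` and `JMP` leaving the carry flag alone. The function names are made up for this example.

```python
from math import gcd


def sbb_step(al, cf, operand=0x9F):
    """Model one `SBB AL, operand`: return the new (AL, CF) pair."""
    diff = al - operand - cf
    return diff & 0xFF, int(diff < 0)


def sub_step(al, cf, operand=0x9F):
    """Model one `SUB AL, operand`: the incoming carry is ignored."""
    diff = al - operand
    return diff & 0xFF, int(diff < 0)


def period(step, al=0xFF, cf=0, operand=0x9F):
    """Length of the cycle that the (AL, CF) state eventually falls into."""
    first_seen = {}
    state, n = (al, cf), 0
    while state not in first_seen:
        first_seen[state] = n
        state = step(*state, operand)
        n += 1
    return n - first_seen[state]


print(period(sbb_step))          # 85  (hex 55)
print(period(sub_step))          # 256 (hex 100)
print(255 // gcd(0x60, 255))     # 85 again, see the note below
```

As for doing it by hand: I think reusing the borrow turns the subtraction into arithmetic modulo `FF` instead of modulo `100` (it behaves like an end-around carry), and `-9F` is `60` modulo `FF`, so the period should be `FF / gcd(60, FF) = 55`. That matches what the simulation prints. Passing a different `operand` to `period` is a quick way to look for other values that animate.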