From 06db3d96258f6c6a60ae4ec36fe8e8dcd29ffe86 Mon Sep 17 00:00:00 2001
From: dzwdz
Date: Wed, 31 Jul 2024 21:49:19 +0200
Subject: m8trix post

---
 src/m8trix.md | 363 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 363 insertions(+)
 create mode 100644 src/m8trix.md

(limited to 'src/m8trix.md')

diff --git a/src/m8trix.md b/src/m8trix.md
new file mode 100644
index 0000000..d8b0cd9
--- /dev/null
+++ b/src/m8trix.md
@@ -0,0 +1,363 @@
+---
+title: Dissecting m8trix
+date: 2024-07-31
+---
+
+[m8trix](https://www.pouet.net/prod.php?which=63126)
+by
+[HellMood](https://www.pouet.net/user.php?who=97586)
+is one of my favorite demos.
+It packs a pretty cool Matrix-style effect in only 8 bytes:
+
+
+animated gif (epilepsy warning) + +
+
+The author even provided the source with some comments:
+```asm
+org 100h
+
+S:
+les bx,[si]      ; sets ES to the screen, assume si = 0x100
+                 ; 0x101 is SBB AL,9F and changes the char
+                 ; without CR flag, there would be
+                 ; no animation ;)
+lahf             ; gets 0x02 (green) in the first run
+                 ; afterwards, it is not called again
+                 ; because of alignment ;)
+stosw            ; print the green char ...
+                 ; (is also 0xAB9F and works as segment)
+inc di           ; and skip one row
+inc di           ;
+jmp short S+1    ; repeat on 0x101
+```
+
+...yeah, I didn't really get it at first either.
+Let's try to actually understand how it works
+(and learn some stuff about DOS along the way).
+
+Note that **I'll be using hexadecimal numbers "by default"** (without 0x)
+throughout this article to be consistent with DEBUG's output.
+
+## DEBUG
+The only tool I'll be using on DOS's side will be DEBUG.
+It's a
+[delightful](https://tilde.zone/@dzwdz/112866746891156279)
+little tool that ships with MS-DOS.
+I've personally used the FreeDOS version under DOSBox, as that's what I had handy.
+
+There's built-in help if you type in `?`; you can also check out
+[this more in-depth guide](https://montcs.bloomu.edu/Information/LowLevel/DOS-Debug.html),
+or
+[this video of someone using it to assemble new binaries](https://www.youtube.com/watch?v=zc-W8xq7L5Q).
+
+There's a small issue, though.
+m8trix doesn't actually work as-is under DEBUG,
+for reasons I'll explain later.
+
+## a bad explanation of segmentation
+If you're a bit rusty on how real mode segmentation works, then here's a quick reminder.
+There are a few 16-bit segment registers (`CS`, `DS`, `SS`, `ES`).
+When you reference memory in real mode you always[^ithink] use one of those registers,
+even if it's implicit.
+
+If you reference `ES:BX`, the real address this maps to is computed as `ES * 0x10 + BX`.
+This means that there are multiple ways to reference one physical memory location
+(even if that is only slightly relevant here).
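For a concrete feel of that formula, here is a minimal Python sketch. The helper name `linear` is made up for illustration, and the 20-bit mask assumes the 8086's address wraparound (no A20 line). Unlike the rest of this article, Python needs the `0x` prefix on hex literals:

```python
def linear(segment: int, offset: int) -> int:
    """Real-mode address translation: segment * 0x10 + offset.

    The result is masked to 20 bits, assuming 8086-style wraparound.
    """
    return ((segment << 4) + offset) & 0xFFFFF


# Two different segment:offset pairs that name the same physical byte.
print(f"{linear(0x9FFF, 0x0010):05X}")  # A0000
print(f"{linear(0xA000, 0x0000):05X}")  # A0000
```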
+
+As another example, `B800:1234` points to `B9234`.
+
+[^ithink]: At least I think so, but I'm not sure.
+
+## the first look
+My comments are prefixed with a semicolon.
+As mentioned, all numbers shown are in hexadecimal.
+
C:\M8TRIX>debug M8TRIX.COM
+-U ; disassemble the beginning of the program
+073D:0100 C41C              LES     BX,[SI]
+073D:0102 9F                LAHF
+073D:0103 AB                STOSW
+073D:0104 47                INC     DI
+073D:0105 47                INC     DI
+073D:0106 EBF9              JMP     0101
+-U 101 ; disassemble the loop body
+073D:0101 1C9F              SBB     AL,9F
+073D:0103 AB                STOSW
+073D:0104 47                INC     DI
+073D:0105 47                INC     DI
+073D:0106 EBF9              JMP     0101
+-R ; look at the registers
+AX=FFFF BX=0000 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0000 DI=0000
+DS=073D ES=073D SS=073D CS=073D IP=0100 NV UP EI PL ZR NA PE NC
+073D:0100 C41C              LES     BX,[SI]                        DS:0000=20CD
+
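The two listings above come from the same eight bytes, just decoded from different starting offsets. Here is a toy Python sketch of that. The `OPCODES` table is hypothetical and only knows the handful of opcodes that appear in m8trix, with lengths that hold for these exact bytes; it is not a real disassembler:

```python
CODE = bytes.fromhex("C41C9FAB4747EBF9")  # the whole program, loaded at 0100
BASE = 0x100

# opcode -> (length in bytes, mnemonic); just enough for these eight bytes
OPCODES = {
    0xC4: (2, "LES"),          # C4 + ModRM byte 1C (BX,[SI]), no displacement
    0x1C: (2, "SBB AL,imm8"),
    0x9F: (1, "LAHF"),
    0xAB: (1, "STOSW"),
    0x47: (1, "INC DI"),
    0xEB: (2, "JMP rel8"),
}


def decode(start: int) -> list[tuple[int, str]]:
    """Linearly decode from address `start` to the end of CODE."""
    out = []
    ip = start
    while ip - BASE < len(CODE):
        length, name = OPCODES[CODE[ip - BASE]]
        out.append((ip, name))
        ip += length
    return out


def jmp_target(ip: int) -> int:
    """Target of the 2-byte JMP rel8 at `ip`: next IP plus the signed displacement."""
    rel = CODE[ip - BASE + 1]
    if rel >= 0x80:  # interpret the byte as two's complement
        rel -= 0x100
    return (ip + 2 + rel) & 0xFFFF


for start in (0x100, 0x101):
    print([f"{ip:04X} {name}" for ip, name in decode(start)])
print(f"JMP at 0106 -> {jmp_target(0x106):04X}")  # JMP at 0106 -> 0101
```

Starting at `0100` the byte `1C` is swallowed as the operand of `LES`; starting at `0101` it becomes an opcode of its own and swallows the `9F` that used to be `LAHF`.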
+
+Let's step through this.
+
+### LES BX,[SI]
+
AX=FFFF BX=0000 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0000 DI=0000
+DS=073D ES=073D SS=073D CS=073D IP=0100 NV UP EI PL ZR NA PE NC
+073D:0100 C41C              LES     BX,[SI]                        DS:0000=20CD
+-T ; single step and show register state
+AX=FFFF BX=20CD CX=0008 DX=0000 SP=FFFE BP=0000 SI=0000 DI=0000
+DS=073D ES=9FFF SS=073D CS=073D IP=0102 NV UP EI PL ZR NA PE NC
+
+
+`LES` loads a far pointer from memory.
+The first two bytes at `[SI]` will be loaded into `BX`,
+and the next two bytes will be loaded into `ES`.
+
+We're implicitly using the `DS` segment here, which is where DOS loaded our program.
+To be more exact --
+our program was loaded into `DS:0100`,
+whereas `DS:0000` (which `[SI]` points at) contains the
+[Program Segment Prefix](https://en.wikipedia.org/wiki/Program_Segment_Prefix).
+Let's take a look at it:
+
+
-d 0000
+073D:0000  CD 20 FF 9F 00 EA FF FF-AD DE BD 1D 94 01 00 00 . ..............
+[...]
+-u 0000
+073D:0000 CD20              INT     20
+073D:0002 FF9F00EA          CALL    FAR [BX+EA00]
+[...]
+
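Reading the first four bytes of that dump the way `LES` does, as a little-endian offset followed by a little-endian segment, can be sketched in Python. The function name `les` is illustrative only:

```python
import struct

PSP_START = bytes.fromhex("CD20FF9F")  # first four bytes of the dump above


def les(mem: bytes, addr: int = 0) -> tuple[int, int]:
    """Return (reg, es) as LES would load them from the far pointer at `addr`."""
    reg, es = struct.unpack_from("<HH", mem, addr)
    return reg, es


bx, es = les(PSP_START)
print(f"BX={bx:04X} ES={es:04X}")  # BX=20CD ES=9FFF
```

This matches the `BX=20CD` and `ES=9FFF` from the single-step trace.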
+
+The first two bytes always contain `INT 20`, the instruction that quits your program.
+This means that you can quit your program by jumping to `CS:0000` (`CS` = `DS` = `SS`).
+DOS also ensures that the word on top of the stack is `0000`, so you can quit with a `RET`.
+Nifty.
+It also means that `BX` will always be set to `20CD`, but we don't really care about that.
+
+The next two bytes point to the segment of the first free byte in memory.
+So, by loading them into `ES`, we make it point to the first free area in memory.
+On most systems that will be `9FFF`.
+This is very convenient, as the
+[mode 13](https://en.wikipedia.org/wiki/Mode_13h)
+framebuffer begins at `A0000`, or `9FFF:0010`.
+This is a
+[well-known sizecoding trick](http://www.sizecoding.org/wiki/General_Coding_Tricks#A_smaller_way_to_point_to_Mode_13.27s_screen_segment).
+
+...except mode 13 is a graphics mode.
+We're in mode 3[^mode3],
+a text mode,
+and the text buffer is located at `B800`,
+completely out of reach of `ES`.
+What?
+
+[^mode3]: [`MOV AH, 0F; INT 10`](https://en.wikipedia.org/wiki/INT_10H), and look at the registers. `AL` is the current mode.
+
+Well, DEBUG fooled us.
+When you start a program under DOS, `SI=0100`.
+[Usually.](https://www.fysnet.net/yourhelp.htm)
+However, for whatever reason, DEBUG zeroes it out instead.
+You can fix it by running `RSI 0100`[^rsi] before the first instruction.
+This is also why the page I've linked to uses `[BX]`, as you can count on it actually being zero.
+
+[^rsi]: No, `RSI` doesn't stand for the 64-bit register. `R` is the register command, which accepts `SI` as the argument.
+
+But let's get back to m8trix.
+If `SI=0100`, then `[SI]` points to the beginning of our program!
+```
+-RSI 0100
+-R
+AX=FFFF BX=0000 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
+DS=073D ES=073D SS=073D CS=073D IP=0100 NV UP EI PL ZR NA PE NC
+073D:0100 C41C              LES     BX,[SI]                        DS:0100=1CC4
+-T
+AX=FFFF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
+DS=073D ES=AB9F SS=073D CS=073D IP=0102 NV UP EI PL ZR NA PE NC
+073D:0102 9F                LAHF
+-U 100
+073D:0100 C41C              LES     BX,[SI]
+073D:0102 9F                LAHF
+073D:0103 AB                STOSW
+```
+As you can see, this means that `BX=1CC4` (the `LES` instruction itself), and `ES=AB9F`.
+This means that `ES` spans `AB9F0-BB9F0`, which includes the entire text buffer!
+
+### LAHF
+
AX=FFFF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
+DS=073D ES=AB9F SS=073D CS=073D IP=0102 NV UP EI PL ZR NA PE NC
+073D:0102 9F                LAHF
+-t
+AX=46FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
+DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI PL ZR NA PE NC
+
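The `AH=46` in the trace can be reconstructed from the flag mnemonics DEBUG prints. `LAHF` copies the low byte of `FLAGS` (SF, ZF, AF, PF and CF, plus a reserved bit that always reads as 1) into `AH`. A small Python sketch, where the `lahf` helper and its mnemonic table are my own naming:

```python
# DEBUG's "set" mnemonics for the flags in the low byte of FLAGS, with their bit masks.
SET_FLAGS = {
    "NG": 0x80,  # sign
    "ZR": 0x40,  # zero
    "AC": 0x10,  # auxiliary carry
    "PE": 0x04,  # parity (even)
    "CY": 0x01,  # carry
}
ALWAYS_ONE = 0x02  # bit 1 of FLAGS is reserved and always reads as 1


def lahf(debug_flags: str) -> int:
    """AH after LAHF, given DEBUG's flag display (e.g. "NV UP EI PL ZR NA PE NC")."""
    ah = ALWAYS_ONE
    for name in debug_flags.split():
        # "clear" mnemonics and the overflow/direction/interrupt flags add nothing
        ah |= SET_FLAGS.get(name, 0)
    return ah


print(f"{lahf('NV UP EI PL ZR NA PE NC'):02X}")  # 46, as in the trace above
print(f"{lahf('NV UP EI PL NZ NA PO NC'):02X}")  # 02, when none of those flags are set
```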
+`LAHF` is pretty straightforward: it just loads the low byte of `FLAGS` into `AH`.
+Except, once again, DEBUG doesn't set the `FLAGS` register the way DOS does.
+If we were to run m8trix
+[outside of DEBUG](https://www.fysnet.net/yourhelp.htm),
+the low byte of `FLAGS` would be `02`, and thus this instruction would set `AH=02`.
+This can be fixed in the debugger by running `RAX 02FF`.
+
+### STOSW
+
-rax 02FF
+-r
+AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
+DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI PL ZR NA PE NC
+073D:0103 AB                STOSW
+-t
+AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0002
+DS=073D ES=AB9F SS=073D CS=073D IP=0104 NV UP EI PL ZR NA PE NC
+
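In Python terms, one `STOSW` with the direction flag clear behaves roughly like the sketch below. `mem` stands in for the 64K window addressed through `ES`, and all names are illustrative:

```python
def stosw(mem: bytearray, di: int, ax: int) -> int:
    """Store AX at ES:DI (little-endian) and return the new DI.

    Assumes the direction flag is clear, so DI moves forward.
    """
    mem[di] = ax & 0xFF                          # AL: the character
    mem[(di + 1) & 0xFFFF] = (ax >> 8) & 0xFF    # AH: the attribute (color)
    return (di + 2) & 0xFFFF                     # DI wraps around at 64K


es_window = bytearray(0x10000)
di = stosw(es_window, 0x0000, 0x02FF)
print(f"DI={di:04X}", es_window[:2].hex().upper())  # DI=0002 FF02
```

In text mode the low byte of each cell is the character and the high byte is its attribute, which is why `AL` and `AH` line up with "character" and "color" here.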
+`STOSW` -- "Store (word) string" -- is a bit more complex.
+It writes the word in `AX` to `ES:DI`, and increments[^df] `DI` by two --
+the number of bytes written.
+
+This instruction will be run over and over again, with `DI` stepping
+through the segment and wrapping around every once in a while,
+overwriting every other word in `ES` -- including the text buffer -- over and over again.
+
+Each character in the text buffer is represented by a word,
+so each `STOSW` writes a complete character to the screen.
+`AH=02` sets the color to dark green,
+and `AL` (which changes each iteration) chooses the character.
+
+[^df]: If the direction flag were set, it would instead decrement it.
+
+### skipping a column, misaligned jump
+
AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0002
+DS=073D ES=AB9F SS=073D CS=073D IP=0104 NV UP EI PL ZR NA PE NC
+073D:0104 47                INC     DI
+-t
+AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0003
+DS=073D ES=AB9F SS=073D CS=073D IP=0105 NV UP EI PL NZ NA PE NC
+073D:0105 47                INC     DI
+-t
+AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004
+DS=073D ES=AB9F SS=073D CS=073D IP=0106 NV UP EI PL NZ NA PO NC
+073D:0106 EBF9              JMP     0101
+-t
+AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004
+DS=073D ES=AB9F SS=073D CS=073D IP=0101 NV UP EI PL NZ NA PO NC
+073D:0101 1C9F              SBB     AL,9F
+
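To see which screen cells the loop actually touches, the sketch below maps `DI` to a text-mode cell. It assumes the standard 80x25 layout of mode 3 (2 bytes per cell, buffer at `B8000`), the `ES=AB9F` from earlier, and `DI` starting at `0000` as it does under DEBUG. The names are mine:

```python
ES = 0xAB9F
TEXT_BASE = 0xB8000    # physical address of the mode 3 text buffer
COLS, ROWS = 80, 25    # standard mode 3 geometry (an assumption here)


def cell(di: int):
    """Return (row, col) of the screen cell that ES:DI points at, or None if off-screen."""
    off = (ES << 4) + di - TEXT_BASE
    if not 0 <= off < COLS * ROWS * 2:
        return None
    return divmod(off // 2, COLS)


# DI advances by 4 per iteration: 2 from STOSW plus the two INC DI.
hits = [cell(di) for di in range(0, 0x10000, 4)]
on_screen = [c for c in hits if c is not None]
print(len(on_screen))                             # 1000 of the 2000 cells
print(sorted({col for _, col in on_screen})[:5])  # [0, 2, 4, 6, 8]
```

Only a small part of the 64K window is visible; the rest of the writes land in whatever else lives between `AB9F0` and `BB9F0`.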
+We don't want the columns to be packed too tightly together,
+so we skip every other character by adding two to `DI`.
+
+We then jump to `0101`, uncovering a hidden `SBB`.
+
+### misaligned jump, SBB
+
AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004
+DS=073D ES=AB9F SS=073D CS=073D IP=0101 NV UP EI PL NZ NA PO NC
+073D:0101 1C9F              SBB     AL,9F
+-t
+AX=0260 BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004
+DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI PL NZ NA PE NC
+
+
+This is the last instruction, and it's the one that modifies `AL` to animate the character.
+It subtracts `9F` from `AL` with borrow, which is pretty much the grade-school approach.
+That is --
+if it underflows,
+it will "borrow" a bit from the next byte by setting the carry flag.
+The next `SBB` will see that the carry flag is set,
+subtract an additional `1`,
+and unset the carry flag (unless it also underflows).
+
+Let's see that in practice:
+
-rax 028F
+-r
+AX=028F BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0008
+DS=073D ES=AB9F SS=073D CS=073D IP=0101 NV UP EI PL NZ NA PO NC
+073D:0101 1C9F              SBB     AL,9F
+-t
+AX=02F0 BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0008
+DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI NG NZ NA PE CY
+-rip 0101 ; i don't care about the rest of the loop, just run the SBB again
+-t
+AX=0250 BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0008
+DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI PL NZ AC PE NC
+
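The trace above can be replayed outside of DEBUG. The Python sketch below models `SBB AL,9F` on an `(AL, carry)` pair, and also counts how many steps the pair takes before it starts repeating, compared against a plain subtraction. The helper names `sbb8` and `cycle_length` are made up for this example, and the starting value `AL=FF` is taken from the DEBUG session rather than from a real DOS launch:

```python
def sbb8(al: int, imm: int, carry: int) -> tuple[int, int]:
    """8-bit SBB: return (new AL, new carry) for AL - imm - carry."""
    diff = al - imm - carry
    return diff & 0xFF, int(diff < 0)


def cycle_length(step, state) -> int:
    """Length of the cycle that repeatedly applying `step` eventually falls into."""
    seen = {}
    n = 0
    while state not in seen:
        seen[state] = n
        state = step(state)
        n += 1
    return n - seen[state]


# Replay the trace: AL=8F with carry clear, then SBB AL,9F twice.
al, cf = sbb8(0x8F, 0x9F, 0)
print(f"{al:02X} carry={cf}")  # F0 carry=1
al, cf = sbb8(al, 0x9F, cf)
print(f"{al:02X} carry={cf}")  # 50 carry=0

# How long until the (AL, carry) state repeats, with and without the borrow?
with_borrow = cycle_length(lambda s: sbb8(s[0], 0x9F, s[1]), (0xFF, 0))
plain_sub = cycle_length(lambda s: ((s[0] - 0x9F) & 0xFF, 0), (0xFF, 0))
print(with_borrow, plain_sub)  # 85 256
```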
+Notice how the second `SBB` subtracted `A0` instead of `9F` because of the carry flag.
+
+Why does that matter?
+Let's imagine this was a regular `SUB` instead, without a borrow.
+`9F` is odd (coprime to `100`),
+so it would take `100` iterations for `AL` to loop around
+(remember, we're working with hexadecimal here).
+`DI` advances by `4` per iteration, so the loop runs for `10000/4=4000` iterations before `DI` repeats,
+and `4000` is divisible by `100`,
+so each pass would have the exact same `AL` values for each character.
+Instead of an animation we'd get a much less impressive static screen.
+
+With the borrow, `AL` instead repeats every `55` (decimal 85) `SBB` calls,
+which is coprime to `100`,
+so the `AL` values will differ from pass to pass.
+There's probably a way to determine the period by hand but I just used Python.
+Not all operands work for this, but `9F` seems to be one of the good ones.
+
+To quote the author,
+"without CR flag, there would be no animation ;)".
+
+### ending remarks
+I think I've explained every aspect of how m8trix works by now.
+I don't think I need to tell you how brilliant it is.
+
+Notice how the third byte has three different meanings!
+At first it's read as the low byte of the segment loaded into `ES`,
+then it's the `LAHF` instruction,
+and then it's the operand for the `SBB`.
+
+`STOSW` is not only the perfect instruction for writing characters in text mode,
+it also works as the high byte of the segment that you need to write those
+characters in the first place.
+
+Everything fits together so nicely :)
+
+## m7trix
+Soon after m8trix was published,
+several people tried coming up with ideas to shrink it down even more.
+What follows is the final version HellMood published:
+
+
C:\M8TRIX>debug M7TRIX.COM
+-U
+073D:0100 C41C              LES     BX,[SI]
+073D:0102 9F                LAHF
+073D:0103 AB                STOSW
+073D:0104 91                XCHG    AX,CX
+073D:0105 EBFA              JMP     0101
+-U 101
+073D:0101 1C9F              SBB     AL,9F
+073D:0103 AB                STOSW
+073D:0104 91                XCHG    AX,CX
+073D:0105 EBFA              JMP     0101
+
+
+Not only is this version smaller, it also looks better, as it clears the screen!
+It's also simple enough that I won't bother tracing through it again.
+
+In short -- instead of skipping over every other column,
+we swap `AX` and `CX` back and forth.
+Both are running the same character animation, but, as `CH=00`,
+every other column is rendered as black on black,
+so the characters are invisible.
+This takes care of both skipping columns AND clearing the screen.
+
+The character cycle is apparently[^apparently] different
+because the carry flag gets reused between odd and even columns,
+but the period still works out to be 85 --
+which I find interesting but I don't really feel like researching why that is.
+
+[^apparently]: The Python script I'm using for testing says so, but I can't really tell if that's true by just looking at the output.
+
+## bonus: simplified version
+This is a slightly modified version
+that works under DEBUG and doesn't use misaligned jumps.
+It's easy to experiment with
+as you can just load it into DEBUG,
+use the assembler to change a single instruction,
+and see what happens.
+```
+073D:0100 BB9FAB            MOV     BX,AB9F
+073D:0103 8EC3              MOV     ES,BX
+073D:0105 B402              MOV     AH,02
+073D:0107 AB                STOSW
+073D:0108 47                INC     DI
+073D:0109 47                INC     DI
+073D:010A 1C9F              SBB     AL,9F
+073D:010C EBF9              JMP     0107
+```
--
cgit 1.4.1-2-gfad0