From 06db3d96258f6c6a60ae4ec36fe8e8dcd29ffe86 Mon Sep 17 00:00:00 2001
From: dzwdz
Date: Wed, 31 Jul 2024 21:49:19 +0200
Subject: m8trix post
---
src/m8trix.md | 363 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 363 insertions(+)
create mode 100644 src/m8trix.md
(limited to 'src/m8trix.md')
diff --git a/src/m8trix.md b/src/m8trix.md
new file mode 100644
index 0000000..d8b0cd9
--- /dev/null
+++ b/src/m8trix.md
@@ -0,0 +1,363 @@
+---
+title: Dissecting m8trix
+date: 2024-07-31
+---
+
+[m8trix](https://www.pouet.net/prod.php?which=63126)
+by
+[HellMood](https://www.pouet.net/user.php?who=97586)
+is one of my favorite demos.
+It packs a pretty cool Matrix-style effect in only 8 bytes:
+
+animated gif (epilepsy warning)
+
+
+C:\M8TRIX>debug M8TRIX.COM
+-U ; disassemble the beginning of the program
+073D:0100 C41C LES BX,[SI]
+073D:0102 9F LAHF
+073D:0103 AB STOSW
+073D:0104 47 INC DI
+073D:0105 47 INC DI
+073D:0106 EBF9 JMP 0101
+-U 101 ; disassemble the loop body
+073D:0101 1C9F SBB AL,9F
+073D:0103 AB STOSW
+073D:0104 47 INC DI
+073D:0105 47 INC DI
+073D:0106 EBF9 JMP 0101
+-R ; look at the registers
+AX=FFFF BX=0000 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0000 DI=0000
+DS=073D ES=073D SS=073D CS=073D IP=0100 NV UP EI PL ZR NA PE NC
+073D:0100 C41C LES BX,[SI] DS:0000=20CD
+
+
+Let's step through this.
+
+### LES BX,[SI]
+AX=FFFF BX=0000 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0000 DI=0000
+DS=073D ES=073D SS=073D CS=073D IP=0100 NV UP EI PL ZR NA PE NC
+073D:0100 C41C LES BX,[SI] DS:0000=20CD
+-T ; single step and show register state
+AX=FFFF BX=20CD CX=0008 DX=0000 SP=FFFE BP=0000 SI=0000 DI=0000
+DS=073D ES=9FFF SS=073D CS=073D IP=0102 NV UP EI PL ZR NA PE NC
+
+
+`LES` loads a far pointer from memory.
+The first two bytes of `[SI]` will be loaded into `BX`,
+and the next two bytes will be loaded into `ES`.
+
+We're implicitly using the `DS` segment here, which is where DOS loaded our program.
+To be more exact --
+our program was loaded into `DS:0100`,
+whereas `DS:0000` (which `[SI]` points at) contains the
+[Program Segment Prefix](https://en.wikipedia.org/wiki/Program_Segment_Prefix).
+Let's take a look at it:
+
+-d 0000
+073D:0000 CD 20 FF 9F 00 EA FF FF-AD DE BD 1D 94 01 00 00 . ..............
+[...]
+-u 0000
+073D:0000 CD20 INT 20
+073D:0002 FF9F00EA CALL FAR [BX+EA00]
+[...]
+
+
+The first two bytes always contain `INT 20`, the instruction that quits your program.
+This means that you can quit your program by jumping to `CS:0000` (`CS` = `DS` = `SS`).
+DOS also ensures that the word on top of the stack is `0000`, so you can quit with a `RET`.
+Nifty.
+It also means that `BX` will always be set to `20CD`, but we don't really care about that.
+
+The next two bytes hold the segment of the first byte past the memory allocated to our program.
+So, by loading them into `ES`, we make it point just past the end of that memory.
+On most systems that will be `9FFF`.
+This is very convenient, as the
+[mode 13](https://en.wikipedia.org/wiki/Mode_13h)
+framebuffer begins at `A0000`, or `9FFF:0010`.
+This is a
+[well known sizecoding trick](http://www.sizecoding.org/wiki/General_Coding_Tricks#A_smaller_way_to_point_to_Mode_13.27s_screen_segment).
+
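To double-check the arithmetic: a real-mode address is just `segment * 10 + offset` (in hex). Here's a minimal Python sketch of that; the helper name `linear` is mine and has nothing to do with the demo itself.

```python
def linear(segment: int, offset: int) -> int:
    """Real-mode linear address: segment shifted left by 4 bits, plus the offset.

    The 20-bit wraparound of the 8086 address bus is ignored here.
    """
    return (segment << 4) + offset


# 9FFF:0010 and A000:0000 are two names for the same byte.
print(hex(linear(0x9FFF, 0x0010)))
print(hex(linear(0xA000, 0x0000)))
```

Both lines print the same address, which is why a segment of `9FFF` is "close enough" to reach the mode 13 framebuffer.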
+...except mode 13 is a graphics mode.
+We're in mode 3[^mode3],
+a text mode,
+and the text buffer is located at segment `B800`,
+completely out of reach of `ES`.
+What?
+
+[^mode3]: [`MOV AH, 0F; INT 10`](https://en.wikipedia.org/wiki/INT_10H), and look at the registers. `AL` is the current mode.
+
+Well, DEBUG fooled us.
+When you start a program under DOS, `SI=0100`.
+[Usually.](https://www.fysnet.net/yourhelp.htm)
+However, for whatever reason, DEBUG zeroes it out instead.
+You can fix it by running `RSI 0100`[^rsi] before the first instruction.
+This is also why the page I've linked to uses `[BX]`, as you can count on it actually being zero.
+
+[^rsi]: No, `RSI` doesn't stand for the 64-bit register. `R` is the register command, which accepts `SI` as the argument.
+
+But let's get back to m8trix.
+If `SI=0100`, then `[SI]` points to the beginning of our program!
+```
+-RSI 0100
+-R
+AX=FFFF BX=0000 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
+DS=073D ES=073D SS=073D CS=073D IP=0100 NV UP EI PL ZR NA PE NC
+073D:0100 C41C LES BX,[SI] DS:0100=1CC4
+-T
+AX=FFFF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
+DS=073D ES=AB9F SS=073D CS=073D IP=0102 NV UP EI PL ZR NA PE NC
+073D:0102 9F LAHF
+-U 100
+073D:0100 C41C LES BX,[SI]
+073D:0102 9F LAHF
+073D:0103 AB STOSW
+```
+As you can see, `BX=1CC4` (the `LES` instruction itself), and `ES=AB9F`.
+This means that `ES` spans `AB9F0-BB9EF`, which includes the entire text buffer (`B8000-B8F9F`)!
+
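The same decoding can be reproduced outside of DEBUG. This is a rough Python sketch, assuming the 8 program bytes from the disassembly above and an 80x25 text buffer at `B8000`; the `les` helper is a made-up name that only mimics how `LES` reads a far pointer.

```python
import struct

# The 8 bytes of M8TRIX.COM, copied from the disassembly.
PROGRAM = bytes.fromhex("C41C9FAB4747EBF9")


def les(memory: bytes, offset: int) -> tuple[int, int]:
    """Read a far pointer: a little-endian offset word followed by a segment word."""
    return struct.unpack_from("<HH", memory, offset)


# With SI=0100, [SI] points at the very first byte of the program.
bx, es = les(PROGRAM, 0)

window = range(es << 4, (es << 4) + 0x10000)          # 64 KiB reachable through ES
text_buffer = range(0xB8000, 0xB8000 + 80 * 25 * 2)   # 2 bytes per character cell

print(f"BX={bx:04X} ES={es:04X}")
print(text_buffer.start in window and text_buffer[-1] in window)
```

The second line prints `True` only if the whole text buffer fits inside the 64 KiB window that `ES` opens.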
+### LAHF
+AX=FFFF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
+DS=073D ES=AB9F SS=073D CS=073D IP=0102 NV UP EI PL ZR NA PE NC
+073D:0102 9F LAHF
+-t
+AX=46FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
+DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI PL ZR NA PE NC
+
+`LAHF` is pretty straightforward: it just loads the low byte of `FLAGS` into `AH`.
+Except, once again, `DEBUG` doesn't set the `FLAGS` register correctly.
+If we were to run m8trix
+[outside of DEBUG](https://www.fysnet.net/yourhelp.htm),
+the low byte of `FLAGS` would be `02`, and thus this instruction would set `AH=02`.
+This can be fixed in the debugger by running `RAX 02FF`.
+
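To see where the `46` in the trace comes from, here's a small Python sketch of the byte that `LAHF` copies into `AH`. The constant and function names are my own; the bit layout is the 8086 one, where bit 1 always reads as 1.

```python
# Status flags in the low byte of FLAGS (8086 bit layout).
CF, PF, AF, ZF, SF = 0x01, 0x04, 0x10, 0x40, 0x80
RESERVED = 0x02  # bit 1 always reads as 1


def lahf(*set_flags: int) -> int:
    """Value LAHF would put in AH, given which status flags are currently set."""
    ah = RESERVED
    for flag in set_flags:
        ah |= flag
    return ah


print(f"{lahf(ZF, PF):02X}")  # DEBUG's initial state: ZR and PE are set
print(f"{lahf():02X}")        # no status flags set at all
```

With `ZR` and `PE` set, as in the DEBUG register dump, this gives `46`; with no status flags set it gives `02`.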
+### STOSW
+-rax 02FF
+-r
+AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
+DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI PL ZR NA PE NC
+073D:0103 AB STOSW
+-t
+AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0002
+DS=073D ES=AB9F SS=073D CS=073D IP=0104 NV UP EI PL ZR NA PE NC
+
+`STOSW` -- "Store (word) string" -- is a bit more complex.
+It writes the word in `AX` to `ES:DI`, and increments[^df] `DI` by two --
+the number of bytes written.
+
+This instruction will be run over and over again, with `DI` stepping through
+the segment (four bytes per iteration, counting the two `INC DI`s) and wrapping around every once in a while,
+overwriting every other word in `ES` -- including the text buffer -- over and over again.
+
+Each character in the text buffer is represented by a word,
+so each `STOSW` writes a complete character to the screen.
+`AH=02` sets the color to dark green,
+and `AL` (which changes each iteration) chooses the character.
+
+[^df]: If the direction flag was set, it would instead decrement it.
+
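Here's a rough Python sketch of what a single `STOSW` does to the screen, assuming `ES=AB9F`, an 80x25 text buffer at `B8000`, and my own helper names. It treats the whole high nibble of the attribute as the background, which glosses over the blink bit.

```python
TEXT_BASE = 0xB8000   # mode 3 text buffer, page 0
COLS, ROWS = 80, 25
ES = 0xAB9F


def cell_for_di(di: int):
    """Return (row, col) of the character cell at ES:DI, or None if it's off-screen."""
    offset = (ES << 4) + di - TEXT_BASE
    if not 0 <= offset < COLS * ROWS * 2:
        return None
    return divmod(offset // 2, COLS)


def split_cell(ax: int):
    """Split the word in AX into (character code, foreground, background)."""
    char, attr = ax & 0xFF, ax >> 8   # AL lands in the low byte, AH in the high byte
    return char, attr & 0x0F, attr >> 4


print(cell_for_di(0xC610), split_cell(0x02FF))
```

`DI=C610` is where the text buffer starts inside this segment, so most `DI` values miss the screen entirely and just scribble over other memory.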
+### skipping a column, misaligned jump
+AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0002
+DS=073D ES=AB9F SS=073D CS=073D IP=0104 NV UP EI PL ZR NA PE NC
+073D:0104 47 INC DI
+-t
+AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0003
+DS=073D ES=AB9F SS=073D CS=073D IP=0105 NV UP EI PL NZ NA PE NC
+073D:0105 47 INC DI
+-t
+AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004
+DS=073D ES=AB9F SS=073D CS=073D IP=0106 NV UP EI PL NZ NA PO NC
+073D:0106 EBF9 JMP 0101
+-t
+AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004
+DS=073D ES=AB9F SS=073D CS=073D IP=0101 NV UP EI PL NZ NA PO NC
+073D:0101 1C9F SBB AL,9F
+
+We don't want the columns to be packed too tightly together,
+so we skip every other character by adding two bytes to `DI`.
+
+We then jump to `0101`, uncovering a hidden `SBB`.
+
+### misaligned jump, SBB
+AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004
+DS=073D ES=AB9F SS=073D CS=073D IP=0101 NV UP EI PL NZ NA PO NC
+073D:0101 1C9F SBB AL,9F
+-t
+AX=0260 BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004
+DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI PL NZ NA PE NC
+
+
+This is the last instruction, and it's the one that modifies `AL` to animate the character.
+It subtracts `9F` from `AL` with borrow, which is pretty much the grade-school approach.
+That is --
+if it underflows,
+it will "borrow" a bit from the next byte by setting the carry flag.
+The next `SBB` will see that the carry flag is set,
+subtract an additional `1`,
+and unset the carry flag (unless it also underflowed).
+
+Let's see that in practice:
+-rax 028F
+-r
+AX=028F BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0008
+DS=073D ES=AB9F SS=073D CS=073D IP=0101 NV UP EI PL NZ NA PO NC
+073D:0101 1C9F SBB AL,9F
+-t
+AX=02F0 BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0008
+DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI NG NZ NA PE CY
+-rip 0101 ; i don't care about the rest of the loop, just run the SBB again
+-t
+AX=0250 BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0008
+DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI PL NZ AC PE NC
+
+Notice how the second `SBB` subtracted `A0` instead of `9F` because of the carry flag.
+
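The same two steps can be reproduced with a tiny Python model of an 8-bit `SBB`. This is only a sketch: `sbb` is my own helper, and it tracks nothing but `AL` and the carry flag.

```python
def sbb(al: int, operand: int, carry: int) -> tuple[int, int]:
    """8-bit subtract with borrow: returns (new AL, new carry flag)."""
    result = al - operand - carry
    return result & 0xFF, int(result < 0)


al, cf = sbb(0x8F, 0x9F, 0)   # first SBB underflows and sets the carry flag
print(f"AL={al:02X} CF={cf}")
al, cf = sbb(al, 0x9F, cf)    # second SBB subtracts one extra and clears it
print(f"AL={al:02X} CF={cf}")
```

The two printed values of `AL` match the `AX=02F0` and `AX=0250` lines in the trace.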
+Why does that matter?
+Let's imagine this was a regular `SUB` instead, without a borrow.
+`9F` is odd (coprime to `100`),
+so it would take `100` iterations for `AL` to loop around
+(remember, we're working with hexadecimal here).
+The loop runs for `10000/4=4000` iterations before `DI` repeats (each iteration advances `DI` by four),
+and `4000` is divisible by `100`,
+so each pass would have the exact same `AL` values for each character.
+Instead of an animation we'd get a much less impressive static screen.
+
+Instead, `AL` repeats every `55` (decimal 85) `SBB` calls,
+which is coprime to `100`,
+so the `AL` values will differ from pass to pass.
+There's probably a way to determine the period by hand but I just used Python.
+Not all operands work for this, but `9F` seems to be one of the good ones.
+
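This isn't the author's script, just a minimal Python sketch of how such a period check could look. It models only `AL` and the carry flag, starts from `AL=FF` with the carry clear (as in the DEBUG trace), and compares `SBB` against a plain `SUB`.

```python
def sbb_step(state):
    """One SBB AL,9F: the carry from the previous step is subtracted too."""
    al, cf = state
    result = al - 0x9F - cf
    return result & 0xFF, int(result < 0)


def sub_step(state):
    """One hypothetical SUB AL,9F: the carry is ignored."""
    al, cf = state
    return (al - 0x9F) & 0xFF, 0


def cycle_length(step, state):
    """Iterate `step` until a state repeats; return the length of that cycle."""
    seen = {}
    n = 0
    while state not in seen:
        seen[state] = n
        state = step(state)
        n += 1
    return n - seen[state]


START = (0xFF, 0)
print(cycle_length(sbb_step, START), cycle_length(sub_step, START))
```

This prints `85 256`. As for doing it by hand, here's my own reasoning (not the author's): if you track `AL` minus the pending carry, each `SBB` subtracts `9F` modulo `FF`, and `255 / gcd(159, 255) = 85`.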
+To quote the author,
+"without CR flag, there would be no animation :)".
+
+### ending remarks
+I think I've explained every aspect of how m8trix works by now.
+I don't think I need to tell you how brilliant it is.
+
+Notice how the third byte has three different meanings!
+At first it's read as the low byte of the segment loaded into `ES`,
+then it's executed as the `LAHF` instruction,
+and then it's the operand for the `SBB`.
+
+`STOSW` is not only the perfect instruction for writing characters in text mode,
+its opcode also doubles as the high byte of the segment that you need to write those
+characters in the first place.
+
+Everything fits together so nicely :)
+
+## m7trix
+Soon after m8trix was published,
+several people tried coming up with ideas to shrink it down even more.
+What follows is the final version HellMood published:
+
+C:\M8TRIX>debug M7TRIX.COM
+-U
+073D:0100 C41C LES BX,[SI]
+073D:0102 9F LAHF
+073D:0103 AB STOSW
+073D:0104 91 XCHG AX,CX
+073D:0105 EBFA JMP 0101
+-U 101
+073D:0101 1C9F SBB AL,9F
+073D:0103 AB STOSW
+073D:0104 91 XCHG AX,CX
+073D:0105 EBFA JMP 0101
+
+
+Not only is this version smaller, it also looks better, as it clears the screen!
+It's also simple enough that I won't bother tracing through it again.
+
+In short -- instead of skipping over every other column,
+we swap `AX` and `CX` back and forth.
+Both are running the same character animation, but, as `CH=00`,
+every other column is rendered as black on black,
+so the characters are invisible.
+This takes care of both skipping columns AND clearing the screen.
+
+The character cycle is apparently[^apparently] different
+because the carry flag gets reused between odd and even columns,
+but the period still works out to be 85 --
+which I find interesting but I don't really feel like researching why that is.
+
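For anyone who wants to poke at this, here's a rough Python model of the m7trix loop. It is not the script mentioned above. The starting values `AX=02FF` and `CX=00FF` are assumptions (only `AH=02` and `CH=00` come from the text), and both helper names are mine.

```python
def m7trix_writes(count, ax=0x02FF, cx=0x00FF, cf=0):
    """Yield the words STOSW would write, modelling STOSW / XCHG AX,CX / SBB AL,9F."""
    for _ in range(count):
        yield ax                              # STOSW
        ax, cx = cx, ax                       # XCHG AX,CX
        result = (ax & 0xFF) - 0x9F - cf      # SBB AL,9F, carry shared by both columns
        cf = int(result < 0)
        ax = (ax & 0xFF00) | (result & 0xFF)


def smallest_period(seq):
    """Smallest p such that seq[i] == seq[i + p] for every valid i."""
    return next(p for p in range(1, len(seq)) if seq[p:] == seq[:-p])


writes = list(m7trix_writes(2000))[300:]                  # drop any start-up transient
visible = [w & 0xFF for w in writes if w >> 8 == 0x02]    # only the green characters
print(smallest_period(visible), smallest_period(writes))
```

Under these assumptions the visible characters repeat every 85 writes, while the full stream of written words repeats every 170. One possible explanation, and this is my own guess: two chained 8-bit `SBB`s on alternating registers behave like a single 16-bit `SBB` with `9F9F`, and `65535 / gcd(0x9F9F, 65535) = 85`.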
+[^apparently]: The Python script I'm using for testing says so, but I can't really tell if that's true by just looking at the output.
+
+## bonus: simplified version
+This is a slightly modified version
+that works under DEBUG and doesn't use misaligned jumps.
+It's easy to experiment with
+as you can just load it into DEBUG,
+use the assembler to change a single instruction,
+and see what happens.
+```
+073D:0100 BB9FAB MOV BX,AB9F
+073D:0103 8EC3 MOV ES,BX
+073D:0105 B402 MOV AH,02
+073D:0107 AB STOSW
+073D:0108 47 INC DI
+073D:0109 47 INC DI
+073D:010A 1C9F SBB AL,9F
+073D:010C EBF9 JMP 0107
+```