---
title: Dissecting m8trix
date: 2024-07-31
---

[m8trix](https://www.pouet.net/prod.php?which=63126)
by
[HellMood](https://www.pouet.net/user.php?who=97586)
is one of my favorite demos.
It packs a pretty cool Matrix-style effect in only 8 bytes:

<details>
<summary>animated gif (epilepsy warning)</summary>
<img style="width: 100%" src="//tilde.town/~dzwdz/m8trix3.gif" />
</details>

The author even provided the source with some comments:
```asm
org 100h

S: 
les bx,[si]		; sets ES to the screen, assume si = 0x100
				; 0x101 is SBB AL,9F and changes the char
				; without CR flag, there would be
				; no animation ;)
lahf			; gets 0x02 (green) in the first run
				; afterwards, it is not called again
				; because of alignment ;)
stosw			; print the green char ...
				; (is also 0xAB9F and works as segment)
inc di			; and skip one row
inc di			;
jmp short S+1   ; repeat on 0x101 
```

...yeah, I didn't really get it at first either.
Let's try to actually understand how it works
(and learn some stuff about DOS along the way).

Note that **I'll be using hexadecimal numbers "by default"** (without 0x)
throughout this article to be consistent with DEBUG's output.

## DEBUG
The only tool I'll be using on DOS's side will be DEBUG.
It's a
[delightful](https://tilde.zone/@dzwdz/112866746891156279)
little tool that ships with MS-DOS.
I've personally used the FreeDOS version under DOSBox, as that's what I had handy.

There's built-in help if you type `?`. You can also check out
[this more in-depth guide](https://montcs.bloomu.edu/Information/LowLevel/DOS-Debug.html),
or
[this video of someone using it to assemble new binaries](https://www.youtube.com/watch?v=zc-W8xq7L5Q).

There's a small issue, though.
m8trix doesn't actually work as-is under DEBUG,
for reasons I'll explain later.

## a bad explanation of segmentation
If you're a bit rusty on how real mode segmentation works, then here's a quick reminder.
There are a few 16-bit segment registers (`CS`, `DS`, `SS`, `ES`).
When you reference memory in real mode you always[^ithink] use one of those registers,
even if it's implicit.

If you reference `ES:BX`, the real address this maps to is computed as `ES * 0x10 + BX`.
This means that there are multiple ways to reference one physical memory location
(even if that is only slightly relevant here).

As another example, `B800:1234` points to `B9234`.
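
If you'd rather check that arithmetic than trust me, here's a minimal Python sketch. `linear_address` is a name made up for this illustration, not part of any tool used here.

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-mode address translation: segment * 0x10 + offset.

    Hypothetical helper, for illustration only.
    """
    return (segment << 4) + offset


# The example from the text.
print(f"{linear_address(0xB800, 0x1234):05X}")  # B9234

# Two different segment:offset pairs can name the same physical byte.
print(linear_address(0xB800, 0x1234) == linear_address(0xB923, 0x0004))  # True
```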

[^ithink]: At least I think so, but I'm not sure.

## the first look
My comments are prefixed with a semicolon.
As mentioned, all numbers shown are in hexadecimal.
<pre><code>C:\M8TRIX>debug M8TRIX.COM
-U <i>; disassemble the beginning of the program</i>
073D:0100 C41C              <a href="//www.felixcloutier.com/x86/lds:les:lfs:lgs:lss">LES</a>     BX,[SI]
073D:0102 9F                <a href="//www.felixcloutier.com/x86/lahf">LAHF</a>
073D:0103 AB                <a href="//www.felixcloutier.com/x86/stos:stosb:stosw:stosd:stosq">STOSW</a>
073D:0104 47                <a href="//www.felixcloutier.com/x86/inc">INC</a>     DI
073D:0105 47                <a href="//www.felixcloutier.com/x86/inc">INC</a>     DI
073D:0106 EBF9              <a href="//www.felixcloutier.com/x86/jmp">JMP</a>     0101
-U 101 <i>; disassemble the loop body</i>
073D:0101 1C9F              <a href="//www.felixcloutier.com/x86/sbb">SBB</a>     AL,9F
073D:0103 AB                <a href="//www.felixcloutier.com/x86/stos:stosb:stosw:stosd:stosq">STOSW</a>
073D:0104 47                <a href="//www.felixcloutier.com/x86/inc">INC</a>     DI
073D:0105 47                <a href="//www.felixcloutier.com/x86/inc">INC</a>     DI
073D:0106 EBF9              <a href="//www.felixcloutier.com/x86/jmp">JMP</a>     0101
-R <i>; look at the registers</i>
AX=FFFF BX=0000 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0000 DI=0000
DS=073D ES=073D SS=073D CS=073D IP=0100 NV UP EI PL ZR NA PE NC
073D:0100 C41C              LES     BX,[SI]                        DS:0000=20CD
</code></pre>

Let's step through this.

### LES BX,[SI]
<pre><code>AX=FFFF BX=0000 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0000 DI=0000
DS=073D ES=073D SS=073D CS=073D IP=0100 NV UP EI PL ZR NA PE NC
073D:0100 C41C              LES     BX,[SI]                        DS:0000=20CD
-T <i>; single step and show register state</i>
AX=FFFF BX=<b>20CD</b> CX=0008 DX=0000 SP=FFFE BP=0000 SI=0000 DI=0000
DS=073D ES=<b>9FFF</b> SS=073D CS=073D IP=0102 NV UP EI PL ZR NA PE NC
</code></pre>

`LES` loads a far pointer from memory.
The first two bytes of `[SI]` will be loaded into `BX`,
and the next two bytes will be loaded into `ES`.

We're implicitly using the `DS` segment here, which is where DOS loaded our program into.
To be more exact --
our program was loaded into `DS:0100`,
whereas `DS:0000` (which `[SI]` points at) contains the
[Program Segment Prefix](https://en.wikipedia.org/wiki/Program_Segment_Prefix).
Let's take a look at it:

<pre><code>-d 0000
073D:0000  CD 20 FF 9F 00 EA FF FF-AD DE BD 1D 94 01 00 00 . ..............
[...]
-u 0000
073D:0000 CD20              INT     20
073D:0002 FF9F00EA          CALL    FAR [BX+EA00]
[...]
</code></pre>

The first two bytes always contain `INT 20`, the instruction that quits your program.
This means that you can quit your program by jumping to `CS:0000` (`CS` = `DS` = `SS`).
DOS also ensures that the word on top of the stack is `0000`, so you can quit with a `RET`.
Nifty.
It also means that `BX` will always be set to `20CD`, but we don't actually really care about that.

The next two bytes point to the segment of the first free byte in memory.
So, by loading them into `ES`, we make it point to the first free area in memory.
On most systems that will be `9FFF`.
This is very convenient, as the
[mode 13](https://en.wikipedia.org/wiki/Mode_13h)
framebuffer begins at `A0000`, or `9FFF:0010`.
This is a
[well known sizecoding trick](http://www.sizecoding.org/wiki/General_Coding_Tricks#A_smaller_way_to_point_to_Mode_13.27s_screen_segment).

...except mode 13 is a graphic mode.
We're in mode 3[^mode3],
a text mode,
and the text buffer is located at `B800:0000`,
completely out of reach of `ES`.
What?

[^mode3]: [`MOV AH, 0F; INT 10`](https://en.wikipedia.org/wiki/INT_10H), and look at the registers. `AL` is the current mode. 

Well, DEBUG fooled us.
When you start a program under DOS, `SI=0100`.
[Usually.](https://www.fysnet.net/yourhelp.htm)
However, for whatever reason, DEBUG zeroes it out instead.
You can fix it by running `RSI 0100`[^rsi] before the first instruction.
This is also why the page I've linked to uses `[BX]`, as you can count on it actually being zero.

[^rsi]: No, `RSI` doesn't stand for the 64-bit register. `R` is the register command, which accepts `SI` as the argument.

But let's get back to m8trix.
If `SI=0100`, then `[SI]` points to the beginning of our program!
```
-RSI 0100
-R
AX=FFFF BX=0000 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
DS=073D ES=073D SS=073D CS=073D IP=0100 NV UP EI PL ZR NA PE NC
073D:0100 C41C              LES     BX,[SI]                        DS:0100=1CC4
-T
AX=FFFF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
DS=073D ES=AB9F SS=073D CS=073D IP=0102 NV UP EI PL ZR NA PE NC
073D:0102 9F                LAHF
-U 100
073D:0100 C41C              LES     BX,[SI]
073D:0102 9F                LAHF
073D:0103 AB                STOSW
```
As you can see, this means that `BX=1CC4` (the `LES` instruction itself), and `ES=AB9F`.
This means that `ES` spans `AB9F0-BB9EF`, which includes the entire text buffer!
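
Both `LES` results can be reproduced outside of DEBUG. The following is a small Python sketch under a few assumptions: `les` is a hypothetical helper, the byte strings are copied from the dumps above, and I'm assuming the first text page is 80x25 two-byte cells starting at physical `B8000`.

```python
def les(memory: bytes, si: int) -> tuple[int, int]:
    """Return (BX, ES) as LES BX,[SI] would load them (little-endian words)."""
    bx = int.from_bytes(memory[si:si + 2], "little")
    es = int.from_bytes(memory[si + 2:si + 4], "little")
    return bx, es


# First four bytes of the PSP and of the program, as dumped by DEBUG.
psp = bytes.fromhex("CD20FF9F")
program = bytes.fromhex("C41C9FAB")

print([f"{v:04X}" for v in les(psp, 0)])      # what DEBUG does (SI=0000)
print([f"{v:04X}" for v in les(program, 0)])  # what DOS does (SI=0100)

# Assumed geometry: text page 0 is 80*25 cells of 2 bytes at physical B8000.
_, es = les(program, 0)
window = range(es * 0x10, es * 0x10 + 0x10000)
text_start, text_len = 0xB8000, 80 * 25 * 2
print(text_start in window and text_start + text_len - 1 in window)
print(f"{text_start - es * 0x10:04X}")  # offset of the text buffer inside ES
```

The last line is the offset `DI` has to reach before anything shows up on screen.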

### LAHF
<pre><code>AX=FFFF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
DS=073D ES=AB9F SS=073D CS=073D IP=0102 NV UP EI PL ZR NA PE NC
073D:0102 9F                LAHF
-t
AX=<b>46</b>FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0000
DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI PL ZR NA PE NC
</code></pre>
`LAHF` is pretty straightforward: it just loads the low byte of `FLAGS` into `AH`.
Except, once again, DEBUG doesn't set the `FLAGS` register the way DOS does.
If we were to run m8trix
[outside of DEBUG](https://www.fysnet.net/yourhelp.htm),
the low byte of `FLAGS` would be `02`, and thus this instruction would set `AH=02`.
This can be fixed in the debugger by running `RAX 02FF`.
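
To see where DEBUG's `46` comes from, you can decode the byte by hand. A minimal sketch, with `describe_ah` being a made-up helper; the bit positions are the standard layout of the low `FLAGS` byte (bit 1 is always set and has no name).

```python
# Bit positions in the low byte of FLAGS, which is what LAHF copies into AH.
FLAG_BITS = {"CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7}


def describe_ah(ah: int) -> list[str]:
    """Names of the status flags set in a byte produced by LAHF."""
    return [name for name, bit in FLAG_BITS.items() if (ah >> bit) & 1]


print(describe_ah(0x46))  # DEBUG: ['PF', 'ZF'], i.e. the "ZR" and "PE" in the dump
print(describe_ah(0x02))  # DOS: [], only the always-set bit 1 remains
```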

### STOSW
<pre><code>-rax 02FF
-r
AX=<b>02FF</b> BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=<b>0000</b>
DS=073D ES=<b>AB9F</b> SS=073D CS=073D IP=0103 NV UP EI PL ZR NA PE NC
073D:0103 AB                STOSW
-t
AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=<b>0002</b>
DS=073D ES=AB9F SS=073D CS=073D IP=0104 NV UP EI PL ZR NA PE NC
</code></pre>
`STOSW` -- "Store (word) string" -- is a bit more complex.
It writes the word in `AX` to `ES:DI`, and increments[^df] `DI` by two --
the number of bytes written.

This instruction will be run over and over again. Counting the two `INC DI`s that follow it,
`DI` advances by 4 on every iteration and overflows every once in a while,
overwriting every other word in `ES` -- including the text buffer -- over and over again.

Each character in the text buffer is represented by a word,
so each `STOSW` writes a complete character to the screen.
`AH=02` sets the color to dark green,
and `AL` (which changes each iteration) chooses the character.

[^df]: If the direction flag was set, it would instead decrement it.

### skipping a column, misaligned jump
<pre><code>AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0002
DS=073D ES=AB9F SS=073D CS=073D IP=0104 NV UP EI PL ZR NA PE NC
073D:0104 47                INC     DI
-t
AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=<b>0003</b>
DS=073D ES=AB9F SS=073D CS=073D IP=0105 NV UP EI PL NZ NA PE NC
073D:0105 47                INC     DI
-t
AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004
DS=073D ES=AB9F SS=073D CS=073D IP=0106 NV UP EI PL NZ NA PO NC
073D:0106 EBF9              JMP     0101
-t
AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004
DS=073D ES=AB9F SS=073D CS=073D IP=0101 NV UP EI PL NZ NA PO NC
073D:0101 1C9F              SBB     AL,9F
</code></pre>
We don't want the columns to be packed too tightly together,
so we skip every other character by adding two bytes to `DI`.
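
Here's a rough Python sketch of which screen columns actually get written. Each iteration moves `DI` by 4 in total (2 from `STOSW`, 2 from the `INC DI` pair). The 80x25 geometry and the starting value `DI=0000` (what DEBUG shows) are assumptions, and `columns_written` is a hypothetical helper.

```python
TEXT_OFFSET = 0xB8000 - 0xAB9F0   # where the text buffer starts inside ES
COLUMNS, ROWS = 80, 25            # assumed mode 3 geometry


def columns_written(di_start: int = 0) -> set[int]:
    """Screen columns hit during one full wrap of DI, stepping by 4."""
    hit = set()
    for i in range(0x10000 // 4):
        di = (di_start + 4 * i) & 0xFFFF
        cell, misaligned = divmod(di - TEXT_OFFSET, 2)
        if not misaligned and 0 <= cell < COLUMNS * ROWS:
            hit.add(cell % COLUMNS)
    return hit


print(sorted(columns_written(0))[:5])  # [0, 2, 4, 6, 8]
print(len(columns_written(0)))         # 40
```

A different starting `DI` can only change which half of the columns is used, not the spacing.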

We then jump to `0101`, uncovering a hidden `SBB`.
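
The jump target falls straight out of the encoding: a short `JMP` is the opcode `EB` followed by a signed 8-bit displacement, counted from the end of the jump. A small sketch (`short_jmp_target` is a made-up name):

```python
def short_jmp_target(address: int, encoding: bytes) -> int:
    """Target of a 2-byte `JMP rel8` (opcode EB) located at `address`."""
    assert len(encoding) == 2 and encoding[0] == 0xEB
    rel8 = int.from_bytes(encoding[1:], "little", signed=True)
    # The displacement is relative to the instruction that follows the jump.
    return (address + 2 + rel8) & 0xFFFF


print(f"{short_jmp_target(0x0106, bytes.fromhex('EBF9')):04X}")  # 0101
```

`F9` is -7, so the jump lands one byte into the `LES`, which is exactly where the `SBB` hides.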

### misaligned jump, SBB
<pre><code>AX=02FF BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004
DS=073D ES=AB9F SS=073D CS=073D IP=0101 NV UP EI PL NZ NA PO NC
073D:0101 1C9F              SBB     AL,9F
-t
AX=02<b>60</b> BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0004
DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI PL NZ NA PE NC
</code></pre>

This is the last instruction, and it's the one that modifies `AL` to animate the character.
It subtracts `9F` from `AL` with borrow, which is pretty much the grade-school approach.
That is --
if it underflows,
it will "borrow" a bit from the next byte by setting the carry flag.
The next `SBB` will see that the carry flag is set,
subtract an additional `1`,
and unset the carry flag (unless it also underflowed).

Let's see that in practice:
<pre><code>-rax 028F
-r
AX=02<b>8F</b> BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0008
DS=073D ES=AB9F SS=073D CS=073D IP=0101 NV UP EI PL NZ NA PO <b>NC</b>
073D:0101 1C9F              SBB     AL,9F
-t
AX=02<b>F0</b> BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0008
DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI NG NZ NA PE <b>CY</b>
-rip 0101 <i>; i don't care about the rest of the loop, just run the SBB again</i>
-t
AX=02<b>50</b> BX=1CC4 CX=0008 DX=0000 SP=FFFE BP=0000 SI=0100 DI=0008
DS=073D ES=AB9F SS=073D CS=073D IP=0103 NV UP EI PL NZ AC PE <b>NC</b>
</code></pre>
Notice how the second `SBB` subtracted `A0` instead of `9F` because of the carry flag.
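
The same three steps can be replayed in Python. This is only a sketch of the 8-bit arithmetic: `sbb8` is a hypothetical helper, and it models just `AL` and the carry flag, nothing else.

```python
def sbb8(al: int, operand: int, carry: int) -> tuple[int, int]:
    """8-bit subtract with borrow: returns (new AL, new carry flag)."""
    diff = al - operand - carry
    return diff & 0xFF, int(diff < 0)


def show(al: int, cf: int) -> str:
    return f"{al:02X} {'CY' if cf else 'NC'}"


print(show(*sbb8(0xFF, 0x9F, 0)))  # 60 NC, the very first SBB
al, cf = sbb8(0x8F, 0x9F, 0)
print(show(al, cf))                # F0 CY, underflow sets the carry
print(show(*sbb8(al, 0x9F, cf)))   # 50 NC, this time A0 was subtracted
```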

Why does that matter?
Let's imagine this was a regular `SUB` instead, without a borrow.
`9F` is odd (coprime to `100`),
so it would take `100` iterations for `AL` to loop around
(remember, we're working with hexadecimal here).
The loop runs for `10000/4=4000` iterations before `DI` repeats (it advances by 4 each time),
and `4000` is divisible by `100`,
so each pass would have the exact same `AL` values for each character.
Instead of an animation we'd get a much less impressive static screen.

Instead, `AL` repeats every `55` (decimal 85) `SBB` calls.
That's an odd number, so it can't divide the power-of-two length of a pass,
and the `AL` values will differ from pass to pass.
There's probably a way to determine the period by hand but I just used Python.
Not all operands work for this, but `9F` seems to be one of the good ones.
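
For what it's worth, the check is short enough to show. This is a minimal sketch rather than the exact script used for this article: `period` is a made-up helper, and the starting state (`AL=FF`, carry clear) is taken from the DEBUG trace above. The last line is a guess at the by-hand route: if you treat `AL` together with the inverted carry as a number modulo `FF`, each `SBB AL,9F` appears to add `60`, which would give a period of `FF / gcd(60, FF)`.

```python
from math import gcd


def period(operand: int, use_borrow: bool, al: int = 0xFF, carry: int = 0) -> int:
    """Cycle length of repeatedly running SBB (or plain SUB) AL,operand."""
    seen = {}
    step = 0
    while (al, carry) not in seen:
        seen[(al, carry)] = step
        diff = al - operand - (carry if use_borrow else 0)
        al, carry = diff & 0xFF, int(diff < 0)
        step += 1
    # Distance between the two visits of the first repeated state.
    return step - seen[(al, carry)]


print(period(0x9F, use_borrow=False))  # plain SUB
print(period(0x9F, use_borrow=True))   # SBB
print(255 // gcd(0x60, 255))           # the by-hand guess
```

Swapping in other operands is an easy way to find out which ones animate and which ones don't.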

To quote the author,
"without CR flag, there would be no animation ;)".

### ending remarks
I think I've explained every aspect of how m8trix works by now.
I don't think I need to tell you how brilliant it is.

Notice how the third byte has three different meanings!
At first it's read as the low byte of the segment loaded into `ES`,
then it's the `LAHF` instruction,
and then it's the operand for the `SBB`.

`STOSW` is not only the perfect instruction for writing characters in text mode,
its opcode also works as the high byte of the segment that you need to write those
characters in the first place.

Everything fits together so nicely :)

## m7trix
Soon after m8trix was published,
several people tried coming up with ideas to shrink it down even more.
What follows is the final version HellMood published:

<pre><code>C:\M8TRIX>debug M7TRIX.COM
-U
073D:0100 C41C              <a href="//www.felixcloutier.com/x86/lds:les:lfs:lgs:lss">LES</a>     BX,[SI]
073D:0102 9F                <a href="//www.felixcloutier.com/x86/lahf">LAHF</a>
073D:0103 AB                <a href="//www.felixcloutier.com/x86/stos:stosb:stosw:stosd:stosq">STOSW</a>
073D:0104 91                <a href="//www.felixcloutier.com/x86/xchg">XCHG</a>    AX,CX
073D:0105 EBFA              <a href="//www.felixcloutier.com/x86/jmp">JMP</a>     0101
-U 101
073D:0101 1C9F              <a href="//www.felixcloutier.com/x86/sbb">SBB</a>     AL,9F
073D:0103 AB                <a href="//www.felixcloutier.com/x86/stos:stosb:stosw:stosd:stosq">STOSW</a>
073D:0104 91                <a href="//www.felixcloutier.com/x86/xchg">XCHG</a>    AX,CX
073D:0105 EBFA              <a href="//www.felixcloutier.com/x86/jmp">JMP</a>     0101
</code></pre>

Not only is this version smaller, it also looks better, as it clears the screen!
It's also simple enough that I won't bother tracing through it again.

In short -- instead of skipping over every other column,
we swap `AX` and `CX` back and forth.
Both are running the same character animation, but, as `CH=00`,
every other column is rendered as black on black,
so the characters are invisible.
This takes care of both skipping columns AND clearing the screen.
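
A quick way to convince yourself is to replay the loop and look at the attribute byte of each word written. The sketch below is hypothetical: `m7trix_cells` is a made-up helper, and the initial `AX=02FF` and `CX=0008` are simply the values from the DEBUG sessions above. All that matters is that `CH=00`.

```python
def m7trix_cells(count: int, ax: int = 0x02FF, cx: int = 0x0008, carry: int = 0) -> list[int]:
    """Words written by the first `count` STOSWs of the m7trix loop."""
    cells = []
    for _ in range(count):
        cells.append(ax)                     # STOSW
        ax, cx = cx, ax                      # XCHG AX,CX
        diff = (ax & 0xFF) - 0x9F - carry    # SBB AL,9F
        ax = (ax & 0xFF00) | (diff & 0xFF)
        carry = int(diff < 0)
    return cells


# High byte of each word is the colour attribute of that cell.
print([f"{cell >> 8:02X}" for cell in m7trix_cells(6)])
# ['02', '00', '02', '00', '02', '00']
```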

The character cycle is apparently[^apparently] different
because the carry flag gets reused between odd and even columns,
but the period still works out to be 85 --
which I find interesting but I don't really feel like researching why that is.

[^apparently]: The Python script I'm using for testing says so, but I can't really tell if that's true by just looking at the output.

## bonus: simplified version
This is a slightly modified version
that works under DEBUG and doesn't use misaligned jumps.
It's easy to experiment with
as you can just load it into DEBUG,
use the assembler to change a single instruction,
and see what happens.
```
073D:0100 BB9FAB            MOV     BX,AB9F
073D:0103 8EC3              MOV     ES,BX
073D:0105 B402              MOV     AH,02
073D:0107 AB                STOSW
073D:0108 47                INC     DI
073D:0109 47                INC     DI
073D:010A 1C9F              SBB     AL,9F
073D:010C EBF9              JMP     0107
```