dreamcast.wiki - User contributions [en]

SH4 FIPR Optimizations

2026-02-16T18:42:47Z

BBHoodsta:

Yo, guys. At like 1AM @ian micheal got me looking at pl_mpeg's audio decoder to see if I could see any potential gainz... So here is its innermost hottest audio synthesis loop:

<pre>
for (int i = 32; i; --i) {
    float u;
    u  = pl_fipr(d[0], d[1], d[2], d[3], v1[0], v2[0], v1[128], v2[128]);
    u += pl_fipr(d[4], d[5], d[6], d[7], v1[256], v2[256], v1[384], v2[384]);
    u += pl_fipr(d[8], d[9], d[10], d[11], v1[512], v2[512], v1[640], v2[640]);
    u += pl_fipr(d[12], d[13], d[14], d[15], v1[768], v2[768], v1[896], v2[896]);
    d += 32;
    v1++;
    v2++;
    *out++ = (short)((int)u >> 16);
}
</pre>
Which... you'd think would be preeeetty efficient, right? 4 back-to-back FIPRs? I mean, it is hella gainzy compared to not using FIPR.

But there are two problems with back-to-back FIPRs that I wanna teach anyone interested:

1) Very often one of the vector arguments stays constant between FIPR calls, but unfortunately the compiler is too dumb to avoid reloading all 8 registers between calls anyway.
* LUCKILY every argument to these FIPRs is unique so this is not applicable, but... very often that's a perf destroyer.

2) THE COMPILER CANNOT PIPELINE FIPR FOR SHIT.
* VERY applicable here. You know what the ASM looks like for these FIPR calls? Something like this:
<pre>
! load first vector arg into fv0 (nothing wrong with this)
fmov.s @%[d]+, fr0
fmov.s @%[d]+, fr1
fmov.s @%[d]+, fr2
fmov.s @%[d]+, fr3

! load second vector arg into fv4 (nothing wrong with this)
fmov.s @%[v1], fr4
add %[offset], %[v1]
fmov.s @%[v2], fr5
add %[offset], %[v2]
fmov.s @%[v1], fr6
fmov.s @%[v2], fr7

! issue actual FIPR calculation
fipr fv0, fv4

! VERY NEXT INSTRUCTION TRY TO STORE THE RESULT
fmov.s fr7, @%[result] ! PIPELINE STALL!!!!
</pre>
Now this is very very bad. FIPR has 4-5 cycles of latency, and the very next instruction tries to use the result before it's been calculated. So on every fucking call to FIPR, the entire pipeline must stall waiting for the result... FOR EVERY FIPR CALL.
So you're losing MASSIVE perf benefits there.
The solution? You have to pipeline your FIPRs so that while the previous FIPR call is still calculating, you're loading up and issuing the next FIPR call.

So I wrote a new routine that replaces that inner loop body doing manually pipelined FIPR calls... This should be way better:
<pre>
for (int i = 32; i; --i) {
#if 0 // Old FIPR path which didn't pipeline for shit.
    float u;
    u  = pl_fipr(d[0], d[1], d[2], d[3], v1[0], v2[0], v1[128], v2[128]);
    u += pl_fipr(d[4], d[5], d[6], d[7], v1[256], v2[256], v1[384], v2[384]);
    u += pl_fipr(d[8], d[9], d[10], d[11], v1[512], v2[512], v1[640], v2[640]);
    u += pl_fipr(d[12], d[13], d[14], d[15], v1[768], v2[768], v1[896], v2[896]);
#else // New hand-written FIPR path with manual pipelining
    float u = shz_pl_inner_loop(d, v1, v2);
#endif
    d += 32;
    v1++;
    v2++;
    *out++ = (short)((int)u >> 16);
}
</pre>

Where the new implementation is this inline ASM:
<pre>
__always_inline
float shz_pl_inner_loop(const float *d, const float *v1, const float *v2) {
const float *td = d;
const float *tv1 = v1;
const float *tv2 = v2;
uint32_t stride;
float result;

asm volatile(R"(
! Swap to back-bank so we don't need to clobber any FP regs.
frchg

! s = 512 (stride: 128 floats * 4 bytes)
mov #2, %[s]
shll8 %[s] ! 2 << 8 = 512

! Load first vector into fv0 for first FIPR.
fmov.s @%[td]+, fr0 ! fr0 = d[0]
fmov.s @%[td]+, fr1 ! fr1 = d[1]
fmov.s @%[td]+, fr2 ! fr2 = d[2]
fmov.s @%[td]+, fr3 ! fr3 = d[3]

! Load second vector into fv4 for first FIPR
fmov.s @%[tv1], fr4 ! fr4 = v1[0]
add %[s], %[tv1] ! tv1 -> v1[128]
fmov.s @%[tv2], fr5 ! fr5 = v2[0]
add %[s], %[tv2] ! tv2 -> v2[128]
fmov.s @%[tv1], fr6 ! fr6 = v1[128]
add %[s], %[tv1] ! tv1 -> v1[256]
fmov.s @%[tv2], fr7 ! fr7 = v2[128]
add %[s], %[tv2] ! tv2 -> v2[256]

! Issue first FIPR
fipr fv0, fv4 ! fr7 = FIPR1 result

! Load first vector into fv8 for second FIPR.
fmov.s @%[td]+, fr8 ! fr8 = d[4]
fmov.s @%[td]+, fr9 ! fr9 = d[5]
fmov.s @%[td]+, fr10 ! fr10 = d[6]
fmov.s @%[td]+, fr11 ! fr11 = d[7]

! Load second vector into fv12 for second FIPR.
fmov.s @%[tv1], fr12 ! fr12 = v1[256]
add %[s], %[tv1] ! tv1 -> v1[384]
fmov.s @%[tv2], fr13 ! fr13 = v2[256]
add %[s], %[tv2] ! tv2 -> v2[384]
fmov.s @%[tv1], fr14 ! fr14 = v1[384]
add %[s], %[tv1] ! tv1 -> v1[512]
fmov.s @%[tv2], fr15 ! fr15 = v2[384]
add %[s], %[tv2] ! tv2 -> v2[512]

! Issue second FIPR
fipr fv8, fv12 ! fr15 = FIPR2 result
fmov.s fr7, @-r15 ! push FIPR1 result onto stack

! Load first vector into fv0 for third FIPR
fmov.s @%[td]+, fr0 ! fr0 = d[8]
fmov.s @%[td]+, fr1 ! fr1 = d[9]
fmov.s @%[td]+, fr2 ! fr2 = d[10]
fmov.s @%[td]+, fr3 ! fr3 = d[11]

! Load second vector into fv4 for third FIPR
fmov.s @%[tv1], fr4 ! fr4 = v1[512]
add %[s], %[tv1] ! tv1 -> v1[640]
fmov.s @%[tv2], fr5 ! fr5 = v2[512]
add %[s], %[tv2] ! tv2 -> v2[640]
fmov.s @%[tv1], fr6 ! fr6 = v1[640]
add %[s], %[tv1] ! tv1 -> v1[768]
fmov.s @%[tv2], fr7 ! fr7 = v2[640]
add %[s], %[tv2] ! tv2 -> v2[768]

! Issue third FIPR
fipr fv0, fv4 ! fr7 = FIPR3 result
fmov.s fr15, @-r15 ! push FIPR2 result onto stack

! Load first vector into fv8 for fourth FIPR
fmov.s @%[td]+, fr8 ! fr8 = d[12]
fmov.s @%[td]+, fr9 ! fr9 = d[13]
fmov.s @%[td]+, fr10 ! fr10 = d[14]
fmov.s @%[td]+, fr11 ! fr11 = d[15]

! Load second vector into fv12 for fourth FIPR
fmov.s @%[tv1], fr12 ! fr12 = v1[768]
add %[s], %[tv1] ! tv1 -> v1[896]
fmov.s @%[tv2], fr13 ! fr13 = v2[768]
add %[s], %[tv2] ! tv2 -> v2[896]
fmov.s @%[tv1], fr14 ! fr14 = v1[896]
fmov.s @%[tv2], fr15 ! fr15 = v2[896]

! Issue fourth FIPR
fipr fv8, fv12 ! fr15 = FIPR4 result

! Add up results from previous FIPRs while we wait
fmov.s @r15+, fr0 ! pop FIPR2 result
fmov.s @r15+, fr1 ! pop FIPR1 result
fadd fr1, fr0 ! fr0 = FIPR1 + FIPR2
fadd fr7, fr0 ! fr0 += FIPR3

! Add result from fourth FIPR now that it's ready
fadd fr15, fr0 ! fr0 += FIPR4

! Transfer result to primary bank via FPUL
flds fr0, FPUL ! secondary fr0 -> FPUL
frchg ! Switch back to primary FP bank
fsts FPUL, %[result] ! FPUL -> result register (primary bank)
)"
: [td] "+r" (td), [tv1] "+r" (tv1), [tv2] "+r" (tv2),
[s] "=r" (stride), [result] "=f" (result)
:
: "memory", "fpul"); // FPUL is written by flds above, so list it as clobbered

return result;
}
</pre>
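
If you swap in a hand-scheduled routine like this, it's handy to keep a plain C version around to compare results against. Here's a rough sketch of one. It assumes <code>pl_fipr(a, b, c, d, e, f, g, h)</code> is a 4-element dot product <code>a*e + b*f + c*g + d*h</code> (its definition isn't shown above), and <code>shz_pl_inner_loop_ref</code> is just a made-up name. FIPR doesn't promise exact IEEE rounding, so expect tiny differences in the low bits between the two.

```c
/* Plain C reference for one iteration of the inner loop.
 * Assumption: each pl_fipr() call is a 4-element dot product. */
static float shz_pl_inner_loop_ref(const float *d, const float *v1, const float *v2) {
    float u = 0.0f;

    for (int k = 0; k < 4; ++k) {
        const float *dk = d + 4 * k;  /* d[0..3], d[4..7], d[8..11], d[12..15] */
        int base = 256 * k;           /* v1/v2 index: 0, 256, 512, 768          */

        u += dk[0] * v1[base]       + dk[1] * v2[base]
           + dk[2] * v1[base + 128] + dk[3] * v2[base + 128];
    }
    return u;
}
```

Running both versions over the same buffers and checking that the outputs agree to within a small tolerance should catch most register or stride mix-ups.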

SH4 FIPR Optimizations

2026-02-16T18:40:26Z

BBHoodsta: Updated the function because it crashed before

Twiddling

2024-08-19T21:57:12Z

BBHoodsta: /* How and why the Dreamcast uses Twiddling */

== General Idea ==

Twiddling, sometimes referred to as Swizzling in PlayStation communities, and better known as Morton Encoding or a [https://en.wikipedia.org/wiki/Z-order_curve#l134 Z/N-Ordered curve], is a method of data organization that retains [https://en.wikipedia.org/wiki/Locality_of_reference#L146 Locality of Reference], which means that elements that are close together in space are also grouped together in memory. In the context of texture organization, this means that twiddling an image makes the pixels to the right of and below any given pixel reside close to it in memory. This yields numerous benefits, such as easier calculation for anti-aliasing (AA) and a texel configuration necessary for [https://en.wikipedia.org/wiki/Vector_quantization#L135 Vector Quantization] compression.

== Origins and Classical Implementation ==

The term "Twiddling" comes from the hacker term "bit-twiddling", owing to the classical way to calculate a Z-Ordered curve by manipulating the bits that make up the data (texel) index. The bit-twiddling way to arrive at a Morton code is to take the binary representation of the X and Y coordinates of a texel and interleave them into one bitstring. The resultant bitstring will be twice the size of each individual input bitstring. For example, say you have a 4-bit number representing the X position of a texel in a texture (e.g. XXXX) and a 4-bit number representing the Y position of the texel (e.g. YYYY), then your Z-Order position would be XYXY-XYXY (8-bit). This number is the index of where this texel lies in a new array that holds all the twiddled texels in the texture. If you convert every texel in the source texture into this new twiddled texture array, then iterating through the index will be the equivalent of navigating the source texture in a Z-pattern.

Whether the result is a Z-ordered curve or an N-ordered curve depends on whether you shift the X or the Y bitstring, effectively making the traversal width by height (Z) or height by width (N). Technically, the Dreamcast uses an N-ordered curve.
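
The interleaving described above can be sketched one bit at a time, as below. This is only an illustration: the function names are made up for this example, and the assumption that the N-ordered variant places the Y bits in the even positions (with the X bits shifted up by one) is inferred from the description rather than taken from hardware documentation.

```c
#include <stdint.h>

/* Interleave two 16-bit values bit by bit: bits of `even` land in bit
 * positions 0, 2, 4, ... and bits of `odd` in positions 1, 3, 5, ... */
static uint32_t interleave_bits(uint16_t even, uint16_t odd) {
    uint32_t z = 0;

    for (int bit = 0; bit < 16; ++bit) {
        z |= (uint32_t)((even >> bit) & 1u) << (2 * bit);
        z |= (uint32_t)((odd  >> bit) & 1u) << (2 * bit + 1);
    }
    return z;
}

/* Z order: X in the even bits, Y shifted into the odd bits. */
static uint32_t z_order_index(uint16_t x, uint16_t y) {
    return interleave_bits(x, y);
}

/* N order (assumed Dreamcast layout): Y in the even bits, X in the odd bits. */
static uint32_t n_order_index(uint16_t x, uint16_t y) {
    return interleave_bits(y, x);
}
```

The loop is deliberately naive; it exists to make the bit layout obvious and is the baseline that faster approaches improve upon.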

A problem with using Z-ordered curves is that they are expensive to compute every frame, because the calculation uses division and multiplication heavily. Thus there exist numerous bit-twiddling hacks to speed up this operation, covered below.

== Conceptualizing Twiddling ==

Let's start with a recap of what twiddled textures even are. Twiddling is just a particular way of re-organising the pixels in an image so they're quicker to render.

[[File:Twiddle.png|thumb]]

In the example image, the numbers represent the original un-twiddled indexes and the "inverted Ns" show the original flow of indexes. Indexes from the original image were calculated from left to right, top to bottom (scanline order). So we can see that after index 0, number 1 is just below, 2 is to the right of 0 and 3 is just below 2. Then if we go to the next biggest inverted N we can see the order '''{0,1,2,3}, {4,5,6,7}, {8,9,10,11}, {12,13,14,15}''' following the same inverted N pattern.

So if we are given index '''i''' from an untwiddled image and wish to find the twiddled index, then it's a process of recursively narrowing down what part of the twiddled image that pixel now lives in.

Example:

 Original:    Twiddled:
 0 1 2 3      0 2 8 A
 4 5 6 7      1 3 9 B
 8 9 A B      4 6 C E
 C D E F      5 7 D F
 G H I J      G I O Q
 K L M N      H J P R
 O P Q R      K M S U
 S T U V      L N T V
 W X Y Z      W Y % &
 ~ ! # $      X Z ^ *
 % ^ & *      ~ # ( _
 ( ) _ +      ! $ ) +

== Look-up Table Hack ==
Taken from the [https://graphics.stanford.edu/~seander/bithacks.html#InterleaveTableLookup#Stanford Stanford Bit-twiddling hacks page].

<syntaxhighlight lang="c">static const unsigned short MortonTable256[256] =
{
0x0000, 0x0001, 0x0004, 0x0005, 0x0010, 0x0011, 0x0014, 0x0015,
0x0040, 0x0041, 0x0044, 0x0045, 0x0050, 0x0051, 0x0054, 0x0055,
0x0100, 0x0101, 0x0104, 0x0105, 0x0110, 0x0111, 0x0114, 0x0115,
0x0140, 0x0141, 0x0144, 0x0145, 0x0150, 0x0151, 0x0154, 0x0155,
0x0400, 0x0401, 0x0404, 0x0405, 0x0410, 0x0411, 0x0414, 0x0415,
0x0440, 0x0441, 0x0444, 0x0445, 0x0450, 0x0451, 0x0454, 0x0455,
0x0500, 0x0501, 0x0504, 0x0505, 0x0510, 0x0511, 0x0514, 0x0515,
0x0540, 0x0541, 0x0544, 0x0545, 0x0550, 0x0551, 0x0554, 0x0555,
0x1000, 0x1001, 0x1004, 0x1005, 0x1010, 0x1011, 0x1014, 0x1015,
0x1040, 0x1041, 0x1044, 0x1045, 0x1050, 0x1051, 0x1054, 0x1055,
0x1100, 0x1101, 0x1104, 0x1105, 0x1110, 0x1111, 0x1114, 0x1115,
0x1140, 0x1141, 0x1144, 0x1145, 0x1150, 0x1151, 0x1154, 0x1155,
0x1400, 0x1401, 0x1404, 0x1405, 0x1410, 0x1411, 0x1414, 0x1415,
0x1440, 0x1441, 0x1444, 0x1445, 0x1450, 0x1451, 0x1454, 0x1455,
0x1500, 0x1501, 0x1504, 0x1505, 0x1510, 0x1511, 0x1514, 0x1515,
0x1540, 0x1541, 0x1544, 0x1545, 0x1550, 0x1551, 0x1554, 0x1555,
0x4000, 0x4001, 0x4004, 0x4005, 0x4010, 0x4011, 0x4014, 0x4015,
0x4040, 0x4041, 0x4044, 0x4045, 0x4050, 0x4051, 0x4054, 0x4055,
0x4100, 0x4101, 0x4104, 0x4105, 0x4110, 0x4111, 0x4114, 0x4115,
0x4140, 0x4141, 0x4144, 0x4145, 0x4150, 0x4151, 0x4154, 0x4155,
0x4400, 0x4401, 0x4404, 0x4405, 0x4410, 0x4411, 0x4414, 0x4415,
0x4440, 0x4441, 0x4444, 0x4445, 0x4450, 0x4451, 0x4454, 0x4455,
0x4500, 0x4501, 0x4504, 0x4505, 0x4510, 0x4511, 0x4514, 0x4515,
0x4540, 0x4541, 0x4544, 0x4545, 0x4550, 0x4551, 0x4554, 0x4555,
0x5000, 0x5001, 0x5004, 0x5005, 0x5010, 0x5011, 0x5014, 0x5015,
0x5040, 0x5041, 0x5044, 0x5045, 0x5050, 0x5051, 0x5054, 0x5055,
0x5100, 0x5101, 0x5104, 0x5105, 0x5110, 0x5111, 0x5114, 0x5115,
0x5140, 0x5141, 0x5144, 0x5145, 0x5150, 0x5151, 0x5154, 0x5155,
0x5400, 0x5401, 0x5404, 0x5405, 0x5410, 0x5411, 0x5414, 0x5415,
0x5440, 0x5441, 0x5444, 0x5445, 0x5450, 0x5451, 0x5454, 0x5455,
0x5500, 0x5501, 0x5504, 0x5505, 0x5510, 0x5511, 0x5514, 0x5515,
0x5540, 0x5541, 0x5544, 0x5545, 0x5550, 0x5551, 0x5554, 0x5555
};

unsigned short x; // Interleave bits of x and y, so that all of the
unsigned short y; // bits of x are in the even positions and y in the odd;
unsigned int z; // z gets the resulting 32-bit Morton Number.

z = MortonTable256[y >> 8] << 17 |
MortonTable256[x >> 8] << 16 |
MortonTable256[y & 0xFF] << 1 |
MortonTable256[x & 0xFF];</syntaxhighlight>

For more speed, use an additional table with values that are MortonTable256 pre-shifted one bit to the left. This second table could then be used for the y lookups, thus reducing the operations by two, but almost doubling the memory required. Extending this same idea, four tables could be used, with two of them pre-shifted by 16 to the left of the previous two, so that we would only need 11 operations total.
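
As a rough sketch of the two-table variant, both tables can be generated at start-up instead of being written out by hand. The names <code>morton_lo</code>, <code>morton_hi</code>, <code>morton_tables_init</code> and <code>morton_lookup</code> are invented for this example; the bit order (x in the even positions, y in the odd) follows the snippet above.

```c
#include <stdint.h>

static uint16_t morton_lo[256];  /* same values as MortonTable256        */
static uint16_t morton_hi[256];  /* the same values pre-shifted left by 1 */

/* Fill both tables; call once before using morton_lookup(). */
static void morton_tables_init(void) {
    for (unsigned v = 0; v < 256; ++v) {
        unsigned spread = 0;

        /* Move bit n of v to bit 2n, leaving zeros in between. */
        for (unsigned bit = 0; bit < 8; ++bit)
            spread |= ((v >> bit) & 1u) << (2 * bit);

        morton_lo[v] = (uint16_t)spread;
        morton_hi[v] = (uint16_t)(spread << 1);
    }
}

/* x goes through the plain table, y through the pre-shifted one, so the
 * separate "<< 1" and "<< 17" shifts for y are no longer needed. */
static uint32_t morton_lookup(uint16_t x, uint16_t y) {
    return (uint32_t)morton_hi[y >> 8]   << 16 |
           (uint32_t)morton_lo[x >> 8]   << 16 |
           (uint32_t)morton_hi[y & 0xFF]       |
           (uint32_t)morton_lo[x & 0xFF];
}
```

Whether the extra 512 bytes of table pay for themselves depends on cache behaviour, so this is worth measuring rather than assuming.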

== Binary Magic Numbers Hack ==
Taken from the [https://graphics.stanford.edu/~seander/bithacks.html#InterleaveTableLookup#Stanford Stanford Bit-twiddling hacks page].

<syntaxhighlight lang="c">static const unsigned int B[] = {0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF};
static const unsigned int S[] = {1, 2, 4, 8};

unsigned int x; // Interleave lower 16 bits of x and y, so the bits of x
unsigned int y; // are in the even positions and bits from y in the odd;
unsigned int z; // z gets the resulting 32-bit Morton Number. x and y must initially be less than 65536.

x = (x | (x << S[3])) & B[3];
x = (x | (x << S[2])) & B[2];
x = (x | (x << S[1])) & B[1];
x = (x | (x << S[0])) & B[0];
y = (y | (y << S[3])) & B[3];
y = (y | (y << S[2])) & B[2];
y = (y | (y << S[1])) & B[1];
y = (y | (y << S[0])) & B[0];
z = x | (y << 1);</syntaxhighlight>
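
The same masks can be wrapped into small functions, and running the steps in reverse gives the inverse operation. The following is a minimal sketch with made-up function names; the Y-in-even-bits order used by <code>twiddle_xy</code> is an assumption based on the N-ordered layout described earlier.

```c
#include <stdint.h>

/* Spread the low 16 bits of v so they occupy the even bit positions. */
static uint32_t spread_bits(uint32_t v) {
    v &= 0x0000FFFF;
    v = (v | (v << 8)) & 0x00FF00FF;
    v = (v | (v << 4)) & 0x0F0F0F0F;
    v = (v | (v << 2)) & 0x33333333;
    v = (v | (v << 1)) & 0x55555555;
    return v;
}

/* Inverse of spread_bits(): gather the even bits back into the low 16. */
static uint32_t compact_bits(uint32_t v) {
    v &= 0x55555555;
    v = (v | (v >> 1)) & 0x33333333;
    v = (v | (v >> 2)) & 0x0F0F0F0F;
    v = (v | (v >> 4)) & 0x00FF00FF;
    v = (v | (v >> 8)) & 0x0000FFFF;
    return v;
}

/* Assumed N order: y in the even bits, x in the odd bits. */
static uint32_t twiddle_xy(uint32_t x, uint32_t y) {
    return spread_bits(y) | (spread_bits(x) << 1);
}

/* Recover the texel coordinates from a twiddled index. */
static void untwiddle_xy(uint32_t index, uint32_t *x, uint32_t *y) {
    *y = compact_bits(index);
    *x = compact_bits(index >> 1);
}
```

Because the two directions are exact inverses, a round trip through <code>twiddle_xy</code> and <code>untwiddle_xy</code> makes a convenient self-test.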

== Non-Dreamcast Bit-twiddling Hacks ==
Modern x86_64 processors that support the BMI2 extension have instructions built in that can handle Z-ordered curves. These are the Parallel Bit Deposit (PDEP) and Parallel Bit Extract (PEXT) instructions: PDEP can be used to interleave bitstrings, and PEXT to separate them again. As the Dreamcast lacks these instructions, this is not a viable Dreamcast solution.

Another method for twiddling comes from multiplication without carry. A number multiplied by itself using carry-less multiplication yields the original bitstring of the number interleaved with 0s. For example, given that the number 255 is 1111-1111 in binary, 255 multiplied-without-carry by 255 gives the 16-bit number 0101-0101 0101-0101. Thus, if you use multiply without carry on the X and Y positions of the texel in the texture, you'll arrive at two 16-bit numbers, e.g. 0X0X-0X0X 0X0X-0X0X and 0Y0Y-0Y0Y 0Y0Y-0Y0Y. If you bitshift the X value to the left by 1, and then OR the X bitstring with the Y bitstring, the resultant 16-bit bitstring will be twiddled.

The Dreamcast's SH4 CPU lacks a multiply-without-carry instruction, although you could create one that uses only shifts and XORs like so:

<syntaxhighlight lang="c">unsigned int multiplyWithoutCarry(unsigned int a, unsigned int b) {
    unsigned int result = 0;

    // Same idea as binary long multiplication, except the partial
    // products are combined with XOR instead of addition, so no
    // carries ever propagate between bit positions.
    while (b != 0) {
        if (b & 1) {
            result ^= a;
        }

        a <<= 1;
        b >>= 1;
    }

    return result;
}</syntaxhighlight>

== Protofall's Implementation ==

How to generate the twiddled index from an untwiddled texture:

 Original:    Twiddled:
 0 1 2 3      0 2 8 A
 4 5 6 7      1 3 9 B
 8 9 A B      4 6 C E
 C D E F      5 7 D F
 G H I J      G I O Q
 K L M N      H J P R
 O P Q R      K M S U
 S T U V      L N T V
 W X Y Z      W Y % &
 ~ ! # $      X Z ^ *
 % ^ & *      ~ # ( _
 ( ) _ +      ! $ ) +

The matching characters between the two images represent the same pixel, just relocated. These images would be 4 * 12 pixel images, but the steps work for any valid '''2^x * 2^y''' sizes, where x and y are whole numbers.

Now let's say we want to find the twiddled index of the untwiddled '''"O"''' pixel (index 24). By hand we can work it out and tell the twiddled index should be "18", but what algorithm/logic can we use to find this automatically for any '''i'''?

Here are my steps:
* We first need to start by figuring out the "Biggest-Order Inverted-N" ('''BOIN''') that fits in this image.
* Now if our starting image was a square, then the BOIN is the same size as the image
* For rectangles like this, we have to find the smallest side first (width) then our BOIN is width * width
* If we start off with a rectangle, then we need to do an extra step that squares can skip.
* Notice how we can completely encapsulate the whole image with '''(bigger_side / smaller_side) == 3''' BOINs? Our first step is to determine which of these BOINs our index '''i''' belongs in.
* We can take advantage of a quirk I mentioned earlier. Notice how the first BOIN contains the first 1/3 of the original pixels, the 2nd BOIN contains the next 1/3 and the 3rd BOIN contains the last 1/3.
* Therefore using the formula '''k = floor(i / (BOIN area == 4 * 4 = 16)) == 1''' we can determine that our twiddled index is somewhere in the middle/2nd BOIN (Since '''k''' is of the set '''{0,1,2}''')
* Note the index where our BOIN starts according to the original texture. The first index in the 2nd BOIN is "16". Keep track of this value, let's call it '''d'''
* Also keep track of the index where our BOIN starts according to the twiddled texture, this is also '''16''' in this case. Let's add this to a running sum '''s'''
* Forget about the other two BOINs and subtract '''d''' from the indexes in our new BOIN as well as '''i'''

So now we have:

 i == 8
 0 2 8 A
 1 3 9 B
 4 6 C E
 5 7 D F

Great! We can already see by hand that this still looks right, but how do we automatically solve square BOINs?
* In order to solve a square BOIN, we need to determine what quadrant our pixel is in
* So we determine how many pixels are in each quadrant (4 per quadrant here, '''== a'''), then calculate '''k = floor(i / a) == 2''' to know it's in the 3rd quadrant ('''k''' is in the set '''{0,1,2,3}''').
* That means it's in the top right. So we need to set '''d = a * k''', add our new '''s''' value to the running sum, discard the other quadrants, then subtract '''d''' from '''i''' and from the new BOIN's indexes
* The easy way to calculate the new part of '''s''' is that:
** top left quadrant is '''0'''
** top right quad is '''BOIN-width / 2'''
** bottom left is '''BOIN-width * (BOIN-height / 2)'''
** bottom right is '''(BOIN-width * (BOIN-height / 2)) + (BOIN-width / 2)'''

Now we have:

 i == 0
 0 2
 1 3

Repeat this until only a single pixel is left. Once we have that last pixel, our new twiddled index is the running sum '''s''' (16 + 2 + 0 == 18)
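
The steps above can be sketched in C roughly as follows. This is an illustration under a few assumptions rather than a tested implementation: the function name is made up, the shorter side is assumed to be a power of two with the longer side a whole multiple of it, and the code tracks an x/y position instead of the running sum '''s''' (the final <code>y * width + x</code> is that sum, using the full image width as the row stride at every level).

```c
/* Given index i in the original ordering, return the index of the same
 * pixel in the twiddled image, following the BOIN/quadrant steps. */
static unsigned boin_twiddled_index(unsigned i, unsigned width, unsigned height) {
    unsigned side = width < height ? width : height;  /* BOIN is side * side  */
    unsigned area = side * side;
    unsigned k = i / area;   /* which BOIN the pixel falls in        */
    unsigned r = i % area;   /* i after subtracting d = k * area      */
    unsigned x = 0, y = 0;

    /* Narrow down quadrant by quadrant, from the largest to the smallest. */
    for (unsigned half = side / 2; half >= 1; half /= 2) {
        unsigned a = half * half;  /* pixels per quadrant */
        unsigned q = r / a;        /* 0 = top left, 1 = bottom left,
                                      2 = top right, 3 = bottom right */
        if (q & 1) y += half;
        if (q & 2) x += half;
        r -= q * a;                /* subtract d = a * q from i */
    }

    /* Offset by the BOIN's position along the longer side. */
    if (height >= width)
        y += k * side;
    else
        x += k * side;

    return y * width + x;
}
```

For the worked example, <code>boin_twiddled_index(24, 4, 12)</code> walks through BOIN 1, then the top right quadrant, then the top left pixel, and returns 18.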

== How and why the Dreamcast uses Twiddling ==

Twiddling is used on the Dreamcast for two major purposes, both owing to the same mechanism. Inside the Dreamcast, there exists a form of hardware compression in the PVR called [https://en.wikipedia.org/wiki/Vector_quantization#L135 Vector Quantization], or VQ for short. VQ works by taking an image, and splitting the image up into tiles made of 2x2 pixel patterns. Each pattern is stored in a special bit of memory on the Dreamcast known as the VQ Dictionary. The VQ dictionary contains enough space to hold 1024 of these 2x2 pixel patterns. The purpose of this tiling is to reduce the size in memory of the original textured image: instead of containing RGB values for each texel, the new texture stores a single index value that references the VQ dictionary for every 4 texels. Like [https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch*L136 Lempel Ziv Welch compression], this is a dictionary-based scheme, although VQ is lossy and looks its patterns up in a fixed dictionary.
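
To make the index-plus-dictionary idea concrete, here is a minimal software sketch of expanding such a texture. It is not a description of the PVR's actual memory layout: the <code>vq_entry</code> type, the 16-bit texel and index sizes, the scanline ordering of the index image, and the texel order inside each entry are all assumptions chosen to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

/* One dictionary entry: a 2x2 pattern of 16-bit texels (assumed order:
 * top left, bottom left, top right, bottom right). */
typedef struct {
    uint16_t texel[4];
} vq_entry;

/* Expand an index image (one index per 2x2 block, scanline order assumed)
 * into a width * height image of texels. */
static void vq_decode(const vq_entry *dict, const uint16_t *indices,
                      unsigned width, unsigned height, uint16_t *out) {
    unsigned blocks_per_row = width / 2;

    for (unsigned by = 0; by < height / 2; ++by) {
        for (unsigned bx = 0; bx < blocks_per_row; ++bx) {
            const vq_entry *e = &dict[indices[by * blocks_per_row + bx]];
            uint16_t *dst = out + (size_t)(2 * by) * width + 2 * bx;

            dst[0]         = e->texel[0];  /* top left     */
            dst[width]     = e->texel[1];  /* bottom left  */
            dst[1]         = e->texel[2];  /* top right    */
            dst[width + 1] = e->texel[3];  /* bottom right */
        }
    }
}
```

The storage saving comes from the index image holding one small value per four texels, at the cost of the image only being able to use patterns present in the dictionary.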

One can use VQ to compress textures in two ways. Firstly, one can do basic image compression as described above. However, the Dreamcast has a secondary use for the same VQ hardware, which allows it to mimic the dynamic palettes of older game consoles. In this case, the original textured image is treated as though it was scaled up, so each texel comprises a 2x2 pixel region. When this is done, the VQ dictionary entries become 2x2 pixel patterns of solid color. This effectively maps single texel colors in the original texture to 2x2 solid color VQ entries, like a palette. By changing the definition of a VQ dictionary 2x2 pixel pattern to a different solid color, you can alter every texel in the texture which references that color index. In this palette mode, the 1024 entries are broken up into one of two formats which can be selected by the user. The first divides the palette up into sixty-four banks of 16 colors each, which makes the original texture behave like a [https://en.wikipedia.org/wiki/Color_depth#L137 4-bits per pixel] image. The other format divides the palette up into four 256 color banks, which is an 8bpp texture format.

Interestingly, it has been discovered that applying the twiddle operation three times in succession on a twiddled texture can effectively untwiddle it.

A last side benefit of Twiddling on the Dreamcast is that it provides the precise pixels needed to do a 2x2 area anti-aliasing filter in hardware at no extra cost.

== DISCLAIMER ==

This theorized solution has only been tested on a few examples by hand, so I might have missed something. But I believe at least the general logic of this is sound. Also note for implementation, some of the divisions could be replaced with bit-shifting since some of those numbers are guaranteed to be powers of 2.

For an example of an algorithm that does the reverse (Convert twiddled index to untwiddled), you can refer to [https://github.com/Protofall/Crayon-Utilities/blob/master/DtexToRGBA8888/DtexToRGBA8888.c#L146 this code made by JamoHTP]

Twiddling

2024-08-19T21:50:21Z

BBHoodsta: /* How and why the Dreamcast uses Twiddling */

== General Idea ==

Twiddling, sometimes referred to as Swizzling in Playstation communities, and better known as Morton Encoding or a [https://en.wikipedia.org/wiki/Z-order_curve#l134 Z/N-Ordered curve], is a method of data organization that retains [https://en.wikipedia.org/wiki/Locality_of_reference#L146 Locality of Reference], which means that elements that reside physically close together in space, will be grouped together in memory. In the context of texture organization, this means that twiddling an image will make adjacent pixels to the right and below any given pixel reside close together in memory. This yields numerous benefits, such as easier calculation for AA and a texel configuration necessary for [https://en.wikipedia.org/wiki/Vector_quantization#L135 Vector Quantization] compression.

== Origins and Classical Implementation ==

The term "Twiddling" comes from the hacker term "bit-twiddling" owing to the classical way to calculate a Z-Ordered curve by manipulating the bits that make up the data (texel) index. The bit-twiddling way to arrive at a morton code is to take the binary representation of the X and Y coordinates of a texel and interleave them into one bitstring. The resultant bitstring will be twice the size of each individual input bitstring. For example, say you have a 4-bit number representing the X position of a Texel in a texture (e.g. XXXX) and you had a 4-bit number representing the Y position of a Texel (eg. YYYY), then your Z-Order position would be XYXY-XYXY (8-bit). This number is the index of where this texel lies in a new array that constitutes all the twiddled texels in the texture. If you convert every texel in the source texture into this new twiddled texture array, then iterating through the index will be the equivalent of navigating the source texture in a Z-pattern.

Whether one is a Z-ordered curve or an N-ordered curve depends on whether you shift the X or Y bitstring, effectively making the traversal width by height (Z) or height by width (N). Technically, the dreamcast uses an N-ordered curve.

A problem with using Z-ordered curves is that it's expensive to compute every frame because it uses division and multiplication heavily. Thus there exists numerous bit-twiddling hacks to speed up this operation, covered below.

== Conceptualizing Twiddling ==

Lets start with a recap of what Twiddled textures even are. Twiddled textures is just a particular way of re-organising pixels in an image so they're quicker to render.

[[File:Twiddle.png|thumb]]

The example image where the numbers represent the original un-twiddled indexes and the "inverted Ns" show the original flow of indexes. Indexes from the original image were calculated from left to right, top to bottom (Scanline order). So we can see after index 0, number 1 is just below, 2 is to the right of 0 and 3 is just below 2. Then if we go to the next biggest inverted N we can see the order '''{0,1,2,3}, {4,5,6,7}, {8,9,10,11}, {12,13,14,15}''' following the same inverted N pattern.

So if we are given index '''i''' from an untwiddled image and wished to find the twiddled index, then its a process of recursively narrowing down what part of the twiddled image that pixel now lives in.

Example:

Original: Twiddled:

0 1 2 3 0 2 8 A
4 5 6 7 1 3 9 B
8 9 A B 4 6 C E
C D E F 5 7 D F
G H I J G I O Q
K L M N H J P R
O P Q R K M S U
S T U V L N T V
W X Y Z W Y % &
~ ! # $ X Z ^ *
% ^ & * ~ # ( _
( ) _ + ! $ ) +

== Look-up Table Hack ==
Taken from [https://graphics.stanford.edu/~seander/bithacks.html#InterleaveTableLookup#Stanford Stanford Bit-twiddling hacks page].

<syntaxhighlight lang="c">static const unsigned short MortonTable256[256] =
{
0x0000, 0x0001, 0x0004, 0x0005, 0x0010, 0x0011, 0x0014, 0x0015,
0x0040, 0x0041, 0x0044, 0x0045, 0x0050, 0x0051, 0x0054, 0x0055,
0x0100, 0x0101, 0x0104, 0x0105, 0x0110, 0x0111, 0x0114, 0x0115,
0x0140, 0x0141, 0x0144, 0x0145, 0x0150, 0x0151, 0x0154, 0x0155,
0x0400, 0x0401, 0x0404, 0x0405, 0x0410, 0x0411, 0x0414, 0x0415,
0x0440, 0x0441, 0x0444, 0x0445, 0x0450, 0x0451, 0x0454, 0x0455,
0x0500, 0x0501, 0x0504, 0x0505, 0x0510, 0x0511, 0x0514, 0x0515,
0x0540, 0x0541, 0x0544, 0x0545, 0x0550, 0x0551, 0x0554, 0x0555,
0x1000, 0x1001, 0x1004, 0x1005, 0x1010, 0x1011, 0x1014, 0x1015,
0x1040, 0x1041, 0x1044, 0x1045, 0x1050, 0x1051, 0x1054, 0x1055,
0x1100, 0x1101, 0x1104, 0x1105, 0x1110, 0x1111, 0x1114, 0x1115,
0x1140, 0x1141, 0x1144, 0x1145, 0x1150, 0x1151, 0x1154, 0x1155,
0x1400, 0x1401, 0x1404, 0x1405, 0x1410, 0x1411, 0x1414, 0x1415,
0x1440, 0x1441, 0x1444, 0x1445, 0x1450, 0x1451, 0x1454, 0x1455,
0x1500, 0x1501, 0x1504, 0x1505, 0x1510, 0x1511, 0x1514, 0x1515,
0x1540, 0x1541, 0x1544, 0x1545, 0x1550, 0x1551, 0x1554, 0x1555,
0x4000, 0x4001, 0x4004, 0x4005, 0x4010, 0x4011, 0x4014, 0x4015,
0x4040, 0x4041, 0x4044, 0x4045, 0x4050, 0x4051, 0x4054, 0x4055,
0x4100, 0x4101, 0x4104, 0x4105, 0x4110, 0x4111, 0x4114, 0x4115,
0x4140, 0x4141, 0x4144, 0x4145, 0x4150, 0x4151, 0x4154, 0x4155,
0x4400, 0x4401, 0x4404, 0x4405, 0x4410, 0x4411, 0x4414, 0x4415,
0x4440, 0x4441, 0x4444, 0x4445, 0x4450, 0x4451, 0x4454, 0x4455,
0x4500, 0x4501, 0x4504, 0x4505, 0x4510, 0x4511, 0x4514, 0x4515,
0x4540, 0x4541, 0x4544, 0x4545, 0x4550, 0x4551, 0x4554, 0x4555,
0x5000, 0x5001, 0x5004, 0x5005, 0x5010, 0x5011, 0x5014, 0x5015,
0x5040, 0x5041, 0x5044, 0x5045, 0x5050, 0x5051, 0x5054, 0x5055,
0x5100, 0x5101, 0x5104, 0x5105, 0x5110, 0x5111, 0x5114, 0x5115,
0x5140, 0x5141, 0x5144, 0x5145, 0x5150, 0x5151, 0x5154, 0x5155,
0x5400, 0x5401, 0x5404, 0x5405, 0x5410, 0x5411, 0x5414, 0x5415,
0x5440, 0x5441, 0x5444, 0x5445, 0x5450, 0x5451, 0x5454, 0x5455,
0x5500, 0x5501, 0x5504, 0x5505, 0x5510, 0x5511, 0x5514, 0x5515,
0x5540, 0x5541, 0x5544, 0x5545, 0x5550, 0x5551, 0x5554, 0x5555
};

unsigned short x; // Interleave bits of x and y, so that all of the
unsigned short y; // bits of x are in the even positions and y in the odd;
unsigned int z; // z gets the resulting 32-bit Morton Number.

z = MortonTable256[y >> 8] << 17 |
MortonTable256[x >> 8] << 16 |
MortonTable256[y & 0xFF] << 1 |
MortonTable256[x & 0xFF];</syntaxhighlight>

For more speed, use an additional table with values that are MortonTable256 pre-shifted one bit to the left. This second table could then be used for the y lookups, thus reducing the operations by two, but almost doubling the memory required. Extending this same idea, four tables could be used, with two of them pre-shifted by 16 to the left of the previous two, so that we would only need 11 operations total.

== Binary Magic Numbers Hack ==
Taken from [https://graphics.stanford.edu/~seander/bithacks.html#InterleaveTableLookup#Stanford Stanford Bit-twiddling hacks page].

<syntaxhighlight lang="c">static const unsigned int B[] = {0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF};
static const unsigned int S[] = {1, 2, 4, 8};

unsigned int x; // Interleave lower 16 bits of x and y, so the bits of x
unsigned int y; // are in the even positions and bits from y in the odd;
unsigned int z; // z gets the resulting 32-bit Morton Number. x and y must initially be less than 65536.

x = (x | (x << S[3])) & B[3];
x = (x | (x << S[2])) & B[2];
x = (x | (x << S[1])) & B[1];
x = (x | (x << S[0])) & B[0];
y = (y | (y << S[3])) & B[3];
y = (y | (y << S[2])) & B[2];
y = (y | (y << S[1])) & B[1];
y = (y | (y << S[0])) & B[0];
z = x | (y << 1);</syntaxhighlight>

== Non-Dreamcast Bit-twiddling Hacks ==
On modern x86_64 processors from Intel actually have instructions built in to handle z-ordered curves. These are the Parallel Bit Deposit (PDEP) and Parallel bit Extraction (PEXT) instructions, which can be used in conjunction to interleave a bitstring. As the Dreamcast lacks these instructions, this is not a viable dreamcast solution.

Another method for twiddling comes from multiplication without carry. A number multiplied upon itself using a carry-less multiplication will yield the original bitstring of the number interleaved with 0s. For example, given that the number 255 is 1111-1111 in binary, 255 multiplied-without-carry by 255 reveals a 16-bit number that is (1010-1010 1010-1010). Thus, if you use multiply without carry on the X and Y position of the texel in the texture, you'll arrive at two 16-bit numbers, e.g. X0X0-X0X0 X0X0-X0X0 and Y0Y0-Y0Y0 Y0Y0-Y0Y0. If you bitshift the X value to the right by 1, and then OR the X bitstring by the Y bitstring, the resultant 16-bit bitstring will be twiddled.

The Dreamcast's SH4 CPU lacks a multiply-without-carry instruction, although you could create one that uses only addition like so:

<syntaxhighlight lang="c">int multiplyWithoutCarry(int a, int b) {
int result = 0;
int multiplier = 1;

while (b != 0) {
int digit = b;
int temp = a;

while (digit > 9) {
digit -= 10;
temp += a;
}

while (digit > 0) {
result += temp;
digit--;
}

int divisor = 10;
int tempMultiplier = multiplier;

while (divisor > 1) {
if (divisor <= tempMultiplier) {
tempMultiplier -= divisor;
divisor = divisor << 1;
multiplier = multiplier << 1;
}
else {
divisor = divisor >> 1;
tempMultiplier = tempMultiplier >> 1;
}
}

b -= tempMultiplier;
}

return result;
}</syntaxhighlight>

== Protofall's Implementation ==

How to generated the twiddled index from an untwiddled texture:

Original: Twiddled:

0 1 2 3 0 2 8 A
4 5 6 7 1 3 9 B
8 9 A B 4 6 C E
C D E F 5 7 D F
G H I J G I O Q
K L M N H J P R
O P Q R K M S U
S T U V L N T V
W X Y Z W Y % &
~ ! # $ X Z ^ *
% ^ & * ~ # ( _
( ) _ + ! $ ) +

The matching characters between the two images represent the same pixel, just relocated. These images would be 4 * 12 pixel images, but the steps work for any valid '''2^x * 2^'''y sizes, where x and y are whole numbers.

Now lets say we want to find the twiddled index of the untwiddled '''"O"''' pixel (index 24). By hand we can work it out and tell the twiddle index should be "18", but what algorithm/logic can we use to find this automatically for any '''i'''?

Here are my steps:
* We first need to start by figuring out the "Biggest-Order Inverted-N" ('''BOIN''') that fits in this image.
* Now if our starting image was a square, then the BOIN is the same size as the image
* For rectangles like this, we have to find the smallest side first (width) then our BOIN is width * width
* If we start off with a rectangle, then we need to do an extra step that squares can skip.
* Notice how we can completely encapsulate the whole image with '''(bigger_side / smaller_side) == 3''' BOINs? Our first step is to determine which of these BOINs our index '''i''' belongs in.
* We can take advantage of a quirk I mentioned earlier. Notice how the first BOIN contains the first 1/3 of the original pixels, the 2nd BOIN contains the next 1/3 and the 3rd BOIN contains the last 1/3.
* Therefore using the formula '''k = floor(i / (BOIN area == 4 * 4 = 16)) == 1''' we can determine that our twiddled index is somewhere in the middle/2nd BOIN (Since '''k''' is of the set '''{0,1,2}''')
* Note the index where our BOIN starts according to the original texture. The first index in the 2nd BOIN is "16". Keep track of this value; let's call it '''d'''
* Also keep track of the index where our BOIN starts according to the twiddled texture, which is also '''16''' in this case. Let's add this to a running sum '''s'''
* Forget about the other two BOINs and subtract '''d''' from the indexes in our new BOIN as well as '''i'''

So now we have:

i == 8
0 2 8 A
1 3 9 B
4 6 C E
5 7 D F

Great! We can already see by hand that this still looks right, but how do we automatically solve square BOINs?
* In order to solve a square BOIN, we need to determine what quadrant our pixel is in
* So we determine how many pixels are in each quadrant (4 per quadrant here, '''== a'''), then calculate '''k = floor(i / a) == 2''' to know it's in the 3rd quadrant ('''k''' is in the set '''{0,1,2,3}''').
* That means it's in the top right. So we need to set '''d = a * k''', add our new '''s''' value to the running sum, discard the other quadrants, then subtract '''d''' from '''i''' and from the new BOIN's indexes
* The easy way to calculate the new part of '''s''' is that:
** top left quadrant is '''0'''
** top right quad is '''BOIN-width / 2'''
** bottom left is '''BOIN-width * (BOIN-height / 2)'''
** bottom right is '''(BOIN-width * (BOIN-height / 2)) + (BOIN-width / 2)'''

Now we have:

i == 0
0 2
1 3

You would repeat until we have a single pixel. Once we have the last pixel, our new twiddled index should be the running sum '''s''' (16 + 2 + 0 == 18)
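
The quadrant-by-quadrant narrowing above looks equivalent to splitting the bits of '''i''' inside its BOIN: the even bits give the row and the odd bits give the column. The following is a minimal sketch of that reading, not the author's own code. The names <code>even_bits</code> and <code>twiddled_position</code> are hypothetical, and it assumes the width is the smaller side, is a power of two, and that the BOINs are stacked vertically as in the example.

<syntaxhighlight lang="c">
/* Hypothetical helper: collect bits 0, 2, 4, ... of v into a packed value. */
static unsigned int even_bits(unsigned int v) {
    unsigned int out = 0;
    unsigned int pos;
    for (pos = 0; pos < 16; pos++) {
        out |= ((v >> (2 * pos)) & 1u) << pos;
    }
    return out;
}

/* Position of pixel i in the twiddled image (assumes width <= height). */
unsigned int twiddled_position(unsigned int i, unsigned int width) {
    unsigned int boin_area = width * width;
    unsigned int start = (i / boin_area) * boin_area; /* first index of the BOIN */
    unsigned int local = i % boin_area;                /* index inside the BOIN */
    unsigned int y = even_bits(local);                 /* row inside the BOIN */
    unsigned int x = even_bits(local >> 1);            /* column inside the BOIN */
    return start + y * width + x;
}
</syntaxhighlight>

For the worked example, <code>twiddled_position(24, 4)</code> lands in the 2nd BOIN (start 16) with a local index of 8, which splits into row 0 and column 2, giving 18.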

== How and why the Dreamcast uses Twiddling ==

Twiddling is used on the Dreamcast for two major purposes, both owing to the same mechanism. Inside the Dreamcast, there exists a form of hardware compression in the PVR called [https://en.wikipedia.org/wiki/Vector_quantization#L135 Vector Quantization], or VQ for short. VQ works by taking an image and splitting it up into tiles made of 2x2 pixel patterns. Each pattern is stored in a special bit of memory on the Dreamcast known as the VQ Dictionary. The VQ dictionary contains enough space to hold 1024 of these 2x2 pixel patterns. The purpose of this tiling is to reduce the size in memory of the original textured image: the new texture, instead of containing RGB values for each texel, stores a single index value that references the VQ dictionary for every 4 texels. Like [https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch Lempel Ziv Welch compression], this is a dictionary-based scheme, although VQ is lossy.

One can use VQ to compress textures in two ways. Firstly, one can do basic image compression as described above. However, the Dreamcast has a secondary use for the same VQ hardware, which allows it to mimic dynamic palettes of older game consoles. In this case, the original textured image is treated as though it was scaled up, so each texel comprises a 2x2 pixel region. When this is done, the VQ dictionary entries become 2x2 pixel patterns of solid color. This effectively maps single texel colors in the original texture to VQ 2x2 solid color entries, like a palette. By changing the definition of a VQ dictionary 2x2 pixel pattern to a different solid color, you can alter every texel in the texture which references that color index. In this palette mode, the 1024 entries are broken up in one of two formats, which can be selected by the user. The first divides the palette up into sixty-four banks of 16 colors each, which makes the original texture behave like a [https://en.wikipedia.org/wiki/Color_depth#L137 4-bits per pixel] image. The other format divides the palette up into four 256 color banks, which is an 8bpp texture format.

Interestingly, it has been discovered that applying the twiddle operation two times in succession on a twiddled texture can effectively untwiddle it.

A last side benefit of Twiddling on the Dreamcast is that it provides the precise pixels needed to do a 2x2 area Anti-Aliasing filter in hardware at no extra cost.

== DISCLAIMER ==

This theorized solution has only been tested on a few examples by hand, so I might have missed something. But I believe at least the general logic of this is sound. Also note for implementation, some of the divisions could be replaced with bit-shifting since some of those numbers are guaranteed to be powers of 2.

For an example of an algorithm that does the reverse (Convert twiddled index to untwiddled), you can refer to [https://github.com/Protofall/Crayon-Utilities/blob/master/DtexToRGBA8888/DtexToRGBA8888.c#L146 this code made by JamoHTP]

Twiddling

2024-08-19T18:34:00Z

BBHoodsta: /* How and why the Dreamcast uses Twiddling */

== General Idea ==

Twiddling, sometimes referred to as Swizzling in Playstation communities, and better known as Morton Encoding or a [https://en.wikipedia.org/wiki/Z-order_curve#l134 Z/N-Ordered curve], is a method of data organization that retains [https://en.wikipedia.org/wiki/Locality_of_reference#L146 Locality of Reference], which means that elements that reside physically close together in space, will be grouped together in memory. In the context of texture organization, this means that twiddling an image will make adjacent pixels to the right and below any given pixel reside close together in memory. This yields numerous benefits, such as easier calculation for AA and a texel configuration necessary for [https://en.wikipedia.org/wiki/Vector_quantization#L135 Vector Quantization] compression.

== Origins and Classical Implementation ==

The term "Twiddling" comes from the hacker term "bit-twiddling" owing to the classical way to calculate a Z-Ordered curve by manipulating the bits that make up the data (texel) index. The bit-twiddling way to arrive at a morton code is to take the binary representation of the X and Y coordinates of a texel and interleave them into one bitstring. The resultant bitstring will be twice the size of each individual input bitstring. For example, say you have a 4-bit number representing the X position of a Texel in a texture (e.g. XXXX) and you had a 4-bit number representing the Y position of a Texel (eg. YYYY), then your Z-Order position would be XYXY-XYXY (8-bit). This number is the index of where this texel lies in a new array that constitutes all the twiddled texels in the texture. If you convert every texel in the source texture into this new twiddled texture array, then iterating through the index will be the equivalent of navigating the source texture in a Z-pattern.

Whether one is a Z-ordered curve or an N-ordered curve depends on whether you shift the X or Y bitstring, effectively making the traversal width by height (Z) or height by width (N). Technically, the Dreamcast uses an N-ordered curve.
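
To make the interleaving and the Z/N distinction concrete, here is a minimal bit-by-bit sketch. It is the slow, obvious version rather than any particular library's routine, and the function names <code>interleave_bits</code>, <code>z_order</code> and <code>n_order</code> are made up for illustration. It assumes 16-bit coordinates.

<syntaxhighlight lang="c">
#include <stdint.h>

/* Interleave two 16-bit values one bit at a time.
 * 'low' supplies bits 0, 2, 4, ... of the result,
 * 'high' supplies bits 1, 3, 5, ... */
uint32_t interleave_bits(uint16_t low, uint16_t high) {
    uint32_t z = 0;
    int bit;
    for (bit = 0; bit < 16; bit++) {
        z |= (uint32_t)((low >> bit) & 1u) << (2 * bit);
        z |= (uint32_t)((high >> bit) & 1u) << (2 * bit + 1);
    }
    return z;
}

/* Z order: X occupies the lowest bit, so the curve steps right first. */
uint32_t z_order(uint16_t x, uint16_t y) {
    return interleave_bits(x, y);
}

/* N order: Y occupies the lowest bit, so the curve steps down first. */
uint32_t n_order(uint16_t x, uint16_t y) {
    return interleave_bits(y, x);
}
</syntaxhighlight>

With these definitions the texel directly below the origin, (x=0, y=1), gets index 2 in Z order but index 1 in N order.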

A problem with using Z-ordered curves is that computing them naively every frame is expensive, because the calculation leans heavily on division and multiplication. Thus there exist numerous bit-twiddling hacks to speed up this operation, covered below.

== Conceptualizing Twiddling ==

Let's start with a recap of what Twiddled textures even are. Twiddled textures are just a particular way of re-organising pixels in an image so they're quicker to render.

[[File:Twiddle.png|thumb]]

In the example image, the numbers represent the original un-twiddled indexes and the "inverted Ns" show the original flow of indexes. Indexes from the original image were calculated from left to right, top to bottom (scanline order). So we can see that after index 0, number 1 is just below, 2 is to the right of 0 and 3 is just below 2. Then if we go to the next biggest inverted N we can see the groups '''{0,1,2,3}, {4,5,6,7}, {8,9,10,11}, {12,13,14,15}''' following the same inverted N pattern.

So if we are given index '''i''' from an untwiddled image and wish to find the twiddled index, then it's a process of recursively narrowing down what part of the twiddled image that pixel now lives in.

Example:

Original: Twiddled:

0 1 2 3 0 2 8 A
4 5 6 7 1 3 9 B
8 9 A B 4 6 C E
C D E F 5 7 D F
G H I J G I O Q
K L M N H J P R
O P Q R K M S U
S T U V L N T V
W X Y Z W Y % &
~ ! # $ X Z ^ *
% ^ & * ~ # ( _
( ) _ + ! $ ) +
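
The "Twiddled" grid in the example can be reproduced with a few lines of C. This is only a sketch under assumptions: the helper names are hypothetical, the width is taken to be the smaller side and a power of two, and the height is taken to be a multiple of the width so that the image is a vertical stack of square blocks.

<syntaxhighlight lang="c">
/* Move bit n of v to bit 2n (low 16 bits only). */
static unsigned int spread_bits(unsigned int v) {
    unsigned int out = 0;
    unsigned int bit;
    for (bit = 0; bit < 16; bit++) {
        out |= ((v >> bit) & 1u) << (2 * bit);
    }
    return out;
}

/* Fill grid[y * width + x] with the number shown at (x, y) in the
 * "Twiddled" picture: Y supplies the low bit inside each square block. */
void fill_twiddled_grid(unsigned int *grid, unsigned int width, unsigned int height) {
    unsigned int x, y;
    for (y = 0; y < height; y++) {
        unsigned int block = y / width;   /* which square block this row is in */
        unsigned int local_y = y % width; /* row inside that block */
        for (x = 0; x < width; x++) {
            grid[y * width + x] = block * width * width
                                + (spread_bits(local_y) | (spread_bits(x) << 1));
        }
    }
}
</syntaxhighlight>

Calling <code>fill_twiddled_grid(grid, 4, 12)</code> gives a first row of 0, 2, 8, 10 (A in the picture) and puts 24 (the "O" pixel) at position 18.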

== Look-up Table Hack ==
Taken from [https://graphics.stanford.edu/~seander/bithacks.html#InterleaveTableLookup Stanford Bit-twiddling hacks page].

<syntaxhighlight lang="c">static const unsigned short MortonTable256[256] =
{
0x0000, 0x0001, 0x0004, 0x0005, 0x0010, 0x0011, 0x0014, 0x0015,
0x0040, 0x0041, 0x0044, 0x0045, 0x0050, 0x0051, 0x0054, 0x0055,
0x0100, 0x0101, 0x0104, 0x0105, 0x0110, 0x0111, 0x0114, 0x0115,
0x0140, 0x0141, 0x0144, 0x0145, 0x0150, 0x0151, 0x0154, 0x0155,
0x0400, 0x0401, 0x0404, 0x0405, 0x0410, 0x0411, 0x0414, 0x0415,
0x0440, 0x0441, 0x0444, 0x0445, 0x0450, 0x0451, 0x0454, 0x0455,
0x0500, 0x0501, 0x0504, 0x0505, 0x0510, 0x0511, 0x0514, 0x0515,
0x0540, 0x0541, 0x0544, 0x0545, 0x0550, 0x0551, 0x0554, 0x0555,
0x1000, 0x1001, 0x1004, 0x1005, 0x1010, 0x1011, 0x1014, 0x1015,
0x1040, 0x1041, 0x1044, 0x1045, 0x1050, 0x1051, 0x1054, 0x1055,
0x1100, 0x1101, 0x1104, 0x1105, 0x1110, 0x1111, 0x1114, 0x1115,
0x1140, 0x1141, 0x1144, 0x1145, 0x1150, 0x1151, 0x1154, 0x1155,
0x1400, 0x1401, 0x1404, 0x1405, 0x1410, 0x1411, 0x1414, 0x1415,
0x1440, 0x1441, 0x1444, 0x1445, 0x1450, 0x1451, 0x1454, 0x1455,
0x1500, 0x1501, 0x1504, 0x1505, 0x1510, 0x1511, 0x1514, 0x1515,
0x1540, 0x1541, 0x1544, 0x1545, 0x1550, 0x1551, 0x1554, 0x1555,
0x4000, 0x4001, 0x4004, 0x4005, 0x4010, 0x4011, 0x4014, 0x4015,
0x4040, 0x4041, 0x4044, 0x4045, 0x4050, 0x4051, 0x4054, 0x4055,
0x4100, 0x4101, 0x4104, 0x4105, 0x4110, 0x4111, 0x4114, 0x4115,
0x4140, 0x4141, 0x4144, 0x4145, 0x4150, 0x4151, 0x4154, 0x4155,
0x4400, 0x4401, 0x4404, 0x4405, 0x4410, 0x4411, 0x4414, 0x4415,
0x4440, 0x4441, 0x4444, 0x4445, 0x4450, 0x4451, 0x4454, 0x4455,
0x4500, 0x4501, 0x4504, 0x4505, 0x4510, 0x4511, 0x4514, 0x4515,
0x4540, 0x4541, 0x4544, 0x4545, 0x4550, 0x4551, 0x4554, 0x4555,
0x5000, 0x5001, 0x5004, 0x5005, 0x5010, 0x5011, 0x5014, 0x5015,
0x5040, 0x5041, 0x5044, 0x5045, 0x5050, 0x5051, 0x5054, 0x5055,
0x5100, 0x5101, 0x5104, 0x5105, 0x5110, 0x5111, 0x5114, 0x5115,
0x5140, 0x5141, 0x5144, 0x5145, 0x5150, 0x5151, 0x5154, 0x5155,
0x5400, 0x5401, 0x5404, 0x5405, 0x5410, 0x5411, 0x5414, 0x5415,
0x5440, 0x5441, 0x5444, 0x5445, 0x5450, 0x5451, 0x5454, 0x5455,
0x5500, 0x5501, 0x5504, 0x5505, 0x5510, 0x5511, 0x5514, 0x5515,
0x5540, 0x5541, 0x5544, 0x5545, 0x5550, 0x5551, 0x5554, 0x5555
};

unsigned short x; // Interleave bits of x and y, so that all of the
unsigned short y; // bits of x are in the even positions and y in the odd;
unsigned int z; // z gets the resulting 32-bit Morton Number.

z = MortonTable256[y >> 8] << 17 |
MortonTable256[x >> 8] << 16 |
MortonTable256[y & 0xFF] << 1 |
MortonTable256[x & 0xFF];</syntaxhighlight>

For more speed, use an additional table with values that are MortonTable256 pre-shifted one bit to the left. This second table could then be used for the y lookups, thus reducing the operations by two, but almost doubling the memory required. Extending this same idea, four tables could be used, with two of them pre-shifted by 16 to the left of the previous two, so that we would only need 11 operations total.
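
The pre-shifted second table could look something like the sketch below. To keep it short, both tables are generated at start-up instead of being typed out, and the example is limited to 8-bit coordinates. The names <code>morton_x</code>, <code>morton_y</code>, <code>init_morton_tables</code> and <code>morton8</code> are hypothetical.

<syntaxhighlight lang="c">
#include <stdint.h>

static uint16_t morton_x[256]; /* same values as MortonTable256 */
static uint16_t morton_y[256]; /* the same values pre-shifted left by one */

/* Fill both tables once at start-up. */
void init_morton_tables(void) {
    unsigned int v;
    for (v = 0; v < 256; v++) {
        uint16_t spread = 0;
        int bit;
        for (bit = 0; bit < 8; bit++) {
            spread |= (uint16_t)(((v >> bit) & 1u) << (2 * bit));
        }
        morton_x[v] = spread;
        morton_y[v] = (uint16_t)(spread << 1);
    }
}

/* Interleave two 8-bit coordinates: x in the even bits, y in the odd bits.
 * No shift is needed at lookup time because morton_y is already shifted. */
uint16_t morton8(uint8_t x, uint8_t y) {
    return (uint16_t)(morton_y[y] | morton_x[x]);
}
</syntaxhighlight>

The trade-off is the one described above: one shift fewer per lookup pair, at the cost of a second 512-byte table.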

== Binary Magic Numbers Hack ==
Taken from [https://graphics.stanford.edu/~seander/bithacks.html#InterleaveBMN Stanford Bit-twiddling hacks page].

<syntaxhighlight lang="c">static const unsigned int B[] = {0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF};
static const unsigned int S[] = {1, 2, 4, 8};

unsigned int x; // Interleave lower 16 bits of x and y, so the bits of x
unsigned int y; // are in the even positions and bits from y in the odd;
unsigned int z; // z gets the resulting 32-bit Morton Number. x and y must initially be less than 65536.

x = (x | (x << S[3])) & B[3];
x = (x | (x << S[2])) & B[2];
x = (x | (x << S[1])) & B[1];
x = (x | (x << S[0])) & B[0];
y = (y | (y << S[3])) & B[3];
y = (y | (y << S[2])) & B[2];
y = (y | (y << S[1])) & B[1];
y = (y | (y << S[0])) & B[0];
z = x | (y << 1);</syntaxhighlight>
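
The same masks can be applied in the opposite order to pull a Morton number apart again. The sketch below is not taken from the Stanford page; it is a straightforward reversal of the steps above, with hypothetical names, and it assumes x sits in the even bits and y in the odd bits as in the snippet.

<syntaxhighlight lang="c">
#include <stdint.h>

/* Gather bits 0, 2, 4, ... of z into the low 16 bits of the result. */
static uint32_t compact_even_bits(uint32_t z) {
    z &= 0x55555555u;
    z = (z | (z >> 1)) & 0x33333333u;
    z = (z | (z >> 2)) & 0x0F0F0F0Fu;
    z = (z | (z >> 4)) & 0x00FF00FFu;
    z = (z | (z >> 8)) & 0x0000FFFFu;
    return z;
}

/* Split a 32-bit Morton number back into its two 16-bit coordinates. */
void morton_decode(uint32_t z, uint32_t *x, uint32_t *y) {
    *x = compact_even_bits(z);      /* even bits belong to x */
    *y = compact_even_bits(z >> 1); /* odd bits belong to y */
}
</syntaxhighlight>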

== Non-Dreamcast Bit-twiddling Hacks ==
Modern x86_64 processors that support the BMI2 extension have instructions built in that can handle z-ordered curves. These are the Parallel Bits Deposit (PDEP) and Parallel Bits Extract (PEXT) instructions, which can be used to interleave and de-interleave a bitstring. As the Dreamcast lacks these instructions, this is not a viable Dreamcast solution.
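
For comparison only, here is roughly how PDEP would be used on a PC. <code>_pdep_u32</code> is the real BMI2 intrinsic from <code>immintrin.h</code>; everything else (the wrapper name <code>deposit_bits</code>, the function <code>morton_pdep</code>, and the portable fallback loop) is a hypothetical sketch so that the code still builds on compilers or CPUs without BMI2.

<syntaxhighlight lang="c">
#include <stdint.h>
#if defined(__BMI2__)
#include <immintrin.h>
#endif

/* Scatter the low bits of 'value' into the positions set in 'mask'. */
static uint32_t deposit_bits(uint32_t value, uint32_t mask) {
#if defined(__BMI2__)
    return _pdep_u32(value, mask);
#else
    /* Portable fallback: walk the mask from its lowest set bit upward. */
    uint32_t result = 0;
    uint32_t bit;
    for (bit = 1; mask != 0; bit <<= 1) {
        uint32_t lowest = mask & (0u - mask); /* lowest set bit of mask */
        if (value & bit) {
            result |= lowest;
        }
        mask &= mask - 1; /* clear that bit and continue */
    }
    return result;
#endif
}

/* x goes to the even bits, y to the odd bits. */
uint32_t morton_pdep(uint16_t x, uint16_t y) {
    return deposit_bits(x, 0x55555555u) | deposit_bits(y, 0xAAAAAAAAu);
}
</syntaxhighlight>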

Another method for twiddling comes from multiplication without carry. A number multiplied by itself using carry-less multiplication yields the original bitstring of the number interleaved with 0s. For example, given that the number 255 is 1111-1111 in binary, 255 multiplied-without-carry by 255 gives the 16-bit number 0101-0101 0101-0101. Thus, if you carry-less square the X and Y position of the texel in the texture, you'll arrive at two 16-bit numbers, e.g. 0X0X-0X0X 0X0X-0X0X and 0Y0Y-0Y0Y 0Y0Y-0Y0Y. If you bitshift one of the two values to the left by 1, and then OR the two bitstrings together, the resultant 16-bit bitstring will be twiddled.

The Dreamcast's SH4 CPU lacks a multiply-without-carry instruction, although you could create one that uses only shifts and XORs like so:

<syntaxhighlight lang="c">/* Carry-less multiplication: binary long multiplication in which the
 * partial products are combined with XOR instead of addition. */
unsigned int multiplyWithoutCarry(unsigned int a, unsigned int b) {
    unsigned int result = 0;

    while (b != 0) {
        if (b & 1) {
            result ^= a; /* "add" this partial product without carrying */
        }
        a <<= 1; /* next partial product is a shifted one place left */
        b >>= 1; /* move on to the next bit of the multiplier */
    }

    return result;
}</syntaxhighlight>
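
As a sanity check on the carry-less squaring idea, the sketch below squares each coordinate without carries and combines the results. It carries its own small helper so it can be read in isolation; <code>clmul16</code> and <code>twiddle_clmul</code> are hypothetical names, and putting Y in the low bit is an assumption chosen to match the N-ordered layout described earlier.

<syntaxhighlight lang="c">
#include <stdint.h>

/* Carry-less multiply of two 16-bit values (XOR instead of add). */
static uint32_t clmul16(uint16_t a, uint16_t b) {
    uint32_t result = 0;
    int bit;
    for (bit = 0; bit < 16; bit++) {
        if ((b >> bit) & 1u) {
            result ^= (uint32_t)a << bit;
        }
    }
    return result;
}

/* Squaring without carry moves bit n to bit 2n, so the two squares can
 * simply be ORed together once one of them is shifted left by one. */
uint32_t twiddle_clmul(uint16_t x, uint16_t y) {
    return clmul16(y, y) | (clmul16(x, x) << 1);
}
</syntaxhighlight>

Here <code>clmul16(0xFF, 0xFF)</code> is 0x5555, i.e. 0101-0101 0101-0101.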

== Protofall's Implementation ==

How to generate the twiddled index from an untwiddled texture:

Original: Twiddled:

0 1 2 3 0 2 8 A
4 5 6 7 1 3 9 B
8 9 A B 4 6 C E
C D E F 5 7 D F
G H I J G I O Q
K L M N H J P R
O P Q R K M S U
S T U V L N T V
W X Y Z W Y % &
~ ! # $ X Z ^ *
% ^ & * ~ # ( _
( ) _ + ! $ ) +

The matching characters between the two images represent the same pixel, just relocated. These images would be 4 * 12 pixel images, but the steps work for any valid '''2^x * 2^y''' sizes, where x and y are whole numbers.

Now let's say we want to find the twiddled index of the untwiddled '''"O"''' pixel (index 24). By hand we can work it out and tell the twiddled index should be "18", but what algorithm/logic can we use to find this automatically for any '''i'''?

Here are my steps:
* We first need to start by figuring out the "Biggest-Order Inverted-N" ('''BOIN''') that fits in this image.
* Now if our starting image was a square, then the BOIN is the same size as the image
* For rectangles like this, we have to find the smallest side first (width) then our BOIN is width * width
* If we start off with a rectangle, then we need to do an extra step that squares can skip.
* Notice how we can completely encapsulate the whole image with '''(bigger_side / smaller_side) == 3''' BOINs? Our first step is to determine which of these BOINs our index '''i''' belongs in.
* We can take advantage of a quirk I mentioned earlier. Notice how the first BOIN contains the first 1/3 of the original pixels, the 2nd BOIN contains the next 1/3 and the 3rd BOIN contains the last 1/3.
* Therefore using the formula '''k = floor(i / (BOIN area == 4 * 4 = 16)) == 1''' we can determine that our twiddled index is somewhere in the middle/2nd BOIN (Since '''k''' is of the set '''{0,1,2}''')
* Note the index where our BOIN starts according to the original texture. The first index in the 2nd BOIN is "16". Keep track of this value; let's call it '''d'''
* Also keep track of the index where our BOIN starts according to the twiddled texture, which is also '''16''' in this case. Let's add this to a running sum '''s'''
* Forget about the other two BOINs and subtract '''d''' from the indexes in our new BOIN as well as '''i'''

So now we have:

i == 8
0 2 8 A
1 3 9 B
4 6 C E
5 7 D F

Great! We can already see by hand that this still looks right, but how do we automatically solve square BOINs?
* In order to solve a square BOIN, we need to determine what quadrant our pixel is in
* So we determine how many pixels are in each quadrant (4 per quadrant here, '''== a'''), then calculate '''k = floor(i / a) == 2''' to know it's in the 3rd quadrant ('''k''' is in the set '''{0,1,2,3}''').
* That means it's in the top right. So we need to set '''d = a * k''', add our new '''s''' value to the running sum, discard the other quadrants, then subtract '''d''' from '''i''' and from the new BOIN's indexes
* The easy way to calculate the new part of '''s''' is that:
** top left quadrant is '''0'''
** top right quad is '''BOIN-width / 2'''
** bottom left is '''BOIN-width * (BOIN-height / 2)'''
** bottom right is '''(BOIN-width * (BOIN-height / 2)) + (BOIN-width / 2)'''

Now we have:

i == 0
0 2
1 3

You would repeat until we have a single pixel. Once we have the last pixel, our new twiddled index should be the running sum '''s''' (16 + 2 + 0 == 18)
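
The steps above can be sketched as a loop. This is one possible reading of them rather than tested reference code. The function name is hypothetical; it assumes the width is the smaller side and a power of two; and it deliberately uses the full image width as the row stride when stepping into a bottom quadrant, since a quadrant offset has to be measured in the coordinates of the whole image, not of the shrinking BOIN.

<syntaxhighlight lang="c">
/* Returns the position in the twiddled image of pixel i.
 * Assumes width <= height and that width is a power of two. */
unsigned int boin_twiddled_index(unsigned int i, unsigned int width) {
    unsigned int boin_area = width * width;
    unsigned int k = i / boin_area;  /* which BOIN the pixel is in */
    unsigned int s = k * boin_area;  /* running sum: where that BOIN starts */
    unsigned int size;

    i -= k * boin_area;              /* index relative to the chosen BOIN */

    /* Halve the BOIN until a single pixel is left. */
    for (size = width; size > 1; size /= 2) {
        unsigned int half = size / 2;
        unsigned int a = half * half; /* pixels per quadrant */

        k = i / a; /* 0 top-left, 1 bottom-left, 2 top-right, 3 bottom-right */
        if (k & 1u) {
            s += width * half; /* bottom half: move down 'half' rows */
        }
        if (k & 2u) {
            s += half;         /* right half: move right 'half' columns */
        }
        i -= a * k;            /* index relative to the chosen quadrant */
    }
    return s;
}
</syntaxhighlight>

For the "O" pixel, <code>boin_twiddled_index(24, 4)</code> picks the 2nd BOIN (s = 16, i = 8), then the top-right quadrant (s = 18, i = 0), then the top-left pixel, and returns 18. Because every divisor is a power of two, the divisions could be swapped for shifts as the disclaimer below suggests.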

== How and why the Dreamcast uses Twiddling ==

Twiddling is used on the Dreamcast for two major purposes, both owing to the same mechanism. Inside the Dreamcast, there exists a form of hardware compression in the PVR called [https://en.wikipedia.org/wiki/Vector_quantization#L135 Vector Quantization], or VQ for short. VQ works by taking an image and splitting it up into tiles made of 2x2 pixel patterns. Each pattern is stored in a special bit of memory on the Dreamcast known as the VQ Dictionary. The VQ dictionary contains enough space to hold 1024 of these 2x2 pixel patterns. The purpose of this tiling is to reduce the size in memory of the original textured image: the new texture, instead of containing RGB values for each texel, stores a single index value that references the VQ dictionary for every 4 texels. Like [https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch Lempel Ziv Welch compression], this is a dictionary-based scheme, although VQ is lossy.
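
To illustrate the idea of "one index per 2x2 block", here is a small software sketch that expands an index image back into texels. It is not the PVR's actual memory format: the struct layout, the 16-bit index type, the row-major texel order inside an entry and the scanline-ordered output are all assumptions made for the example, and twiddling is ignored entirely.

<syntaxhighlight lang="c">
#include <stddef.h>
#include <stdint.h>

/* One dictionary entry: a 2x2 block of 16-bit texels.
 * Order inside the block is assumed to be row-major:
 * [0] top-left, [1] top-right, [2] bottom-left, [3] bottom-right. */
typedef struct {
    uint16_t texel[4];
} vq_entry;

/* Expand (width/2) x (height/2) indexes into a width x height image
 * stored in scanline order. width and height must be even. */
void vq_expand(const vq_entry *dict, const uint16_t *indexes,
               uint16_t *out, size_t width, size_t height) {
    size_t blocks_per_row = width / 2;
    size_t bx, by;
    for (by = 0; by < height / 2; by++) {
        for (bx = 0; bx < blocks_per_row; bx++) {
            const vq_entry *e = &dict[indexes[by * blocks_per_row + bx]];
            size_t top = (2 * by) * width + 2 * bx; /* top-left texel of block */
            out[top] = e->texel[0];
            out[top + 1] = e->texel[1];
            out[top + width] = e->texel[2];
            out[top + width + 1] = e->texel[3];
        }
    }
}
</syntaxhighlight>

The saving comes from the index image holding one small value where the uncompressed texture would hold four full texels.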

One can use VQ to compress textures in two ways. Firstly, one can do basic image compression as described above. However, the Dreamcast has a secondary use for the same VQ hardware, which allows it to mimic dynamic palettes of older game consoles. In this case, the original textured image is treated as though it was scaled up, so each texel comprises a 2x2 pixel region. When this is done, the VQ dictionary entries become 2x2 pixel patterns of solid color. This effectively maps single texel colors in the original texture to VQ 2x2 solid color entries, like a palette. By changing the definition of a VQ dictionary 2x2 pixel pattern to a different solid color, you can alter every texel in the texture which references that color index. In this palette mode, the 1024 entries are broken up in one of two formats, which can be selected by the user. The first divides the palette up into sixty-four banks of 16 colors each, which makes the original texture behave like a [https://en.wikipedia.org/wiki/Color_depth#L137 4-bits per pixel] image. The other format divides the palette up into four 256 color banks, which is an 8bpp texture format.

Interestingly, it has been discovered that applying the twiddle operation three times in succession can effectively untwiddle a texture.

A last side benefit of Twiddling on the Dreamcast is that it provides the precise pixels needed to do a 2x2 area Anti-Aliasing filter in hardware at no extra cost.

== DISCLAIMER ==

This theorized solution has only been tested on a few examples by hand, so I might have missed something. But I believe at least the general logic of this is sound. Also note for implementation, some of the divisions could be replaced with bit-shifting since some of those numbers are guaranteed to be powers of 2.

For an example of an algorithm that does the reverse (Convert twiddled index to untwiddled), you can refer to [https://github.com/Protofall/Crayon-Utilities/blob/master/DtexToRGBA8888/DtexToRGBA8888.c#L146 this code made by JamoHTP]

Codespaces

2024-07-14T15:30:10Z

BBHoodsta: /* Example 1: Build an .elf from a KallistiOS example */

Github [https://www.youtube.com/watch?v=sYJ3CHtT6WM Codespaces] lets you spawn a complete Dreamcast development environment in your browser in a matter of minutes.

The only things you need are:
* a browser
* a github login.

No need for a complex installation process anymore !

== Steps Overview ==
The main steps to get a Codespace working, are:
* Log in to [https://github.com github]
* Create your code repository, or fork one
* Add a .devcontainer/devcontainer.json file to that repository
* Create & launch your codespace, and enjoy the IDE in your browser !

That's all there is to it.

Free github accounts get [https://docs.github.com/en/billing/managing-billing-for-github-codespaces/about-billing-for-github-codespaces#monthly-included-storage-and-core-hours-for-personal-accounts 120 free core hours per month].

== Example 1: Build an .elf from a KallistiOS example ==
To compile the executable .elf file from a KallistiOS example in a Codespace:
* Log in to [https://github.com github]
* Go to [https://github.com/KallistiOS/KallistiOS the KallistiOS repository]
* Click on the "Fork" button, this will create a KallistiOS repository inside your account
* Click on the "Add File" button, then "Create New File"
* Name the file: ".devcontainer/devcontainer.json", and paste the following contents:
<syntaxhighlight lang="json">
// For format details, see https://aka.ms/devcontainer.json.
// For config options, see the README at: https://github.com/devcontainers/templates/tree/main/src/alpine
{
"name": "My_Codespace",

// Either use a pre-built image (= a Docker container)...
"image": "ghcr.io/kos-builds/kos-ports-dc:sha-656a397-14.1.0",
// ... or use a Dockerfile or Docker Compose file. More info: https://containers.dev/guide/dockerfile
//"build": { // Path is relative to the devcontainer.json file.
// "dockerfile": "Dockerfile"
//},

// Features to add to the dev container. More info: https://containers.dev/features.
// "features": {},

// Use 'forwardPorts' to make a list of ports inside the container available locally.
// "forwardPorts": [],

// Use 'postCreateCommand' to run commands after the container is created.
//"postCreateCommand": "source /opt/toolchains/dc/kos/environ.sh",

// Configure tool-specific properties.
"customizations": {
"vscode": {
"extensions": [
"ms-vscode.cpptools"
]
}
}

// Uncomment to connect as root instead. More info: https://aka.ms/dev-containers-non-root.
// "remoteUser": "root"
}
</syntaxhighlight>
* Click on "Commit changes", then "Commit Changes" again to save the file
* Go back to the root directory of your repository
* Launch your Codespace by clicking on the "<> Code" button, then "Codespaces" - "Create codespace on master".
[[File:Screen Shot 2024-07-14 at 8.28.13 AM.png|thumb]]
* This will launch Visual Studio Code in your browser. The first time it will take a couple of minutes to launch, after that it will be faster.
* 3-bars-Menu at the top left - Terminal - New Terminal
* cd examples/dreamcast/2ndmix
* make clean
* make
* You should now have a "2ndmix.elf" in that folder
* Navigate to that folder (examples/dreamcast/2ndmix) on the file tree on the left, right-click on the file, and choose "Download..."
* Congratulations ! You successfully built an executable file for the Dreamcast. You can now upload that file in your favorite emulator, or send it to a real Dreamcast via a [https://dreamcast.wiki/Coder%27s_cable Coder's Cable] or a [https://dreamcast.wiki/Broadband_adapter Broadband Adapter]

== Example 2: create a .cdi from the .elf of Example 1 ==
Having an .elf executable file is nice for small tests, but often you'll find yourself needing to build a .cdi disc image file:
* If you closed your codespace, you can reopen it by going to your code repository, click on the "<> Code" button, then "Codespaces", then on the auto-generated name of your codespace.
* Since our codespace does not contain mkdcdisc (the tool to build .cdi files), we'll add that to our codespace:
** Open a terminal in your codespace (3-bars-Menu at the top left - Terminal - New Terminal)
** cd /opt/toolchains/dc
** git clone https://gitlab.com/simulant/mkdcdisc
** cd mkdcdisc
** meson setup builddir
** meson compile -C builddir
** cp ./builddir/mkdcdisc /opt/toolchains/dc/bin
* Build the .cdi file for 2ndmix.elf
** cd /opt/toolchains/dc/kos/examples/dreamcast/2ndmix
** mkdcdisc -e 2ndmix.elf -o 2ndmix.cdi -n "2ndmix"
* Compress the .cdi file into a zip file with parts of max 25 MegaBytes (otherwise your browser will have problems downloading the .cdi):
** zip -s 25M 2ndmix.zip 2ndmix.cdi
* Right-click on the generated files, and download them into your local folders
* Unzip the files in your local folder to reconstruct 2ndmix.cdi
* Launch 2ndmix.cdi in your favorite emulator, or on a real Dreamcast

== Example 3: Configuring a more complex Codespace ==
If you find yourself always adding the same extra application into the codespace provided in Example 1 (eg: always having to add mkdcdisc, ...), you can simplify your setup by specifying your own Dockerfile, and add setup commands in there:

* Modify .devcontainer/devcontainer.json so that it points to a Dockerfile:
<syntaxhighlight lang="json">
// For format details, see https://aka.ms/devcontainer.json.
// For config options, see the README at: https://github.com/devcontainers/templates/tree/main/src/alpine
{
"name": "My_Codespace",

// Either use a pre-built image (= a Docker container)...
//"image": "ghcr.io/kos-builds/kos-ports-dc:sha-656a397-14.1.0",
// ... or use a Dockerfile or Docker Compose file. More info: https://containers.dev/guide/dockerfile
"build": { // Path is relative to the devcontainer.json file.
"dockerfile": "Dockerfile"
},

// Features to add to the dev container. More info: https://containers.dev/features.
// "features": {},

// Use 'forwardPorts' to make a list of ports inside the container available locally.
// "forwardPorts": [],

// Use 'postCreateCommand' to run commands after the container is created.
//"postCreateCommand": "source /opt/toolchains/dc/kos/environ.sh",

// Configure tool-specific properties.
"customizations": {
"vscode": {
"extensions": [
"ms-vscode.cpptools"
]
}
}

// Uncomment to connect as root instead. More info: https://aka.ms/dev-containers-non-root.
// "remoteUser": "root"
}
</syntaxhighlight>

* ... and add a new file: .devcontainer/Dockerfile
<syntaxhighlight lang="bash">
# Stage 1
FROM "ghcr.io/kos-builds/kos-ports-dc:sha-656a397-14.1.0" as build

# Add mkdcdisc
RUN git clone https://gitlab.com/simulant/mkdcdisc /opt/toolchains/dc/mkdcdisc \
&& cd /opt/toolchains/dc/mkdcdisc \
&& meson setup builddir \
&& meson compile -C builddir \
&& cp ./builddir/mkdcdisc /opt/toolchains/dc/bin
</syntaxhighlight>
* The next time you create a repository & copy the above 2 files in it, you will get a codespace with mkdcdisc already correctly installed !

== Example 4: Build something not from KallistiOS ==
All the previous examples started by forking the KallistiOS repository, but most often you'll just want to work on your own code / another Dreamcast project, so you'll work from a repository other than KallistiOS:
* log in to [https://www.github.com github]
* create/go to your own repository, or fork an existing project (eg doom64-dc, ...)
* add at least .devcontainer/devcontainer.json, and .devcontainer/Dockerfile if needed
* create/launch your codespace
* follow the build instructions of that project
* Notes:
** Everything for KallistiOS should be available in the terminal, you'll find KallistiOS stuff in the folder /opt/toolchains/dc
** In the file-tree on the left side, you'll see the files of your repository (folder: /workspaces). If you also want to add other folders there, eg /opt/toolchains/dc, execute something like this in the terminal: code -a /opt/toolchains/dc

== Tips ==
* When you're finished with your Codespace, go to
** the 3-bars-Menu at the top left - "My Codespaces"
** on the left, select the codespace you were just running
** click on the 3 dots next to "Active"
** select "Stop codespace"
Doing this pro-actively will save you some free minutes, since the default timeout is 30 minutes.

File:Screen Shot 2024-07-14 at 8.28.13 AM.png

2024-07-14T15:29:30Z

BBHoodsta:

Screenshot of Codespace button

Building KOS on MinGW-w64/MSYS2

2023-09-19T18:41:05Z

BBHoodsta: /* Toolchain (cross-compiler and libraries) */

==Overview==

This tutorial is a step-by-step guide on how to set up a toolchain and KOS environment on your Windows system.

The toolchain consists of a C/C++ compiler (GCC), assembler and linker (binutils), and C library (newlib). As the Dreamcast has two processors - the SH4 CPU and the AICA (ARM) sound processor - the toolchain includes compilers for both.

KOS consists of the operating system core (kos) and a set of nicely integrated libraries (kos-ports).

Since KOS was developed for Unix-compatible systems (like Linux, BSD, etc.), a Unix-compatible development environment must be installed. The available choices are Cygwin, MSYS and MSYS2. MSYS is unmaintained and out-dated. Cygwin and MSYS2 both work, but MSYS2 seems to be maintained more actively, work better and also offers a better package management system, so it is preferred.

==Preparations==
Install MSYS2 from http://repo.msys2.org/distrib/i686 (this tutorial used http://repo.msys2.org/distrib/i686/msys2-i686-20160205.exe).

''Please make sure to use partition C:\''. A user reported that git, wget, etc. did not work at all after installing to partition D:\. The MSYS2 website mentions that FAT filesystems don't work, so that's an alternative explanation.

As the setup completes, it will ask whether you want to open a shell. Don't. Open ''C:\msys32\mingw32_shell.bat'' instead (mingw shell instead of msys2 shell).

==Install script==
At this point, please consider trying the install script first. It will perform the remaining steps below automatically.

Download the install script: [[File:Kos_setup_script.zip]].
Then change to the directory of the script and execute it (uses Unix paths instead of Windows paths, ''C:\'' becomes ''/c/'')
$ cd /c/Documents\ and\ Settings (''find your Download folder here..'')
$ sh kos_setup.sh

The script should perform all the remaining steps. If something goes wrong, you can try to continue the steps manually or ask for help on the forums/IRC.

==Install required packages==
MSYS2 uses the ''pacman'' package manager. The following command should download all required programs.

$ pacman -Sy --needed mingw-w64-i686-binutils mingw-w64-i686-gcc mingw-w64-i686-pkg-config mingw-w64-i686-libpng mingw-w64-i686-libjpeg-turbo diffutils git make subversion patch python tar texinfo wget

==Downloading KOS==

KOS is available through a Git repository at SourceForge.
The standard install directory assumed in the configuration files is /opt/toolchains/dc/{kos, kos-ports}.

$ git clone git://git.code.sf.net/p/cadcdev/kallistios /opt/toolchains/dc/kos

==Toolchain (cross-compiler and libraries)==
After cloning the KOS repository, navigate to the dc-chain directory:

$ cd /opt/toolchains/dc/kos/utils/dc-chain

Then compile the cross-compiler and system libraries.
The erase=1 option deletes temporary files after a successful build.

$ make erase=1

After this command completes successfully you have a working cross-compiler for Dreamcast and can compile KOS next.

==Setting up KOS==

You should read the documentation in the kos/doc directory for details, but here are the basic steps required to set up the KOS environment:

Go into the kos directory and copy the template configuration:

$ cp /opt/toolchains/dc/kos/doc/environ.sh.sample /opt/toolchains/dc/kos/environ.sh

Now edit environ.sh to match your installation. If you use the default installation directory you don't need to change anything.

Execute the following command to set the KOS environment variables:

$ source /opt/toolchains/dc/kos/environ.sh

Remember to do this every time you want to use the KOS environment in a newly opened shell.
Don't forget to run the above command again after editing environ.sh.
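
If you are unsure whether the current shell already has the environment loaded, a small check like the one below can tell you. It is a sketch that assumes environ.sh exports KOS_BASE pointing at the kos directory; <code>kos_env_ready</code> is a name invented for this example.

```shell
# Succeed only if the KOS environment appears to be loaded in this shell.
# Assumption: environ.sh exports KOS_BASE, pointing at the kos directory.
kos_env_ready() {
    [ -n "${KOS_BASE:-}" ] && [ -d "$KOS_BASE" ]
}

if kos_env_ready; then
    echo "KOS environment loaded: $KOS_BASE"
else
    echo "KOS environment not loaded; run: source /opt/toolchains/dc/kos/environ.sh"
fi
```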

Now we are finally ready to compile KOS itself. In the kos directory, run:

$ cd /opt/toolchains/dc/kos
$ make

==KOS-Ports==
KOS-Ports is a repository with commonly used libraries for development on the DC, like PNG or MP3 loading.

Clone the repository:

$ git clone git://git.code.sf.net/p/cadcdev/kos-ports /opt/toolchains/dc/kos-ports

Compile all KOS-ports using the build-all script:

$ sh /opt/toolchains/dc/kos-ports/utils/build-all.sh

Now you should have a working Dreamcast development environment :-)

Check out the examples in the KallistiOS directory to find out how to use KOS in your own projects!

Building KOS on MinGW-w64/MSYS2

2023-09-19T18:39:20Z

BBHoodsta: /* Toolchain (cross-compiler and libraries) */

==Overview==

This tutorial is a step-by-step guide on how to set up a toolchain and KOS environment on your Windows system.

The toolchain consists of a C/C++ compiler (GCC), assembler and linker (binutils), and C library (newlib). As the Dreamcast has two processors - the SH4 CPU and the AICA (ARM) sound processor - the toolchain includes compilers for both.

KOS consists of the operating system core (kos) and a set of nicely integrated libraries (kos-ports).

Since KOS was developed for Unix-compatible systems (like Linux, BSD, etc.), a Unix-compatible development environment must be installed. The available choices are Cygwin, MSYS and MSYS2. MSYS is unmaintained and outdated. Cygwin and MSYS2 both work, but MSYS2 seems to be maintained more actively, to work better, and to offer a better package management system, so it is preferred.

==Preparations==
Install MSYS2 from http://repo.msys2.org/distrib/i686 (this tutorial used http://repo.msys2.org/distrib/i686/msys2-i686-20160205.exe).

''Please make sure to use partition C:\''. A user reported that git, wget, etc. did not work at all after installing to partition D:\. The MSYS2 website mentions that FAT filesystems are not supported, which may be the actual cause.

As the setup completes, it will ask whether you want to open a shell. Don't. Open ''C:\msys32\mingw32_shell.bat'' instead (mingw shell instead of msys2 shell).

==Install script==
At this point, please consider trying the install script first. It will perform the remaining steps below automatically.

Download the install script: [[File:Kos_setup_script.zip]].
Then change to the directory of the script and execute it (the shell uses Unix paths instead of Windows paths, so ''C:\'' becomes ''/c/''):
$ cd /c/Documents\ and\ Settings (''find your Download folder here..'')
$ sh kos_setup.sh

The script should perform all the remaining steps. If something goes wrong, you can try to continue the steps manually or ask for help on the forums/IRC.

==Install required packages==
MSYS2 uses the ''pacman'' package manager. The following command should download all required programs.

$ pacman -Sy --needed mingw-w64-i686-binutils mingw-w64-i686-gcc mingw-w64-i686-pkg-config mingw-w64-i686-libpng mingw-w64-i686-libjpeg-turbo diffutils git make subversion patch python tar texinfo wget

==Downloading KOS==

KOS is available through a Git repository at SourceForge.
The standard install directory assumed in the configuration files is /opt/toolchains/dc/{kos, kos-ports}.

$ git clone git://git.code.sf.net/p/cadcdev/kallistios /opt/toolchains/dc/kos

==Toolchain (cross-compiler and libraries)==
After cloning the KOS repository, run the toolchain download+unpack+compile scripts:

$ cd /opt/toolchains/dc/kos/utils/dc-chain

Then compile the cross-compiler and system libraries.
The erase=1 option deletes temporary files after a successful build.

$ make erase=1

After this command completes successfully you have a working cross-compiler for Dreamcast and can compile KOS next.

==Setting up KOS==

You should read the documentation in the kos/doc directory for details, but here are the basic steps required to set up the KOS environment:

Go into the kos directory and copy the template configuration:

$ cp /opt/toolchains/dc/kos/doc/environ.sh.sample /opt/toolchains/dc/kos/environ.sh

Now edit environ.sh to match your installation. If you use the default installation directory you don't need to change anything.

Execute the following command to set the KOS environment variables:

$ source /opt/toolchains/dc/kos/environ.sh

Remember to do this every time you want to use the KOS environment in a newly opened shell.
Don't forget to run the above command again after editing environ.sh.

Now we are finally ready to compile KOS itself. In the kos directory, run:

$ cd /opt/toolchains/dc/kos
$ make

==KOS-Ports==
KOS-Ports is a repository with commonly used libraries for development on the DC, like PNG or MP3 loading.

Clone the repository:

$ git clone git://git.code.sf.net/p/cadcdev/kos-ports /opt/toolchains/dc/kos-ports

Compile all KOS-ports using the build-all script:

$ sh /opt/toolchains/dc/kos-ports/utils/build-all.sh

Now you should have a working Dreamcast development environment :-)

Check out the examples in the KallistiOS directory to find out how to use KOS in your own projects!

Getting Started with Dreamcast development

2023-09-17T15:18:02Z

BBHoodsta: /* Configuring and compiling KOS and kos-ports */

<div style="float:right;">__TOC__</div>

===This article is actively being worked on===
''Work in progress - items to be added and/or edited'': Setting up debug link, building and burning a CD-R, setting up a first project, setting up an IDE, etc.

=Introduction=
This article will cover the entire beginning process: starting from zero to having a working dev environment with debug link (serial or IP) and self-booting CD-R. This guide will cover the process for the following platforms:
* '''Microsoft Windows 10''' via [https://learn.microsoft.com/en-us/windows/wsl/about Windows Subsystem for Linux]
** Users desiring a native Windows approach, see [[DreamSDK]] instead
* '''macOS''' on Intel or Apple Silicon systems with the [https://brew.sh/ Homebrew] package manager installed
* '''Linux'''-based systems
** '''Debian'''- and '''Ubuntu'''-based distributions using the default '''apt''' package manager
** '''Fedora'''-based distributions using the default '''dnf''' package manager
** '''Arch'''-based distributions using the default '''pacman''' package manager
** '''Alpine'''-based distributions using the default '''apk''' package manager

===Need help?===
Important note: ''This guide aims to remain up to date and work on all of the above platforms, but keeping instructions for such a variety of platforms up-to-date can be difficult. If you run into any errors or other challenges while following this tutorial, or simply need clarification on any of the steps, feel free to ask for assistance on the [https://dcemulation.org/phpBB/viewforum.php?f=29 message board] and we would be happy to aid you and update the guide for the benefit of future readers and others in the community.''

===Terms===
Before we get started, let's define several terms:

The '''toolchain''' is a set of programs which turns your code into an executable file for your Dreamcast console. The toolchain includes:
* '''GCC''', a C/C++/Objective-C compiler (with Rust support coming soon)
* '''binutils''', an assembler and linker
* '''newlib''', a C library
* '''gdb''', a debugger
The toolchain includes a compiler for the Dreamcast's main SH4 CPU, and optionally a compiler for the ARM-based AICA sound processor. Your operating system may already have versions of these programs installed to compile code for your computer, but we will need to build a "cross-compiler" for compiling specifically for the Dreamcast.

'''KallistiOS''' or ''KOS'' is an open source development library and pseudo-operating system for the Dreamcast console. It is the best documented and most widely used development kit in the homebrew community. KallistiOS's very flexible license allows both homebrew and commercial use with no restrictions other than a requirement to include credit for its use in your project, and indeed almost all commercially sold indie Dreamcast titles use it. There are others in existence, like [[libronin]] and [[libdream]], as well as the older development kits [[Katana]] and [[Windows CE]] created by Sega and Microsoft for use in retail games, but this guide will only cover the setup and use of KallistiOS.

'''kos-ports''' is a repository including various libraries which integrate with KallistiOS. We will download and compile these libraries as well.

The '''debug link''' is a generic term referring to a hardware accessory to facilitate quickly running and debugging your programs. IP-based links include the Dreamcast's '''[[Broadband adapter]]''' and '''[[LAN adapter]]''' accessories, and serial-based links include the [[Coder's cable]], which is a cable that can connect the Dreamcast's serial port to your computer via USB or serial. This guide includes instructions for setting up and using the Broadband adapter and a USB-based coder's cable.

'''dc-tool''' and '''dcload''' are a pair of programs to facilitate using a debug link. ''dc-tool'' runs on your computer and links to a Dreamcast running ''dcload-ip'' or ''dcload-serial''. With this setup, you can quickly load programs, read console feedback, load assets, transfer data, redirect I/O, handle exceptions, debug problems, and so forth.

=Choosing a debug link solution=
If you are building the toolchain for the purpose of building existing programs from source with little to no modifications, then a debug link setup might not be necessary for you. You may simply build programs to burn directly to CD-R. However, if you are planning to actively develop for the Dreamcast, then a debug link is a critical component. While Dreamcast emulators are mature and accurate enough to play the vast majority of the system's games library without issue, many critical bugs may show up on a real Dreamcast system, but not on a Dreamcast emulator. Therefore, it is highly recommended to test on a real system as much as possible. It's also possible to load software off of a [[Serial SD card adapter]], but without an active link to a computer, debugging and stepping through programs as they execute is significantly more challenging.

Presented below is a table comparing the different options available for a debug link. Due to the cost, potential buyers may want to factor in the ability to play multiplayer games with their purchase. Thus, for comparison, we have included information about the [[Modem]] with [[DreamPi]] as well, but understand that the Modem with DreamPi cannot be used as a debug link.

{| class="wikitable"
!colspan="6" |Comparison of various Dreamcast connectivity options
|-
|style="background-color:#c0c0c0;" width="150" | Device:
|style="background-color:#d0d0d0;" width="400" | [[Broadband adapter]] (HIT-400 or HIT-401) <br />Realtek RTL8139C chipset
|style="background-color:#d0d0d0;" width="400" | [[LAN adapter]] (HIT-300) <br />Fujitsu MB86967 chipset
|style="background-color:#d0d0d0;" width="400" | [[Modem]] with [[DreamPi]]
|style="background-color:#d0d0d0;" width="400" | USB [[Coder's cable]]
|style="background-color:#d0d0d0;" width="400" | Serial [[Coder's cable]]
|-
|style="background-color:#d0d0d0;" | Useful for dev? || Yes, supports dcload-ip || Yes, supports dcload-ip,<br/>but BBA is superior and cheaper || No, only useful for online multiplayer gaming || Yes, supports dcload-serial || Yes, supports dcload-serial
|-
|style="background-color:#d0d0d0;" | Cost || $100 - $200 and up on used markets || $200 and up on used markets,<br/>due to extreme rarity || Kit prices vary, around $100 || Varies on used markets, uncommonly sold<br />RetroOnyx sells for $85 || Varies on used markets, uncommonly sold
|-
|style="background-color:#d0d0d0;" | Can make DIY? || No || No || Yes || Yes || Yes
|-
|style="background-color:#d0d0d0;" | Performance || Up to 100 megabits/s || Up to 10 megabits/s || Up to 56 kilobits/s || Up to 1500 kilobits/s || Up to 120 kilobits/s
|-
|style="background-color:#d0d0d0;" | Games support || Some games: Phantasy Star Online, Quake III Arena, Toy Racer, POD SpeedZone, Propeller Arena, Unreal Tournament<br />Some browsers: Broadband Passport, PlanetWeb 3.0 || No games<br />One browser: Dream Passport for LAN || All multiplayer games with network support<br />All web browsers || NO multiplayer games support || NO multiplayer games support
|-
|style="background-color:#d0d0d0;" | Homebrew support || Homebrew utilities like dcload-ip || Homebrew utilities like dcload-ip || Homebrew utilities don't support, only multiplayer games || Homebrew utilities like dcload-serial || Homebrew utilities like dcload-serial
|}

=Setting up and compiling the toolchain with the dc-chain script=
===Dependencies===
First, we'll need to install dependencies before building the toolchain. Below we have provided commands to install these dependencies on various systems. Many of the packages will likely already be installed on your system, but we have provided an exhaustive list for good measure.
====macOS 13 Ventura on an Intel or Apple Silicon processor====
First, make sure you install Apple Xcode, including the Command Line Tools. You will also need several other packages; the command below assumes you have installed the [https://brew.sh/ Homebrew] package manager on your system.
brew install wget gettext texinfo gmp mpfr libmpc libelf jpeg-turbo libpng meson libisofs

''Important Note for Apple Silicon users'': On Apple Silicon, Homebrew installs libraries to a path not included by default by the compiler. If you haven't added these to your '''~/.zprofile''', then add the following lines now and reload your session (or run them in your Terminal session whenever you compile KOS):
export CPATH=/opt/homebrew/include
export LIBRARY_PATH=/opt/homebrew/lib
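
If you share one '''~/.zprofile''' between Intel and Apple Silicon Macs, the two exports above can be made conditional. The sketch below relies on Homebrew's default prefixes (/opt/homebrew on Apple Silicon, /usr/local on Intel); <code>homebrew_default_prefix</code> is a hypothetical helper, and a custom Homebrew install location would need <code>brew --prefix</code> instead.

```shell
# Map an OS name and CPU architecture (as printed by `uname -s` and
# `uname -m`) to Homebrew's default install prefix. Prints nothing for
# systems where this guide does not use Homebrew.
homebrew_default_prefix() {
    case "$1/$2" in
        Darwin/arm64)  echo /opt/homebrew ;;
        Darwin/x86_64) echo /usr/local ;;
    esac
}

prefix=$(homebrew_default_prefix "$(uname -s)" "$(uname -m)")
# Only Apple Silicon needs the extra search paths described above.
if [ "$prefix" = /opt/homebrew ]; then
    export CPATH="$prefix/include"
    export LIBRARY_PATH="$prefix/lib"
fi
```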

====Debian/Ubuntu-based Linux====
sudo apt-get update
sudo apt install gawk patch bzip2 tar make libgmp-dev libmpfr-dev libmpc-dev gettext wget libelf-dev texinfo bison flex sed git build-essential diffutils curl libjpeg-dev libpng-dev python3 pkg-config libisofs-dev meson ninja-build

====Fedora-based Linux====
sudo dnf install gawk patch bzip2 tar make gmp-devel mpfr-devel libmpc-devel gettext wget elfutils-libelf-devel texinfo bison flex sed git diffutils curl libjpeg-turbo-devel libpng-devel gcc-c++ python3 meson ninja-build

====Arch-based Linux====
sudo pacman -S --needed gawk patch bzip2 tar make gmp mpfr libmpc gettext wget libelf texinfo bison flex sed git diffutils curl libjpeg-turbo libpng python3 meson

====Alpine-based Linux====
sudo apk --update add build-base patch bash texinfo gmp-dev libjpeg-turbo-dev libpng-dev elfutils-dev curl wget python3 git

====Other Linux distributions====
If you're using a Linux- or Unix-based system other than those listed above, you may need to consult your distribution's package database and package manager documentation for the equivalent package names and commands.

===Creating a space for your toolchain installation===
Create the path where we'll install the toolchain and KOS, and grant it the proper permissions:
sudo mkdir -p /opt/toolchains/dc
sudo chmod -R 755 /opt/toolchains/dc
sudo chown -R $(id -u):$(id -g) /opt/toolchains/dc
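
Before moving on, it may be worth confirming that the directory really is writable by your user, since the later steps do not use sudo. A minimal sketch (the helper name <code>dir_is_writable</code> is invented here):

```shell
# Succeed if the given directory exists and the current user can write to it.
dir_is_writable() {
    [ -d "$1" ] && [ -w "$1" ]
}

target=/opt/toolchains/dc
if dir_is_writable "$target"; then
    echo "$target is ready"
else
    echo "$target is missing or not writable by $(id -un)"
fi
```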
===Cloning the KOS git repository===
Clone the KOS git repository to your system:
git clone https://github.com/KallistiOS/KallistiOS.git /opt/toolchains/dc/kos

===Configuring the dc-chain script===
Enter the dc-chain directory:
cd /opt/toolchains/dc/kos/utils/dc-chain

We will choose the default '''stable''' configuration for the toolchain, which currently uses GCC 13.2. For advanced users, other configurations are available to you; read the <code>README.md</code> file in the dc-chain directory for more information if you are interested.
cp config/config.mk.stable.sample config.mk

Now, you may configure config.mk options to your liking by using a text editor. You may alter the <code>makeopts</code> parameter to the number of threads available on your CPU to speed up the compilation, if desired. However, if you run into errors during compilation, you may want to set <code>makeopts=-j1</code>, as on some operating systems the toolchain may fail to build with a higher setting.
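
To find a sensible value for <code>makeopts</code>, you can ask the system how many CPU threads are online. The sketch below uses <code>getconf _NPROCESSORS_ONLN</code>, which is available on Linux and macOS, and falls back to 1 if the query fails; <code>cpu_threads</code> is a hypothetical helper name.

```shell
# Print the number of online CPU threads, falling back to 1 if unknown.
cpu_threads() {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=""
    case "$n" in
        ''|0|*[!0-9]*) n=1 ;;   # empty, zero, or non-numeric: use 1
    esac
    printf '%s\n' "$n"
}

echo "Suggested setting for config.mk: makeopts=-j$(cpu_threads)"
```

The printed value is only a suggestion; as noted above, dropping back to <code>-j1</code> is the safer choice if the build fails.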

===Downloading and compiling the toolchain===
Now we will run a script to download files and compile the toolchain. At this point, we have the option of building both the main CPU SH4 compiler and the AICA sound processor ARM compiler, or we can skip the ARM compiler and just build the SH4 compiler. Thankfully, KallistiOS includes a prebuilt sound driver, so the ARM compiler is only necessary if you want to make changes to the sound driver or write custom code to run on the sound processor.
To build '''only the SH4 compiler''':
make build-sh4
To build '''both''' the SH4 and the ARM compilers:
make
This will download and unpack the relevant necessary files and then begin the compilation process. The compilation can take anywhere from minutes to a few hours depending on your CPU and number of threads available. When successfully finished, the toolchains will be ready.
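
A quick way to check that the build produced a usable compiler is to look for <code>sh-elf-gcc</code> and ask it for its version. This sketch assumes the default install prefix /opt/toolchains/dc/sh-elf; if you changed the paths in config.mk, adjust <code>sh_prefix</code> accordingly. The helper name <code>has_sh4_compiler</code> is made up for this example.

```shell
# Succeed if an executable sh-elf-gcc exists under the given toolchain prefix.
has_sh4_compiler() {
    [ -x "$1/bin/sh-elf-gcc" ]
}

# Assumed default install prefix; adjust if config.mk was changed.
sh_prefix=/opt/toolchains/dc/sh-elf
if has_sh4_compiler "$sh_prefix"; then
    "$sh_prefix/bin/sh-elf-gcc" --version | head -n 1
else
    echo "sh-elf-gcc not found under $sh_prefix"
fi
```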

Afterwards, if desired, you may also compile the GNU Debugger (gdb):
make gdb

The GNU Debugger is now installed along with your toolchains.

===Cleaning up temporary files===
After building everything, you can clean up the extraneous files in your dc-chain directory by entering:
make clean

=Configuring and compiling KOS and kos-ports=
===Setting up the environment settings===
Enter the KOS directory:
cd /opt/toolchains/dc/kos
Copy the pre-made environment script into place:
cp doc/environ.sh.sample environ.sh
For most users, the default settings will suffice. However, advanced users may edit environ.sh to change compile flags or alter paths. If you'd like to have multiple KOS versions or multiple toolchain versions installed, you can set up a separate environ.sh file for each configuration by altering the paths. Run the source command on the desired environ.sh file to select that configuration prior to compiling your project.

You will need to run the source command to apply the KOS environment settings to your currently running shell. Run the following now, '''and''' ''whenever'' you open a new shell to work on Dreamcast projects:
source /opt/toolchains/dc/kos/environ.sh
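
If you only use a single KOS configuration, you may prefer to have your shell load it automatically. The sketch below appends the source line to a startup file only when it is not already there; <code>add_kos_source_line</code> is a hypothetical helper, and the choice of startup file (for example ~/.bashrc) depends on your shell.

```shell
# Append the "source environ.sh" line to a shell startup file, unless that
# exact line is already present. Arguments: startup file, path to environ.sh.
add_kos_source_line() {
    rc_file=$1
    line="source $2"
    touch "$rc_file"
    grep -qxF "$line" "$rc_file" || printf '%s\n' "$line" >> "$rc_file"
}

# Example call, commented out so that nothing is modified by accident:
# add_kos_source_line "$HOME/.bashrc" /opt/toolchains/dc/kos/environ.sh
```

If you switch between several environ.sh files, skip this and keep sourcing the one you need by hand.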

===Building KOS===
Build KOS:
make
KOS is now built.
===Building kos-ports===
Clone the kos-ports repository to your system:
git clone --recursive https://github.com/KallistiOS/kos-ports /opt/toolchains/dc/kos-ports
Run the script to build all of the included ports:
/opt/toolchains/dc/kos-ports/utils/build-all.sh
kos-ports is now built.

===Building the KOS examples===
Enter the KOS examples directory:
cd /opt/toolchains/dc/kos/examples/dreamcast
Build the examples:
make
All of the example programs provided with KallistiOS are now built.

=Running an example program through a debug link=
'''TODO''': ''Give a tutorial on compiling dcload/dc-tool, setting up a serial, USB, or IP debug link, and running 2ndmix demo on real hardware.''

Download and burn the [[:File:Dcload-2023-06-22.zip|latest versions of dcload-ip or dcload-serial]] -- the IP version includes improved DHCP support, so there is no longer a need to configure things beforehand.

Run one of the examples from the <code>kos/examples/dreamcast</code> directory with the following command:
dc-tool-ip -t <dreamcast IP address> -x example.elf
Run <code>dc-tool-ip</code> without any parameters to get additional options.

=Burning an example program to CD-R=
'''TODO''': ''Explain how to build mkdcdisc and write 2ndmix demo to CD-R and run on a Dreamcast console''

[https://gitlab.com/simulant/mkdcdisc mkdcdisc] can be used to easily generate a burnable self-boot CDI image.
Build <code>mkdcdisc</code>:
git clone https://gitlab.com/simulant/mkdcdisc.git
cd mkdcdisc
meson setup builddir
meson compile -C builddir
./builddir/mkdcdisc -h
and create a CDI image from your compiled ELF like so:
mkdcdisc -e MyProgram.elf -o MyProgram.cdi
Then you can burn the CDI file using DiscJuggler (Windows-only, but also works through [https://www.winehq.org/ WINE]), ImgBurn with the CDI plugin, or the cdiburn *nix script floating around out there. (document this better)
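
When building images for several programs, the image-creation command above can be wrapped so that the CDI name follows the ELF name. This is a sketch only: <code>cdi_name_for</code> is an invented helper, and the command is printed rather than executed so that it has no side effects.

```shell
# Derive the CDI image name from an ELF file name (MyProgram.elf -> MyProgram.cdi).
cdi_name_for() {
    printf '%s\n' "${1%.elf}.cdi"
}

elf=MyProgram.elf
# Print the command instead of running it; remove "echo" to actually build.
echo mkdcdisc -e "$elf" -o "$(cdi_name_for "$elf")"
```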

=Creating your first project=
'''TODO''': ''Explain how to create a new DC project folder with Makefile, adding an external library, create a basic program, and compile it''

=Further reading=
'''TODO''': Links to articles for using gdb, integrating the dev setup with an IDE, etc.

e.g.
* [[Qt Creator Dreamcast Development Environment]]
* [[CLion Debugging]]
* [[Visual Studio Code]]

Development

2023-09-06T14:10:41Z

BBHoodsta: /* General */

=== Getting started ===
* [[Getting Started with Dreamcast development]] -- start here!
====Ready-to-use environments====
* [[Docker images]]
* [[DreamSDK]] (Windows only)

====[[KallistiOS]]====
* Building on Linux, macOS, Windows Subsystem for Linux
** see [[Getting Started with Dreamcast development|''Getting Started with Dreamcast development'']]
* [[Building KOS on Cygwin]]
* [[Building KOS on MinGW/MSYS]]
* [[Building KOS on MinGW-w64/MSYS2]]
* [https://kos-docs.dreamcast.wiki/ KallistiOS Doxygen documentation]

====Other====
* [[Using Ruby for Sega Dreamcast development]] (experimental)

=== Build & test ===
* [[Building your project]]
* [[Emulators]]
* [[Broadband adapter]] / [[LAN adapter]]
** [[Using dcload-ip with Linux]]
** [[Using dcload-ip with Windows Subsystem for Linux|Using dcload-ip with Windows 10]] (via Windows Subsystem for Linux)
* [[Coder's cable]]

=== Environments and IDEs ===
* [[Qt Creator Dreamcast Development Environment]]
* [[CLion Debugging]]
* [[Visual Studio Code]]

=== Tools & utilities ===
* [[Debugging throught GNU Debugger (GDB) and dcload/dc-tool|Debugging through GNU Debugger (GDB) and dcload/dc-tool]]
* [[Using dcprof]]

=== Releasing your project ===
* Plain files
* Disc image
* Selfboot Inducer package

=== Engines ===
* [[Simulant]]
** [[Windows WSL2 Setup]]
** [[Generate profiling data]]

=== General ===
* [[Filesystem]]
* [[Store Queues]]
* [[Romdisk Swapping]]
* [https://mc.pp.se/dc/hw.html Marcus Comstedt's Dreamcast Hardware Reference]

=== Graphics ===
* [[Texture Formats]]
* [[Graphics APIs]]
* [[Paletted Textures]]
* [[2D Rendering Without PVR]]
* [[Twiddling]]

* PVR
** [[PowerVR Introduction]]
** [[PVR Spritesheets]]
* [[GLdc]]
** [[Drawing 2D sprites using GLdc]]
** [[Drawing 3D shapes using GLdc]]
** [https://hkowsoftware.com/articles/gldc-vertex-formats-from-vec3f-to-fastpath-to-map_buffer/ GLdc Vertex Formats: From vec3f to fastpath to map_buffer]
* Others
** [http://www.numechanix.com/blog/index.php/2015/10/03/20/ Procedural texture]
** [[Notes on fillrate and drawing large textures]]
** [[KMG Textures]]
** [[Loading PNG images as OpenGL textures]]

=== Audio ===
* [[Playing SFX]]
* [[Streaming audio]]

=== Maple ===
* Controller input

=== VMU ===
* [[File Types]]
* [[Save/Load file]]
* [[Show icon]]
* [[Play tone]]
* [[VMU_development|Game Development]]

=== Optimization ===
* [[GCC-SH4 tips]]
* [[SH4 in Compiler Explorer]]
* [[Fast SH4 Vertex Processing]]
* [[Useful programming tips]]
* [[Efficient usage of the Dreamcast RAM]]
* Registers
* DMA
* TA
* PVR
=== Website Development ===
*[[Development Resources]]

=== Random Snippets ===
* [[Objdump]]

Development

2023-09-06T14:10:30Z

BBHoodsta: /* General */

=== Getting started ===
* [[Getting Started with Dreamcast development]] -- start here!
====Ready-to-use environments====
* [[Docker images]]
* [[DreamSDK]] (Windows only)

====[[KallistiOS]]====
* Building on Linux, macOS, Windows Subsystem for Linux
** see [[Getting Started with Dreamcast development|''Getting Started with Dreamcast development'']]
* [[Building KOS on Cygwin]]
* [[Building KOS on MinGW/MSYS]]
* [[Building KOS on MinGW-w64/MSYS2]]
* [https://kos-docs.dreamcast.wiki/ KallistiOS Doxygen documentation]

====Other====
* [[Using Ruby for Sega Dreamcast development]] (experimental)

=== Build & test ===
* [[Building your project]]
* [[Emulators]]
* [[Broadband adapter]] / [[LAN adapter]]
** [[Using dcload-ip with Linux]]
** [[Using dcload-ip with Windows Subsystem for Linux|Using dcload-ip with Windows 10]] (via Windows Subsystem for Linux)
* [[Coder's cable]]

=== Environments and IDEs ===
* [[Qt Creator Dreamcast Development Environment]]
* [[CLion Debugging]]
* [[Visual Studio Code]]

=== Tools & utilities ===
* [[Debugging throught GNU Debugger (GDB) and dcload/dc-tool|Debugging through GNU Debugger (GDB) and dcload/dc-tool]]
* [[Using dcprof]]

=== Releasing your project ===
* Plain files
* Disc image
* Selfboot Inducer package

=== Engines ===
* [[Simulant]]
** [[Windows WSL2 Setup]]
** [[Generate profiling data]]

=== General ===
* [[Filesystem]]
* [[Romdisk Swapping]]
* [[Store Queues]]
* [https://mc.pp.se/dc/hw.html Marcus Comstedt's Dreamcast Hardware Reference]

=== Graphics ===
* [[Texture Formats]]
* [[Graphics APIs]]
* [[Paletted Textures]]
* [[2D Rendering Without PVR]]
* [[Twiddling]]

* PVR
** [[PowerVR Introduction]]
** [[PVR Spritesheets]]
* [[GLdc]]
** [[Drawing 2D sprites using GLdc]]
** [[Drawing 3D shapes using GLdc]]
** [https://hkowsoftware.com/articles/gldc-vertex-formats-from-vec3f-to-fastpath-to-map_buffer/ GLdc Vertex Formats: From vec3f to fastpath to map_buffer]
* Others
** [http://www.numechanix.com/blog/index.php/2015/10/03/20/ Procedural texture]
** [[Notes on fillrate and drawing large textures]]
** [[KMG Textures]]
** [[Loading PNG images as OpenGL textures]]

=== Audio ===
* [[Playing SFX]]
* [[Streaming audio]]

=== Maple ===
* Controller input

=== VMU ===
* [[File Types]]
* [[Save/Load file]]
* [[Show icon]]
* [[Play tone]]
* [[VMU_development|Game Development]]

=== Optimization ===
* [[GCC-SH4 tips]]
* [[SH4 in Compiler Explorer]]
* [[Fast SH4 Vertex Processing]]
* [[Useful programming tips]]
* [[Efficient usage of the Dreamcast RAM]]
* Registers
* DMA
* TA
* PVR
=== Website Development ===
*[[Development Resources]]

=== Random Snippets ===
* [[Objdump]]

Getting Started with Dreamcast development

2023-06-07T11:50:06Z

BBHoodsta: /* Cleaning up temporary files */

<div style="float:right;">__TOC__</div>

===This article is actively being worked on===
''Work in progress - items to be added and/or edited'': Setting up debug link, building and burning a CD-R, setting up a first project, setting up an IDE, etc.

=Introduction=
This article will cover the entire beginning process: starting from zero to having a working dev environment with debug link (serial or IP) and self-booting CD-R. This guide will cover the process for the following platforms:
* Microsoft Windows 10 via [https://learn.microsoft.com/en-us/windows/wsl/about Windows Subsystem for Linux]
* macOS on Intel or Apple Silicon systems with the [https://brew.sh/ Homebrew] package manager installed
* Debian- and Ubuntu-based Linux distributions using the default apt package manager
* Fedora-based Linux distributions using the default dnf package manager
* Arch-based Linux distributions using the default pacman package manager

===Need help?===
Important note: ''This guide aims to remain up to date and work on all of the above platforms, but keeping instructions for such a variety of platforms up-to-date can be difficult. If you run into any errors or other challenges while following this tutorial, or simply need clarification on any of the steps, feel free to ask for assistance on the [https://dcemulation.org/phpBB/viewforum.php?f=29 message board] and we would be happy to aid you and update the guide for the benefit of future readers and others in the community.''

===Terms===
Before we get started, let's define several terms:

The '''toolchain''' is a set of programs which turns your code into an executable file for your Dreamcast console. The toolchain includes:
* '''GCC''', a C/C++/Objective-C compiler
* '''binutils''', an assembler and linker
* '''newlib''', a C library
* '''gdb''', a debugger
The toolchain includes a compiler for the Dreamcast's main SH4 CPU, and optionally a compiler for the ARM-based AICA sound processor. Your operating system may already have versions of these programs installed to compile code for your computer, but we will need to build a "cross-compiler" for compiling specifically for the Dreamcast.

'''KallistiOS''' or ''KOS'' is an open source development library and pseudo-operating system for the Dreamcast console. It is the best documented and most widely used development kit in the homebrew community. KallistiOS's very flexible license allows both homebrew and commercial use with no restrictions other than a requirement to include credit for its use in your project, and indeed almost all commercially sold indie Dreamcast titles use it. There are others in existence, like [[libronin]] and [[libdream]], as well as the older development kits [[Katana]] and [[Windows CE]] created by Sega and Microsoft for use in retail games, but this guide will only cover the setup and use of KallistiOS.

'''kos-ports''' is a repository including various libraries which integrate with KallistiOS. We will download and compile these libraries as well.

The '''debug link''' is a generic term referring to a hardware accessory to facilitate quickly running and debugging your programs. IP-based links include the Dreamcast's '''[[Broadband adapter]]''' and '''[[LAN adapter]]''' accessories, and serial-based links include the [[Coder's cable]], which is a cable that can connect the Dreamcast's serial port to your computer via USB or serial. This guide includes instructions for setting up and using the Broadband adapter and a USB-based coder's cable.

'''dc-tool''' and '''dcload''' are a pair of programs to facilitate using a debug link. ''dc-tool'' runs on your computer and links to a Dreamcast running ''dcload-ip'' or ''dcload-serial''. With this setup, you can quickly load programs, read console feedback, load assets, transfer data, redirect I/O, handle exceptions, debug problems, and so forth.

=Choosing a debug link solution=
If you are building the toolchain for the purpose of building existing programs from source with little to no modifications, then a debug link setup might not be necessary for you. You may simply build programs to burn directly to CD-R. However, if you are planning to actively develop for the Dreamcast, then a debug link is a critical component. While Dreamcast emulators are mature and accurate enough to play the vast majority of the system's games library without issue, many critical bugs may show up on a real Dreamcast system, but not on a Dreamcast emulator. Therefore, it is highly recommended to test on a real system as much as possible. It's also possible to load software off of a [[Serial SD card adapter]], but without an active link to a computer, debugging and stepping through programs as they execute is significantly more challenging.

Presented below is a table comparing the different options available for a debug link. Due to the cost, potential buyers may want to factor in the ability to play multiplayer games with their purchase. Thus, for comparison, we have included information about the [[Modem]] with [[DreamPi]] as well, but understand that the Modem with DreamPi cannot be used as a debug link.

{| class="wikitable"
!colspan="6" |Comparison of various Dreamcast connectivity options
|-
|style="background-color:#c0c0c0;" width="150" | Device:
|style="background-color:#d0d0d0;" width="400" | [[Broadband adapter]] (HIT-400 or HIT-401) <br />Realtek RTL8139C chipset
|style="background-color:#d0d0d0;" width="400" | [[LAN adapter]] (HIT-300) <br />Fujitsu MB86967 chipset
|style="background-color:#d0d0d0;" width="400" | [[Modem]] with [[DreamPi]]
|style="background-color:#d0d0d0;" width="400" | USB [[Coder's cable]]
|style="background-color:#d0d0d0;" width="400" | Serial [[Coder's cable]]
|-
|style="background-color:#d0d0d0;" | Useful for dev? || Yes, supports dcload-ip || Yes, supports dcload-ip,<br/>but BBA is superior and cheaper || No, only useful for online multiplayer gaming || Yes, supports dcload-serial || Yes, supports dcload-serial
|-
|style="background-color:#d0d0d0;" | Cost || $100 - $200 and up on used markets || $200 and up on used markets,<br/>due to extreme rarity || Kit prices vary, around $100 || Varies on used markets, uncommonly sold<br />RetroOnyx sells for $85 || Varies on used markets, uncommonly sold
|-
|style="background-color:#d0d0d0;" | Can make DIY? || No || No || Yes || Yes || Yes
|-
|style="background-color:#d0d0d0;" | Performance || Up to 100 megabits/s || Up to 10 megabits/s || Up to 56 kilobits/s || Up to 1500 kilobits/s || Up to 120 kilobits/s
|-
|style="background-color:#d0d0d0;" | Games support || Some games: Phantasy Star Online, Quake III Arena, Toy Racer, POD SpeedZone, Propeller Arena, Unreal Tournament<br />Some browsers: Broadband Passport, PlanetWeb 3.0 || No games<br />One browser: Dream Passport for LAN || All multiplayer games with network support<br />All web browsers || NO multiplayer games support || NO multiplayer games support
|-
|style="background-color:#d0d0d0;" | Homebrew support || Homebrew utilities like dcload-ip || Homebrew utilities like dcload-ip || Not supported by homebrew utilities, only by multiplayer games || Homebrew utilities like dcload-serial || Homebrew utilities like dcload-serial
|}
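To put the performance row in perspective, the following sketch estimates the best-case time to send a program over each link. The function name <code>transfer_ms</code> and the 2 MiB payload size are illustrative assumptions, and the figures use the peak rates from the table while ignoring protocol overhead, so real transfers will be slower.

```shell
#!/bin/sh
# Estimate best-case transfer time in milliseconds.
# $1 = payload size in bytes, $2 = link rate in bits per second.
transfer_ms() {
    echo $(( $1 * 8 * 1000 / $2 ))
}

size=$((2 * 1024 * 1024))   # hypothetical 2 MiB program

echo "Broadband adapter (100 Mbit/s):   $(transfer_ms "$size" 100000000) ms"
echo "LAN adapter (10 Mbit/s):          $(transfer_ms "$size" 10000000) ms"
echo "USB coder's cable (1500 kbit/s):  $(transfer_ms "$size" 1500000) ms"
echo "Serial coder's cable (120 kbit/s): $(transfer_ms "$size" 120000) ms"
```

For this hypothetical payload, the estimate works out to roughly 0.17 seconds over the Broadband adapter, about 11 seconds over a USB coder's cable, and about 140 seconds over a serial coder's cable.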

=Setting up and compiling the toolchain with the dc-chain script=
===Dependencies===
First, we'll need to install dependencies before building the toolchain. Below we have provided commands to install these dependencies on various systems. Many of the packages will likely already be installed on your system, but we have provided an exhaustive list for good measure.
====macOS 13 Ventura on an Intel or Apple Silicon processor====
First, make sure you have installed Apple Xcode, including the Command Line Tools. You will also need several other packages. The command below assumes you have installed the [https://brew.sh/ Homebrew] package manager on your system.
brew install wget gettext texinfo gmp mpfr libmpc libelf jpeg-turbo libpng meson libisofs

''Important Note for Apple Silicon users'': On Apple Silicon, Homebrew installs libraries to a path that the compiler does not search by default. If you haven't added these paths to your '''~/.zprofile''', then add the following lines now and reload your session (or run them in your Terminal session whenever you compile KOS):
export CPATH=/opt/homebrew/include
export LIBRARY_PATH=/opt/homebrew/lib

====Debian/Ubuntu-based Linux====
sudo apt install gawk patch bzip2 tar make libgmp-dev libmpfr-dev libmpc-dev gettext wget libelf-dev texinfo bison flex sed git build-essential diffutils curl libjpeg-dev libpng-dev python3 pkg-config libisofs-dev meson ninja-build

====Fedora-based Linux====
sudo dnf install gawk patch bzip2 tar make gmp-devel mpfr-devel libmpc-devel gettext wget elfutils-libelf-devel texinfo bison flex sed git diffutils curl libjpeg-turbo-devel libpng-devel gcc-c++ python3 meson ninja-build

====Arch-based Linux====
sudo pacman -S --needed gawk patch bzip2 tar make gmp mpfr libmpc gettext wget libelf texinfo bison flex sed git diffutils curl libjpeg-turbo libpng python3 meson

====Other Linux distributions====
If you're using a Linux- or Unix-based system other than those listed above, you may need to consult your distribution's package database and package manager documentation for the equivalent package names and commands.
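On any of these systems, a quick way to see whether the basic build tools are already present is to look for them on your <code>PATH</code>. The sketch below is only an illustration: the helper name <code>missing_tools</code> is our own, and the list checks only a subset of the command-line programs from the package lists above (<code>makeinfo</code> is provided by the texinfo package). It does not check development libraries such as GMP or MPFR.

```shell
#!/bin/sh
# Print the name of each given command that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Check a subset of the tools named in the package lists.
missing=$(missing_tools git make wget patch bison flex makeinfo)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All checked tools were found."
fi
```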

===Creating a space for your toolchain installation===
Create the path where we'll install the toolchain and KOS, and grant it the proper permissions:
sudo mkdir -p /opt/toolchains/dc
sudo chmod -R 755 /opt/toolchains/dc
sudo chown -R $(id -u):$(id -g) /opt/toolchains/dc
===Cloning the KOS git repository===
Clone the KOS git repository to your system:
git clone https://github.com/KallistiOS/KallistiOS.git /opt/toolchains/dc/kos

===Configuring the dc-chain script===
Enter the dc-chain directory:
cd /opt/toolchains/dc/kos/utils/dc-chain
You'll need to choose one of the following pre-made toolchain configurations. The '''testing''' version uses GCC 13.1.0 with Newlib 4.3.0, the '''stable''' version uses GCC 9.3.0 and Newlib 3.3.0, and the '''legacy''' version uses GCC 4.7.4 and Newlib 2.2.0.
We suggest using either the ''testing'' or the ''stable'' version. The testing version has more features, while the stable version has long been used by the community and is known to work well. Run one of the following commands to make your choice:
'''(for GCC 13.1):''' mv config.mk.testing.sample config.mk
'''(for GCC 9.3):''' mv config.mk.stable.sample config.mk
'''(for GCC 4.7):''' mv config.mk.legacy.sample config.mk
Now, you may configure config.mk options to your liking by using a text editor. You may alter the <code>makeopts</code> parameter to the number of threads available on your CPU to speed up the compilation, if desired. However, if you run into errors during compilation, you may want to set <code>makeopts=-j1</code>, as on some operating systems the toolchain may fail to build with a higher setting.
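If you are unsure how many threads your CPU offers, the following sketch prints a <code>-jN</code> value that you could copy into the setting by hand. The helper name <code>detect_jobs</code> is our own, and the detection commands it tries may not all exist on every system, which is why it falls back to <code>-j1</code>.

```shell
#!/bin/sh
# Print a -jN value matching the number of online CPU threads.
# Falls back to -j1 if no detection method works.
detect_jobs() {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) ||
        n=$(nproc 2>/dev/null) ||
        n=$(sysctl -n hw.ncpu 2>/dev/null) ||
        n=1
    # Guard against empty or non-numeric output.
    case "$n" in
        ''|*[!0-9]*) n=1 ;;
    esac
    echo "-j$n"
}

detect_jobs
```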

===Downloading and compiling the toolchain===
Now we will run a script to download files and compile the toolchain. At this point, we have the option of building both the main CPU SH4 compiler and the AICA sound processor ARM compiler, or we can skip the ARM compiler and just build the SH4 compiler. Thankfully, KallistiOS includes a prebuilt sound driver, so the ARM compiler is only necessary if you want to make changes to the sound driver or write custom code to run on the sound processor.
To build '''only the SH4 compiler''':
make build-sh4
To build '''both''' the SH4 and the ARM compilers:
make
This will download and unpack the necessary files and then begin the compilation process. The compilation can take anywhere from minutes to a few hours depending on your CPU and the number of threads available. When it finishes successfully, the toolchains will be ready.

Afterwards, if desired, you may also compile the GNU Debugger (gdb):
make gdb

The GNU Debugger is now installed along with your toolchains.

===Cleaning up temporary files===
After building everything, you can clean up the extraneous files in your dc-chain directory by entering:
make clean

=Configuring and compiling KOS and kos-ports=
===Setting up the environment settings===
Enter the KOS directory:
cd /opt/toolchains/dc/kos
Copy the pre-made environment script into place:
cp doc/environ.sh.sample environ.sh
For most users, the default settings will suffice. However, advanced users may edit environ.sh to change compile flags or alter paths. If you'd like to have multiple KOS versions or multiple toolchain versions installed, you can set up different environ.sh files corresponding to these configurations by altering the paths. Run the source command on the desired environ.sh file to select that configuration prior to compiling your project.
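If you do keep several configurations side by side, a small shell function can make switching less error-prone. This is only a sketch under our own assumptions: the name <code>kos_use</code> is hypothetical, and it assumes each KOS tree keeps its own <code>environ.sh</code> in its top-level directory.

```shell
#!/bin/sh
# Source the environ.sh that belongs to the KOS tree given as $1.
# Returns non-zero if that tree has no environ.sh.
kos_use() {
    if [ -f "$1/environ.sh" ]; then
        . "$1/environ.sh"
    else
        echo "no environ.sh found in $1" >&2
        return 1
    fi
}

# Example (hypothetical second tree):
#   kos_use /opt/toolchains/dc/kos-stable
```

Because the function sources the file in your current shell, it needs to be defined in your shell startup file or pasted into the session rather than run as a separate script.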

You will need to run the source command to apply the KOS environment settings to your currently running shell. Run the following now, '''and''' ''whenever'' you open a new shell to work on Dreamcast projects:
source /opt/toolchains/dc/kos/environ.sh
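Forgetting this step is a common cause of confusing build errors, so it can help to check that the settings are active before running <code>make</code>. The sketch below tests the <code>KOS_BASE</code> variable, which the sample environ.sh exports. The helper name <code>kos_env_loaded</code> is our own.

```shell
#!/bin/sh
# Succeed only if KOS_BASE is set and points at an existing directory.
kos_env_loaded() {
    [ -n "${KOS_BASE:-}" ] && [ -d "$KOS_BASE" ]
}

if kos_env_loaded; then
    echo "KOS environment loaded from $KOS_BASE"
else
    echo "KOS environment not loaded; source environ.sh first" >&2
fi
```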

===Building KOS===
Build KOS:
make
KOS is now built.
===Building kos-ports===
Clone the kos-ports repository to your system:
git clone --recursive https://github.com/KallistiOS/kos-ports /opt/toolchains/dc/kos-ports
Run the script to build all of the included ports:
/opt/toolchains/dc/kos-ports/utils/build-all.sh
kos-ports is now built.

===Building the KOS examples===
Enter the KOS examples directory:
cd /opt/toolchains/dc/kos/examples/dreamcast
Build the examples:
make
All of the example programs provided with KallistiOS are now built.
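To see what was produced, you can search the examples tree for the resulting binaries. The sketch below assumes the examples are built as files ending in <code>.elf</code>, which matches the <code>example.elf</code> naming used later in this guide. The helper name <code>list_elfs</code> is our own.

```shell
#!/bin/sh
# List every .elf file below the directory given as $1
# (defaults to the current directory), in sorted order.
list_elfs() {
    find "${1:-.}" -type f -name '*.elf' | sort
}

# Example:
#   list_elfs /opt/toolchains/dc/kos/examples/dreamcast
```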

=Running an example program through a debug link=
'''TODO''': ''Give a tutorial on compiling dcload/dc-tool, setting up a serial, USB, or IP debug link, and running 2ndmix demo on real hardware.''

Download and burn the [[:File:Dcload-2022-12-17.zip|latest versions of dcload-ip or dcload-serial]]. The IP version includes improved DHCP support, so there is no longer a need to configure network settings beforehand.

Run one of the examples from the <code>kos/examples/dreamcast</code> directory with the following command:
dc-tool-ip -t <dreamcast IP address> -x example.elf
Run <code>dc-tool-ip</code> without any parameters to get additional options.

=Burning an example program to CD-R=
'''TODO''': ''Explain how to build mkdcdisc and write 2ndmix demo to CD-R and run on a Dreamcast console''

[https://gitlab.com/simulant/mkdcdisc mkdcdisc] can be used to easily generate a burnable self-boot CDI image.
Build <code>mkdcdisc</code>:
git clone https://gitlab.com/simulant/mkdcdisc.git
cd mkdcdisc
meson setup builddir
meson compile -C builddir
./builddir/mkdcdisc -h
Next, create a CDI image from your compiled ELF like so:
mkdcdisc -e MyProgram.elf -o MyProgram.cdi
Then you can burn the CDI file using DiscJuggler (Windows-only, but also works through [https://www.winehq.org/ WINE]), ImgBurn with the CDI plugin, or the cdiburn *nix script floating around out there. (document this better)

=Creating your first project=
'''TODO''': ''Explain how to create a new DC project folder with Makefile, adding an external library, create a basic program, and compile it''

=Further reading=
'''TODO''': Links to articles for using gdb, integrating the dev setup with an IDE, etc.