GCC-SH4 tips
Latest revision as of 19:26, 1 April 2023
These notes are mostly taken from the good advice of the Dreamcast scholars (Ian Micheal, Moop, MrNeo, and more). The original documents can be found [here.](https://dreamcast.wiki/wiki/images/f/f6/Gcc_asm_sh4_tips.txt)
Megan Potter's GCC-SH4 tips
Use local variables
Global variables are slow. To read one, the SH4 typically has to load the variable's address from a literal pool and then load the value through that address:
mov.l L2,r1
mov.l @r1,r1
Local variables are faster because they are addressed relative to the stack pointer (or kept in registers). Function parameters are even faster, because the first four integer parameters are passed in R4-R7 and the first eight floating-point parameters in FR4-FR11.[1]
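As a rough illustration (the global `g_scale` and the functions `scale_global` and `scale_local` are hypothetical), caching a global in a local matters most inside loops. Because `out` is an `int *`, a store through it could legally change `g_scale`, so GCC generally has to reload the global on every iteration. A local copy can stay in a register for the whole loop.

```c
/* Hypothetical global, used only for illustration. */
int g_scale = 3;

/* Reads the global inside the loop. Since out[i] is an int store that may
   alias g_scale, the compiler generally has to reload g_scale (address from
   the literal pool, then the value) on every iteration. */
void scale_global(int *out, const int *in, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = in[i] * g_scale;
}

/* Copies the global into a local once; the local can live in a register
   for the whole loop. */
void scale_local(int *out, const int *in, int n)
{
    int scale = g_scale;
    for (int i = 0; i < n; i++)
        out[i] = in[i] * scale;
}
```

The two functions only give the same result when `out` does not point at `g_scale`, which is the normal case.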
Write small functions
We've noticed GCC generates very pessimal code when it starts to spill registers, so try to avoid doing too much in one function.
A function longer than about a hundred lines should be broken into smaller functions.
Use struct copies (instead of copying individual elements of a struct)
GCC and G++ generate poorly scheduled code when a struct is copied one member at a time, and better-scheduled code when the whole struct is copied with a single struct assignment.
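A sketch of the two styles, using a hypothetical `vertex_t` struct. Both functions copy the same data; the tip is that GCC tends to schedule the second form better. Whether it does for a particular struct and compiler version is worth confirming by inspecting the generated assembly (for example with `-S`).

```c
/* Hypothetical vertex layout, for illustration only. */
typedef struct {
    float x, y, z;
    unsigned int argb;
    float u, v;
} vertex_t;

/* Member-by-member copy: the form the tip advises against. */
void copy_members(vertex_t *dst, const vertex_t *src)
{
    dst->x = src->x;
    dst->y = src->y;
    dst->z = src->z;
    dst->argb = src->argb;
    dst->u = src->u;
    dst->v = src->v;
}

/* Whole-struct assignment: the form the tip recommends. */
void copy_struct(vertex_t *dst, const vertex_t *src)
{
    *dst = *src;
}
```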
Division
Using division on the Dreamcast is very costly: a floating-point division (fdiv) takes ~13 cycles.[2]
Integer
If you're working with integers and want to divide by a power of two, a bit shift is faster than a division. (Note: GCC performs this conversion automatically when the divisor is a constant, and has done so for at least 10 years now, so you don't need to worry about it.)
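// Valid for non-negative (or unsigned) var1. For negative values, >> rounds
// toward minus infinity, while / rounds toward zero.
int half   = var1 >> 1; // = var1 / 2
int fourth = var1 >> 2; // = var1 / 4
int eighth = var1 >> 3; // = var1 / 8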
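A plain shift only matches `/` for non-negative values, so for signed integers a small correction is needed before shifting. The sketch below shows one common branch-free form of that correction, using a hypothetical helper `div_pow2`. It assumes a 32-bit `int` (as on the SH4), `0 <= k <= 30`, and an arithmetic right shift of negative values, which GCC provides. In practice, writing `x / 4` and letting GCC generate the equivalent code is simpler.

```c
/* Hypothetical helper: divides a signed int by 2^k, rounding toward zero
   exactly like x / (1 << k). A plain x >> k rounds toward minus infinity,
   so -7 >> 1 is -4 while -7 / 2 is -3.
   Assumes 32-bit int, 0 <= k <= 30, and arithmetic right shift of
   negative values (implementation-defined in C; GCC does this). */
int div_pow2(int x, int k)
{
    /* x >> 31 is -1 (all ones) for negative x and 0 otherwise, so the
       bias is 2^k - 1 for negative x and 0 for non-negative x. */
    int bias = (x >> 31) & ((1 << k) - 1);
    return (x + bias) >> k;
}
```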
Float
If you're using floats, replacing a division with a multiplication by the reciprocal is also faster: multiplication (fmul) takes only ~3 cycles.[2]
// The f suffix keeps the math in single precision; a bare 0.5 is a double.
float half  = var1 * 0.5f;  // same as var1 / 2, but fmul instead of fdiv
float quart = var1 * 0.25f; // same as var1 / 4
float tenth = var1 * 0.1f;  // approximately var1 / 10 (0.1f is not exact)
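The same idea extends to a divisor that is only known at run time: when many values are divided by the same number, compute the reciprocal once and multiply. A minimal sketch with a hypothetical `divide_all` function, which does one fdiv in total instead of one per element. The results can differ from true division in the last bit, which is why GCC does not make this substitution for an arbitrary divisor unless `-freciprocal-math` (included in `-ffast-math`) is enabled.

```c
/* Hypothetical helper: divides every element of in[] by the same run-time
   divisor d. One division computes 1/d up front; the loop then needs only a
   multiplication per element. Because 1/d is itself rounded, out[i] may
   differ from in[i] / d in the last bit. */
void divide_all(float *out, const float *in, int n, float d)
{
    float inv = 1.0f / d;   /* the only division */
    for (int i = 0; i < n; i++)
        out[i] = in[i] * inv;
}
```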
References
[1] SH ABI (Application Binary Interface) for GCC. https://gcc-renesas.com/manuals/SH-ABI-Specification.html
[2] SH4 Assembly Instructions. http://www.shared-ptr.com/sh_insns.html