GCC-SH4 tips

From dreamcast.wiki

Revision as of 12:59, 30 May 2020

These notes are mostly taken from the good advice of the Dreamcast scholars (Ian Micheal, Moop, MrNeo, and more). The original document can be found [here](https://dreamcast.wiki/wiki/images/f/f6/Gcc_asm_sh4_tips.txt).

Dan Potter's GCC-SH4 tips

Use local variables

Global variables are slow - to retrieve the value, the SH4 typically must execute:

mov.l L2,r1     ! load the variable's address from the literal pool
mov.l @r1,r1    ! load the value through that address

Local variables are faster because they are addressed relative to the stack, and function parameters are even faster because the first four integer parameters are passed in R4-R7 and the first eight floating-point parameters in FR4-FR11.[1]
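As a rough illustration of the point above, the sketch below contrasts accumulating into a global with accumulating into a local. The names g_total, sum_into_global, and sum_local are made up for this example, and the code GCC actually emits depends on its version and optimization flags.

```c
#include <stddef.h>

/* Hypothetical global, used only for illustration. */
int g_total = 0;

/* Slower pattern: values might alias g_total, so the compiler may have
   to reload and store the global through its address on every pass. */
void sum_into_global(const int *values, size_t count)
{
    g_total = 0;
    for (size_t i = 0; i < count; i++)
        g_total += values[i];
}

/* Faster pattern: accumulate in a local, which can stay in a register.
   The two parameters arrive in R4 and R5 under the SH ABI. */
int sum_local(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

If the result has to end up in a global anyway, computing it in a local and storing it once after the loop keeps the global access out of the hot path.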

Write small functions

We've noticed GCC generates very pessimal code when it starts to spill registers, so try to avoid doing too much in one function.

A function longer than about a hundred lines should be broken into smaller functions.
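One possible way to follow this advice is sketched below: a per-particle update is split into two short helpers so that each one works with only a few live values. The particle_t type and the function names are hypothetical, chosen only for this example.

```c
#include <stddef.h>

/* Hypothetical particle type, for illustration only. */
typedef struct {
    float x, y;
    float vx, vy;
} particle_t;

/* Step 1: advance the position by one time step. */
static void integrate(particle_t *p, float dt)
{
    p->x += p->vx * dt;
    p->y += p->vy * dt;
}

/* Step 2: reflect the velocity when the particle drops below the floor. */
static void bounce(particle_t *p, float floor_y)
{
    if (p->y < floor_y) {
        p->y = floor_y;
        p->vy = -p->vy;
    }
}

/* The driver stays short; each helper touches only a few values. */
void update_particles(particle_t *ps, size_t count, float dt, float floor_y)
{
    for (size_t i = 0; i < count; i++) {
        integrate(&ps[i], dt);
        bounce(&ps[i], floor_y);
    }
}
```

Note that GCC may inline small static helpers at higher optimization levels, so it is worth checking the generated assembly to see whether the split actually reduces spilling.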

Use struct copies (instead of copying individual elements of a struct)

GCC and G++ schedule instructions poorly when a struct is copied element by element, and generate better-scheduled code when it is copied with a single struct assignment.
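The two copy styles can be sketched as follows. The vertex_t type and both function names are assumptions made for this example, not taken from the original notes.

```c
/* Hypothetical vertex type, for illustration only. */
typedef struct {
    float x, y, z;
    unsigned int color;
} vertex_t;

/* Element-by-element copy: the pattern the tip advises against. */
void copy_by_fields(vertex_t *dst, const vertex_t *src)
{
    dst->x = src->x;
    dst->y = src->y;
    dst->z = src->z;
    dst->color = src->color;
}

/* Struct assignment: the compiler sees one whole-object copy and is
   free to pick the load/store order itself. */
void copy_by_assignment(vertex_t *dst, const vertex_t *src)
{
    *dst = *src;
}
```

Both functions produce the same result; only the shape of the generated code differs.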

Division

Division on the Dreamcast is very costly. A floating-point division (fdiv) takes ~13 cycles.[2]

Integer

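If you're dividing an integer by a power of two, a bit shift is cheaper than a division. (Note: GCC does this conversion automatically for constant divisors, and has done so for at least 10 years now, so you don't need to worry about it.) The two forms only agree for non-negative values: signed division rounds toward zero, while a right shift of a negative value rounds down with GCC, so -3 / 2 is -1 but -3 >> 1 is -2.

int result = var1 >> 1; // same as var1 / 2 for non-negative var1, but faster
int result = var1 >> 2; // same as var1 / 4 for non-negative var1
int result = var1 >> 3; // same as var1 / 8 for non-negative var1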
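A small sketch of where a shift and a division agree and where they do not. The function names are made up for this example, and the negative-value behaviour of >> shown here is what GCC does (the C standard leaves it implementation-defined).

```c
/* Signed division by 2 truncates toward zero. */
int half_div(int v)
{
    return v / 2;
}

/* With GCC, a right shift of a signed value is arithmetic, so it
   rounds toward negative infinity for negative inputs. */
int half_shift(int v)
{
    return v >> 1;
}

/* For unsigned values the shift and the division always agree. */
unsigned int half_unsigned(unsigned int v)
{
    return v >> 1;
}
```

For example, half_div(-7) gives -3 while half_shift(-7) gives -4, which is why writing v / 2 and letting the compiler pick the instructions is usually the safer choice for signed values.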

Float

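If you're using floats, turning a division by a constant into a multiplication by its reciprocal is also faster. Multiplication (fmul) takes only ~3 cycles.[2] Write the constants with an f suffix so the arithmetic stays in single precision, and note that a reciprocal such as 0.1f is not exactly representable, so the result can differ slightly from a true division.

float result = var1 * 0.5f; // same as var1 / 2.0f but way faster
float result = var1 * 0.25f; // same as var1 / 4.0f but way faster
float result = var1 * 0.1f; // approximately var1 / 10.0f but way faster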
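The same idea applies when the divisor is only known at run time: divide once, then multiply. A minimal sketch follows; scale_by_reciprocal is a hypothetical helper name, and the result may differ from per-element division by a rounding error when the reciprocal is not exact.

```c
#include <stddef.h>

/* Divide every element by the same divisor using a single fdiv. */
void scale_by_reciprocal(float *values, size_t count, float divisor)
{
    const float inv = 1.0f / divisor;   /* the only division */
    for (size_t i = 0; i < count; i++)
        values[i] *= inv;               /* one multiply per element */
}
```

GCC does not make this substitution on its own under default floating-point rules because it can change the rounding, so it has to be written out by hand or enabled with options such as -ffast-math.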

References

  1. SH ABI (Application Binary Interface) for GCC, https://gcc-renesas.com/manuals/SH-ABI-Specification.html
  2. SH4 Assembly Instructions, http://www.shared-ptr.com/sh_insns.html