GCC-SH4 tips

From dreamcast.wiki
Latest revision as of 19:26, 1 April 2023

These notes are mostly taken from the good advice of the Dreamcast scholars (Ian Micheal, Moop, MrNeo, and more). The original documents can be found [here](https://dreamcast.wiki/wiki/images/f/f6/Gcc_asm_sh4_tips.txt).

Megan Potter's GCC-SH4 tips

Use local variables

Global variables are slow - to retrieve the value, the SH4 typically must execute:

mov.l L2,r1     ! load the address of the global from the literal pool
mov.l @r1,r1    ! load the value stored at that address

Local variables are faster because they are accessed stack-relative, and function parameters are faster still because the first four integer parameters are passed in R4-R7 and the first eight floating-point parameters in FR4-FR11.[1]
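As a rough illustration of the tip above, here is a minimal sketch that copies a global into a local before a loop. The names `g_scale`, `scale_global` and `scale_local` are hypothetical and do not come from the original notes; whether the compiler really reloads the global on every iteration depends on the optimisation level and on what it can prove about aliasing.

```c
#include <stddef.h>

/* Hypothetical global, used only for this illustration. */
int g_scale = 3;

/* Reads the global inside the loop. Because out[] is also an int array,
   the compiler may have to assume a store to out[i] could change g_scale
   and reload it on each iteration. */
void scale_global(int *out, const int *in, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * g_scale;
}

/* Copies the global into a local once, so the loop body only touches
   a register or a stack slot. */
void scale_local(int *out, const int *in, size_t n)
{
    const int scale = g_scale;
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * scale;
}
```

Both functions produce the same results; the second simply gives the compiler an easier job.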

Write small functions

We've noticed GCC generates very pessimal code when it starts to spill registers, so try to avoid doing too much in one function.

A function longer than about a hundred lines should be broken into smaller functions.

Use struct copies (instead of copying individual elements of a struct)

GCC and G++ generate code with weak scheduling when a struct is copied element by element, and code with better instruction scheduling when it is copied via struct assignment.
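The two styles of copy look like this in C. This is only a sketch: `vec4_t` and both function names are hypothetical, and the scheduling difference described above is something to confirm in the generated assembly for your own compiler version.

```c
/* Hypothetical four-float vector type, used only for this illustration. */
typedef struct {
    float x, y, z, w;
} vec4_t;

/* Element-by-element copy: the style the tip advises against. */
void copy_by_elements(vec4_t *dst, const vec4_t *src)
{
    dst->x = src->x;
    dst->y = src->y;
    dst->z = src->z;
    dst->w = src->w;
}

/* Struct assignment: the compiler sees one whole-object copy and is
   free to choose how to move the data. */
void copy_by_assignment(vec4_t *dst, const vec4_t *src)
{
    *dst = *src;
}
```

The result is identical in both cases; only the form handed to the compiler differs.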

Division

Using division on the Dreamcast is very costly. Floating-point division (fdiv) takes ~13 cycles.[2]

Integer

If you're working with integers and want to divide by a power of two, you're better off using bit shifting. (Note: GCC has done this conversion automatically for at least 10 years, so you don't need to worry about it.)

int result = var1 >> 1; // same as var1 / 2 for non-negative var1, but way faster
int result = var1 >> 2; // = var1 / 4
int result = var1 >> 3; // = var1 / 8
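One caveat is worth spelling out: a shift and a division only agree for non-negative values. In C, signed division truncates toward zero, while GCC's right shift of a negative `int` rounds toward negative infinity, so `-7 / 8` is `0` but `-7 >> 3` is `-1` with GCC. The sketch below, using the hypothetical helpers `div8_unsigned` and `div8_signed`, shows the safe split: shift by hand only for unsigned values, and write a plain division for signed ones so the compiler can emit its own corrected shift sequence.

```c
/* Unsigned values: the shift and the division agree for every input. */
unsigned div8_unsigned(unsigned x)
{
    return x >> 3;
}

/* Signed values: keep the division in the source. The compiler is free
   to turn it into shifts while preserving truncation toward zero. */
int div8_signed(int x)
{
    return x / 8;
}
```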

Float

If you're working with floats, turning a division into a multiplication is also faster. Multiplication (fmul) only takes ~3 cycles.[2]

float result = var1 * 0.5f;  // same as / 2 but way faster
float result = var1 * 0.25f; // same as / 4 but way faster
float result = var1 * 0.1f;  // close to / 10 (0.1 is not exact in binary) but way faster
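When the divisor is only known at run time, the same idea still applies if many values share it: pay for one division to get the reciprocal, then multiply. The sketch below uses a hypothetical helper, `scale_by_inverse`. Without options such as `-ffast-math`, GCC generally does not make this change on its own when the reciprocal is not exactly representable, because the result can differ in the last bit, so it is a trade of a little precision for speed.

```c
#include <stddef.h>

/* Divides every element of values[] by divisor using a single division
   followed by one multiplication per element. */
void scale_by_inverse(float *values, size_t n, float divisor)
{
    const float inv = 1.0f / divisor;   /* the only division */
    for (size_t i = 0; i < n; i++)
        values[i] *= inv;               /* one multiplication each */
}
```

For divisors that are powers of two the reciprocal is exact, so the results match a true division bit for bit.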

References

  1. SH ABI (Application Binary Interface) for GCC, https://gcc-renesas.com/manuals/SH-ABI-Specification.html
  2. SH4 Assembly Instructions, http://www.shared-ptr.com/sh_insns.html