SH4 in Compiler Explorer

From dreamcast.wiki

Revision as of 00:38, 30 December 2022

Thanks to the effort of Matt Godbolt (who hilariously enough is a former Dreamcast developer himself), the SuperH GCC toolchain is now available for use with Compiler Explorer, along with all of the SH4-specific compiler flags and options typically used when targeting the Dreamcast. This gives us an invaluable tool for getting quick and immediate feedback on how well a given C or C++ source segment tends to translate into SH4 assembly, offering a little sandbox for testing and optimizing code.

Configuration

Dreamcast-like SuperH GCC Compiler Configuration

To arrive at a configuration mirroring a Dreamcast development environment, first select one of the GCC compiler versions for the SH architecture. Then use the following compiler options as the baseline configuration:

  • -ml: Compile code for the processor in little-endian mode.
  • -m4-single-only: Generate code for the SH4 with a floating-point unit that only supports single-precision arithmetic.
  • -ffast-math: Breaks strict IEEE compliance and allows faster floating-point approximations.
  • -O3: Enables optimization level 3.
  • -mfsrra: Enables emission of the fsrra instruction for reciprocal square root approximations.
  • -mfsca: Enables emission of the fsca instruction for sine and cosine approximations.

Tips and Notes

  • It has been noted that while -O3 is claimed to be the highest optimization level according to recent GCC documentation, some code differences can still be seen under certain circumstances when using -O4 and beyond.
  • The compiler seems to ignore both -mfsrra and -mfsca without the -ffast-math and -m4-single-only options.
  • It is highly recommended that C code be written so that the compiler emits fsrra (from 1.0/sqrt(N)) and fsca (from the builtin sin/cos) under -mfsrra and -mfsca, rather than using inline assembly directly, as this seems to give the compiler more context for code optimization around these instructions.
  • The __builtin_prefetch intrinsic does seem to generate a single "pref" instruction and should be preferred over hand-written inline assembly.
  • The compiler does not seem smart enough to utilize the FIPR (inner/dot product), FMAC (multiply and accumulate), or FTRV (transform vector) instructions regardless of how embarrassingly vectorizable the supplied C code seems to be, so linear algebra routines are forced to use inline assembly to fully leverage the SH4's SIMD instructions.
  • Newer versions of GCC typically produce smaller and more tightly optimized code than older ones; however, this is not always the case.
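
Two of the notes above can be checked with a short sketch like the following. The first function uses __builtin_prefetch, which should show up as a "pref" instruction; the second is a four-component dot product in plain C, the operation FIPR performs in hardware, so the output shows what the compiler emits instead. Both function names are hypothetical, and the 8-float look-ahead assumes a 32-byte cache line.

```c
#include <stddef.h>

/* Sum an array while hinting the next cache line to the CPU.
 * Assumption: 32-byte cache lines, so the hint is issued once
 * every 8 floats and only for addresses inside the array. */
float sum_with_prefetch(const float *data, size_t count) {
    float total = 0.0f;
    for (size_t i = 0; i < count; ++i) {
        if ((i % 8) == 0 && i + 8 < count)
            __builtin_prefetch(&data[i + 8]);
        total += data[i];
    }
    return total;
}

/* 4-component dot product in plain C: the same arithmetic that FIPR
 * computes in one instruction, written without inline assembly. */
float dot4(const float a[4], const float b[4]) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}
```

If the assembly for dot4 shows separate multiplies and adds rather than fipr, that matches the observation above that linear algebra routines still need inline assembly to reach those instructions.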