SH4 in Compiler Explorer

Thanks to the efforts of Matt Godbolt (who, amusingly enough, is a former Dreamcast developer himself), the SuperH GCC toolchain is now available in Compiler Explorer, along with the SH4-specific compiler flags and options typically used when targeting the Dreamcast. This makes it an invaluable tool for getting immediate feedback on how a given piece of C or C++ source translates into SH4 assembly, and a handy sandbox for testing and optimizing code targeting the Dreamcast.


SH GCC Toolchain configured for the Dreamcast's SH4 CPU in Compiler Explorer

To arrive at a configuration mirroring a Dreamcast development environment, first select one of the GCC compiler versions for the SH architecture. Then use the following compiler options as the baseline configuration:

  • -ml: Compile code for the processor in little-endian mode
  • -m4-single-only: Generate code for the SH4 with a floating-point unit that only supports single-precision arithmetic
  • -ffast-math: Relaxes strict IEEE compliance to allow faster floating-point approximations
  • -O3: Optimization level 3
  • -mfsrra: Enables emission of the fsrra instruction for reciprocal square root approximations
  • -mfsca: enables emission of the fsca instruction for sine and cosine approximations

Convenience Templates

The following are pre-configured templates you can use as sample Dreamcast build configurations:

Tips and Notes

  • Although recent GCC documentation describes -O3 as the highest optimization level, some differences in generated code have still been observed under certain circumstances when using -O4 and beyond.
  • The compiler seems to ignore both -mfsrra and -mfsca without the -ffast-math and -m4-single-only options.
  • It is highly recommended to write plain C that the compiler can lower to these instructions (1.0/sqrt(N) for fsrra with -mfsrra, and the builtin sin/cos for fsca with -mfsca) rather than using inline assembly directly, as this seems to give the compiler more context for optimizing the code around these instructions.
  • The __builtin_prefetch intrinsic does seem to generate a single "pref" instruction and should be preferred over inline assembly.
  • The compiler does not seem smart enough to utilize the FIPR (inner/dot product), FMAC (multiply and accumulate), or FTRV (transform vector) instructions regardless of how embarrassingly vectorizable the supplied C code seems to be, so linear algebra routines are forced to use inline assembly to fully leverage the SH4's SIMD instructions.
  • Typically smaller code sizes and more tightly optimized code are seen with newer versions of GCC versus the older ones; however, this is not always the case.
  • Evidently, even without a branch predictor, the C++20 [[likely]] and [[unlikely]] attributes as well as the GCC intrinsic __builtin_expect() can have a fairly profound impact on code generation and optimization for conditionals and branches. More information can be found here.
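To try the prefetch and branch-hint tips in Compiler Explorer, something along the following lines could be used. This is only a sketch: the LIKELY and UNLIKELY macros and the sum_floats function are hypothetical names introduced here for illustration, and the prefetch stride of 8 floats assumes a 32-byte cache line.

```c
#include <stddef.h>

/* Hypothetical convenience macros wrapping the GCC intrinsic. */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Sum `count` floats into *out.
 * Returns 0 on success, or -1 if either pointer is NULL, which is
 * marked as the rare path so the common path can fall straight through. */
int sum_floats(const float *data, size_t count, float *out)
{
    if (UNLIKELY(data == NULL || out == NULL))
        return -1;

    float total = 0.0f;
    for (size_t i = 0; i < count; ++i) {
        /* Every 8 floats (32 bytes, an assumed cache line size), hint that
         * the next block will be needed soon. The bounds check keeps the
         * address inside the array. */
        if ((i & 7) == 0 && i + 8 < count)
            __builtin_prefetch(&data[i + 8]);
        total += data[i];
    }

    *out = total;
    return 0;
}
```

Swapping UNLIKELY for LIKELY on the NULL check, or removing the __builtin_prefetch call, and comparing the assembly output side by side is a simple way to see how much these hints influence the generated code for a given GCC version.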