SH4 FTRV Optimizations
Without a doubt, the single most computationally powerful instruction on the SH4 CPU in the Sega Dreamcast is FTRV, the Floating-point TRansform Vector instruction. In a single instruction, it multiplies a 4D vector by the 4x4 matrix held within the back bank of FPU registers, XMTRX. This article will teach you how to leverage this instruction for floating-point performance gains and introduce you to several example scenarios that have yielded large speedups within the community.
Relationship to FIPR
Format | Function | Encoding | Group | Issue Cycles | Latency Cycles |
---|---|---|---|---|---|
fipr FVm,FVn | inner_product (FVm, FVn) -> FR[n+3] | 1111nnmm11101101 | FE | 1 | 4/5 |
ftrv XMTRX, FVn | transform_vector(XMTRX, FVn) -> FVn | 1111nn0111111101 | FE | 1 | 5/8 |
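
The two instructions are closely related: an FTRV produces the same four results as four FIPR-style inner products against the same input vector, one per matrix row. The following is a portable reference model of that arithmetic, not SH4 code. The names `Vec4`, `Xmtrx`, `xmtrx_at`, `fipr_model`, and `ftrv_model` are hypothetical, the model ignores the approximate precision of the hardware instructions, and it assumes XMTRX is laid out as XF0..XF15 in column-major order, as the SH4 programming manual describes for FTRV.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4  = std::array<float, 4>;
using Xmtrx = std::array<float, 16>; // models XF0..XF15, column-major

// Element (row, col) of the modeled XMTRX.
inline float xmtrx_at(const Xmtrx& xf, std::size_t row, std::size_t col) {
    assert(row < 4 && col < 4);
    return xf[col * 4 + row];
}

// Models FIPR: a single 4-component inner product.
inline float fipr_model(const Vec4& a, const Vec4& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// Models FTRV: result[i] is the inner product of matrix row i with v.
inline Vec4 ftrv_model(const Xmtrx& xf, const Vec4& v) {
    Vec4 out{};
    for (std::size_t row = 0; row < 4; ++row) {
        const Vec4 r = { xmtrx_at(xf, row, 0), xmtrx_at(xf, row, 1),
                         xmtrx_at(xf, row, 2), xmtrx_at(xf, row, 3) };
        out[row] = fipr_model(r, v);
    }
    return out;
}
```

In this model, four `fipr_model` calls do the work of one `ftrv_model` call, which is why FTRV pays off whenever one vector is reused across four inner products.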
When to use FTRV
Real-World Examples
The following are real-world examples of FTRV-based optimizations used within games and applications for the Sega Dreamcast within the community.
Vertex Position Transformation
The first and most obvious use of the FTRV instruction is transforming the positions of the incoming vertex stream from local space to view space while submitting vertices to the PowerVR during T&L. This is the most crucial area for leveraging FTRV and was its original intended purpose. If you do nothing else with the instruction, bear in mind that the only way to come close to pushing a considerable volume of polygons on the DC is to harness the SH4 properly by using FTRV to transform your vertices.
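
As a rough illustration of the pattern, the sketch below transforms a vertex stream with one matrix that stays fixed for the whole batch. It is portable reference code rather than SH4 assembly: `Mat4`, `Vec4`, `transform`, and `transform_vertices` are hypothetical names, and the column-major layout is an assumption chosen to mirror XMTRX. On the Dreamcast, the body of `transform` is the work a single FTRV would replace once the matrix has been loaded into XMTRX.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<float, 16>; // column-major, mirroring XMTRX

// One matrix * vector product: the work a single FTRV performs.
inline Vec4 transform(const Mat4& m, const Vec4& v) {
    Vec4 out{};
    for (std::size_t row = 0; row < 4; ++row) {
        out[row] = m[row]     * v[0] + m[row + 4]  * v[1]
                 + m[row + 8] * v[2] + m[row + 12] * v[3];
    }
    return out;
}

// Transform a stream of vertices with a matrix that is set up once per batch.
inline void transform_vertices(const Mat4& local_to_view,
                               const Vec4* in, Vec4* out, std::size_t count) {
    assert(count == 0 || (in != nullptr && out != nullptr));
    for (std::size_t i = 0; i < count; ++i)
        out[i] = transform(local_to_view, in[i]);
}
```

The point of the structure is that the matrix setup cost is paid once, while every vertex afterwards costs one transform.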
Diffuse Lighting
Collision and Physics
Bounding Sphere vs View Frustum Culling
The following code snippet is taken from the DCA3 codebase as part of its RenderWare driver back-end for Dreamcast. To check for intersection between a bounding sphere and the view frustum, you must compute the dot product of the sphere's centroid with each of the 6 view frustum planes. The result of each dot product represents the sphere's signed distance from that plane.
The original algorithm was as follows:
int32 Camera::frustumTestSphere(const Sphere *s) const {
    int32 res = SPHEREINSIDE;
    // Iterate over each of the 6 frustum planes.
    const FrustumPlane *p = this->frustumPlanes;
    for(int32 i = 0; i < 6; i++){
        // Compute dot product between each plane and the centroid.
        // Your FIPR senses should be tingling, since we've got dot products!
        float32 distance = dot(p->plane, s->center) - p->plane.distance;
        if(s->radius < distance)
            return SPHEREOUTSIDE; // No intersection
        if(s->radius > -distance)
            res = SPHEREBOUNDARY; // Intersection
        p++;
    }
    return res;
}
This is a textbook case for FTRV: we are taking more than 4 dot products, and one of the two vectors (the sphere's center) is the same in every one of them, so FTRV can compute 4 of them in parallel.
So we use FIPR to accelerate two of the dot products and a single FTRV to compute the other 4 at once. The following code uses libSH4ZAM to achieve this:
int32 Camera::frustumTestSphere(const Sphere *s) const {
    int32 res = SPHEREINSIDE;
    const FrustumPlane *p = this->frustumPlanes;
    // Use FIPR to accelerate first two dot products independently
    for(unsigned i = 0; i < 2; ++i) {
        float distance = shz_vec4_dot(s->center, p->plane);
        if(s->radius < distance)
            return SPHEREOUTSIDE;
        else if(s->radius > -distance)
            res = SPHEREBOUNDARY;
        p++;
    }
    /* Since each plane is a 4D vector, we can load each one as a column vector
       into XMTRX, creating a 4x4 matrix out of the 4 planes. */
    shz_xmtrx_load_4x4_cols(&p[0].plane, &p[1].plane, &p[2].plane, &p[3].plane);
    /* Now we transform our constant vector, the bounding sphere's center, against
       our 4 plane vectors. This gives us a result vector where the value of each
       component is equal to the dot product between the sphere's centroid and the
       corresponding plane column vector. */
    shz_vec4_t distances = shz_xmtrx_trans_vec4(s->center);
    /* Now we simply iterate over each of our 4 result components to check
       for intersection. */
    for(unsigned i = 0; i < 4; ++i) {
        // Component i holds the distance to the i-th of the 4 remaining planes.
        if(s->radius < distances.elem[i])
            return SPHEREOUTSIDE;
        else if(s->radius > -distances.elem[i])
            res = SPHEREBOUNDARY;
    }
    return res;
}
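
When porting a routine like this, it can help to keep a portable scalar version around to check the FTRV path against. The sketch below is such a reference, under stated assumptions: `Vec4`, `SphereResult`, `dot4`, and `frustum_test_sphere_ref` are hypothetical names rather than DCA3 or libSH4ZAM APIs. Each plane is assumed to be stored as `(nx, ny, nz, -distance)` with an outward-facing normal, and the sphere center is assumed to be extended to `(x, y, z, 1)`, so that one 4D dot product reproduces `dot(normal, center) - distance` from the original algorithm.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;

enum SphereResult { SPHERE_OUTSIDE, SPHERE_BOUNDARY, SPHERE_INSIDE };

// Plain 4-component dot product (what one FIPR computes).
inline float dot4(const Vec4& a, const Vec4& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// Scalar reference for the sphere vs. frustum test.
// Assumption: planes are (nx, ny, nz, -distance), center is (x, y, z, 1).
inline SphereResult frustum_test_sphere_ref(const std::array<Vec4, 6>& planes,
                                            const Vec4& center, float radius) {
    assert(radius >= 0.0f);
    SphereResult res = SPHERE_INSIDE;
    for (const Vec4& plane : planes) {
        const float distance = dot4(plane, center); // signed distance to plane
        if (radius < distance)
            return SPHERE_OUTSIDE;  // entirely beyond this plane
        if (radius > -distance)
            res = SPHERE_BOUNDARY;  // straddles this plane
    }
    return res;
}
```

Running both versions over the same set of spheres and comparing the results is a cheap way to catch indexing or matrix layout mistakes in the accelerated path.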