## Monday, June 28, 2010

### The Wonders of Fixed Point Math - Or how I got XNA to cull thousnands of objects fast

So what is fixed point math ? Well you could say that it's an old technique to approximate decimal numbers on computers without floating point processors.

So why should we care about it then ?

A good questions there are multiple instances where using a fixed point representation can help you considerably. One is if you are running into the problems of the inconsistencies that are at the core of how we handle floating point math on modern computers. Or if you are simply running on a platform where the floating point performance is way to low and you don't need 6-7 decimals of accuracy.

For me while working professionally it has been along time since I had to use fixed point. But then again there has been plenty of cases it has been under serious discussion to resolve issues with floating point inconsistencies and I know of numerous companies that have used them to battle the problems.

So what is fixed point math ? Well you could say that it's an old technique to approximate decimal numbers on computers without floating point processors.

So why should we care about it then ?

A good questions there are multiple instances where using a fixed point representation can help you considerably. One is if you are running into the problems of the inconsistencies that are at the core of how we handle floating point math on modern computers. Or if you are simply running on a platform where the floating point performance is way to low and you don't need 6-7 decimals of accuracy.

For me while working professionally it has been along time since I had to use fixed point. But then again there has been plenty of cases it has been under serious discussion to resolve issues with floating point inconsistencies and I know of numerous companies that have used them to battle the problems.

So first out what are the problems with floating point ? Well to represent it we have to understand how the computer works with floating point numbers. Detailed explanations of the exact formats are done at numerous places for example at Wikipedia instead of trying to act like an encyclopedia we will here use a simplified view of how floating point number works that despite being simplified works very well for all uses unless you are building the numbers yourself from byte data.

A regular float is 32 bits large on most computer systems, And a double is 64 bits, Half sometimes called single is 16 bits. So what does those bits means. Well do you remember exponents from school ?

12345 = 1.2345 *10^4

In computers we have a special term for the 10th exponent that would write it like this.

12345 = 1.2345E4

floating point numbers on a computer works like this too. you have two parts the first part called Significand which is a representation of the decimal number of the base. and then Exponent which is the exponent used to calculate the real number.

However since computers are binary there are some differences to what you are used to. The general formula is as follows.

RealNumber=Significand*Base^Exponent.

Where significand can never be a higher value than base. so the extra data bits are used to represent decimal information. In fact this part works exactly like a fixed point number except the bits you use for the exponent. Since computers works in binary systems the base of course is 2.

So the Formula is

RealNumber=Significand*2^Exponent

The interesting parts here is the max value of the significand and of the exponent. The exponent has a range from -127 to 128. while the significand ranges from -8388607 to 8388608 so this might seem meningless to you but this gives us two valuable observations first out we have the max size of a float  if we take 2^128 and then convert that to a 10 exponent we will get a value range of aproximately ± ~10-44.85 to ~1038.53 According to the standard specification. The more interesting value for me is 8388608. or 8.388608 as you can think of it as. because this gives you and idea of the number of decimals in a standard floating number as you can see we have 6 decimals here. But due to conversion effects etc we will have slightly less than this in reality but lets go with 6 for now. We should also mention here that performing a multiplication or division invalidates 1-2 decimals lets say one for the sake of simplicity leaving us with 5 decimals.

This is obviously aplenty if we deal with numbers between 1 and 10 then we have 5 decimals of free use and this should be enough for most things. but what happens if we work on numbers between 10 000 and 100 000 for example in a really large game world. What will happens is that we have only 1 decimal of precision left. which means if this is a map we only have 0.1 M as our smallest granularity. and this is discounting the problems inherent in the conversions between binary and decimal numbers.

So any advances calculations like intersecting lines with lines etc. Will have vastly different precision depending on where on the map it happens. This can't be good. A consistent precision is vital to a consistent game. So there is one of the biggest problems with floating point.

And is fixed point better ? Well theoretically you could create a pretty neat fixed point using 64 bit values but then again 64 bits doubles has a lot better precision too. So fixed point doesn't have a better precision in fact it's probably worse and even worse fixed point  numbers normally range from -32767 to 32768 only. But if you can live with this range (or well you can actually trade range for precision freely) it has a constant precision which means if you get your calculations to work somewhere in the world they will work everywhere.

And of course if you are developing for a platform with a very weak floating point performance it (like XNA on the Xbox360) it can be a real performance saver. Actually as an example for the rest of the article we will look at this from the perspective of a XNA programmer getting desperate when he gets CPU limited on xbox performing culling on a measly 10 000 objects (in a oct tree of course but they has culling costs too, and with cascaded shadow volumes you have to perform the culling test a lot of times.).

So lets look at how fixed point works. Well do you remember fractions from school, that is good cause they are going to be handy now.

A fixed point works by dividing the 32 bit number into two parts for general use the division I have proposed here with 16 bits is the most normal but any division that works for your case is good.

The idea is that the two parts of the numbers represent different parts of the number you want to represent. The integer part represent well the integer part of the number. While for the decimal part you are using fractions to represent the number. In this case the precision for my fractional part is 1/66536 so the decimal part of my number will be represented as this fraction. This makes it quite convenient to  transform between fixed and float (btw I normally use signed ints to keep my fixed point numbers due to the slowness of C# on xbox with function calls).

float fp=1.234567;

int fixed = (int)fp*65536;

And to convert it back

fp=(float)fixed/65536.0f;

if you do run through this procedure you will notice a hughe loss of precision. Of course this was the  bestcase for floating point. if we would have assigned fp to 12345.6789123 we would probabely gotten a higher precision form the fixed point. But the important point is that fixed points precision is consistent.

So how do we work with fixed point numbers ? First we are gonna go through the 4 usual calculation methods what you can do if you have assembler access and what you have to do else. And finally we will look at preshifted fixed point which is a special case optimization useful if you only work with your numbers in a certain way.

First out addition. This is an easy one it just works straight of the bat.

{
return(aNumber+aSecondNumber);
}

Why this works is pretty obvious of course you can add together the integer parts and if you add together the fractional part and they overflow into the integer part they just add 1 just as what should happen. Subtraction works just as fine too.

int FixedSubtract(int aNumber,aSecondNumber)
{
return(aNumber-aSecondNumber);

}

So far so good fixed is just as fast as regular int. So now lets look at multiplications this is where things get though for starter if you multiply together two 32 bits number the maximum resulting number is a 64 bit number. This can't be solved easily in a high level language without using a 64 bit number which would be slow. However in Assembly language there isn't a problem since they full 64 bits result is stored in registers.

But what happens if we multiply together 2 fixed point numbers let's look at a case where we call the numbers A and B.
The end result is A.Integer(*65536)*B.Integer*(65536) +A.Integer*(65536)*B.Fractional+B.Integer*(65536)*A.Fractional+A.Fractional*B.Fractional.

If you wonder about all those 65536 you have to remember that the integer part begins after the 16th bit so that is why. Lets first look at the integer parts. What we get is A.Integer(*65536)*B.Integer*(65536).. But what we want to get is A.Integer*B.Integer*(65536). We need one 65536 to keep the integer in the integer part of the number but we get an extra *65536 or as we will call it a leftshift  16. (<< 16 in most high level languages this equals a multiplication with 65536 for all integer numbers)

If we look at the other parts we will notice the same pattern  for a fractional we get A.Integer*(65536)*B.Fractional. Which means that the fractional also gets an extra 65536.

So what we want to to is to take the end result and divide it by 65536 (which is the same as rightshift 16 , >> 16)

If we do this in assembly we can do it quite neatly. (if you don't know assembly no worry just skip this part) btw this code isn't even trying to be optimized

mov eax,A  // places the number A in the register EAX
mov ebx,B // places B in the register EBX they are booth 32 bit registers
mul ebx
// multipliex whatever is in EAX with ebx, and stores the lower 32 bits in eax and the higher in
// EDX

shl edx,16
// shifts edx leftward so that the lower 16 bits are the higher 16 This means that if
// the end result of the multiplication is above 65536 it will be cut of.
shr eax 16.
// shifts right eax 16 bits meaning the higher part of eax is now in the lower par.
mov dx,ax
// moves the lower 16 bits of eax called ax into the lower 16 bits of edx called dx

Even the assembly version  can't handle numbers that ends above 65536 but it can perform the calculations at full precision what can we do if we work with a high level language ? Well we have to sacrifice precision so that we are sure that we stay in 32 bits unless we overflow.

int FixedMulint aNumber,aSecondNumber)
{
return((aNumber >> 8)*(aSecondNumber>>8));

}

So we left shift booth the numbers lowering our precision to 1/256 part which still is plenty for a lot of stuff like culling etc.

Division works similarely except that we need to shift the other way. This means that division for numbers above 256 won't work properly this is again a constraint of working in a high level language an assembly verision can be constructed that can handle the full numbers. you could also handle the the full numbers if you sacrifice the decimal part of the result or make any kind of balancing between. but the default verision is like this.

int FixeDiv(int aNumber,aSecondNumber)
{
return((aNumber << 8)/(aSecondNumber<<8));

}

you can also construct fixed point numbers that are 24:8 or 8:24 or any numbers in between or even using 64 bits. We used 16 bit fixed point back in the software rendering days so it all depends.

But I normally use 16:16 as it gives a good range and well I avoid using divisions if i can in any way. most divisions with a fixed number can be converted into multiplications so.

So out last now is Preshifted Fixed point. If you know that you are only ever gonna use additions,subtractions and multiplications for your fixed point and that there is only going to be one multiplication and you are going to use it last you can use preshifted fixed point. This is especially useful in cases like culling.

Preshifted means that you right shift your numbers 8 bits when you create them this works great for static numbers this way you won't have to perform the shift every frame and an even get multiplication at full integer speed (which is great for XNA).

struct FixedPointPlane
{
public FixedPointPlane(Plane aPlane)
{
myNormal = new Vector3DFixed(aPlane.Normal);
D = FixedPoints.FloatToFixed(aPlane.D);
myPreShiftedNormal.X = myNormal.X >> 8;
myPreShiftedNormal.Y = myNormal.Y >> 8;
myPreShiftedNormal.Z = myNormal.Z >> 8;
}

public int DistFromPlane(Vector3DFixed aPosition)
{
return (0);
}

public int DistFromPlanePreShifted(Vector3DFixed aPosition)
{
return (myPreShiftedNormal.X * aPosition.X + myPreShiftedNormal.Y * aPosition.Y + myPreShiftedNormal.Z * aPosition.Z + D);
}

Vector3DFixed myNormal;
Vector3DFixed myPreShiftedNormal;
int D;
}

using the above planes instead of XNA planes allowed my culling to go from 50% of a cpu to 1-2%. Which allowed me to for example use 6 cascades of 512x512 instead of 3 1024x1024 and even get a better visual quality and better framerate (if you are unfamiliar with cascade shadow maps I will cover them in a later article). As you can see I never even bothered to implement the non preshifted version because the preshifted is so much easier and well you only create those planes once per frame so the cost is nothing. Of course aPosition is preshifted also.  I'm actually gonna go aganist my instincts that says it's better for you guys to learn by creating and drop the full version of the naive sphere vs frustum code here (it can be optimized a bit more but this is best for readability)

class FixedPoints
{
public static int FloatToFixed(float aFloat)
{
return (int)(aFloat * 65536.0f);
}
public static int FixedMul(int aFirstFixed, int aSecondFixed)
{
aFirstFixed = aFirstFixed >> 8;
aSecondFixed = aSecondFixed >> 8;
aFirstFixed *= aSecondFixed;
return (aFirstFixed);
}
//      Only as References never meant to be used;
public static int FixedAdd(int aFirstFixed, int aSecondFixed)
{
return (aFirstFixed + aSecondFixed);
}

public static int FixedSub(int aFirstFixed, int aSecondFixed)
{
return (aFirstFixed - aSecondFixed);
}
}

public struct Vector3DFixed
{
public Vector3DFixed(Vector3 aVector)
{
X = FixedPoints.FloatToFixed(aVector.X);
Y = FixedPoints.FloatToFixed(aVector.Y);
Z = FixedPoints.FloatToFixed(aVector.Z);
}
public int X;
public int Y;
public int Z;
}

public struct FixedPointBoundingSphere
{
public Vector3DFixed myPosition;
public Vector3DFixed myPreShiftedPosition;

public FixedPointBoundingSphere(BoundingSphere aSphere)
{
myPosition = new Vector3DFixed(aSphere.Center);
myPreShiftedPosition.X = myPosition.X >> 8;
myPreShiftedPosition.Y = myPosition.Y >> 8;
myPreShiftedPosition.Z = myPosition.Z >> 8;

}
public void Move(Vector3 aPosition)
{
myPosition = new Vector3DFixed(aPosition);
myPreShiftedPosition.X = myPosition.X >> 8;
myPreShiftedPosition.Y = myPosition.Y >> 8;
myPreShiftedPosition.Z = myPosition.Z >> 8;
}
}

public class FixedPointFrustrum
{
FixedPointPlane[] myPlanes = new FixedPointPlane;
Plane[] myOrigPlanes = new Plane;

FixedPointFrustrum()
{
}

// To do Add Fast box vs frustrum test

public bool Intersects(FixedPointBoundingSphere aBoundingSphere)
{
// To Do
// For Shadow culling use a Special Paralell Frustrum;

if (myPlanes.DistFromPlanePreShifted(aBoundingSphere.myPreShiftedPosition) > aBoundingSphere.myRadius) return (false);
if (myPlanes.DistFromPlanePreShifted(aBoundingSphere.myPreShiftedPosition) > aBoundingSphere.myRadius) return (false);
if (myPlanes.DistFromPlanePreShifted(aBoundingSphere.myPreShiftedPosition) > aBoundingSphere.myRadius) return (false);
if (myPlanes.DistFromPlanePreShifted(aBoundingSphere.myPreShiftedPosition) > aBoundingSphere.myRadius) return (false);
if (myPlanes.DistFromPlanePreShifted(aBoundingSphere.myPreShiftedPosition) > aBoundingSphere.myRadius) return (false);
if (myPlanes.DistFromPlanePreShifted(aBoundingSphere.myPreShiftedPosition) > aBoundingSphere.myRadius) return (false);
return (true);
}

public void Set(BoundingFrustum aFrustrum)
{
myPlanes = new FixedPointPlane(aFrustrum.Near);
myPlanes = new FixedPointPlane(aFrustrum.Far);
myPlanes = new FixedPointPlane(aFrustrum.Bottom);
myPlanes = new FixedPointPlane(aFrustrum.Left);
myPlanes = new FixedPointPlane(aFrustrum.Right);
myPlanes = new FixedPointPlane(aFrustrum.Top);

myOrigPlanes = aFrustrum.Near;
myOrigPlanes = aFrustrum.Far;
myOrigPlanes = aFrustrum.Bottom;
myOrigPlanes = aFrustrum.Left;
myOrigPlanes = aFrustrum.Right;
myOrigPlanes = aFrustrum.Top;
}

public FixedPointFrustrum(BoundingFrustum aFrustrum)
{
myPlanes = new FixedPointPlane(aFrustrum.Near);
myPlanes = new FixedPointPlane(aFrustrum.Far);
myPlanes = new FixedPointPlane(aFrustrum.Bottom);
myPlanes = new FixedPointPlane(aFrustrum.Left);
myPlanes = new FixedPointPlane(aFrustrum.Right);
myPlanes = new FixedPointPlane(aFrustrum.Top);

myOrigPlanes = aFrustrum.Near;
myOrigPlanes = aFrustrum.Far;
myOrigPlanes = aFrustrum.Bottom;
myOrigPlanes = aFrustrum.Left;
myOrigPlanes = aFrustrum.Right;
myOrigPlanes = aFrustrum.Top;

}
As you can see from my comments I didn't even bother to move the box vs frustum to fixed point I got the performance necessary already. Once it becomes a problem I will do that too of course.

Remember this is not the fastest way of doing this but it was enough.Never spend time optimizing things that doesn't matter. I think there is a good chance I will have to fix this later but for now I can spend my time on better things.

Dropping code still feels weird but tell me if you like it better this way or not.

#### 1 comment:

1. Hi Niklas, funny I should stumble across you here as well, you seem to be on the end of several of my google searches lately. I wonder if you have benchmarked fixed point vs. floating point in an apples-to-apples test? I was under the impression that modern processors often perform floating point operations faster than integer, but perhaps I am mistaken. And thanks for posting!