Fixed-and-Floating-Point Representation


In real life, we deal with real numbers -- numbers with fractional part. Most modern computer have native (hardware) support for floating point numbers. However, the use of floating point is not necessarily the only way to represent fractional numbers. This article describes the fixed point representation of real numbers. The use of fixed point data type is used widely in digital signal processing (DSP) and game applications, where performance is sometimes more important than precision. As we will see later, fixed point arithmetic is much faster than floating point arithmetic.

Fixed Point Representation

The shifting process above is the key to understand fixed point number representation. To represent a real number in computers (or any hardware in general), we can define a fixed point number type simply by implicitly fixing the binary point to be at some position of a numeral. We will then simply adhere to this implicit convention when we represent numbers.

To define a fixed point type conceptually, all we need are two parameters:

  • width of the number representation, and
  • binary point position within the number

We will use the notation fixed<w,b> for the rest of this article, where w denotes the number of bits used as a whole (the Width of a number), and bdenotes the position of binary point counting from the least significant bit (counting from 0).



Depending on where the binary point is assumed to be, a given number can be interpreted as several different values. To make programming simpler, we generally use a fixed binary point throughout the algorithm. To easily specify how many bits are used to represent the integer and fractional parts of the number, we use a notation called the Q format. For example, to specify that we are using three bits for the integer part and four bits for the fractional part, we may say that the numbers are in Q3.4 format.

Another possible notation is to specify only the length of the fractional part. This is based on the assumption that the wordlength is known for a given processor. For example, when working with a processor which has a wordlength of 16 bits, we may simply say that we are using the Q15 format to represent the numbers. This means that we are putting 15 bits to the right of the binary point and one bit to its left. In this case, the Q15 format is equivalent to the Q1.15 format.   


Floating point Representation



  • WikiNote Foundation

Last modified: Friday, 20 September 2019, 3:41 PM