• 12 mins read
A computer is an electronic device, which is composed of many circuits inside it. Each circuit has only 2 states: ON and OFF, or 0 and 1, so it can only store binary values. As a result, computers can only store and process sequences of 0 and 1. So all kinds of data has to be converted into binary number before computers can understand them.
The value 0 and 1 are the smallest units of data. They are called bits. 8 bits combine to a byte and 4 bytes combine to a word.
Now, if you don't know how to convert decimal numbers to binary numbers, take a look at this video and do some exercises before continuing.
There are 3 common ways to represent a block of data, which are binary, octal, and hexadecimal. Let's try to represent number 129 in the 3 different ways.
First we convert number 129 from decimal to binary, which is
So 129 is represented as
10000001 in binary.
An octal number can represent number from
7, which is from
111. So an octal can
represent 3 binary digits. To represent
10000001, which is 8 binary digits, we need 3 octal
10000001 is now splited into 3 parts
001. After that, we convert each part
to octal number, which is
1. To distinguish it with decimal number, we prefix it with
0. So 129 is represented as
0201 in octal.
Hexadecimal number is similar to octal number, but it can represent number from
1111. So an hexadecimal can represent 4 binary digits. Now we're splitting the
10000001 into 2 parts of 4 bits, which is
0001. Then convert each part to
hexadecimal, we have the value of
81. To distinguish it with decimal number, we prefix it with
0x. So 129 is represented
0x81 in hexadecimal.
Next we're gonna see how each type of data: integer, rational, and irrational number are stored in memory.
2.1 Unsigned Integer
Unsigned integers are generally stored as simple binary numbers. For example,
1 is stored as
2 is stored as
129 is stored as
2.2 Signed Integer
There are 3 most common formats to store signed integer numbers.
a. Signed Magnitude
Signed Magnitude format uses the left most bit as the sign indication, and treat the remaining bit as they represent unsigned integer.
The convention for the left most bit (most significant bit):
0 for positive,
1 for negative. For
The problem with this system is that it does not support binary arithmetic, which is the most important for computers. For example:
100111 // -7 000111 // 7 100111 + 000111 = 101110 // -14, which is not 0 as we expected.
b. One's Complement
One's Complement of a binary number is just another binary number that when we add it to the original number, the result is a binary number with all bits are 1. To obtain binary number, you can simply flip all the bits.
7, then it's one's complement is
111000, which is
56. You can notice
56 + 7 = 63 = 2^6-1. So one's component of a number is the result of the max value that can
be represented with number of bits minus that number itself. With this system, we also have the
first bit as the sign indication.
To see which value a negative binary number represents, we just find it's one's complement. For
100011 is a negative number, it's one's complement is
011100, which is
We have two 0 values in this system.
000000 which is
111111 which is
In this system, the binary arithmetic is partially solved. If we add
-12, we get
001100, it's one's complement is
110011, when we add these 2 values, we get
111111, which is -0 as expected.
But there are still some special cases: For example, add
3 // 000011, -2 // 111101, 000011 + 111101 = 000000 // which is +0, not 1 as we expected.
In this case, we need to add the leftmost (the carry, which is
000001) to the result. After adding
the carry, we got
000001 which is 1 as expected.
Let's try another example, add
10 // 001010 -4 // 111011 001010 + 111011 = 000101 // 5 000101 + 000001 = 000110 // add the carry
After adding the carry, we got 6 as expected.
c. Two's Complement
Contrary with one's complement, a two's complement of a number is another number that when we add it to the original number, we get a number that all bits are 0. To find the two's complement of a number, we first find its one's complement then add 1 to it.
For example, we have
7 which is
000111, it's one's complement is
111000, then it's two's
111000 + 1 = 111001. When we add a number and it's two's complement, we get a number
which all bits are 0. In this example,
000111 + 111001 = 000000.
We still use the first bit as sign indicator. To find which value a negative number represents, we just find it's two's complement.
For example, we have a number
100111, it's one's complement is
011000, it's two's complement is
011001, which is 25. Then the original number is -25.
In this system, we have only one representation of 0, which is
000000. and the value of
-32. Now with 6 bits, we can represent values from
signed magnitude and one's complement, we can only have values from
31. Moreover, the
binary arithmetic is all solved in two's complement.
Integer is usually store in 16 bits, 32 bits or 64 bits, depending on how big the number is. They
are often called
long type respectively.
Unsigned integers are stored as simple binary numbers.
Signed integers are often stored as 2's complement so that it can store maximum number of values and leverage arithmetic calculation.
3. Rational number
3.1 What is floating point number?
There are 3 common ways to represent a rational number.
- Decimal representation, such as:
- Fractional representation, such as:
- Floating points number, such as:
Floating point number is a format to represent rational number with the point (.) can be moved (floating).
3.2 How floating point number is stored?
a. Scientific notation
First, we have to know how to represent a number in scientific notation. It's simple. We rewrite the
number in 2 parts. The first part is just the digits with the point right after the first digit.
Then multiply it with a number that put the point where it should be. Then rewrite that number in
base^x, which is the second part.
For example, let's write 12345 in scientific notation.
12345 = 1.2345 x 10.000 // so the first part is 1.2345 // decimal number -> the base is 10 10.000 = 10^4 // so the second part is 10^4 => 1.2345 * 10^4 is scientific notation
Do the same for
101.101, which is a binary number, we have
1.01101 * 2^2 as its scientific
b. IEEE 754
Nearly all hardwares and programming languages use floating point number in the same binary format, which is specified in IEEE Standard for Floating-Point Arithmetic (IEEE 754). Basically, it normalizes the number to the scientific notation that we already learn above, then use a binary number to represent that normalized number.
There are 2 most used formats that are 32 bit (float) and 64 bit (double). They are divided into 3 parts: sign bit, exponent and significand (or mantissa ).
Let's use the number
101.101 as our example. We normalize it to scientific notation first
101.101 = 1.01101x2^2
a. sign bit: The leftmost, or the most significant bit (MSB) is used to indicate the sign of the number, 0 is positive and 1 is negative. In our case, it's a positive number so the sign bit is 0.
b. the exponent: use to indicate the distance between the binary point of the original number and the binary point of the normalized number.
101.101 -.12--- 1.01101
In our example, the normalized number is
1.01101x2^2 so the exponent is 2.
To represent both negative and positive exponent, we use a bias to add to the actual exponent. For
example: if we use 6 bit to represent number, then we can represent only unsigned number from
111111, which is from
63. But we want to represent negative numbers too, so
we find a bias following the IEEE 754 formula.
bias = 2^(n - 1) - 1 = 2^(6 - 1) - 1 = 31
Then we can represent number from
(63-31), which is
In the above number, the actual value of the exponent that is stored is
2 + 31 = 33, which is
100001. In other words, to store the number 2 as exponent, we don't use
c. The mantissa: because in the normalized form (scientific notation form), the integer part
is always 1, so we only have to store the fraction part. That fraction part (the digits after the
point) is call the mantissa. In the above number:
1.01101x2^2, the mantissa we need to store is
So the number
101.101 (which is
5.625 in decimal) will be represent as the following using
floating point 32 bit:
|0||1000 0001||0110 1000 0000 0000 0000 000|
- number is positive → bit sign is 0
- next 8 bit is for exponent:
2^2, the exponent is
2, then we add it to the bias, which is
127for 8 bit then we get
129→ exponent is
- last 23 bits is for mantissa, as we talk above, the mantissa is
Floating point numbers using 32 bits are called
single. Floating point
numbers using 64 bits are called
double. It has 3 parts similar to
but it uses different numbers of bits to store the sign bit, exponent, and mantissa.
|single||1 bit||8 bits||23 bits|
|double||1 bit||11 bits||52 bits|
3.3 Why the exponent use bias, not one's complement or two's complement
If we use 1's complement or 2's complement, it will be hard for comparison of numbers.
For example, we want to compare the number 5.625 above with 5. we will have the representation of 5 in 1's complement and 2's complement and in biased
|1's complement:||0||1111 1101||0100 0000 0000 0000 0000 000|
|2's complement:||0||1111 1110||0100 0000 0000 0000 0000 000|
|biased:||0||1000 0001||0100 0000 0000 0000 0000 000|
When looking at 1's complement and 2's complement exponent numbers and compare it to 5.625 above, we can not determine which number is larger without additional computation. On the other hand, with biased exponent, we can easily compare bit by bit from left to right to determine which number is larger. This is faster for computer to compute.
4. Irrational Number
In general, computers cannot store or manipulate all irrational numbers. Most irrational numbers are not computable, and therefore can't be handled or evaluated at all by a computer. Since we cannot compute such numbers, there is never any need to represent them.
number. There are no different types
store as double-precision (64 bits) floating-point number, which is described in the section number
0.1 + 0.2; // 0.30000000000000004
Floating-point arithmetic can only produce approximate results, rounding to the nearest representable real number. When you perform a sequence of calculations, these rounding errors can accumulate, leading to less and less accurate results. Rounding also causes surprising deviations from the kind of properties we usually expect of arithmetic. For example, real numbers are associative, meaning that for any real numbers x, y, and z, it’s always the case that (x + y) + z = x + (y + z). But this is not always true of floating-point numbers:
(0.1 + 0.2) + 0.3; // 0.6000000000000001 0.1 + (0.2 + 0.3); // 0.6
Floating-point numbers offer a trade-off between accuracy and performance. When accuracy matters, it's critical to be aware of their limitations.
One useful workaround is to work with integer values wherever possible, since they can be represented without rounding. When doing calculations with money, programmers often scale numbers up to work with the currency’s smallest denomination so that they can compute with whole numbers. For example, if the above calculation were measured in dollars, we could work with whole numbers of cents instead:
(10 + 20) + 30; // 60 10 + (20 + 30); // 60
- Understand how computer stores numbers in general.
7. Sources and related reading
- Computer Memory
- How computers represent negative binary numbers?