Tabs

Monday, 9 January 2017

Counting bits

Counting bits is a very important task in many problems, like data compression. In a computer all the numeric values are stored in binary format. For example, if we want to encode the number 212 in a 32 bits register it will look like this:

0000 0000 0000 0000 0000 0000 1101 0100

The operation that counts the number of bits set to 1 in a register is called bitcount() or popcount(). We can also employ the definition of the Hamming weight, which is the number of elements in a vector that are different to the 0 symbol. In our example:

bitcount(212) = 4

The implementation of this operation depends on the architecture of the CPU in which we run our program. Sometimes, the CPU will have a instruction implementing the bitcount() at hardware level. The instructions POPCNT and LZCNT are part of the ABM (Advanced Bit Manipulation) set. These instructions are included in the x86 architecture in the SSE 4.2 instruction set.

If we have access to a hardware instruction this will be our best option, because we will obtain the result in few execution cycles. If we are programming in a high level language we must check the necessary steps to use the hardware instruction (normally we must call an specific function and compile the program with an special flag).

Nevertheless, there are situations in which we do not have access to a hardware instruction. For such cases there is a good general solution that employs bitwise operations (before going forward you should have knowledge of hexadecimal numeric representations). The following code applies for 32 bits registers:

inline uint32_t popcount(uint32_t i) {
    i = i - ((i >> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
    return (((i + (i >> 4)) & 0xF0F0F0F) * 0x1010101) >> 24;
}

This algorithm is known as HAKMEM, developed by Beeler, Gosper and Schroppel in 1972. As you can see the code is a bit complex, so you can add a comment stating "//Just works" or "//Magic follows". Another option is to keep reading a bit further, so you can blame me later if I do not explain it properly.

HAKMEM employs a divide and conquer approach to count the bits in parallel. Let's see how it works by following the example with the number 212.

1) Shift the initial register one bit to the right i >> 1:


0000 0000 0000 0000 0000 0000 1101 0100
----------------------------------------
0000 0000 0000 0000 0000 0000 0110 1010

2) Apply the AND function with the mask 0x55555555 ( i >> 1 ) & 0x55555555:

0000 0000 0000 0000 0000 0000 0110 1010
0101 0101 0101 0101 0101 0101 0101 0101
----------------------------------------
0000 0000 0000 0000 0000 0000 0100 0000

3) Subtract the result to the original value i - ( ( i >> 1 ) & 0x55555555 ):

0000 0000 0000 0000 0000 0000 1101 0100
0000 0000 0000 0000 0000 0000 0100 0000
----------------------------------------
0000 0000 0000 0000 0000 0000 1001 0100


With this three initial steps we have obtained the total number of bits set to one in every pair of bits of the original register. If we compare the last 8 bits of the original register with our current result:

11 01  01 00
10 01  01 00


We can see that the pair 11 has 2 bits set to one (10). It is followed by to pairs 01 with 1 bit set to one (01) and a pair 00 without any bit set (00).

The next operation adds this values in pairs, obtaining the number of bits set to one in each group of 4 bits, let's see how:

4) First we apply the bit mask 0x33333333 to our last value i & 0x33333333:

0000 0000 0000 0000 0000 0000 1001 0100
0011 0011 0011 0011 0011 0011 0011 0011
----------------------------------------
0000 0000 0000 0000 0000 0000 0001 0000

5) Then we shift the last value two bits to the right and apply the mask (i >> 2) & 0x33333333:


0000 0000 0000 0000 0000 0000 1001 0100
----------------------------------------
0000 0000 0000 0000 0000 0000 0010 0101
0011 0011 0011 0011 0011 0011 0011 0011
----------------------------------------
0000 0000 0000 0000 0000 0000 0010 0001

6) We add the two values we obtained:

0000 0000 0000 0000 0000 0000 0001 0000
0000 0000 0000 0000 0000 0000 0010 0001
----------------------------------------
0000 0000 0000 0000 0000 0000 0011 0001

Now we have the total number of ones in each group of four bits. If we compare the original value of the last 8 bits of 112 with our result:

1101 0100
0011 0001

We can see that the first group 1101 has 3 bits set to one (0011) and that the second group 0100 has only a bit set to one (0001).

The next steps obtain the total number of one in each group of 8 bits (a byte).

7) We add i + (i >> 4):

0000 0000 0000 0000 0000 0000 0011 0001
0000 0000 0000 0000 0000 0000 0000 0011
----------------------------------------
0000 0000 0000 0000 0000 0000 0011 0100

8) We apply the mask of bits 0xF0F0F0F (i + (i >> 4)) & 0xF0F0F0F:

0000 0000 0000 0000 0000 0000 0011 0100
0000 1111 0000 1111 0000 1111 0000 1111
----------------------------------------
0000 0000 0000 0000 0000 0000 0000 0100

If we compare the original value of the last byte of our original register:

11010100

00000100


We see that it is correct, 11010100 contains 4 bits (00000100).

Right now you may already understand how the algorithm works. For our example we already finished, because it only uses the 8 least significant bits. However, if you have managed to read until here there is a surprise. There is a mathematical property that can be used to calculate the bitcount of the 32 bit register in one step.

Multiplying by 0x1010101 has a special property. If the number that we multiply has four bytes (A, B, C, D), then the result of the multiplication will return a result whose 4 bytes will be (A+B+C+D, B+C+D, C+D, D). The first byte of the result contains the bit count of the 32 bit register. Now we only have to shift this result 24 bits to the right >> 24 to obtain the result.

We finished! I hope you enjoyed this as much as me preparing it. Comments, questions, knifes and shurikens ... are very welcome.

Some references:

Hackers delight, chapter 5