0000 0000 0000 0000 0000 0000 1101 0100

The operation that counts the number of bits set to

*1*in a register is called

*bitcount()*or popcount(). We can also employ the definition of the Hamming weight, which is the number of elements in a vector that are different to the 0 symbol. In our example:

bitcount(212) = 4

The implementation of this operation depends on the architecture of the CPU in which we run our program. Sometimes, the CPU will have a instruction implementing the

*bitcount()*at hardware level. The instructions POPCNT and LZCNT are part of the ABM (

*Advanced Bit Manipulation*) set. These instructions are included in the x86 architecture in the SSE 4.2 instruction set.

If we have access to a hardware instruction this will be our best option, because we will obtain the result in few execution cycles. If we are programming in a high level language we must check the necessary steps to use the hardware instruction (normally we must call an specific function and compile the program with an special flag).

Nevertheless, there are situations in which we do not have access to a hardware instruction. For such cases there is a good general solution that employs bitwise operations (before going forward you should have knowledge of hexadecimal numeric representations). The following code applies for 32 bits registers:

inline uint32_t popcount(uint32_t i) {

i = i - ((i >> 1) & 0x55555555);

i = (i & 0x33333333) + ((i >> 2) & 0x33333333);

return (((i + (i >> 4)) & 0xF0F0F0F) * 0x1010101) >> 24;

}

This algorithm is known as HAKMEM, developed by Beeler, Gosper and Schroppel in 1972. As you can see the code is a bit complex, so you can add a comment stating "//Just works" or "//Magic follows". Another option is to keep reading a bit further, so you can blame me later if I do not explain it properly.

HAKMEM employs a divide and conquer approach to count the bits in parallel. Let's see how it works by following the example with the number 212.

1) Shift the initial register one bit to the right

**i >> 1**:

0000 0000 0000 0000 0000 0000

----------------------------------------

0000 0000 0000 0000 0000 0000 0

**1101 0100**----------------------------------------

0000 0000 0000 0000 0000 0000 0

**110 1010**
2) Apply the

*AND*function with the mask*0x55555555***( i >> 1 ) & 0x55555555**:
0000 0000 0000 0000 0000 0000 0110 1010

0101 0101 0101 0101 0101 0101 0101 0101

----------------------------------------

0000 0000 0000 0000 0000 0000 0100 0000

----------------------------------------

0000 0000 0000 0000 0000 0000 0100 0000

3) Subtract the result to the original value

**i -****:***( ( i >> 1 ) & 0x55555555 )*
0000 0000 0000 0000 0000 0000 1101 0100

0000 0000 0000 0000 0000 0000 0100 0000

----------------------------------------

0000 0000 0000 0000 0000 0000

----------------------------------------

0000 0000 0000 0000 0000 0000

**10****01 0100**
With this three initial steps we have obtained the total number of bits set to one in every pair of bits of the original register. If we compare the last 8 bits of the original register with our current result:

11 01 01 00

10 01 01 00

We can see that the pair

*11*has 2 bits set to one (*10*). It is followed by to pairs*01*with 1 bit set to one (*01*) and a pair*00*without any bit set (*00*).
The next operation adds this values in pairs, obtaining the number of bits set to one in each group of 4 bits, let's see how:

4) First we apply the bit mask

*0x33333333*to our last value**i & 0x33333333**:
0000 0000 0000 0000 0000 0000

**10****01 0100**
0011 0011 0011 0011 0011 0011 0011 0011

----------------------------------------

0000 0000 0000 0000 0000 0000 0001 0000

5) Then we shift the last value two bits to the right and apply the mask

**(i >> 2) & 0x33333333**:

0000 0000 0000 0000 0000 0000

----------------------------------------

0000 0000 0000 0000 0000 0000 00

**10****01 0100**----------------------------------------

0000 0000 0000 0000 0000 0000 00

**10****0101**
0011 0011 0011 0011 0011 0011 0011 0011

----------------------------------------

0000 0000 0000 0000 0000 0000 0010 0001

The next steps obtain the total number of one in each group of 8 bits (a byte).

11010100

00000100

Right now you may already understand how the algorithm works. For our example we already finished, because it only uses the 8 least significant bits. However, if you have managed to read until here there is a surprise. There is a mathematical property that can be used to calculate the bitcount of the 32 bit register in one step.

Multiplying by

We finished! I hope you enjoyed this as much as me preparing it. Comments, questions, knifes and shurikens ... are very welcome.

6) We add the two values we obtained:

0000 0000 0000 0000 0000 0000 0001 0000

0000 0000 0000 0000 0000 0000 0010 0001

----------------------------------------

0000 0000 0000 0000 0000 0000

**0011 0001**
Now we have the total number of ones in each group of four bits. If we compare the original value of the last 8 bits of 112 with our result:

1101 0100

0011 0001

We can see that the first group

*1101*has 3 bits set to one (*0011*) and that the second group*0100*has only a bit set to one (*0001*).The next steps obtain the total number of one in each group of 8 bits (a byte).

7) We add

**i + (i >> 4)**:
0000 0000 0000 0000 0000 0000

**0011 0001**
0000 0000 0000 0000 0000 0000 0000

**0011**
----------------------------------------

0000 0000 0000 0000 0000 0000 0011 0100

8) We apply the mask of bits

*0xF0F0F0F***(i + (i >> 4)) & 0xF0F0F0F**:
0000 0000 0000 0000 0000 0000 0011 0100

0000 1111 0000 1111 0000 1111 0000 1111

----------------------------------------

0000 0000 0000 0000 0000 0000 0000 0100

If we compare the original value of the last byte of our original register:

11010100

00000100

We see that it is correct,

*11010100*contains 4 bits (*00000100*).Right now you may already understand how the algorithm works. For our example we already finished, because it only uses the 8 least significant bits. However, if you have managed to read until here there is a surprise. There is a mathematical property that can be used to calculate the bitcount of the 32 bit register in one step.

Multiplying by

*0x1010101*has a special property. If the number that we multiply has four bytes (

*A, B, C, D)*, then the result of the multiplication will return a result whose 4 bytes will be (

*A+B+C+D, B+C+D, C+D, D)*. The first byte of the result contains the bit count of the 32 bit register. Now we only have to shift this result 24 bits to the right

**>> 24**to obtain the result.

We finished! I hope you enjoyed this as much as me preparing it. Comments, questions, knifes and shurikens ... are very welcome.

Some references:

Hackers delight, chapter 5