I’ve been dealing a lot with bit operations lately. And doing lots of benchmarking, like here. As I was looking for a bit count method in Pharo (it used to be there but it no longer exists in Pharo 5.0), I got curious about the many different versions of bit counting algorithms I could find on the internet.
What’s so special about bit operations you ask? Not much. Except when you have to do it really fast on 64bit integers! Like in a chess program! Millions of times per position. So instead of copying the #bitCount method that was in Squeak, I decided I’d have a look at what is available on the net…
So I decided to share what I found. This could potentially be useful for people who have to deal with bit counting a lot. Especially if you deal with 14 bits or less!
Here’s a typical run of the different bit counting algorithms I have tested on Squeak 5.1 64bit.
Number of [myBitCount1 (128 bits)] per second: 0.061M Number of [myBitCount1 (14 bits)] per second: 1.417M Number of [myBitCount1 (16 bits)] per second: 1.271M Number of [myBitCount1 (30 bits)] per second: 0.698M Number of [myBitCount1 (32 bits)] per second: 0.651M Number of [myBitCount1 (60 bits)] per second: 0.362M Number of [myBitCount1 (64 bits)] per second: 0.131M Number of [myBitCount1 (8 bits)] per second: 2.255M Number of [myBitCount2 (128 bits)] per second: 0.286M Number of [myBitCount2 (14 bits)] per second: 3.623M Number of [myBitCount2 (16 bits)] per second: 3.630M Number of [myBitCount2 (30 bits)] per second: 2.320M Number of [myBitCount2 (32 bits)] per second: 2.336M Number of [myBitCount2 (60 bits)] per second: 1.415M Number of [myBitCount2 (64 bits)] per second: 1.208M Number of [myBitCount2 (8 bits)] per second: 4.950M Number of [myBitCount3 (128 bits)] per second: 0.498M Number of [myBitCount3 (14 bits)] per second: 4.556M Number of [myBitCount3 (16 bits)] per second: 4.673M Number of [myBitCount3 (30 bits)] per second: 3.401M Number of [myBitCount3 (32 bits)] per second: 3.401M Number of [myBitCount3 (60 bits)] per second: 2.130M Number of [myBitCount3 (64 bits)] per second: 1.674M Number of [myBitCount3 (8 bits)] per second: 4.938M Number of [myBitCount4 (128 bits)] per second: 0.041M Number of [myBitCount4 (14 bits)] per second: 5.333M Number of [myBitCount4 (16 bits)] per second: 4.819M Number of [myBitCount4 (30 bits)] per second: 2.841M Number of [myBitCount4 (32 bits)] per second: 2.674M Number of [myBitCount4 (60 bits)] per second: 1.499M Number of [myBitCount4 (64 bits)] per second: 0.270M Number of [myBitCount4 (8 bits)] per second: 7.435M Number of [myBitCount5 (128 bits)] per second: 0.377M Number of [myBitCount5 (14 bits)] per second: 3.937M Number of [myBitCount5 (16 bits)] per second: 3.035M Number of [myBitCount5 (30 bits)] per second: 2.137M Number of [myBitCount5 (32 bits)] per second: 2.035M Number of [myBitCount5 (60 bits)] per second: 1.386M Number of [myBitCount5 (64 bits)] per second: 1.188M Number of [myBitCount5 (8 bits)] per second: 4.167M Number of [myBitCount6 (128 bits)] per second: 0.381M Number of [myBitCount6 (14 bits)] per second: 5.195M Number of [myBitCount6 (16 bits)] per second: 3.552M Number of [myBitCount6 (30 bits)] per second: 2.488M Number of [myBitCount6 (32 bits)] per second: 2.364M Number of [myBitCount6 (60 bits)] per second: 1.555M Number of [myBitCount6 (64 bits)] per second: 1.284M Number of [myBitCount6 (8 bits)] per second: 5.571M Number of [myPopCount14bit (14 bits)] per second: 18.349M Number of [myPopCount14bit (8 bits)] per second: 18.519M Number of [myPopCount24bit (14 bits)] per second: 7.407M Number of [myPopCount24bit (16 bits)] per second: 7.463M Number of [myPopCount24bit (8 bits)] per second: 7.018M Number of [myPopCount32bit (14 bits)] per second: 4.963M Number of [myPopCount32bit (16 bits)] per second: 5.013M Number of [myPopCount32bit (30 bits)] per second: 4.608M Number of [myPopCount32bit (32 bits)] per second: 4.619M Number of [myPopCount32bit (8 bits)] per second: 4.608M Number of [myPopCount64a (14 bits)] per second: 2.778M Number of [myPopCount64a (16 bits)] per second: 2.793M Number of [myPopCount64a (30 bits)] per second: 2.751M Number of [myPopCount64a (32 bits)] per second: 2.703M Number of [myPopCount64a (60 bits)] per second: 2.809M Number of [myPopCount64a (64 bits)] per second: 1.385M Number of [myPopCount64a (8 bits)] per second: 2.755M Number of [myPopCount64b (14 bits)] per second: 3.063M Number of [myPopCount64b (16 bits)] per second: 3.096M Number of [myPopCount64b (30 bits)] per second: 3.106M Number of [myPopCount64b (32 bits)] per second: 3.053M Number of [myPopCount64b (60 bits)] per second: 3.008M Number of [myPopCount64b (64 bits)] per second: 1.444M Number of [myPopCount64b (8 bits)] per second: 3.091M Number of [myPopCount64c (14 bits)] per second: 1.625M Number of [myPopCount64c (16 bits)] per second: 1.600M Number of [myPopCount64c (30 bits)] per second: 1.542M Number of [myPopCount64c (32 bits)] per second: 1.529M Number of [myPopCount64c (60 bits)] per second: 1.566M Number of [myPopCount64c (64 bits)] per second: 1.082M Number of [myPopCount64c (8 bits)] per second: 3.945M
Now, since method #myBitCount2 is similar to the #bitCount method in Squeak, that means there is still place for improvement as far as a faster #bitCount is needed. Now the question is : do we optimize it for the usual usage (SmallInteger), for 64bit integer or we use an algorithm that performs relatively well in most cases? Obviously, since I will always be working with 64bit positive integers, I have the luxury to pick a method that precisely works best in my specific case!
All test code I have used can be found here.
Note: Rush fans have probably noticed the reference in the title…