1001001 SOS

10 février 2017

bitcounting

I’ve been dealing a lot with bit operations lately.  And doing lots of benchmarking, like here. As I was looking for a bit count method in Pharo (it used to be there but it no longer exists in Pharo 5.0), I got curious about the many different versions of bit counting algorithms I could find on the internet.

What’s so special about bit operations you ask?  Not much.  Except when you have to do it really fast on 64bit integers!  Like in a chess program!  Millions of times per position. So instead of copying the #bitCount method that was in Squeak, I decided I’d have a look at what is available on the net…

So I decided to share what I found.  This could potentially be useful for people who have to deal with bit counting a lot. Especially if you deal with 14 bits or less!

Here’s a typical run of the different bit counting algorithms I have tested on Squeak 5.1 64bit.

Number of [myBitCount1 (128 bits)] per second: 0.061M
Number of [myBitCount1 (14 bits)] per second: 1.417M
Number of [myBitCount1 (16 bits)] per second: 1.271M
Number of [myBitCount1 (30 bits)] per second: 0.698M
Number of [myBitCount1 (32 bits)] per second: 0.651M
Number of [myBitCount1 (60 bits)] per second: 0.362M
Number of [myBitCount1 (64 bits)] per second: 0.131M
Number of [myBitCount1 (8 bits)] per second: 2.255M
Number of [myBitCount2 (128 bits)] per second: 0.286M
Number of [myBitCount2 (14 bits)] per second: 3.623M
Number of [myBitCount2 (16 bits)] per second: 3.630M
Number of [myBitCount2 (30 bits)] per second: 2.320M
Number of [myBitCount2 (32 bits)] per second: 2.336M
Number of [myBitCount2 (60 bits)] per second: 1.415M
Number of [myBitCount2 (64 bits)] per second: 1.208M
Number of [myBitCount2 (8 bits)] per second: 4.950M
Number of [myBitCount3 (128 bits)] per second: 0.498M
Number of [myBitCount3 (14 bits)] per second: 4.556M
Number of [myBitCount3 (16 bits)] per second: 4.673M
Number of [myBitCount3 (30 bits)] per second: 3.401M
Number of [myBitCount3 (32 bits)] per second: 3.401M
Number of [myBitCount3 (60 bits)] per second: 2.130M
Number of [myBitCount3 (64 bits)] per second: 1.674M
Number of [myBitCount3 (8 bits)] per second: 4.938M
Number of [myBitCount4 (128 bits)] per second: 0.041M
Number of [myBitCount4 (14 bits)] per second: 5.333M
Number of [myBitCount4 (16 bits)] per second: 4.819M
Number of [myBitCount4 (30 bits)] per second: 2.841M
Number of [myBitCount4 (32 bits)] per second: 2.674M
Number of [myBitCount4 (60 bits)] per second: 1.499M
Number of [myBitCount4 (64 bits)] per second: 0.270M
Number of [myBitCount4 (8 bits)] per second: 7.435M
Number of [myBitCount5 (128 bits)] per second: 0.377M
Number of [myBitCount5 (14 bits)] per second: 3.937M
Number of [myBitCount5 (16 bits)] per second: 3.035M
Number of [myBitCount5 (30 bits)] per second: 2.137M
Number of [myBitCount5 (32 bits)] per second: 2.035M
Number of [myBitCount5 (60 bits)] per second: 1.386M
Number of [myBitCount5 (64 bits)] per second: 1.188M
Number of [myBitCount5 (8 bits)] per second: 4.167M
Number of [myBitCount6 (128 bits)] per second: 0.381M
Number of [myBitCount6 (14 bits)] per second: 5.195M
Number of [myBitCount6 (16 bits)] per second: 3.552M
Number of [myBitCount6 (30 bits)] per second: 2.488M
Number of [myBitCount6 (32 bits)] per second: 2.364M
Number of [myBitCount6 (60 bits)] per second: 1.555M
Number of [myBitCount6 (64 bits)] per second: 1.284M
Number of [myBitCount6 (8 bits)] per second: 5.571M
Number of [myPopCount14bit (14 bits)] per second: 18.349M
Number of [myPopCount14bit (8 bits)] per second: 18.519M
Number of [myPopCount24bit (14 bits)] per second: 7.407M
Number of [myPopCount24bit (16 bits)] per second: 7.463M
Number of [myPopCount24bit (8 bits)] per second: 7.018M
Number of [myPopCount32bit (14 bits)] per second: 4.963M
Number of [myPopCount32bit (16 bits)] per second: 5.013M
Number of [myPopCount32bit (30 bits)] per second: 4.608M
Number of [myPopCount32bit (32 bits)] per second: 4.619M
Number of [myPopCount32bit (8 bits)] per second: 4.608M
Number of [myPopCount64a (14 bits)] per second: 2.778M
Number of [myPopCount64a (16 bits)] per second: 2.793M
Number of [myPopCount64a (30 bits)] per second: 2.751M
Number of [myPopCount64a (32 bits)] per second: 2.703M
Number of [myPopCount64a (60 bits)] per second: 2.809M
Number of [myPopCount64a (64 bits)] per second: 1.385M
Number of [myPopCount64a (8 bits)] per second: 2.755M
Number of [myPopCount64b (14 bits)] per second: 3.063M
Number of [myPopCount64b (16 bits)] per second: 3.096M
Number of [myPopCount64b (30 bits)] per second: 3.106M
Number of [myPopCount64b (32 bits)] per second: 3.053M
Number of [myPopCount64b (60 bits)] per second: 3.008M
Number of [myPopCount64b (64 bits)] per second: 1.444M
Number of [myPopCount64b (8 bits)] per second: 3.091M
Number of [myPopCount64c (14 bits)] per second: 1.625M
Number of [myPopCount64c (16 bits)] per second: 1.600M
Number of [myPopCount64c (30 bits)] per second: 1.542M
Number of [myPopCount64c (32 bits)] per second: 1.529M
Number of [myPopCount64c (60 bits)] per second: 1.566M
Number of [myPopCount64c (64 bits)] per second: 1.082M
Number of [myPopCount64c (8 bits)] per second: 3.945M

Now, since method #myBitCount2 is similar to the #bitCount method in Squeak, that means there is still place for improvement as far as a faster #bitCount is needed.  Now the question is : do we optimize it for the usual usage (SmallInteger), for 64bit integer or we use an algorithm that performs relatively well in most cases?  Obviously, since I will always be working with 64bit positive integers, I have the luxury to pick a method that precisely works best in my specific case!

All test code I have used can be found here.

Note: Rush fans have probably noticed the reference in the title…

Publicités