That guy

3 avril 2017

We all know that guy. You know, that guy who sees performance improvements everywhere, all the time?

That programmer who squeezes everything into bitmaps because « it’s so much faster » ?  And when I say everything, I mean everything! Even if your class only has one flag!

That guy who caches everything in the application because « that’s the optimal way to do it« ?  Even if the data that is cached is just accessed only one time?

That guy who fits a date into an integer in the database because « it’s so much more compact » ?  There goes all your SQL and date functions!

That developer who always ends up re-implementing another sorting algorithm because he read a « great paper » on the subject and « it’s proven that it’s 0.2% faster than the default sort » available?  And after an insane debugging session, you finally realize he has overriden SortedCollection>>#sort: ?  And that his code just doesn’t work properly!

You know, that guy with a C/C++ background who spends countless hours optimizing everything not realizing he has to « make it work » first! You know, that guy who still doesn’t get that often times you only need to optimize very small parts of an application to make a real difference?

You know that guy with strange concepts such as « defensive programming » who tells you nobody ever caught a bug in his code in production? You know, that guy who came up with the « clever » catch-everything method #ifTrue:ifFalse:ifNil:onError:otherwise: ?

You know that guy who never works for more than 4 months at the same place because « they didn’t get it, they’re a bunch of morons » ?

You know, that guy who prefers to implement complex database queries with a #do: loop and a gazillion SELECT statement because « SELECT statements with a 1-row lookup are very optimized by the database server » instead of using JOINs. And then he blames the slow response time of his « highly optimized » data retrieval code on the incompetence of the DBAs maintaining the database?

You know, that guy who once told me « inheritance in Smalltalk is very bad because the deeper the class in the hierarchy, the slower the method lookup is going to be » so that’s why he always preferred « flattened hierarchies » (meaning promoting ALL instance variables into one root class) with 2-3 levels deep in the worst case? « Besides, my code is easier to understand and debug since everything is in one place« .

Well, I was going through some of the sh*ttiest code I’ve seen in a long time last night and I remembered that guy and his ideas about « highly optimized flattened hierarchies » and thought I’d measure his theory!  Here’s the script.

Basically, it creates a hierarchy of classes (10000 subclasses) and times message sends from the top class and the bottom class to measure the difference.

Well, that guy was right… There’s a 1 millisecond difference over 9 million message sends from the root class as compared to the 10000th class at the bottom of the hierarchy.

I wanted to tell that guy he was right but his email address at work is probably no longer valid…

CAVEAT: Make sure you do not have a category with the same name as the one in the script because the script removes the category at the end (and all classes in it!).  Also, be warned it takes a very very very long time to execute!

P.S. All the stories above are real. No f*cking kidding!  Those guys really existed : I’ve worked with them!


The order of indexes

28 février 2017

If you thought all you had to do was to declare a few indexes here and there and MySQL would magically be fast, you’ll be surprised reading this excellent article.


Count occurrences of a string using MySQL

20 juillet 2016

This was originally posted in French here.

There’s no string function in MySQL (and many other databases!) to help you find the number of occurrences of a string within another string.  For example, how many times does « abc »  appear in « abcbcbabcbacbcabcababcabacb » ?

I was asked this question on IRC a long time ago. Some poor soul was trying to find a particular subsequence in a genomic string (for instance « TAT ») in the following sequence :

ATTGGTGGGCTCTACTAAGATATCAACGGGACTTCGGAGCGTGCCGCACTATTT

Obviously, you can use your favorite programming language and do this kind of search programmatically but is there a way to do it in SQL?

Luckily, the answer is yes!  The solution is simple and looks like this:

SELECT FLOOR(( LENGTH(source) - LENGTH(REPLACE(source, target, '')) ) / (LENGTH(target))) as occ

To come back to our example, « source » being the genomic sequence and « target » being « TAT », you’d have :

SELECT FLOOR(( LENGTH('ATTGGTGGGCTCTACTAAGATATCAACGGGACTTCGGAGCGTGCCGCACTATTT') - LENGTH(REPLACE('ATTGGTGGGCTCTACTAAGATATCAACGGGACTTCGGAGCGTGCCGCACTATTT', 'TAT', '')) ) / (LENGTH('TAT'))) as occ

Here’s the answer!

Fortunately, in life there are way more many solutions than problems!  And sometimes, long SQL queries!

P.S.  This new post will probably interest you a lot!

Update (2019-03-07)

Save

Save

Save

Save

Save