We all know that guy. You know, that guy who sees performance improvements everywhere, all the time?
That programmer who squeezes everything into bitmaps because « it’s so much faster » ? And when I say everything, I mean everything! Even if your class only has one flag!
That guy who caches everything in the application because « that’s the optimal way to do it » ? Even if the cached data is accessed only once?
That guy who fits a date into an integer in the database because « it’s so much more compact » ? There goes all your SQL and date functions!
That developer who always ends up re-implementing another sorting algorithm because he read a « great paper » on the subject and « it’s proven that it’s 0.2% faster than the default sort » available? And after an insane debugging session, you finally realize he has overridden SortedCollection>>#sort: ? And that his code just doesn’t work properly!
You know, that guy with a C/C++ background who spends countless hours optimizing everything, not realizing he has to « make it work » first! You know, that guy who still doesn’t get that you often only need to optimize very small parts of an application to make a real difference?
You know that guy with strange concepts such as « defensive programming » who tells you nobody ever caught a bug in his code in production? You know, that guy who came up with the « clever » catch-everything method #ifTrue:ifFalse:ifNil:onError:otherwise: ?
You know that guy who never works for more than 4 months at the same place because « they didn’t get it, they’re a bunch of morons » ?
You know, that guy who prefers to implement complex database queries with a #do: loop and a gazillion SELECT statements instead of using JOINs, because « SELECT statements with a 1-row lookup are very optimized by the database server » ? And then he blames the slow response time of his « highly optimized » data retrieval code on the incompetence of the DBAs maintaining the database?
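For the record, here is a minimal sketch of the two approaches, written in Python with an in-memory SQLite database. The tables (`customers`, `orders`) and the function names are made up for the illustration; they are not from any real project. Both functions return the same rows, but the loop version issues one extra SELECT per order while the JOIN version issues a single statement.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database (hypothetical schema, for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
        INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
        INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 40.0), (12, 1, 5.5);
    """)
    return conn


def orders_with_loop(conn):
    """The 'that guy' way: one 1-row lookup per order. Returns (rows, query_count)."""
    orders = conn.execute(
        "SELECT id, customer_id, total FROM orders ORDER BY id").fetchall()
    queries = 1
    rows = []
    for order_id, customer_id, total in orders:
        name = conn.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)).fetchone()[0]
        queries += 1
        rows.append((order_id, name, total))
    return rows, queries


def orders_with_join(conn):
    """The same result with a single JOIN. Returns (rows, query_count)."""
    rows = conn.execute("""
        SELECT o.id, c.name, o.total
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id
        ORDER BY o.id
    """).fetchall()
    return rows, 1
```

With 3 orders the loop sends 4 statements instead of 1. With a few hundred thousand orders and a network between the application and the server, the round trips add up quickly.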
You know, that guy who once told me « inheritance in Smalltalk is very bad because the deeper the class in the hierarchy, the slower the method lookup is going to be » and that’s why he always preferred « flattened hierarchies » (meaning promoting ALL instance variables into one root class), 2-3 levels deep in the worst case? « Besides, my code is easier to understand and debug since everything is in one place ».
Well, I was going through some of the sh*ttiest code I’ve seen in a long time last night and I remembered that guy and his ideas about « highly optimized flattened hierarchies » and thought I’d measure his theory! Here’s the script.
Basically, it creates a hierarchy of classes (10000 subclasses) and times message sends from the top class and the bottom class to measure the difference.
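If you don’t have a Smalltalk image handy, the idea can be approximated in Python. To be clear, this is not the script mentioned above, just a rough analogue under my own assumptions: the names `build_chain` and `time_sends` are invented, the depth is a parameter (a few hundred is plenty to play with), and CPython has its own method lookup cache, so the numbers say nothing about any particular Smalltalk VM.

```python
import timeit


def build_chain(depth):
    """Build a root class plus a chain of `depth` empty subclasses.

    Returns (root_class, bottom_class). Only the root defines #ping,
    so a send to the bottom class has to be resolved up the hierarchy.
    """
    class Root:
        def ping(self):
            return 42

    leaf = Root
    for i in range(depth):
        # Each new class inherits from the previous one and adds nothing.
        leaf = type(f"Level{i + 1}", (leaf,), {})
    return Root, leaf


def time_sends(cls, sends=100_000):
    """Time `sends` calls of ping on a fresh instance of cls (in seconds)."""
    obj = cls()
    # The lambda forces an attribute lookup on every call instead of
    # binding the method once up front.
    return timeit.timeit(lambda: obj.ping(), number=sends)
```

Typical usage would be `root, bottom = build_chain(500)` followed by comparing `time_sends(root)` with `time_sends(bottom)`. Expect the difference to be tiny, since lookups are cached after the first send.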
Well, that guy was right… There’s a 1 millisecond difference over 9 million message sends from the root class as compared to the 10000th class at the bottom of the hierarchy.
I wanted to tell that guy he was right but his email address at work is probably no longer valid…
CAVEAT: Make sure you do not have a category with the same name as the one in the script because the script removes the category at the end (and all classes in it!). Also, be warned it takes a very very very long time to execute!
P.S. All the stories above are real. No f*cking kidding! Those guys really existed : I’ve worked with them!
An excellent summary that gives an overview of sequences (and the particularities of each) for the most popular databases.
For those interested, I have released the very first version of my database of chess games, the CGR database! All details here!
Some of you may already know *lots* of MongoDB servers are compromised as we speak. That’s sad. But in a way, it’s not. It’s just the very predictable consequence of sheer incompetence. Here’s a reminder on the very basics of database server security.
- Lock the admin account with a f*cking password you moron!
- Create another admin account under a name that one would hardly think is an admin account and then delete the original admin account. Hackers expect most servers to be installed with the default features/accounts. They will look for admin, sysadmin, system, administrator and the like. They will most likely not check for butterfly, user7342 or whateverElse.
- Be creative! Everyone knows MySQL listens on port 3306, DB2 on 50000 and PostgreSQL on 5432. Hackers know that too. Never install your server on the default port! Give ’em a hard time figuring out what database server is installed and where they can get in!
- Remove everything you do not need. That sample database, that test database and all that crap that is installed by default and that you don’t need is just another tool hackers can use, for SQL injection for instance. Don’t facilitate the hacker’s job!
- Don’t wait. Install security updates as soon as they are made available.
- Permissions are a must. Learn to use GRANT and REVOKE. And use them!
- Monitor your servers. Just because your instance has been up and running for 302 straight days doesn’t mean things are OK! Your server could have been compromised for 301 of those days and you wouldn’t know it if you don’t monitor it!
- Stay informed. There are lots of mailing lists, discussion forums, IRC channels, free eBooks, YouTube videos of seminars and conferences, etc about database security. It’s free! You have no excuse!
- Remember advice #1 : lock the admin account with a f*cking password you moron! A very strong password!
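And since « a very strong password » is easier said than done, here is one way to generate one, sketched in Python with the standard `secrets` module. The helper name `strong_password` and the minimum length of 16 are my own assumptions, not a rule from any database vendor.

```python
import secrets
import string

# Letters, digits and punctuation. Some shells and SQL clients dislike
# quotes or backslashes, so trim this alphabet if your tooling requires it.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def strong_password(length=24):
    """Return a random password drawn from a cryptographically secure source."""
    if length < 16:
        # Arbitrary floor chosen for this sketch.
        raise ValueError("use at least 16 characters")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

Store the result in a password manager, not in a text file sitting next to the database.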
As you can see, Freewill now supports 17 different selection policies. At this point, all of them are coded but only half of them have been tested.
The 11 available termination policies are coded, half of them tested.
So far, only 2 mutation policies are available. Both of them are coded and tested. I will probably need a few extras for TSP type of problems as well as numerically parametrized problems (e.g. De Jong functions with a domain for each variable). I’ll probably add 3-4 other ones specific to the problem that started all this adventure!
Only one immigration policy (no immigration!) is available and it will stay that way for a long time. I’ll wait until I am hyper confident that this framework is rock solid before introducing parallelism and exchange of individuals between « islands » (i.e. simulations). This one is faaaaaaar away!
Six crossover policies are available as of now. This area will require some changes (minor ones, I think, at first glance) for TSP-type problems : I haven’t quite decided on the approach I will take to solve this. Since crossover is often very problem/chromosome specific, I’ll probably delay those changes until the end, once I have all examples coded and ready to be tested, to have a better idea of what is needed. But I will definitely add a few (3-4) crossover policies tailored for the Ruzzle problem.
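For readers who have never seen these policies in action, here is a minimal sketch in Python of how selection, crossover and mutation fit together for bit-string chromosomes. This is not Freewill’s API (the framework is written in Smalltalk and is not shown here); the function names are invented, and I picked roulette-wheel selection, one-point crossover and bit-flip mutation simply because they are the textbook versions.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Fitness-proportionate selection. Assumes non-negative, not-all-zero fitnesses."""
    return rng.choices(population, weights=fitnesses, k=1)[0]


def one_point_crossover(a, b, rng):
    """Swap the tails of two equal-length chromosomes at a random cut point."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]


def bit_flip_mutation(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - gene if rng.random() < rate else gene for gene in chromosome]


def next_generation(population, fitness, rng, mutation_rate=0.01):
    """Produce a new population of the same size from the current one."""
    fitnesses = [fitness(c) for c in population]
    children = []
    while len(children) < len(population):
        parent1 = roulette_select(population, fitnesses, rng)
        parent2 = roulette_select(population, fitnesses, rng)
        for child in one_point_crossover(parent1, parent2, rng):
            children.append(bit_flip_mutation(child, mutation_rate, rng))
    # Crossover yields children in pairs, so trim if the size is odd.
    return children[:len(population)]
```

Note that one-point crossover applied to a permutation (a TSP tour, for instance) can produce a child that visits the same city twice, which is one reason why that kind of problem usually needs dedicated crossover operators.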
I have solved the discrepancy (see here and here) between my results and the TSPLIB ones regarding the tour length of the Burma14 problem. I will probably add a much bigger TSP problem to see how the framework handles a huge search space! Oh! And I need to clean up all the crap I added/modified while looking for the cause of the « distance difference » : 2 classes were butchered in the process!
I need to add a few « crash test dummy » classes to test all those different selection policies (and crossover) in a simpler and more efficient manner! Or I should kick myself in the %*&#$!@ and code the « bits » example classes…
I will soon work on a customizable display of statistics. All that’s needed is already there, it’s just a matter of gluing everything together!
Once I’m done with the 8 queens problem, I’ll attack the numerically parametrized problems. Will probably have 2-3 examples (from De Jong functions) as well as the INSANE Griewank function.
The classes used for randomly choosing the next parent chromosomes as well as scaling/ranking can be optimized. But since they have worked great since day 1, I’ll keep that for the very end. I know they can be a lot faster than they are right now.
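For reference, here is what two of those test functions look like, sketched in Python rather than Smalltalk. The sphere function is De Jong’s first function (usually searched over [-5.12, 5.12] for each variable) and the Griewank function is usually searched over [-600, 600]; both have their global minimum of 0 at the origin. The function names are mine.

```python
import math


def sphere(xs):
    """De Jong's first function: the sum of squares."""
    return sum(x * x for x in xs)


def griewank(xs):
    """Griewank function: 1 + sum(x_i^2)/4000 - prod(cos(x_i / sqrt(i))), i from 1."""
    quadratic = sum(x * x for x in xs) / 4000.0
    oscillation = math.prod(
        math.cos(x / math.sqrt(i)) for i, x in enumerate(xs, start=1))
    return 1.0 + quadratic - oscillation
```

The cosine product is what makes Griewank nasty: it scatters a huge number of regularly spaced local minima over the gentle bowl of the quadratic term.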
I also plan on having a very basic export mechanism so I can dump all those ruzzle chromosomes in a MySQL database to be able to do some reporting and study the various policies and their effects.
I started adding comments to the classes, mostly to keep references, maintain a todo list per class and add some notes for myself to quickly remember why things work that way!
I’ll probably have an image by tomorrow that will run simulations for the ruzzle problem full-time. I wanna beat that record!
After a major data loss (I haven’t given up on getting back all my data, mostly code repositories and databases!), I had to start all my pet projects from scratch. Luckily, it’s easier second time around as they say! And, lucky me, I store all my personal stuff on the web! So here’s a list of what’s coming up on this blog.
Even though I had a decent working version of the genetic algorithm program to find the best ruzzle grid (original posts in French here, here and here), I wasn’t satisfied with the code. It slowly evolved from a bunch of code snippets into something I could somehow call a genetic algorithm. Problem was that my solution was tailored for this specific problem only! Since I lost all the Smalltalk code, I redid the whole thing from scratch : better design, simpler API, more flexible framework. I can currently solve a TSP problem, the best ruzzle grid search and a diophantine equation.
I also plan to provide examples of the 8 queens problem, the knapsack problem, a quadratic equation problem, a resource-constrained problem and a simple bit-based example with the GA framework. Besides, there are now more selection operators, more crossover operators, more termination detectors (as well as support for sets of termination criteria!), cleaner code and the list goes on! So I’ll soon publish a GA framework for Pharo.
As most of you know, the Rush fan in me had to pick a project name in some way related to my favorite band! So the framework will be called Freewill, for the lyrics in the song :
Each of us
A cell of awareness
Imperfect and incomplete
With uncertain ends
On a fortune hunt that’s far too fleet
A stupid quest I’ll address after the first version of my GA framework is published. It all started with a simple question related to the game of bingo (don’t ask!) : can we estimate the number of bingo cards sold in an event based on how many numbers it takes for each card configuration to have a winner? So it’s just a matter of generating millions of draws and cards à la Monte Carlo and averaging how many numbers it takes for every configuration. Why am I doing that? Just because I’m curious!
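Here is roughly what I have in mind, sketched in Python. Several things are assumptions for the sake of the example: 75-ball bingo, standard 5x5 cards with a free center, and « any complete row, column or diagonal » as the only winning configuration. The function names are invented too.

```python
import random

# Column ranges of a standard 75-ball card: B, I, N, G, O.
COLUMN_RANGES = [range(1, 16), range(16, 31), range(31, 46),
                 range(46, 61), range(61, 76)]


def make_card(rng):
    """Return a card as 5 columns of 5 numbers, with a free center (None)."""
    columns = [rng.sample(r, 5) for r in COLUMN_RANGES]
    columns[2][2] = None
    return columns


def card_lines(card):
    """Return the 12 winning lines of a card as sets of numbers still needed."""
    rows = [[card[c][r] for c in range(5)] for r in range(5)]
    diagonals = [[card[i][i] for i in range(5)],
                 [card[i][4 - i] for i in range(5)]]
    return [frozenset(n for n in line if n is not None)
            for line in card + rows + diagonals]


def draws_until_winner(n_cards, rng):
    """Draw balls until at least one of n_cards random cards completes a line."""
    lines = [line for _ in range(n_cards)
             for line in card_lines(make_card(rng))]
    called = set()
    for count, ball in enumerate(rng.sample(range(1, 76), 75), start=1):
        called.add(ball)
        if any(line <= called for line in lines):
            return count
    return 75  # Never reached: every line is complete once all balls are called.


def average_draws(n_cards, trials, seed=0):
    """Monte Carlo estimate of the mean number of draws before the first winner."""
    rng = random.Random(seed)
    return sum(draws_until_winner(n_cards, rng) for _ in range(trials)) / trials
```

Running `average_draws` for a range of card counts gives a curve of « average draws before a winner » versus « cards in play ». Reading that curve backwards is the estimate I am after: observe how many numbers were called, then look up how many cards that suggests.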
There’s been a lot of action on the Pharo side and Glorp. I plan on having a serious look at the latest Glorp/Pharo combo and even participating in the development!
I’ll translate my articles (in French here, here and here) on the SQL sudoku solver into English and test the whole thing on the latest MySQL server. Besides, db4free has upgraded to a new MySQL server version!
I had done a port of NeoCSV to Dolphin right before losing all my code. It wasn’t hard to port so I’ll redo it as soon as I reinstall Dolphin!
It’s time to reinstall VisualAge, VisualWorks, Squeak, ObjectStudio and Dolphin and see what’s new in each environment! From what I saw, there’s a lot of new and interesting stuff on the web side. Add to that the fact that most social media platforms have had significant changes in their respective APIs recently, so there’s a lot to learn there!
That’s a wrap folks!