FullText, statistics and MySQL: a few nasty surprises

March 15, 2018

If you use MySQL seriously, I can't stress enough that you should follow the MySQL Entomologist blog religiously! The author lays out MySQL's bugs, problems and quirks in an efficient style, without bias or embellishment: nothing but the raw truth!

For example, the many problems with fulltext indexes or with InnoDB's persistent statistics.

Mu, sigma & friends

May 1, 2017

You’re a programmer and sometimes you need some statistical tools and knowledge?  Here’s a short list of resources that could be helpful to you.

Free Statistics eBooks

September 10, 2016

A short list of interesting eBooks on statistics here.

Statistics for liars and idiots!

August 31, 2016

It’s all explained here.




August 26, 2016

If you need to do some serious maths/statistics work or just number crunching, there are plenty of tools out there.

Being a happy Smalltalker/Pharoer/VisualWorker/VisualAger/Dolphiner/Squeaker, my favorite library is PolyMath (previously known as SciSmalltalk).  Otherwise, I’m a big fan of R (mostly because of the huge number of packages available).  Hey! That’s a long way from my nightmare days of SAS and SPSS!

Most complaints I hear about R are about its inability to deal with large amounts of data and its somewhat annoying syntax/style (I don’t get it!!!).

But there’s always Julia.  Give it a try!

Freewill in progress (2)

August 3, 2016

[Figure: Freewill Selection Policies]

What’s up?

As you can see, Freewill now supports 17 different selection policies.  At this point, all of them are coded but only half of them have been tested.
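For readers who have never met a selection policy, here is a minimal sketch of one common kind, tournament selection. It is written in Python rather than Smalltalk, it is not Freewill's code, and the names `tournament_select`, `population` and `fitness` are made up for the illustration. Whether tournament selection is one of the 17 policies is an assumption on my part.

```python
import random

def tournament_select(population, fitness, k=3, rng=random):
    """Draw k distinct individuals at random and return the fittest one.

    Hypothetical helper for illustration only, not part of Freewill.
    """
    contenders = rng.sample(population, k)
    return max(contenders, key=fitness)

# Toy usage: individuals are integers and fitness is the value itself.
population = list(range(10))
parent = tournament_select(population, fitness=lambda x: x, k=3,
                           rng=random.Random(42))
```

A larger `k` increases selection pressure: with `k` equal to the population size, the best individual always wins.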

The 11 available termination policies are coded, half of them tested.

So far, only 2 mutation policies are available.  Both of them are coded and tested. I will probably need a few extras for TSP type of problems as well as numerically parametrized problems (e.g. De Jong functions with a domain for each variable).  I’ll probably add 3-4 other ones specific to the problem that started all this adventure!

Only one immigration policy (no immigration!) is available and it will stay that way for a long time.  I’ll wait until I am hyper confident that this framework is rock solid before introducing parallelism and exchange of individuals between "islands" (i.e. simulations).  This one is faaaaaaar away!

Six crossover policies are available as of now.  This area will require some changes (minor ones, I think, at first glance) for the TSP type of problems: I have not quite decided on the approach I will take to solve this.  Since crossover is often very problem/chromosome specific, I’ll probably delay those changes until the end, once I have all examples coded and ready to be tested, to have a better idea of what is needed.  But I will definitely add a few (3-4) crossover policies tailored for the Ruzzle problem.
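One likely reason TSP chromosomes need special treatment is that a tour is a permutation of cities, and a plain one-point crossover can duplicate some cities and drop others. The sketch below shows a simplified variant of order crossover (OX) in Python. It only illustrates the general idea, it is not the approach Freewill takes (that is still undecided), and `order_crossover` is a hypothetical name.

```python
import random

def order_crossover(parent_a, parent_b, rng=random):
    """Simplified order crossover: the child is always a valid permutation.

    Copies a random slice from parent_a, then fills the remaining positions
    left to right with the missing cities in the order they appear in
    parent_b. Assumes cities are never None.
    """
    n = len(parent_a)
    i, j = sorted(rng.sample(range(n + 1), 2))   # slice bounds, i < j
    child = [None] * n
    child[i:j] = parent_a[i:j]
    kept = set(parent_a[i:j])
    remaining = iter(city for city in parent_b if city not in kept)
    for pos in range(n):
        if child[pos] is None:
            child[pos] = next(remaining)
    return child

# Toy usage with two 8-city tours.
tour_a = [0, 1, 2, 3, 4, 5, 6, 7]
tour_b = [3, 7, 0, 5, 1, 6, 2, 4]
child = order_crossover(tour_a, tour_b, rng=random.Random(1))
```

The textbook OX starts filling after the second cut point and wraps around; filling left to right keeps the sketch shorter while preserving the key property that every city appears exactly once.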

I have solved the discrepancy (see here and here) between my results and the TSPLIB ones regarding the tour length of the Burma14 problem.  I will probably add a much bigger TSP problem to see how the framework handles a huge search space! Oh!  And I need to clean up all the crap I added/modified while looking for the "distance difference" problem: 2 classes were butchered in the process!

I need to add a few "crash test dummy" classes to test all those different selection policies (and crossover) in a simpler and more efficient manner!  Or I should kick myself in the %*&#$!@ and code the "bits" example classes…

I will soon work on a customizable display of statistics.  All that’s needed is already there; it’s just a matter of gluing everything together!

Once I’m done with the 8 queens problem, I’ll attack the numerically parametrized problems.  I will probably have 2-3 examples (from De Jong functions) as well as the INSANE Griewank function.
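For reference, here is a small Python sketch of two of these benchmark functions in their standard textbook form: De Jong's first function (the sphere) and the Griewank function. The function names are mine and this is not Freewill code. The usual domains are roughly [-5.12, 5.12] per variable for the sphere and [-600, 600] for Griewank.

```python
import math

def sphere(x):
    """De Jong's first function: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)

def griewank(x):
    """Griewank function: 1 + sum(x_i^2)/4000 - prod(cos(x_i / sqrt(i))).

    The index i starts at 1. The global minimum is 0 at the origin, and the
    cosine product creates a very large number of local minima.
    """
    sum_term = sum(v * v for v in x) / 4000.0
    prod_term = math.prod(math.cos(v / math.sqrt(i))
                          for i, v in enumerate(x, start=1))
    return 1.0 + sum_term - prod_term

print(sphere([1, 2, 3]))        # prints 14
print(griewank([0.0, 0.0]))     # prints 0.0
```

Those countless local minima are what make Griewank a hard test for a genetic algorithm: a population can easily settle in a basin that is close to, but not at, the global optimum.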

The classes used for randomly choosing the next parent chromosomes as well as scaling/ranking can be optimized.  But since they have worked great since day 1, I’ll keep that for the very end.  I know they can be a lot faster than they are right now.

I also plan on having a very basic export mechanism so I can dump all those Ruzzle chromosomes in a MySQL database to be able to do some reporting and study the various policies and their effects.
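I won't detail here where the time actually goes in those classes. As a general illustration only, one common way to speed up fitness-proportionate ("roulette wheel") choice is to precompute cumulative weights once per generation and binary-search them for every draw. The Python sketch below assumes non-negative weights with a positive total; `RouletteWheel` is a hypothetical name, not a Freewill class.

```python
import bisect
import itertools
import random

class RouletteWheel:
    """Weighted random choice with O(n) setup and O(log n) per draw."""

    def __init__(self, individuals, weights):
        self.individuals = list(individuals)
        # Running totals, e.g. weights [1, 3, 2] -> cumulative [1, 4, 6].
        self.cumulative = list(itertools.accumulate(weights))

    def pick(self, rng=random):
        # Draw a point in [0, total) and find the first running total above it.
        r = rng.random() * self.cumulative[-1]
        index = bisect.bisect_right(self.cumulative, r)
        # Guard against floating-point rounding pushing r up to the total.
        return self.individuals[min(index, len(self.individuals) - 1)]

# Toy usage: "b" should be drawn about three times as often as "a".
wheel = RouletteWheel(["a", "b", "c"], [1, 3, 2])
parents = [wheel.pick(random.Random(seed)) for seed in range(5)]
```

An individual with weight 0 is never drawn, because its running total equals the previous one and the binary search skips past it.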

I started adding comments to the classes, mostly to keep references, maintain a todo list per class and add some notes for myself to quickly remember why things work that way!

I’ll probably have an image by tomorrow that will run simulations for the Ruzzle problem full-time. I wanna beat that record!