The Realm

April 4, 2018

For the nostalgic among you: The Realm is back! I hadn’t played this online game since 1995 when I stumbled, by chance, on an article explaining that the game was rising from its ashes! Besides now being free (and the code is even open source), the game has been revamped to offer bigger challenges to every category of player. Although the minimal look of the graphics and of the game itself may disappoint some, you should know that The Realm was, and still is, above all a welcoming community where the game is sometimes just a pretext!

To create your account, register and get the application installer, just visit the MistWalkers site and follow the instructions!

And if you ever pass by, come say hello! You’ll find me playing the characters Yara, Lerxst, GarbageCollector or Lamneth!

Daniel Lemire

March 3, 2018

Another blog I strongly recommend is Daniel Lemire’s (not the comedian, the computer scientist!). He often writes about algorithms and performance with incredible attention to detail.

For a good example of his style and of the topics he covers, I suggest his post on picking distinct random numbers (Picking distinct numbers at random: benchmarking a brilliant algorithm) as well as his article titled Iterating over set bits quickly.

Finally, an example of his contributions to open source: RoaringBitmap.



Remote work

February 28, 2018

Remote work: is it easy? Julia Evans tells us about her experience in this article.

Uppercase and lowercase

May 25, 2017

If you were wondering where these terms for letter case come from, they originate from the type cases that typographers used in the past!


What to do with an old mainframe?

April 13, 2017

You’ll see what this young man did with an old IBM z890 in this video! It’s fascinating!


September 13, 2016

Who are you?

That simple question can have many answers depending on how you interpret it. Who are you? Spiritually? Professionally? Psychologically? As a human? Emotionally? As a parent? Metaphysically? But there’s an even simpler answer. Almost all of us would answer that question the same way. Why? Because all humans share at least one thing: we have a name!

I recently started working on a gender inference package. At first glance, an easy task: determine the gender of a person based on their first name. Not too hard if you only consider the Western world but, pretty quickly, it’s not as easy as it looks…

But who would need that? Why would you want to determine the gender of someone from their first name? For lots and lots of reasons! If you’re doing research or profiling in sociology, politics, human resources, demographics, marketing or any other domain, there’s a lot of data out there but, oftentimes, only bits and pieces of it are available. And oftentimes, gender is not something that is directly available.

But let’s go back a bit. The study of proper names is actually a science: it’s called onomastics. To be more precise, in our case (the study of the names of human beings), that science is a branch of onomastics called anthroponomastics (or anthroponymy).

And as always, whenever I start working on something, I like to ponder it all by myself, from scratch. After that, I like to read up on the subject and compare my ideas with what I read. So that’s what I did.

At first, I was struck by the simplicity of what is out there!  Most gender prediction services/applications were way too naive and simple.  And in almost all cases, useful information was simply stripped away in the sanitizing process.

But first, here’s a list of gender prediction programs/packages/services:

Gender API
Gender Guesser
Gender Detector (formerly known as SexMachine)
Gender Predictor
Name Gender Guesser
Gender Guesser API
GendRE Gender APP
Gender Checker
Name Genderization
Kantrowitz Gender Program
Gender package on CRAN
PD Nickname

Now, here’s a brief list of what I found problematic with the current gender prediction services/applications…

North American Bias

Most programs/services I studied use, at least partially, data that comes from the US Census or the SSA. That’s fine as long as you only have to deal with North Americans, but those programs fail miserably when used against names from outside North America. Even worse, in some cases that data was also used to increment the count of occurrences of some names (thus making the gender prediction appear more precise). That makes it almost impossible for some European names to come out with the proper gender for their respective country when it differs from the one that prevails in the United States. For instance, a first name like Michele comes out as being 99% female while, in Italy, it’s mostly a male first name. Besides, those two data sources have a more important problem…

Normalized Data

The US Census and the SSA data sets have one major problem: the data has been sanitized so heavily that it is almost useless outside North America. Accented characters have been stripped and name particles have been eliminated. For instance, in a lot of countries, Andréa is a female first name while Andrea is unisex. Same thing for Michèle: it’s 100% a female first name. Michele (without the accented character), however, can be either a male name (in Italy, for instance) or a female name (in the US, for instance). Since the « é » (or « è ») has been normalized to an « e » in both data sets, that distinction is now impossible to make. Crucial information has been lost in the sanitizing process.

The same applies to the removal of some particles that are essential to identify the gender from a name. In many languages, parts of the name include linguistic hints that indicate the gender of the person or the relationship of the person with their parents: « son of », « daughter of », etc. All those have also been removed from the SSA and US Census data sets.
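To make that loss concrete, here is a minimal Python sketch of what an accent-stripping step does to the names mentioned above. The function name `strip_accents` is hypothetical and this is only an assumption about how such sanitizing could be done, not the actual process used for the US Census or SSA data sets.

```python
import unicodedata


def strip_accents(name: str) -> str:
    """Remove diacritics, the way an aggressive sanitizing step might (assumed)."""
    # NFD splits "è" into a plain "e" followed by a combining grave accent.
    decomposed = unicodedata.normalize("NFD", name)
    # Dropping the combining marks leaves only the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


names = ["Michèle", "Michele", "Andréa", "Andrea"]
sanitized = [strip_accents(name) for name in names]

# Four distinct spellings go in, only two come out.
distinct_before = len(set(names))
distinct_after = len(set(sanitized))
print(sanitized, distinct_before, distinct_after)
```

Once Michèle and Michele share the same key, no later step can tell them apart, whatever the quality of the rest of the pipeline.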

Black or white results

Some of the services/applications do not answer the gender prediction with a probability. All you get is either male, female, unisex or unknown, without any other detail. That is not very helpful if you want to filter the gender predictions by a certain level of confidence. There are situations where you need to know whether the prediction is 95% accurate rather than 52%!
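As a rough illustration of why a probability matters, the following Python sketch turns raw counts into a label plus a confidence and lets the caller choose the cutoff. The function `predict_gender`, its parameters and the counts used in the example are all hypothetical; none of them come from the services listed above.

```python
from typing import Optional, Tuple


def predict_gender(
    female_count: int, male_count: int, min_confidence: float = 0.95
) -> Tuple[str, Optional[float]]:
    """Return (label, confidence) from occurrence counts (hypothetical helper)."""
    total = female_count + male_count
    if total == 0:
        # The name was never seen: there is nothing to estimate.
        return ("unknown", None)
    p_female = female_count / total
    # Confidence is the share of the majority gender.
    confidence = max(p_female, 1 - p_female)
    if confidence < min_confidence:
        # Below the caller's threshold: report it instead of guessing.
        return ("uncertain", confidence)
    label = "female" if p_female >= 0.5 else "male"
    return (label, confidence)


# Made-up counts, for illustration only.
print(predict_gender(52, 48))
print(predict_gender(95, 5))
print(predict_gender(52, 48, min_confidence=0.5))
```

The same 52/48 split is rejected with a strict threshold and accepted with a lax one, which is exactly the choice a plain male/female answer takes away from the caller.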

Extra information is not used

In many cases, extra information that could be essential to identify gender, or at least useful to determine it in some countries, is simply not used by the programs/services. Even if most programs allow you to specify a country, in many cases that information could very well be supplemented by the last name. For instance, Michele is mostly female. But even if you asked for the gender of Michele in Canada, it would be essential to know that if the last name is Forgione (an Italian last name), you’re most likely dealing with a male first name! The same critical information is even more obvious with Russian names: once you know how female last names are formed in Russia, you don’t even have to know the first name to determine the gender if you are dealing with someone named Kournikova! The same kind of detail can be inferred from the year of birth of the person whose gender you are trying to determine: it is well known that some of today’s unisex first names have, at some point in time, gone from one side to the other and then eventually become unisex. In those cases, the year of birth can sharpen the gender probability a lot!
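The Russian last name example can be sketched in a few lines of Python. The suffix lists below are an assumption made for illustration (a few common romanized endings, far from exhaustive), and the helper `surname_gender_hint` is hypothetical. It should only be applied when the name is already known to come from a Russian naming context, since a last name like Medina would otherwise trigger a false hint.

```python
from typing import Optional

# Assumed, non-exhaustive romanized endings; real data would need many more
# cases and transliteration variants.
FEMALE_SUFFIXES = ("ova", "eva", "ina", "aya")
MALE_SUFFIXES = ("ov", "ev", "in", "sky", "skiy")


def surname_gender_hint(last_name: str) -> Optional[str]:
    """Return 'female', 'male' or None based on the last name ending (heuristic)."""
    lowered = last_name.strip().lower()
    # Check the female endings first: they are the male ones plus a final vowel.
    if lowered.endswith(FEMALE_SUFFIXES):
        return "female"
    if lowered.endswith(MALE_SUFFIXES):
        return "male"
    # No recognizable ending: the last name gives no hint.
    return None


print(surname_gender_hint("Kournikova"))
print(surname_gender_hint("Forgione"))
```

A hint like this would not replace the first name lookup; it is one more signal that can be combined with the country and the year of birth.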

Lots of data is inferred

Lots of these programs crawl social media to gather more data for their database.  The main problem is that the collected data is « validated » based upon the same 2 data sets (US Census and SSA) thus polluting the data they are collecting at the same time!  That is just wrong!  You’re collecting data for a gender prediction program and the data you collect is also « predicted » or inferred!  Nothing will replace official lists, based on official sources that specify the gender.  And the more local (one per country ideally) those lists are, the better.

In other cases, some name collecting methods have to rely on a multitude of imprecise methods to estimate the gender of the names they collect, thus making the precision of the inferred gender even lower. Going through a face recognition algorithm, then deducing the first name by parsing a Twitter nickname, and finally processing all that data through an SVM classifier means one and only one thing: the more steps you have, the more error-prone you are.

Now what?

Well, in just 2 days I was able to collect 2.5 million (not unique, of course) names from 220+ countries and 60+ data sets. I haven’t even used the US Census or the SSA data set! What am I going to do differently to deal with the issues I’m describing in this post?

Well, I’ll keep that for another article! I’m not ready to reveal my secrets… yet.

P.S. If you know of other gender prediction programs, leave a comment! I’ll update this post if necessary. Take note that I tried to list genuine gender prediction programs/services, not wrappers around an existing web service!



September 8, 2016

God, did I love that program!!! A classic, an epic battle between Minotaur and Blacksmith!


Statistics for liars and idiots!

August 31, 2016

It’s all explained here.



TBT (3)

August 25, 2016

I recently had flashbacks of myself as a little kid, around 8 or 9 years old… My love for glue-together model kits! God, did I build a lot of them! I spent all my pocket money on Testors paint!

So here is the list of my favorite models (and aircraft), still to this day…

The Saab JA37 Viggen.

At the time, one of the very rare planes with delta wings. We thought this aircraft came straight out of the future!

The Spitfire Mark I.

My favorite version of the Spitfire. I never liked the other incarnations of this aircraft (clipped wings, air intakes under the wings, reinforced landing gear, more or less pointed wings, the increased number of exhaust pipes visible at the front, etc.)… Version 1 has always been my favorite!

The CF-5A.

The aircraft par excellence of the Canadian military for years! Compared to the rest of the fleet, this plane brought Canada back into the modern era!

The F-14A Tomcat.

For quite a long time, I had a very particular interest in the aircraft carrier USS Nimitz after seeing the movie Nimitz, retour vers l’enfer (The Final Countdown). Since the F-14 made up a good part of its fleet at one point, it was indirectly that I became interested in the F-14. From the start, I was fascinated by its variable-geometry wings!

The Thunderbird 2.

Childhood memories! Unfortunately, I never managed to get my hands on a model kit of my all-time favorite aircraft!

NCC-1701 Revival!

August 1, 2016

Nice, touching, beautiful, fantastic!