Jeux de données : suite

5 mars 2018

Si Kaggle (dont j’ai parlé ici) ne vous suffit pas, il y a aussi cette compilation de jeux de données qui pourrait vous aider!


12 août 2016

A long long while ago, I had a close look at GoogleRefine.  This tool’s sole purpose is to extract, clean, transform and reconcile data.  And  the more the messy is your data, the better you’ll like this tool!

At first glance GoogleRefine was very interesting but, at the time, the whole thing was more promising than useful.  But recently, while looking for GoogleRefine again (I just could not remember the name!), I found its successor: OpenRefine!

Since then, Java has matured, web services are more robust, the tool has progressed quite a lot and OpenRefine uses everything in it’s power to facilitate your job!  More ways to reconcile the data, many different ways to transform your data, more predefined functions and functionalities!

Custom transformations can be done in 3 ways with some easy coding : with GREL (Google Refine Expression Language), Jython (a Python implementation that runs on Java) or Clojure.  Many many many ways to reconcile the data are now available, more import formats (TSV, CSV, Excel, JSON, XML, etc), more ways to reconcile data from webservices and the list goes on.  I must say OpenRefine has lots to offer!

So instead of writing a novel about how cool this tool is, I’ll leave you with a list/compilation of videos, tutorials, documents and websites that demonstrate what OpenRefine do for you!

School of data

Enipedia Tutorial

Hope this help!

In the I-have-to-clean-up-this-mess department, DataCleaner is another useful tool.  But that’s going to be the topic of another post!