I just came across a question related to parsing fractions on StackOverflow and I thought about how easy that kind of stuff could be done Smalltalk (because we have real fractions!). And then, I wondered how hard that challenge would be with something as limited as GREL (the language used in OpenRefine, namely Google Refine Expression Language). So I tried to do it in GREL!
Here’s the challenge : how do you go from a textual representation of a fraction like -10 2/3 to the number -10.66666 with GREL?
Here’s my test data:
fraction, target 10,10 -10,-10 10 2/3, 10.666666666666666 -10 2/3, -10.666666666666666 2/3, 0.666666666666666 -2/3, -0.666666666666666 +10,10 +10 2/3, 10.666666666666666 +2/3, 0.666666666666666 2,2 -2,-2 2 2/3, 2.666666666666666 -2 2/3, -2.666666666666666 +2,2 +2 2/3, 2.666666666666666 +2/3, 0.666666666666666 10 11/13, 10.8461538461538461 -10 11/13, -10.8461538461538461 11/13, 0.8461538461538461 -11/13, -0.8461538461538461 +10 11/13, 10.8461538461538461 +11/13, 0.8461538461538461 17/4, 4.25 +17/4, 4.25 -17/4, -4.25
The trick is to start small and focus on one case at a time! Just make sure your logic isn’t flawed first and then you can work on each specific case. Here’s how I did it.
Start by making sure your code logic is right:
if(contains(value, « / »), if(contains(value, » « ), « integer with fraction », « fraction »), « integer »)
Then, for each case, dig deeper and make sure you identify each case precisely. In the expression above, replace :
« integer with fraction » with:
if(startsWith(value, « -« ), « negative integer with fraction », « positive integer with fraction »)
« fraction » with:
if(startsWith(value, « -« ), « negative fraction », « positive fraction »)
« integer » with:
if(startsWith(value, « -« ), « negative integer », « positive integer »)
You should end up with something like this:
if(contains(value, « / »), if(contains(value, » « ), if(startsWith(value, « -« ), « negative integer with fraction », « positive integer with fraction »), if(startsWith(value, « -« ), « negative fraction », « positive fraction »)), if(startsWith(value, « -« ), « negative integer », « positive integer »))
Now test!
Once every case is covered, now you’ll have to transform every of the 6 cases in code to extract and convert those numbers. It’s just a matter of splitting every part of the number with GREL’s usual friendly methods, namely indexOf, substring and startsWith. And since the division operator (/) in GREL behaves like the floor function, you need to multiply your results (the strings that have been converted to numbers using toNumber) by 1.0 to force a conversion to a decimal number.
It’s really painful to code but, honestly, all you need is a bit of patience and make sure you deal with one part of the string at a time (and one case at a time)!
Wanna test your code? Just select the « fraction » column and apply the code below in a transform expression : the target value and the transformed value should be the same in the preview pane!
So here’s the GREL code to parse fractions (and convert them to decimal) :
if(contains(value, "/"), if(contains(value, " "), if(startsWith(value, "-"), -1.0*(toNumber(substring(value, 1, indexOf(value, " ")))+ (1.0*toNumber(substring(value, indexOf(value, " "), indexOf(value, "/")))/ toNumber(substring(value, indexOf(value, "/")+1)))), (toNumber(substring(value, 0,indexOf(value, " ")))+ (1.0*toNumber(substring(value, indexOf(value, " "), indexOf(value, "/")))/toNumber(substring(value, indexOf(value, "/")+1))))), if(startsWith(value, "-"), 1.0*toNumber(substring(value, 0, indexOf(value, "/")))/ toNumber(substring(value, indexOf(value, "/")+1)), 1.0*toNumber(substring(value, 0, indexOf(value, "/")))/ toNumber(substring(value, indexOf(value, "/")+1)))), if(startsWith(value, "-"), toNumber(value), toNumber(value))) Enjoy! NOTE: for some strange reason, WordPress keeps changing my double quotes with the French quotes. Don't worry, it's all regular English quotes you should see!