Converting Vulgar Fraction text in Python
So, I had a little bit of a problem. A customer wanted a set of recipes, but the quantities had vulgar fractions in them. An example of this is ‘1 ⅛’.
These fractions are Unicode characters, but Python doesn't treat them as ordinary numbers: int() and float() won't convert them. You will also notice that there are two numbers in the text, so I have to convert the ‘⅛’ to a decimal and then add it to the first number.
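To see the problem concretely, here is a small check of what Python does with these strings. The helper `parses_as_float()` is hypothetical, written just for this illustration. float() rejects both the mixed quantity and the lone fraction, and the str methods don't even agree on whether ‘⅛’ is a number.

```python
def parses_as_float(text):
    """Hypothetical helper: True if float() accepts text, False on ValueError."""
    try:
        float(text)
    except ValueError:
        return False
    return True

print(parses_as_float('1.125'))  # True
print(parses_as_float('1 ⅛'))    # False: the space and the '⅛' both break it
print(parses_as_float('⅛'))      # False: '⅛' on its own isn't accepted either

# String methods disagree about whether '⅛' counts as a number
print('⅛'.isdigit())    # False
print('⅛'.isnumeric())  # True
```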
There are a number of solutions out there for this, but no standard one. Nearly all of them are built around the unicodedata library, as is mine. So let's see how this works.
unicodedata is a module in Python's standard library that gives you access to the Unicode Character Database. You can use it to look up a character by its Unicode name, get the name of a given character, normalize strings, and query properties such as a character's category or numeric value. We are going to use two functions from this library, normalize() and name(). I will come back to name() later and start with normalize().
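Here is a quick tour of the functions mentioned above, using ‘⅛’ as the example. It only uses standard unicodedata calls plus the built-in ord() (the code point itself comes from ord(), not from unicodedata); the values in the comments come from the Unicode Character Database that ships with Python.

```python
import unicodedata

eighth = '⅛'

# name(): character -> official Unicode name
print(unicodedata.name(eighth))  # VULGAR FRACTION ONE EIGHTH

# lookup(): Unicode name -> character (the reverse of name())
print(unicodedata.lookup('VULGAR FRACTION ONE EIGHTH') == eighth)  # True

# category(): general category; 'No' means "Number, other"
print(unicodedata.category(eighth))  # No

# numeric(): numeric value recorded in the Unicode database
print(unicodedata.numeric(eighth))  # 0.125

# The code point comes from the built-in ord()
print(hex(ord(eighth)))  # 0x215b
```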
import unicodedata
string_with_fraction = '1 ⅛'
string_number = unicodedata.normalize('NFKC', string_with_fraction)
What the normalize function does is replace the single fraction character with several characters, but the result still looks a little odd: ‘1 1⁄8’.
normalize() has not inserted a forward slash, but a character Unicode calls FRACTION SLASH (U+2044). So while our fraction is now split into three characters, we still need to replace one of them. We also need to separate the two numbers in the text and add them together to get a decimal number.
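These remaining steps can also be done with plain string methods. The sketch below is an alternative to the name()-based route this post takes, not the method used here: it swaps U+2044 for an ordinary ‘/’, splits on whitespace, and adds the parts with fractions.Fraction. The helper name `add_parts()` is hypothetical, and it assumes the whole number and the fraction are separated by whitespace.

```python
import unicodedata
from fractions import Fraction

FRACTION_SLASH = '\u2044'  # what NFKC produces; it is not the ASCII '/'

normalized = unicodedata.normalize('NFKC', '1 ⅛')
print(FRACTION_SLASH in normalized)  # True
print('/' in normalized)             # False

def add_parts(text):
    """Hypothetical helper: sum the whitespace-separated parts of '1 ⅛'."""
    normalized = unicodedata.normalize('NFKC', text)
    # Swap the fraction slash for an ordinary solidus so Fraction can parse it
    plain = normalized.replace(FRACTION_SLASH, '/')
    # Fraction accepts strings such as '1' and '1/8'
    return sum(Fraction(part) for part in plain.split())

total = add_parts('1 ⅛')
print(total, float(total))  # 9/8 1.125
```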
This is where name() comes in. It's a REALLY useful function: it returns the Unicode name for a given character.
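Two more things worth knowing about name(), shown in a short sketch: before normalizing, the vulgar fraction is a single character with its own descriptive name, and some characters (control characters such as a newline) have no name at all, so name() raises ValueError unless you pass a default as the second argument.

```python
import unicodedata

# Before normalizing, the fraction is one character with its own name
names_before = [unicodedata.name(ch) for ch in '1 ⅛']
print(names_before)
# ['DIGIT ONE', 'SPACE', 'VULGAR FRACTION ONE EIGHTH']

# '\n' has no name: name('\n') would raise ValueError,
# but with a default the default is returned instead
print(unicodedata.name('\n', 'NO NAME'))  # NO NAME
```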
import unicodedata
string_with_fraction = '1 ⅛'
string_number = unicodedata.normalize('NFKC', string_with_fraction)
for x in string_number:
    print(unicodedata.name(x))
This prints:
DIGIT ONE
SPACE
DIGIT ONE
FRACTION SLASH
DIGIT EIGHT
Now we have a readable name for every character in the string, including the space. With these descriptive names, it is easy to convert our number into a decimal:
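import unicodedata
from fractions import Fraction

def convert(fraction_string):
    # NFKC splits vulgar fractions into digits plus FRACTION SLASH, e.g. '⅛' -> '1⁄8'
    fraction_string = unicodedata.normalize('NFKC', fraction_string)
    names = [unicodedata.name(x) for x in fraction_string]
    number_map = {"ONE": "1", "TWO": "2", "THREE": "3", "FOUR": "4", "FIVE": "5",
                  "SIX": "6", "SEVEN": "7", "EIGHT": "8", "NINE": "9", "ZERO": "0"}
    numbers = []
    current = []
    for name in names:
        if "DIGIT" in name:
            # 'DIGIT EIGHT' -> 'EIGHT' -> '8' (the map is keyed on the last word)
            current.append(number_map[name.split()[-1]])
        elif "FRACTION SLASH" in name or "SOLIDUS" in name:
            current.append("/")
        elif "SPACE" in name:
            # A space ends the current number; skip empty runs from repeated spaces
            if current:
                numbers.append("".join(current))
            current = []
    if current:
        numbers.append("".join(current))
    result = 0
    for number in numbers:
        # Fraction parses strings like '1' and '1/8' without needing eval()
        result += Fraction(number)
    return float(result)
One thing I have not mentioned is “SOLIDUS”. This is the Unicode name for the ordinary forward slash (/).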
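One limitation to watch for: this approach relies on a space between the whole number and the fraction. If a recipe writes ‘1⅛’ with no space, NFKC turns it into ‘11⁄8’, which reads as eleven eighths. Below is a sketch of a different technique that skips normalizing altogether and reads each vulgar fraction's value with unicodedata.numeric(). The function name `convert_numeric()` is hypothetical, and it assumes quantities use the precomposed fraction characters; an ASCII ‘1/8’ raises ValueError rather than being guessed at.

```python
import unicodedata

# The pitfall: without a space, NFKC merges the whole number and numerator
print(unicodedata.normalize('NFKC', '1⅛'))  # 11⁄8

def convert_numeric(quantity):
    """Hypothetical alternative to convert(): sum whole numbers and
    precomposed vulgar fractions, e.g. '1 ⅛' or '1⅛' -> 1.125."""
    total = 0.0
    digits = ''  # the run of ordinary digits currently being read
    for ch in quantity + ' ':  # a trailing space flushes the last number
        if ch.isdecimal():
            digits += ch
            continue
        if digits:
            total += int(digits)
            digits = ''
        if unicodedata.name(ch, '').startswith('VULGAR FRACTION'):
            # The Unicode database stores a numeric value for each fraction
            total += unicodedata.numeric(ch)
        elif not ch.isspace():
            # e.g. the ASCII '/' in '1/8': refuse rather than guess
            raise ValueError(f'unexpected character {ch!r} in {quantity!r}')
    return total

print(convert_numeric('1 ⅛'))  # 1.125
print(convert_numeric('1⅛'))   # 1.125
print(convert_numeric('2 ¾'))  # 2.75
```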
That’s it. The function above converts strings containing Unicode vulgar fractions into decimals. Of course, there was more work behind it, looking at other routes and hitting a lot of dead ends, but this is where I ended up.
The unicodedata library is also one that I think those of us who do a lot of string manipulation should be familiar with. Using name() to give each character a text name can really make life easier when dealing with more complex strings, and it makes pattern matching easier as well.
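As an example of the pattern-matching point, here is a sketch of a hypothetical helper, `find_by_name()`, that finds every character whose Unicode name starts with a given prefix. Searching for ‘VULGAR FRACTION’ picks out the fractions in a recipe line without listing each fraction character by hand.

```python
import unicodedata

def find_by_name(text, prefix):
    """Hypothetical helper: return (index, char, name) for each character
    whose Unicode name starts with prefix; unnamed characters are skipped."""
    matches = []
    for i, ch in enumerate(text):
        name = unicodedata.name(ch, '')
        if name.startswith(prefix):
            matches.append((i, ch, name))
    return matches

line = 'Add 1 ⅛ cups flour and ½ tsp salt'
for match in find_by_name(line, 'VULGAR FRACTION'):
    print(match)
# (6, '⅛', 'VULGAR FRACTION ONE EIGHTH')
# (23, '½', 'VULGAR FRACTION ONE HALF')
```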
I hope this little function will save some of you some time.
