09-22-2015, 11:36 AM
(09-22-2015, 10:49 AM)Shaadaris Wrote: Is anyone here knowledgeable in python 3, specifically when it comes to JSON and unicodes?
I have a little project I'm working on and I've hit a snag. Anyone remember the idea I had for a procedural pseudo-language generator? Well I'm working on it. Only problem is that the way I have it set up, it writes new words to a JSON file and this happens:
{"dog": "\u0251\u00f8"}
The problem here is that it automatically encodes the nonstandard letters I'm using into this format, which means when I go to compare a string to check if the word already exists in the dictionary file, it thinks it doesn't and rerolls the word because "\u0251\u00f8" does not equal "ɑø".
I've tried encoding and then decoding it in a more manipulatable format like utf8 but the resulting unicode object isn't serializable in JSON meaning I can't write it to the file. I've spent ages trying to fix it and nothing is working at all! ARGH!
From what I'm googling (keep in mind I know basically jack all about character encoding) - try giving "ensure_ascii=False" as a parameter to json.dumps()
This stackoverflow answer suggests you can do it yourself with a lambda function (or regular function) as well. A little bit more wisdom here (but the example code seems to be largely a workaround for python 2 - seems that ensure_ascii=False and encode('utf-8') should be sufficient for python 3 assuming everything else is setup and the input itself is not incorrect in some way)