UTF-8 encoding/decoding in C
I was working on a simple database-to-Excel-XML exporter the other day and decided to write it in C. The problem was that, since the Swedish language contains non-ASCII characters, the output needs to be UTF-8 encoded. C doesn’t have a built-in function for this (or so it seems, I should add, since I’m a C rookie), and no matter how I searched on Google I couldn’t find anything useful. So I thought…
Look at PHP
…why not look at the source code of PHP and see how the PHP functions utf8_encode and utf8_decode are implemented. So I downloaded the PHP source, and with a little find . -name '*.c' -print | xargs grep "utf8_encode" I found the functions in xml.c. Thankfully they weren’t too complicated once dug out from the rest of the XML functions, so it didn’t take too long before I had them as standalone functions.
This is how they are used:
    #include "utf8.h"

    int main(int argc, char **argv)
    {
        /* Assumes this source file is saved as ISO-8859-1 */
        char *iso_str = "Pontus Östlund";
        char *utf8_str;

        utf8_str = utf8_encode(iso_str);  /* ISO-8859-1 -> UTF-8 */
        iso_str = utf8_decode(utf8_str);  /* UTF-8 -> ISO-8859-1 */

        return 0;
    }
And it seems to be working quite OK!
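For anyone curious what such a conversion boils down to, here is a minimal sketch of the idea. It is not the code from PHP's xml.c and not the contents of utf8.h; the function names latin1_to_utf8 and utf8_to_latin1 are made up for illustration, and the sketch assumes the input is ISO-8859-1 (Latin-1), where every byte maps directly to the Unicode code point with the same value. Bytes below 0x80 are copied as-is, and every other byte becomes a two-byte UTF-8 sequence.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from PHP or utf8.h): convert a NUL-terminated
 * ISO-8859-1 string to UTF-8. Returns a malloc'ed buffer that the caller
 * must free, or NULL if allocation fails. */
char *latin1_to_utf8(const char *in)
{
    size_t len = strlen(in);
    /* Worst case: every input byte expands to two output bytes. */
    unsigned char *out = malloc(2 * len + 1);
    unsigned char *p = out;

    if (out == NULL)
        return NULL;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];
        if (c < 0x80) {
            *p++ = c;                                   /* ASCII: unchanged */
        } else {
            *p++ = (unsigned char)(0xC0 | (c >> 6));    /* lead byte 110000xx */
            *p++ = (unsigned char)(0x80 | (c & 0x3F));  /* continuation 10xxxxxx */
        }
    }
    *p = '\0';
    return (char *)out;
}

/* Hypothetical helper: convert UTF-8 back to ISO-8859-1. Anything that is
 * not a valid sequence for U+0080..U+00FF (lead byte 0xC2 or 0xC3 plus one
 * continuation byte) is replaced with '?'. */
char *utf8_to_latin1(const char *in)
{
    size_t len = strlen(in);
    /* The output is never longer than the input. */
    unsigned char *out = malloc(len + 1);
    unsigned char *p = out;
    size_t i = 0;

    if (out == NULL)
        return NULL;

    while (i < len) {
        unsigned char c = (unsigned char)in[i];
        if (c < 0x80) {
            *p++ = c;
            i++;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < len &&
                   ((unsigned char)in[i + 1] & 0xC0) == 0x80) {
            /* Two low bits of the lead byte + six bits of the continuation. */
            *p++ = (unsigned char)(((c & 0x03) << 6) |
                                   ((unsigned char)in[i + 1] & 0x3F));
            i += 2;
        } else {
            *p++ = '?';
            i++;
            /* Skip any continuation bytes belonging to the bad sequence. */
            while (i < len && ((unsigned char)in[i] & 0xC0) == 0x80)
                i++;
        }
    }
    *p = '\0';
    return (char *)out;
}
```

With this sketch, latin1_to_utf8("Pontus \xD6stlund") turns the single byte 0xD6 (Ö) into the two bytes 0xC3 0x96, and utf8_to_latin1 turns them back again. Both functions return heap memory, so remember to free() the result when you are done with it.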