Docker and encoding (bis)
Have you ever faced encoding issues in some Ruby code, possibly running inside a Docker container? It is painful, yet easy to fix once you know what to do.
§The problem
The task is pretty simple: some Ruby code parses JSON read from the standard output (stdout) of an executable it shells out to (through Open3.capture3).
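For illustration, here is a minimal sketch of that kind of code. The helper name parse_command_json is hypothetical and not taken from the original project; only Open3.capture3 and JSON.parse come from the Ruby standard library.

```ruby
require "json"
require "open3"

# Hypothetical helper: runs a command and parses its stdout as JSON.
def parse_command_json(*command)
  stdout, stderr, status = Open3.capture3(*command)
  raise "command failed (#{status.exitstatus}): #{stderr}" unless status.success?

  # stdout is tagged with Encoding.default_external, which Ruby derives
  # from the process locale; this is where a container can differ.
  JSON.parse(stdout)
end
```

Nothing in this code mentions an encoding explicitly, which is why the same script can behave differently once the surrounding environment changes.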
This code worked totally fine until we containerised the process for Docker. For some mysterious reason, JSON.parse complained about a wrongly formatted string.
Pair-debugging helped to track down the issue and, as @garethadams rightly pointed out, Ruby is supposed to use UTF-8 by default… yet the error claimed the opposite.
OK Docker ruby, show me your encoding:
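One way to ask the question is a small script like the sketch below, run with the ruby binary of the container. The encoding_report helper is a hypothetical name; the Encoding calls are standard Ruby.

```ruby
# Collects the encodings Ruby picked up at startup.
def encoding_report
  {
    "default_external" => Encoding.default_external.name,
    "default_internal" => Encoding.default_internal&.name, # nil unless set explicitly
    "locale"           => Encoding.find("locale").name,
    "source"           => __ENCODING__.name
  }
end

encoding_report.each { |key, value| puts "#{key}: #{value.inspect}" }
```

In a container with no locale configured, default_external typically comes back as US-ASCII, even though the source encoding is UTF-8.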
This is wrong. It might be inherited from the default system locale:
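To see what the process actually inherited, the locale variables can be inspected from Ruby itself. This is a sketch only: effective_ctype_locale is a hypothetical helper that applies the POSIX precedence (LC_ALL, then LC_CTYPE, then LANG) to the environment.

```ruby
# POSIX precedence for the character-type locale: LC_ALL, then LC_CTYPE, then LANG.
LOCALE_VARIABLES = %w[LC_ALL LC_CTYPE LANG].freeze

# Returns the locale name that decides the character encoding, or nil when
# none of the variables is set (the process then runs in the "C"/POSIX locale).
def effective_ctype_locale(env = ENV)
  LOCALE_VARIABLES.each do |name|
    value = env[name]
    return value unless value.nil? || value.empty?
  end
  nil
end

LOCALE_VARIABLES.each { |name| puts "#{name}=#{ENV[name].inspect}" }
puts "effective: #{effective_ctype_locale.inspect}"
```

If every variable prints nil, nothing in the image or in the docker run command sets a locale, and Ruby falls back to the POSIX default.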
Which leads to the popular article Docker and locales by Jared Markell… but it did not solve the issue.
The official ruby image is based on debian:jessie. Debian removed its dependency on the locales package in 2011, which explains why en_US.UTF-8 is unavailable.
The quickest solution is to opt for C.UTF-8, the UTF-8 locale provided by libc-bin, on Debian at least:
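The effect of that variable can be checked without rebuilding anything. The sketch below starts a fresh Ruby under a given LANG and reports the external encoding it picked up. The helper external_encoding_under is hypothetical, and the example assumes the C.UTF-8 locale exists on the system, as it does on Debian.

```ruby
require "open3"
require "rbconfig"

# Starts a fresh Ruby with only LANG set and returns the name of the
# external encoding that the child process picked up.
def external_encoding_under(lang)
  env = {
    "LC_ALL"   => nil,  # nil unsets the variable in the child process
    "LC_CTYPE" => nil,
    "RUBYOPT"  => nil,  # ignore any -E flag injected through RUBYOPT
    "LANG"     => lang
  }
  stdout, _stderr, status = Open3.capture3(
    env, RbConfig.ruby, "-e", "print Encoding.default_external.name"
  )
  raise "child Ruby failed" unless status.success?

  stdout
end

%w[C C.UTF-8].each do |lang|
  puts "LANG=#{lang} -> #{external_encoding_under(lang)}"
end
```

On a system where C.UTF-8 is available, the second line should report UTF-8 while the first one stays on US-ASCII. The same variable can be passed to a container with the -e option of docker run.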
Problem solved! It is a bit cumbersome to specify that manually, though. A good and persistent way of doing it is to set that environment variable in any image based on a Ruby one:
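Once the variable is baked into the image, a small guard at boot can confirm that it took effect. This is a sketch under assumptions: ensure_utf8_external! is a hypothetical helper, and the Dockerfile instruction quoted in the comment is one plausible way to set the variable, not necessarily the exact line used originally.

```ruby
# Hypothetical boot-time check. It assumes the image's Dockerfile sets the
# locale with an instruction such as:  ENV LANG C.UTF-8
def ensure_utf8_external!(encoding = Encoding.default_external)
  return true if encoding == Encoding::UTF_8

  raise "expected UTF-8 external encoding, got #{encoding.name}; " \
        "is LANG (e.g. C.UTF-8) set in the image?"
end
```

Calling ensure_utf8_external! early in the program turns a confusing JSON parsing error into an explicit message about the container configuration.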
Et voilà ! It is a lightweight and recommended solution, as it does not require installing an apt package.
Knowing the chain of base images is very helpful for understanding how to configure a container. Because the configuration can vary from one base image to another, you might want to ensure that all (or most) of your Docker images inherit from the same consistent base.
Note: the Node.js base image is not affected by the system locale, as Node deals with either Buffers or UTF-8 encoded strings anyway.