When 3‑letter language codes become 2; or, ISO frustrated

I wish I was a little bit taller, I wish I was a baller
I wish I had a girl who looked good, I would call her
I wish I had a rabbit in a hat with a bat
And a six-four Impala
Skee-Lo, “I Wish” (1995)

Size. In tech, we deal with it daily. We’re growing our profits. The meeting’s going longer than expected. The client wants a short turnaround.

Video streaming devs can relate. In particular, the size of language codes has been flummoxing some of you recently.

Thus, this month’s customer support question is: Why am I getting only 2‑letter language codes in my client manifest/playlist?

Discussing the length of language codes means discussing standards.

Gotta have standards

Most of you are familiar with the catchily named ISO 639 standard, more specifically ISO 639-2/T, ISO 639-2/B, or ISO 639-3, and their 3‑letter language codes:

eng for English;
ita for Italian; and
epo for Esperanto (obviously).

And why not? ISO 639’s codes are everywhere. This is what you see in your server manifest when an audio or subtitles language is assigned to a track:

<video src="tears-of-steel-avc1-1500k.cmfv" systemBitrate="1502000"
  systemLanguage="eng">


But the story doesn’t end there.

ISO 639-1 is a list of 2‑letter language codes. Now, having gone 11 words without introducing another standard, let’s go ahead and introduce RFC 5646, the standard both DASH and HLS use for language tags.

RFC: request for confusion?

The RFC 5646 standard (which we recommend reading out loud to your partner right before a romantic time) states:

“When languages have both an ISO 639-1 two-character code and a three-character code (assigned by ISO 639-2, ISO 639-3, or ISO 639-5), only the ISO 639-1 two-character code is defined in the IANA registry.

“When a language has no ISO 639-1 two-character code and the ISO 639-2/T (Terminology) code and the ISO 639-2/B (Bibliographic) code for that language differ, only the Terminology code is defined in the IANA registry. At the time this document was created, all languages that had both kinds of three-character codes were also assigned a two-character code; it is expected that future assignments of this nature will not occur.

“In order to avoid instability in the canonical form of tags, if a two-character code is added to ISO 639-1 for a language for which a three-character code was already included in either ISO 639-2 or ISO 639-3, the two-character code MUST NOT be registered. See Section 3.4.

“For example, if some content were tagged with ‘haw’ (Hawaiian), which currently has no two-character code, the tag would not need to be changed if ISO 639-1 were to assign a two-character code to the Hawaiian language at a later date.”

In other words, under RFC 5646, if a language has a 2‑letter ISO 639-1 code, *that* is the code you’ll get. If it doesn’t, you get its 3‑letter code instead.

So what does this mean? Well …

eng for English becomes en;
ita for Italian becomes it; and
epo for Esperanto becomes eo.

Facila, ĉu ne? (Easy, isn’t it?)

For Cantonese, though, yue stays yue because no 2‑letter code exists for it.

So, even if you don’t think that Cantonese code is what’s best for you, yue do yue.
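To make the rule concrete, here is a minimal shell sketch of the choice RFC 5646 makes, covering the examples above plus Dutch and Chinese to show how ISO 639-2/B codes collapse too. The `to_rfc5646` function and its lookup table are hypothetical illustrations, not part of any packager; a real tool would consult the full IANA Language Subtag Registry rather than a hand-picked list.

```shell
#!/usr/bin/env bash
# Sketch only: map a 3-letter ISO 639 code to the tag RFC 5646 prefers.
# The table below is a tiny hand-picked sample, NOT the full ISO 639-1 list;
# a real tool would look codes up in the IANA Language Subtag Registry.
to_rfc5646() {
  case "$1" in
    eng)     echo en ;;   # English
    ita)     echo it ;;   # Italian
    epo)     echo eo ;;   # Esperanto
    nld|dut) echo nl ;;   # Dutch: T code nld and B code dut both become nl
    zho|chi) echo zh ;;   # Chinese: T code zho and B code chi both become zh
    *)       echo "$1" ;; # not in this sample: pass through unchanged (right for yue, haw)
  esac
}

for code in eng ita epo dut yue haw; do
  printf '%s -> %s\n' "$code" "$(to_rfc5646 "$code")"
done
```

Running it prints `eng -> en`, `ita -> it`, `epo -> eo`, `dut -> nl`, `yue -> yue` and `haw -> haw`. The pass-through default is only correct for codes that genuinely lack a 2‑letter equivalent, which is why the lookup table, not the fallback, does the real work.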

We’re done, right?

Thought the story ended there, did you? It doesn’t. Look at these examples:

mp4split -o audio-nl-be.mp4 --track_language=nl-be \
    --track_description="Vlaams Nederlands"
mp4split -o audio-zh-yue-hant.mp4 --track_language=zh-yue-hant \
    --track_description="Cantonese Chinese using Traditional script"


It’s good practice to specify a macrolanguage when possible. For example, signal Cantonese Chinese using “zh-yue” rather than “yue,” so that “Chinese” (zh) is used as the fallback option. For regional variants, combine the ISO 639-1 code with a region subtag, such as nl-be for Flemish, which, once its region subtag is capitalized (the RFC 5646 convention), becomes nl-BE.
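Language tags are case-insensitive, but RFC 5646 recommends a conventional casing: lowercase language subtags, title-case 4-letter script subtags, and uppercase 2-letter region subtags. The sketch below, built around a hypothetical `canonical_case` helper, applies that convention to the tags from the mp4split examples. It only re-cases subtags; it does not validate them against the registry.

```shell
#!/usr/bin/env bash
# Sketch: apply RFC 5646 casing conventions to a language tag.
# By convention: language subtags lowercase, 4-letter script subtags title case,
# 2-letter region subtags uppercase. Subtags after a singleton (extension or
# private use, e.g. "x") are left lowercase. No registry validation is done.
canonical_case() {
  local -a subs
  local IFS='-'
  read -ra subs <<< "$1"
  local out="" lower i after_singleton=0
  for i in "${!subs[@]}"; do
    lower=$(printf '%s' "${subs[$i]}" | tr '[:upper:]' '[:lower:]')
    # A 1-character subtag after the first position starts an extension/private-use part.
    if [ "$i" -gt 0 ] && [ "${#lower}" -eq 1 ]; then
      after_singleton=1
    fi
    if [ "$i" -gt 0 ] && [ "$after_singleton" -eq 0 ]; then
      case "${#lower}" in
        2) lower=$(printf '%s' "$lower" | tr '[:lower:]' '[:upper:]') ;;  # region
        4) lower="$(printf '%s' "${lower:0:1}" | tr '[:lower:]' '[:upper:]')${lower:1}" ;;  # script
      esac
    fi
    out="${out:+$out-}$lower"
  done
  printf '%s\n' "$out"
}

canonical_case nl-be        # nl-BE
canonical_case zh-yue-hant  # zh-yue-Hant
```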

So now you know why that 2‑letter language code ends up in your client manifest or playlist.

In fact, most widely used languages do have a 2‑letter code, so the 3‑letter codes that survive are the exceptions; “haw” for Hawaiian and “yue” for Cantonese are two that come to mind!

Your DASH and HLS players should be to spec, so they shouldn’t have an issue with RFC 5646.

But, as Skee-Lo’s humble and vulnerable lyricism in “I Wish” is trying to tell us, we don’t live in a perfect world. If we did, then every day would be a Friday, you could even speed on the highway, and you could name your kids Little Mookie, Big Al, and Lorraine. To paraphrase the song.

Go on, be seditious

If you really must use 3‑letter versions of languages for static packaging, we can only suggest using “sed” (i.e., the command-line tool, not the language code) to update your manifests. For Origin, you can use Apache’s mod_substitute to make the replacement.
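As a minimal sketch, assuming your manifests carry languages in the DASH `lang` attribute or the HLS `LANGUAGE` attribute, a sed script along these lines would swap selected 2‑letter codes back to 3‑letter ones. The `to_three_letter` function and its en/it pairs are illustrative assumptions; you would add one expression per language you actually package.

```shell
#!/usr/bin/env bash
# Sketch: rewrite 2-letter language attributes back to 3-letter codes with sed.
# Matches DASH (lang="..") and HLS (LANGUAGE="..") attributes. The en->eng and
# it->ita pairs are examples only; add one -e expression per language you need.
# With no file arguments, sed reads standard input.
to_three_letter() {
  sed -E \
    -e 's/(lang|LANGUAGE)="en"/\1="eng"/g' \
    -e 's/(lang|LANGUAGE)="it"/\1="ita"/g' \
    "$@"
}

# Demo: one DASH line and one HLS line, fed through standard input.
to_three_letter <<'EOF'
<AdaptationSet contentType="audio" lang="en">
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aac",LANGUAGE="it",NAME="Italiano"
EOF
```

For a file on disk, pass its path instead, e.g. `to_three_letter manifest.mpd > manifest-3letter.mpd` (hypothetical file names). Apache’s mod_substitute accepts a similar `s/pattern/replacement/` syntax in its `Substitute` directive, so these patterns could be adapted for Origin, keeping in mind that backreferences there are written `$1` rather than `\1`.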

So, yes, pertaining to language codes in video packaging, it can all be a bit frustrating.

But once you master RFC 5646 and the related ISO rules, you’ll be a baller with a six-four Impala.

Something Skee-Lo wished for in his music video, and got.
