
I’m not offset, *you’re* offset

Where I come from, we have a phrase that I didn’t think was odd or absolutely horrific until I began writing this blog.

“There are many ways to skin a cat.”

I don’t know how many weird individuals are out there thinking of the various ways they could peel their pet, and I apologize for the image, but strangely, I can’t find a more SFW idiom. Feline abuse aside, the phrase means there’s more than one way of doing things.

And it’s true. Some ways are better (or at least more suitable) than others. [Or, to stick with animal-based sayings, it’s ‘horses for courses’: different beings and methods suit different things.]

This month’s support question is: How come Validator is complaining about my offsets?

What do offsets have to do with cats? Fair question.

In a perfect world of only IDR-frames, each frame can be decoded independently and holds all the information needed to make up a picture. If you add P‑frames into the mix to save on data, these frames (which contain only the changes relative to an earlier frame) need that earlier frame, the IDR-frame or a preceding P‑frame, to make sense.

All good so far, and we’re still moving forward in time.

Now to save even more space, you introduce B‑frames. And that’s when things get complicated. To complete the picture, a B‑frame might use a P‑frame in the future and an IDR-frame in the past.

But what does it all mean, o mystical time traveler? It means the order in which video frames need to be decoded doesn’t always match the order in which they should be presented. So each frame has a decode timestamp (DTS) and a presentation timestamp (PTS).

[Trying to keep video acronyms like PTS and DTS straight may give you a case of PTSD (post-traumatic stress disorder). So, just saying: be careful.]

So a GOP (group of pictures) with IDR- and P‑frames could look like this:

DTS  0     1     2     3      
   [IDR]  [P]   [P]   [P]
PTS  0     1     2     3      

But add B‑frames and …

DTS  0     1     2     3      
   [IDR]  [P]   [B]   [B]
PTS  0     3     1     2
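
To make the reordering concrete, here is a minimal Python sketch (an illustration only, not code from any Unified Streaming product) that lists the four frames of the B‑frame GOP in decode order, using the DTS and PTS values from the diagram, and sorts them both ways. The `Frame` record and the two helper functions are hypothetical names chosen for this example.

```python
from typing import NamedTuple


class Frame(NamedTuple):
    """A hypothetical, minimal frame record: frame type plus its two timestamps."""
    kind: str
    dts: int  # decode timestamp: when the decoder needs it
    pts: int  # presentation timestamp: when the viewer sees it


# The B-frame GOP from the diagram, listed in decode (DTS) order.
GOP = [
    Frame("IDR", dts=0, pts=0),
    Frame("P", dts=1, pts=3),
    Frame("B", dts=2, pts=1),
    Frame("B", dts=3, pts=2),
]


def decode_order(frames):
    """Return frame types sorted by DTS, i.e. the order the decoder consumes them."""
    return [f.kind for f in sorted(frames, key=lambda f: f.dts)]


def presentation_order(frames):
    """Return frame types sorted by PTS, i.e. the order they appear on screen."""
    return [f.kind for f in sorted(frames, key=lambda f: f.pts)]


print(decode_order(GOP))        # ['IDR', 'P', 'B', 'B']
print(presentation_order(GOP))  # ['IDR', 'B', 'B', 'P']
```

The decoder needs the P‑frame before the two B‑frames, but the viewer sees it last, so the container has to carry enough information to recover both orders.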

If a frame’s PTS doesn’t equal its DTS, you need an offset. In MP4 land, there are two things that can help:

  • A track-level edit list in ‘moov.trak.edts.elst’
  • A sample-level composition time offset (CTO) in ‘moof.traf.trun’. (By the way, moof.traf.trun is also a fine name for a Danish electronica band.)

The track below uses positive CTOs so that the P‑frame that needs to be presented fourth is decoded second:

DTS  0     1     2     3      
   [IDR]  [P]   [B]   [B]
CTO  1     3     0     0
PTS  1     4     2     3

This fixes the order problem. But introducing these positive CTOs means the PTS value of the first frame becomes 1. To make sure the track still starts at ‘0,’ we need an edit list to signal that media_time=1, or, in other words, that PTS ‘1’ should actually be considered ‘0.’
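
The arithmetic in this example can be sketched in Python as follows. It assumes timestamps counted in whole frames and a single edit list entry; in a real file, `media_time` is expressed in the track's media timescale, and the entry's `segment_duration` and `media_rate` also matter. The helpers `pts_from_cto` and `apply_edit_list` are hypothetical names for illustration.

```python
def pts_from_cto(dts, cto):
    """PTS = DTS + CTO for every sample, the relation the 'trun' offsets encode."""
    return [d + c for d, c in zip(dts, cto)]


def apply_edit_list(pts, media_time):
    """Shift presentation times so that PTS == media_time is presented at 0.

    Simplified model of a single 'elst' entry: segment_duration and
    media_rate are deliberately ignored here.
    """
    return [p - media_time for p in pts]


DTS = [0, 1, 2, 3]
CTO = [1, 3, 0, 0]  # positive offsets only, as in the example above

pts = pts_from_cto(DTS, CTO)
print(pts)                      # [1, 4, 2, 3] -> first frame lands at 1
print(apply_edit_list(pts, 1))  # [0, 3, 1, 2] -> edit list restores the 0 start
```

If a player skips the edit list, it presents the first frame at 1 instead of 0, which is exactly the kind of offset behind the problems described next.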

Why is this a problem? Well, edit lists sometimes get ignored, misinterpreted, or deleted, causing sync issues, buffering, or even bitrate switching issues. And when tracks that originally carried different edit lists are bundled together in one stream, misalignment may ensue.

Furthermore, it remains open to interpretation whether the fragment start times signaled in the fMP4’s index (‘mfra’ or, in the case of CMAF, ‘sidx’) refer to DTS or PTS. Whenever DTS and PTS differ at the start of a fragment, that ambiguity becomes a real problem.
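
The index ambiguity only matters when the first sample of a fragment has a non-zero CTO. In the Python sketch below, a fragment is reduced to a hypothetical list of `(dts, cto)` pairs taken from the earlier diagrams, and its start time is computed under both readings. This is a simplified model of what an index entry points at, not a parser for ‘mfra’ or ‘sidx’.

```python
def fragment_start_times(samples):
    """Return (start_read_as_dts, start_read_as_pts) for one fragment.

    `samples` is a hypothetical list of (dts, cto) pairs in decode order;
    the first pair is the sample an index entry would point at.
    """
    first_dts, first_cto = samples[0]
    return first_dts, first_dts + first_cto


def start_is_ambiguous(samples):
    """True when reading the fragment start as DTS or as PTS gives different times."""
    as_dts, as_pts = fragment_start_times(samples)
    return as_dts != as_pts


ipp = [(0, 0), (1, 0), (2, 0), (3, 0)]       # IDR + P-frames only
positive = [(0, 1), (1, 3), (2, 0), (3, 0)]  # B-frames with positive CTOs

print(fragment_start_times(ipp), start_is_ambiguous(ipp))            # (0, 0) False
print(fragment_start_times(positive), start_is_ambiguous(positive))  # (0, 1) True
```

The IDR/P track gives the same answer either way, while the positive-CTO track starts at 0 or 1 depending on how the index is read.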

Fortunately, negative CTOs can solve our issues. This approach can guarantee that PTS equals DTS for the first sample of each fragment without the need for an edit list. This also makes sure that the PTS of the first sample of each track aligns across tracks that are encoded according to different video profiles (with and without B‑frames).

Let’s glance back at the earlier example that used B‑frames with positive CTOs and an edit list. Now, we’ll introduce negative CTOs and get rid of that edit list:

DTS  0     1     2     3      
   [IDR]  [P]   [B]   [B]
CTO  0     2    -1    -1
PTS  0     3     1     2
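
Going the other way, a muxer that knows each sample's presentation time can derive the offsets as `CTO = PTS - DTS`. The Python sketch below is one assumed way to do that, not a description of any particular packager: it first removes the constant shift that the edit list used to absorb, so the first sample's PTS equals its DTS, and then computes per-sample CTOs, which come out negative for the B‑frames. `composition_offsets` is a hypothetical helper name.

```python
def composition_offsets(dts, pts):
    """Derive per-sample CTOs (PTS - DTS), anchored so the first sample has CTO 0.

    Any constant shift in `pts` (such as the +1 from the positive-CTO
    example) is removed first, so no edit list is needed.
    """
    shift = dts[0] - pts[0]
    return [(p + shift) - d for d, p in zip(dts, pts)]


DTS = [0, 1, 2, 3]
PTS_WITH_EDIT_LIST = [1, 4, 2, 3]  # PTS values from the positive-CTO example

cto = composition_offsets(DTS, PTS_WITH_EDIT_LIST)
print(cto)                                # [0, 2, -1, -1]
print([d + c for d, c in zip(DTS, cto)])  # [0, 3, 1, 2]
```

The result matches the table: the same presentation order as before, but the first sample starts at PTS = DTS = 0 without any edit list.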

So that brings us back to our question: how would we like you to skin your cat?

With version 1 ‘trun’ boxes (which, unlike version 0, allow signed and therefore negative CTOs), and with the DTS of the first keyframe equal to its PTS (i.e., a CTO of 0 and no edit list). Any samples that follow should use a CTO where applicable, negative or positive.
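
Why version 1? In ISO/IEC 14496-12, the `sample_composition_time_offset` field of a ‘trun’ box is an unsigned 32-bit integer in version 0 and a signed 32-bit integer in version 1. The Python sketch below uses the standard `struct` module to show how a CTO of -1 reads under each version, and adds a check that mirrors the advice above. `encode_cto`, `decode_cto`, and `follows_recommendation` are hypothetical helpers made up for illustration, not part of any MP4 library.

```python
import struct


def encode_cto(cto):
    """Pack a composition time offset as a big-endian signed 32-bit field."""
    return struct.pack(">i", cto)


def decode_cto(raw, trun_version):
    """Read a 4-byte CTO field as a parser would for the given 'trun' version.

    Version 0 treats the field as unsigned; version 1 treats it as signed.
    """
    fmt = ">i" if trun_version == 1 else ">I"
    return struct.unpack(fmt, raw)[0]


def follows_recommendation(trun_version, ctos):
    """Hypothetical check: version 1 'trun' and a first-sample CTO of 0."""
    return trun_version == 1 and ctos[0] == 0


raw = encode_cto(-1)
print(raw.hex())           # ffffffff
print(decode_cto(raw, 1))  # -1
print(decode_cto(raw, 0))  # 4294967295

print(follows_recommendation(1, [0, 2, -1, -1]))  # True
print(follows_recommendation(1, [1, 3, 0, 0]))    # False
```

A version 0 reading turns a one-tick offset into a shift of more than four billion ticks, which is the sort of misreading that can throw samples wildly out of place.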

DASH, Smooth Streaming, and HLS all support the use of negative CTOs. How about that?

IRL: Let’s say you’re seeing buffering on some HLS JS players when switching between bitrates. Just make sure PTS and DTS are equal at the start of each fragment. Or let’s say that, while stitching a bumper to the main content, you see the bumper play out but the main content returns 404s. We recommend using only version 1 composition time offsets and, again, simply letting your PTS equal your DTS.

Good luck removing epidermal layers from domesticated animals (figuratively speaking only, please). Till next time!
