Title:  Package my video
Date:   2024-06-20T13:19:03-04:00
Draft:  1
---

## TODO

* Some intro
* Investigate key frames and seeking

--------------------------------------------------------------------------------

The goal of this project is a minimum viable product that can:

* support H264 video and AAC audio
    * experiment with AV1 and Opus if time permits
* produce a .mp4 file that is optimized for: raw progressive playback and segment generation
* generate an HLS playlist on demand
* generate the Nth segment from an HLS file on demand

## Setup

### ffmpeg

First I want to start with the latest available ffmpeg static build from: <https://johnvansickle.com/ffmpeg/>

```shell
❯ curl -sL -O https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz
❯ sudo tar -C /opt -xvf ffmpeg-release-amd64-static.tar.xz
```

My preference is to then link my preferred build to some location (`/opt/ffmpeg-static`) that I will then add to my `PATH`.

```
❯ sudo ln -sf /opt/ffmpeg-7.0.1-amd64-static/ /opt/ffmpeg-static
# then edit your shell rc or profile, reset shell
❯ type ffmpeg
ffmpeg is /opt/ffmpeg-static/ffmpeg

❯ ffmpeg -version
ffmpeg version 7.0.1-static https://johnvansickle.com/ffmpeg/  Copyright (c) 2000-2024 the FFmpeg developers
built with gcc 8 (Debian 8.3.0-6)
configuration: --enable-gpl --enable-version3 --enable-static --disable-debug --disable-ffplay --disable-indev=sndio --disable-outdev=sndio --cc=gcc --enable-fontconfig --enable-frei0r --enable-gnutls --enable-gmp --enable-libgme --enable-gray --enable-libaom --enable-libfribidi --enable-libass --enable-libvmaf --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librubberband --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libvorbis --enable-libopus --enable-libtheora --enable-libvidstab --enable-libvo-amrwbenc --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libdav1d --enable-libxvid --enable-libzvbi --enable-libzimg
libavutil      59.  8.100 / 59.  8.100
libavcodec     61.  3.100 / 61.  3.100
libavformat    61.  1.100 / 61.  1.100
libavdevice    61.  1.100 / 61.  1.100
libavfilter    10.  1.100 / 10.  1.100
libswscale      8.  1.100 /  8.  1.100
libswresample   5.  1.100 /  5.  1.100
libpostproc    58.  1.100 / 58.  1.100
```

And checking codec support:
```
❯ ffmpeg -codecs 2>/dev/null | grep '\s\(aac\|h264\|av1\|opus\)'
 DEV.L. av1                  Alliance for Open Media AV1 (decoders: libdav1d libaom-av1 av1) (encoders: libaom-av1)
 DEV.LS h264                 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (decoders: h264 h264_v4l2m2m) (encoders: libx264 libx264rgb h264_v4l2m2m)
 DEAIL. aac                  AAC (Advanced Audio Coding) (decoders: aac aac_fixed)
 D.AIL. aac_latm             AAC LATM (Advanced Audio Coding LATM syntax)
 DEAIL. opus                 Opus (Opus Interactive Audio Codec) (decoders: opus libopus) (encoders: opus libopus)
```

### Test video file

Here are a few open test video sources:

* Sintel: <https://download.blender.org/durian/movies/>
    * License: [Creative Commons Attribution 3.0](https://web.archive.org/web/20240105060647/https://durian.blender.org/sharing/)
* Big Buck Bunny: <https://download.blender.org/peach/bigbuckbunny_movies/>
    * License: [Creative Commons Attribution 3.0](https://web.archive.org/web/20240521095028/https://peach.blender.org/about/)

I grabbed a 720p version of each:

```shell
❯ du -h test-videos/*
398M    test-videos/big_buck_bunny_720p_h264.mov
650M    test-videos/Sintel.2010.720p.mkv
```

### Deciding on eventual segment size

Target segment size will hold some influence over our progressive transcoding.

Each segment must begin with a key frame, so key frame placement in the progressive output should line up with where the segments will be cut.

Apple [suggests 6 second durations for each HLS segment][apple_hls_seg] for VOD playback.

6s would be fine to use, but it's a choice with consequences.

If key frames were placed only every 6s and I later wanted 3s segments, the progressive file would have to be re-transcoded to insert more key frames.

So for flexibility's sake I will place key frames every 3s in the progressive transcode, while the eventual segments will be 6s.
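To make that trade-off concrete, here is a quick sketch of which segment lengths a given key frame interval can serve without another transcode. The helper name `usable_segment_lengths` is made up for this post, and it assumes key frames sit on an exact grid of whole seconds.

```python
def usable_segment_lengths(keyframe_interval_s, max_length_s):
    """Segment lengths (whole seconds) whose boundaries always land on a key frame.

    Assumes key frames occur exactly every `keyframe_interval_s` seconds.
    """
    return list(range(keyframe_interval_s, max_length_s + 1, keyframe_interval_s))


print(usable_segment_lengths(3, 12))  # [3, 6, 9, 12]
print(usable_segment_lengths(6, 12))  # [6, 12]
```

A 3s key frame interval keeps both 3s and 6s segments available; a 6s interval rules out 3s segments.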

### Packaging the progressive file

But first, let's produce v1 of the files with the target codecs applied (H264 and AAC).

```
❯ ffmpeg -i ./test-videos/big_buck_bunny_720p_h264.mov -acodec 'aac' -vcodec 'h264' ./test-videos/bbb_h264_aac.mp4

❯ ffmpeg -i ./test-videos/Sintel.2010.720p.mkv -acodec 'aac' -vcodec 'h264' ./test-videos/sintel_h264_aac.mp4

❯ du -h ./test-videos/*
138M    ./test-videos/bbb_h264_aac.mp4
398M    ./test-videos/big_buck_bunny_720p_h264.mov
650M    ./test-videos/Sintel.2010.720p.mkv
201M    ./test-videos/sintel_h264_aac.mp4
```

Now let's inspect the frames a bit closer with this script, `dumpframes.sh`:
```bash
#!/usr/bin/env bash

# quote the argument so paths containing spaces work
ffprobe -select_streams v -show_frames -show_entries frame=pict_type -of csv "$1"
```

This should show what [picture type][pic_types] each frame is.

> I‑frames are the least compressible but don't require other video frames to decode.

> P‑frames can use data from previous frames to decompress and are more compressible than I‑frames.

> B‑frames can use both previous and forward frames for data reference to get the highest amount of data compression.

**I** frames are also called **key frames**.

So given a dump of Big Buck Bunny (BBB):
```
./dumpframes.sh test-videos/bbb_h264_aac.mp4 > bbb_frames.csv
```

BBB is 24fps, so every 3 seconds we want to see a key frame. Here are the (1-based) frame numbers where the key frames should be.
```
❯ python3
>>> fps = 24
>>> i_frame_s = 3

>>> for i in range(0, 10): print(i * fps * i_frame_s + 1)
...
1
73
145
217
289
361
433
505
577
649
```

* The first segment should contain 3s of content, which is 72 frames.
* The first frame should be a key frame.
* Then come 71 non-I frames.
* Then the next I frame (frame 73) begins the next segment.

But our frames are not quite right.

```shell
❯ grep -n I ./bbb_frames.csv  | head
1:frame,I,H.26[45] User Data Unregistered SEI message
7:frame,I
251:frame,I
286:frame,I
379:frame,I
554:frame,I
804:frame,I
1054:frame,I
1146:frame,I
1347:frame,I
```

This is because we didn't tell ffmpeg anything about how to encode and where to place I frames.

There are a few `libx264` options that help control this (shown here in their x264 command-line form):

* `--no-scenecut`: "Disable adaptive I-frame decision"
* `--keyint`: effectively the key frame interval. Technically it is the "Maximum GOP size"
* `--min-keyint`: the "Minimum GOP size"

> A GOP is a "Group of Pictures": a key frame plus the frames that follow it up to the next key frame, so its size is the distance between two key frames.
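To put numbers on GOP size, here is a small sketch that turns the I frame positions from the first dump above into the distances between them. The helper name `gop_sizes` is mine, and it assumes every I frame reported by `ffprobe` starts a new GOP, which the dump alone does not prove.

```python
def gop_sizes(i_frame_numbers):
    """Distance in frames between each consecutive pair of I frames."""
    return [b - a for a, b in zip(i_frame_numbers, i_frame_numbers[1:])]


# I frame positions copied from the first `grep -n I` output above
first_encode = [1, 7, 251, 286, 379, 554, 804, 1054, 1146, 1347]

print(gop_sizes(first_encode))
# [6, 244, 35, 93, 175, 250, 250, 92, 201]
```

The sizes are irregular, and none exceeds 250, which as far as I know is x264's default `keyint`.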

So let's re-encode with those options. Actually, let's write a wrapper script to do this.

I'll choose something besides bash because there will be a bit of math involved.

```python
#!/usr/bin/env python3
import sys
import json
import subprocess
import logging

logging.basicConfig(level=logging.DEBUG)

def probe_info(infname):
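	# build the argument list directly so paths containing spaces survive
	cmd = ['ffprobe', '-v', 'quiet', '-print_format', 'json', '-show_format', '-show_streams', infname]
	logging.info('running cmd %s', ' '.join(cmd))
	# check=True: fail here if ffprobe errors, rather than on empty JSON below
	res = subprocess.run(cmd, check=True, capture_output=True)
	ffprobe_dict = json.loads(res.stdout)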
	v_stream = None
	for stream in ffprobe_dict.get('streams', []):
		if stream.get('codec_type') == 'video':
			v_stream = stream
			break
	if v_stream is None:
		raise ValueError(f'no video stream found in {infname}')

	r_frame_rate = v_stream.get('r_frame_rate')
	num, denom = r_frame_rate.split('/')
	fps = float(num) / float(denom)
	logging.info('got fps %s', fps)
	return {
		'fps': fps,
	}

def run_ffmpeg_transcode(infname, outfname, probeinfo, segment_length=3):
	# keyint must be an integer; round (not truncate) so 24000/1001 fps gives 72, not 71
	keyint = round(probeinfo.get('fps') * segment_length)
	cmd = [
		'ffmpeg',
		'-i',
		infname,
		'-vcodec',
		'libx264',
		'-x264opts',
		f'keyint={keyint}:min-keyint={keyint}:no-scenecut',
		'-acodec',
		'aac',
		outfname
	]
	logging.info('running cmd %s', ' '.join(cmd))
	subprocess.run(cmd, check=True)

if __name__ == '__main__':
	prog, *args = sys.argv
	if len(args) != 2:
		sys.exit(f'usage: {prog} INFILE OUTFILE')

	infname, outfname = args
	probeinfo = probe_info(infname)
	run_ffmpeg_transcode(infname, outfname, probeinfo)
```

* Use `ffprobe` to dump the streams as JSON.
* Take the first video stream.
* Get the `r_frame_rate`, which is a fraction, and evaluate it to get `fps`.
* Calculate the key frame interval using a static 3s segment length.
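The frame-rate arithmetic can also be done exactly with the standard library's `fractions.Fraction`, which accepts `'num/denom'` strings directly. This is only a sketch of the calculation, not what the script above does; `keyint_for` is a hypothetical helper name, and it assumes `r_frame_rate` is a well-formed fraction with a non-zero denominator.

```python
from fractions import Fraction


def keyint_for(r_frame_rate, segment_length_s):
    """Key frame interval in frames for an ffprobe-style 'num/denom' frame rate."""
    fps = Fraction(r_frame_rate)
    # x264 needs a whole number of frames, so round to the nearest one
    return round(fps * segment_length_s)


print(keyint_for('24/1', 3))        # 72
print(keyint_for('24000/1001', 3))  # 72
```

With a fractional rate such as 24000/1001 the key frames land every 72 frames, which is slightly more than 3s (about 3.003s), so they drift a little from exact 3s marks.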

And if we run it, let's take a look at the commands it executes:
```
❯ ./progressive.py ./test-videos/big_buck_bunny_720p_h264.mov ./test-videos/bbb_h264_aac.mp4
INFO:root:running cmd ffprobe -v quiet -print_format json -show_format -show_streams ./test-videos/big_buck_bunny_720p_h264.mov
INFO:root:got fps 24.0
INFO:root:running cmd ffmpeg -i ./test-videos/big_buck_bunny_720p_h264.mov -vcodec libx264 -x264opts keyint=72:min-keyint=72:no-scenecut -acodec aac ./test-videos/bbb_h264_aac.mp4
```

Now regenerate the frame dump and check if our I frames match the expected: 1, 73, 145, 217 ...

```shell
❯ ./dumpframes.sh test-videos/bbb_h264_aac.mp4 > bbb_frames.csv
❯ grep -n I ./bbb_frames.csv | head
1:frame,I,H.26[45] User Data Unregistered SEI message
73:frame,I
145:frame,I
217:frame,I
286:frame,I
289:frame,I
361:frame,I
379:frame,I
433:frame,I
505:frame,I
```

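Excellent! Every expected position (1, 73, 145, 217, 289, ...) now has an I frame. Two extra I frames remain at 286 and 379, the same positions seen in the first dump, but they do not displace the ones on the 3s grid.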
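This check can be scripted instead of eyeballed. Below is a minimal sketch: both function names are made up for this post, it assumes the dump has one line per frame in the form `frame,<pict_type>[,...]` as seen above, and `last_frame=505` is only the last position shown by `head`, not the real length of the video.

```python
def i_frame_numbers(csv_lines):
    """1-based line numbers whose second CSV column (pict_type) is 'I'."""
    numbers = []
    for number, line in enumerate(csv_lines, start=1):
        fields = line.strip().split(',')
        if len(fields) > 1 and fields[1] == 'I':
            numbers.append(number)
    return numbers


def missing_key_frames(actual, keyint, last_frame):
    """Expected grid positions (1, 1 + keyint, ...) that have no I frame."""
    actual = set(actual)
    return [n for n in range(1, last_frame + 1, keyint) if n not in actual]


sample = ['frame,I,H.26[45] User Data Unregistered SEI message',
          'frame,P', 'frame,B', 'frame,I']
print(i_frame_numbers(sample))  # [1, 4]

# positions copied from the two `grep -n I` outputs above
second_encode = [1, 73, 145, 217, 286, 289, 361, 379, 433, 505]
first_encode = [1, 7, 251, 286, 379]
print(missing_key_frames(second_encode, keyint=72, last_frame=505))  # []
print(missing_key_frames(first_encode, keyint=72, last_frame=505))
# [73, 145, 217, 289, 361, 433, 505]
```

For a real run, pass an open file such as `open('bbb_frames.csv')` to `i_frame_numbers` and feed the result to `missing_key_frames`. Extra I frames like 286 and 379 are ignored on purpose; only gaps in the grid are reported.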

[pic_types]: https://en.wikipedia.org/wiki/Video_compression_picture_types
[apple_hls_seg]: https://developer.apple.com/documentation/http-live-streaming/hls-authoring-specification-for-apple-devices#Media-Segmentation