Title: Package my video
Date: 2024-06-20T13:19:03-04:00
Draft: 1

TODO

  • Some intro
  • Investigate key frames and seeking

The goal of this project is to get a minimum viable product of:

  • support H264 video and AAC audio
    • experiment with AV1 and Opus if time permits
  • produce a .mp4 file that is optimized for both raw progressive playback and segment generation
  • generate an HLS playlist on demand
  • generate the Nth segment of an HLS stream on demand

Setup

ffmpeg

First I want to start with the latest available ffmpeg static build from: https://johnvansickle.com/ffmpeg/

 curl -sL -O https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz
 sudo tar -C /opt -xvf ffmpeg-release-amd64-static.tar.xz

My preference is to then link my preferred build to some location (/opt/ffmpeg-static) that I will then add to my PATH.

 sudo ln -sf /opt/ffmpeg-7.0.1-amd64-static/ /opt/ffmpeg-static
# then edit your shell rc or profile, reset shell
 type ffmpeg
ffmpeg is /opt/ffmpeg-static/ffmpeg

 ffmpeg -version
ffmpeg version 7.0.1-static https://johnvansickle.com/ffmpeg/  Copyright (c) 2000-2024 the FFmpeg developers
built with gcc 8 (Debian 8.3.0-6)
configuration: --enable-gpl --enable-version3 --enable-static --disable-debug --disable-ffplay --disable-indev=sndio --disable-outdev=sndio --cc=gcc --enable-fontconfig --enable-frei0r --enable-gnutls --enable-gmp --enable-libgme --enable-gray --enable-libaom --enable-libfribidi --enable-libass --enable-libvmaf --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librubberband --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libvorbis --enable-libopus --enable-libtheora --enable-libvidstab --enable-libvo-amrwbenc --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libdav1d --enable-libxvid --enable-libzvbi --enable-libzimg
libavutil      59.  8.100 / 59.  8.100
libavcodec     61.  3.100 / 61.  3.100
libavformat    61.  1.100 / 61.  1.100
libavdevice    61.  1.100 / 61.  1.100
libavfilter    10.  1.100 / 10.  1.100
libswscale      8.  1.100 /  8.  1.100
libswresample   5.  1.100 /  5.  1.100
libpostproc    58.  1.100 / 58.  1.100

And checking codec support

 ffmpeg -codecs 2>/dev/null | grep '\s\(aac\|h264\|av1\|opus\)'
 DEV.L. av1                  Alliance for Open Media AV1 (decoders: libdav1d libaom-av1 av1) (encoders: libaom-av1)
 DEV.LS h264                 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (decoders: h264 h264_v4l2m2m) (encoders: libx264 libx264rgb h264_v4l2m2m)
 DEAIL. aac                  AAC (Advanced Audio Coding) (decoders: aac aac_fixed)
 D.AIL. aac_latm             AAC LATM (Advanced Audio Coding LATM syntax)
 DEAIL. opus                 Opus (Opus Interactive Audio Codec) (decoders: opus libopus) (encoders: opus libopus)

Test video file

Here are a few open test video sources:

I grabbed a 720p version of each

 du -h test-videos/*
398M    test-videos/big_buck_bunny_720p_h264.mov
650M    test-videos/Sintel.2010.720p.mkv

Deciding on eventual segment size

The target segment size influences how we transcode the progressive file.

Each segment must begin with a key frame, so key frame placement in the progressive output should line up with the points where segments will be extracted.

Apple suggests 6 second durations for each HLS segment for VOD playback with HLS.

6s would be fine to use, but it's a choice with consequences.

If we later wanted 3s segments instead, the progressive file would need to be re-transcoded to insert more key frames.

So for flexibility's sake I will place a key frame every 3s in the progressive transcode, but the eventual segments will be 6s.
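To make the trade-off concrete, here is a small sketch that checks whether every segment start lands on a key frame. The function names and the 60 second window are my own choices for illustration, and it assumes an integer frame rate and 1-based frame numbers.

```python
def boundary_frames(fps, interval_s, total_s):
    """1-based frame numbers at every interval_s boundary within total_s seconds."""
    return {t * fps + 1 for t in range(0, total_s, interval_s)}

def segments_align(fps, keyint_s, segment_s, total_s=60):
    """True if every segment start coincides with a key frame."""
    key_frames = boundary_frames(fps, keyint_s, total_s)
    segment_starts = boundary_frames(fps, segment_s, total_s)
    return segment_starts <= key_frames

print(segments_align(24, keyint_s=3, segment_s=6))  # True
print(segments_align(24, keyint_s=3, segment_s=3))  # True
print(segments_align(24, keyint_s=6, segment_s=3))  # False
```

A 3s key frame interval serves both 3s and 6s segments, while a 6s interval only serves 6s segments.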

Packaging the progressive file

But first, let's produce v1 of the files with the target codecs applied (H264 and AAC).

 ffmpeg -i ./test-videos/big_buck_bunny_720p_h264.mov -acodec 'aac' -vcodec 'h264' ./test-videos/bbb_h264_aac.mp4

 ffmpeg -i ./test-videos/Sintel.2010.720p.mkv -acodec 'aac' -vcodec 'h264' ./test-videos/sintel_h264_aac.mp4

 du -h ./test-videos/*
138M    ./test-videos/bbb_h264_aac.mp4
398M    ./test-videos/big_buck_bunny_720p_h264.mov
650M    ./test-videos/Sintel.2010.720p.mkv
201M    ./test-videos/sintel_h264_aac.mp4

Now let's inspect the frames more closely with this script, dumpframes.sh:

#!/usr/bin/env bash

ffprobe -select_streams v -show_frames -show_entries frame=pict_type -of csv "$1"

This should show what picture type each frame is.

I frames are the least compressible, but they do not need any other frame to decode.

P frames can reference data from previous frames, so they compress better than I frames.

B frames can reference both previous and following frames, which gives the highest compression.

I frames are also called key frames.
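As a quick way to see the mix of frame types, here is a minimal sketch that tallies the pict_type column of the CSV dump. The function name is hypothetical, and the sample rows are made up for illustration rather than taken from a real dump.

```python
import csv
from collections import Counter

def count_pict_types(lines):
    """Count pict_type values from ffprobe CSV rows such as 'frame,I'."""
    counts = Counter()
    for row in csv.reader(lines):
        # Rows look like ['frame', 'I'] and may carry extra side-data columns.
        if len(row) >= 2 and row[0] == 'frame':
            counts[row[1]] += 1
    return counts

# Made-up sample rows in the same shape as the dump.
sample = [
    'frame,I,H.26[45] User Data Unregistered SEI message',
    'frame,B',
    'frame,B',
    'frame,P',
    'frame,I',
]
print(count_pict_types(sample))
```

To run it against a real dump, pass an open file instead of the list, for example `count_pict_types(open('bbb_frames.csv', newline=''))`.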

So given a dump of Big Buck Bunny (BBB):

./dumpframes.sh test-videos/bbb_h264_aac.mp4 > bbb_frames.csv

BBB is 24fps, so we want a key frame every 3 seconds, which is every 72 frames. Here are the frame numbers where the key frames should be:

 python3
>>> fps = 24
>>> i_frame_s = 3

>>> for i in range(0, 10): print(i * fps * i_frame_s + 1)
...
1
73
145
217
289
361
433
505
577
649
  • First segment should contain 3s of content which will be 72 frames.
  • First frame should be a key frame.
  • Then 71 non I frames.
  • Then the next I frame (frame 73) begins the next segment.

But our frames are not quite right.

 grep -n I ./bbb_frames.csv  | head
1:frame,I,H.26[45] User Data Unregistered SEI message
7:frame,I
251:frame,I
286:frame,I
379:frame,I
554:frame,I
804:frame,I
1054:frame,I
1146:frame,I
1347:frame,I

This is because we didn't tell ffmpeg anything about how to encode and where to place I frames.

There are a few options to libx264 that help control this:

  • --no-scenecut: "Disable adaptive I-frame decision"
  • --keyint: effectively the key frame interval. Technically it is the "Maximum GOP size"
  • --min-keyint: the "Minimum GOP size"

A GOP is "Group of Pictures" or the distance between two key frames.
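To put numbers on that, here is a small sketch that computes GOP lengths from the I frame positions in the first dump above. It treats every I frame as the start of a new GOP and the grep line numbers as frame numbers, as this post does. The function name is hypothetical.

```python
def gop_lengths(i_frames):
    """Distances, in frames, between consecutive I frame positions."""
    return [b - a for a, b in zip(i_frames, i_frames[1:])]

# I frame positions reported by grep for the first transcode.
first_dump = [1, 7, 251, 286, 379, 554, 804, 1054, 1146, 1347]

lengths = gop_lengths(first_dump)
print(lengths)       # [6, 244, 35, 93, 175, 250, 250, 92, 201]
print(max(lengths))  # 250
```

The lengths vary widely and top out at 250, which matches what I understand to be x264's default keyint. With keyint and min-keyint pinned to the same value, the goal is for every length to be 72.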

So let's re-encode with those options. Actually, let's write a wrapper script to do this.

I'll choose something besides bash because there will be a bit of math involved.

#!/usr/bin/env python3
import sys
import json
import subprocess
import logging

logging.basicConfig(level=logging.DEBUG)

def probe_info(infname):
	# Build the command as a list so file names containing spaces survive intact.
	cmd = [
		'ffprobe', '-v', 'quiet', '-print_format', 'json',
		'-show_format', '-show_streams', infname,
	]
	# Log before running so the command is visible even if ffprobe fails.
	logging.info('running cmd %s', ' '.join(cmd))
	res = subprocess.run(cmd, check=True, capture_output=True)
	ffprobe_dict = json.loads(res.stdout)

	# Take the first video stream.
	v_stream = None
	for stream in ffprobe_dict.get('streams', []):
		if stream.get('codec_type') == 'video':
			v_stream = stream
			break
	if v_stream is None:
		sys.exit(f'no video stream found in {infname}')

	# r_frame_rate is a fraction such as "24/1".
	num, denom = v_stream['r_frame_rate'].split('/')
	fps = float(num) / float(denom)
	logging.info('got fps %s', fps)
	return {
		'fps': fps,
	}

def run_ffmpeg_transcode(infname, outfname, probeinfo, segment_length=3):
	# keyint must be an integer number of frames
	keyint = int(probeinfo.get('fps') * segment_length)
	cmd = [
		'ffmpeg',
		'-i',
		infname,
		'-vcodec',
		'libx264',
		'-x264opts',
		f'keyint={keyint}:min-keyint={keyint}:no-scenecut',
		'-acodec',
		'aac',
		outfname
	]
	logging.info('running cmd %s', ' '.join(cmd))
	subprocess.run(cmd, check=True)

if __name__ == '__main__':
	args = sys.argv
	prog = args.pop(0)
	if len(args) != 2:
		sys.exit(f'usage: {prog} <input> <output>')

	infname, outfname = args
	probeinfo = probe_info(infname)
	run_ffmpeg_transcode(infname, outfname, probeinfo)
  • Use ffprobe to dump the streams as json.
  • Take first video stream.
  • Get the r_frame_rate, which is a fraction, and evaluate it to get the fps.
  • Calculate the keyframe interval using a static 3s segment length.
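One thing to watch in the interval calculation: some sources report a fractional rate such as 24000/1001. Here is a sketch of an alternative that parses r_frame_rate exactly and rounds to the nearest frame. The function name is hypothetical, and whether rounding is the right choice for fractional rates is an assumption on my part, since the key frames then drift slightly from the exact 3s marks.

```python
from fractions import Fraction

def keyint_for(r_frame_rate, interval_s=3):
    """Key frame interval in frames for an ffprobe r_frame_rate string."""
    fps = Fraction(r_frame_rate)  # accepts strings such as '24/1' or '24000/1001'
    return round(fps * interval_s)

print(keyint_for('24/1'))        # 72
print(keyint_for('24000/1001'))  # 72
print(keyint_for('30000/1001'))  # 90
```

Truncating with int() on a float would give 71 and 89 for the last two rates.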

And if we run it, let's take a look at the commands it executes:

 ./progressive.py ./test-videos/big_buck_bunny_720p_h264.mov ./test-videos/bbb_h264_aac.mp4
INFO:root:running cmd ffprobe -v quiet -print_format json -show_format -show_streams ./test-videos/big_buck_bunny_720p_h264.mov
INFO:root:got fps 24.0
INFO:root:running cmd ffmpeg -i ./test-videos/big_buck_bunny_720p_h264.mov -vcodec libx264 -x264opts keyint=72:min-keyint=72:no-scenecut -acodec aac ./test-videos/bbb_h264_aac.mp4

Now regenerate the frame dump and check if our I frames match the expected: 1, 73, 145, 217 ...

 ./dumpframes.sh test-videos/bbb_h264_aac.mp4 > bbb_frames.csv
 grep -n I ./bbb_frames.csv | head
1:frame,I,H.26[45] User Data Unregistered SEI message
73:frame,I
145:frame,I
217:frame,I
286:frame,I
289:frame,I
361:frame,I
379:frame,I
433:frame,I
505:frame,I

Excellent!
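To check this without eyeballing the list, here is a small sketch that compares the observed I frame positions against the expected ones. The function name is hypothetical, and it again treats the grep line numbers as frame numbers.

```python
def check_keyframes(observed, fps, interval_s):
    """Return (missing, extra) I frame positions relative to a fixed interval."""
    observed = set(observed)
    step = fps * interval_s
    expected = set(range(1, max(observed) + 1, step))
    return sorted(expected - observed), sorted(observed - expected)

# I frame positions reported by grep for the second transcode.
second_dump = [1, 73, 145, 217, 286, 289, 361, 379, 433, 505]

missing, extra = check_keyframes(second_dump, fps=24, interval_s=3)
print(missing)  # []
print(extra)    # [286, 379]
```

Nothing is missing, so every 3s boundary has an I frame. The two extra I frames at 286 and 379 sit between boundaries and are worth a look when investigating key frames and seeking.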