Title: Package my video Date: 2024-06-20T13:19:03-04:00 Draft: 1
TODO
- Some intro
- Investigate key frames and seeking
The goal of this project is to get a minimum viable product of:
- support H264 video and AAC audio
- experiment with AV1 and Opus if time permits
- produce a .mp4 file that is optimized for: raw progressive playback and segment generation
- generate an HLS playlist on demand
- generate the Nth segment of an HLS stream on demand
Setup
ffmpeg
First I want to start with the latest available ffmpeg static build from: https://johnvansickle.com/ffmpeg/
❯ curl -sL -O https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz
❯ sudo tar -C /opt -xvf ffmpeg-release-amd64-static.tar.xz
My preference is to then link my preferred build to some location (/opt/ffmpeg-static) that I will then add to my PATH.
❯ sudo ln -sf /opt/ffmpeg-7.0.1-amd64-static/ /opt/ffmpeg-static
# then edit your shell rc or profile, reset shell
❯ type ffmpeg
ffmpeg is /opt/ffmpeg-static/ffmpeg
❯ ffmpeg -version
ffmpeg version 7.0.1-static https://johnvansickle.com/ffmpeg/ Copyright (c) 2000-2024 the FFmpeg developers
built with gcc 8 (Debian 8.3.0-6)
configuration: --enable-gpl --enable-version3 --enable-static --disable-debug --disable-ffplay --disable-indev=sndio --disable-outdev=sndio --cc=gcc --enable-fontconfig --enable-frei0r --enable-gnutls --enable-gmp --enable-libgme --enable-gray --enable-libaom --enable-libfribidi --enable-libass --enable-libvmaf --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librubberband --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libvorbis --enable-libopus --enable-libtheora --enable-libvidstab --enable-libvo-amrwbenc --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libdav1d --enable-libxvid --enable-libzvbi --enable-libzimg
libavutil 59. 8.100 / 59. 8.100
libavcodec 61. 3.100 / 61. 3.100
libavformat 61. 1.100 / 61. 1.100
libavdevice 61. 1.100 / 61. 1.100
libavfilter 10. 1.100 / 10. 1.100
libswscale 8. 1.100 / 8. 1.100
libswresample 5. 1.100 / 5. 1.100
libpostproc 58. 1.100 / 58. 1.100
And checking codec support
❯ ffmpeg -codecs 2>/dev/null | grep '\s\(aac\|h264\|av1\|opus\)'
DEV.L. av1 Alliance for Open Media AV1 (decoders: libdav1d libaom-av1 av1) (encoders: libaom-av1)
DEV.LS h264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (decoders: h264 h264_v4l2m2m) (encoders: libx264 libx264rgb h264_v4l2m2m)
DEAIL. aac AAC (Advanced Audio Coding) (decoders: aac aac_fixed)
D.AIL. aac_latm AAC LATM (Advanced Audio Coding LATM syntax)
DEAIL. opus Opus (Opus Interactive Audio Codec) (decoders: opus libopus) (encoders: opus libopus)
Test video file
Here are a few open test video sources:
- Sintel: https://download.blender.org/durian/movies/
- License: Creative Commons Attribution 3.0
- Big Buck Bunny: https://download.blender.org/peach/bigbuckbunny_movies/
- License: Creative Commons Attribution 3.0
I grabbed a 720p version of each
❯ du -h test-videos/*
398M test-videos/big_buck_bunny_720p_h264.mov
650M test-videos/Sintel.2010.720p.mkv
Deciding on eventual segment size
Target segment size will hold some influence over our progressive transcoding.
Each segment will begin with at least 1 key frame, so our progressive output key frame placement should line up with where our segments will be extracted.
Apple suggests 6 second durations for each HLS segment for VOD playback with HLS.
6s would be fine to use, but it's a choice with consequences.
If there was a desire to use a 3s segment instead, the progressive file would need to re-transcode to insert more key frames.
So for flexibility's sake I will choose 3s for key frames in the progressive transcode, but eventual segments will be 6s.
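As a rough sketch of the arithmetic behind this choice (the function names here are invented for illustration and are not part of any tool), a segment can only start on a key frame if the segment length is a whole multiple of the key frame interval:

```python
def keyint_frames(fps, keyframe_seconds):
    """Number of frames between key frames for an interval given in seconds."""
    return round(fps * keyframe_seconds)


def segments_align(segment_seconds, keyframe_seconds):
    """True if every segment boundary lands on a key frame, i.e. the segment
    length is a whole multiple (at least 1x) of the key frame interval."""
    ratio = segment_seconds / keyframe_seconds
    return ratio >= 1 and abs(ratio - round(ratio)) < 1e-9


print(keyint_frames(24, 3))   # 72
print(segments_align(6, 3))   # True: 6s segments work with 3s key frames
print(segments_align(3, 6))   # False: 3s segments need more key frames
```

With key frames every 3s, both 3s and 6s segments remain possible without another transcode.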
Packaging the progressive file
But first, let's produce v1 of the files with the target codecs applied (H264 and AAC).
❯ ffmpeg -i ./test-videos/big_buck_bunny_720p_h264.mov -acodec 'aac' -vcodec 'h264' ./test-videos/bbb_h264_aac.mp4
❯ ffmpeg -i ./test-videos/Sintel.2010.720p.mkv -acodec 'aac' -vcodec 'h264' ./test-videos/sintel_h264_aac.mp4
❯ du -h ./test-videos/*
138M ./test-videos/bbb_h264_aac.mp4
398M ./test-videos/big_buck_bunny_720p_h264.mov
650M ./test-videos/Sintel.2010.720p.mkv
201M ./test-videos/sintel_h264_aac.mp4
Now let's inspect the frames a bit closer with this script, dumpframes.sh:
#!/usr/bin/env bash
ffprobe -select_streams v -show_frames -show_entries frame=pict_type -of csv "$1"
This should show what picture type each frame is.
I‑frames are the least compressible but don't require other video frames to decode.
P‑frames can use data from previous frames to decompress and are more compressible than I‑frames.
B‑frames can use both previous and forward frames for data reference to get the highest amount of data compression.
I‑frames are also called key frames.
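To get a feel for the mix of frame types in a dump, the CSV lines can be tallied. This is a minimal sketch that assumes each line looks like `frame,<pict_type>` as produced by dumpframes.sh; `count_picture_types` is a made-up helper name.

```python
from collections import Counter


def count_picture_types(lines):
    """Count pict_type values from `frame,<pict_type>[,...]` CSV lines."""
    counts = Counter()
    for line in lines:
        fields = line.strip().split(",")
        # Ignore anything that is not a frame record.
        if len(fields) >= 2 and fields[0] == "frame":
            counts[fields[1]] += 1
    return counts


# Tiny hand-written sample, not real ffprobe output.
sample = ["frame,I", "frame,P", "frame,B", "frame,B", "frame,P", "frame,B"]
counts = count_picture_types(sample)
print(counts["I"], counts["P"], counts["B"])  # 1 2 3
```

For a real dump, pass an open file instead of the sample list, e.g. `count_picture_types(open("bbb_frames.csv"))`.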
So given a dump of Big Buck Bunny (BBB):
./dumpframes.sh test-videos/bbb_h264_aac.mp4 > bbb_frames.csv
BBB is 24fps, so every 3 seconds we want to see a key frame. Here are the frame numbers where the key frame should be.
❯ python3
>>> fps = 24
>>> i_frame_s = 3
>>> for i in range(0, 10): print(i * fps * i_frame_s + 1)
...
1
73
145
217
289
361
433
505
577
649
- First segment should contain 3s of content which will be 72 frames.
- First frame should be a key frame.
- Then 71 non I frames.
- Then the next I frame (frame 73) begins the next segment.
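The expectation above can be checked mechanically against a frame dump. The sketch below assumes one CSV line per frame (so the line number is the frame number, the same assumption `grep -n` relies on); the helper names are hypothetical.

```python
import csv


def i_frame_numbers(lines):
    """Return 1-based line numbers of rows whose pict_type is I."""
    numbers = []
    for n, row in enumerate(csv.reader(lines), start=1):
        if len(row) >= 2 and row[0] == "frame" and row[1] == "I":
            numbers.append(n)
    return numbers


def expected_positions(fps, seconds, count):
    """Frame numbers (1-based) where a key frame should start a segment."""
    return [i * fps * seconds + 1 for i in range(count)]


def missing_key_frames(actual, expected):
    """Expected positions that do not have an I frame."""
    present = set(actual)
    return [n for n in expected if n not in present]


# Synthetic dump: 300 frames with I frames on lines 1, 7, 251 and 286.
dump = ["frame,P"] * 300
for line_no in (1, 7, 251, 286):
    dump[line_no - 1] = "frame,I"

actual = i_frame_numbers(dump)
print(actual)                                                  # [1, 7, 251, 286]
print(missing_key_frames(actual, expected_positions(24, 3, 4)))  # [73, 145, 217]
```

An empty list from `missing_key_frames` would mean every segment boundary has an I frame to start on.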
But our frames are not quite right.
❯ grep -n I ./bbb_frames.csv | head
1:frame,I,H.26[45] User Data Unregistered SEI message
7:frame,I
251:frame,I
286:frame,I
379:frame,I
554:frame,I
804:frame,I
1054:frame,I
1146:frame,I
1347:frame,I
This is because we didn't tell ffmpeg anything about how to encode and where to place I frames.
There are a few options to libx264 that help control this:
- --no-scenecut: "Disable adaptive I-frame decision"
- --keyint: effectively the key frame interval. Technically it is the "Maximum GOP size"
- --min-keyint: the "Minimum GOP size"
A GOP is "Group of Pictures" or the distance between two key frames.
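As a small illustration, the distances between the I frames found by the grep above can be computed directly. This sketch simply treats the gap between consecutive I frame positions as the GOP size; `gop_sizes` is a made-up name.

```python
def gop_sizes(i_frame_positions):
    """Distance in frames between each pair of consecutive I frames."""
    return [b - a for a, b in zip(i_frame_positions, i_frame_positions[1:])]


# I frame positions from the first (uncontrolled) encode of BBB.
positions = [1, 7, 251, 286, 379, 554, 804, 1054, 1146, 1347]

sizes = gop_sizes(positions)
print(sizes)       # [6, 244, 35, 93, 175, 250, 250, 92, 201]
print(max(sizes))  # 250
```

The largest gap is 250 frames, which lines up with x264's default keyint of 250; the shorter gaps are where the encoder decided on its own to place an I frame.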
So let's re-encode with those options. Actually, let's write a wrapper script to do this.
I'll choose something besides bash because there will be a bit of math involved.
#!/usr/bin/env python
import sys
import json
import subprocess
import logging

logging.basicConfig(level=logging.DEBUG)


def probe_info(infname):
    # Build the command as a list so file names containing spaces survive.
    cmd = ['ffprobe', '-v', 'quiet', '-print_format', 'json', '-show_format', '-show_streams', infname]
    logging.info('running cmd %s', ' '.join(cmd))
    res = subprocess.run(cmd, check=True, capture_output=True)
    ffprobe_dict = json.loads(res.stdout)
    # Take the first video stream.
    v_stream = None
    for stream in ffprobe_dict.get('streams', []):
        if stream.get('codec_type') == 'video':
            v_stream = stream
            break
    if v_stream is None:
        sys.exit(f'no video stream found in {infname}')
    # r_frame_rate is a fraction such as "24/1".
    r_frame_rate = v_stream.get('r_frame_rate')
    num, denom = r_frame_rate.split('/')
    fps = float(num) / float(denom)
    logging.info('got fps %s', fps)
    return {
        'fps': fps,
    }


def run_ffmpeg_transcode(infname, outfname, probeinfo, segment_length=3):
    # must be an integer (note that int() truncates)
    keyint = int(probeinfo.get('fps') * segment_length)
    cmd = [
        'ffmpeg',
        '-i',
        infname,
        '-vcodec',
        'libx264',
        '-x264opts',
        f'keyint={keyint}:min-keyint={keyint}:no-scenecut',
        '-acodec',
        'aac',
        outfname
    ]
    logging.info('running cmd %s', ' '.join(cmd))
    subprocess.run(cmd, check=True)


if __name__ == '__main__':
    args = sys.argv
    prog = args.pop(0)
    if len(args) != 2:
        sys.exit(f'usage: {prog} INPUT OUTPUT')
    infname, outfname = args
    probeinfo = probe_info(infname)
    run_ffmpeg_transcode(infname, outfname, probeinfo)
- Use ffprobe to dump the streams as JSON.
- Take the first video stream.
- Get the r_frame_rate, which is a fraction. Evaluate the fraction as fps.
- Calculate the key frame interval using a static 3s segment length.
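One detail worth a second look is the fraction handling. The sketch below is an alternative to the float division in the script, not what the script does: it uses Python's `fractions.Fraction` and rounds instead of truncating. The helper names are invented, and whether rounding is the right call for fractional rates such as 24000/1001 is an assumption to verify.

```python
from fractions import Fraction


def parse_frame_rate(r_frame_rate):
    """Parse an ffprobe-style rate such as "24/1" or "24000/1001" exactly."""
    num, _, denom = r_frame_rate.partition("/")
    return Fraction(int(num), int(denom or 1))


def keyint_for(r_frame_rate, segment_length=3):
    """Key frame interval in frames, rounded to the nearest integer."""
    return round(parse_frame_rate(r_frame_rate) * segment_length)


print(keyint_for("24/1"))                # 72
print(keyint_for("24000/1001"))          # 72
# Truncating with int(), as the script does, gives one frame less here:
print(int(float(24000) / float(1001) * 3))  # 71
```

For BBB at exactly 24 fps both approaches agree on 72, so the difference only matters for NTSC-style frame rates.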
And if we run it, let's take a look at the commands it executes:
❯ ./progressive.py ./test-videos/big_buck_bunny_720p_h264.mov ./test-videos/bbb_h264_aac.mp4
INFO:root:running cmd ffprobe -v quiet -print_format json -show_format -show_streams ./test-videos/big_buck_bunny_720p_h264.mov
INFO:root:got fps 24.0
INFO:root:running cmd ffmpeg -i ./test-videos/big_buck_bunny_720p_h264.mov -vcodec libx264 -x264opts keyint=72:min-keyint=72:no-scenecut -acodec aac ./test-videos/bbb_h264_aac.mp4
Now regenerate the frame dump and check if our I frames match the expected: 1, 73, 145, 217 ...
❯ ./dumpframes.sh test-videos/bbb_h264_aac.mp4 > bbb_frames.csv
❯ grep -n I ./bbb_frames.csv | head
1:frame,I,H.26[45] User Data Unregistered SEI message
73:frame,I
145:frame,I
217:frame,I
286:frame,I
289:frame,I
361:frame,I
379:frame,I
433:frame,I
505:frame,I
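Excellent! Every expected position (1, 73, 145, 217, ...) now has an I frame, although the dump also lists extra I frames at 286 and 379 that are not on the 3s grid.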
Let's check where the mp4 "atoms" are located in the resulting file.
❯ ffprobe -v trace ./test-videos/bbb_h264_aac.mp4 2>&1 | grep 'type:.\(ftyp\|free\|mdat\|moov\)'
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x90d7a80] type:'ftyp' parent:'root' sz: 32 8 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x90d7a80] type:'free' parent:'root' sz: 8 40 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x90d7a80] type:'mdat' parent:'root' sz: 157264899 48 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x90d7a80] type:'moov' parent:'root' sz: 412246 157264947 157677185
So the moov atom is at the end of the file by default.
Save this version of the transcode if you want to test how this works in the browser.
To optimize for faster startup, there is a faststart option available which moves the moov atom to the head of the file.
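The atom order can also be read without ffprobe. The following is a minimal sketch of a top-level MP4 atom walker, assuming the usual layout of a 4-byte big-endian size followed by a 4-byte type (with the 64-bit and "to end of file" size variants); all function names are hypothetical, and the sample data is synthetic rather than a real video.

```python
import io
import struct


def top_level_atoms(stream):
    """Yield (type, offset, size) for each top-level atom in a seekable stream."""
    stream.seek(0)
    offset = 0
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break
        size, kind = struct.unpack(">I4s", header)
        header_len = 8
        if size == 1:
            # A 64-bit size follows the type field.
            size = struct.unpack(">Q", stream.read(8))[0]
            header_len = 16
        elif size == 0:
            # The atom runs to the end of the file.
            size = stream.seek(0, io.SEEK_END) - offset
        if size < header_len:
            break  # malformed atom, stop rather than loop forever
        yield kind.decode("latin-1"), offset, size
        offset += size
        stream.seek(offset)


def moov_before_mdat(stream):
    """True if the moov atom precedes mdat, i.e. a faststart layout."""
    order = [kind for kind, _, _ in top_level_atoms(stream)]
    return "moov" in order and "mdat" in order and order.index("moov") < order.index("mdat")


def make_atom(kind, payload=b""):
    """Build a synthetic atom for the demo below."""
    return struct.pack(">I4s", 8 + len(payload), kind) + payload


ftyp, free = make_atom(b"ftyp", b"\0" * 24), make_atom(b"free")
mdat, moov = make_atom(b"mdat", b"\0" * 100), make_atom(b"moov", b"\0" * 50)

default_layout = io.BytesIO(ftyp + free + mdat + moov)
faststart_layout = io.BytesIO(ftyp + moov + free + mdat)

print(list(top_level_atoms(default_layout)))
# [('ftyp', 0, 32), ('free', 32, 8), ('mdat', 40, 108), ('moov', 148, 58)]
print(moov_before_mdat(default_layout))    # False
print(moov_before_mdat(faststart_layout))  # True
```

On a real file, `moov_before_mdat(open(path, "rb"))` should give the same answer as reading the order of the ffprobe trace lines.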
So, adjusting the progressive script:
diff --git a/progressive.py b/progressive.py
index 0ba58b7..a3dc63a 100755
--- a/progressive.py
+++ b/progressive.py
@@ -36,6 +36,8 @@ def run_ffmpeg_transcode(infname, outfname, probeinfo, segment_length=3):
         'libx264',
         '-x264opts',
         f'keyint={keyint}:min-keyint={keyint}:no-scenecut',
+        '-movflags',
+        'faststart',
         '-acodec',
         'aac',
And after the re-transcode:
❯ ffprobe -v trace ./test-videos/bbb_h264_aac.mp4 2>&1 | grep 'type:.\(ftyp\|free\|mdat\|moov\)'
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x34423a80] type:'ftyp' parent:'root' sz: 32 8 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x34423a80] type:'moov' parent:'root' sz: 412246 40 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x34423a80] type:'free' parent:'root' sz: 8 412286 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x34423a80] type:'mdat' parent:'root' sz: 157264899 412294 157677185
It worked!
Let's prove out why this is great for browser playback.
faststart testing
caddy has a nice, quick built-in file server with verbose access logs.
Drop this index.html into the same directory as your test videos.
❯ caddy file-server --access-log --browse --listen :2015 --root ./test-videos
I kept my version of the mp4 prior to adding the faststart option, so I have two files:
❯ ffprobe -v trace ./test-videos/bbb_h264_aac.mp4 2>&1 | grep 'type:.\(ftyp\|free\|mdat\|moov\)'
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x15b64a80] type:'ftyp' parent:'root' sz: 32 8 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x15b64a80] type:'moov' parent:'root' sz: 412246 40 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x15b64a80] type:'free' parent:'root' sz: 8 412286 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x15b64a80] type:'mdat' parent:'root' sz: 157264899 412294 157677185
❯ ffprobe -v trace ./test-videos/bbb_h264_aac_endmov.mp4 2>&1 | grep 'type:.\(ftyp\|free\|mdat\|moov\)'
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x89b2a80] type:'ftyp' parent:'root' sz: 32 8 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x89b2a80] type:'free' parent:'root' sz: 8 40 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x89b2a80] type:'mdat' parent:'root' sz: 157264899 48 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x89b2a80] type:'moov' parent:'root' sz: 412246 157264947 157677185
Now plugging in http://localhost:2015/bbb_h264_aac_endmov.mp4 to the form:
In Firefox there are 3 requests made:
# 1 req
GET /bbb_h264_aac_endmov.mp4 HTTP/1.1
Host: localhost:2015
Accept: video/webm,video/ogg,video/*;q=0.9,application/ogg;q=0.7,audio/*;q=0.6,*/*;q=0.5
Range: bytes=0-
# 1 resp
HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Length: 157677185
Content-Range: bytes 0-157677184/157677185
Content-Type: video/mp4
Etag: "sfecu12lvkht"
Note that the amount transferred in the first request is actually only 1.57 MB, as reported in devtools.
# 2 req
GET /bbb_h264_aac_endmov.mp4 HTTP/1.1
Host: localhost:2015
Accept: video/webm,video/ogg,video/*;q=0.9,application/ogg;q=0.7,audio/*;q=0.6,*/*;q=0.5
Range: bytes=157253632-
# 2 resp
HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Length: 423553
Content-Range: bytes 157253632-157677184/157677185
Content-Type: video/mp4
157677184 is the total size minus 1, i.e. the offset of the last byte, so it is reading the last 423553 bytes (about 424 kB) of the file.
# 3 req
GET /bbb_h264_aac_endmov.mp4 HTTP/1.1
Host: localhost:2015
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:127.0) Gecko/20100101 Firefox/127.0
Accept: video/webm,video/ogg,video/*;q=0.9,application/ogg;q=0.7,audio/*;q=0.6,*/*;q=0.5
Accept-Language: en-US,en;q=0.5
Range: bytes=131072-
# 3 resp
HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Length: 157546113
Content-Range: bytes 131072-157677184/157677185
Content-Type: video/mp4
Lastly, it reads from byte 131072 to the end of the file.
A rough guess about how this works.
Take a look at the byte sizes from the ffprobe -v trace output above, annotated with request numbers, as they match up with the range requests:
# the format of the numbers is: {size} {start_byte + 8} {total_size}
# (ftyp starts at byte 0 but shows 8, so the middle number points just past the atom's 8-byte header)
# 1 req   type:'ftyp' parent:'root' sz: 32 8 157677185
# 1 req   type:'free' parent:'root' sz: 8 40 157677185
# 1+3 req type:'mdat' parent:'root' sz: 157264899 48 157677185
# 2 req   type:'moov' parent:'root' sz: 412246 157264947 157677185
- # 1 req fetches the first 1.57 MB in a 206 partial content read from the head of the file.
  - Looking for a moov atom for file information so it can start playing.
  - This example video's moov is 412 kB, so it's reading almost 4x that and into the mdat section where the video data lives.
- # 2 req fetches the last 423.83 kB from the end of the file.
  - It hits the moov.
- # 3 req fetches the rest of the file starting 131.072 kB (byte 131072) from the beginning of the file.
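To make the matching between range requests and atoms concrete, here is a small sketch that uses the atom sizes from the trace and assumes the atoms sit back to back in the order listed. The end of the first range is only an approximation of the 1.57 MB devtools reported, and the names are made up for illustration.

```python
# (name, size) pairs in file order, taken from the non-faststart trace.
ATOMS = [("ftyp", 32), ("free", 8), ("mdat", 157264899), ("moov", 412246)]
TOTAL = 157677185


def layout(atoms):
    """Turn sizes into (name, start, end) with end exclusive."""
    out, offset = [], 0
    for name, size in atoms:
        out.append((name, offset, offset + size))
        offset += size
    return out


def atoms_in_range(atoms, first, last):
    """Names of atoms overlapping the inclusive byte range first..last."""
    return [name for name, start, end in layout(atoms) if start <= last and first < end]


assert layout(ATOMS)[-1][2] == TOTAL  # the sizes add up to the file size

print(atoms_in_range(ATOMS, 0, 1_570_000 - 1))      # ['ftyp', 'free', 'mdat']
print(atoms_in_range(ATOMS, 157253632, TOTAL - 1))  # ['mdat', 'moov']
print(atoms_in_range(ATOMS, 131072, TOTAL - 1))     # ['mdat', 'moov']
```

The second request starts a little before the moov atom (which begins at byte 157264939), so it picks up the tail of mdat along with the whole moov.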
Pretty cool, you can see it hunting for the moov then starting playback.
In contrast, here's the faststart option: http://localhost:2015/bbb_h264_aac.mp4
# 1 req
GET /bbb_h264_aac.mp4 HTTP/1.1
Host: localhost:2015
Accept: video/webm,video/ogg,video/*;q=0.9,application/ogg;q=0.7,audio/*;q=0.6,*/*;q=0.5
Accept-Language: en-US,en;q=0.5
Range: bytes=0-
# 1 resp
HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Length: 157677185
Content-Range: bytes 0-157677184/157677185
Content-Type: video/mp4
Same exact start to the flow: just read the whole file with Range: bytes=0-.
But this time Firefox transfers ~7-9 MB (it changes per test), and there's only 1 request.
Best guess here is that Firefox is still trying to read 1.5 MB, but it encounters the moov immediately and just keeps reading.
That's the first time I've seen this in action.