Title: Package my video
Date: 2024-06-20T13:19:03-04:00
Draft: 1
---

## TODO

* Some intro
* Investigate key frames and seeking

--------------------------------------------------------------------------------

The goal of this project is to get a minimum viable product that can:

* support H264 video and AAC audio
* experiment with AV1 and Opus if time permits
* produce a .mp4 file that is optimized for raw progressive playback and segment generation
* generate an HLS playlist on demand
* generate the Nth segment from an HLS file on demand

## Setup

### ffmpeg

First I want to start with the latest available ffmpeg static build from: <https://johnvansickle.com/ffmpeg/>

```shell
❯ curl -sL -O https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz
❯ sudo tar -C /opt -xvf ffmpeg-release-amd64-static.tar.xz
```

My preference is to then link my preferred build to some location (`/opt/ffmpeg-static`) that I will then add to my `PATH`.

```
❯ sudo ln -sf /opt/ffmpeg-7.0.1-amd64-static/ /opt/ffmpeg-static
# then edit your shell rc or profile, reset shell
❯ type ffmpeg
ffmpeg is /opt/ffmpeg-static/ffmpeg

❯ ffmpeg -version
ffmpeg version 7.0.1-static https://johnvansickle.com/ffmpeg/ Copyright (c) 2000-2024 the FFmpeg developers
built with gcc 8 (Debian 8.3.0-6)
configuration: --enable-gpl --enable-version3 --enable-static --disable-debug --disable-ffplay --disable-indev=sndio --disable-outdev=sndio --cc=gcc --enable-fontconfig --enable-frei0r --enable-gnutls --enable-gmp --enable-libgme --enable-gray --enable-libaom --enable-libfribidi --enable-libass --enable-libvmaf --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librubberband --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libvorbis --enable-libopus --enable-libtheora --enable-libvidstab --enable-libvo-amrwbenc --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libdav1d --enable-libxvid --enable-libzvbi --enable-libzimg
libavutil 59. 8.100 / 59. 8.100
libavcodec 61. 3.100 / 61. 3.100
libavformat 61. 1.100 / 61. 1.100
libavdevice 61. 1.100 / 61. 1.100
libavfilter 10. 1.100 / 10. 1.100
libswscale 8. 1.100 / 8. 1.100
libswresample 5. 1.100 / 5. 1.100
libpostproc 58. 1.100 / 58. 1.100
```

And checking codec support:

```
❯ ffmpeg -codecs 2>/dev/null | grep '\s\(aac\|h264\|av1\|opus\)'
DEV.L. av1 Alliance for Open Media AV1 (decoders: libdav1d libaom-av1 av1) (encoders: libaom-av1)
DEV.LS h264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (decoders: h264 h264_v4l2m2m) (encoders: libx264 libx264rgb h264_v4l2m2m)
DEAIL. aac AAC (Advanced Audio Coding) (decoders: aac aac_fixed)
D.AIL. aac_latm AAC LATM (Advanced Audio Coding LATM syntax)
DEAIL. opus Opus (Opus Interactive Audio Codec) (decoders: opus libopus) (encoders: opus libopus)
```

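
The six-character column at the start of each line says what this build can do with the codec. As a rough sketch, the lines above could be decoded like this. The helper name `parse_codec_line` is made up for illustration, and the flag meanings are taken from the legend that `ffmpeg -codecs` prints at the top of its output.

```python
def parse_codec_line(line):
    """Decode one line of `ffmpeg -codecs` output into a small dict."""
    flags, name, description = line.strip().split(None, 2)
    kinds = {'V': 'video', 'A': 'audio', 'S': 'subtitle', 'D': 'data', 'T': 'attachment'}
    return {
        'name': name,
        'decode': flags[0] == 'D',      # decoding supported
        'encode': flags[1] == 'E',      # encoding supported
        'type': kinds.get(flags[2], 'unknown'),
        'intra_only': flags[3] == 'I',  # intra frame-only codec
        'lossy': flags[4] == 'L',
        'lossless': flags[5] == 'S',
        'description': description,
    }

# Two lines copied from the grep output above.
sample = [
    "DEV.LS h264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (decoders: h264 h264_v4l2m2m) (encoders: libx264 libx264rgb h264_v4l2m2m)",
    "D.AIL. aac_latm AAC LATM (Advanced Audio Coding LATM syntax)",
]
for line in sample:
    info = parse_codec_line(line)
    print(info['name'], info['type'],
          'decode' if info['decode'] else '-',
          'encode' if info['encode'] else '-')
```

For this project the important part is that `h264`, `aac`, `av1` and `opus` all carry both `D` and `E`, while `aac_latm` is decode-only.
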
### Test video file

Here are a few open test video sources:

* Sintel: <https://download.blender.org/durian/movies/>
  * License: [Creative Commons Attribution 3.0](https://web.archive.org/web/20240105060647/https://durian.blender.org/sharing/)
* Big Buck Bunny: <https://download.blender.org/peach/bigbuckbunny_movies/>
  * License: [Creative Commons Attribution 3.0](https://web.archive.org/web/20240521095028/https://peach.blender.org/about/)

I grabbed a 720p version of each:

```shell
❯ du -h test-videos/*
398M test-videos/big_buck_bunny_720p_h264.mov
650M test-videos/Sintel.2010.720p.mkv
```

### Deciding on eventual segment size

The target segment size influences how we transcode the progressive file.

Each segment must begin with a key frame, so key frame placement in the progressive output has to line up with the points where segments will be cut.

Apple [suggests 6 second durations for each HLS segment][apple_hls_seg] for VOD playback with HLS.

6s would be fine to use, but it's a choice with consequences.

If I later wanted 3s segments instead, the progressive file would have to be re-transcoded to insert more key frames.

So for flexibility's sake I will place key frames every 3s in the progressive transcode, while the eventual segments will be 6s.

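
To make the trade-off concrete, here is a small sketch that checks which segment lengths can be cut from a file with key frames every 3s. The function names are hypothetical, and it assumes a constant, integer frame rate (24fps here) with 1-based frame numbers.

```python
def keyframe_numbers(fps, interval_s, count):
    """1-based frame numbers of the first `count` frames on an interval_s grid."""
    return [i * fps * interval_s + 1 for i in range(count)]

def can_cut_segments(keyint_s, segment_s):
    """A segment length only works if it is a whole multiple of the key frame interval."""
    return segment_s % keyint_s == 0

fps = 24  # assumed constant integer frame rate
keyframes = set(keyframe_numbers(fps, interval_s=3, count=20))

for segment_s in (3, 6, 4):
    starts = keyframe_numbers(fps, segment_s, count=5)
    aligned = all(frame in keyframes for frame in starts)
    print(f'{segment_s}s segments: multiple of 3s={can_cut_segments(3, segment_s)} '
          f'all starts on key frames={aligned}')
```

With a 3s key frame interval, both 3s and 6s segments start on key frames, while something like 4s would not.
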
### Packaging the progressive file

But first, let's produce v1 of the files with the target codecs applied (H264 and AAC).

```
❯ ffmpeg -i ./test-videos/big_buck_bunny_720p_h264.mov -acodec 'aac' -vcodec 'h264' ./test-videos/bbb_h264_aac.mp4

❯ ffmpeg -i ./test-videos/Sintel.2010.720p.mkv -acodec 'aac' -vcodec 'h264' ./test-videos/sintel_h264_aac.mp4

❯ du -h ./test-videos/*
138M ./test-videos/bbb_h264_aac.mp4
398M ./test-videos/big_buck_bunny_720p_h264.mov
650M ./test-videos/Sintel.2010.720p.mkv
201M ./test-videos/sintel_h264_aac.mp4
```

Now let's inspect the frames a bit closer with this script, `dumpframes.sh`:

```bash
#!/usr/bin/env bash

# Print the picture type (I, P or B) of every video frame, one CSV row per frame.
ffprobe -select_streams v -show_frames -show_entries frame=pict_type -of csv "$1"
```

This should show what [picture type][pic_types] each frame is.

> I‑frames are the least compressible but don't require other video frames to decode.

> P‑frames can use data from previous frames to decompress and are more compressible than I‑frames.

> B‑frames can use both previous and forward frames for data reference to get the highest amount of data compression.

**I** frames are also called **key frames**.

So given a dump of Big Buck Bunny (BBB):

```
./dumpframes.sh test-videos/bbb_h264_aac.mp4 > bbb_frames.csv
```

BBB is 24fps, so a key frame every 3 seconds means a key frame every 72 frames. Here are the frame numbers where the key frames should be:

```
❯ python3
>>> fps = 24
>>> i_frame_s = 3

>>> for i in range(0, 10): print(i * fps * i_frame_s + 1)
...
1
73
145
217
289
361
433
505
577
649
```

* First segment should contain 3s of content which will be 72 frames.
* First frame should be a key frame.
* Then 71 non-I frames.
* Then the next I frame (frame 73) begins the next segment.

But our frames are not quite right.

```shell
❯ grep -n I ./bbb_frames.csv | head
1:frame,I,H.26[45] User Data Unregistered SEI message
7:frame,I
251:frame,I
286:frame,I
379:frame,I
554:frame,I
804:frame,I
1054:frame,I
1146:frame,I
1347:frame,I
```

This is because we didn't tell ffmpeg anything about how to encode and where to place I frames.

There are a few `libx264` options that help control this:

* `--no-scenecut`: "Disable adaptive I-frame decision"
* `--keyint`: effectively the key frame interval. Technically it is the "Maximum GOP size"
* `--min-keyint`: the "Minimum GOP size"

> A GOP is "Group of Pictures" or the distance between two key frames.

So let's re-encode with those options. Actually, let's write a wrapper script to do this.

I'll choose something besides bash because there will be a bit of math involved.

```python
#!/usr/bin/env python3
import sys
import json
import subprocess
import logging

logging.basicConfig(level=logging.DEBUG)

def probe_info(infname):
    # Pass arguments as a list so file names containing spaces survive intact.
    cmd = ['ffprobe', '-v', 'quiet', '-print_format', 'json',
           '-show_format', '-show_streams', infname]
    logging.info('running cmd %s', ' '.join(cmd))
    res = subprocess.run(cmd, check=True, capture_output=True)
    ffprobe_dict = json.loads(res.stdout)
    # Take the first video stream.
    v_stream = None
    for stream in ffprobe_dict.get('streams', []):
        if stream.get('codec_type') == 'video':
            v_stream = stream
            break
    if v_stream is None:
        sys.exit(f'no video stream found in {infname}')

    # r_frame_rate is a fraction such as "24/1".
    r_frame_rate = v_stream.get('r_frame_rate')
    num, denom = r_frame_rate.split('/')
    fps = float(num) / float(denom)
    logging.info('got fps %s', fps)
    return {
        'fps': fps,
    }

def run_ffmpeg_transcode(infname, outfname, probeinfo, segment_length=3):
    # must be an integer
    keyint = int(probeinfo.get('fps') * segment_length)
    cmd = [
        'ffmpeg',
        '-i',
        infname,
        '-vcodec',
        'libx264',
        '-x264opts',
        f'keyint={keyint}:min-keyint={keyint}:no-scenecut',
        '-acodec',
        'aac',
        outfname
    ]
    logging.info('running cmd %s', ' '.join(cmd))
    subprocess.run(cmd, check=True)

if __name__ == '__main__':
    args = sys.argv
    prog = args.pop(0)
    if len(args) != 2:
        sys.exit(f'usage: {prog} INPUT OUTPUT')

    infname, outfname = args
    probeinfo = probe_info(infname)
    run_ffmpeg_transcode(infname, outfname, probeinfo)
```

* Use `ffprobe` to dump the streams as JSON.
* Take the first video stream.
* Get the `r_frame_rate`, which is a fraction, and evaluate it as `fps`.
* Calculate the key frame interval using a static 3s segment length.

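
One thing to watch in the last two steps: `int()` truncates, which is harmless at exactly 24fps but not for fractional rates such as `24000/1001`. A possible variation, not part of the script above, keeps the rate exact with `fractions.Fraction` and rounds instead. The helper names are hypothetical.

```python
from fractions import Fraction

def parse_frame_rate(r_frame_rate):
    """Parse an ffprobe rate string such as "24/1" or "24000/1001" exactly."""
    return Fraction(r_frame_rate)

def keyint_for(r_frame_rate, segment_length=3):
    """Key frame interval in frames, rounded to the nearest whole frame."""
    return round(parse_frame_rate(r_frame_rate) * segment_length)

print(keyint_for('24/1'))
print(keyint_for('24000/1001'))
# The float + int() approach truncates the fractional case instead.
print(int(24000 / 1001 * 3))
```

For `24000/1001` the exact value is about 71.93 frames, so rounding gives 72 while truncating gives 71. Neither lands exactly on 3s boundaries at that rate, so this only reduces the drift. `Fraction` raises `ZeroDivisionError` on a `0/0` rate, which would need handling if ffprobe ever reports one.
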
And if we run it, let's take a look at the cmds it executes:

```
❯ ./progressive.py ./test-videos/big_buck_bunny_720p_h264.mov ./test-videos/bbb_h264_aac.mp4
INFO:root:running cmd ffprobe -v quiet -print_format json -show_format -show_streams ./test-videos/big_buck_bunny_720p_h264.mov
INFO:root:got fps 24.0
INFO:root:running cmd ffmpeg -i ./test-videos/big_buck_bunny_720p_h264.mov -vcodec libx264 -x264opts keyint=72:min-keyint=72:no-scenecut -acodec aac ./test-videos/bbb_h264_aac.mp4
```

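
As an aside, ffmpeg also has generic encoder options that, as far as I know, the libx264 wrapper maps onto the same x264 settings: `-g` (GOP size), `-keyint_min` and `-sc_threshold 0`. The sketch below only builds the argument list; `keyframe_args` and `build_cmd` are hypothetical helpers, and I have not re-run the transcode with this variant.

```python
def keyframe_args(keyint):
    """Generic ffmpeg options intended to mirror keyint/min-keyint/no-scenecut."""
    return [
        '-g', str(keyint),           # maximum GOP size
        '-keyint_min', str(keyint),  # minimum GOP size
        '-sc_threshold', '0',        # disable scene-cut key frames
    ]

def build_cmd(infname, outfname, keyint):
    """Same shape as the command in the script, without -x264opts."""
    return ['ffmpeg', '-i', infname, '-vcodec', 'libx264',
            *keyframe_args(keyint), '-acodec', 'aac', outfname]

print(' '.join(build_cmd('in.mov', 'out.mp4', 72)))
```
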
Now regenerate the frame dump and check if our I frames match the expected: 1, 73, 145, 217 ...

```shell
❯ ./dumpframes.sh test-videos/bbb_h264_aac.mp4 > bbb_frames.csv
❯ grep -n I ./bbb_frames.csv | head
1:frame,I,H.26[45] User Data Unregistered SEI message
73:frame,I
145:frame,I
217:frame,I
286:frame,I
289:frame,I
361:frame,I
379:frame,I
433:frame,I
505:frame,I
```

Excellent! Every expected position (1, 73, 145, 217, 289, ...) now holds an I frame. Two extra I frames also show up at 286 and 379, the same positions as in the first dump. They sit off the 72-frame grid, so they should not affect where segments can be cut.

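
Eyeballing `grep | head` only covers the first few key frames. A small sketch like the following could check the whole dump against the 72-frame grid. The function names are made up, and it assumes one CSV row per frame, which is the same assumption `grep -n` relies on above.

```python
def i_frame_numbers(csv_lines):
    """1-based positions of I frames in the output of dumpframes.sh."""
    return [n for n, line in enumerate(csv_lines, start=1)
            if line.strip().split(',')[1:2] == ['I']]

def compare_to_grid(i_frames, keyint, last_frame):
    """Return (missing, extra) I frames relative to a 1, 1+keyint, ... grid."""
    expected = set(range(1, last_frame + 1, keyint))
    missing = sorted(expected - set(i_frames))
    extra = sorted(f for f in i_frames if f <= last_frame and f not in expected)
    return missing, extra

# I frame positions copied from the grep output above.
seen = [1, 73, 145, 217, 286, 289, 361, 379, 433, 505]
missing, extra = compare_to_grid(seen, keyint=72, last_frame=505)
print('missing:', missing)
print('extra:', extra)
```

To run it on a real dump, something like `i_frame_numbers(open('bbb_frames.csv'))` would produce the `seen` list.
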
Let's check where the mp4 "atoms" are located in the resulting file.

```
❯ ffprobe -v trace ./test-videos/bbb_h264_aac.mp4 2>&1 | grep 'type:.\(ftyp\|free\|mdat\|moov\)'
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x90d7a80] type:'ftyp' parent:'root' sz: 32 8 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x90d7a80] type:'free' parent:'root' sz: 8 40 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x90d7a80] type:'mdat' parent:'root' sz: 157264899 48 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x90d7a80] type:'moov' parent:'root' sz: 412246 157264947 157677185
```

So the `moov` atom is at the end of the file by default.

Save this version of the transcode if you want to test how this works in the browser.

To optimize for faster startup, there is a `faststart` option available which moves the `moov` atom to the head of the file.

So adjusting the progressive script:

```diff
diff --git a/progressive.py b/progressive.py
index 0ba58b7..a3dc63a 100755
--- a/progressive.py
+++ b/progressive.py
@@ -36,6 +36,8 @@ def run_ffmpeg_transcode(infname, outfname, probeinfo, segment_length=3):
         'libx264',
         '-x264opts',
         f'keyint={keyint}:min-keyint={keyint}:no-scenecut',
+        '-movflags',
+        'faststart',
         '-acodec',
         'aac',
         outfname
```

And after the re-transcode:

```
❯ ffprobe -v trace ./test-videos/bbb_h264_aac.mp4 2>&1 | grep 'type:.\(ftyp\|free\|mdat\|moov\)'
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x34423a80] type:'ftyp' parent:'root' sz: 32 8 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x34423a80] type:'moov' parent:'root' sz: 412246 40 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x34423a80] type:'free' parent:'root' sz: 8 412286 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x34423a80] type:'mdat' parent:'root' sz: 157264899 412294 157677185
```

It worked!

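
The same check can be done without `ffprobe` by walking the top-level boxes directly, since each one starts with a 4-byte big-endian size and a 4-byte type. This is a minimal sketch with hypothetical helper names. It only looks at the top level and is demonstrated on a tiny fake file built in memory, not on the real transcode.

```python
import io
import struct

def top_level_boxes(fh):
    """Yield (type, start_offset, size) for each top-level box in an MP4 stream."""
    fh.seek(0, io.SEEK_END)
    file_size = fh.tell()
    offset = 0
    while offset + 8 <= file_size:
        fh.seek(offset)
        size, box_type = struct.unpack('>I4s', fh.read(8))
        if size == 1:
            # A 64-bit "largesize" follows the type field.
            size = struct.unpack('>Q', fh.read(8))[0]
        elif size == 0:
            # The box extends to the end of the file.
            size = file_size - offset
        if size < 8:
            break  # malformed header; stop rather than loop forever
        yield box_type.decode('latin-1'), offset, size
        offset += size

def is_faststart(boxes):
    """True if moov appears before mdat."""
    order = [name for name, _, _ in boxes]
    return 'moov' in order and 'mdat' in order and order.index('moov') < order.index('mdat')

def fake_box(box_type, payload_len):
    """Build a dummy box: 8-byte header plus zero-filled payload."""
    return struct.pack('>I4s', 8 + payload_len, box_type) + b'\0' * payload_len

# Same box order as the default (non-faststart) output, with made-up sizes.
end_moov = io.BytesIO(fake_box(b'ftyp', 24) + fake_box(b'free', 0)
                      + fake_box(b'mdat', 100) + fake_box(b'moov', 50))
boxes = list(top_level_boxes(end_moov))
print(boxes)
print(is_faststart(boxes))
```

Against a real file it would be used as `with open(path, 'rb') as fh: is_faststart(list(top_level_boxes(fh)))`.
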
Let's prove out why this is great for browser playback.

### faststart testing

[`caddy`][caddy_files] has a nice quick built-in file server with verbose access logs.

Drop [this `index.html`][test_index] into the same directory as your test videos.

```
❯ caddy file-server --access-log --browse --listen :2015 --root ./test-videos
```

I kept my version of the mp4 prior to adding the `faststart` option, so I have two files:

```
❯ ffprobe -v trace ./test-videos/bbb_h264_aac.mp4 2>&1 | grep 'type:.\(ftyp\|free\|mdat\|moov\)'
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x15b64a80] type:'ftyp' parent:'root' sz: 32 8 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x15b64a80] type:'moov' parent:'root' sz: 412246 40 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x15b64a80] type:'free' parent:'root' sz: 8 412286 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x15b64a80] type:'mdat' parent:'root' sz: 157264899 412294 157677185

❯ ffprobe -v trace ./test-videos/bbb_h264_aac_endmov.mp4 2>&1 | grep 'type:.\(ftyp\|free\|mdat\|moov\)'
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x89b2a80] type:'ftyp' parent:'root' sz: 32 8 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x89b2a80] type:'free' parent:'root' sz: 8 40 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x89b2a80] type:'mdat' parent:'root' sz: 157264899 48 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x89b2a80] type:'moov' parent:'root' sz: 412246 157264947 157677185
```

Now plugging <http://localhost:2015/bbb_h264_aac_endmov.mp4> into the form:

In Firefox there are 3 requests made:

```
# 1 req
GET /bbb_h264_aac_endmov.mp4 HTTP/1.1
Host: localhost:2015
Accept: video/webm,video/ogg,video/*;q=0.9,application/ogg;q=0.7,audio/*;q=0.6,*/*;q=0.5
Range: bytes=0-
# 1 resp
HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Length: 157677185
Content-Range: bytes 0-157677184/157677185
Content-Type: video/mp4
Etag: "sfecu12lvkht"
```

Note that the amount transferred in the first request is actually only 1.57 MB as reported in devtools.

```
# 2 req
GET /bbb_h264_aac_endmov.mp4 HTTP/1.1
Host: localhost:2015
Accept: video/webm,video/ogg,video/*;q=0.9,application/ogg;q=0.7,audio/*;q=0.6,*/*;q=0.5
Range: bytes=157253632-
# 2 resp
HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Length: 423553
Content-Range: bytes 157253632-157677184/157677185
Content-Type: video/mp4
```

157677184 is the file size minus 1, i.e. the offset of the last byte, so this request reads the last 423,553 bytes (about 424 kB) of the file.

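
The arithmetic behind that can be checked straight from the `Content-Range` header. A small sketch (`parse_content_range` is a hypothetical helper that only handles the simple `bytes first-last/total` form seen in these responses):

```python
import re

def parse_content_range(value):
    """Split 'bytes first-last/total' into three integers."""
    match = re.fullmatch(r'bytes (\d+)-(\d+)/(\d+)', value)
    if match is None:
        raise ValueError(f'unsupported Content-Range: {value!r}')
    first, last, total = map(int, match.groups())
    return first, last, total

first, last, total = parse_content_range('bytes 157253632-157677184/157677185')
# Both ends of the range are inclusive, hence the +1.
print(last - first + 1)   # 423553, matches Content-Length
print(last == total - 1)  # True: the range runs to the final byte
```
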
```
# 3 req
GET /bbb_h264_aac_endmov.mp4 HTTP/1.1
Host: localhost:2015
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:127.0) Gecko/20100101 Firefox/127.0
Accept: video/webm,video/ogg,video/*;q=0.9,application/ogg;q=0.7,audio/*;q=0.6,*/*;q=0.5
Accept-Language: en-US,en;q=0.5
Range: bytes=131072-
# 3 resp
HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Length: 157546113
Content-Range: bytes 131072-157677184/157677185
Content-Type: video/mp4
```

Lastly, it starts reading at byte 131072 and continues to the end of the file.

A rough guess about how this works.

Here is the `ffprobe -v trace` output from above, annotated with the requests that read each atom:

```
# the format of the numbers is: {size} {offset just past the 8-byte atom header} {total_size}
# 1 req type:'ftyp' parent:'root' sz: 32 8 157677185
# 1 req type:'free' parent:'root' sz: 8 40 157677185
# 1+3 req type:'mdat' parent:'root' sz: 157264899 48 157677185
# 2 req type:'moov' parent:'root' sz: 412246 157264947 157677185
```

* `# 1 req` fetches the first 1.57 MB in a 206 partial content read from the head of the file.
  * It is looking for a `moov` atom, which holds the file information it needs to start playing.
  * This example video's `moov` is 412 kB, so it's reading nearly 4x that, well into the `mdat` section where the video data lives.
* `# 2 req` fetches the last ~424 kB of the file.
  * It hits the `moov`.
* `# 3 req` fetches the rest of the file starting at byte 131072 (128 KiB).

Pretty cool, you can see it hunting for the `moov` then starting playback.

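
To double-check which atoms each range request touches, the numbers from the trace can be fed into a short script. This sketch assumes the second number in each trace line is the offset just past the 8-byte atom header, so each atom starts 8 bytes earlier. That is the reading under which the four atoms tile the file exactly. The 1.57 MB figure for request 1 is the approximate amount devtools reported, not an exact byte count.

```python
BOX_HEADER = 8            # 4-byte size + 4-byte type
TOTAL_SIZE = 157677185

# (name, size, printed offset) copied from the trace of the end-moov file.
TRACE = [
    ('ftyp', 32, 8),
    ('free', 8, 40),
    ('mdat', 157264899, 48),
    ('moov', 412246, 157264947),
]

# Assumption: printed offset = position just after the atom header.
ATOMS = [(name, printed - BOX_HEADER, size) for name, size, printed in TRACE]

def tiles_file(atoms, total_size):
    """True if the atoms sit back to back from byte 0 to the end of the file."""
    position = 0
    for _, start, size in atoms:
        if start != position:
            return False
        position = start + size
    return position == total_size

def atoms_in_range(first, last):
    """Names of the atoms overlapping the inclusive byte range first..last."""
    return [name for name, start, size in ATOMS
            if start <= last and first < start + size]

print(tiles_file(ATOMS, TOTAL_SIZE))
print('req 1', atoms_in_range(0, 1_570_000 - 1))
print('req 2', atoms_in_range(157253632, TOTAL_SIZE - 1))
print('req 3', atoms_in_range(131072, TOTAL_SIZE - 1))
```

Under that assumption `moov` starts at byte 157264939, so request 2 begins 11307 bytes before it and picks up the tail of `mdat` as well.
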
In contrast, here's the `faststart` option: <http://localhost:2015/bbb_h264_aac.mp4>

```
# 1 req
GET /bbb_h264_aac.mp4 HTTP/1.1
Host: localhost:2015
Accept: video/webm,video/ogg,video/*;q=0.9,application/ogg;q=0.7,audio/*;q=0.6,*/*;q=0.5
Accept-Language: en-US,en;q=0.5
Range: bytes=0-
# 1 resp
HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Length: 157677185
Content-Range: bytes 0-157677184/157677185
Content-Type: video/mp4
```

Same exact start to the flow: just read the whole file with `Range: bytes=0-`.

But this time Firefox transfers ~7-9 MB (it changes per test), and there's only 1 request.

Best guess here is that Firefox is still trying to read 1.5 MB, but it encounters the `moov` immediately and just keeps reading.

That's the first time I've seen this in action.

[pic_types]: https://en.wikipedia.org/wiki/Video_compression_picture_types
[apple_hls_seg]: https://developer.apple.com/documentation/http-live-streaming/hls-authoring-specification-for-apple-devices#Media-Segmentation
[caddy_files]: https://caddyserver.com/docs/quick-starts/static-files#command-line
[test_index]: https://git.sr.ht/~cfebs/vidpkg/tree/main/item/test-videos/index.html