blog/posts/package_my_video.md


Title: Package my video
Date: 2024-06-20T13:19:03-04:00
Draft: 1

TODO

  • Some intro
  • Investigate key frames and seeking

The goal of this project is to get a minimum viable product of:

  • support H264 video and AAC audio
    • experiment with AV1 and Opus if time permits
  • produce a .mp4 file that is optimized for: raw progressive playback and segment generation
  • generate a HLS playlist on demand
  • generate the Nth segment from a HLS file on demand

Setup

ffmpeg

First I want to start with the latest available ffmpeg static build from: https://johnvansickle.com/ffmpeg/

 curl -sL -O https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz
 sudo tar -C /opt -xvf ffmpeg-release-amd64-static.tar.xz

My preference is to then link my preferred build to some location (/opt/ffmpeg-static) that I will then add to my PATH.

 sudo ln -sf /opt/ffmpeg-7.0.1-amd64-static/ /opt/ffmpeg-static
# then edit your shell rc or profile, reset shell
 type ffmpeg
ffmpeg is /opt/ffmpeg-static/ffmpeg

 ffmpeg -version
ffmpeg version 7.0.1-static https://johnvansickle.com/ffmpeg/  Copyright (c) 2000-2024 the FFmpeg developers
built with gcc 8 (Debian 8.3.0-6)
configuration: --enable-gpl --enable-version3 --enable-static --disable-debug --disable-ffplay --disable-indev=sndio --disable-outdev=sndio --cc=gcc --enable-fontconfig --enable-frei0r --enable-gnutls --enable-gmp --enable-libgme --enable-gray --enable-libaom --enable-libfribidi --enable-libass --enable-libvmaf --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librubberband --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libvorbis --enable-libopus --enable-libtheora --enable-libvidstab --enable-libvo-amrwbenc --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libdav1d --enable-libxvid --enable-libzvbi --enable-libzimg
libavutil      59.  8.100 / 59.  8.100
libavcodec     61.  3.100 / 61.  3.100
libavformat    61.  1.100 / 61.  1.100
libavdevice    61.  1.100 / 61.  1.100
libavfilter    10.  1.100 / 10.  1.100
libswscale      8.  1.100 /  8.  1.100
libswresample   5.  1.100 /  5.  1.100
libpostproc    58.  1.100 / 58.  1.100

And checking codec support

 ffmpeg -codecs 2>/dev/null | grep '\s\(aac\|h264\|av1\|opus\)'
 DEV.L. av1                  Alliance for Open Media AV1 (decoders: libdav1d libaom-av1 av1) (encoders: libaom-av1)
 DEV.LS h264                 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (decoders: h264 h264_v4l2m2m) (encoders: libx264 libx264rgb h264_v4l2m2m)
 DEAIL. aac                  AAC (Advanced Audio Coding) (decoders: aac aac_fixed)
 D.AIL. aac_latm             AAC LATM (Advanced Audio Coding LATM syntax)
 DEAIL. opus                 Opus (Opus Interactive Audio Codec) (decoders: opus libopus) (encoders: opus libopus)

Test video file

Here are a few open test video sources:

I grabbed a 720p version of each

 du -h test-videos/*
398M    test-videos/big_buck_bunny_720p_h264.mov
650M    test-videos/Sintel.2010.720p.mkv

Deciding on eventual segment size

Target segment size will hold some influence over our progressive transcoding.

Each segment will begin with at least 1 key frame, so our progressive output key frame placement should line up with where our segments will be extracted.

Apple suggests 6 second durations for each HLS segment for VOD playback with HLS.

6s would be fine to use, but it's a choice with consequences.

If I later wanted 3s segments instead, the progressive file would have to be re-transcoded to insert more key frames.

So for flexibility's sake I will place key frames every 3s in the progressive transcode, but the eventual segments will be 6s.

Packaging the progressive file

But first, let's produce v1 of the files with the target codecs applied (H264 and AAC).

 ffmpeg -i ./test-videos/big_buck_bunny_720p_h264.mov -acodec 'aac' -vcodec 'h264' ./test-videos/bbb_h264_aac.mp4

 ffmpeg -i ./test-videos/Sintel.2010.720p.mkv -acodec 'aac' -vcodec 'h264' ./test-videos/sintel_h264_aac.mp4

 du -h ./test-videos/*
138M    ./test-videos/bbb_h264_aac.mp4
398M    ./test-videos/big_buck_bunny_720p_h264.mov
650M    ./test-videos/Sintel.2010.720p.mkv
201M    ./test-videos/sintel_h264_aac.mp4

Now let's inspect the frames a bit closer with this script, dumpframes.sh:

#!/usr/bin/env bash

# quote the argument so paths containing spaces work
ffprobe -select_streams v -show_frames -show_entries frame=pict_type -of csv "$1"

This should show what picture type each frame is.

I frames are the least compressible but don't require other video frames to decode.

P frames can use data from previous frames to decompress and are more compressible than I frames.

B frames can reference both previous and following frames, which gives the highest compression.

I frames are also called key frames.

So given a dump of Big Buck Bunny (BBB):

./dumpframes.sh test-videos/bbb_h264_aac.mp4 > bbb_frames.csv

BBB is 24fps, so every 3 seconds we want to see a key frame. Here are the frame numbers where the key frame should be.

 python3
>>> fps = 24
>>> i_frame_s = 3

>>> for i in range(0, 10): print(i * fps * i_frame_s + 1)
...
1
73
145
217
289
361
433
505
577
649
  • First segment should contain 3s of content which will be 72 frames.
  • First frame should be a key frame.
  • Then 71 non I frames.
  • Then the next I frame (frame 73) begins the next segment.

But our frames are not quite right.

 grep -n I ./bbb_frames.csv  | head
1:frame,I,H.26[45] User Data Unregistered SEI message
7:frame,I
251:frame,I
286:frame,I
379:frame,I
554:frame,I
804:frame,I
1054:frame,I
1146:frame,I
1347:frame,I

This is because we didn't tell ffmpeg anything about how to encode and where to place I frames.

There are a few options to libx264 that help control this:

  • --no-scenecut: "Disable adaptive I-frame decision"
  • --keyint: effectively the key frame interval. Technically it is the "Maximum GOP size"
  • --min-keyint: the "Minimum GOP size"

A GOP is "Group of Pictures" or the distance between two key frames.

So let's re-encode with those options. Actually, let's write a wrapper script to do this.

I'll choose something besides bash because there will be a bit of math involved.

#!/usr/bin/env python
import sys
import json
import subprocess
import logging

logging.basicConfig(level=logging.DEBUG)

def probe_info(infname):
	# pass arguments as a list so file names containing spaces survive
	cmd = [
		'ffprobe', '-v', 'quiet', '-print_format', 'json',
		'-show_format', '-show_streams', infname,
	]
	# log before running so the log reflects what is about to happen
	logging.info('running cmd %s', ' '.join(cmd))
	res = subprocess.run(cmd, check=True, capture_output=True)
	ffprobe_dict = json.loads(res.stdout)
	v_stream = None
	for stream in ffprobe_dict.get('streams', []):
		if stream.get('codec_type') == 'video':
			v_stream = stream
			break
	if v_stream is None:
		raise ValueError(f'no video stream found in {infname}')

	# r_frame_rate is a fraction such as "24/1"
	r_frame_rate = v_stream.get('r_frame_rate')
	num, denom = r_frame_rate.split('/')
	fps = float(num) / float(denom)
	logging.info('got fps %s', fps)
	return {
		'fps': fps,
	}

def run_ffmpeg_transcode(infname, outfname, probeinfo, segment_length=3):
	# must be an integer
	keyint = int(probeinfo.get('fps') * segment_length)
	cmd = [
		'ffmpeg',
		'-i',
		infname,
		# only keep the first video stream and first audio stream
		'-map',
		'0:v:0',
		'-map',
		'0:a:0',
		'-vcodec',
		'libx264',
		'-x264opts',
		f'keyint={keyint}:min-keyint={keyint}:no-scenecut',
		'-acodec',
		'aac',
		outfname
	]
	logging.info('running cmd %s', ' '.join(cmd))
	subprocess.run(cmd, check=True)

if __name__ == '__main__':
	args = sys.argv
	prog = args.pop(0)
	if len(args) != 2:
		sys.exit(1)

	infname, outfname = args
	probeinfo = probe_info(infname)
	run_ffmpeg_transcode(infname, outfname, probeinfo)
  • Use ffprobe to dump the streams as json.
  • Take first video stream.
  • Get the r_frame_rate which is a fraction. Eval the fraction as fps
  • Calculate the keyframe interval using a static 3s segment length.
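One thing to watch in the last two steps: rates such as 30000/1001 do not multiply out to a whole number of frames, so how the result is turned into an integer matters. A small sketch using the standard library's `fractions.Fraction`; `keyint_for` is a hypothetical helper, and rounding to the nearest frame is my assumption (the script above truncates with `int()`).

```python
from fractions import Fraction


def keyint_for(r_frame_rate, segment_length=3):
    """Key frame interval in frames for an ffprobe-style 'num/den' rate."""
    # Fraction parses strings like '30000/1001' exactly, with no float error
    fps = Fraction(r_frame_rate)
    # round() on a Fraction returns the nearest int
    return round(fps * segment_length)
```

For BBB's `24/1` this gives the same 72 as the script. For `30000/1001` it gives 90, where truncation would give 89.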

And if we run it, let's take a look at the commands it executes:

 ./progressive.py ./test-videos/big_buck_bunny_720p_h264.mov ./test-videos/bbb_h264_aac.mp4
INFO:root:running cmd ffprobe -v quiet -print_format json -show_format -show_streams ./test-videos/big_buck_bunny_720p_h264.mov
INFO:root:got fps 24.0
INFO:root:running cmd ffmpeg -i ./test-videos/big_buck_bunny_720p_h264.mov -map 0:v:0 -map 0:a:0 -vcodec libx264 -x264opts keyint=72:min-keyint=72:no-scenecut -acodec aac ./test-videos/bbb_h264_aac.mp4

Now regenerate the frame dump and check if our I frames match the expected: 1, 73, 145, 217 ...

 ./dumpframes.sh test-videos/bbb_h264_aac.mp4 > bbb_frames.csv
 grep -n I ./bbb_frames.csv | head
1:frame,I,H.26[45] User Data Unregistered SEI message
73:frame,I
145:frame,I
217:frame,I
286:frame,I
289:frame,I
361:frame,I
379:frame,I
433:frame,I
505:frame,I

Excellent!

Let's check where the mp4 "atoms" are located in the resulting file.

 ffprobe -v trace ./test-videos/bbb_h264_aac.mp4 2>&1 | grep 'type:.\(ftyp\|free\|mdat\|moov\)'
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x90d7a80] type:'ftyp' parent:'root' sz: 32 8 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x90d7a80] type:'free' parent:'root' sz: 8 40 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x90d7a80] type:'mdat' parent:'root' sz: 157264899 48 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x90d7a80] type:'moov' parent:'root' sz: 412246 157264947 157677185

So the moov atom is at the end of the file by default.
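The same check can be done without ffprobe by walking the top-level boxes directly: each box starts with a 4-byte big-endian size followed by a 4-byte type. This is a rough sketch, not a full parser, and `top_level_boxes` and `moov_before_mdat` are hypothetical names. Offsets here are measured from the start of each box header, so they read 8 bytes lower than the second number in the ffprobe trace (which shows ftyp at 8 rather than 0).

```python
import struct


def top_level_boxes(f):
    """Yield (box_type, offset, size) for each top-level box in a seekable binary file."""
    f.seek(0, 2)
    end = f.tell()
    pos = 0
    while pos + 8 <= end:
        f.seek(pos)
        size, box_type = struct.unpack('>I4s', f.read(8))
        if size == 1:
            # a size of 1 means a 64-bit size follows the type
            if pos + 16 > end:
                break
            size = struct.unpack('>Q', f.read(8))[0]
        elif size == 0:
            # a size of 0 means the box runs to the end of the file
            size = end - pos
        if size < 8:
            break  # malformed box, stop rather than loop forever
        yield box_type.decode('latin-1'), pos, size
        pos += size


def moov_before_mdat(f):
    """True when the first moov/mdat box encountered is the moov."""
    order = [t for t, _, _ in top_level_boxes(f) if t in ('moov', 'mdat')]
    return order[:1] == ['moov']
```

Usage: `with open('./test-videos/bbb_h264_aac.mp4', 'rb') as f: print(moov_before_mdat(f))`.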

Save this version of the transcode if you want to test how this works in the browser.

To optimize for faster startup, there is a faststart option available which moves the moov atom to the head of the file.

So adjusting the progressive script

diff --git a/progressive.py b/progressive.py
index 0ba58b7..a3dc63a 100755
--- a/progressive.py
+++ b/progressive.py
@@ -36,6 +36,8 @@ def run_ffmpeg_transcode(infname, outfname, probeinfo, segment_length=3):
                'libx264',
                '-x264opts',
                f'keyint={keyint}:min-keyint={keyint}:no-scenecut',
+               '-movflags',
+               'faststart',
                '-acodec',
                'aac',

And after the re-transcode:

 ffprobe -v trace ./test-videos/bbb_h264_aac.mp4 2>&1 | grep 'type:.\(ftyp\|free\|mdat\|moov\)'
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x34423a80] type:'ftyp' parent:'root' sz: 32 8 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x34423a80] type:'moov' parent:'root' sz: 412246 40 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x34423a80] type:'free' parent:'root' sz: 8 412286 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x34423a80] type:'mdat' parent:'root' sz: 157264899 412294 157677185

It worked!

Let's prove out why this is great for browser playback.

faststart testing

Caddy has a nice, quick built-in file server with verbose access logs.

Drop this index.html into the same directory as your test videos.

 caddy file-server --access-log --browse --listen :2015 --root ./test-videos

I'll stash that in a Makefile helper:

.PHONY: filesrv
filesrv:
	caddy file-server --access-log --browse --listen :2015 --root ./test-videos

I kept my version of the mp4 prior to adding the faststart option, so I have two files:

 ffprobe -v trace ./test-videos/bbb_h264_aac.mp4 2>&1 | grep 'type:.\(ftyp\|free\|mdat\|moov\)'
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x15b64a80] type:'ftyp' parent:'root' sz: 32 8 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x15b64a80] type:'moov' parent:'root' sz: 412246 40 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x15b64a80] type:'free' parent:'root' sz: 8 412286 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x15b64a80] type:'mdat' parent:'root' sz: 157264899 412294 157677185

 ffprobe -v trace ./test-videos/bbb_h264_aac_endmov.mp4 2>&1 | grep 'type:.\(ftyp\|free\|mdat\|moov\)'
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x89b2a80] type:'ftyp' parent:'root' sz: 32 8 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x89b2a80] type:'free' parent:'root' sz: 8 40 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x89b2a80] type:'mdat' parent:'root' sz: 157264899 48 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x89b2a80] type:'moov' parent:'root' sz: 412246 157264947 157677185

Now plugging in http://localhost:2015/bbb_h264_aac_endmov.mp4 to the form:

In Firefox there are 3 requests made:

# 1 req
GET /bbb_h264_aac_endmov.mp4 HTTP/1.1
Host: localhost:2015
Accept: video/webm,video/ogg,video/*;q=0.9,application/ogg;q=0.7,audio/*;q=0.6,*/*;q=0.5
Range: bytes=0-
# 1 resp
HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Length: 157677185
Content-Range: bytes 0-157677184/157677185
Content-Type: video/mp4
Etag: "sfecu12lvkht"
Content-Type: video/mp4

Note that the amount transferred in the first request is actually only 1.57 MB, as reported in devtools.

# 2 req
GET /bbb_h264_aac_endmov.mp4 HTTP/1.1
Host: localhost:2015
Accept: video/webm,video/ogg,video/*;q=0.9,application/ogg;q=0.7,audio/*;q=0.6,*/*;q=0.5
Range: bytes=157253632-
# 2 resp
HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Length: 423553
Content-Range: bytes 157253632-157677184/157677185
Content-Type: video/mp4

157677184 is the offset of the last byte (the file size minus 1), so it is reading the last 423553 bytes (about 424 kB) of the file.

# 3 req
GET /bbb_h264_aac_endmov.mp4 HTTP/1.1
Host: localhost:2015
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:127.0) Gecko/20100101 Firefox/127.0
Accept: video/webm,video/ogg,video/*;q=0.9,application/ogg;q=0.7,audio/*;q=0.6,*/*;q=0.5
Accept-Language: en-US,en;q=0.5
Range: bytes=131072-
# 3 resp
HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Length: 157546113
Content-Range: bytes 131072-157677184/157677185
Content-Type: video/mp4

Lastly, start reading at byte 131072 to the end of the file.

A rough guess about how this works.

Here are the ffprobe -v trace lines from above, annotated with the request that reads each box; the byte offsets line up with the range requests:

# the format of the numbers is: {size} {start_byte} {total_size}
# 1 req     type:'ftyp' parent:'root' sz: 32 8 157677185
# 1 req     type:'free' parent:'root' sz: 8 40 157677185
# 1+3 req   type:'mdat' parent:'root' sz: 157264899 48 157677185
# 2 req     type:'moov' parent:'root' sz: 412246 157264947 157677185
  • # 1 req fetches the first 1.57 MB in a 206 partial content read from the head of the file.
    • Looking for a moov atom for file information so it can start playing.
    • This example video's moov is 412 kB, so it's reading nearly 4x that and into the mdat section where the video data lives.
  • # 2 req fetches the last ~424 kB from the end of the file.
    • It hits the moov.
  • # 3 req fetches the rest of the file starting at byte 131072 (about 131 kB in).

Pretty cool, you can see it hunting for the moov then starting playback.
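The response headers above follow directly from the request's Range header and the file size. A minimal sketch of that arithmetic, handling only the single `bytes=START-` and `bytes=START-END` forms seen in these logs; `resolve_range` is a hypothetical helper, not anything Caddy or Firefox exposes.

```python
def resolve_range(range_header, total_size):
    """Resolve a single 'bytes=START-[END]' header to (start, end, length)."""
    unit, _, spec = range_header.partition('=')
    if unit.strip() != 'bytes' or ',' in spec:
        raise ValueError('only single byte ranges are handled here')
    start_s, _, end_s = spec.strip().partition('-')
    if not start_s:
        raise ValueError('suffix ranges are not handled in this sketch')
    start = int(start_s)
    # an open-ended range runs to the last byte, which is total_size - 1
    end = min(int(end_s), total_size - 1) if end_s else total_size - 1
    if start > end:
        raise ValueError('unsatisfiable range')
    return start, end, end - start + 1
```

For request 2, `resolve_range('bytes=157253632-', 157677185)` returns `(157253632, 157677184, 423553)`, which matches the Content-Range and Content-Length in the response.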

In contrast, here's the faststart option: http://localhost:2015/bbb_h264_aac.mp4

# 1 req
GET /bbb_h264_aac.mp4 HTTP/1.1
Host: localhost:2015
Accept: video/webm,video/ogg,video/*;q=0.9,application/ogg;q=0.7,audio/*;q=0.6,*/*;q=0.5
Accept-Language: en-US,en;q=0.5
Range: bytes=0-
# 1 resp
HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Length: 157677185
Content-Range: bytes 0-157677184/157677185
Content-Type: video/mp4

Same exact start to the flow - just read whole file with Range: bytes=0-.

But this time Firefox transfers ~7-9 MB (it changes per test), and there's only 1 request.

Best guess here is that Firefox is still trying to read 1.5 MB, but it encounters the moov immediately and just keeps reading.

With the progressive file in a good place it's now time to turn to segmenting. And in the browser we need Media Source Extensions for this.

MediaSource

RFC 6381 codecs and MediaSource.isTypeSupported

One of the first weird hurdles is checking if our particular codecs are supported:

MediaSource.isTypeSupported('video/mp4; codecs="avc1.64001f, mp4a.40.2"');

This string is in the format specified by RFC 6381.

Strangely there is no easy way to get this information from ffprobe. For reference, here is a 2017 feature request to add this: https://web.archive.org/web/20240406102137/https://trac.ffmpeg.org/ticket/6617

As noted by a comment in the ticket, there is actually support in the codebase for writing the string in what looks like the hls segmenter.

Instead of trying to hack something up there, an alternative is to use MP4Box from https://github.com/gpac/gpac

 MP4Box -info ./test-videos/bbb_h264_aac.mp4 2>&1 | grep 'RFC6381' | awk -F':\\s*' '{print $2}'
avc1.64001F
mp4a.40.2

Then just build the string: video/mp4; codecs="{0}, {1}" from that output.
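Building the string is a one-liner. A sketch, assuming the two codec identifiers have already been captured from the MP4Box pipeline above as a list of strings; `mime_codec` is a hypothetical helper name.

```python
def mime_codec(rfc6381_codecs, container='video/mp4'):
    """Format a MIME type with an RFC 6381 codecs parameter."""
    return '{0}; codecs="{1}"'.format(container, ', '.join(rfc6381_codecs))
```

With the output above, `mime_codec(['avc1.64001F', 'mp4a.40.2'])` returns `video/mp4; codecs="avc1.64001F, mp4a.40.2"`.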

Now in the browser let's check:

> MediaSource.isTypeSupported('video/mp4; codecs="avc1.64001F, mp4a.40.2"');
true

How does MediaSource work? What is actually playable?

MediaSource is all about appending bytes to buffers that match the expected codecs.

When you append a buffer of bytes into a MediaSource buffer, it must be a valid Byte Stream Format: https://www.w3.org/TR/media-source-2/#byte-stream-formats

Here are the types of valid stream formats: https://www.w3.org/TR/mse-byte-stream-format-registry/#registry

MP4 byte stream

The first segment should be an "initialization segment":

An ISO BMFF initialization segment is defined in this specification as a single File Type Box (ftyp) followed by a single Movie Box (moov).

Then the actual media:

An ISO BMFF media segment is defined in this specification as one optional Segment Type Box (styp) followed by a single Movie Fragment Box (moof) followed by one or more Media Data Boxes (mdat). If the Segment Type Box is not present, the segment MUST conform to the brands listed in the File Type Box (ftyp) in the initialization segment.

Our progressive file at the moment does not conform to this spec. The file layout we have at the moment is:

[mov,mp4,m4a,3gp,3g2,mj2 @ 0x34423a80] type:'ftyp' parent:'root' sz: 32 8 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x34423a80] type:'moov' parent:'root' sz: 412246 40 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x34423a80] type:'free' parent:'root' sz: 8 412286 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x34423a80] type:'mdat' parent:'root' sz: 157264899 412294 157677185

The next step is to mux into the required format, where each segment contains a moof box followed by an mdat box, otherwise known as "fragmented mp4".
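The two quoted definitions translate into a simple check over the order of top-level box types. This is a sketch of my reading of those definitions only. It ignores brand checks, and it expects the caller to drop trailing boxes the definitions do not mention. `split_mse_segments` is a hypothetical name.

```python
def split_mse_segments(box_types):
    """Group top-level box types into (init_segment, [media_segments])."""
    boxes = list(box_types)
    if boxes[:2] != ['ftyp', 'moov']:
        raise ValueError('initialization segment must be ftyp followed by moov')
    rest = boxes[2:]
    segments = []
    i = 0
    while i < len(rest):
        seg = []
        if rest[i] == 'styp':  # optional Segment Type Box
            seg.append(rest[i])
            i += 1
        if i >= len(rest) or rest[i] != 'moof':
            raise ValueError('media segment must contain a moof')
        seg.append(rest[i])
        i += 1
        if i >= len(rest) or rest[i] != 'mdat':
            raise ValueError('moof must be followed by at least one mdat')
        while i < len(rest) and rest[i] == 'mdat':  # one or more mdat boxes
            seg.append(rest[i])
            i += 1
        segments.append(seg)
    return boxes[:2], segments
```

The progressive layout `['ftyp', 'moov', 'free', 'mdat']` raises a `ValueError` here, which is the non-conformance described above.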

MPEG-2 Transport Stream

Skipping this for now because fragmented MP4 is a valid media segment format according to the HLS spec.

Getting bytes in a MSE buffer with fragmented mp4

First, let's produce a smaller file to work with for this example.

 ffmpeg -t 13s -i bbb_h264_aac.mp4 -c copy -f mp4 ./bbb_h264_aac_13s.mp4

13 seconds should give two 6s segments plus one partial segment, so it should be good for testing.

Next let's fragment with a helper script, makefragmented.sh:

#!/usr/bin/env bash
# quote the arguments so paths containing spaces work
ffmpeg -i "$1" -c copy -movflags 'frag_keyframe+empty_moov+default_base_moof' -f mp4 "$2"
  • frag_keyframe: start a new fragment at each video key frame, which we already placed every 3s.
  • empty_moov: without it, ffmpeg starts the file with a regular mdat/moov pair whose moov already describes some samples, which is not a valid initialization segment according to the spec.
  • default_base_moof: this is recommended by MDN for Chrome. With or without this option the root level mp4 boxes look the same. The ffmpeg docs say:

this flag avoids writing the absolute base_data_offset field in tfhd atoms, but does so by using the new default-base-is-moof flag instead. This flag is new from 14496-12:2012. This may make the fragments easier to parse in certain circumstances (avoiding basing track fragment location calculations on the implicit end of the previous track fragment).

So now make the fragmented mp4:

./makefragmented.sh ./test-videos/bbb_h264_aac_13s.mp4 ./test-videos/bbb_h264_aac_13s_frag.mp4

Now let's look at the boxes again, using another handy script, ffprobe-trace-boxes.sh:

#!/usr/bin/env bash
echo "box_type,box_parent,offset,size"
ffprobe -v trace "$1" 2>&1 | grep 'type:.*parent:.*sz:' | sed "s/^.*type://; s/'//g" | awk '{print $1 "," $2 "," $5 "," $4}'

And just the top level boxes:

 ./ffprobe-trace-boxes.sh ./test-videos/bbb_h264_aac_13s_frag.mp4 | grep parent:root
ftyp,parent:root,8,28
moov,parent:root,36,1282
moof,parent:root,1318,1860
mdat,parent:root,3178,461608
moof,parent:root,464786,1320
mdat,parent:root,466106,771527
moof,parent:root,1237633,1316
mdat,parent:root,1238949,1052753
moof,parent:root,2291702,1320
mdat,parent:root,2293022,1382411
moof,parent:root,3675433,796
mdat,parent:root,3676229,602490
mfra,parent:root,4278719,262

So that looks like it matches the ISO BMFF byte stream spec:

  • ftyp + moov make up the init sequence
  • moof + mdat make up each segment/fragment
  • mfra not sure about this yet, but ignoring for now.

And it roughly matches our expectation: 13s duration / 3s key frame interval = 4.3 segments, which means there should be 5 moof boxes in total.
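The same arithmetic, spelled out as the list of fragments we expect. This is only a sketch of the expectation, assuming fragments are cut exactly on the key frame grid. `expected_fragments` is a hypothetical helper.

```python
def expected_fragments(duration_s, keyframe_interval_s):
    """List (start, length) in seconds for each fragment cut on the key frame grid."""
    fragments = []
    start = 0
    while start < duration_s:
        # the last fragment is shorter when the duration is not a multiple
        length = min(keyframe_interval_s, duration_s - start)
        fragments.append((start, length))
        start += keyframe_interval_s
    return fragments
```

`expected_fragments(13, 3)` gives four full 3s fragments and a final 1s fragment, five in total, which lines up with the five moof boxes in the listing.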

This will be our first-ever manifest format, dubbed a "jank csv manifest".

The goal now is fetching each of these byte ranges and adding them to a MediaSource buffer.

I have created a new mse.html file and will explain the important points in comments.

NOTE: this is not proper MSE buffer state handling. It is the MVP of just shoving bytes into the buffer.

// The RFC 6381 codec string of the video
const mimeCodec = 'video/mp4; codecs="avc1.64001F, mp4a.40.2"';

// This is all boilerplate buffer setup
let sourceBuffer = null;
let mediaSource = null;
if ("MediaSource" in window && MediaSource.isTypeSupported(mimeCodec)) {
  mediaSource = new MediaSource();
  video.src = URL.createObjectURL(mediaSource);
  mediaSource.addEventListener("sourceopen", () => {
      sourceBuffer = mediaSource.addSourceBuffer(mimeCodec);
  });
} else {
  console.error("Unsupported MIME type or codec: ", mimeCodec);
}

// parse a line from the .jank file -> Range: bytes=X-Y string
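function jankToByteRange(jank_csv) {
  // declare locals explicitly instead of leaking implicit globals
  const parts = jank_csv.split(',');
  const beg = parseInt(parts[2], 10);
  const sz = parseInt(parts[3], 10);
  // HTTP byte ranges are inclusive, hence the - 1
  return "bytes=" + beg + '-' + (beg + sz - 1);
}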

// fetch a byte range from a url
function fetchAB(url, jank_csv, cb) {
  const xhr = new XMLHttpRequest();
  const byte_range = jankToByteRange(jank_csv);
  console.log(url, byte_range, jank_csv);
  xhr.open("get", url);
  xhr.setRequestHeader("Range", byte_range);
  xhr.responseType = "arraybuffer";
  xhr.onload = () => {
    cb(xhr.response);
  };
  xhr.send();
}

// on form submit:
// * grab the video url and csv data.
// * fetch the first entry from the jank csv.
// * when the buffer finishes updating itself, fetch the next line until no more lines exist.
form.addEventListener('submit', (e) => {
  e.preventDefault();
  let data = new FormData(e.target, e.submitter);
  let vid_url = data.get('vid_url');
  let jank_csv = data.get('jank_csv');

  const lines = jank_csv.split("\n");
  const first = lines.shift();

  fetchAB(vid_url, first, (buf) => {
    sourceBuffer.appendBuffer(buf);
  });

  sourceBuffer.addEventListener("updateend", () => {
    if (lines.length === 0) {
      console.log('end of lines', mediaSource.readyState); // ended
      return;
    }
    const next = lines.shift();
    if (!next) {
      return;
    }
    fetchAB(vid_url, next, (buf) => {
      sourceBuffer.appendBuffer(buf);
    });
  });
});

Now start up Caddy again (make filesrv).

And use this for the form inputs:

ftyp+moov,parent:root,0,1318
moof,parent:root,1318,1860
mdat,parent:root,3178,461608
moof,parent:root,464786,1320
mdat,parent:root,466106,771527
moof,parent:root,1237633,1316
mdat,parent:root,1238949,1052753
moof,parent:root,2291702,1320
mdat,parent:root,2293022,1382411
moof,parent:root,3675433,796
mdat,parent:root,3676229,602490
mfra,parent:root,4278719,262
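
These lines are the ffprobe-trace-boxes.sh output with the ftyp and moov rows merged by hand into a single init entry. That merge, and the Range header each line turns into, can be sketched as below. The offsets are carried over from the script output unchanged, and `jank_manifest` and `byte_range` are hypothetical names that mirror what the JavaScript does.

```python
def jank_manifest(rows):
    """Merge the leading ftyp and moov rows of 'type,parent,offset,size' lines."""
    parsed = [r.strip().split(',') for r in rows if r.strip()]
    if [p[0] for p in parsed[:2]] != ['ftyp', 'moov']:
        raise ValueError('expected ftyp then moov as the first two rows')
    moov = parsed[1]
    # the init entry spans from byte 0 up to where the moov row ends
    init_size = int(moov[2]) + int(moov[3])
    out = ['ftyp+moov,{0},0,{1}'.format(moov[1], init_size)]
    out += [','.join(p) for p in parsed[2:]]
    return out


def byte_range(line):
    """Turn one manifest line into an inclusive HTTP Range header value."""
    _, _, offset, size = line.split(',')
    return 'bytes={0}-{1}'.format(int(offset), int(offset) + int(size) - 1)
```

Each range starts where the previous one ended, so appending them in order feeds the file to the source buffer front to back.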

Submit and press play.

You should be able to play the first ~13s of the video!

first play!