package_my_video: working!

This commit is contained in:
Collin Lefeber 2024-06-23 13:12:19 -04:00
parent 1b0d204379
commit a9d07ba798
2 changed files with 256 additions and 2 deletions

@ -220,6 +220,11 @@ def run_ffmpeg_transcode(infname, outfname, probeinfo, segment_length=3):
'ffmpeg',
'-i',
infname,
# only keep the first video stream and first audio stream
'-map',
'0:v:0',
'-map',
'0:a:0',
'-vcodec',
'libx264',
'-x264opts',
@ -252,7 +257,7 @@ And if we run it, lets take a look at the cmds it executes:
./progressive.py ./test-videos/big_buck_bunny_720p_h264.mov ./test-videos/bbb_h264_aac.mp4
INFO:root:running cmd ffprobe -v quiet -print_format json -show_format -show_streams ./test-videos/big_buck_bunny_720p_h264.mov
INFO:root:got fps 24.0
INFO:root:running cmd ffmpeg -i ./test-videos/big_buck_bunny_720p_h264.mov -vcodec libx264 -x264opts keyint=72:min-keyint=72:no-scenecut -acodec aac ./test-videos/bbb_h264_aac.mp4
INFO:root:running cmd ffmpeg -i ./test-videos/big_buck_bunny_720p_h264.mov -map 0:v:0 -map 0:a:0 -vcodec libx264 -x264opts keyint=72:min-keyint=72:no-scenecut -movflags faststart -acodec aac ./test-videos/bbb_h264_aac.mp4
```
Now regenerate the frame dump and check if our I frames match the expected: 1, 73, 145, 217 ...
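Those expected frame numbers fall straight out of the keyint. A quick sketch with a hypothetical `expected_iframes` helper (assuming the frame dump numbers frames starting at 1, as above):

```python
# keyint=72 at 24 fps gives a GOP of 3s, so with 1-indexed frame
# numbering the I frames land at 1, 1+72, 1+144, ...
FPS = 24
KEYINT = 72

def expected_iframes(n):
    """First n expected I-frame numbers for a fixed keyint."""
    return [1 + i * KEYINT for i in range(n)]

print(expected_iframes(4))  # [1, 73, 145, 217]
```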
@ -331,6 +336,14 @@ Drop [this `index.html`][test_index] into the same directory as your test videos
caddy file-server --access-log --browse --listen :2015 --root ./test-videos
```
I'll stash that in a `Makefile` helper:
```
.PHONY: filesrv
filesrv:
caddy file-server --access-log --browse --listen :2015 --root ./test-videos
```
I kept my version of the mp4 prior to adding the `faststart` option, so I have two files:
```
@ -447,9 +460,250 @@ But this time firefox transfers ~7-9 MB (it changes per test), and there's only
Best guess here is that firefox is still trying to read 1.5MB, but it encounters the `moov` immediately and just keeps reading.
That's the first time I've seen this in action.
With the progressive file in a good place it's now time to turn to segmenting. And in the browser we need `Media Source Extensions` for this.
## MediaSource
### RFC 6381 codecs and `MediaSource.isTypeSupported`
One of the first weird hurdles is checking if our particular codecs are supported:
```js
MediaSource.isTypeSupported('video/mp4; codecs="avc1.64001f, mp4a.40.2"');
```
This string is in the format specified by [RFC 6381](https://datatracker.ietf.org/doc/html/rfc6381).
Strangely there is no easy way to get this information from `ffprobe`. For reference, here is a 2017 feature request to add this: <https://web.archive.org/web/20240406102137/https://trac.ffmpeg.org/ticket/6617>
As noted by a comment in the ticket, there is actually support [in the codebase for writing the string][ffmpeg_write_codec_attr] in what looks like the hls segmenter.
Instead of trying to hack something up there, an alternative is to use `MP4Box` from <https://github.com/gpac/gpac>
```shell
MP4Box -info ./test-videos/bbb_h264_aac.mp4 2>&1 | grep 'RFC6381' | awk -F':\\s*' '{print $2}'
avc1.64001F
mp4a.40.2
```
Then just build the string: `video/mp4; codecs="{0}, {1}"` from that output.
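The string assembly is trivial but easy to typo. A sketch with a hypothetical `mime_codec` helper, taking the two MP4Box output lines as input:

```python
def mime_codec(codecs):
    """Build a MediaSource.isTypeSupported() string from RFC 6381 codec ids."""
    return 'video/mp4; codecs="{0}"'.format(", ".join(codecs))

print(mime_codec(["avc1.64001F", "mp4a.40.2"]))
# video/mp4; codecs="avc1.64001F, mp4a.40.2"
```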
Now let's check in the browser:
```js
> MediaSource.isTypeSupported('video/mp4; codecs="avc1.64001F, mp4a.40.2"');
true
```
### How does MediaSource work? What is actually playable?
MediaSource is all about appending bytes to buffers that match the expected codecs.
When you append a buffer of bytes into a MediaSource buffer, it must be a valid Byte Stream Format: <https://www.w3.org/TR/media-source-2/#byte-stream-formats>
Here are the types of valid stream formats: <https://www.w3.org/TR/mse-byte-stream-format-registry/#registry>
* ISOBMFF: <https://www.w3.org/TR/mse-byte-stream-format-isobmff/>
* MPEG-2 Transport Stream: <https://www.w3.org/TR/mse-byte-stream-format-mp2t/>
### MP4 byte stream
The first segment should be an "initialization segment":
> An ISO BMFF initialization segment is defined in this specification as a single File Type Box (ftyp) followed by a single Movie Box (moov).
Then the actual media:
> An ISO BMFF media segment is defined in this specification as one optional Segment Type Box (styp) followed by a single Movie Fragment Box (moof) followed by one or more Media Data Boxes (mdat). If the Segment Type Box is not present, the segment MUST conform to the brands listed in the File Type Box (ftyp) in the initialization segment.
Our progressive file does not conform to this spec. Its current layout is:
```
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x34423a80] type:'ftyp' parent:'root' sz: 32 8 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x34423a80] type:'moov' parent:'root' sz: 412246 40 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x34423a80] type:'free' parent:'root' sz: 8 412286 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x34423a80] type:'mdat' parent:'root' sz: 157264899 412294 157677185
```
The next step is to mux into the required format, where each segment contains a `moof` box followed by an `mdat` box, otherwise known as "fragmented mp4".
### MPEG-2 Transport Stream
Skipping this for now because [fragmented MP4 is a valid media segment format according to the HLS spec][hls_spec_frag_mp4].
### Getting bytes in a MSE buffer with fragmented mp4
First, let's produce a smaller file to work with for this example.
```
ffmpeg -t 13s -i bbb_h264_aac.mp4 -c copy -f mp4 ./bbb_h264_aac_13s.mp4
```
13 seconds should be two 6s segments plus one partial segment, so it should be good for testing.
Next, let's fragment with a helper script `makefragmented.sh`:
```
#!/usr/bin/env bash
ffmpeg -i "$1" -c copy -movflags 'frag_keyframe+empty_moov+default_base_moof' -f mp4 "$2"
```
* `frag_keyframe` should only create fragments on our key frames, which were already established every 3s.
* `empty_moov`: if not included, some `mdat` data gets written alongside the `moov` box, which is not a valid init segment according to the spec.
* `default_base_moof`: this is recommended by MDN for Chrome. With or without this option the root level mp4 boxes look the same. The ffmpeg docs say:
> this flag avoids writing the absolute base_data_offset field in tfhd atoms, but does so by using the new default-base-is-moof flag instead. This flag is new from 14496-12:2012. This may make the fragments easier to parse in certain circumstances (avoiding basing track fragment location calculations on the implicit end of the previous track fragment).
So now make the fragmented mp4:
```shell
./makefragmented.sh ./test-videos/bbb_h264_aac_13s.mp4 ./test-videos/bbb_h264_aac_13s_frag.mp4
```
Now let's look at the boxes again, using a handy script: `ffprobe-trace-boxes.sh`
```
#!/usr/bin/env bash
echo "box_type,box_parent,offset,size"
ffprobe -v trace "$1" 2>&1 | grep 'type:.*parent:.*sz:' | sed "s/^.*type://; s/'//g" | awk '{print $1 "," $2 "," $5 "," $4}'
```
And just the top level boxes:
```
./ffprobe-trace-boxes.sh ./test-videos/bbb_h264_aac_13s_frag.mp4 | grep parent:root
ftyp,parent:root,8,28
moov,parent:root,36,1282
moof,parent:root,1318,1860
mdat,parent:root,3178,461608
moof,parent:root,464786,1320
mdat,parent:root,466106,771527
moof,parent:root,1237633,1316
mdat,parent:root,1238949,1052753
moof,parent:root,2291702,1320
mdat,parent:root,2293022,1382411
moof,parent:root,3675433,796
mdat,parent:root,3676229,602490
mfra,parent:root,4278719,262
```
So that looks like it hits the ISO BMFF byte stream spec:
* `ftyp` + `moov` make up the init sequence
* `moof` + `mdat` make up each segment/fragment
* `mfra` not sure about this yet, but ignoring for now.
And it roughly matches our expected `13s duration / 3s key frame interval = 4.3` segments, which means there should be 5 total `moof` boxes.
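That back-of-the-envelope count, as a sketch:

```python
import math

duration_s = 13  # clip length from the -t 13s cut above
gop_s = 3        # key frame interval: keyint=72 at 24 fps

# a partial trailing GOP still gets its own fragment, hence ceil
print(math.ceil(duration_s / gop_s))  # 5
```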
That box csv will be our first ever manifest format, dubbed the "jank csv manifest".
The goal now is fetching each of these byte ranges and adding them to a MediaSource buffer.
I have created a new [`mse.html`][mse.html] file and will explain the important points in comments.
NOTE: this is _not_ proper MSE buffer state handling. It is the MVP of just shoving bytes into the buffer.
```js
// The RFC 6381 codec string of the video
const mimeCodec = 'video/mp4; codecs="avc1.64001F, mp4a.40.2"';

// This is all boilerplate buffer setup
// (`video` and `form` are elements looked up elsewhere in mse.html)
let sourceBuffer = null;
let mediaSource = null;
if ("MediaSource" in window && MediaSource.isTypeSupported(mimeCodec)) {
    mediaSource = new MediaSource();
    video.src = URL.createObjectURL(mediaSource);
    mediaSource.addEventListener("sourceopen", () => {
        sourceBuffer = mediaSource.addSourceBuffer(mimeCodec);
    });
} else {
    console.error("Unsupported MIME type or codec: ", mimeCodec);
}

// parse a line from the .jank file -> Range: bytes=X-Y string
function jankToByteRange(jank_csv) {
    const parts = jank_csv.split(',');
    const beg = parseInt(parts[2], 10);
    const sz = parseInt(parts[3], 10);
    // HTTP Range ends are inclusive, hence the -1
    return "bytes=" + beg + '-' + (beg + sz - 1);
}

// fetch a byte range from a url
function fetchAB(url, jank_csv, cb) {
    const xhr = new XMLHttpRequest();
    const byte_range = jankToByteRange(jank_csv);
    console.log(url, byte_range, jank_csv);
    xhr.open("get", url);
    xhr.setRequestHeader("Range", byte_range);
    xhr.responseType = "arraybuffer";
    xhr.onload = () => {
        cb(xhr.response);
    };
    xhr.send();
}

// on form submit:
// * grab the video url and csv data.
// * fetch the first entry from the jank csv.
// * when the buffer finishes updating itself, fetch the next line until no more lines exist.
form.addEventListener('submit', (e) => {
    e.preventDefault();
    const data = new FormData(e.target, e.submitter);
    const vid_url = data.get('vid_url');
    const jank_csv = data.get('jank_csv');
    const lines = jank_csv.split("\n");
    const first = lines.shift();
    fetchAB(vid_url, first, (buf) => {
        sourceBuffer.appendBuffer(buf);
    });
    // NOTE: registered per submit; fine for this MVP, but repeated
    // submits would stack listeners.
    sourceBuffer.addEventListener("updateend", () => {
        if (lines.length === 0) {
            console.log('end of lines', mediaSource.readyState); // ended
            return;
        }
        const next = lines.shift();
        if (!next) {
            return;
        }
        fetchAB(vid_url, next, (buf) => {
            sourceBuffer.appendBuffer(buf);
        });
    });
});
```
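The byte-range math in `jankToByteRange` is worth double-checking, since HTTP `Range` ends are inclusive. A Python restatement of the same formula (hypothetical helper name):

```python
def jank_to_byte_range(row):
    """Same math as the JS jankToByteRange: inclusive HTTP Range header."""
    parts = row.split(",")
    beg, sz = int(parts[2]), int(parts[3])
    return "bytes={}-{}".format(beg, beg + sz - 1)

print(jank_to_byte_range("moof,parent:root,1318,1860"))  # bytes=1318-3177
```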
Now start up caddy again (`make filesrv`)
And use this for the form inputs:
* Video file url: <http://localhost:2015/bbb_h264_aac_13s_frag.mp4>
* Jank csv from above with one change: combine the ftyp and moov ranges into a single init segment to append first.
```
ftyp+moov,parent:root,0,1318
moof,parent:root,1318,1860
mdat,parent:root,3178,461608
moof,parent:root,464786,1320
mdat,parent:root,466106,771527
moof,parent:root,1237633,1316
mdat,parent:root,1238949,1052753
moof,parent:root,2291702,1320
mdat,parent:root,2293022,1382411
moof,parent:root,3675433,796
mdat,parent:root,3676229,602490
mfra,parent:root,4278719,262
```
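That hand edit, plus a sanity check that the edited ranges still tile the file with no gaps, can be sketched with hypothetical helpers. It relies on the trace offsets pointing 8 bytes past each box header, so one row's `offset + size` lands exactly on the next row's `offset` (which the numbers above bear out: `36 + 1282 = 1318`, `1318 + 1860 = 3178`, ...):

```python
def merge_init(rows):
    """Merge the ftyp and moov rows into a single init-segment row.

    rows are "type,parent,offset,size" lines from ffprobe-trace-boxes.sh.
    The trace offset is where each box's *content* starts (8 bytes past
    the header), so the merged init row starts at file offset 0 and its
    size is the moov row's offset + size.
    """
    moov = rows[1].split(",")
    init_size = int(moov[2]) + int(moov[3])
    return ["ftyp+moov,parent:root,0,{}".format(init_size)] + rows[2:]

def ranges_contiguous(rows):
    """True if each entry starts exactly where the previous one ended."""
    end = 0
    for row in rows:
        parts = row.split(",")
        if int(parts[2]) != end:
            return False
        end = int(parts[2]) + int(parts[3])
    return True

rows = [
    "ftyp,parent:root,8,28",
    "moov,parent:root,36,1282",
    "moof,parent:root,1318,1860",
    "mdat,parent:root,3178,461608",
]
jank = merge_init(rows)
print(jank[0])                  # ftyp+moov,parent:root,0,1318
print(ranges_contiguous(jank))  # True
```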
Submit and press play.
You should be able to play the first ~13 seconds of the video!
![first play!](/img/mse_first_play.png)
[pic_types]: https://en.wikipedia.org/wiki/Video_compression_picture_types
[apple_hls_seg]: https://developer.apple.com/documentation/http-live-streaming/hls-authoring-specification-for-apple-devices#Media-Segmentation
[caddy_files]: https://caddyserver.com/docs/quick-starts/static-files#command-line
[test_index]: https://git.sr.ht/~cfebs/vidpkg/tree/main/item/test-videos/index.html
[ffmpeg_write_codec_attr]: https://git.ffmpeg.org/gitweb/ffmpeg.git/blob/d45e20c37b1144d9c4ff08732a94fee0786dc0b5:/libavformat/hlsenc.c#l345
[mse.html]: https://git.sr.ht/~cfebs/vidpkg/tree/main/item/test-videos/mse.html
[hls_spec]: https://datatracker.ietf.org/doc/html/rfc8216
[hls_spec_frag_mp4]: https://datatracker.ietf.org/doc/html/rfc8216#section-3.3