package_my_video: working!
```python
# from run_ffmpeg_transcode(infname, outfname, probeinfo, segment_length=3)
        'ffmpeg',
        '-i',
        infname,
        # only keep the first video stream and first audio stream
        '-map',
        '0:v:0',
        '-map',
        '0:a:0',
        '-vcodec',
        'libx264',
        '-x264opts',
```
And if we run it, let's take a look at the cmds it executes:

```
❯ ./progressive.py ./test-videos/big_buck_bunny_720p_h264.mov ./test-videos/bbb_h264_aac.mp4
INFO:root:running cmd ffprobe -v quiet -print_format json -show_format -show_streams ./test-videos/big_buck_bunny_720p_h264.mov
INFO:root:got fps 24.0
INFO:root:running cmd ffmpeg -i ./test-videos/big_buck_bunny_720p_h264.mov -map 0:v:0 -map 0:a:0 -vcodec libx264 -x264opts keyint=72:min-keyint=72:no-scenecut -movflags faststart -acodec aac ./test-videos/bbb_h264_aac.mp4
```

Now regenerate the frame dump and check if our I frames match the expected: 1, 73, 145, 217 ...
Drop [this `index.html`][test_index] into the same directory as your test videos.

```shell
❯ caddy file-server --access-log --browse --listen :2015 --root ./test-videos
```

Will stash that in a `Makefile` helper:

```
.PHONY: filesrv
filesrv:
	caddy file-server --access-log --browse --listen :2015 --root ./test-videos
```

I kept my version of the mp4 prior to adding the `faststart` option, so I have two files:
But this time firefox transfers ~7-9 MB (it changes per test), and there's only …

Best guess here is that firefox is still trying to read 1.5MB, but it encounters the `moov` immediately and just keeps reading.

That's the first time I've seen this in action.

With the progressive file in a good place, it's now time to turn to segmenting. In the browser we need `Media Source Extensions` for this.
## MediaSource

### RFC 6381 codecs and `MediaSource.isTypeSupported`

One of the first weird hurdles is checking if our particular codecs are supported:

```js
MediaSource.isTypeSupported('video/mp4; codecs="avc1.64001f, mp4a.40.2"');
```
This string is in the format specified by [RFC 6381](https://datatracker.ietf.org/doc/html/rfc6381).
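For an H.264 track the `avc1.PPCCLL` part is just three hex bytes: profile, constraint flags, and level. Here is a small sketch that decodes it (`parse_avc1` is my own illustrative helper, not something from the post or a library):

```python
# Decode an RFC 6381 "avc1.PPCCLL" codec string into its H.264 fields.
# parse_avc1 is a hypothetical helper for illustration only.
def parse_avc1(codec):
    name, hexpart = codec.split('.')
    assert name == 'avc1' and len(hexpart) == 6
    profile_idc = int(hexpart[0:2], 16)       # e.g. 0x64 = 100 = High profile
    constraint_flags = int(hexpart[2:4], 16)  # e.g. 0x00 = none set
    level_idc = int(hexpart[4:6], 16)         # e.g. 0x1F = 31, i.e. level 3.1
    return profile_idc, constraint_flags, level_idc

print(parse_avc1('avc1.64001F'))  # (100, 0, 31)
```

So our file is High profile, level 3.1.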
Strangely, there is no easy way to get this information from `ffprobe`. For reference, here is a 2017 feature request to add it: <https://web.archive.org/web/20240406102137/https://trac.ffmpeg.org/ticket/6617>

As noted by a comment in the ticket, there is actually support [in the codebase for writing the string][ffmpeg_write_codec_attr] in what looks like the hls segmenter.

Instead of trying to hack something up there, an alternative is to use `MP4Box` from <https://github.com/gpac/gpac>:

```shell
❯ MP4Box -info ./test-videos/bbb_h264_aac.mp4 2>&1 | grep 'RFC6381' | awk -F':\\s*' '{print $2}'
avc1.64001F
mp4a.40.2
```

Then just build the string `video/mp4; codecs="{0}, {1}"` from that output.
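That assembly step can be sketched in python; the input lines are assumed to be the output of the `MP4Box | grep | awk` pipeline above, and `build_mime_codec` is my own name for the helper:

```python
# Build the MediaSource.isTypeSupported() string from the RFC6381
# codec lines printed by the MP4Box pipeline.
def build_mime_codec(mp4box_lines):
    codecs = [line.strip() for line in mp4box_lines if line.strip()]
    return 'video/mp4; codecs="{0}"'.format(', '.join(codecs))

print(build_mime_codec(['avc1.64001F', 'mp4a.40.2']))
# video/mp4; codecs="avc1.64001F, mp4a.40.2"
```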
Now in the browser let's check:

```js
> MediaSource.isTypeSupported('video/mp4; codecs="avc1.64001F, mp4a.40.2"');
true
```

### How does MediaSource work? What is actually playable?
MediaSource is all about appending bytes to buffers that match the expected codecs.

When you append a buffer of bytes into a MediaSource buffer, it must be a valid Byte Stream Format: <https://www.w3.org/TR/media-source-2/#byte-stream-formats>

Here are the types of valid stream formats: <https://www.w3.org/TR/mse-byte-stream-format-registry/#registry>

* ISOBMFF: <https://www.w3.org/TR/mse-byte-stream-format-isobmff/>
* MPEG-2 Transport Stream: <https://www.w3.org/TR/mse-byte-stream-format-mp2t/>
### MP4 byte stream

The first segment should be an "initialization segment":

> An ISO BMFF initialization segment is defined in this specification as a single File Type Box (ftyp) followed by a single Movie Box (moov).

Then the actual media:

> An ISO BMFF media segment is defined in this specification as one optional Segment Type Box (styp) followed by a single Movie Fragment Box (moof) followed by one or more Media Data Boxes (mdat). If the Segment Type Box is not present, the segment MUST conform to the brands listed in the File Type Box (ftyp) in the initialization segment.

Our progressive file does not yet conform to this spec. Its current layout is:

```
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x34423a80] type:'ftyp' parent:'root' sz: 32 8 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x34423a80] type:'moov' parent:'root' sz: 412246 40 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x34423a80] type:'free' parent:'root' sz: 8 412286 157677185
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x34423a80] type:'mdat' parent:'root' sz: 157264899 412294 157677185
```
The next step is to remux into the required format so that each segment contains a `moof` box followed by an `mdat` box, otherwise known as "fragmented mp4".
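The difference between the two layouts can be expressed over the top-level box types. This is a rough sketch of my own (real validation would have to inspect the box contents too), just to make the target concrete:

```python
# Rough check: a fragmented mp4 suitable for MSE starts with
# ftyp + moov (the init segment) and then carries moof/mdat pairs,
# whereas a progressive file has one big top-level mdat and no moof.
def looks_fragmented(box_types):
    if box_types[:2] != ['ftyp', 'moov']:
        return False
    return 'moof' in box_types

print(looks_fragmented(['ftyp', 'moov', 'free', 'mdat']))                 # our progressive file
print(looks_fragmented(['ftyp', 'moov', 'moof', 'mdat', 'moof', 'mdat']))
```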
### MPEG-2 Transport Stream

Skipping this for now because [fragmented MP4 is a valid media segment format according to the HLS spec][hls_spec_frag_mp4].
### Getting bytes in a MSE buffer with fragmented mp4

First, let's produce a smaller file to work with for this example.

```
❯ ffmpeg -t 13s -i bbb_h264_aac.mp4 -c copy -f mp4 ./bbb_h264_aac_13s.mp4
```

13 seconds should give us four full 3s segments plus one partial segment, so it should be good for testing.
Next let's fragment with a helper script `makefragmented.sh`:

```shell
#!/usr/bin/env bash
ffmpeg -i "$1" -c copy -movflags 'frag_keyframe+empty_moov+default_base_moof' -f mp4 "$2"
```
* `frag_keyframe`: only creates fragments on our key frames, which were already established every 3s.
* `empty_moov`: without this, some `mdat` data ends up alongside the initial `moov` box, which is not a valid initialization segment according to the spec.
* `default_base_moof`: this is recommended by MDN for Chrome. With or without this option the root level mp4 boxes look the same. The ffmpeg docs say:

> this flag avoids writing the absolute base_data_offset field in tfhd atoms, but does so by using the new default-base-is-moof flag instead. This flag is new from 14496-12:2012. This may make the fragments easier to parse in certain circumstances (avoiding basing track fragment location calculations on the implicit end of the previous track fragment).
So now make the fragmented mp4:

```shell
./makefragmented.sh ./test-videos/bbb_h264_aac_13s.mp4 ./test-videos/bbb_h264_aac_13s_frag.mp4
```
Now let's look at the boxes again, using a handy script `ffprobe-trace-boxes.sh`:

```shell
#!/usr/bin/env bash
echo "box_type,box_parent,offset,size"
ffprobe -v trace "$1" 2>&1 | grep 'type:.*parent:.*sz:' | sed "s/^.*type://; s/'//g" | awk '{print $1 "," $2 "," $5 "," $4}'
```
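The same extraction can be sketched in python, assuming the `ffprobe -v trace` line format shown earlier (note the two numbers after `sz:` are size then offset, which the script prints in offset,size order):

```python
import re

# Pull (box_type, parent, offset, size) out of one ffprobe trace line,
# mirroring the grep/sed/awk pipeline above.
def parse_trace_line(line):
    m = re.search(r"type:'(\w+)' parent:'(\w+)' sz:\s+(\d+)\s+(\d+)", line)
    if not m:
        return None
    box, parent = m.group(1), m.group(2)
    size, offset = int(m.group(3)), int(m.group(4))
    return "{0},parent:{1},{2},{3}".format(box, parent, offset, size)

line = "[mov,mp4,m4a,3gp,3g2,mj2 @ 0x34423a80] type:'ftyp' parent:'root' sz: 32 8 157677185"
print(parse_trace_line(line))  # ftyp,parent:root,8,32
```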
And just the top level boxes:

```
❯ ./ffprobe-trace-boxes.sh ./test-videos/bbb_h264_aac_13s_frag.mp4 | grep parent:root
ftyp,parent:root,8,28
moov,parent:root,36,1282
moof,parent:root,1318,1860
mdat,parent:root,3178,461608
moof,parent:root,464786,1320
mdat,parent:root,466106,771527
moof,parent:root,1237633,1316
mdat,parent:root,1238949,1052753
moof,parent:root,2291702,1320
mdat,parent:root,2293022,1382411
moof,parent:root,3675433,796
mdat,parent:root,3676229,602490
mfra,parent:root,4278719,262
```
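A nice property of this listing: the top-level boxes are contiguous, so each row's offset equals the previous row's offset plus its size. A quick sketch to verify that, plus the expected `ceil(13/3) = 5` fragment count, using the rows above:

```python
import math

rows = [
    ("ftyp", 8, 28), ("moov", 36, 1282),
    ("moof", 1318, 1860), ("mdat", 3178, 461608),
    ("moof", 464786, 1320), ("mdat", 466106, 771527),
    ("moof", 1237633, 1316), ("mdat", 1238949, 1052753),
    ("moof", 2291702, 1320), ("mdat", 2293022, 1382411),
    ("moof", 3675433, 796), ("mdat", 3676229, 602490),
    ("mfra", 4278719, 262),
]

# each box starts where the previous one ended
for prev, cur in zip(rows, rows[1:]):
    assert prev[1] + prev[2] == cur[1], (prev, cur)

# 13s of video with a key frame every 3s -> ceil(13/3) = 5 fragments
n_moof = sum(1 for r in rows if r[0] == "moof")
assert n_moof == math.ceil(13 / 3)
print("contiguous, moof count =", n_moof)  # contiguous, moof count = 5
```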
So that looks like it hits the ISO BMFF byte stream spec:

* `ftyp` + `moov` make up the init segment
* `moof` + `mdat` make up each segment/fragment
* `mfra`: not sure about this yet, but ignoring for now.

And it roughly matches our expectation: `13s duration / 3s key frame = 4.3` segments, which means there should be 5 total `moof` boxes.
This will be our first ever manifest format, dubbed the "jank csv manifest".

The goal now is fetching each of these byte ranges and adding them to a MediaSource buffer.

I have created a new [`mse.html`][mse.html] file and will explain the important points in comments.

NOTE: this is _not_ proper MSE buffer state handling. It is the MVP of just shoving bytes into the buffer.
```js
// The RFC 6381 codec string of the video
const mimeCodec = 'video/mp4; codecs="avc1.64001F, mp4a.40.2"';

// This is all boilerplate buffer setup
let sourceBuffer = null;
let mediaSource = null;
if ("MediaSource" in window && MediaSource.isTypeSupported(mimeCodec)) {
  mediaSource = new MediaSource();
  video.src = URL.createObjectURL(mediaSource);
  mediaSource.addEventListener("sourceopen", () => {
    sourceBuffer = mediaSource.addSourceBuffer(mimeCodec);
  });
} else {
  console.error("Unsupported MIME type or codec: ", mimeCodec);
}

// parse a line from the .jank file -> Range: bytes=X-Y string
function jankToByteRange(jank_csv) {
  const parts = jank_csv.split(',');
  const beg = parseInt(parts[2], 10);
  const sz = parseInt(parts[3], 10);
  return "bytes=" + beg + '-' + (beg + sz - 1);
}

// fetch a byte range from a url
function fetchAB(url, jank_csv, cb) {
  const xhr = new XMLHttpRequest();
  const byte_range = jankToByteRange(jank_csv);
  console.log(url, byte_range, jank_csv);
  xhr.open("get", url);
  xhr.setRequestHeader("Range", byte_range);
  xhr.responseType = "arraybuffer";
  xhr.onload = () => {
    cb(xhr.response);
  };
  xhr.send();
}

// on form submit:
// * grab the video url and csv data.
// * fetch the first entry from the jank csv.
// * when the buffer finishes updating itself, fetch the next line until no more lines exist.
form.addEventListener('submit', (e) => {
  e.preventDefault();
  let data = new FormData(e.target, e.submitter);
  let vid_url = data.get('vid_url');
  let jank_csv = data.get('jank_csv');

  let lines = jank_csv.split("\n");
  let first = lines.shift();

  fetchAB(vid_url, first, (buf) => {
    sourceBuffer.appendBuffer(buf);
  });

  sourceBuffer.addEventListener("updateend", () => {
    if (lines.length === 0) {
      console.log('end of lines', mediaSource.readyState); // ended
      return;
    }
    let next = lines.shift();
    if (!next) {
      return;
    }
    fetchAB(vid_url, next, (buf) => {
      sourceBuffer.appendBuffer(buf);
    });
  });
});
```
Now start up caddy again (`make filesrv`).

And use this for the form inputs:

* Video file url: <http://localhost:2015/bbb_h264_aac_13s_frag.mp4>
* Jank csv from above with one change: combine the ftyp and moov ranges into a single init segment to append first.
```
ftyp+moov,parent:root,0,1318
moof,parent:root,1318,1860
mdat,parent:root,3178,461608
moof,parent:root,464786,1320
mdat,parent:root,466106,771527
moof,parent:root,1237633,1316
mdat,parent:root,1238949,1052753
moof,parent:root,2291702,1320
mdat,parent:root,2293022,1382411
moof,parent:root,3675433,796
mdat,parent:root,3676229,602490
mfra,parent:root,4278719,262
```
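That combined first row can be derived from the original csv: the init segment spans from byte 0 through the end of `moov`, i.e. the `moov` offset plus its size. A sketch of the merge (`merge_init_rows` is my own illustrative helper):

```python
# Merge the ftyp and moov rows into one init-segment row spanning
# from the start of the file through the end of moov.
def merge_init_rows(ftyp_row, moov_row):
    _, parent, _, _ = ftyp_row.split(',')
    _, _, moov_off, moov_sz = moov_row.split(',')
    init_size = int(moov_off) + int(moov_sz)
    return "ftyp+moov,{0},0,{1}".format(parent, init_size)

print(merge_init_rows("ftyp,parent:root,8,28", "moov,parent:root,36,1282"))
# ftyp+moov,parent:root,0,1318
```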
Submit and press play.

You should be able to play the first ~13 seconds of the video!

![first play!](/img/mse_first_play.png)
[pic_types]: https://en.wikipedia.org/wiki/Video_compression_picture_types
[apple_hls_seg]: https://developer.apple.com/documentation/http-live-streaming/hls-authoring-specification-for-apple-devices#Media-Segmentation
[caddy_files]: https://caddyserver.com/docs/quick-starts/static-files#command-line
[test_index]: https://git.sr.ht/~cfebs/vidpkg/tree/main/item/test-videos/index.html
[ffmpeg_write_codec_attr]: https://git.ffmpeg.org/gitweb/ffmpeg.git/blob/d45e20c37b1144d9c4ff08732a94fee0786dc0b5:/libavformat/hlsenc.c#l345
[mse.html]: https://git.sr.ht/~cfebs/vidpkg/tree/main/item/test-videos/mse.html
[hls_spec]: https://datatracker.ietf.org/doc/html/rfc8216
[hls_spec_frag_mp4]: https://datatracker.ietf.org/doc/html/rfc8216#section-3.3