Let's say you want to embed some videos on your website, and you
want to put them in a list so people can click through and watch
them.
The title, description, and thumbnail you give to the video are
largely subjective decisions, but the duration - in minutes and
seconds - is an objective property of the media itself, which
means you should be able to extract it if you know the video's
URL. But how exactly do you do that?
Well, the first thing you'll need to do is figure out what
exactly you're dealing with: a raw video file (or stream) that
plays in the browser's <video> tag, or a link to a page
on a site like YouTube or Vimeo that hosts embeddable content.
Technologically speaking, these are entirely different beasts. A
YouTube page gives you the code for a player, and wraps all of it
up with copy protection and a variety of other features specific
to the platform. In other words, it's not handing the end user
something to play; it's playing it for them. It's kind of like the difference
between having a record of a song, and having a band come over with their
own instruments to play it.
That doesn't mean they can't serve the same purpose
for the end user of your site, though, and in both cases it should
be possible to programmatically determine the duration of the
media. I've written a .NET library (ISchemm.DurationFinder)
that handles this for you for a variety of common video types with
just a URL; I'll walk through how it works overall, and how it
finds the duration for each type of media that it supports.
(If your program isn't written in .NET, but you
really want to use DurationFinder, consider setting up an Azure
account, writing a small C# function app in Visual Studio or VS
Code, and deploying it to a consumption plan! I've done the
reverse more than once, using Azure Functions to wrap npm
libraries for use in a .NET application.)
Using the library
DurationFinder's interface is pretty simple:
IDurationProvider provider = Providers.All;
TimeSpan? duration = await provider.GetDurationAsync(uri);
bool found = (duration != null);
Or if you just want to find the duration of an MP4 file on your
filesystem:
IDurationProvider provider = new MP4DurationProvider();
using var fs = new FileStream("file.mp4", FileMode.Open, FileAccess.Read);
using var dataSource = new StreamDataSource(fs);
TimeSpan? duration = await provider.GetDurationAsync(dataSource);
bool found = (duration != null);
DurationFinder comes with six main providers: three for
video-sharing sites like YouTube, and three for actual media
formats like MP4. If given a URL (or an HTTP response), it can
also follow certain types of links when necessary. The built-in
providers are:
- SchemaOrgDurationProvider: A number of conventions exist for including machine-readable data in web pages, spurred on in part by the use of social media to share links. The Schema.org project uses a meta tag with an itemprop of "duration", and with the duration itself ("content=" on the tag) in ISO 8601 format. YouTube makes these tags available on its videos, and SoundCloud also uses this format for individual tracks.
  - DurationFinder will return null if it sees a meta tag with
    an itemprop of "isLiveBroadcast" and a content of "true"
    (this is used on YouTube live streams). Interestingly, the spec seems
    to call for a content of "https://schema.org/True", which I
    hadn't seen until writing this!
- OpenGraphDurationProvider: Facebook's OpenGraph protocol specifies a meta tag with a property of "video:duration" and with the duration specified in seconds. This is how Dailymotion exposes the duration of a video.
  - Twitch VODs sometimes include this tag, but not always; it seems to depend on whether the Twitch servers have the data cached, because loading the page again will usually get it to show up. Twitch also uses a property of og:video:duration, which doesn't seem to match the spec, but DurationFinder will accept either one.
- OEmbedDurationProvider: A good number of sites support oEmbed. The oEmbed format doesn't provide a mechanism to indicate the duration of a video, but it does allow sites to include additional parameters, and Vimeo's oEmbed response includes a "duration" property which gives the duration of the video in seconds.
  - Because the oEmbed response requires an additional request to a URL specified in a link tag, DurationFinder uses an additional provider class (JsonDurationProvider) to handle this second request.
- HlsDurationProvider: Although the HLS streaming format introduced by Apple is usually used for live content, it can serve VOD as well. In this situation, the chunklist - which contains references to each media "chunk" - will contain the line #EXT-X-ENDLIST, which tells the player that the video has a finite end point. DurationFinder will add up the duration of each chunk (indicated by the #EXTINF: header above each chunk's relative URL), and return the total duration if it finds that end tag (for live streams, it will return null).
  - Because the main HLS URL points to a playlist, which itself lists the relative URLs of the chunklists, an additional provider (ChunklistDurationProvider) is used to handle the actual chunklist request.
- MP4DurationProvider: DurationFinder processes MP4 files from beginning to end, loading just the header of each atom within the file (instead of the entire file); each atom's header contains enough information to determine where the next atom starts. Once the moov atom is found, the time scale and duration information in the atom is used to determine the duration of the file. (This provider may also work for the very similar QuickTime container format.)
- VorbisDurationProvider: For Ogg files, DurationFinder
will download the end of the file, if the length is known (this
is the only provider to use IDataSource.ContentLength),
and look for the last Ogg page header (by searching for the
string OggS; Ogg files never have a page size larger
than 64 kilobytes). To calculate the timestamp from the last
page header's granule position, DurationFinder needs to know the
audio sample rate, so the file must contain a Vorbis audio
stream. (Theora videos should work as long as the audio stream
is Vorbis, and not something like FLAC.)
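The three text-based cases in the list above (an ISO 8601 duration, a plain number of seconds, and an HLS chunklist) are small enough to sketch in a few lines. This is only an illustration of the parsing described, not DurationFinder's actual code: the class and method names are made up for this example, and it assumes the tag content or chunklist text has already been extracted.

```csharp
#nullable enable
using System;
using System.Globalization;
using System.IO;
using System.Xml;

// Hypothetical helpers for illustration; these names are not part of DurationFinder.
public static class DurationParsingSketch {
    // Schema.org style: an ISO 8601 duration such as "PT4M13S".
    public static TimeSpan? FromIso8601(string content) {
        try {
            return XmlConvert.ToTimeSpan(content);
        } catch (FormatException) {
            return null; // not a valid duration string
        }
    }

    // OpenGraph style (and Vimeo's oEmbed "duration" property): a number of seconds.
    public static TimeSpan? FromSeconds(string content) {
        return double.TryParse(content, NumberStyles.Float, CultureInfo.InvariantCulture, out double seconds)
            ? TimeSpan.FromSeconds(seconds)
            : (TimeSpan?)null;
    }

    // HLS chunklist: add up every #EXTINF value, but only report a total
    // if #EXT-X-ENDLIST is present (otherwise it may be a live stream).
    public static TimeSpan? FromChunklist(string chunklist) {
        double total = 0;
        bool ended = false;
        using var reader = new StringReader(chunklist);
        string? line;
        while ((line = reader.ReadLine()) != null) {
            line = line.Trim();
            if (line == "#EXT-X-ENDLIST") {
                ended = true;
            } else if (line.StartsWith("#EXTINF:", StringComparison.Ordinal)) {
                // The value looks like "#EXTINF:9.009,optional title"
                string value = line.Substring("#EXTINF:".Length).Split(',')[0];
                if (double.TryParse(value, NumberStyles.Float, CultureInfo.InvariantCulture, out double seconds))
                    total += seconds;
            }
        }
        return ended ? TimeSpan.FromSeconds(total) : (TimeSpan?)null;
    }
}
```

For example, `DurationParsingSketch.FromIso8601("PT4M13S")` gives a TimeSpan of 4 minutes and 13 seconds, and a chunklist without the end tag gives null.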
How it works
Everything revolves around IDurationProvider. You could
theoretically make your own duration provider, too - the interface
looks like this:
public interface IDurationProvider {
    Task<TimeSpan?> GetDurationAsync(IDataSource dataSource);
}
And IDataSource looks like this:
// Represents data, which could be a web page or a media file, and could exist locally or on a web server.
public interface IDataSource {
    // The total length of the data, if known. Used if the provider needs to seek to a fixed offset from the end of the file.
    long? ContentLength { get; }

    // Whether this data might belong to one of the given mimetypes. If the type is unknown, return true so the provider can continue.
    bool MatchesType(params string[] types);

    // Copies the data into a byte array.
    Task<byte[]> ReadAsync();

    // Copies part of the data into a byte array. If the range is out of bounds, return null so the provider knows to give up.
    Task<byte[]?> GetRangeAsync(long from, long to);

    // Tries to convert a relative URL found in the data to an absolute URL so the link can be followed.
    bool TryCreateRelativeUri(string uriString, out Uri result);
}
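To make the two interfaces a bit more concrete, here's a rough sketch of what a custom provider could look like. Everything in it is hypothetical: WavDurationProvider and InMemoryDataSource are names invented for this example (the library doesn't ship them), the interfaces are copied from the listings above so the snippet compiles by itself, and the WAV parsing only understands the simplest 44-byte PCM header. Since the exact meaning of the "to" argument of GetRangeAsync isn't spelled out here, the in-memory source assumes it is exclusive.

```csharp
#nullable enable
using System;
using System.Buffers.Binary;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

// Copied from the listings above; in a real project these come from the library.
public interface IDataSource {
    long? ContentLength { get; }
    bool MatchesType(params string[] types);
    Task<byte[]> ReadAsync();
    Task<byte[]?> GetRangeAsync(long from, long to);
    bool TryCreateRelativeUri(string uriString, out Uri result);
}

public interface IDurationProvider {
    Task<TimeSpan?> GetDurationAsync(IDataSource dataSource);
}

// Hypothetical data source backed by a byte array (handy for tests).
public sealed class InMemoryDataSource : IDataSource {
    private readonly byte[] _data;
    private readonly string? _mimeType;

    public InMemoryDataSource(byte[] data, string? mimeType = null) {
        _data = data;
        _mimeType = mimeType;
    }

    public long? ContentLength => _data.Length;

    // Unknown type: return true so the provider can continue.
    public bool MatchesType(params string[] types) =>
        _mimeType == null || types.Contains(_mimeType, StringComparer.OrdinalIgnoreCase);

    public Task<byte[]> ReadAsync() => Task.FromResult((byte[])_data.Clone());

    // Assumption: "from" is inclusive and "to" is exclusive.
    public Task<byte[]?> GetRangeAsync(long from, long to) {
        if (from < 0 || to > _data.Length || from > to)
            return Task.FromResult<byte[]?>(null);
        var range = new byte[to - from];
        Array.Copy(_data, from, range, 0, range.Length);
        return Task.FromResult<byte[]?>(range);
    }

    // There is no base URL for in-memory data, so links can't be followed.
    public bool TryCreateRelativeUri(string uriString, out Uri result) {
        result = null!;
        return false;
    }
}

// Hypothetical provider: canonical 44-byte PCM WAV header only.
public sealed class WavDurationProvider : IDurationProvider {
    public async Task<TimeSpan?> GetDurationAsync(IDataSource dataSource) {
        if (!dataSource.MatchesType("audio/wav", "audio/x-wav"))
            return null;
        // Reads everything for simplicity; a real provider would ask for a range.
        return ParseHeader(await dataSource.ReadAsync());
    }

    private static TimeSpan? ParseHeader(byte[] data) {
        if (data.Length < 44) return null;
        if (Encoding.ASCII.GetString(data, 0, 4) != "RIFF") return null;
        if (Encoding.ASCII.GetString(data, 8, 4) != "WAVE") return null;
        if (Encoding.ASCII.GetString(data, 36, 4) != "data") return null;
        uint byteRate = BinaryPrimitives.ReadUInt32LittleEndian(data.AsSpan(28, 4));
        uint dataSize = BinaryPrimitives.ReadUInt32LittleEndian(data.AsSpan(40, 4));
        if (byteRate == 0) return null;
        return TimeSpan.FromSeconds((double)dataSize / byteRate);
    }
}
```

The pattern is the part that matters: check the type, read as little as you can get away with, and return null whenever the data isn't something you recognize.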
The library provides two types of data sources:
- StreamDataSource can work for media files on your
local filesystem (via FileStream), but it requires a
seekable Stream object, so it won't work well for HTTP
requests.
- RemoteDataSource is helpful for remote files and pages. Given
  a URL, DurationFinder will:
  - make a request to the URL (but don't immediately read the
    response body)
  - wrap the response in a RemoteDataSource
  - pass the RemoteDataSource to the provider to try to
    find the duration
  - if the duration cannot be found, AND the URL points to a web
    page (not a raw media file), follow any <link
    rel="canonical" href="..."> tags found in the HTML
    document, and perform the same steps on those URLs until it runs
    out of URLs to try
That last step is really useful if the URL isn't quite the
"normal" one; for example, a YouTube "embed" page won't contain
the duration, but it will contain a link back to the "canonical"
YouTube URL.
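The link-following loop can be sketched roughly like this. It's an approximation under a few assumptions, not the library's implementation: the names are invented, the tryPage callback stands in for "make the request, wrap it, and ask the provider" (returning the page's HTML only when the response is a web page), and the regex-based tag matching is a shortcut that a real HTML parser would do better (it doesn't decode entities like &amp;amp;, for instance).

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// Hypothetical sketch; not DurationFinder's actual code.
public static class CanonicalLinkSketch {
    private static readonly Regex LinkTag = new Regex(@"<link\b[^>]*>", RegexOptions.IgnoreCase);
    private static readonly Regex Attribute = new Regex(@"([\w-]+)\s*=\s*(?:""([^""]*)""|'([^']*)')");

    // Finds <link rel="canonical" href="..."> tags and resolves each href against the page's URL.
    public static IEnumerable<Uri> FindCanonicalLinks(string html, Uri pageUri) {
        foreach (Match tag in LinkTag.Matches(html)) {
            string? rel = null, href = null;
            foreach (Match attr in Attribute.Matches(tag.Value)) {
                string name = attr.Groups[1].Value.ToLowerInvariant();
                string value = attr.Groups[2].Success ? attr.Groups[2].Value : attr.Groups[3].Value;
                if (name == "rel") rel = value;
                else if (name == "href") href = value;
            }
            if (string.Equals(rel, "canonical", StringComparison.OrdinalIgnoreCase)
                && href != null
                && Uri.TryCreate(pageUri, href, out Uri? absolute)) {
                yield return absolute!;
            }
        }
    }

    // Tries the starting URL, then any canonical links, until a duration
    // is found or there are no URLs left. Visited URLs are skipped so a
    // page that names itself as canonical can't cause a loop.
    public static async Task<TimeSpan?> FindDurationAsync(
        Uri start,
        Func<Uri, Task<(TimeSpan? Duration, string? Html)>> tryPage)
    {
        var visited = new HashSet<Uri>();
        var pending = new Queue<Uri>();
        pending.Enqueue(start);
        while (pending.Count > 0) {
            Uri current = pending.Dequeue();
            if (!visited.Add(current)) continue;

            var (duration, html) = await tryPage(current);
            if (duration != null) return duration;
            if (html == null) continue; // raw media file: no links to follow

            foreach (Uri next in FindCanonicalLinks(html, current))
                pending.Enqueue(next);
        }
        return null;
    }
}
```

Keeping the HTTP request behind a callback makes the loop easy to try out with canned pages before pointing it at a real site.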
The DurationFinder library is under an MIT-style
no-attribution license, so if the library as posted on NuGet
doesn't meet your needs, feel free to fork it, or just forklift
the code into your own project and remove the code you don't need!