isaacschemm posting in snailsharp

Let's say you want to embed some videos on your website, and you want to put them in a list so people can click through and watch them.

A set of three YouTube thumbnails. 1: "Summer Camp Island". 2: "Connecting a Bluetooth tape adapter to a Bluetooth adapter for tape players". 3: "Are Quantum Leap and Gilmore Girls CONNECTED?"

The title, description, and thumbnail you give to the video are largely subjective decisions, but the duration - in minutes and seconds - is an objective property of the media itself, which means you should be able to extract it if you know the video's URL. But how exactly do you do that?

Well, the first thing you'll need to do is figure out what exactly you're dealing with: a raw video file (or stream) that plays in the browser's <video> tag, or a link to a page on a site like YouTube or Vimeo that hosts embeddable content. Technologically speaking, these are entirely different beasts. A YouTube page gives you the code for a player, and wraps all of it up with copy protection and a variety of other features specific to the platform. In other words, it's not handing the end user something to play; it's playing it for them. It's kind of like the difference between having a record of a song and having a band come over with their own instruments to play it.

So I've got "Forever Your Girl" on CD, and also Paula Abdul is in my kitchen. Not sure why.

That doesn't mean they can't serve the same purpose for the end user of your site, though, and in both cases it should be possible to programmatically determine the duration of the media. I've written a .NET library (ISchemm.DurationFinder) that handles this for you for a variety of common video types with just a URL; I'll walk through how it works overall, and how it finds the duration for each type of media that it supports.

(If your program isn't written in .NET, but you really want to use DurationFinder, consider setting up an Azure account, writing a small C# function app in Visual Studio or VS Code, and deploying it to a consumption plan! I've done the reverse more than once, using Azure Functions to wrap npm libraries for use in a .NET application.)

Using the library

DurationFinder's interface is pretty simple:

IDurationProvider provider = Providers.All;
// GetDurationAsync returns a Task, so await it (uri is a System.Uri)
TimeSpan? duration = await provider.GetDurationAsync(uri);
bool found = (duration != null);

Or if you just want to find the duration of an MP4 file on your filesystem:

IDurationProvider provider = new MP4DurationProvider();
using var fs = new FileStream("file.mp4", FileMode.Open, FileAccess.Read);
using var dataSource = new StreamDataSource(fs);
// Pass the data source wrapper (not the raw FileStream) and await the result
TimeSpan? duration = await provider.GetDurationAsync(dataSource);
bool found = (duration != null);

DurationFinder comes with six main providers: three for video-sharing sites like YouTube, and three for actual media formats like MP4. If given a URL (or an HTTP response), it can also follow certain types of links when necessary. The built-in providers are:

  • SchemaOrgDurationProvider: A number of conventions exist for including machine-readable data in web pages, spurred on in part by the use of social media to share links. The Schema.org project uses a meta tag with an itemprop of "duration", with the duration itself in the tag's content attribute in ISO 8601 format (for example, PT4M13S for four minutes and thirteen seconds). YouTube makes these tags available on its videos, and SoundCloud also uses this format for individual tracks.
    • DurationFinder will return null if it sees a meta tag with an itemprop of "isLiveBroadcast" and a content of "true" (this is used on YouTube live streams). Interestingly, the spec seems to call for a content of "https://schema.org/True", which I hadn't seen until writing this!
  • OpenGraphDurationProvider: Facebook's OpenGraph protocol specifies a meta tag with a property of "video:duration" and with the duration specified in seconds. This is how Dailymotion exposes the duration of a video.
    • Twitch VODs sometimes include this tag, but not always; it seems to depend on whether the Twitch servers have the data cached, because loading the page again will usually get it to show up. Twitch also uses a property of og:video:duration, which doesn't seem to match the spec, but DurationFinder will accept either one.
  • OEmbedDurationProvider: A good number of sites support oEmbed. The oEmbed format doesn't provide a mechanism to indicate the duration of a video, but it does allow sites to include additional parameters, and Vimeo's oEmbed response includes a "duration" property which gives the duration of the video in seconds.
    • Because the oEmbed response requires an additional request to a URL specified in a link tag, DurationFinder uses an additional provider class (JsonDurationProvider) to handle this second request.
  • HlsDurationProvider: Although the HLS streaming format introduced by Apple is usually used for live content, it can serve VOD as well. In this situation, the chunklist - which contains references to each media "chunk" - will contain the line #EXT-X-ENDLIST, which tells the player that the video has a finite end point. DurationFinder will add up the duration of each chunk (indicated by the #EXTINF: header above each chunk's relative URL), and return the total duration if it finds that end tag (for live streams, it will return null).
    • Because the main HLS URL points to a playlist, which itself lists the relative URLs of the chunklists, an additional provider (ChunklistDurationProvider) is used to handle the actual chunklist request.
  • MP4DurationProvider: DurationFinder processes MP4 files from beginning to end, loading just the header of each atom within the file (instead of the entire file); each atom's header contains enough information to determine where the next atom starts. Once the moov atom is found, the time scale and duration information in that atom is used to determine the duration of the file. (This provider may also work for the very similar QuickTime container format.)
  • VorbisDurationProvider: For Ogg files, DurationFinder will download the end of the file, if the length is known (this is the only provider to use IDataSource.ContentLength), and look for the last Ogg page header (by searching for the string OggS; Ogg files never have a page size larger than 64 kilobytes). To calculate the timestamp from the last page header's granule position, DurationFinder needs to know the audio sample rate, so the file must contain a Vorbis audio stream. (Theora videos should work as long as the audio stream is Vorbis, and not something like FLAC.)
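Once a provider has found the right value, most of the remaining work is a little parsing and arithmetic. The sketch below shows that last step for five of the providers. DurationMath and its methods are hypothetical names, not DurationFinder's API, and each helper assumes the relevant value (meta tag content, oEmbed JSON, chunklist text, or the tail of an Ogg file) has already been fetched. It uses XmlConvert.ToTimeSpan for ISO 8601 durations and System.Text.Json for the oEmbed response, and the byte offsets follow the Ogg (RFC 3533) and Vorbis I header layouts.

```csharp
using System;
using System.Buffers.Binary;
using System.Globalization;
using System.Text;
using System.Text.Json;
using System.Xml;

// Hypothetical helpers (not DurationFinder's API): each takes a value a provider has
// already extracted and turns it into a TimeSpan, or null if there's no usable duration.
public static class DurationMath {
    // Schema.org: an ISO 8601 duration such as "PT4M13S". XmlConvert parses the
    // xs:duration form of ISO 8601 and throws FormatException on malformed input.
    public static TimeSpan FromSchemaOrg(string content) => XmlConvert.ToTimeSpan(content);

    // OpenGraph: video:duration is a plain count of seconds.
    public static TimeSpan? FromOpenGraph(string content) =>
        long.TryParse(content, NumberStyles.None, CultureInfo.InvariantCulture, out long seconds)
            ? TimeSpan.FromSeconds(seconds)
            : (TimeSpan?)null;

    // oEmbed: Vimeo adds a non-standard "duration" property (in seconds) to its JSON.
    public static TimeSpan? FromOEmbed(string json) {
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement root = doc.RootElement;
        return root.ValueKind == JsonValueKind.Object
            && root.TryGetProperty("duration", out JsonElement d)
            && d.ValueKind == JsonValueKind.Number
            ? TimeSpan.FromSeconds(d.GetDouble())
            : (TimeSpan?)null;
    }

    // HLS: sum every #EXTINF duration, but only trust the total if the chunklist
    // contains #EXT-X-ENDLIST (a live stream has no end, so return null).
    public static TimeSpan? FromChunklist(string chunklist) {
        double seconds = 0;
        bool hasEnd = false;
        foreach (string raw in chunklist.Split('\n')) {
            string line = raw.Trim();
            if (line.StartsWith("#EXTINF:", StringComparison.Ordinal)) {
                // Format: #EXTINF:<duration>,[<title>]
                string value = line.Substring("#EXTINF:".Length).Split(',')[0];
                seconds += double.Parse(value, CultureInfo.InvariantCulture);
            } else if (line == "#EXT-X-ENDLIST") {
                hasEnd = true;
            }
        }
        return hasEnd ? TimeSpan.FromSeconds(seconds) : (TimeSpan?)null;
    }

    // Vorbis identification header: 0x01, "vorbis", version (4 bytes),
    // channels (1 byte), then the sample rate (4 bytes, little-endian).
    public static uint? VorbisSampleRate(byte[] packet) =>
        packet.Length >= 16 && packet[0] == 1 && Encoding.ASCII.GetString(packet, 1, 6) == "vorbis"
            ? BinaryPrimitives.ReadUInt32LittleEndian(packet.AsSpan(12, 4))
            : (uint?)null;

    // Ogg: scan the downloaded tail backwards for the last "OggS" page header (version 0).
    // Bytes 6-13 of the header hold the granule position, which for Vorbis is a sample count.
    public static TimeSpan? FromOggTail(byte[] tail, uint sampleRate) {
        if (sampleRate == 0) return null;
        for (int i = tail.Length - 27; i >= 0; i--) { // 27 bytes = fixed part of a page header
            bool isPage = tail[i] == (byte)'O' && tail[i + 1] == (byte)'g'
                && tail[i + 2] == (byte)'g' && tail[i + 3] == (byte)'S' && tail[i + 4] == 0;
            if (!isPage) continue;
            long granule = BinaryPrimitives.ReadInt64LittleEndian(tail.AsSpan(i + 6, 8));
            if (granule < 0) continue; // -1 means no packet ends on this page; try an earlier one
            return TimeSpan.FromSeconds((double)granule / sampleRate);
        }
        return null;
    }
}
```

The Ogg helper divides by the sample rate from the identification header, which is why the Vorbis bullet above requires a Vorbis audio stream: without it, the granule position is just an unlabeled counter.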

How it works

Everything revolves around IDurationProvider. You could theoretically make your own duration provider, too - the interface looks like this:

public interface IDurationProvider {
    Task<TimeSpan?> GetDurationAsync(IDataSource dataSource);
}

And IDataSource looks like this:

// Represents data, which could be a web page or a media file, and could exist locally or on a web server.
public interface IDataSource {
    // The total length of the data, if known. Used if the provider needs to seek to a fixed offset from the end of the file.
    long? ContentLength { get; }

    // Whether this data might belong to one of the given mimetypes. If the type is unknown, return true so the provider can continue.
    bool MatchesType(params string[] types);

    // Copies the data into a byte array.
    Task<byte[]> ReadAsync();

    // Copies part of the data into a byte array. If the range is out of bounds, return null so the provider knows to give up.
    Task<byte[]?> GetRangeAsync(long from, long to);

    // Tries to convert a relative URL found in the data to an absolute URL so the link can be followed.
    bool TryCreateRelativeUri(string uriString, out Uri result);
}
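The post doesn't show a custom provider, so here's a minimal sketch of one, written against copies of the two interfaces above so it compiles on its own. It follows the approach described for MP4DurationProvider: read only each atom's 8-byte header through GetRangeAsync, skip over the body, and once moov turns up, pull the time scale and duration out of its mvhd (movie header) child. InMemoryDataSource and Mp4SketchProvider are hypothetical names, this isn't the library's code, and it assumes GetRangeAsync's `to` argument is exclusive, which the post doesn't specify.

```csharp
using System;
using System.Buffers.Binary;
using System.Text;
using System.Threading.Tasks;

#nullable enable

// Copied from the post so this sketch compiles on its own.
public interface IDurationProvider {
    Task<TimeSpan?> GetDurationAsync(IDataSource dataSource);
}
public interface IDataSource {
    long? ContentLength { get; }
    bool MatchesType(params string[] types);
    Task<byte[]> ReadAsync();
    Task<byte[]?> GetRangeAsync(long from, long to);
    bool TryCreateRelativeUri(string uriString, out Uri result);
}

// Hypothetical data source over a byte array. Assumption: "to" is exclusive.
public sealed class InMemoryDataSource : IDataSource {
    private readonly byte[] _data;
    public InMemoryDataSource(byte[] data) => _data = data;
    public long? ContentLength => _data.Length;
    public bool MatchesType(params string[] types) => true; // type unknown: let the provider try
    public Task<byte[]> ReadAsync() => Task.FromResult((byte[])_data.Clone());
    public Task<byte[]?> GetRangeAsync(long from, long to) =>
        Task.FromResult<byte[]?>(from < 0 || from > to || to > _data.Length
            ? null // out of bounds: tell the provider to give up
            : _data.AsSpan((int)from, (int)(to - from)).ToArray());
    public bool TryCreateRelativeUri(string uriString, out Uri result) {
        result = null!; // in-memory data has no base URL to resolve against
        return false;
    }
}

// Hypothetical provider: walks atom headers without reading bodies, finds moov,
// then reads the time scale and duration from the mvhd atom inside it.
public sealed class Mp4SketchProvider : IDurationProvider {
    public async Task<TimeSpan?> GetDurationAsync(IDataSource dataSource) {
        var moov = await FindAtomAsync(dataSource, 0, dataSource.ContentLength, "moov");
        if (moov == null) return null;
        var mvhd = await FindAtomAsync(dataSource, moov.Value.BodyStart, moov.Value.End, "mvhd");
        if (mvhd == null) return null;
        byte[]? body = await dataSource.GetRangeAsync(mvhd.Value.BodyStart, mvhd.Value.End);
        if (body == null || body.Length < 20) return null;
        // mvhd is a "full box": 1 byte version, 3 bytes flags, then version-dependent fields.
        uint timeScale;
        ulong duration;
        if (body[0] == 1) {
            if (body.Length < 32) return null;
            timeScale = BinaryPrimitives.ReadUInt32BigEndian(body.AsSpan(20, 4)); // after two 64-bit timestamps
            duration = BinaryPrimitives.ReadUInt64BigEndian(body.AsSpan(24, 8));
        } else {
            timeScale = BinaryPrimitives.ReadUInt32BigEndian(body.AsSpan(12, 4)); // after two 32-bit timestamps
            duration = BinaryPrimitives.ReadUInt32BigEndian(body.AsSpan(16, 4));
        }
        return timeScale == 0 ? (TimeSpan?)null : TimeSpan.FromSeconds((double)duration / timeScale);
    }

    // Returns where the wanted atom's body starts and where the atom ends, or null.
    private static async Task<(long BodyStart, long End)?> FindAtomAsync(
        IDataSource source, long offset, long? limit, string wanted) {
        while (limit == null || offset + 8 <= limit) {
            byte[]? header = await source.GetRangeAsync(offset, offset + 8);
            if (header == null || header.Length < 8) return null; // ran off the end
            long size = BinaryPrimitives.ReadUInt32BigEndian(header);
            string type = Encoding.ASCII.GetString(header, 4, 4);
            long headerLength = 8;
            if (size == 1) { // a 64-bit "largesize" follows the type field
                byte[]? large = await source.GetRangeAsync(offset + 8, offset + 16);
                if (large == null) return null;
                size = (long)BinaryPrimitives.ReadUInt64BigEndian(large);
                headerLength = 16;
            } else if (size == 0) { // atom runs to the end of its container
                if (limit == null) return null;
                size = limit.Value - offset;
            }
            if (size < headerLength) return null; // corrupt header
            if (type == wanted) return (offset + headerLength, offset + size);
            offset += size; // skip the body without reading it
        }
        return null;
    }
}
```

Because the walk only asks for small ranges, the same provider could in principle run against any IDataSource that serves byte ranges cheaply; the in-memory source just makes it easy to test with a hand-built file.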

The library provides two types of data sources:

  • StreamDataSource can work for media files on your local filesystem (via FileStream), but it requires a seekable Stream object, so it won't work well for HTTP requests.
  • RemoteDataSource is helpful for remote files and pages.
But instead of using RemoteDataSource directly, you'll probably want to go through the extension method GetDurationAsync (defined in the Extensions module) that lets you pass in a Uri. This method will:
  1. make a request to the URL (without immediately reading the response body)
  2. wrap the response in a RemoteDataSource
  3. pass the RemoteDataSource to the provider to try to find the duration
  4. if the duration cannot be found, AND the URL points to a web page (not a raw media file), follow any <link rel="canonical" href="..."> tags found in the HTML document, and perform the same steps on those URLs until it runs out of URLs to try

That last step is really useful if the URL isn't quite the "normal" one; for example, a YouTube "embed" page won't contain the duration, but it will contain a link back to the "canonical" YouTube URL.
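Step 4 can be modeled as a breadth-first walk over canonical links. Below is a minimal sketch of that loop. CanonicalFallback, FindCanonicalLinks, and the tryUrl delegate are hypothetical names: tryUrl stands in for steps 1 to 3 (make the request, run the provider, and hand back either a duration or, for HTML responses only, the page source). The regex-based link matching is a shortcut for the sketch; the post doesn't say how the library itself parses the HTML.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

#nullable enable

// Hypothetical sketch of the canonical-link fallback; not the library's Extensions code.
public static class CanonicalFallback {
    static readonly Regex LinkTag = new Regex(@"<link\b[^>]*>", RegexOptions.IgnoreCase);
    static readonly Regex RelCanonical = new Regex(@"\brel\s*=\s*[""']?canonical\b", RegexOptions.IgnoreCase);
    static readonly Regex Href = new Regex(@"\bhref\s*=\s*[""']([^""']+)[""']", RegexOptions.IgnoreCase);

    // Yields the absolute target of every <link rel="canonical" href="..."> tag.
    public static IEnumerable<Uri> FindCanonicalLinks(string html, Uri pageUri) {
        foreach (Match tag in LinkTag.Matches(html)) {
            if (!RelCanonical.IsMatch(tag.Value)) continue;
            Match href = Href.Match(tag.Value);
            // Resolve relative hrefs against the page's own URL.
            if (href.Success && Uri.TryCreate(pageUri, href.Groups[1].Value, out Uri? target))
                yield return target;
        }
    }

    public static async Task<TimeSpan?> FindDurationAsync(
        Uri start, Func<Uri, Task<(TimeSpan? Duration, string? Html)>> tryUrl) {
        var pending = new Queue<Uri>();
        var visited = new HashSet<Uri>();
        pending.Enqueue(start);
        while (pending.Count > 0) {
            Uri uri = pending.Dequeue();
            if (!visited.Add(uri)) continue; // already tried this URL
            var (duration, html) = await tryUrl(uri);
            if (duration != null) return duration;
            if (html == null) continue; // raw media file: nothing to follow
            foreach (Uri next in FindCanonicalLinks(html, uri)) pending.Enqueue(next);
        }
        return null; // ran out of URLs to try
    }
}
```

The visited set is what guarantees the loop eventually runs out of URLs: a canonical page often links to itself, so without it a page that lacks a duration would be fetched over and over.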

The DurationFinder library is under an MIT-style no-attribution license, so if the library as posted on NuGet doesn't meet your needs, feel free to fork it, or just forklift the code into your own project and remove the code you don't need!


Snail#

A programming blog where the gimmick is that I pretend to be a snail.



Page generated Jun. 14th, 2025 04:28 am
Powered by Dreamwidth Studios