Video SEO: Ranking Videos in Google Without Rich Snippets

[This blog post is a summary of the first part of a presentation I gave at SearchLove London. You can find the full presentation here.]

Too often, Video SEO is reduced to a discussion of video rich snippets: the visual markup that adds thumbnails and timestamps to video results.

Video Rich Snippet

This perspective leads to a predictable narrative around the SERP feature’s value, as measured by its SERP visibility, which is somewhere between 8% and 18% of search results (SEMRush, Mozcast, STAT).

However, as promising as these percentages seem, the opportunity is deflated by YouTube owning nearly 90% of these video results, a share that hasn’t changed much since mid-2014. For the most part, if you want a video rich snippet, you put your video on YouTube (although sites with dedicated video sections can also fare well).

This leads to a few common conclusions:

  1. Google is prioritizing their own properties (and other tinfoil variants)
  2. YouTube is much more important to Google video search in a post-snippet video world
  3. If you’re not using YouTube, the SEO traffic opportunity is limited
  4. There is not much benefit to traditional video SEO tactics (because you’re not going to get the snippet anyway)

However, these are flawed perspectives on video search, and on how and why to use video in your content marketing mix.

Video Rich Snippets Are Incremental & Optional

Using the video SERP feature to denote a “video result” is logically flawed in the same way that a product URL is not defined by its rating & review product snippet. A result can be a video result without the presence of the thumbnail. Google’s discovery of the thumbnail and their determination to present that thumbnail are independent and secondary steps.

URL With and Without Video Rich Snippet

Above are two example URLs in traditional web results (“All” tab) compared to the same two URLs in video results (“Video” tab). This helps demonstrate that Google can have rich snippet data for a URL but choose to suppress that data. Seen another way, these are examples of “video results” being integrated into universal search results, even though they’re missing the visual designation.

Because of this, there is a flaw in assuming 8%-18% of SERPs include “video results.” (That stat covers the video SERP feature only.) To speak to video results more broadly, we would need to take the secondary step of crawling the ranking URLs to check their source code for video embeds, and then check the URL listings in “Video” results to determine whether Google has snippet data.
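As a rough illustration of that secondary step, here is a minimal Python sketch that scans a page’s HTML for common video signals. The host list, the signal labels, and the function name are my own assumptions for illustration, not anything Google publishes. A real audit would also need to fetch the pages and account for players injected by JavaScript.

```python
from html.parser import HTMLParser

# Assumed list of player URL fragments; extend it for the hosts you care about.
VIDEO_HOSTS = (
    "youtube.com/embed",
    "youtube-nocookie.com/embed",
    "player.vimeo.com",
)


class VideoSignalFinder(HTMLParser):
    """Collects simple hints that a page contains a video."""

    def __init__(self):
        super().__init__()
        self.signals = []
        self._in_jsonld = False
        self._jsonld_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "video":
            # Native HTML5 player.
            self.signals.append("html5-video")
        elif tag == "iframe":
            src = attrs.get("src") or ""
            if any(host in src for host in VIDEO_HOSTS):
                self.signals.append("iframe-embed")
        elif tag == "script":
            script_type = (attrs.get("type") or "").lower()
            if script_type == "application/ld+json":
                # Start buffering structured data to look for VideoObject.
                self._in_jsonld = True
                self._jsonld_parts = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._jsonld_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            if '"VideoObject"' in "".join(self._jsonld_parts):
                self.signals.append("videoobject-schema")


def find_video_signals(html):
    """Return the list of video signals found in an HTML string."""
    finder = VideoSignalFinder()
    finder.feed(html)
    finder.close()
    return finder.signals
```

Running `find_video_signals` over the HTML of each ranking URL gives a first pass at which results carry a video, which can then be compared against what appears in the “Video” tab.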

So why is this important? If Google is using the “video” determination to influence rankings, even if they don’t display the rich snippet, then we can take advantage of that. In other words, we can rank on queries (and rank higher) with video content better than we can with text. (And, in some instances, it may be nearly impossible to rank on queries without video.)

Intent as a Ranking Factor

Searcher intent has become an increasingly important factor in determining the results for a query. How this is being done could be its own blog post, but it’s a combination of natural language processing, machine learning, and usage data that allows Google to “sort” or “blend” initial result sets to optimize for intent completion or query satisfaction.

Said another way, Google knows that different portions of users searching on a given keyword are searching it for specific reasons. On these queries, it can become nearly impossible to rank without ensuring that your target page addresses the determined intent (no matter how technically sound, well-targeted, or linked to that URL is). They’re also using this intent to weight the blending of different result types and verticals into a universal search result.

Local results appearing on general queries are a common example of this. Look at the SERPs for “hiking” (as they appear here in Portland, OR).

SERP with Local Intent

[Red=local results, Blue=general, non-localized results. Numbers=local query position]

This broad keyword has both a general, informational intent and a more “do” or “transactional” local intent (i.e. “where can I go hiking near me”). Google takes the general term and fetches the results shown in blue above. It then performs a secondary search, appending the searcher’s location to the original query (“hiking Portland”), and takes those results, shown in red above. Finally, it blends the two sets together in an attempt to address multiple intents. This means that, for sites targeting the general, non-localized intent, there are only 5-6 slots of opportunity on page one, not 10.
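To make the slot arithmetic concrete, here is a toy Python sketch of blending two ranked lists. It is purely illustrative: the `blend` function, the fixed `local_slots` quota, and the simple alternating order are my assumptions, not a description of how Google actually merges result sets, and the sketch does not deduplicate URLs that appear in both lists.

```python
def blend(general, local, local_slots=4, page_size=10):
    """Interleave two ranked lists, reserving `local_slots` positions for local results."""
    general_quota = page_size - local_slots
    general_queue = [("general", url) for url in general[:general_quota]]
    local_queue = [("local", url) for url in local[:local_slots]]

    page = []
    # Alternate between the two queues until both are empty.
    while general_queue or local_queue:
        if general_queue:
            page.append(general_queue.pop(0))
        if local_queue:
            page.append(local_queue.pop(0))
    return page


if __name__ == "__main__":
    general = [f"general-{i}" for i in range(1, 11)]
    local = [f"local-{i}" for i in range(1, 11)]
    for position, (kind, url) in enumerate(blend(general, local), start=1):
        print(position, kind, url)
```

With four positions reserved for local results, only six of the ten page-one positions remain for the general intent, which is the squeeze described above.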

Apply this same logic to keywords where the “intent” is to discover content that is more visual in nature. In other words, on a given keyword, a certain percentage of searchers may provide usage data, in aggregate, that suggests they prefer a video result or content with a video on it. This is a sliding scale of intent, but those keywords with a high interest in video content can be called “video keywords”.

We can see this intent “blending” in action on a video keyword like “how to paint a room.”

Breakdown of Video Results

Out of the top 10 results, seven URLs have a video on them. Traditional SEO logic suggests that there is only one video result (the YouTube video with the video rich snippet), but there are actually several video results on this SERP. However, Google has decided to suppress the presentation of the rich snippet on most of these.

Similar to how Google integrates results for “Hiking” + “Portland,” Google is integrating results from searching “how to paint a room” within its video index, as seen below.

Video results ranking directly

The first four results in the “Video” tab rank on page one of the “All” results (similar to local blending). Of these, three have the rich snippet in the video tab, even though it’s suppressed on the blended universal results (suggesting the SERP feature is an incremental determination above and beyond the determination of this URL as a video result).

In addition, two other “Video” tab results don’t rank directly, but are the video pages for videos embedded on URLs within the top 10 “All” results.

Videos embedded in ranking URLs
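The comparison between the two tabs can be sketched in a few lines of Python. This assumes you have already collected three inputs by hand or with a crawler: the page-one URLs from the “All” tab, the URLs from the “Video” tab, and a mapping of each page to the video URLs embedded on it. The function name and the labels are hypothetical, chosen only for this example.

```python
def classify_results(all_results, video_tab, embeds):
    """Label each 'All' tab URL by its relationship to the 'Video' tab.

    all_results: list of page-one URLs from the "All" tab
    video_tab:   list of URLs from the "Video" tab
    embeds:      dict mapping a page URL to the video URLs embedded on it
    """
    video_set = set(video_tab)
    labels = {}
    for url in all_results:
        embedded = embeds.get(url, [])
        if url in video_set:
            # The same URL ranks in both tabs.
            labels[url] = "ranks directly"
        elif video_set.intersection(embedded):
            # The page embeds a video whose own page is a Video tab result.
            labels[url] = "embeds a video-tab result"
        elif embedded:
            labels[url] = "has video, not in video tab"
        else:
            labels[url] = "no video"
    return labels
```

Counting the first three labels gives the number of “video results” on the page, regardless of whether a rich snippet is displayed.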

Looking at how heavily Google is weighting URLs with video (or weighting the blending of video vertical results) on this SERP, it could be very difficult to earn a top ranking without visual content. Looked at another way, this weighting provides a disproportionate advantage to video content when trying to rank on certain query types.

If you want to rank on this keyword, it’s going to be significantly harder to accomplish that with text-only articles and blog posts.

Higher Video Intent on Mobile

In general, mobile searchers have a higher affinity for certain query and content types when compared to desktop searchers, and one of those content types is video. This is reflected in Google’s visual treatment of video on mobile SERPs.

Video carousel in mobile search

If there is a single video SERP feature on a desktop result, it is converted into a video carousel on mobile. This carousel scrolls through the video vertical results (the results in the “Video” tab). This provides a strategic advantage to video content on mobile for the SEO that targets video vertical rankings on queries that have a video rich snippet at positions 1 through 5.

In other words, if you’re struggling to rank well on a keyword with a video snippet in the top 5 and that keyword has a high percentage of mobile searches, a video-based strategy might be a good way to earn yourself top 5 visibility on the mobile SERP.

Growing Trend of Video Intent

I suspect this trend will continue, not only because visual and voice analysis, both as inputs and outputs, are at the core of search’s technological and algorithmic development, but also because Google needs to follow audiences, who are increasingly interested in video content.

At Briggsby, we conducted a survey to explore relative interest in video versus text content across different age ranges. This was driven by observations that some forms of traditional text content consumption (e.g. blogging) are declining while more visual and voice mediums are growing (e.g. YouTube, Instagram, Snapchat, Facebook Video, and podcasts). Here is what we found:

Video vs. Text Preference by Age

(N = 1,000 / Remaining respondents chose “I do not read or watch content online” or “I do not have a preference for text or video content.”)

In general, we’re seeing a higher affinity for video content amongst younger demographics, with 18-24 year olds preferring video over text content by a fair margin. This preference shifts back towards text as age increases, with the oldest demographic having a significantly higher preference for text content.

However, this is not true for all content types. For “review” content, there is a preference for text content across all age groups.

Review Content Type Preference by Age

This trend seems to have the most significant implications for “learn” content, where the desire for video content becomes more pronounced.

Content Preferences for Learning

This is important for SEO, because one of the areas where SEO distinguishes itself from SEM is in targeting top-of-funnel, informational queries. This strategy is at the core of many SEO programs’ blog/article content marketing strategies.

There are a few implications here. First, for those targeting customers/audiences younger than 35, video is very important on informational and educational queries. Second, this trend may shift as those in the younger age groups age over the next 5+ years. Lastly, considering Google’s “intent” determination is, at least in part, determined by usage data, this content type preference could influence the weight assigned to video intent on certain query types.

Future of Video Search

While there are immediate opportunities in video search, it’s important to keep an eye on current advancements in video information retrieval, as they may dramatically impact how we think about video SEO over the next few years.

Suggested Clip

Google tips its hand with their tests of the “Suggested Clip” featured snippet, where they point users to a specific minute mark when the answer to a question is given deep within a video.

Suggested Clip Featured Snippet in Google

This feature has some significant limitations and frequently fails to provide an effective answer, but it does point to a potential future state of video indexation. First, it points to a future of effective automatic transcriptions of video content. Second, it suggests the application of natural language processing to these transcriptions, similar to featured snippet answers within traditional search. Lastly, it may suggest a future shift away from the more exact-match nature of video search to one that is more long-tail in nature, by better understanding the content within a video.

Indexation of Video Content

One of the major limitations of video indexation is speech-to-text or automatic transcriptions. However, during a session I attended at VidCon, the team at YouTube that works on automatic transcriptions talked about the significant improvements they’ve made in the last year.

They shared that YouTube has automatic captions on 1 billion videos. While they were discussing this from the perspective of accessibility, as an SEO, I see this as an improvement in video indexation.

Historically, though, these automatic captions were a bit humorous in their attempt to transcribe audio in a video. However, in the last year, their accuracy has improved by 50%.

These both demonstrate major investments by Google in having machines understand both video and voice.

Entity Detection in Videos

An important part of this improvement, and something the YouTube team seemed very proud of during their presentation, was their ability to identify entities in video content. This is a significant improvement in automatic captions.

Entities in YouTube Automatic Transcription

In this example, you’ll notice the effective detection of named entities, such as the capitalized “Stephen King.” Not only that, but they effectively heard the possessive form of the word and added the possessive apostrophe. YouTube also identified the entities 1990 (a year) and Tim Curry (actor). Using entity triples and basic natural language processing, Google may be able to develop a very sophisticated understanding of this video content. This can be done using words like “book” and “miniseries,” which have very specific meanings, and words like “starring,” which help define an entity triple. If you’re interested in natural language processing and entities, I’ve given two presentations on them (here and here).
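For readers new to the idea, an entity triple is simply a (subject, predicate, object) statement. Here is a minimal Python sketch of how the entities from this caption could be stored and queried as triples. The `Triple` type, the predicate names, and the generic “miniseries” subject are my own illustrative assumptions; this is not a representation of how Google or YouTube actually stores this data.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical triples built from the entities and cue words in the caption.
# "starring" links the work to the actor; the other predicates are assumed.
triples = [
    Triple("miniseries", "starring", "Tim Curry"),
    Triple("miniseries", "release_year", "1990"),
    Triple("miniseries", "based_on_book_by", "Stephen King"),
]


def objects_for(triples, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [
        t.obj
        for t in triples
        if t.subject == subject and t.predicate == predicate
    ]
```

A lookup such as `objects_for(triples, "miniseries", "starring")` then answers a question like “who stars in it?” without any exact-match keyword in the video title.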

As this improves, many of the advancements we’ve seen in entities and the natural language processing of text may make their way to video search.

Knowledge Graph in YouTube

Entities are already making their way into video search through basic Knowledge Graph panels in YouTube Search.

Knowledge Graph in YouTube

Here you’ll see a search for “Seahawks,” an American football team, produce a Knowledge Graph style result that includes game highlights, as well as related NFL teams.

Integrating Video into Your Content Strategy

Given the growing interest of audiences in video content and Google’s recent and on-going enhancements, video content production is an area where many SEO programs are under-investing. This underinvestment is due, in part, to a limited perspective on how video SEO works, reducing its value to SERP features, instead of the broader implications of the intent-driven weighting of different content forms.

In some ways, traditional text-based blogging, which is at the core of most SEO programs’ content marketing strategies, is an “old man’s game” (or old woman’s) and is giving way to vlogging. As Google improves their determination of user intent, it’s important for us to think about how we’re meeting a searcher’s intent not only through the content itself but also through its form.

If you’re interested in the full presentation from SearchLove London, you can see it here:
