Entities. They’re nothing new in SEO, but over the last year, I’ve been ruminating on how the move to entities should fundamentally change the way most of us still think about SEO.
To start, let’s first step back and look at the history of the algorithms.
Moving from Pages to Entities
In the early years of the post-Google era, the link graph was focused on page-to-page hyperlink citations, which led to work on PageRank, TrustRank, HITS, SpamRank, anchor text, etc. Over time, this extended to domain-level analysis, where pages were clumped together and the same processes were applied at the domain level. I think this was the first step into entity-specific search (with domains being the simplest entity to understand).
I think we’re several years into the ongoing iteration of entities. We saw it early on with businesses and local results, which were the second easiest to build conceptually. We also saw Google start pushing brands, which is yet another implementation of entities; as named entities, brands are more unique and verifiable than people’s names. Then we saw question/answer one boxes, which started answering questions with known named entities and facts (search for “who wrote harry potter” for an example).
Today, we’re seeing it expand into AuthorRank/AgentRank, implicit social graphs, Google+, Schema.org, business page verification, and authors in search. Google also helped set the stage back in 2010 when it acquired Metaweb (Freebase) and developed Google Squared (which was discontinued in Labs, but the technology still continues in search).
I think we’re seeing the growth of this ranking concept, and initiatives like Schema.org and Rel=”Author” only help demonstrate the ongoing effort to create a schema for data related to entities.
I’m also tempted to propose that “entity search” could even be considered its own vertical in a sense, in the same way local, images, and social results are – with entity results being weighted when search queries trigger this vertical.
What Exactly is An Entity
An entity is anything, including real world objects, facts, and concepts, that has a number of documents associated with it. In the historical search models, a document is supported by other documents, but in the entity world, a conceptual object is supported by documents.
Examples of entities are businesses, products, movies, authors, people, places, events, etc.
From here, you end up with a known entity with the following information about it.
Entities are exceptionally powerful for search. The Metaweb promo video does a great job explaining the value and power of entities.
Looking at an entity based search engine conceptually changes many aspects of SEO. It would no longer be about just optimizing against a page, but optimizing for an object.
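To make the page-versus-object distinction concrete, here is a minimal sketch of how an entity might be modeled. This is purely my own illustration, not how Google or Freebase store anything; the class names, fields, and example.com URLs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A page (or other source) that mentions an entity."""
    url: str
    text: str


@dataclass
class Entity:
    """A conceptual object that is supported by documents, not a page itself."""
    name: str
    entity_type: str  # e.g. "movie", "business", "author"
    attributes: dict = field(default_factory=dict)  # known facts about the object
    documents: list = field(default_factory=list)   # documents supporting the object

    def add_document(self, doc: Document) -> None:
        self.documents.append(doc)

    def support(self) -> int:
        """Number of documents currently associated with this entity."""
        return len(self.documents)


# Hypothetical example data, for illustration only.
memento = Entity(
    name="Memento",
    entity_type="movie",
    attributes={"description": "backwards; the guy can't remember anything"},
)
memento.add_document(Document("https://example.com/memento-review", "A review of Memento."))
memento.add_document(Document("https://example.com/memento-quotes", "Quotes from Memento."))

print(memento.name, memento.entity_type, memento.support())
```

The point of the sketch is only the direction of the relationship: the documents hang off the object, rather than the object being inferred from a single page.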
How Do Search Engines Do This?
Bill Slawski has done a great job discussing entities as they’ve developed over the years. I’d recommend taking time to read about it. (I was excited he did a wrap up post for it this week.)
But if you want to hear it directly from Google, I recommend watching the almost hour and a half long talk Andrew Hogue gave on “The Structured Search Engine”. At the time of the presentation, Andrew was working in New York on Google Squared and entity building with Freebase. I believe he now heads up search at FourSquare.
The presentation was one of the more enlightening hours I’ve spent on SEO in the last few months.
The Reddit Phenomenon
Two months ago, a thread went up on Reddit comparing the Google Algorithm to Bing’s, especially when providing the right answer to relatively vague descriptions of movies. It seems as if Google knew what the searcher meant, beyond just hearing the words (which almost mirrors the language Andrew uses in his presentation).
I suspect what we’re seeing in this Reddit thread is entity search. I think Bing does it as well, and can get it right, like Dan Shure mentioned in this tweet. However, I think Google gets this better than Bing does. They have the superior index, larger history, faster speeds, and greater computing power. They also have Freebase, which gives them an exceptional edge on entities like movies, which are very well understood on Freebase.
The thing that set off a bell in my head was when Director of Bing, Stefan Weitz, answered a Q&A question after Rand’s presentation at Mozcation Seattle. Although we were discussing local, he referenced the concept of entities using the example of a sofa.
“Longer term we’re looking at how we think of the web really as a representation of the physical world itself” – Stefan Weitz, Bing (SEOmoz WBF)
How Might This Be Working?
Let’s take this search from Reddit:
that movie that’s backwards and the guy can’t remember anything
The results return Memento first, followed by assorted script, quote, and review websites.
… I don’t think Google is returning pages about Memento…..
I think they are returning “Memento” … the entity.
The difference there may seem slight, but if that’s what is happening, I think it’s significant.
Let’s break it down.
I do think we see advanced language understanding in Google, but I feel the Reddit examples show one of two things: either entity-weighted results are being pulled into universal results, or entities are being used to weight particular documents in the broad match indices.
Queries structured like “that movie where X” make intent easy to understand, which lends itself well to entity-style searching.
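As a toy illustration of that idea, the sketch below pulls an entity type and a few descriptive words out of a “that movie where X” style query, then picks the entity of that type with the most overlapping terms. Everything here is an assumption on my part: the regular expression, the stopword list, and the tiny hand-made entity store are hypothetical and are not a description of what any search engine actually does.

```python
import re

# Hypothetical entity store: each entity has a type and some descriptive terms.
ENTITIES = {
    "Memento": {"type": "movie", "terms": {"backwards", "remember", "memory", "guy"}},
    "Example Sofa Co.": {"type": "business", "terms": {"sofa", "furniture", "store"}},
}

# Matches queries shaped like "that <type> that/where/with <description>".
PATTERN = re.compile(
    r"^that (?P<type>\w+) (?:that|where|with)\b(?P<desc>.*)$", re.IGNORECASE
)

# Words that carry little meaning in the description (assumed list).
STOPWORDS = {"the", "and", "a", "an", "is", "s", "t", "can", "anything", "that"}


def parse_query(query):
    """Return (entity_type, descriptive_words), or None if the query doesn't fit."""
    match = PATTERN.match(query.strip())
    if match is None:
        return None
    words = set(re.findall(r"[a-z]+", match.group("desc").lower())) - STOPWORDS
    return match.group("type").lower(), words


def best_entity(query):
    """Pick the entity of the requested type with the largest term overlap."""
    parsed = parse_query(query)
    if parsed is None:
        return None
    entity_type, words = parsed
    best_name, best_overlap = None, 0
    for name, info in ENTITIES.items():
        if info["type"] != entity_type:
            continue
        overlap = len(words & info["terms"])
        if overlap > best_overlap:
            best_name, best_overlap = name, overlap
    return best_name


print(best_entity("that movie that's backwards and the guy can't remember anything"))
```

A plain keyword query such as “memento quotes” does not match the pattern, so `best_entity` returns `None` and, in this toy model, would fall through to ordinary document ranking.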
How Entities Change SEO
How this changes SEO is a whole series of posts, and one can only speculate on how far this goes, but I think there are a few immediate items that we’re already seeing, or will continue to see, in the next few years.
Going broader than pages:
This conversation has been growing louder over time, but we’re stepping outside the SEO we’ve been doing since the early 2000s. With 70% of demand being in the longtail, there is nothing SEOs can do to optimize for all of these phrases in the same exact, phrase, and broad keyword match paradigm we’ve been using for years.
Google research has shown that on more difficult queries, people start to type their searches as natural language questions. They also use longer queries on average. The study also found that, at the time (2010), question queries usually failed to give users the information they were looking for, and users would revert to keyword queries.
Looking at query length compared to query success, we can see a clustering in the bottom right quadrant, which suggests that Google is doing better with short, more specific keyword style searches. When users are trying to find the answer to a difficult query, they tend to go longer and are met with less success.
The study shows how those with unsuccessful searches also tend towards more question words, a sign that they might be looking for a specific answer, which lends itself to entity-style search.
I think this study does a good job of demonstrating the value for Google to improve its natural language understanding, especially for longer tail and question specific queries.
As SEOs, we need to look at how IMDB is successfully ranking in some of these longtail, natural language, question style queries. What is happening here is more complicated than longtail keyword targeting or UGC strategies.
It also changes the paradigm of how to rank. Instead of thinking in terms of setting a page to a keyword, getting anchor text, and getting link popularity, you can start to consider aligning a page with a known entity when appropriate.
The right “type” of citations by entity type:
This influences the link building game as well. The old model of PageRank, TrustRank, LRD, anchor text, domain diversity, related content, keyword in title, topic-sensitive PageRank, etc. will change a bit.
It adds a new dimension, which isn’t just to get links that support the topic/theme, but to get citations that support the entity type (and these aren’t always hyperlinks, as we’ve seen in local search).
I don’t think we’re too far off from the days where we’ll see widespread strategies to get authors with strong authority to contribute to a domain, because their person-specific entity scores (AuthorRank/AgentRank) will apply a vector against traditional document scores to elevate content associated with strong entities. This will change the game entirely. In the long term, how much will link building resources be reallocated to relationship building with strong entities, not websites?
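To picture what an entity score weighting a document score could look like, here is a deliberately simple sketch. The multiplicative formula, the `weight` parameter, and the example scores are all assumptions of mine for illustration; nothing here reflects a known AuthorRank/AgentRank implementation.

```python
def blended_score(document_score, author_score, weight=0.5):
    """Boost a traditional document score by the authority of its author entity.

    Both scores are assumed to be in the range [0, 1]. `weight` is a
    hypothetical tuning knob for how much the author entity matters.
    """
    return document_score * (1.0 + weight * author_score)


# Hypothetical pages: B has the stronger document score, A the stronger author.
pages = [
    {"url": "https://example.com/a", "document_score": 0.60, "author_score": 0.9},
    {"url": "https://example.com/b", "document_score": 0.70, "author_score": 0.1},
]

ranked = sorted(
    pages,
    key=lambda p: blended_score(p["document_score"], p["author_score"]),
    reverse=True,
)

for page in ranked:
    print(page["url"], round(blended_score(page["document_score"], page["author_score"]), 3))
```

In this toy setup, the page with the weaker document score but the stronger author ends up ranked first, which is the kind of shift that would make relationships with strong entities worth investing in.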
The right “type” of content by entity type:
This creates a new dimension to content as well. It’s not just about having unique content, or keyword content, or even “great” content. It starts to require the “right” content that would fall into the schema typical for the entity type. The robustness of known data attributes, both on-site and off-site, is likely playing a role in results such as those on Reddit. Like Rand discusses in his video about Advanced On-Page Optimization, we continue to move beyond the standard thoughts behind optimization (title, heading, on page, variations, alt text, etc.). Really, some of this isn’t much different than local search optimization, except it could be expanding to more and more objects.
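One rough way to think about the “right” content is attribute coverage: how many of the properties you would expect for an entity type does a page actually provide? The sketch below checks a page’s properties against a short list of Schema.org Movie property names. The choice of which properties count as “expected” is my own assumption, and the example page data is made up.

```python
# Property names come from the Schema.org Movie type; treating this particular
# subset as the "expected" set is an assumption made for illustration.
EXPECTED_PROPERTIES = {
    "Movie": {"name", "director", "actor", "datePublished", "duration", "genre"},
}


def coverage(entity_type, page_properties):
    """Return (fraction of expected properties present, sorted missing properties)."""
    expected = EXPECTED_PROPERTIES[entity_type]
    present = expected & set(page_properties)
    missing = sorted(expected - present)
    return len(present) / len(expected), missing


# Hypothetical page markup, reduced to a flat dict of property -> value.
page = {"name": "Example Movie", "genre": "Example Genre", "duration": "PT2H"}

score, missing = coverage("Movie", page)
print(score, missing)
```

A check like this says nothing about how a search engine weighs these attributes; it is only a quick way to spot gaps between a page and the schema typical for its entity type.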
Death of PageRank or Page Link Graph?
The knee-jerk reaction is to wonder if I think the link graph is dying.
Not at all.
The model of modern search engines is still built off the core page link citation model, and most changes we see are additions to that: they work in parallel with this model, weight against this model, or re-sort against this model. PageRank still drives indexation and crawl priority rules as well.
However, I think our concept of a citation is ever growing. Citations now include local citations, brand mentions, and social shares. They’ll soon include rel author citations and social media profile mentions, which could help Author/Agent Rank. Entity search could mean there is value in schema citations, especially for physical real life objects.