The short answer of “yes” is dangerous.
1) Thinking about Tests
This isn’t because the tests are wrong. It’s because your implementation isn’t set up the way these basic tests were. Take the time to test your implementation on a subset of pages to confirm that the content can be properly crawled and indexed before rolling out and risking revenue.
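One way to run that kind of spot check is to compare the raw HTML response with the rendered DOM for a handful of representative URLs. Below is a minimal sketch of that idea in Node with Puppeteer; the URL list, the `<h1` marker, and the use of the global fetch available in Node 18+ are all assumptions for illustration:

```typescript
import puppeteer from "puppeteer";

// Hypothetical sample of pages to spot-check before a full rollout.
const sampleUrls = [
  "https://www.example.com/category/widgets",
  "https://www.example.com/product/blue-widget",
];

async function spotCheck(): Promise<void> {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  for (const url of sampleUrls) {
    // Raw HTML: roughly what a plain GET request (or view source) returns.
    const rawHtml = await (await fetch(url)).text();

    // Rendered HTML: roughly what the page contains after the load event.
    await page.goto(url, { waitUntil: "load" });
    const renderedHtml = await page.content();

    // Replace this marker with checks for the content you actually care
    // about: titles, body copy, links, structured data, and so on.
    const marker = "<h1";
    console.log(url, {
      inRawSource: rawHtml.includes(marker),
      inRenderedDom: renderedHtml.includes(marker),
    });
  }

  await browser.close();
}

spotCheck().catch(console.error);
```

If important content only shows up on the rendered side, you know it depends on JavaScript execution and deserves closer scrutiny before rollout.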
2) Google’s Own Statements
In short, Google has said that it is “generally able to render and understand” these pages, while also saying that “we recommend following the principles of progressive enhancement”.
3) Other Bots & Tools
Luckily, there are some ways to split the difference and minimize the risk (maintain uniquely indexable URLs with title, meta content, and crawl control still served server-side… think of it as optimizing a Flash site).
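As a rough sketch of what “still served server-side” can look like, here is a minimal Express route that returns a unique, indexable URL with its title, meta description, and canonical tag in the initial HTML, and a real 404 for unknown URLs, while leaving the body to the client-side framework. The route, product data, and file names are invented for illustration:

```typescript
import express from "express";

const app = express();

// Hypothetical product lookup; a real app would query its data store.
const products: Record<string, { title: string; description: string }> = {
  "blue-widget": {
    title: "Blue Widget | Example Store",
    description: "Hand-finished blue widget, ships in two days.",
  },
};

app.get("/product/:slug", (req, res) => {
  const product = products[req.params.slug];

  if (!product) {
    // Crawl control stays server-side: unknown URLs return a real 404.
    res.status(404).send("Not found");
    return;
  }

  // A unique, indexable URL with its title, meta description, and canonical
  // in the initial HTML, even though the body is filled in client-side.
  res.status(200).send(`<!doctype html>
<html>
  <head>
    <title>${product.title}</title>
    <meta name="description" content="${product.description}">
    <link rel="canonical" href="https://www.example.com/product/${req.params.slug}">
  </head>
  <body>
    <div id="app"></div>
    <script src="/bundle.js"></script>
  </body>
</html>`);
});

app.listen(3000);
```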
4) Dealing with Frameworks
This is comparable to evaluating web server platforms, such as Apache vs. IIS vs. nginx, or CMS platforms, such as WordPress vs. Drupal vs. Adobe CQ. While some of these platforms are easier to work with, or come with features already optimized, they can all be made generally SEO-friendly. Which SEO plugin you use matters less than what ends up in your HTML title tag, because Google looks at the product these platforms produce: the code and the metrics.
5) Understanding Crawl Efficiency & Resource Constraints
Fundamentals of an HTML Crawl
In short, the process of a crawl looks like this (a minimal sketch follows the list):
- Bot makes a GET request for a page (it asks the server for the file)
- Bot downloads the raw HTML file (the same thing you see in view source)
- Search engine parses the HTML, extracting content and metadata (location, tag, attributes, etc.) associated with that content
- Content is stored (indexed), evaluated, and ranked in a variety of ways
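A minimal sketch of that pre-rendering crawl, assuming Node 18+ for the global fetch and the cheerio package for parsing; the URL is a placeholder. Note that no JavaScript on the page is executed at any point:

```typescript
import * as cheerio from "cheerio";

// A plain HTML crawl: GET the page, parse the raw markup, extract signals.
async function crawl(url: string): Promise<void> {
  const response = await fetch(url); // 1. GET request for the page
  const html = await response.text(); // 2. raw HTML, same as view source

  const $ = cheerio.load(html); // 3. parse the HTML
  const title = $("title").text();
  const description = $('meta[name="description"]').attr("content");
  const links = $("a[href]")
    .map((_, el) => $(el).attr("href"))
    .get();

  // 4. a real engine would store, evaluate, and rank this;
  //    here we only print what was extracted.
  console.log({ url, title, description, links });
}

crawl("https://www.example.com/").catch(console.error);
```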
Rendering a page in a browser adds a few more stages on top of that raw fetch (sketched just after this list):
- Initial Request – The browser (and search bot) makes a GET request for the HTML and associated assets.
- Load Event – The load event is fired by the browser when the resource and its dependent resources have finished loading. This is an important event, because it says, generally, that the page is “done.”
- Post-Load Events & User Events – The page can continue to change as content is pushed to it or through user-driven events such as onClick. These are permutations of the page after it has completed.
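To make those stages concrete, here is a small browser-side sketch (the element IDs and copy are made up) showing content present at each stage: markup in the initial response, content injected by the load event, and content that only appears after a user event:

```typescript
// Stage 1: anything in the initial HTML response is already in the DOM
// when it arrives; no JavaScript is needed for it to be seen.

// Stage 2: content rendered by the time the load event fires is part of the
// "finished" page that the rendering snapshot is likely to capture.
window.addEventListener("load", () => {
  const reviews = document.getElementById("reviews"); // hypothetical container
  if (reviews) {
    reviews.textContent = "4.5 stars from 120 reviews";
  }
});

// Stage 3: content that only appears after a user-driven event, such as a
// click, is a permutation of the page that happens after that snapshot.
document.getElementById("show-specs")?.addEventListener("click", () => {
  const specs = document.getElementById("specs");
  if (specs) {
    specs.textContent = "Weight: 2.3 kg, width: 40 cm";
  }
});
```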
A simplified way to picture Googlebot’s rendering process:
- Google visits your webpage, as a browser
- At the load event (or after 5 seconds), they right click and select Inspect Element
- They select the HTML tag at the top
- They right click -> Copy -> Copy OuterHTML
- They use the copied-and-pasted HTML (the rendered content) just like they would use the HTML source
Let’s pause on that last point. Once Googlebot has the rendered content (Inspect Element HTML), it uses it like the traditional HTML source. This puts you back into your comfort zone of HTML and CSS.
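If you want to inspect roughly the same artifact yourself, here is a tiny sketch you can run in the browser console; the phrase being checked is a placeholder. It captures the rendered HTML the same way and lets you audit it like view-source HTML:

```typescript
// Capture the rendered HTML, analogous to Inspect Element -> Copy OuterHTML.
const renderedHtml = document.documentElement.outerHTML;

// Audit it the way you would audit view-source HTML. The phrase below is a
// placeholder for content you expect to be indexable.
const expectedCopy = "Hand-finished blue widget";
console.log("Rendered HTML length:", renderedHtml.length);
console.log("Contains expected copy:", renderedHtml.includes(expectedCopy));
```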
The rendering process gives Google two HTML versions of the page: the pre-DOM HTML source and the post-DOM rendered HTML. Generally, Google will use the rendered snapshot, but it may need to reconcile signals between the two and deal with any contradictions.
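A hedged example of the kind of contradiction to avoid (the values are invented): the server sends one set of signals and client-side code rewrites them before the snapshot, leaving the two versions in disagreement:

```typescript
// Served HTML (view source), hypothetical values:
//   <title>Blue Widget | Example Store</title>
//   <meta name="robots" content="index,follow">

// Client-side code that runs before the snapshot and contradicts the source:
document.title = "Loading…";

const robotsMeta = document.querySelector('meta[name="robots"]');
if (robotsMeta) {
  // The rendered HTML now says noindex while the source says index,follow.
  robotsMeta.setAttribute("content", "noindex");
}
```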
When you look at the screenshot in Google’s Fetch and Render tool, what you are seeing is the rendering of the page around the time that Googlebot took the snapshot, using this rendered HTML and not the source HTML.
Importance of Events
1) Load Event
You can see this moment in the Network panel of Chrome Developer Tools.
This tool shows a timeline of content loaded in the browser. The blue line denotes the DOMContentLoaded event and the red line denotes the load event.
To summarize, content rendered to the page by this moment, when the snapshot of the rendered content is taken, should be indexed. Content not on the page by this moment should not be considered indexable. This even includes third-party community and rating/review tools: delays in rendering third-party content can cause it to miss the snapshot, which means that content doesn’t get indexed.
You can also test this with the Fetch and Render tool. Content that comes in well after this point doesn’t appear in the screenshot.
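The same two moments are exposed programmatically through the Navigation Timing API. A small sketch, run in the console after the page has finished loading; the 5-second figure mirrors the timeout mentioned earlier and is only an illustrative threshold:

```typescript
// Read the same DOMContentLoaded and load timings that the panel charts.
const [nav] = performance.getEntriesByType(
  "navigation"
) as PerformanceNavigationTiming[];

if (nav) {
  const domContentLoadedMs = nav.domContentLoadedEventEnd - nav.startTime;
  const loadMs = nav.loadEventEnd - nav.startTime;

  console.log(`DOMContentLoaded (blue line): ${Math.round(domContentLoadedMs)} ms`);
  console.log(`load event (red line): ${Math.round(loadMs)} ms`);

  // Illustrative threshold only, echoing the 5-second timeout mentioned above:
  // content arriving well after the load event risks missing the snapshot.
  if (loadMs > 5000) {
    console.warn("Load event fired late; late-rendered content may be missed.");
  }
}
```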
2) User Events
Additionally, events can trigger after the load event that make changes to the page. A common cause of these is user engagement, with features like tabbed content, forms, and interactive navigation. These are called user events, and the most common is the onClick event.
Content that is dependent upon a user event generally does not get indexed. This new content is a permutation of a page’s content and should be considered non-canonical.
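One common way to keep that content indexable, sketched below with invented element IDs and endpoint: render every tab panel in the initial HTML and let the click handler only toggle visibility, rather than fetching the panel’s content when the user clicks.

```typescript
// Indexable pattern: the panel's content is already in the rendered HTML,
// and the click only toggles its visibility.
document.getElementById("tab-details")?.addEventListener("click", () => {
  document.getElementById("panel-details")?.classList.toggle("hidden");
});

// Risky pattern: the content does not exist until the user clicks, so it is
// a post-snapshot permutation of the page and is unlikely to be indexed.
document.getElementById("tab-reviews")?.addEventListener("click", async () => {
  const response = await fetch("/api/reviews?product=blue-widget"); // hypothetical endpoint
  const panel = document.getElementById("panel-reviews");
  if (panel) {
    panel.innerHTML = await response.text();
  }
});
```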
Generally, if content is in by Google’s snapshot, it’s treated just like a traditional page. Of course, there are caveats and edge cases, but Google does a really good job at this.
Here are some of the common issues we see:
- Indexable URLs – Pages still need unique, distinct, and indexable URLs. A pushState does not a URL make. There needs to be a real page, with a 200 OK server response for each individual “page” you want to have indexed. A single page app needs to allow for server-side URLs for each category, article, or product.
- Getting pushState right – Use pushState to represent a URL change, but make sure it points at the canonical URL that has server-side support. pushState mistakes and loose server-side implementations can create duplicate content (see the sketch after this list).
- a href and img src – Pages still need links to them. Google’s crawl and discovery processes are, generally, the same. Put links in href attributes and images in src attributes. Google struggles with various alternative approaches, like putting URLs in data attributes instead of the typical HTML attributes.
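Pulling those points together, here is a hedged sketch of a client-side navigation pattern: real URLs in href attributes so they stay discoverable, pushState pointing at the canonical URL, and the assumption that the same URL also returns a 200 from the server. The class name and renderRoute function are placeholders:

```typescript
// A real link in the markup, e.g.:
//   <a class="js-nav" href="/category/widgets">Widgets</a>
// The URL lives in an href attribute, so it stays discoverable in a crawl
// even though JavaScript handles the navigation.

// Placeholder for the framework's client-side rendering of a route.
function renderRoute(path: string): void {
  console.log("Rendering", path);
}

document.querySelectorAll<HTMLAnchorElement>("a.js-nav").forEach((link) => {
  link.addEventListener("click", (event) => {
    event.preventDefault();

    const href = link.getAttribute("href") ?? "/";

    // pushState should point at the canonical URL, and that same URL must
    // also return a real 200 response when requested directly.
    history.pushState({}, "", href);
    renderRoute(href);
  });
});
```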
To summarize:
- Content in by the load event (or the 5-second timeout) is indexable.
- Content dependent on user events is not indexable.
- Pages require an indexable URL, with server-side support.
- Audit rendered HTML (Inspect Element) using the same SEO best practices you use on traditional pages.
- Avoid contradictions between versions.