JavaScript & SEO: Making Your Bot Experience As Good As Your User Experience

[Image: javacssseo.gif]

JavaScript is either placed in the HTML document within <script> tags (i.e., it is embedded in the HTML) or linked/referenced as an external file. There is currently a plethora of JavaScript libraries and frameworks, including jQuery, AngularJS, ReactJS, EmberJS, etc.
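
To make the two placements concrete, here is a minimal sketch (the file name is hypothetical):

    <!-- Embedded: the JavaScript lives inside the HTML document -->
    <script>
      console.log('Hello from embedded JavaScript');
    </script>

    <!-- Linked/referenced: the JavaScript lives in an external file -->
    <script src="/js/app.js"></script>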

[Image: JavaScript libraries and frameworks]

What is AJAX?

AJAX, or Asynchronous JavaScript and XML, is a set of web development techniques combining JavaScript and XML that allows web applications to communicate with a server in the background without interfering with the current page. Asynchronous means that other functions or lines of code can run while the async script is running. XML used to be the primary language to pass data; however, the term AJAX is used for all types of data transfers (including JSON; I guess “AJAJ” doesn’t sound as clean as “AJAX” [pun intended]).

A common use of AJAX is to update the content or layout of a webpage without initiating a full page refresh. Normally, when a page loads, all the assets on the page must be requested and fetched from the server and then rendered on the page. However, with AJAX, only the assets that differ between pages need to be loaded, which improves the user experience as they do not have to refresh the entire page.

One can think of AJAX as mini server calls. A good example of AJAX in action is Google Maps. The page updates without a full page reload (i.e., mini server calls are being used to load content as the user navigates).
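
As a rough sketch of the pattern (the endpoint and element ID below are hypothetical), an AJAX request made with the Fetch API might look like this:

    // Ask the server only for the data that changed, then update part of the
    // page in place, with no full page refresh.
    fetch('/api/store-locations?city=boston')  // hypothetical endpoint
      .then(function (response) { return response.json(); })
      .then(function (locations) {
        document.getElementById('results').textContent =
          locations.length + ' locations found';
      });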


What is the Document Object Model (DOM)?

As an SEO professional, you need to understand what the DOM is, because it’s what Google is using to analyze and understand webpages.

The DOM is what you see when you “Inspect Element” in a browser. Simply put, you can think of the DOM as the steps the browser takes after receiving the HTML document to render the page.

The first thing the browser receives is the HTML document. After that, it will start parsing the content within this document and fetch additional resources, such as images, CSS, and JavaScript files.

The DOM is what forms from this parsing of information and resources. One can think of it as a structured, organized version of the webpage’s code.

Nowadays the DOM is often very different from the initial HTML document, due to what’s collectively called dynamic HTML. Dynamic HTML is the ability for a page to change its content depending on user input, environmental conditions (e.g. time of day), and other variables, leveraging HTML, CSS, and JavaScript.

Simple example with a <title> tag that is populated through JavaScript:

[Images: HTML source vs. the rendered DOM]
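
Since the original screenshots aren't reproduced here, a minimal sketch of the idea (the markup and title text are made up):

    <!-- HTML source: the title tag ships empty -->
    <html>
      <head>
        <title></title>
        <script>
          // JavaScript sets the title at runtime, so the populated value
          // exists only in the DOM, not in the raw HTML source.
          document.title = 'Example Page | Populated via JavaScript';
        </script>
      </head>
      <body></body>
    </html>

View-source shows the empty <title>, while "Inspect Element" (the DOM) shows the populated one.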

What is headless browsing?

Headless browsing is simply the action of fetching webpages without the user interface. It is important to understand because Google, and now Baidu, leverage headless browsing to gain a better understanding of the user’s experience and the content of webpages.

PhantomJS and Zombie.js are scripted headless browsers, typically used for automating web interaction for testing purposes, and rendering static HTML snapshots for initial requests (pre-rendering).
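
For reference, a minimal PhantomJS script that fetches a page and prints the rendered HTML (the URL is a placeholder) looks roughly like this:

    // save as render.js and run with: phantomjs render.js
    var page = require('webpage').create();

    page.open('https://www.example.com/', function (status) {
      if (status === 'success') {
        // page.content is the serialized DOM after scripts have run,
        // i.e., the kind of static HTML snapshot used for pre-rendering.
        console.log(page.content);
      }
      phantom.exit();
    });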


Why can JavaScript be challenging for SEO? (and how to fix issues)

There are three primary reasons to be concerned about JavaScript on your site:

  1. Crawlability: Bots’ ability to crawl your site.
  2. Obtainability: Bots’ ability to access information and parse your content.
  3. Perceived site latency: AKA the Critical Rendering Path.

Crawlability

Are bots able to find URLs and understand your site’s architecture? There are two important elements here:

  1. Blocking search engines from your JavaScript (even accidentally).
  2. Proper internal linking, not leveraging JavaScript events as a replacement for HTML tags.

Why is blocking JavaScript such a big deal?

If search engines are blocked from crawling JavaScript, they will not be receiving your site’s full experience. This means search engines are not seeing what the end user is seeing. This can reduce your site’s appeal to search engines and could eventually be considered cloaking (if the intent is indeed malicious).

Fetch as Google and TechnicalSEO.com’s robots.txt and Fetch and Render testing tools can help identify resources that Googlebot is blocked from accessing.

The easiest way to solve this problem is to provide search engines with access to the resources they need to understand your user experience.

!!! Important note: Work with your development team to determine which files should and should not be accessible to search engines.

Internal linking

Internal linking should be implemented with regular anchor tags within the HTML or the DOM (e.g., an <a href="https://www.example.com"> tag) rather than by leveraging JavaScript functions to allow the user to traverse the site.

Essentially: Don’t use JavaScript’s onclick events as a replacement for internal linking. While end URLs might be found and crawled (through strings in JavaScript code or XML sitemaps), they won’t be associated with the global navigation of the site.
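
To make the contrast concrete (the URLs are hypothetical):

    <!-- Crawlable internal link: a real anchor tag with a resolvable href -->
    <a href="https://www.example.com/products/widgets">Widgets</a>

    <!-- Risky: no href, navigation handled entirely by a JavaScript event.
         The URL may still be discovered elsewhere, but this element is not
         treated as a link in the site's navigation. -->
    <span onclick="window.location.href='/products/widgets'">Widgets</span>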

Internal linking is a strong signal to search engines regarding the site’s architecture and importance of pages. In fact, internal links are so strong that they can (in certain situations) override “SEO hints” such as canonical tags.

URL structure

Historically, JavaScript-based websites (aka “AJAX sites”) were using fragment identifiers (#) within URLs.

  • Not recommended:
    • The Lone Hash (#) – The lone pound symbol is not crawlable. It is used to identify anchor links (aka jump links), which allow a user to jump to a piece of content on a page. Anything after the lone hash portion of the URL is never sent to the server, and it causes the page to automatically scroll to the first element with a matching ID (or the first <a> element with a matching name attribute). Google recommends avoiding the use of “#” in URLs.
    • Hashbang (#!) (and escaped_fragment URLs) – Hashbang URLs were a hack to support crawlers (which Google now wants to avoid and only Bing still supports). Many a moon ago, Google and Bing developed a complicated AJAX solution whereby a pretty (#!) URL serving the user experience co-existed with an equivalent escaped_fragment, HTML-based experience for bots. Google has since backtracked on this recommendation, preferring to receive the exact user experience. With escaped fragments, there are two experiences:
      • Original Experience (aka Pretty URL): This URL must either contain a #! (hashbang) to indicate that an escaped fragment exists, or include a meta element indicating so (<meta name="fragment" content="!">).
      • Escaped Fragment (aka Ugly URL, HTML snapshot): This URL replaces the hashbang (#!) with "_escaped_fragment_" and serves the HTML snapshot. It is called the ugly URL because it’s long and looks like (and for all intents and purposes is) a hack.


  • Recommended:
    • pushState History API – pushState is navigation-based and part of the History API (think: your web browsing history). Essentially, pushState updates the URL in the address bar, and only what needs to change on the page is updated. It allows JavaScript sites to leverage “clean” URLs, and it is currently supported by Google when it’s used to support browser navigation for client-side or hybrid rendering (a minimal sketch follows this list).
      • A good use of pushState is for infinite scroll (i.e., as the user hits new parts of the page the URL will update). Ideally, if the user refreshes the page, the experience will land them in the exact same spot. However, they do not need to refresh the page, as the content updates as they scroll down, while the URL is updated in the address bar.
      • Example: A good example of a search engine-friendly infinite scroll implementation, created by Google’s John Mueller (go figure), can be found here. He technically leverages replaceState(), which doesn’t offer the same back-button functionality as pushState.
      • Read more: Mozilla PushState History API Documents
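
A minimal sketch of the pushState pattern for infinite scroll (the helper functions and URL format are assumptions, not a specific implementation):

    // When the next "page" of results is appended, update the address bar so
    // the URL reflects what is currently on screen.
    var nextPage = 2;

    window.addEventListener('scroll', function () {
      if (nearBottomOfPage()) {             // hypothetical scroll-depth check
        loadResultsForPage(nextPage);       // hypothetical AJAX fetch-and-render
        history.pushState({ page: nextPage }, '', '/results?page=' + nextPage);
        nextPage += 1;
      }
    });

    // Handle back/forward navigation so users land on the state the URL names.
    window.addEventListener('popstate', function (event) {
      if (event.state) {
        loadResultsForPage(event.state.page);
      }
    });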

Obtainability

Search engines have been shown to employ headless browsing to render the DOM to gain a better understanding of the user’s experience and the content on the page. That is to say, Google can process some JavaScript and uses the DOM (instead of the raw HTML document).

At the same time, there are situations where search engines struggle to comprehend JavaScript. Nobody wants a Hulu situation to happen to their site or a client’s site. It is crucial to understand how bots are interacting with your onsite content. When you aren’t sure, test.

Assuming we’re talking about a search engine bot that executes JavaScript, there are a few important elements for search engines to be able to obtain content:

  • If the user must interact for something to fire, search engines probably aren’t seeing it.
    • Google is a lazy user. It doesn’t click, it doesn’t scroll, and it doesn’t log in. If the full UX demands action from the user, special precautions should be taken to ensure that bots are receiving an equivalent experience.
  • If the JavaScript executes after the load event fires plus ~5 seconds*, search engines may not be seeing it (see the sketch after this list).
    • *John Mueller mentioned that there is no specific timeout value; however, sites should aim to load within five seconds.
    • *Screaming Frog tests show a correlation to five seconds to render content.
    • *The load event plus five seconds is what Google’s PageSpeed Insights, Mobile Friendliness Tool, and Fetch as Google use; check out Max Prin’s test timer.
  • If there are errors within the JavaScript, both browsers and search engines may be unable to execute the code in full and can miss sections of the page.
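
As an illustration of the timing risk (the element ID and delay are made up), content injected like this is likely to be missed:

    window.addEventListener('load', function () {
      setTimeout(function () {
        // Fires ~7 seconds after the load event, which is past the ~5-second
        // window the tools above appear to use, so bots may never see it.
        document.getElementById('description').textContent =
          'Late-loaded copy that search engines may not index.';
      }, 7000);
    });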

How to make sure Google and other search engines can get your content

1. TEST

The most popular solution to resolving JavaScript is probably not resolving anything (grab a coffee and let Google work its algorithmic brilliance). Providing Google with the same experience as searchers is Google’s preferred scenario.

Google first announced being able to “better understand the web (i.e., JavaScript)” in May 2014. Industry experts suggested that Google could crawl JavaScript well before this announcement. The iPullRank team offered two great pieces on this in 2011: Googlebot is Chrome and How smart are Googlebots? (thank you, Josh and Mike). Adam Audette’s 2015 study, Google can crawl JavaScript and leverages the DOM, confirmed it. Therefore, if you can see your content in the DOM, chances are it is being parsed by Google.

[Image: adamaudette: "I don't always JavaScript, but when I do, I know Google can crawl the DOM and dynamically generated HTML"]

Recently, Bartosz Góralewicz performed a cool experiment testing a combination of various JavaScript libraries and frameworks to determine how Google interacts with the pages (e.g., are they indexing URLs/content? How does GSC interact? Etc.). It ultimately showed that Google is able to interact with many forms of JavaScript and highlighted certain frameworks as perhaps more challenging. John Mueller even started a JavaScript search group (from what I’ve read, it’s fairly therapeutic).

All of these studies are amazing and help SEOs understand when to be concerned and take a proactive role. However, before you determine that sitting back is the right solution for your site, I recommend being actively cautious by experimenting with small sections. Think: Jim Collins’s “bullets, then cannonballs” philosophy from his book Great by Choice:

“A bullet is an empirical test aimed at learning what works and meets three criteria: a bullet must be low-cost, low-risk, and low-distraction… 10Xers use bullets to empirically validate what will actually work. Based on that empirical validation, they then concentrate their resources to fire a cannonball, enabling large returns from concentrated bets.”

Consider testing and reviewing through the following:

  1. Confirm that your content is appearing within the DOM.
  2. Test a subset of pages to see if Google can index your content.
    • Manually check quotes from your content (e.g., via a site: query; example below).
    • Fetch as Google and see if your content appears.
    • Fetch as Google supposedly occurs around the load event or before timeout. It’s a great test to check whether Google will be able to see your content and whether you’re blocking JavaScript in your robots.txt. Although Fetch as Google is not foolproof, it’s a good starting point.
    • Note: If you aren’t verified in GSC, try TechnicalSEO.com’s Fetch and Render As Any Bot tool.
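
For the manual spot-check, one common approach is a quoted search restricted to your domain (the domain and phrase below are placeholders):

    site:example.com "a unique sentence that only appears after JavaScript renders"

If the page is indexed but the quoted, JS-rendered phrase never matches, that’s a hint the content isn’t being picked up.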

After you’ve tested all this, what if something’s not working and search engines and bots are struggling to index and obtain your content? Perhaps you’re concerned about alternative search engines (DuckDuckGo, Facebook, LinkedIn, etc.), or maybe you’re leveraging meta information that needs to be parsed by other bots, such as Twitter summary cards or Facebook Open Graph tags. If any of this is identified in testing or presents itself as a concern, an HTML snapshot may be the only option.

2. HTML SNAPSHOTS
What are HTML snapshots?

HTML snapshots are a fully rendered page (as one might see in the DOM) that can be returned to search engine bots (think: a static HTML version of the DOM).

Google introduced HTML snapshots in 2009, deprecated (but still supported) them in 2015, and awkwardly mentioned them as an element to “avoid” in late 2016. HTML snapshots are a contentious topic with Google. However, they’re important to understand, because in certain situations they’re necessary.

If search engines (or sites like Facebook) cannot grasp your JavaScript, it’s better to return an HTML snapshot than not to have your content indexed and understood at all. Ideally, your site would leverage some form of user-agent detection on the server side and return the HTML snapshot to the bot.
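
A minimal sketch of that server-side pattern, here using Node.js with Express (the bot list and the getSnapshotFor() helper are assumptions, not a recommended implementation):

    var express = require('express');
    var app = express();

    // Very rough bot detection via the User-Agent header.
    var BOT_PATTERN = /googlebot|bingbot|baiduspider|facebookexternalhit|twitterbot|linkedinbot/i;

    app.get('*', function (req, res) {
      var ua = req.headers['user-agent'] || '';
      if (BOT_PATTERN.test(ua)) {
        // Hypothetical helper that returns the pre-rendered HTML snapshot for
        // this URL (e.g., generated ahead of time with a headless browser).
        // The snapshot must match what users see, or it risks being cloaking.
        res.send(getSnapshotFor(req.path));
      } else {
        // Regular users get the normal client-rendered application shell.
        res.sendFile(__dirname + '/index.html');
      }
    });

    app.listen(3000);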

At the same time, one must recognize that Google wants the same experience as the user (i.e., only provide Google with an HTML snapshot if the tests are dire and the JavaScript search group cannot provide support for your situation).

Considerations

When weighing HTML snapshots, you must consider that Google has deprecated this AJAX recommendation. Although Google technically still supports it, Google recommends avoiding it. Yes, Google changed its mind and now wants to receive the same experience as the user. This direction makes sense, as it allows the bot to receive an experience that is more true to the user experience.

A second consideration relates to the risk of cloaking. If the HTML snapshots are found not to represent the experience on the page, it’s considered a cloaking risk. Straight from the source:

“The HTML snapshot must contain the same content as the end user would see in a browser. If this is not the case, it may be considered cloaking.”
– Google Developer AJAX Crawling FAQs

Benefits

Despite the considerations, HTML snapshots have powerful advantages:

  1. Knowledge that search engines and crawlers will be able to understand the experience.
    • Certain types of JavaScript may be harder for Google to grasp (cough… Angular 2, the successor to AngularJS …cough).
  2. Other search engines and crawlers (think: Bing, Facebook) will be able to understand the experience.
    • Bing, among other search engines, has not stated that it can crawl and index JavaScript. HTML snapshots may be the only solution for a JavaScript-heavy site. As always, test to make sure that this is the case before diving in.

"It's not just Google understanding your JavaScript. It's also about the speed." -DOM - "It's not just about Google understanding your Javascript. it's also about your perceived latency." -DOM

Site latency

When browsers receive an HTML document and create the DOM (although there is some level of pre-scanning), most resources are loaded as they appear within the HTML document. This means that if you have a huge file toward the top of your HTML document, a browser will load that immense file first.

The concept of Google’s critical rendering path is to load what the user needs as soon as possible, which can be translated to → “get everything above-the-fold in front of the user, ASAP.”

Critical Rendering Path – Optimized Rendering Loads Progressively ASAP:

[Image: progressive page rendering]

However, if you have unnecessary resources or JavaScript files clogging up the page’s ability to load, you get “render-blocking JavaScript.” Meaning: your JavaScript is blocking the page from appearing to load quickly (i.e., hurting perceived latency).

Render-blocking JavaScript – Solutions

If you analyze your page speed results (through tools like Google’s PageSpeed Insights, WebPageTest.org, Catchpoint, etc.) and determine that there is a render-blocking JavaScript issue, here are three potential solutions (a minimal markup sketch follows the list):

  1. Inline: Add the JavaScript directly in the HTML document.
  2. Async: Make the JavaScript asynchronous (i.e., add the “async” attribute to the script tag).
  3. Defer: Defer the JavaScript (i.e., add the “defer” attribute or place the script lower within the HTML).
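
In markup, the three options look roughly like this (file names are placeholders):

    <!-- 1. Inline: small, critical JavaScript embedded directly in the HTML -->
    <script>
      document.documentElement.className += ' js';
    </script>

    <!-- 2. Async: downloaded in parallel and executed as soon as it arrives -->
    <script async src="/js/analytics.js"></script>

    <!-- 3. Defer: downloaded in parallel but executed only after the document
         has been parsed (and in order) -->
    <script defer src="/js/widgets.js"></script>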

!!! Important note: It’s important to understand that scripts must be arranged in order of precedence. Scripts that are used to load the above-the-fold content must be prioritized and should not be deferred. Also, any script that references another file can only be used after the referenced file has loaded. Make sure to work closely with your development team to confirm that there are no interruptions to the user’s experience.

Read more: Google Developer’s Speed Documentation


TL;DR – Moral of the story

Crawlers and search engines will do their best to crawl, execute, and interpret your JavaScript, but it is not guaranteed. Make sure your content is crawlable, obtainable, and isn’t developing site latency obstructions. The key = every situation demands testing. Based on the results, evaluate potential solutions.

Thanks: Thank you Max Prin (@maxxeight) for reviewing this content piece and sharing your knowledge, insight, and wisdom. It wouldn’t be the same without you.


Posted by webpage development and how to code basic JavaScript.

javacssseo.gif

Image sources: 1, 2, 3

JavaScript is either placed in the HTML document within <script> tags (i.e., it is embedded in the HTML) or linked/referenced. There are currently a plethora of JavaScript libraries and frameworks, including jQuery, AngularJS, ReactJS, EmberJS, etc.

JavaScript libraries and frameworks:

What is AJAX?

AJAX, or Asynchronous JavaScript and XML, is a set of web development techniques combining JavaScript and XML that allows web applications to communicate with a server in the background without interfering with the current page. Asynchronous means that other functions or lines of code can run while the async script is running. XML used to be the primary language to pass data; however, the term AJAX is used for all types of data transfers (including JSON; I guess “AJAJ” doesn’t sound as clean as “AJAX” [pun intended]).

A common use of AJAX is to update the content or layout of a webpage without initiating a full page refresh. Normally, when a page loads, all the assets on the page must be requested and fetched from the server and then rendered on the page. However, with AJAX, only the assets that differ between pages need to be loaded, which improves the user experience as they do not have to refresh the entire page.

One can think of AJAX as mini server calls. A good example of AJAX in action is Google Maps. The page updates without a full page reload (i.e., mini server calls are being used to load content as the user navigates).

Related image

Image source

What is the Document Object Model (DOM)?

As an SEO professional, you need to understand what the DOM is, because it’s what Google is using to analyze and understand webpages.

The DOM is what you see when you “Inspect Element” in a browser. Simply put, you can think of the DOM as the steps the browser takes after receiving the HTML document to render the page.

The first thing the browser receives is the HTML document. After that, it will start parsing the content within this document and fetch additional resources, such as images, CSS, and JavaScript files.

The DOM is what forms from this parsing of information and resources. One can think of it as a structured, organized version of the webpage’s code.

Nowadays the DOM is often very different from the initial HTML document, due to what’s collectively called dynamic HTML. Dynamic HTML is the ability for a page to change its content depending on user input, environmental conditions (e.g. time of day), and other variables, leveraging HTML, CSS, and JavaScript.

Simple example with a <title> tag that is populated through JavaScript:

HTML source

DOM

What is headless browsing?

Headless browsing is simply the action of fetching webpages without the user interface. It is important to understand because Google, and now Baidu, leverage headless browsing to gain a better understanding of the user’s experience and the content of webpages.

PhantomJS and Zombie.js are scripted headless browsers, typically used for automating web interaction for testing purposes, and rendering static HTML snapshots for initial requests (pre-rendering).


Why can JavaScript be challenging for SEO? (and how to fix issues)

There are three (3) primary reasons to be concerned about JavaScript on your site:

  1. Crawlability: Bots’ ability to crawl your site.
  2. Obtainability: Bots’ ability to access information and parse your content.
  3. Perceived site latency: AKA the Critical Rendering Path.

Crawlability

Are bots able to find URLs and understand your site’s architecture? There are two important elements here:

  1. Blocking search engines from your JavaScript (even accidentally).
  2. Proper internal linking, not leveraging JavaScript events as a replacement for HTML tags.

Why is blocking JavaScript such a big deal?

If search engines are blocked from crawling JavaScript, they will not be receiving your site’s full experience. This means search engines are not seeing what the end user is seeing. This can reduce your site’s appeal to search engines and could eventually be considered cloaking (if the intent is indeed malicious).

Fetch as Google and TechnicalSEO.com’s robots.txt and Fetch and Render testing tools can help to identify resources that Googlebot is blocked from.

The easiest way to solve this problem is through providing search engines access to the resources they need to understand your user experience.

!!! Important note: Work with your development team to determine which files should and should not be accessible to search engines.

Internal linking

Internal linking should be implemented with regular anchor tags within the HTML or the DOM (using an a hrefs=”www.example.com” HTML tag) versus leveraging JavaScript functions to allow the user to traverse the site.

Essentially: Don’t use JavaScript’s onclick events as a replacement for internal linking. While end URLs might be found and crawled (through strings in JavaScript code or XML sitemaps), they won’t be associated with the global navigation of the site.

Internal linking is a strong signal to search engines regarding the site’s architecture and importance of pages. In fact, internal links are so strong that they can (in certain situations) override “SEO hints” such as canonical tags.

URL structure

Historically, JavaScript-based websites (aka “AJAX sites”) were using fragment identifiers (#) within URLs.

  • Not recommended:
    • The Lone Hash (#) – The lone pound symbol is not crawlable. It is used to identify anchor link (aka jump links). These are the links that allow one to jump to a piece of content on a page. Anything after the lone hash portion of the URL is never sent to the server and will cause the page to automatically scroll to the first element with a matching ID (or the first <a> element with a name of the following information). Google recommends avoiding the use of “#” in URLs.
    • Hashbang (#!) (and escaped_fragments URLs) – Hashbang URLs were a hack to support crawlers (Google wants to avoid now and only Bing supports). Many a moon ago, Google and Bing developed a complicated AJAX solution, whereby a pretty (#!) URL with the UX co-existed with an equivalent escaped_fragment HTML-based experience for bots. Google has since backtracked on this recommendation, preferring to receive the exact user experience. In escaped fragments, there are two experiences here:
      • Original Experience (aka Pretty URL): This URL must either have a #! (hashbang) within the URL to indicate that there is an escaped fragment or a meta element indicating that an escaped fragment exists (<meta name=”fragment” content=”!”>).
      • Escaped Fragment (aka Ugly URL, HTML snapshot): This URL replace the hashbang (#!) with “_escaped_fragment_” and serves the HTML snapshot. It is called the ugly URL because it’s long and looks like (and for all intents and purposes is) a hack.

Image result

Image source

  • Recommended:
    • pushState History API – PushState is navigation-based and part of the History API (think: your web browsing history). Essentially, pushState updates the URL in the address bar and only what needs to change on the page is updated. It allows JS sites to leverage “clean” URLs. PushState is currently supported by Google, when supporting browser navigation for client-side or hybrid rendering.
      • A good use of pushState is for infinite scroll (i.e., as the user hits new parts of the page the URL will update). Ideally, if the user refreshes the page, the experience will land them in the exact same spot. However, they do not need to refresh the page, as the content updates as they scroll down, while the URL is updated in the address bar.
      • Example: A good example of a search engine-friendly infinite scroll implementation, created by Google’s John Mueller (go figure), can be found here. He technically leverages the replaceState(), which doesn’t include the same back button functionality as pushState.
      • Read more: Mozilla PushState History API Documents

Obtainability

Search engines have been shown to employ headless browsing to render the DOM to gain a better understanding of the user’s experience and the content on page. That is to say, Google can process some JavaScript and uses the DOM (instead of the HTML document).

At the same time, there are situations where search engines struggle to comprehend JavaScript. Nobody wants a Hulu situation to happen to their site or a client’s site. It is crucial to understand how bots are interacting with your onsite content. When you aren’t sure, test.

Assuming we’re talking about a search engine bot that executes JavaScript, there are a few important elements for search engines to be able to obtain content:

  • If the user must interact for something to fire, search engines probably aren’t seeing it.
    • Google is a lazy user. It doesn’t click, it doesn’t scroll, and it doesn’t log in. If the full UX demands action from the user, special precautions should be taken to ensure that bots are receiving an equivalent experience.
  • If the JavaScript occurs after the JavaScript load event fires plus ~5-seconds*, search engines may not be seeing it.
    • *John Mueller mentioned that there is no specific timeout value; however, sites should aim to load within five seconds.
    • *Screaming Frog tests show a correlation to five seconds to render content.
    • *The load event plus five seconds is what Google’s PageSpeed Insights, Mobile Friendliness Tool, and Fetch as Google use; check out Max Prin’s test timer.
  • If there are errors within the JavaScript, both browsers and search engines won’t be able to go through and potentially miss sections of pages if the entire code is not executed.

How to make sure Google and other search engines can get your content

1. TEST

The most popular solution to resolving JavaScript is probably not resolving anything (grab a coffee and let Google work its algorithmic brilliance). Providing Google with the same experience as searchers is Google’s preferred scenario.

Google first announced being able to “better understand the web (i.e., JavaScript)” in May 2014. Industry experts suggested that Google could crawl JavaScript way before this announcement. The iPullRank team offered two great pieces on this in 2011: Googlebot is Chrome and How smart are Googlebots? (thank you, Josh and Mike). Adam Audette’s Google can crawl JavaScript and leverages the DOM in 2015 confirmed. Therefore, if you can see your content in the DOM, chances are your content is being parsed by Google.

adamaudette - I don't always JavaScript, but when I do, I know google can crawl the dom and dynamically generated HTML

Recently, Barry Goralewicz performed a cool experiment testing a combination of various JavaScript libraries and frameworks to determine how Google interacts with the pages (e.g., are they indexing URL/content? How does GSC interact? Etc.). It ultimately showed that Google is able to interact with many forms of JavaScript and highlighted certain frameworks as perhaps more challenging. John Mueller even started a JavaScript search group (from what I’ve read, it’s fairly therapeutic).

All of these studies are amazing and help SEOs understand when to be concerned and take a proactive role. However, before you determine that sitting back is the right solution for your site, I recommend being actively cautious by experimenting with small section Think: Jim Collin’s “bullets, then cannonballs” philosophy from his book Great by Choice:

“A bullet is an empirical test aimed at learning what works and meets three criteria: a bullet must be low-cost, low-risk, and low-distraction… 10Xers use bullets to empirically validate what will actually work. Based on that empirical validation, they then concentrate their resources to fire a cannonball, enabling large returns from concentrated bets.”

Consider testing and reviewing through the following:

  1. Confirm that your content is appearing within the DOM.
  2. Test a subset of pages to see if Google can index content.
  • Manually check quotes from your content.
  • Fetch with Google and see if content appears.
  • Fetch with Google supposedly occurs around the load event or before timeout. It’s a great test to check to see if Google will be able to see your content and whether or not you’re blocking JavaScript in your robots.txt. Although Fetch with Google is not foolproof, it’s a good starting point.
  • Note: If you aren’t verified in GSC, try Technicalseo.com’s Fetch and Render As Any Bot Tool.

After you’ve tested all this, what if something’s not working and search engines and bots are struggling to index and obtain your content? Perhaps you’re concerned about alternative search engines (DuckDuckGo, Facebook, LinkedIn, etc.), or maybe you’re leveraging meta information that needs to be parsed by other bots, such as Twitter summary cards or Facebook Open Graph tags. If any of this is identified in testing or presents itself as a concern, an HTML snapshot may be the only decision.

2. HTML SNAPSHOTS
What are HTmL snapshots?

HTML snapshots are a fully rendered page (as one might see in the DOM) that can be returned to search engine bots (think: a static HTML version of the DOM).

Google introduced HTML snapshots 2009, deprecated (but still supported) them in 2015, and awkwardly mentioned them as an element to “avoid” in late 2016. HTML snapshots are a contentious topic with Google. However, they’re important to understand, because in certain situations they’re necessary.

If search engines (or sites like Facebook) cannot grasp your JavaScript, it’s better to return an HTML snapshot than not to have your content indexed and understood at all. Ideally, your site would leverage some form of user-agent detection on the server side and return the HTML snapshot to the bot.

At the same time, one must recognize that Google wants the same experience as the user (i.e., only provide Google with an HTML snapshot if the tests are dire and the JavaScript search group cannot provide support for your situation).

Considerations

When considering HTML snapshots, you must consider that Google has deprecated this AJAX recommendation. Although Google technically still supports it, Google recommends avoiding it. Yes, Google changed its mind and now want to receive the same experience as the user. This direction makes sense, as it allows the bot to receive an experience more true to the user experience.

A second consideration factor relates to the risk of cloaking. If the HTML snapshots are found to not represent the experience on the page, it’s considered a cloaking risk. Straight from the source:

“The HTML snapshot must contain the same content as the end user would see in a browser. If this is not the case, it may be considered cloaking.”
– Google Developer AJAX Crawling FAQs

Benefits

Despite the considerations, HTML snapshots have powerful advantages:

  1. Knowledge that search engines and crawlers will be able to understand the experience.
    • Certain types of JavaScript may be harder for Google to grasp (cough… Angular (also colloquially referred to as AngularJS 2) …cough).
  2. Other search engines and crawlers (think: Bing, Facebook) will be able to understand the experience.
    • Bing, among other search engines, has not stated that it can crawl and index JavaScript. HTML snapshots may be the only solution for a JavaScript-heavy site. As always, test to make sure that this is the case before diving in.

"It's not just Google understanding your JavaScript. It's also about the speed." -DOM - "It's not just about Google understanding your Javascript. it's also about your perceived latency." -DOM

Site latency

When browsers receive an HTML document and create the DOM (although there is some level of pre-scanning), most resources are loaded as they appear within the HTML document. This means that if you have a huge file toward the top of your HTML document, a browser will load that immense file first.

The concept of Google’s critical rendering path is to load what the user needs as soon as possible, which can be translated to → “get everything above-the-fold in front of the user, ASAP.”

Critical Rendering Path – Optimized Rendering Loads Progressively ASAP:

progressive page rendering

Image source

However, if you have unnecessary resources or JavaScript files clogging up the page’s ability to load, you get “render-blocking JavaScript.” Meaning: your JavaScript is blocking the page’s potential to appear as if it’s loading faster (also called: perceived latency).

Render-blocking JavaScript – Solutions

If you analyze your page speed results (through tools like Page Speed Insights Tool, WebPageTest.org, CatchPoint, etc.) and determine that there is a render-blocking JavaScript issue, here are three potential solutions:

  1. Inline: Add the JavaScript in the HTML document.
  2. Async: Make JavaScript asynchronous (i.e., add “async” attribute to HTML tag).
  3. Defer: By placing JavaScript lower within the HTML.

!!! Important note: It’s important to understand that scripts must be arranged in order of precedence. Scripts that are used to load the above-the-fold content must be prioritized and should not be deferred. Also, any script that references another file can only be used after the referenced file has loaded. Make sure to work closely with your development team to confirm that there are no interruptions to the user’s experience.

Read more: Google Developer’s Speed Documentation


TL;DR – Moral of the story

Crawlers and search engines will do their best to crawl, execute, and interpret your JavaScript, but it is not guaranteed. Make sure your content is crawlable, obtainable, and isn’t developing site latency obstructions. The key = every situation demands testing. Based on the results, evaluate potential solutions.

Thanks: Thank you Max Prin (@maxxeight) for reviewing this content piece and sharing your knowledge, insight, and wisdom. It wouldn’t be the same without you.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

“”

JavaScript & SEO: Making Your Bot Experience As Good As Your User Experience

Posted by webpage development and how to code basic JavaScript.

javacssseo.gif

Image sources: 1, 2, 3

JavaScript is either placed in the HTML document within <script> tags (i.e., it is embedded in the HTML) or linked/referenced. There are currently a plethora of JavaScript libraries and frameworks, including jQuery, AngularJS, ReactJS, EmberJS, etc.

JavaScript libraries and frameworks:

What is AJAX?

AJAX, or Asynchronous JavaScript and XML, is a set of web development techniques combining JavaScript and XML that allows web applications to communicate with a server in the background without interfering with the current page. Asynchronous means that other functions or lines of code can run while the async script is running. XML used to be the primary language to pass data; however, the term AJAX is used for all types of data transfers (including JSON; I guess “AJAJ” doesn’t sound as clean as “AJAX” [pun intended]).

A common use of AJAX is to update the content or layout of a webpage without initiating a full page refresh. Normally, when a page loads, all the assets on the page must be requested and fetched from the server and then rendered on the page. However, with AJAX, only the assets that differ between pages need to be loaded, which improves the user experience as they do not have to refresh the entire page.

One can think of AJAX as mini server calls. A good example of AJAX in action is Google Maps. The page updates without a full page reload (i.e., mini server calls are being used to load content as the user navigates).

Related image

Image source

What is the Document Object Model (DOM)?

As an SEO professional, you need to understand what the DOM is, because it’s what Google is using to analyze and understand webpages.

The DOM is what you see when you “Inspect Element” in a browser. Simply put, you can think of the DOM as the steps the browser takes after receiving the HTML document to render the page.

The first thing the browser receives is the HTML document. After that, it will start parsing the content within this document and fetch additional resources, such as images, CSS, and JavaScript files.

The DOM is what forms from this parsing of information and resources. One can think of it as a structured, organized version of the webpage’s code.

Nowadays the DOM is often very different from the initial HTML document, due to what’s collectively called dynamic HTML. Dynamic HTML is the ability for a page to change its content depending on user input, environmental conditions (e.g. time of day), and other variables, leveraging HTML, CSS, and JavaScript.

Simple example with a <title> tag that is populated through JavaScript:

HTML source

DOM

What is headless browsing?

Headless browsing is simply the action of fetching webpages without the user interface. It is important to understand because Google, and now Baidu, leverage headless browsing to gain a better understanding of the user’s experience and the content of webpages.

PhantomJS and Zombie.js are scripted headless browsers, typically used for automating web interaction for testing purposes, and rendering static HTML snapshots for initial requests (pre-rendering).


Why can JavaScript be challenging for SEO? (and how to fix issues)

There are three (3) primary reasons to be concerned about JavaScript on your site:

  1. Crawlability: Bots’ ability to crawl your site.
  2. Obtainability: Bots’ ability to access information and parse your content.
  3. Perceived site latency: AKA the Critical Rendering Path.

Crawlability

Are bots able to find URLs and understand your site’s architecture? There are two important elements here:

  1. Blocking search engines from your JavaScript (even accidentally).
  2. Proper internal linking, not leveraging JavaScript events as a replacement for HTML tags.

Why is blocking JavaScript such a big deal?

If search engines are blocked from crawling JavaScript, they will not be receiving your site’s full experience. This means search engines are not seeing what the end user is seeing. This can reduce your site’s appeal to search engines and could eventually be considered cloaking (if the intent is indeed malicious).

Fetch as Google and TechnicalSEO.com’s robots.txt and Fetch and Render testing tools can help to identify resources that Googlebot is blocked from.

The easiest way to solve this problem is through providing search engines access to the resources they need to understand your user experience.

!!! Important note: Work with your development team to determine which files should and should not be accessible to search engines.

Internal linking

Internal linking should be implemented with regular anchor tags within the HTML or the DOM (using an a hrefs=”www.example.com” HTML tag) versus leveraging JavaScript functions to allow the user to traverse the site.

Essentially: Don’t use JavaScript’s onclick events as a replacement for internal linking. While end URLs might be found and crawled (through strings in JavaScript code or XML sitemaps), they won’t be associated with the global navigation of the site.

Internal linking is a strong signal to search engines regarding the site’s architecture and importance of pages. In fact, internal links are so strong that they can (in certain situations) override “SEO hints” such as canonical tags.

URL structure

Historically, JavaScript-based websites (aka “AJAX sites”) were using fragment identifiers (#) within URLs.

  • Not recommended:
    • The Lone Hash (#) – The lone pound symbol is not crawlable. It is used to identify anchor link (aka jump links). These are the links that allow one to jump to a piece of content on a page. Anything after the lone hash portion of the URL is never sent to the server and will cause the page to automatically scroll to the first element with a matching ID (or the first <a> element with a name of the following information). Google recommends avoiding the use of “#” in URLs.
    • Hashbang (#!) (and escaped_fragments URLs) – Hashbang URLs were a hack to support crawlers (Google wants to avoid now and only Bing supports). Many a moon ago, Google and Bing developed a complicated AJAX solution, whereby a pretty (#!) URL with the UX co-existed with an equivalent escaped_fragment HTML-based experience for bots. Google has since backtracked on this recommendation, preferring to receive the exact user experience. In escaped fragments, there are two experiences here:
      • Original Experience (aka Pretty URL): This URL must either have a #! (hashbang) within the URL to indicate that there is an escaped fragment or a meta element indicating that an escaped fragment exists (<meta name=”fragment” content=”!”>).
      • Escaped Fragment (aka Ugly URL, HTML snapshot): This URL replace the hashbang (#!) with “_escaped_fragment_” and serves the HTML snapshot. It is called the ugly URL because it’s long and looks like (and for all intents and purposes is) a hack.

Image result

Image source

  • Recommended:
    • pushState History API – PushState is navigation-based and part of the History API (think: your web browsing history). Essentially, pushState updates the URL in the address bar and only what needs to change on the page is updated. It allows JS sites to leverage “clean” URLs. PushState is currently supported by Google, when supporting browser navigation for client-side or hybrid rendering.
      • A good use of pushState is for infinite scroll (i.e., as the user hits new parts of the page the URL will update). Ideally, if the user refreshes the page, the experience will land them in the exact same spot. However, they do not need to refresh the page, as the content updates as they scroll down, while the URL is updated in the address bar.
      • Example: A good example of a search engine-friendly infinite scroll implementation, created by Google’s John Mueller (go figure), can be found here. He technically leverages the replaceState(), which doesn’t include the same back button functionality as pushState.
      • Read more: Mozilla PushState History API Documents

Obtainability

Search engines have been shown to employ headless browsing to render the DOM to gain a better understanding of the user’s experience and the content on page. That is to say, Google can process some JavaScript and uses the DOM (instead of the HTML document).

At the same time, there are situations where search engines struggle to comprehend JavaScript. Nobody wants a Hulu situation to happen to their site or a client’s site. It is crucial to understand how bots are interacting with your onsite content. When you aren’t sure, test.

Assuming we’re talking about a search engine bot that executes JavaScript, there are a few important elements for search engines to be able to obtain content:

  • If the user must interact for something to fire, search engines probably aren’t seeing it.
    • Google is a lazy user. It doesn’t click, it doesn’t scroll, and it doesn’t log in. If the full UX demands action from the user, special precautions should be taken to ensure that bots are receiving an equivalent experience.
  • If the JavaScript occurs after the JavaScript load event fires plus ~5-seconds*, search engines may not be seeing it.
    • *John Mueller mentioned that there is no specific timeout value; however, sites should aim to load within five seconds.
    • *Screaming Frog tests show a correlation to five seconds to render content.
    • *The load event plus five seconds is what Google’s PageSpeed Insights, Mobile Friendliness Tool, and Fetch as Google use; check out Max Prin’s test timer.
  • If there are errors within the JavaScript, both browsers and search engines won’t be able to go through and potentially miss sections of pages if the entire code is not executed.

How to make sure Google and other search engines can get your content

1. TEST

The most popular solution to resolving JavaScript is probably not resolving anything (grab a coffee and let Google work its algorithmic brilliance). Providing Google with the same experience as searchers is Google’s preferred scenario.

Google first announced being able to “better understand the web (i.e., JavaScript)” in May 2014. Industry experts suggested that Google could crawl JavaScript way before this announcement. The iPullRank team offered two great pieces on this in 2011: Googlebot is Chrome and How smart are Googlebots? (thank you, Josh and Mike). Adam Audette’s Google can crawl JavaScript and leverages the DOM in 2015 confirmed. Therefore, if you can see your content in the DOM, chances are your content is being parsed by Google.

adamaudette - I don't always JavaScript, but when I do, I know google can crawl the dom and dynamically generated HTML

Recently, Barry Goralewicz performed a cool experiment testing a combination of various JavaScript libraries and frameworks to determine how Google interacts with the pages (e.g., are they indexing URL/content? How does GSC interact? Etc.). It ultimately showed that Google is able to interact with many forms of JavaScript and highlighted certain frameworks as perhaps more challenging. John Mueller even started a JavaScript search group (from what I’ve read, it’s fairly therapeutic).

All of these studies are amazing and help SEOs understand when to be concerned and take a proactive role. However, before you determine that sitting back is the right solution for your site, I recommend being actively cautious by experimenting with small section Think: Jim Collin’s “bullets, then cannonballs” philosophy from his book Great by Choice:

“A bullet is an empirical test aimed at learning what works and meets three criteria: a bullet must be low-cost, low-risk, and low-distraction… 10Xers use bullets to empirically validate what will actually work. Based on that empirical validation, they then concentrate their resources to fire a cannonball, enabling large returns from concentrated bets.”

Consider testing and reviewing through the following:

  1. Confirm that your content is appearing within the DOM.
  2. Test a subset of pages to see if Google can index content.
  • Manually check quotes from your content.
  • Fetch with Google and see if content appears.
  • Fetch with Google supposedly occurs around the load event or before timeout. It’s a great test to check to see if Google will be able to see your content and whether or not you’re blocking JavaScript in your robots.txt. Although Fetch with Google is not foolproof, it’s a good starting point.
  • Note: If you aren’t verified in GSC, try Technicalseo.com’s Fetch and Render As Any Bot Tool.

After you’ve tested all this, what if something’s not working and search engines and bots are struggling to index and obtain your content? Perhaps you’re concerned about alternative search engines (DuckDuckGo, Facebook, LinkedIn, etc.), or maybe you’re leveraging meta information that needs to be parsed by other bots, such as Twitter summary cards or Facebook Open Graph tags. If any of this is identified in testing or presents itself as a concern, an HTML snapshot may be the only decision.

2. HTML SNAPSHOTS
What are HTmL snapshots?

HTML snapshots are a fully rendered page (as one might see in the DOM) that can be returned to search engine bots (think: a static HTML version of the DOM).

Google introduced HTML snapshots 2009, deprecated (but still supported) them in 2015, and awkwardly mentioned them as an element to “avoid” in late 2016. HTML snapshots are a contentious topic with Google. However, they’re important to understand, because in certain situations they’re necessary.

If search engines (or sites like Facebook) cannot grasp your JavaScript, it’s better to return an HTML snapshot than not to have your content indexed and understood at all. Ideally, your site would leverage some form of user-agent detection on the server side and return the HTML snapshot to the bot.

At the same time, one must recognize that Google wants the same experience as the user (i.e., only provide Google with an HTML snapshot if the tests are dire and the JavaScript search group cannot provide support for your situation).

Considerations

When considering HTML snapshots, you must consider that Google has deprecated this AJAX recommendation. Although Google technically still supports it, Google recommends avoiding it. Yes, Google changed its mind and now want to receive the same experience as the user. This direction makes sense, as it allows the bot to receive an experience more true to the user experience.

A second consideration factor relates to the risk of cloaking. If the HTML snapshots are found to not represent the experience on the page, it’s considered a cloaking risk. Straight from the source:

“The HTML snapshot must contain the same content as the end user would see in a browser. If this is not the case, it may be considered cloaking.”
– Google Developer AJAX Crawling FAQs

Benefits

Despite the considerations, HTML snapshots have powerful advantages:

  1. Knowledge that search engines and crawlers will be able to understand the experience.
    • Certain types of JavaScript may be harder for Google to grasp (cough… Angular (also colloquially referred to as AngularJS 2) …cough).
  2. Other search engines and crawlers (think: Bing, Facebook) will be able to understand the experience.
    • Bing, among other search engines, has not stated that it can crawl and index JavaScript. HTML snapshots may be the only solution for a JavaScript-heavy site. As always, test to make sure that this is the case before diving in.

"It's not just Google understanding your JavaScript. It's also about the speed." -DOM - "It's not just about Google understanding your Javascript. it's also about your perceived latency." -DOM

Site latency

When browsers receive an HTML document and create the DOM (although there is some level of pre-scanning), most resources are loaded as they appear within the HTML document. This means that if you have a huge file toward the top of your HTML document, a browser will load that immense file first.

The concept of Google’s critical rendering path is to load what the user needs as soon as possible, which can be translated to → “get everything above-the-fold in front of the user, ASAP.”

Critical Rendering Path – Optimized Rendering Loads Progressively ASAP:

progressive page rendering

Image source

However, if you have unnecessary resources or JavaScript files clogging up the page’s ability to load, you get “render-blocking JavaScript.” Meaning: your JavaScript is blocking the page’s potential to appear as if it’s loading faster (also called: perceived latency).

Render-blocking JavaScript – Solutions

If you analyze your page speed results (through tools like Page Speed Insights Tool, WebPageTest.org, CatchPoint, etc.) and determine that there is a render-blocking JavaScript issue, here are three potential solutions:

  1. Inline: Add the JavaScript in the HTML document.
  2. Async: Make JavaScript asynchronous (i.e., add “async” attribute to HTML tag).
  3. Defer: By placing JavaScript lower within the HTML.

!!! Important note: It’s important to understand that scripts must be arranged in order of precedence. Scripts that are used to load the above-the-fold content must be prioritized and should not be deferred. Also, any script that references another file can only be used after the referenced file has loaded. Make sure to work closely with your development team to confirm that there are no interruptions to the user’s experience.

Read more: Google Developer’s Speed Documentation


TL;DR – Moral of the story

Crawlers and search engines will do their best to crawl, execute, and interpret your JavaScript, but it is not guaranteed. Make sure your content is crawlable, obtainable, and isn’t developing site latency obstructions. The key = every situation demands testing. Based on the results, evaluate potential solutions.

Thanks: Thank you Max Prin (@maxxeight) for reviewing this content piece and sharing your knowledge, insight, and wisdom. It wouldn’t be the same without you.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!


Paint by Numbers: Using Data to Create Great Content

Blog posts tagged “data” have earned a large number of solid backlinks and enticing traffic. The team at OkCupid accomplishes this by tapping its huge store of unique data, generated by its user base. Let’s take a look at one quick example: Congratulations, Graduates: Nobody Gives a Sh*t.

22% of female and 16% of male millennials say a college degree is mandatory for dating.

The blog post is fairly straightforward (and not particularly long), but it uses unique data that isn’t really available to the average person. Because OkCupid is in a privileged position, they are able to provide this sort of insight to their audience at large.

But maybe you don’t have millions of users with profiles on your site, so where can you look for first-party data? Well, here are a handful of ideas for the kinds of data your company or organization may already have that could easily be turned into interesting content:

  • Google Analytics, Search Console, and AdWords data: Do you see interesting trends around holidays? Perhaps you see more people searching for certain keywords at certain times. This can be even more interesting when there’s a local holiday (like a festival or event) that makes your data unique from the rest of the country.
  • Sales data: When do your sales go up or down? Do they coincide with events? Or do they happen to coincide with completely different kinds of keywords? Use Google Correlate, which will identify keywords that follow the same patterns as your data (a small sketch of that idea follows this list).
  • Survey data: Use your sales or lead history to run surveys and generate insightful content.
    • A clothing store could compare responses to questions about personality with the colors of clothing that people purchase (Potential headline: Is It True What They Say About Red?)
    • A car parts store could compare the size of certain accessories to favorite sports (Potential headline: Big Trucks and Big Hits)
    • An insurance provider could compare the type of insurance requested vs. the level of education (Potential headline: What Smart People Do Differently with Insurance)
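
To make “follow the same patterns” concrete, here is a tiny, purely illustrative JavaScript sketch (every number in it is made up) that measures how closely a keyword’s search interest tracks a sales series using a Pearson correlation; Google Correlate automated this kind of comparison at scale:

  // Hypothetical weekly sales figures and keyword-interest scores (made-up numbers).
  const weeklySales = [120, 135, 160, 210, 300, 280, 190, 140];
  const keywordInterest = [22, 25, 31, 44, 63, 58, 37, 26];

  // Pearson correlation; assumes both series have the same length.
  function pearson(xs, ys) {
    const n = xs.length;
    const mean = (arr) => arr.reduce((sum, v) => sum + v, 0) / n;
    const mx = mean(xs);
    const my = mean(ys);
    let cov = 0, varX = 0, varY = 0;
    for (let i = 0; i < n; i++) {
      const dx = xs[i] - mx;
      const dy = ys[i] - my;
      cov += dx * dy;
      varX += dx * dx;
      varY += dy * dy;
    }
    return cov / Math.sqrt(varX * varY);
  }

  // A value near 1 means the keyword rises and falls with your sales.
  console.log(pearson(weeklySales, keywordInterest).toFixed(2));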

There are most likely many more sources of unique, first-party data that you or your business have generated over the years that could be turned into great content. If you dig through the data long enough, you’ll hit pay dirt.

Relate

Data is foreign. It’s a language very few of us speak in day-to-day conversations, a notation meant for machines. This means we have to make data immediately relatable to the reader. We shouldn’t just ask “What does the data say?”, but rather “What does the data tell me?” The way we make data relatable is straightforward: organize your data by how people identify themselves. These can be the geographic, economic, biological, social, or cultural distinctions by which we frequently classify ourselves.

Many of the best examples of this sort of strategy involve geography (possibly because everybody lives somewhere, and it’s pretty uncontroversial to make generic claims about one location or another). Look at a map of the country and try not to look first at where you live. I’m a North Carolinian, and I quickly find myself interested in anything that compares my state to others.

So maybe you’re not OkCupid with millions of users and you can’t find unique data to share. Don’t worry, there’s still hope. The example here is a rather ingenious approach to using AdWords data to build a geographical story that’s relatable to any potential customer in the United States. The folks at Opulent used state-level Keyword Planner data to visualize popularity across the country in a piece they call the “State of Fashion.”

When I found this on Reddit’s DataIsBeautiful (where many of these examples come from), I immediately checked to see what performed best in North Carolina. I honestly couldn’t care less about popular fashion or jewelry brands, but my interest in North Carolina eclipsed that lack of interest. Geography-based data visualization has produced effective content related to sports, politics, beer, and even knitting.

If you leave with any practical ideas from this post, I believe this example should be it. Fire up an AdWords campaign and find out how consumer demand breaks down in your industry at a state-by-state level. Are you an online marketer who wants to attract clients in a particular sector? Here’s your opportunity to write a whitepaper on national demand. If you’re a local business, you can target Google Keyword Planner to your city and compare it with other metro areas around the country.

Surprise

Perhaps the greatest opportunity with data-focused content is the ability to truly surprise your readers. There’s something exciting about learning an interesting fact (who hasn’t seen one of these lying around and picked it up?). So, how do you make your data “pop”? How do you make numbers fascinating?

Perspective.

Let’s begin with a simple statistic:

The cost of ending polio between 2013 and 2018 is

$5.5 billion.

How does that number feel to you? Does it feel big or small? Is it interesting on its own? Probably not, so let’s try to spice it up a little.

$5.5 billion doesn’t seem like much when you realize people spend that amount on iPhones every two weeks. We could rid the world of polio for that much! Or, what if we present it like this…

In this light, it seems almost insane to spend that much money preventing just a few more polio cases relative to the large gains we could make on malaria. Of course, the statistics don’t tell the whole story. Polio is in the end-stages of eradication, where the cost-per-case is much higher, and as malaria is attacked, it too might see its cost-per-case increase. But the point remains the same: by giving the polio numbers some kind of context, some kind of forced perspective, we make the data far more intriguing and appealing.

So how would this work with content for your own site? Let’s take a look at an example from BestPlay.co, which wrote a piece on how board games are getting worse. Board games aren’t a data-centric industry, but that doesn’t keep them from producing cool content with data. Here’s a generic graph they offer within the piece, which showcases average game ratings.

There really isn’t much to see here. There’s nothing intrinsically shocking about the data as we view it. So how do they add perspective to make their point and give the user intrigue? Simple: use a historical perspective.

With this historical perspective, we can see game scores improving up to 2012, when they started to take a dive, the first multi-year dive in their recorded history. To draw users in, you use comparison to provide surprising perspectives.

Share

This final technique is the one I believe is most overlooked. Once you’ve produced your fancy piece of content, let your audience do some legwork for you by releasing the data set. There’s an entire community on the Internet just looking for great data sets that could make use of your data and cite your content in their own publications. You’ll find everything from Donald Trump’s tweets to Everything Lost at TSA to hand-drawn images of pineapples. While there’s a good chance your data set won’t be used, it may earn a few extra links when it is.

Putting it all together

What happens when a site owner combines all of these methods, exposing unique data and making it relatable and surprising, for a subject that seems averse to data? You get something like this: Jeans vs. Leggings.

This piece played the geography card for relatability:

They compared user interest in jeans to give perspective to the growth of interest in leggings:

Slice.com opens up their first-party data to create interesting, data-driven content that ultimately scores them links from sites like Style Magazine, Shape.com, and the NY Post. Looking at fashion through the lens of data meant great traffic and great shares.

How do you get started?

Get down and dirty with the data. Don’t wait until you end up with a nice report in your hands; start slicing and dicing things, looking for interesting patterns or results. You can begin with the data you already have: Google Analytics, Search Console, AdWords, and, if you’re a Moz customer, even your rank tracking or keyword research data. If none of those avenues work, look through the amazing data sources available on Reddit or Webhose. Find a story in the numbers by relating the data to your audience and making comparisons to provide perspective. It’s not a foolproof formula, but it’s pretty close. The right slice of data will cut straight through writer’s block.


Published by “data”-tagged blogs have earned a large number of solid backlinks and alluring traffic. They at OK Cupid accomplishes this by tapping their huge source of unique data, generated by their users list. Let us take a look at one quick example: Congratulations Graduates: Nobody Provides a Sh*t.

22% of female and 16% of male millenials say a college degree is mandatory for dating.

Your blog publish is rather straightforward (and never particularly lengthy) however it used unique data that is not really open to an average joe. Because OK Cupid is within a fortunate position, they are able to provide this sort of insight for their audience in particular.

But maybe you do not have millions of customers with profiles in your site where can to consider first party data? Well, here are a handful of ideas of the kinds of data your organization or organization may have which may be easily switched into interesting content:

  • Google Analytics, Search Console data and Adwords data: Would you see trends around holidays which are interesting? Possibly you see more people finder for several keywords at certain occasions. This may be much more interesting should there be a nearby holiday (just like a festival or event) which makes your computer data unique from all of those other country.
  • Sales data: When do profits increase or lower? Will they coincide with occasions? Or will they occur to coincide with completely various kinds of keywords? Use Google Correlate, that will identify keywords such as the following exactly the same patterns as the data.
  • Survey data: Make use of your sales or lead history to operate surveys and generate insightful content.
    • A clothing store could compare responses to questions regarding personality through the colors of clothing that individuals purchase (Potential headline: Could It Be True The Things They Say About Red?)
    • A vehicle parts store could compare how big certain accessories to favorite sports (Potential headline: Big Trucks and large Hits)
    • An insurance coverage provider could compare the kind of insurance requested versus. the amount of education (Potential headline: What Smart People Do Differently with Insurance)

You will find most likely tons more causes of unique, first-party data that you and your business have generated through the years which may be switched into great content. Should you search through the information lengthy enough, you’ll hit pay dirt.

Relate

Information is foreign. It is a language very little one speaks when they were young-to-day conversations, a notation intended for machines. This consideration mandates that we make data immediately relatable to the readers. We should not just ask “Exactly what does the information say?”, but rather “Exactly what does the information tell me?” The way we make data relatable is straightforward — organize your computer data because when people identify themselves. This is often geographic, economic, biological, social, or cultural distinctions that we frequently classify ourselves.

Most of the best types of this sort of strategy involve geography (possibly because everybody lives somewhere, and it is pretty non-questionable to create generic claims about one location or any other). Check out a roadmap of the country and do not look first towards where you reside. I am a North Carolinian, and that i quickly find myself thinking about something that compares my condition to other people.

So perhaps you are not OK Cupid with countless users and also you aren’t able to find unique data to talk about — don’t be concerned, there’s still hope. The instance here is a rather ingenious approach to using Pay Per Click data to construct a geographical story that’s relatable to the possible client within the U . s . States. The webmasters at Opulent used condition-level Keyword Planner to visualise recognition across the nation inside a piece they call the “Condition of fashion.Inch

After I found this on Reddit’s DataIsBeautiful (where many of these examples originate from), I immediately checked to determine what performed very best in New York. I honestly could not care less about popular fashion or jewellery brands, but my curiosity about New York eclipsed that insufficient interest. Geography-based data visualization has created effective content associated with in sports, politics, beer, as well as knitting.

Should you leave with any practical ideas out of this publish, I believe this situation has to be it. Turn on an Adwords campaign and discover how consumer demand breaks lower inside your industry in a condition-by-condition level. Are you currently an internet marketer and wish to attract clients inside a particular sector? Here is your opportunity to write a whitepaper on national demand. If you are a nearby business, one can market to Google Keyword Planner for your city and compare it with other metropolitan areas round the country.

Surprise

Possibly the finest chance with data-focused submissions are the opportunity to truly surprise your readers. There is something exciting about learning a fascinating fact (who has not seen one of these simple laying around and did not get it?). So, how can you help make your data “pop?” How can you make figures fascinating?

Perspective.

Let us begin with an easy statistic:

The price of ending polio between 2013 and 2018 is

$5.5 Billion Dollars.

So how exactly does time feel for you? Will it feel big or little? Could it be interesting by itself? Most likely not, let us try to spice up a little.

$5.5 billion dollars does not appear much whenever you realize people spend that quantity on iPhones every 2 days. We’re able to rid the field of polio for your much! Or, let’s say we present it such as this…

Within this light, it appears almost insane to invest much money stopping just a couple of more polio cases in accordance with the large gains we’re able to make on malaria. Obviously, the data don’t tell the entire story. Polio is incorporated in the finish-stages of eradication in which the cost-per-situation is a lot greater, so that as malaria is attacked, it too might find cost-per-situation increase. However the point continues to be the same: by providing the polio figures some kind of context, some kind of forced perspective, we result in the data much more intriguing, notable and appealing.

Just how would the work with content for your own personel site? Let us take a look at a good example from BestPlay.co, which authored a bit aboard Games are becoming Worse. Games aren’t an information-centric industry, however that does not have them from producing awesome quite happy with data. Here is a generic graph they offer within the piece which showcases average game ratings.

There really is not much to determine here. There is nothing intrinsically shocking concerning the data once we view it. Just how will they add perspective to create their point and provide the consumer intrigue? Simple — use a historic perspective.

With this particular historic perspective, we are able to see game scores improving up to 2012, once they started to consider a dive — the very first multi-year join in their recorded history. To attract users in, you utilize comparison to supply surprising perspectives.

Share

This final technique is the one which I believe is most overlooked. Once you have produced your fancy bit of content, enable your audience perform some leg meet your needs by releasing the information set. Likely to entire community from the Internet just searching for excellent data sets that could make the most of your computer data and cite your articles in their own individual publications. You’ll find everything coming from all Jesse Trump’s Tweets to Everything Lost at TSA to Hands-attracted Images of Pineapples. While there’s a high probability your computer data set will not be used, it may get a few extra links when it will.

Putting it altogether

What goes on whenever a website owner combines these kinds of methods — exposing unique data, which makes it relatable and surprising, for a subject that appears averse to data? You receive something similar to this: Jeans versus. Leggings.

This piece performed the geography card for relatability:

They compared user curiosity about jeans to provide perspective towards the development of interest in leggings:

Slice.com reveals their first-party data to create interesting, data-driven content that ultimately scores them links from sites as with Style Magazine, Shape.com, and also the NY Publish. Searching at fashion with the lens of information meant great traffic and great shares.

How do you get began?

Get lower and dirty using the data. Don’t hold back until you finish track of a pleasant report inside your hands, but start slicing and dicing things searching for interesting patterns or results. You can begin using the data you have: Google Analytics, Search Console, Pay Per Click, and, if you are a Moz customer, even your rank tracking data or market and keyword research data. If none of those avenues work, search through the astonishing data sources available on Reddit or WebHose. Locate a story within the figures by relating the information for your audience and making comparisons to supply perspective. It is not a foolproof formula, but it’s pretty close. The best slice of information will cut straight through writer’s block.

Join The Moz Top Ten, a semimonthly mailer updating you on top ten hottest bits of Search engine optimization news, tips, and rad links uncovered through the Moz team. Consider it as being your exclusive digest of stuff you do not have time for you to search lower but wish to read!

“”

JavaScript & SEO: Making Your Bot Experience As Good As Your User Experience

Posted by webpage development and how to code basic JavaScript.

javacssseo.gif

Image sources: 1, 2, 3

JavaScript is either placed in the HTML document within <script> tags (i.e., it is embedded in the HTML) or linked/referenced. There are currently a plethora of JavaScript libraries and frameworks, including jQuery, AngularJS, ReactJS, EmberJS, etc.

JavaScript libraries and frameworks:

What is AJAX?

AJAX, or Asynchronous JavaScript and XML, is a set of web development techniques combining JavaScript and XML that allows web applications to communicate with a server in the background without interfering with the current page. Asynchronous means that other functions or lines of code can run while the async script is running. XML used to be the primary language to pass data; however, the term AJAX is used for all types of data transfers (including JSON; I guess “AJAJ” doesn’t sound as clean as “AJAX” [pun intended]).

A common use of AJAX is to update the content or layout of a webpage without initiating a full page refresh. Normally, when a page loads, all the assets on the page must be requested and fetched from the server and then rendered on the page. However, with AJAX, only the assets that differ between pages need to be loaded, which improves the user experience as they do not have to refresh the entire page.

One can think of AJAX as mini server calls. A good example of AJAX in action is Google Maps. The page updates without a full page reload (i.e., mini server calls are being used to load content as the user navigates).

Related image

Image source

What is the Document Object Model (DOM)?

As an SEO professional, you need to understand what the DOM is, because it’s what Google is using to analyze and understand webpages.

The DOM is what you see when you “Inspect Element” in a browser. Simply put, you can think of the DOM as the steps the browser takes after receiving the HTML document to render the page.

The first thing the browser receives is the HTML document. After that, it will start parsing the content within this document and fetch additional resources, such as images, CSS, and JavaScript files.

The DOM is what forms from this parsing of information and resources. One can think of it as a structured, organized version of the webpage’s code.

Nowadays the DOM is often very different from the initial HTML document, due to what’s collectively called dynamic HTML. Dynamic HTML is the ability for a page to change its content depending on user input, environmental conditions (e.g. time of day), and other variables, leveraging HTML, CSS, and JavaScript.

Simple example with a <title> tag that is populated through JavaScript:

HTML source

DOM

What is headless browsing?

Headless browsing is simply the action of fetching webpages without the user interface. It is important to understand because Google, and now Baidu, leverage headless browsing to gain a better understanding of the user’s experience and the content of webpages.

PhantomJS and Zombie.js are scripted headless browsers, typically used for automating web interaction for testing purposes, and rendering static HTML snapshots for initial requests (pre-rendering).


Why can JavaScript be challenging for SEO? (and how to fix issues)

There are three (3) primary reasons to be concerned about JavaScript on your site:

  1. Crawlability: Bots’ ability to crawl your site.
  2. Obtainability: Bots’ ability to access information and parse your content.
  3. Perceived site latency: AKA the Critical Rendering Path.

Crawlability

Are bots able to find URLs and understand your site’s architecture? There are two important elements here:

  1. Blocking search engines from your JavaScript (even accidentally).
  2. Proper internal linking, not leveraging JavaScript events as a replacement for HTML tags.

Why is blocking JavaScript such a big deal?

If search engines are blocked from crawling JavaScript, they will not be receiving your site’s full experience. This means search engines are not seeing what the end user is seeing. This can reduce your site’s appeal to search engines and could eventually be considered cloaking (if the intent is indeed malicious).

Fetch as Google and TechnicalSEO.com’s robots.txt and Fetch and Render testing tools can help to identify resources that Googlebot is blocked from.

The easiest way to solve this problem is through providing search engines access to the resources they need to understand your user experience.

!!! Important note: Work with your development team to determine which files should and should not be accessible to search engines.

Internal linking

Internal linking should be implemented with regular anchor tags within the HTML or the DOM (using an a hrefs=”www.example.com” HTML tag) versus leveraging JavaScript functions to allow the user to traverse the site.

Essentially: Don’t use JavaScript’s onclick events as a replacement for internal linking. While end URLs might be found and crawled (through strings in JavaScript code or XML sitemaps), they won’t be associated with the global navigation of the site.

Internal linking is a strong signal to search engines regarding the site’s architecture and importance of pages. In fact, internal links are so strong that they can (in certain situations) override “SEO hints” such as canonical tags.

URL structure

Historically, JavaScript-based websites (aka “AJAX sites”) were using fragment identifiers (#) within URLs.

  • Not recommended:
    • The Lone Hash (#) – The lone pound symbol is not crawlable. It is used to identify anchor link (aka jump links). These are the links that allow one to jump to a piece of content on a page. Anything after the lone hash portion of the URL is never sent to the server and will cause the page to automatically scroll to the first element with a matching ID (or the first <a> element with a name of the following information). Google recommends avoiding the use of “#” in URLs.
    • Hashbang (#!) (and escaped_fragments URLs) – Hashbang URLs were a hack to support crawlers (Google wants to avoid now and only Bing supports). Many a moon ago, Google and Bing developed a complicated AJAX solution, whereby a pretty (#!) URL with the UX co-existed with an equivalent escaped_fragment HTML-based experience for bots. Google has since backtracked on this recommendation, preferring to receive the exact user experience. In escaped fragments, there are two experiences here:
      • Original Experience (aka Pretty URL): This URL must either have a #! (hashbang) within the URL to indicate that there is an escaped fragment or a meta element indicating that an escaped fragment exists (<meta name=”fragment” content=”!”>).
      • Escaped Fragment (aka Ugly URL, HTML snapshot): This URL replace the hashbang (#!) with “_escaped_fragment_” and serves the HTML snapshot. It is called the ugly URL because it’s long and looks like (and for all intents and purposes is) a hack.

Image result

Image source

  • Recommended:
    • pushState History API – PushState is navigation-based and part of the History API (think: your web browsing history). Essentially, pushState updates the URL in the address bar and only what needs to change on the page is updated. It allows JS sites to leverage “clean” URLs. PushState is currently supported by Google, when supporting browser navigation for client-side or hybrid rendering.
      • A good use of pushState is for infinite scroll (i.e., as the user hits new parts of the page the URL will update). Ideally, if the user refreshes the page, the experience will land them in the exact same spot. However, they do not need to refresh the page, as the content updates as they scroll down, while the URL is updated in the address bar.
      • Example: A good example of a search engine-friendly infinite scroll implementation, created by Google’s John Mueller (go figure), can be found here. He technically leverages the replaceState(), which doesn’t include the same back button functionality as pushState.
      • Read more: Mozilla PushState History API Documents

Obtainability

Search engines have been shown to employ headless browsing to render the DOM to gain a better understanding of the user’s experience and the content on page. That is to say, Google can process some JavaScript and uses the DOM (instead of the HTML document).

At the same time, there are situations where search engines struggle to comprehend JavaScript. Nobody wants a Hulu situation to happen to their site or a client’s site. It is crucial to understand how bots are interacting with your onsite content. When you aren’t sure, test.

Assuming we’re talking about a search engine bot that executes JavaScript, there are a few important elements for search engines to be able to obtain content:

  • If the user must interact for something to fire, search engines probably aren’t seeing it.
    • Google is a lazy user. It doesn’t click, it doesn’t scroll, and it doesn’t log in. If the full UX demands action from the user, special precautions should be taken to ensure that bots are receiving an equivalent experience.
  • If the JavaScript occurs after the JavaScript load event fires plus ~5-seconds*, search engines may not be seeing it.
    • *John Mueller mentioned that there is no specific timeout value; however, sites should aim to load within five seconds.
    • *Screaming Frog tests show a correlation to five seconds to render content.
    • *The load event plus five seconds is what Google’s PageSpeed Insights, Mobile Friendliness Tool, and Fetch as Google use; check out Max Prin’s test timer.
  • If there are errors within the JavaScript, both browsers and search engines won’t be able to go through and potentially miss sections of pages if the entire code is not executed.

How to make sure Google and other search engines can get your content

1. TEST

The most popular solution to resolving JavaScript is probably not resolving anything (grab a coffee and let Google work its algorithmic brilliance). Providing Google with the same experience as searchers is Google’s preferred scenario.

Google first announced being able to “better understand the web (i.e., JavaScript)” in May 2014. Industry experts suggested that Google could crawl JavaScript way before this announcement. The iPullRank team offered two great pieces on this in 2011: Googlebot is Chrome and How smart are Googlebots? (thank you, Josh and Mike). Adam Audette’s Google can crawl JavaScript and leverages the DOM in 2015 confirmed. Therefore, if you can see your content in the DOM, chances are your content is being parsed by Google.

adamaudette - I don't always JavaScript, but when I do, I know google can crawl the dom and dynamically generated HTML

Recently, Barry Goralewicz performed a cool experiment testing a combination of various JavaScript libraries and frameworks to determine how Google interacts with the pages (e.g., are they indexing URL/content? How does GSC interact? Etc.). It ultimately showed that Google is able to interact with many forms of JavaScript and highlighted certain frameworks as perhaps more challenging. John Mueller even started a JavaScript search group (from what I’ve read, it’s fairly therapeutic).

All of these studies are amazing and help SEOs understand when to be concerned and take a proactive role. However, before you determine that sitting back is the right solution for your site, I recommend being actively cautious by experimenting with small section Think: Jim Collin’s “bullets, then cannonballs” philosophy from his book Great by Choice:

“A bullet is an empirical test aimed at learning what works and meets three criteria: a bullet must be low-cost, low-risk, and low-distraction… 10Xers use bullets to empirically validate what will actually work. Based on that empirical validation, they then concentrate their resources to fire a cannonball, enabling large returns from concentrated bets.”

Consider testing and reviewing through the following:

  1. Confirm that your content is appearing within the DOM.
  2. Test a subset of pages to see if Google can index content.
  • Manually check quotes from your content.
  • Fetch with Google and see if content appears.
  • Fetch with Google supposedly occurs around the load event or before timeout. It’s a great test to check to see if Google will be able to see your content and whether or not you’re blocking JavaScript in your robots.txt. Although Fetch with Google is not foolproof, it’s a good starting point.
  • Note: If you aren’t verified in GSC, try Technicalseo.com’s Fetch and Render As Any Bot Tool.

After you’ve tested all this, what if something’s not working and search engines and bots are struggling to index and obtain your content? Perhaps you’re concerned about alternative search engines (DuckDuckGo, Facebook, LinkedIn, etc.), or maybe you’re leveraging meta information that needs to be parsed by other bots, such as Twitter summary cards or Facebook Open Graph tags. If any of this is identified in testing or presents itself as a concern, an HTML snapshot may be the only decision.

2. HTML SNAPSHOTS
What are HTmL snapshots?

HTML snapshots are a fully rendered page (as one might see in the DOM) that can be returned to search engine bots (think: a static HTML version of the DOM).

Google introduced HTML snapshots 2009, deprecated (but still supported) them in 2015, and awkwardly mentioned them as an element to “avoid” in late 2016. HTML snapshots are a contentious topic with Google. However, they’re important to understand, because in certain situations they’re necessary.

If search engines (or sites like Facebook) cannot grasp your JavaScript, it’s better to return an HTML snapshot than not to have your content indexed and understood at all. Ideally, your site would leverage some form of user-agent detection on the server side and return the HTML snapshot to the bot.

At the same time, one must recognize that Google wants the same experience as the user (i.e., only provide Google with an HTML snapshot if the tests are dire and the JavaScript search group cannot provide support for your situation).

Considerations

When considering HTML snapshots, you must consider that Google has deprecated this AJAX recommendation. Although Google technically still supports it, Google recommends avoiding it. Yes, Google changed its mind and now want to receive the same experience as the user. This direction makes sense, as it allows the bot to receive an experience more true to the user experience.

A second consideration factor relates to the risk of cloaking. If the HTML snapshots are found to not represent the experience on the page, it’s considered a cloaking risk. Straight from the source:

“The HTML snapshot must contain the same content as the end user would see in a browser. If this is not the case, it may be considered cloaking.”
– Google Developer AJAX Crawling FAQs

Benefits

Despite the considerations, HTML snapshots have powerful advantages:

  1. Knowledge that search engines and crawlers will be able to understand the experience.
    • Certain types of JavaScript may be harder for Google to grasp (cough… Angular (also colloquially referred to as AngularJS 2) …cough).
  2. Other search engines and crawlers (think: Bing, Facebook) will be able to understand the experience.
    • Bing, among other search engines, has not stated that it can crawl and index JavaScript. HTML snapshots may be the only solution for a JavaScript-heavy site. As always, test to make sure that this is the case before diving in.

"It's not just Google understanding your JavaScript. It's also about the speed." -DOM - "It's not just about Google understanding your Javascript. it's also about your perceived latency." -DOM

Site latency

When browsers receive an HTML document and create the DOM (although there is some level of pre-scanning), most resources are loaded as they appear within the HTML document. This means that if you have a huge file toward the top of your HTML document, a browser will load that immense file first.

The concept of Google’s critical rendering path is to load what the user needs as soon as possible, which can be translated to → “get everything above-the-fold in front of the user, ASAP.”

Critical Rendering Path – Optimized Rendering Loads Progressively ASAP:

progressive page rendering

Image source

However, if you have unnecessary resources or JavaScript files clogging up the page’s ability to load, you get “render-blocking JavaScript.” Meaning: your JavaScript is blocking the page’s potential to appear as if it’s loading faster (also called: perceived latency).

Render-blocking JavaScript – Solutions

If you analyze your page speed results (through tools like Page Speed Insights Tool, WebPageTest.org, CatchPoint, etc.) and determine that there is a render-blocking JavaScript issue, here are three potential solutions:

  1. Inline: Add the JavaScript in the HTML document.
  2. Async: Make JavaScript asynchronous (i.e., add “async” attribute to HTML tag).
  3. Defer: By placing JavaScript lower within the HTML.

!!! Important note: It’s important to understand that scripts must be arranged in order of precedence. Scripts that are used to load the above-the-fold content must be prioritized and should not be deferred. Also, any script that references another file can only be used after the referenced file has loaded. Make sure to work closely with your development team to confirm that there are no interruptions to the user’s experience.

Read more: Google Developer’s Speed Documentation


TL;DR – Moral of the story

Crawlers and search engines will do their best to crawl, execute, and interpret your JavaScript, but it is not guaranteed. Make sure your content is crawlable, obtainable, and isn’t developing site latency obstructions. The key = every situation demands testing. Based on the results, evaluate potential solutions.

Thanks: Thank you Max Prin (@maxxeight) for reviewing this content piece and sharing your knowledge, insight, and wisdom. It wouldn’t be the same without you.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

“”

JavaScript & SEO: Making Your Bot Experience As Good As Your User Experience

Posted by webpage development and how to code basic JavaScript.

javacssseo.gif

Image sources: 1, 2, 3

JavaScript is either placed in the HTML document within <script> tags (i.e., it is embedded in the HTML) or linked/referenced. There are currently a plethora of JavaScript libraries and frameworks, including jQuery, AngularJS, ReactJS, EmberJS, etc.

JavaScript libraries and frameworks:

What is AJAX?

AJAX, or Asynchronous JavaScript and XML, is a set of web development techniques combining JavaScript and XML that allows web applications to communicate with a server in the background without interfering with the current page. Asynchronous means that other functions or lines of code can run while the async script is running. XML used to be the primary language to pass data; however, the term AJAX is used for all types of data transfers (including JSON; I guess “AJAJ” doesn’t sound as clean as “AJAX” [pun intended]).

A common use of AJAX is to update the content or layout of a webpage without initiating a full page refresh. Normally, when a page loads, all the assets on the page must be requested and fetched from the server and then rendered on the page. However, with AJAX, only the assets that differ between pages need to be loaded, which improves the user experience as they do not have to refresh the entire page.

One can think of AJAX as mini server calls. A good example of AJAX in action is Google Maps. The page updates without a full page reload (i.e., mini server calls are being used to load content as the user navigates).

Related image

Image source

What is the Document Object Model (DOM)?

As an SEO professional, you need to understand what the DOM is, because it’s what Google is using to analyze and understand webpages.

The DOM is what you see when you “Inspect Element” in a browser. Simply put, you can think of the DOM as the steps the browser takes after receiving the HTML document to render the page.

The first thing the browser receives is the HTML document. After that, it will start parsing the content within this document and fetch additional resources, such as images, CSS, and JavaScript files.

The DOM is what forms from this parsing of information and resources. One can think of it as a structured, organized version of the webpage’s code.

Nowadays the DOM is often very different from the initial HTML document, due to what’s collectively called dynamic HTML. Dynamic HTML is the ability for a page to change its content depending on user input, environmental conditions (e.g. time of day), and other variables, leveraging HTML, CSS, and JavaScript.

Simple example with a <title> tag that is populated through JavaScript:

HTML source

DOM

What is headless browsing?

Headless browsing is simply the action of fetching webpages without the user interface. It is important to understand because Google, and now Baidu, leverage headless browsing to gain a better understanding of the user’s experience and the content of webpages.

PhantomJS and Zombie.js are scripted headless browsers, typically used for automating web interaction for testing purposes, and rendering static HTML snapshots for initial requests (pre-rendering).


Why can JavaScript be challenging for SEO? (and how to fix issues)

There are three (3) primary reasons to be concerned about JavaScript on your site:

  1. Crawlability: Bots’ ability to crawl your site.
  2. Obtainability: Bots’ ability to access information and parse your content.
  3. Perceived site latency: AKA the Critical Rendering Path.

Crawlability

Are bots able to find URLs and understand your site’s architecture? There are two important elements here:

  1. Blocking search engines from your JavaScript (even accidentally).
  2. Proper internal linking, not leveraging JavaScript events as a replacement for HTML tags.

Why is blocking JavaScript such a big deal?

If search engines are blocked from crawling JavaScript, they will not be receiving your site’s full experience. This means search engines are not seeing what the end user is seeing. This can reduce your site’s appeal to search engines and could eventually be considered cloaking (if the intent is indeed malicious).

Fetch as Google and TechnicalSEO.com’s robots.txt and Fetch and Render testing tools can help to identify resources that Googlebot is blocked from.

The easiest way to solve this problem is through providing search engines access to the resources they need to understand your user experience.

!!! Important note: Work with your development team to determine which files should and should not be accessible to search engines.

Internal linking

Internal linking should be implemented with regular anchor tags within the HTML or the DOM (using an a hrefs=”www.example.com” HTML tag) versus leveraging JavaScript functions to allow the user to traverse the site.

Essentially: Don’t use JavaScript’s onclick events as a replacement for internal linking. While end URLs might be found and crawled (through strings in JavaScript code or XML sitemaps), they won’t be associated with the global navigation of the site.

Internal linking is a strong signal to search engines regarding the site’s architecture and importance of pages. In fact, internal links are so strong that they can (in certain situations) override “SEO hints” such as canonical tags.

URL structure

Historically, JavaScript-based websites (aka “AJAX sites”) were using fragment identifiers (#) within URLs.

  • Not recommended:
    • The Lone Hash (#) – The lone pound symbol is not crawlable. It is used to identify anchor link (aka jump links). These are the links that allow one to jump to a piece of content on a page. Anything after the lone hash portion of the URL is never sent to the server and will cause the page to automatically scroll to the first element with a matching ID (or the first <a> element with a name of the following information). Google recommends avoiding the use of “#” in URLs.
    • Hashbang (#!) (and escaped_fragments URLs) – Hashbang URLs were a hack to support crawlers (Google wants to avoid now and only Bing supports). Many a moon ago, Google and Bing developed a complicated AJAX solution, whereby a pretty (#!) URL with the UX co-existed with an equivalent escaped_fragment HTML-based experience for bots. Google has since backtracked on this recommendation, preferring to receive the exact user experience. In escaped fragments, there are two experiences here:
      • Original Experience (aka Pretty URL): This URL must either have a #! (hashbang) within the URL to indicate that there is an escaped fragment or a meta element indicating that an escaped fragment exists (<meta name=”fragment” content=”!”>).
      • Escaped Fragment (aka Ugly URL, HTML snapshot): This URL replace the hashbang (#!) with “_escaped_fragment_” and serves the HTML snapshot. It is called the ugly URL because it’s long and looks like (and for all intents and purposes is) a hack.

Image result

Image source

  • Recommended:
    • pushState History API – PushState is navigation-based and part of the History API (think: your web browsing history). Essentially, pushState updates the URL in the address bar and only what needs to change on the page is updated. It allows JS sites to leverage “clean” URLs. PushState is currently supported by Google, when supporting browser navigation for client-side or hybrid rendering.
      • A good use of pushState is for infinite scroll (i.e., as the user hits new parts of the page the URL will update). Ideally, if the user refreshes the page, the experience will land them in the exact same spot. However, they do not need to refresh the page, as the content updates as they scroll down, while the URL is updated in the address bar.
      • Example: A good example of a search engine-friendly infinite scroll implementation, created by Google’s John Mueller (go figure), can be found here. He technically leverages the replaceState(), which doesn’t include the same back button functionality as pushState.
      • Read more: Mozilla PushState History API Documents

Obtainability

Search engines have been shown to employ headless browsing to render the DOM to gain a better understanding of the user’s experience and the content on page. That is to say, Google can process some JavaScript and uses the DOM (instead of the HTML document).

At the same time, there are situations where search engines struggle to comprehend JavaScript. Nobody wants a Hulu situation to happen to their site or a client’s site. It is crucial to understand how bots are interacting with your onsite content. When you aren’t sure, test.

Assuming we’re talking about a search engine bot that executes JavaScript, there are a few important elements for search engines to be able to obtain content:

  • If the user must interact for something to fire, search engines probably aren’t seeing it.
    • Google is a lazy user. It doesn’t click, it doesn’t scroll, and it doesn’t log in. If the full UX demands action from the user, special precautions should be taken to ensure that bots are receiving an equivalent experience.
  • If the JavaScript occurs after the JavaScript load event fires plus ~5-seconds*, search engines may not be seeing it.
    • *John Mueller mentioned that there is no specific timeout value; however, sites should aim to load within five seconds.
    • *Screaming Frog tests show a correlation to five seconds to render content.
    • *The load event plus five seconds is what Google’s PageSpeed Insights, Mobile Friendliness Tool, and Fetch as Google use; check out Max Prin’s test timer.
  • If there are errors within the JavaScript, both browsers and search engines won’t be able to go through and potentially miss sections of pages if the entire code is not executed.

How to make sure Google and other search engines can get your content

1. TEST

The most popular solution to resolving JavaScript is probably not resolving anything (grab a coffee and let Google work its algorithmic brilliance). Providing Google with the same experience as searchers is Google’s preferred scenario.

Google first announced being able to “better understand the web (i.e., JavaScript)” in May 2014. Industry experts suggested that Google could crawl JavaScript way before this announcement. The iPullRank team offered two great pieces on this in 2011: Googlebot is Chrome and How smart are Googlebots? (thank you, Josh and Mike). Adam Audette’s Google can crawl JavaScript and leverages the DOM in 2015 confirmed. Therefore, if you can see your content in the DOM, chances are your content is being parsed by Google.

adamaudette - I don't always JavaScript, but when I do, I know google can crawl the dom and dynamically generated HTML

Recently, Barry Goralewicz performed a cool experiment testing a combination of various JavaScript libraries and frameworks to determine how Google interacts with the pages (e.g., are they indexing URL/content? How does GSC interact? Etc.). It ultimately showed that Google is able to interact with many forms of JavaScript and highlighted certain frameworks as perhaps more challenging. John Mueller even started a JavaScript search group (from what I’ve read, it’s fairly therapeutic).

All of these studies are amazing and help SEOs understand when to be concerned and take a proactive role. However, before you determine that sitting back is the right solution for your site, I recommend being actively cautious by experimenting with small section Think: Jim Collin’s “bullets, then cannonballs” philosophy from his book Great by Choice:

“A bullet is an empirical test aimed at learning what works and meets three criteria: a bullet must be low-cost, low-risk, and low-distraction… 10Xers use bullets to empirically validate what will actually work. Based on that empirical validation, they then concentrate their resources to fire a cannonball, enabling large returns from concentrated bets.”

Consider testing and reviewing through the following:

  1. Confirm that your content is appearing within the DOM.
  2. Test a subset of pages to see if Google can index content.
  • Manually check quotes from your content.
  • Fetch with Google and see if content appears.
  • Fetch with Google supposedly occurs around the load event or before timeout. It’s a great test to check to see if Google will be able to see your content and whether or not you’re blocking JavaScript in your robots.txt. Although Fetch with Google is not foolproof, it’s a good starting point.
  • Note: If you aren’t verified in GSC, try Technicalseo.com’s Fetch and Render As Any Bot Tool.

After you’ve tested all this, what if something’s not working and search engines and bots are struggling to index and obtain your content? Perhaps you’re concerned about alternative search engines (DuckDuckGo, Facebook, LinkedIn, etc.), or maybe you’re leveraging meta information that needs to be parsed by other bots, such as Twitter summary cards or Facebook Open Graph tags. If any of this is identified in testing or presents itself as a concern, an HTML snapshot may be the only decision.

2. HTML SNAPSHOTS
What are HTmL snapshots?

HTML snapshots are a fully rendered page (as one might see in the DOM) that can be returned to search engine bots (think: a static HTML version of the DOM).

Google introduced HTML snapshots 2009, deprecated (but still supported) them in 2015, and awkwardly mentioned them as an element to “avoid” in late 2016. HTML snapshots are a contentious topic with Google. However, they’re important to understand, because in certain situations they’re necessary.

If search engines (or sites like Facebook) cannot grasp your JavaScript, it’s better to return an HTML snapshot than not to have your content indexed and understood at all. Ideally, your site would leverage some form of user-agent detection on the server side and return the HTML snapshot to the bot.

At the same time, one must recognize that Google wants the same experience as the user (i.e., only provide Google with an HTML snapshot if the tests are dire and the JavaScript search group cannot provide support for your situation).

Considerations

When considering HTML snapshots, you must consider that Google has deprecated this AJAX recommendation. Although Google technically still supports it, Google recommends avoiding it. Yes, Google changed its mind and now want to receive the same experience as the user. This direction makes sense, as it allows the bot to receive an experience more true to the user experience.

A second consideration factor relates to the risk of cloaking. If the HTML snapshots are found to not represent the experience on the page, it’s considered a cloaking risk. Straight from the source:

“The HTML snapshot must contain the same content as the end user would see in a browser. If this is not the case, it may be considered cloaking.”
– Google Developer AJAX Crawling FAQs

Benefits

Despite the considerations, HTML snapshots have powerful advantages:

  1. Knowledge that search engines and crawlers will be able to understand the experience.
    • Certain types of JavaScript may be harder for Google to grasp (cough… Angular (also colloquially referred to as AngularJS 2) …cough).
  2. Other search engines and crawlers (think: Bing, Facebook) will be able to understand the experience.
    • Bing, among other search engines, has not stated that it can crawl and index JavaScript. HTML snapshots may be the only solution for a JavaScript-heavy site. As always, test to make sure that this is the case before diving in.

"It's not just Google understanding your JavaScript. It's also about the speed." -DOM - "It's not just about Google understanding your Javascript. it's also about your perceived latency." -DOM

Site latency

When browsers receive an HTML document and create the DOM (although there is some level of pre-scanning), most resources are loaded as they appear within the HTML document. This means that if you have a huge file toward the top of your HTML document, a browser will load that immense file first.

The concept of Google’s critical rendering path is to load what the user needs as soon as possible, which can be translated to → “get everything above-the-fold in front of the user, ASAP.”

Critical Rendering Path – Optimized Rendering Loads Progressively ASAP:

progressive page rendering

Image source

However, if you have unnecessary resources or JavaScript files clogging up the page’s ability to load, you get “render-blocking JavaScript.” Meaning: your JavaScript is blocking the page’s potential to appear as if it’s loading faster (also called: perceived latency).

Render-blocking JavaScript – Solutions

If you analyze your page speed results (through tools like Page Speed Insights Tool, WebPageTest.org, CatchPoint, etc.) and determine that there is a render-blocking JavaScript issue, here are three potential solutions:

  1. Inline: Add the JavaScript in the HTML document.
  2. Async: Make JavaScript asynchronous (i.e., add “async” attribute to HTML tag).
  3. Defer: By placing JavaScript lower within the HTML.

!!! Important note: It’s important to understand that scripts must be arranged in order of precedence. Scripts that are used to load the above-the-fold content must be prioritized and should not be deferred. Also, any script that references another file can only be used after the referenced file has loaded. Make sure to work closely with your development team to confirm that there are no interruptions to the user’s experience.

Read more: Google Developer’s Speed Documentation


TL;DR – Moral of the story

Crawlers and search engines will do their best to crawl, execute, and interpret your JavaScript, but it is not guaranteed. Make sure your content is crawlable, obtainable, and isn’t developing site latency obstructions. The key = every situation demands testing. Based on the results, evaluate potential solutions.

Thanks: Thank you Max Prin (@maxxeight) for reviewing this content piece and sharing your knowledge, insight, and wisdom. It wouldn’t be the same without you.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Posted by webpage development and how to code basic JavaScript.

javacssseo.gif

Image sources: 1, 2, 3

JavaScript is either placed in the HTML document within <script> tags (i.e., it is embedded in the HTML) or linked/referenced. There are currently a plethora of JavaScript libraries and frameworks, including jQuery, AngularJS, ReactJS, EmberJS, etc.

JavaScript libraries and frameworks:

What is AJAX?

AJAX, or Asynchronous JavaScript and XML, is a set of web development techniques combining JavaScript and XML that allows web applications to communicate with a server in the background without interfering with the current page. Asynchronous means that other functions or lines of code can run while the async script is running. XML used to be the primary language to pass data; however, the term AJAX is used for all types of data transfers (including JSON; I guess “AJAJ” doesn’t sound as clean as “AJAX” [pun intended]).

A common use of AJAX is to update the content or layout of a webpage without initiating a full page refresh. Normally, when a page loads, all the assets on the page must be requested and fetched from the server and then rendered on the page. However, with AJAX, only the assets that differ between pages need to be loaded, which improves the user experience as they do not have to refresh the entire page.

One can think of AJAX as mini server calls. A good example of AJAX in action is Google Maps. The page updates without a full page reload (i.e., mini server calls are being used to load content as the user navigates).

Related image

Image source

What is the Document Object Model (DOM)?

As an SEO professional, you need to understand what the DOM is, because it’s what Google is using to analyze and understand webpages.

The DOM is what you see when you “Inspect Element” in a browser. Simply put, you can think of the DOM as the steps the browser takes after receiving the HTML document to render the page.

The first thing the browser receives is the HTML document. After that, it will start parsing the content within this document and fetch additional resources, such as images, CSS, and JavaScript files.

The DOM is what forms from this parsing of information and resources. One can think of it as a structured, organized version of the webpage’s code.

Nowadays the DOM is often very different from the initial HTML document, due to what’s collectively called dynamic HTML. Dynamic HTML is the ability for a page to change its content depending on user input, environmental conditions (e.g. time of day), and other variables, leveraging HTML, CSS, and JavaScript.

Simple example with a <title> tag that is populated through JavaScript:

HTML source

DOM

What is headless browsing?

Headless browsing is simply the action of fetching webpages without the user interface. It is important to understand because Google, and now Baidu, leverage headless browsing to gain a better understanding of the user’s experience and the content of webpages.

PhantomJS and Zombie.js are scripted headless browsers, typically used for automating web interaction for testing purposes, and rendering static HTML snapshots for initial requests (pre-rendering).


Why can JavaScript be challenging for SEO? (and how to fix issues)

There are three (3) primary reasons to be concerned about JavaScript on your site:

  1. Crawlability: Bots’ ability to crawl your site.
  2. Obtainability: Bots’ ability to access information and parse your content.
  3. Perceived site latency: AKA the Critical Rendering Path.

Crawlability

Are bots able to find URLs and understand your site’s architecture? There are two important elements here:

  1. Blocking search engines from your JavaScript (even accidentally).
  2. Proper internal linking, not leveraging JavaScript events as a replacement for HTML tags.

Why is blocking JavaScript such a big deal?

If search engines are blocked from crawling JavaScript, they will not be receiving your site’s full experience. This means search engines are not seeing what the end user is seeing. This can reduce your site’s appeal to search engines and could eventually be considered cloaking (if the intent is indeed malicious).

Fetch as Google and TechnicalSEO.com’s robots.txt and Fetch and Render testing tools can help to identify resources that Googlebot is blocked from.

The easiest way to solve this problem is through providing search engines access to the resources they need to understand your user experience.

!!! Important note: Work with your development team to determine which files should and should not be accessible to search engines.

Internal linking

Internal linking should be implemented with regular anchor tags within the HTML or the DOM (using an a hrefs=”www.example.com” HTML tag) versus leveraging JavaScript functions to allow the user to traverse the site.

Essentially: Don’t use JavaScript’s onclick events as a replacement for internal linking. While end URLs might be found and crawled (through strings in JavaScript code or XML sitemaps), they won’t be associated with the global navigation of the site.

Internal linking is a strong signal to search engines regarding the site’s architecture and importance of pages. In fact, internal links are so strong that they can (in certain situations) override “SEO hints” such as canonical tags.

URL structure

Historically, JavaScript-based websites (aka “AJAX sites”) were using fragment identifiers (#) within URLs.

  • Not recommended:
    • The Lone Hash (#) – The lone pound symbol is not crawlable. It is used to identify anchor link (aka jump links). These are the links that allow one to jump to a piece of content on a page. Anything after the lone hash portion of the URL is never sent to the server and will cause the page to automatically scroll to the first element with a matching ID (or the first <a> element with a name of the following information). Google recommends avoiding the use of “#” in URLs.
    • Hashbang (#!) (and escaped_fragments URLs) – Hashbang URLs were a hack to support crawlers (Google wants to avoid now and only Bing supports). Many a moon ago, Google and Bing developed a complicated AJAX solution, whereby a pretty (#!) URL with the UX co-existed with an equivalent escaped_fragment HTML-based experience for bots. Google has since backtracked on this recommendation, preferring to receive the exact user experience. In escaped fragments, there are two experiences here:
      • Original Experience (aka Pretty URL): This URL must either have a #! (hashbang) within the URL to indicate that there is an escaped fragment or a meta element indicating that an escaped fragment exists (<meta name=”fragment” content=”!”>).
      • Escaped Fragment (aka Ugly URL, HTML snapshot): This URL replace the hashbang (#!) with “_escaped_fragment_” and serves the HTML snapshot. It is called the ugly URL because it’s long and looks like (and for all intents and purposes is) a hack.

Image result

Image source

  • Recommended:
    • pushState History API – PushState is navigation-based and part of the History API (think: your web browsing history). Essentially, pushState updates the URL in the address bar and only what needs to change on the page is updated. It allows JS sites to leverage “clean” URLs. PushState is currently supported by Google, when supporting browser navigation for client-side or hybrid rendering.
      • A good use of pushState is for infinite scroll (i.e., as the user hits new parts of the page the URL will update). Ideally, if the user refreshes the page, the experience will land them in the exact same spot. However, they do not need to refresh the page, as the content updates as they scroll down, while the URL is updated in the address bar.
      • Example: A good example of a search engine-friendly infinite scroll implementation, created by Google’s John Mueller (go figure), can be found here. He technically leverages the replaceState(), which doesn’t include the same back button functionality as pushState.
      • Read more: Mozilla PushState History API Documents

Obtainability

Search engines have been shown to employ headless browsing to render the DOM to gain a better understanding of the user’s experience and the content on page. That is to say, Google can process some JavaScript and uses the DOM (instead of the HTML document).

At the same time, there are situations where search engines struggle to comprehend JavaScript. Nobody wants a Hulu situation to happen to their site or a client’s site. It is crucial to understand how bots are interacting with your onsite content. When you aren’t sure, test.

Assuming we’re talking about a search engine bot that executes JavaScript, there are a few important elements for search engines to be able to obtain content:

  • If the user must interact for something to fire, search engines probably aren’t seeing it.
    • Google is a lazy user. It doesn’t click, it doesn’t scroll, and it doesn’t log in. If the full UX demands action from the user, special precautions should be taken to ensure that bots are receiving an equivalent experience.
  • If the JavaScript occurs after the JavaScript load event fires plus ~5-seconds*, search engines may not be seeing it.
    • *John Mueller mentioned that there is no specific timeout value; however, sites should aim to load within five seconds.
    • *Screaming Frog tests show a correlation to five seconds to render content.
    • *The load event plus five seconds is what Google’s PageSpeed Insights, Mobile Friendliness Tool, and Fetch as Google use; check out Max Prin’s test timer.
  • If there are errors within the JavaScript, both browsers and search engines won’t be able to go through and potentially miss sections of pages if the entire code is not executed.

How to make sure Google and other search engines can get your content

1. TEST

The most popular solution to resolving JavaScript is probably not resolving anything (grab a coffee and let Google work its algorithmic brilliance). Providing Google with the same experience as searchers is Google’s preferred scenario.

Google first announced being able to “better understand the web (i.e., JavaScript)” in May 2014. Industry experts suggested that Google could crawl JavaScript way before this announcement. The iPullRank team offered two great pieces on this in 2011: Googlebot is Chrome and How smart are Googlebots? (thank you, Josh and Mike). Adam Audette’s Google can crawl JavaScript and leverages the DOM in 2015 confirmed. Therefore, if you can see your content in the DOM, chances are your content is being parsed by Google.

adamaudette - I don't always JavaScript, but when I do, I know google can crawl the dom and dynamically generated HTML

Recently, Barry Goralewicz performed a cool experiment testing a combination of various JavaScript libraries and frameworks to determine how Google interacts with the pages (e.g., are they indexing URL/content? How does GSC interact? Etc.). It ultimately showed that Google is able to interact with many forms of JavaScript and highlighted certain frameworks as perhaps more challenging. John Mueller even started a JavaScript search group (from what I’ve read, it’s fairly therapeutic).

All of these studies are amazing and help SEOs understand when to be concerned and take a proactive role. However, before you determine that sitting back is the right solution for your site, I recommend being actively cautious by experimenting with small section Think: Jim Collin’s “bullets, then cannonballs” philosophy from his book Great by Choice:

“A bullet is an empirical test aimed at learning what works and meets three criteria: a bullet must be low-cost, low-risk, and low-distraction… 10Xers use bullets to empirically validate what will actually work. Based on that empirical validation, they then concentrate their resources to fire a cannonball, enabling large returns from concentrated bets.”

Consider testing and reviewing through the following:

  1. Confirm that your content is appearing within the DOM.
  2. Test a subset of pages to see if Google can index content.
  • Manually check quotes from your content.
  • Fetch with Google and see if content appears.
  • Fetch with Google supposedly occurs around the load event or before timeout. It’s a great test to check to see if Google will be able to see your content and whether or not you’re blocking JavaScript in your robots.txt. Although Fetch with Google is not foolproof, it’s a good starting point.
  • Note: If you aren’t verified in GSC, try Technicalseo.com’s Fetch and Render As Any Bot Tool.

After you’ve tested all this, what if something’s not working and search engines and bots are struggling to index and obtain your content? Perhaps you’re concerned about alternative search engines (DuckDuckGo, Facebook, LinkedIn, etc.), or maybe you’re leveraging meta information that needs to be parsed by other bots, such as Twitter summary cards or Facebook Open Graph tags. If any of this is identified in testing or presents itself as a concern, an HTML snapshot may be the only decision.

2. HTML SNAPSHOTS
What are HTmL snapshots?

HTML snapshots are a fully rendered page (as one might see in the DOM) that can be returned to search engine bots (think: a static HTML version of the DOM).

Google introduced HTML snapshots 2009, deprecated (but still supported) them in 2015, and awkwardly mentioned them as an element to “avoid” in late 2016. HTML snapshots are a contentious topic with Google. However, they’re important to understand, because in certain situations they’re necessary.

If search engines (or sites like Facebook) cannot grasp your JavaScript, it’s better to return an HTML snapshot than not to have your content indexed and understood at all. Ideally, your site would leverage some form of user-agent detection on the server side and return the HTML snapshot to the bot.

At the same time, one must recognize that Google wants the same experience as the user (i.e., only provide Google with an HTML snapshot if the tests are dire and the JavaScript search group cannot provide support for your situation).

Considerations

When considering HTML snapshots, you must consider that Google has deprecated this AJAX recommendation. Although Google technically still supports it, Google recommends avoiding it. Yes, Google changed its mind and now want to receive the same experience as the user. This direction makes sense, as it allows the bot to receive an experience more true to the user experience.

A second consideration factor relates to the risk of cloaking. If the HTML snapshots are found to not represent the experience on the page, it’s considered a cloaking risk. Straight from the source:

“The HTML snapshot must contain the same content as the end user would see in a browser. If this is not the case, it may be considered cloaking.”
– Google Developer AJAX Crawling FAQs

Benefits

Despite the considerations, HTML snapshots have powerful advantages:

  1. Knowledge that search engines and crawlers will be able to understand the experience.
    • Certain types of JavaScript may be harder for Google to grasp (cough… Angular (also colloquially referred to as AngularJS 2) …cough).
  2. Other search engines and crawlers (think: Bing, Facebook) will be able to understand the experience.
    • Bing, among other search engines, has not stated that it can crawl and index JavaScript. HTML snapshots may be the only solution for a JavaScript-heavy site. As always, test to make sure that this is the case before diving in.

"It's not just Google understanding your JavaScript. It's also about the speed." -DOM - "It's not just about Google understanding your Javascript. it's also about your perceived latency." -DOM

Site latency

When browsers receive an HTML document and create the DOM (although there is some level of pre-scanning), most resources are loaded as they appear within the HTML document. This means that if you have a huge file toward the top of your HTML document, a browser will load that immense file first.

The concept of Google’s critical rendering path is to load what the user needs as soon as possible, which can be translated to → “get everything above-the-fold in front of the user, ASAP.”

Critical Rendering Path – Optimized Rendering Loads Progressively ASAP:

progressive page rendering

Image source

However, if you have unnecessary resources or JavaScript files clogging up the page’s ability to load, you get “render-blocking JavaScript.” Meaning: your JavaScript is blocking the page’s potential to appear as if it’s loading faster (also called: perceived latency).

Render-blocking JavaScript – Solutions

If you analyze your page speed results (through tools like Page Speed Insights Tool, WebPageTest.org, CatchPoint, etc.) and determine that there is a render-blocking JavaScript issue, here are three potential solutions:

  1. Inline: Add the JavaScript in the HTML document.
  2. Async: Make JavaScript asynchronous (i.e., add “async” attribute to HTML tag).
  3. Defer: By placing JavaScript lower within the HTML.

!!! Important note: It’s important to understand that scripts must be arranged in order of precedence. Scripts that are used to load the above-the-fold content must be prioritized and should not be deferred. Also, any script that references another file can only be used after the referenced file has loaded. Make sure to work closely with your development team to confirm that there are no interruptions to the user’s experience.

Read more: Google Developer’s Speed Documentation


TL;DR – Moral of the story

Crawlers and search engines will do their best to crawl, execute, and interpret your JavaScript, but it is not guaranteed. Make sure your content is crawlable, obtainable, and isn’t developing site latency obstructions. The key = every situation demands testing. Based on the results, evaluate potential solutions.

Thanks: Thank you Max Prin (@maxxeight) for reviewing this content piece and sharing your knowledge, insight, and wisdom. It wouldn’t be the same without you.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

“”

JavaScript & SEO: Making Your Bot Experience As Good As Your User Experience

Posted by webpage development and how to code basic JavaScript.

javacssseo.gif

Image sources: 1, 2, 3

JavaScript is either placed in the HTML document within <script> tags (i.e., it is embedded in the HTML) or linked/referenced. There are currently a plethora of JavaScript libraries and frameworks, including jQuery, AngularJS, ReactJS, EmberJS, etc.

JavaScript libraries and frameworks:

What is AJAX?

AJAX, or Asynchronous JavaScript and XML, is a set of web development techniques combining JavaScript and XML that allows web applications to communicate with a server in the background without interfering with the current page. Asynchronous means that other functions or lines of code can run while the async script is running. XML used to be the primary language to pass data; however, the term AJAX is used for all types of data transfers (including JSON; I guess “AJAJ” doesn’t sound as clean as “AJAX” [pun intended]).

A common use of AJAX is to update the content or layout of a webpage without initiating a full page refresh. Normally, when a page loads, all the assets on the page must be requested and fetched from the server and then rendered on the page. However, with AJAX, only the assets that differ between pages need to be loaded, which improves the user experience as they do not have to refresh the entire page.

One can think of AJAX as mini server calls. A good example of AJAX in action is Google Maps. The page updates without a full page reload (i.e., mini server calls are being used to load content as the user navigates).

Related image

Image source

What is the Document Object Model (DOM)?

As an SEO professional, you need to understand what the DOM is, because it’s what Google is using to analyze and understand webpages.

The DOM is what you see when you “Inspect Element” in a browser. Simply put, you can think of the DOM as the steps the browser takes after receiving the HTML document to render the page.

The first thing the browser receives is the HTML document. After that, it will start parsing the content within this document and fetch additional resources, such as images, CSS, and JavaScript files.

The DOM is what forms from this parsing of information and resources. One can think of it as a structured, organized version of the webpage’s code.

Nowadays the DOM is often very different from the initial HTML document, due to what’s collectively called dynamic HTML. Dynamic HTML is the ability for a page to change its content depending on user input, environmental conditions (e.g. time of day), and other variables, leveraging HTML, CSS, and JavaScript.

Simple example with a <title> tag that is populated through JavaScript:

HTML source

DOM

What is headless browsing?

Headless browsing is simply the action of fetching webpages without the user interface. It is important to understand because Google, and now Baidu, leverage headless browsing to gain a better understanding of the user’s experience and the content of webpages.

PhantomJS and Zombie.js are scripted headless browsers, typically used for automating web interaction for testing purposes, and rendering static HTML snapshots for initial requests (pre-rendering).


Why can JavaScript be challenging for SEO? (and how to fix issues)

There are three (3) primary reasons to be concerned about JavaScript on your site:

  1. Crawlability: Bots’ ability to crawl your site.
  2. Obtainability: Bots’ ability to access information and parse your content.
  3. Perceived site latency: AKA the Critical Rendering Path.

Crawlability

Are bots able to find URLs and understand your site’s architecture? There are two important elements here:

  1. Blocking search engines from your JavaScript (even accidentally).
  2. Proper internal linking, not leveraging JavaScript events as a replacement for HTML tags.

Why is blocking JavaScript such a big deal?

If search engines are blocked from crawling JavaScript, they will not be receiving your site’s full experience. This means search engines are not seeing what the end user is seeing. This can reduce your site’s appeal to search engines and could eventually be considered cloaking (if the intent is indeed malicious).

Fetch as Google and TechnicalSEO.com’s robots.txt and Fetch and Render testing tools can help identify the resources that Googlebot is blocked from accessing.

The easiest way to solve this problem is to give search engines access to the resources they need to understand your user experience (see the illustrative robots.txt sketch below).

!!! Important note: Work with your development team to determine which files should and should not be accessible to search engines.
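
For context, here’s a minimal, hypothetical robots.txt sketch (the /js/ and /css/ directory names are assumptions, not a recommendation for your site) showing the kind of rules that accidentally hide rendering resources from bots:

  # Illustrative only: Disallow rules like these prevent Googlebot from fetching
  # the JavaScript and CSS it needs to render the page the way users see it.
  User-agent: *
  Disallow: /js/
  Disallow: /css/

Removing those Disallow rules (or explicitly allowing the paths your pages depend on) is usually all that’s required; confirm the result with the robots.txt testing tools mentioned above.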

Internal linking

Internal linking should be implemented with regular anchor tags within the HTML or the DOM (e.g., an <a href="https://www.example.com/page/"> HTML tag) rather than leveraging JavaScript functions to allow the user to traverse the site.

Essentially: Don’t use JavaScript’s onclick events as a replacement for internal linking. While end URLs might be found and crawled (through strings in JavaScript code or XML sitemaps), they won’t be associated with the global navigation of the site.
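
As a quick, hypothetical illustration (the example.com URLs are placeholders), the first link below is plain, crawlable HTML, while the second only exposes the destination inside a JavaScript event:

  <!-- Crawlable: a standard anchor that bots can follow and associate with your site architecture -->
  <a href="https://www.example.com/products/">Products</a>

  <!-- Not a substitute for internal linking: the URL only exists inside an onclick handler -->
  <span onclick="window.location.href='https://www.example.com/products/'">Products</span>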

Internal linking is a strong signal to search engines regarding the site’s architecture and importance of pages. In fact, internal links are so strong that they can (in certain situations) override “SEO hints” such as canonical tags.

URL structure

Historically, JavaScript-based websites (aka “AJAX sites”) used fragment identifiers (#) within URLs.

  • Not recommended:
    • The Lone Hash (#) – The lone pound symbol is not crawlable. It is used to identify anchor links (aka jump links) – the links that allow one to jump to a piece of content on a page. Anything after the lone hash portion of the URL is never sent to the server, and the page will automatically scroll to the first element with a matching ID (or the first <a> element with a matching name attribute). Google recommends avoiding the use of “#” in URLs.
    • Hashbang (#!) (and escaped_fragment URLs) – Hashbang URLs were a hack to support crawlers (one Google now wants sites to avoid, and that only Bing still supports). Many moons ago, Google and Bing developed a complicated AJAX solution whereby a pretty (#!) URL for users co-existed with an equivalent escaped_fragment HTML-based experience for bots. Google has since backtracked on this recommendation, preferring to receive the exact user experience. With escaped fragments, there are two experiences (a hypothetical example follows this list):
      • Original Experience (aka Pretty URL): This URL must either contain a #! (hashbang) to indicate that an escaped fragment exists or include a meta element indicating as much (<meta name="fragment" content="!">).
      • Escaped Fragment (aka Ugly URL, HTML snapshot): This URL replaces the hashbang (#!) with “_escaped_fragment_” and serves the HTML snapshot. It is called the ugly URL because it’s long and looks like (and for all intents and purposes is) a hack.
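
To make that concrete, here is a hypothetical pairing (the domain and path are placeholders):

  Pretty URL (what users see):         https://www.example.com/#!/products/red-widget
  Ugly URL (what the crawler fetches): https://www.example.com/?_escaped_fragment_=/products/red-widget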


Image source

  • Recommended:
    • pushState History API – pushState is navigation-based and part of the History API (think: your web browsing history). Essentially, pushState updates the URL in the address bar, and only what needs to change on the page is updated. It allows JS sites to leverage “clean” URLs, and it is currently supported by Google when used to support browser navigation for client-side or hybrid rendering (a minimal sketch follows this list).
      • A good use of pushState is for infinite scroll (i.e., as the user hits new parts of the page the URL will update). Ideally, if the user refreshes the page, the experience will land them in the exact same spot. However, they do not need to refresh the page, as the content updates as they scroll down, while the URL is updated in the address bar.
      • Example: A good example of a search engine-friendly infinite scroll implementation, created by Google’s John Mueller (go figure), can be found here. He technically leverages replaceState(), which doesn’t include the same back-button functionality as pushState.
      • Read more: Mozilla PushState History API Documents
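
As a rough, illustrative sketch (the /blog/page/ endpoint and element IDs are hypothetical, not from this article), an infinite scroll might pair content loading with pushState like this:

  // Load the next chunk of content and give it a clean, crawlable URL.
  function loadNextPage(pageNum) {
    fetch('/blog/page/' + pageNum)                      // hypothetical endpoint
      .then(function (response) { return response.text(); })
      .then(function (html) {
        document.querySelector('#results').insertAdjacentHTML('beforeend', html);
        // Update the address bar without a full reload; back/forward now step
        // through the previously viewed "pages."
        history.pushState({ page: pageNum }, '', '/blog/page/' + pageNum);
      });
  }

  // Restore the right content when the user navigates with back/forward.
  window.addEventListener('popstate', function (event) {
    if (event.state && event.state.page) {
      // Re-render or re-fetch the content for event.state.page here.
    }
  });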

Obtainability

Search engines have been shown to employ headless browsing to render the DOM and gain a better understanding of the user’s experience and the content on the page. That is to say, Google can process some JavaScript and uses the DOM (instead of the raw HTML document).

At the same time, there are situations where search engines struggle to comprehend JavaScript. Nobody wants a Hulu situation to happen to their site or a client’s site. It is crucial to understand how bots are interacting with your onsite content. When you aren’t sure, test.

Assuming we’re talking about a search engine bot that executes JavaScript, there are a few important elements for search engines to be able to obtain content:

  • If the user must interact for something to fire, search engines probably aren’t seeing it.
    • Google is a lazy user. It doesn’t click, it doesn’t scroll, and it doesn’t log in. If the full UX demands action from the user, special precautions should be taken to ensure that bots are receiving an equivalent experience (see the short sketch after this list).
  • If the JavaScript executes after the load event fires plus roughly five seconds*, search engines may not be seeing it.
    • *John Mueller mentioned that there is no specific timeout value; however, sites should aim to load within five seconds.
    • *Screaming Frog tests show a correlation to five seconds to render content.
    • *The load event plus five seconds is what Google’s PageSpeed Insights, Mobile Friendliness Tool, and Fetch as Google use; check out Max Prin’s test timer.
  • If there are errors within the JavaScript, both browsers and search engines may be unable to execute the rest of the code and can miss sections of the page as a result.
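
A minimal, illustrative contrast (the element IDs and content are assumptions): content that only appears after a click is unlikely to be indexed, while content injected during the initial load generally is, provided it arrives within the window described above.

  // Risky for SEO: this content only exists after a user interaction,
  // and Googlebot doesn't click.
  document.querySelector('#load-reviews').addEventListener('click', function () {
    document.querySelector('#reviews').innerHTML = '<p>Great product!</p>';
  });

  // Safer: the same content is injected as part of the normal page load,
  // well before the load event + ~5 second window discussed above.
  document.addEventListener('DOMContentLoaded', function () {
    document.querySelector('#reviews').innerHTML = '<p>Great product!</p>';
  });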

How to make sure Google and other search engines can get your content

1. TEST

The most popular solution to resolving JavaScript is probably not resolving anything (grab a coffee and let Google work its algorithmic brilliance). Providing Google with the same experience as searchers is Google’s preferred scenario.

Google first announced being able to “better understand the web (i.e., JavaScript)” in May 2014. Industry experts had suggested that Google could crawl JavaScript well before this announcement. The iPullRank team offered two great pieces on this in 2011: Googlebot is Chrome and How smart are Googlebots? (thank you, Josh and Mike). Adam Audette’s 2015 piece, Google can crawl JavaScript and leverages the DOM, confirmed it. Therefore, if you can see your content in the DOM, chances are your content is being parsed by Google.

adamaudette - I don't always JavaScript, but when I do, I know google can crawl the dom and dynamically generated HTML

Recently, Bartosz Goralewicz performed a cool experiment testing a combination of various JavaScript libraries and frameworks to determine how Google interacts with the pages (e.g., is it indexing the URL/content? How does GSC interact? Etc.). It ultimately showed that Google is able to interact with many forms of JavaScript and highlighted certain frameworks as perhaps more challenging. John Mueller even started a JavaScript search group (from what I’ve read, it’s fairly therapeutic).

All of these studies are amazing and help SEOs understand when to be concerned and take a proactive role. However, before you determine that sitting back is the right solution for your site, I recommend being actively cautious by experimenting with small sections. Think: Jim Collins’ “bullets, then cannonballs” philosophy from his book Great by Choice:

“A bullet is an empirical test aimed at learning what works and meets three criteria: a bullet must be low-cost, low-risk, and low-distraction… 10Xers use bullets to empirically validate what will actually work. Based on that empirical validation, they then concentrate their resources to fire a cannonball, enabling large returns from concentrated bets.”

Consider testing and reviewing through the following:

  1. Confirm that your content is appearing within the DOM.
  2. Test a subset of pages to see if Google can index the content.
    • Manually search Google for quotes from your content.
    • Fetch as Google and see if content appears.
    • Fetch as Google supposedly occurs around the load event or before timeout. It’s a great way to check whether Google will be able to see your content and whether you’re blocking JavaScript in your robots.txt. Although Fetch as Google is not foolproof, it’s a good starting point.
    • Note: If you aren’t verified in GSC, try TechnicalSEO.com’s Fetch and Render As Any Bot tool.

After you’ve tested all this, what if something’s not working and search engines and bots are struggling to index and obtain your content? Perhaps you’re concerned about alternative search engines (DuckDuckGo, Facebook, LinkedIn, etc.), or maybe you’re leveraging meta information that needs to be parsed by other bots, such as Twitter summary cards or Facebook Open Graph tags. If any of this is identified in testing or presents itself as a concern, an HTML snapshot may be the only option.

2. HTML SNAPSHOTS

What are HTML snapshots?

An HTML snapshot is a fully rendered page (as one might see in the DOM) that can be returned to search engine bots (think: a static HTML version of the DOM).

Google introduced HTML snapshots in 2009, deprecated (but still supported) them in 2015, and awkwardly mentioned them as an element to “avoid” in late 2016. HTML snapshots are a contentious topic with Google. However, they’re important to understand, because in certain situations they’re necessary.

If search engines (or sites like Facebook) cannot grasp your JavaScript, it’s better to return an HTML snapshot than not to have your content indexed and understood at all. Ideally, your site would leverage some form of user-agent detection on the server side and return the HTML snapshot to the bot.
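
As a rough sketch only (Express is just one possible setup, and getSnapshot() is a hypothetical helper standing in for your pre-rendering service or cache), server-side user-agent detection might look like this:

  const express = require('express');
  const app = express();

  // Assumed list of bot user-agent substrings; adjust for the crawlers you care about.
  const BOT_PATTERN = /googlebot|bingbot|facebookexternalhit|twitterbot|linkedinbot/i;

  app.get('*', function (req, res, next) {
    const userAgent = req.headers['user-agent'] || '';
    if (BOT_PATTERN.test(userAgent)) {
      // getSnapshot() is hypothetical: it returns the fully rendered HTML for
      // this URL (the same content a user would see, to avoid cloaking risk).
      return res.send(getSnapshot(req.originalUrl));
    }
    next(); // everyone else gets the normal client-side rendered app
  });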

At the same time, one must recognize that Google wants the same experience as the user (i.e., only provide Google with an HTML snapshot if the tests are dire and the JavaScript search group cannot provide support for your situation).

Considerations

When considering HTML snapshots, you must consider that Google has deprecated this AJAX crawling recommendation. Although Google technically still supports it, Google recommends avoiding it. Yes, Google changed its mind and now wants to receive the same experience as the user. This direction makes sense, as it allows the bot to receive an experience truer to what the user sees.

A second consideration relates to the risk of cloaking. If the HTML snapshots are found not to represent the experience on the page, it’s considered a cloaking risk. Straight from the source:

“The HTML snapshot must contain the same content as the end user would see in a browser. If this is not the case, it may be considered cloaking.”
– Google Developer AJAX Crawling FAQs

Benefits

Despite the considerations, HTML snapshots have powerful advantages:

  1. Knowledge that search engines and crawlers will be able to understand the experience.
    • Certain types of JavaScript may be harder for Google to grasp (cough… Angular 2 …cough).
  2. Other search engines and crawlers (think: Bing, Facebook) will be able to understand the experience.
    • Bing, among other search engines, has not stated that it can crawl and index JavaScript. HTML snapshots may be the only solution for a JavaScript-heavy site. As always, test to make sure that this is the case before diving in.

"It's not just Google understanding your JavaScript. It's also about the speed." -DOM - "It's not just about Google understanding your Javascript. it's also about your perceived latency." -DOM

Site latency

When browsers receive an HTML document and create the DOM (although there is some level of pre-scanning), most resources are loaded as they appear within the HTML document. This means that if you have a huge file toward the top of your HTML document, a browser will load that immense file first.

The concept of Google’s critical rendering path is to load what the user needs as soon as possible, which can be translated to → “get everything above-the-fold in front of the user, ASAP.”

Critical Rendering Path – Optimized Rendering Loads Progressively ASAP:

progressive page rendering

Image source

However, if you have unnecessary resources or JavaScript files clogging up the page’s ability to load, you get “render-blocking JavaScript.” Meaning: your JavaScript is blocking the page from appearing to load quickly (i.e., it hurts perceived latency).

Render-blocking JavaScript – Solutions

If you analyze your page speed results (through tools like PageSpeed Insights, WebPageTest.org, CatchPoint, etc.) and determine that there is a render-blocking JavaScript issue, here are three potential solutions (illustrated in the short snippet after this list):

  1. Inline: Add the JavaScript directly in the HTML document.
  2. Async: Make the JavaScript asynchronous (i.e., add the “async” attribute to the script tag).
  3. Defer: Defer the JavaScript by adding the “defer” attribute or placing it lower within the HTML.
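
A minimal illustration of the three options (the file names are placeholders):

  <!-- 1. Inline: embed small, critical JavaScript directly in the HTML -->
  <script>
    document.documentElement.className += ' js-enabled';
  </script>

  <!-- 2. Async: fetch without blocking parsing; the script runs as soon as it arrives -->
  <script async src="/assets/analytics.js"></script>

  <!-- 3. Defer: fetch without blocking and execute after the document has been parsed -->
  <script defer src="/assets/widgets.js"></script>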

!!! Important note: Scripts must be arranged in order of precedence. Scripts that load the above-the-fold content must be prioritized and should not be deferred. Also, any script that references another file can only run after the referenced file has loaded. Make sure to work closely with your development team to confirm that there are no interruptions to the user’s experience.

Read more: Google Developer’s Speed Documentation


TL;DR – Moral of the story

Crawlers and search engines will do their best to crawl, execute, and interpret your JavaScript, but it is not guaranteed. Make sure your content is crawlable and obtainable, and that it isn’t introducing site latency obstructions. The key = every situation demands testing. Based on the results, evaluate potential solutions.

Thanks: Thank you Max Prin (@maxxeight) for reviewing this content piece and sharing your knowledge, insight, and wisdom. It wouldn’t be the same without you.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Posted by webpage development and how to code basic JavaScript.

javacssseo.gif

Image sources: 1, 2, 3

JavaScript is either placed in the HTML document within <script> tags (i.e., it is embedded in the HTML) or linked/referenced. There are currently a plethora of JavaScript libraries and frameworks, including jQuery, AngularJS, ReactJS, EmberJS, etc.

JavaScript libraries and frameworks:

What is AJAX?

AJAX, or Asynchronous JavaScript and XML, is a set of web development techniques combining JavaScript and XML that allows web applications to communicate with a server in the background without interfering with the current page. Asynchronous means that other functions or lines of code can run while the async script is running. XML used to be the primary language to pass data; however, the term AJAX is used for all types of data transfers (including JSON; I guess “AJAJ” doesn’t sound as clean as “AJAX” [pun intended]).

A common use of AJAX is to update the content or layout of a webpage without initiating a full page refresh. Normally, when a page loads, all the assets on the page must be requested and fetched from the server and then rendered on the page. However, with AJAX, only the assets that differ between pages need to be loaded, which improves the user experience as they do not have to refresh the entire page.

One can think of AJAX as mini server calls. A good example of AJAX in action is Google Maps. The page updates without a full page reload (i.e., mini server calls are being used to load content as the user navigates).

Related image

Image source

What is the Document Object Model (DOM)?

As an SEO professional, you need to understand what the DOM is, because it’s what Google is using to analyze and understand webpages.

The DOM is what you see when you “Inspect Element” in a browser. Simply put, you can think of the DOM as the steps the browser takes after receiving the HTML document to render the page.

The first thing the browser receives is the HTML document. After that, it will start parsing the content within this document and fetch additional resources, such as images, CSS, and JavaScript files.

The DOM is what forms from this parsing of information and resources. One can think of it as a structured, organized version of the webpage’s code.

Nowadays the DOM is often very different from the initial HTML document, due to what’s collectively called dynamic HTML. Dynamic HTML is the ability for a page to change its content depending on user input, environmental conditions (e.g. time of day), and other variables, leveraging HTML, CSS, and JavaScript.

Simple example with a <title> tag that is populated through JavaScript:

HTML source

DOM

What is headless browsing?

Headless browsing is simply the action of fetching webpages without the user interface. It is important to understand because Google, and now Baidu, leverage headless browsing to gain a better understanding of the user’s experience and the content of webpages.

PhantomJS and Zombie.js are scripted headless browsers, typically used for automating web interaction for testing purposes, and rendering static HTML snapshots for initial requests (pre-rendering).


Why can JavaScript be challenging for SEO? (and how to fix issues)

There are three (3) primary reasons to be concerned about JavaScript on your site:

  1. Crawlability: Bots’ ability to crawl your site.
  2. Obtainability: Bots’ ability to access information and parse your content.
  3. Perceived site latency: AKA the Critical Rendering Path.

Crawlability

Are bots able to find URLs and understand your site’s architecture? There are two important elements here:

  1. Blocking search engines from your JavaScript (even accidentally).
  2. Proper internal linking, not leveraging JavaScript events as a replacement for HTML tags.

Why is blocking JavaScript such a big deal?

If search engines are blocked from crawling JavaScript, they will not be receiving your site’s full experience. This means search engines are not seeing what the end user is seeing. This can reduce your site’s appeal to search engines and could eventually be considered cloaking (if the intent is indeed malicious).

Fetch as Google and TechnicalSEO.com’s robots.txt and Fetch and Render testing tools can help to identify resources that Googlebot is blocked from.

The easiest way to solve this problem is through providing search engines access to the resources they need to understand your user experience.

!!! Important note: Work with your development team to determine which files should and should not be accessible to search engines.

Internal linking

Internal linking should be implemented with regular anchor tags within the HTML or the DOM (using an a hrefs=”www.example.com” HTML tag) versus leveraging JavaScript functions to allow the user to traverse the site.

Essentially: Don’t use JavaScript’s onclick events as a replacement for internal linking. While end URLs might be found and crawled (through strings in JavaScript code or XML sitemaps), they won’t be associated with the global navigation of the site.

Internal linking is a strong signal to search engines regarding the site’s architecture and importance of pages. In fact, internal links are so strong that they can (in certain situations) override “SEO hints” such as canonical tags.

URL structure

Historically, JavaScript-based websites (aka “AJAX sites”) were using fragment identifiers (#) within URLs.

  • Not recommended:
    • The Lone Hash (#) – The lone pound symbol is not crawlable. It is used to identify anchor link (aka jump links). These are the links that allow one to jump to a piece of content on a page. Anything after the lone hash portion of the URL is never sent to the server and will cause the page to automatically scroll to the first element with a matching ID (or the first <a> element with a name of the following information). Google recommends avoiding the use of “#” in URLs.
    • Hashbang (#!) (and escaped_fragments URLs) – Hashbang URLs were a hack to support crawlers (Google wants to avoid now and only Bing supports). Many a moon ago, Google and Bing developed a complicated AJAX solution, whereby a pretty (#!) URL with the UX co-existed with an equivalent escaped_fragment HTML-based experience for bots. Google has since backtracked on this recommendation, preferring to receive the exact user experience. In escaped fragments, there are two experiences here:
      • Original Experience (aka Pretty URL): This URL must either have a #! (hashbang) within the URL to indicate that there is an escaped fragment or a meta element indicating that an escaped fragment exists (<meta name=”fragment” content=”!”>).
      • Escaped Fragment (aka Ugly URL, HTML snapshot): This URL replace the hashbang (#!) with “_escaped_fragment_” and serves the HTML snapshot. It is called the ugly URL because it’s long and looks like (and for all intents and purposes is) a hack.

Image result

Image source

  • Recommended:
    • pushState History API – PushState is navigation-based and part of the History API (think: your web browsing history). Essentially, pushState updates the URL in the address bar and only what needs to change on the page is updated. It allows JS sites to leverage “clean” URLs. PushState is currently supported by Google, when supporting browser navigation for client-side or hybrid rendering.
      • A good use of pushState is for infinite scroll (i.e., as the user hits new parts of the page the URL will update). Ideally, if the user refreshes the page, the experience will land them in the exact same spot. However, they do not need to refresh the page, as the content updates as they scroll down, while the URL is updated in the address bar.
      • Example: A good example of a search engine-friendly infinite scroll implementation, created by Google’s John Mueller (go figure), can be found here. He technically leverages the replaceState(), which doesn’t include the same back button functionality as pushState.
      • Read more: Mozilla PushState History API Documents

Obtainability

Search engines have been shown to employ headless browsing to render the DOM to gain a better understanding of the user’s experience and the content on page. That is to say, Google can process some JavaScript and uses the DOM (instead of the HTML document).

At the same time, there are situations where search engines struggle to comprehend JavaScript. Nobody wants a Hulu situation to happen to their site or a client’s site. It is crucial to understand how bots are interacting with your onsite content. When you aren’t sure, test.

Assuming we’re talking about a search engine bot that executes JavaScript, there are a few important elements for search engines to be able to obtain content:

  • If the user must interact for something to fire, search engines probably aren’t seeing it.
    • Google is a lazy user. It doesn’t click, it doesn’t scroll, and it doesn’t log in. If the full UX demands action from the user, special precautions should be taken to ensure that bots are receiving an equivalent experience.
  • If the JavaScript occurs after the JavaScript load event fires plus ~5-seconds*, search engines may not be seeing it.
    • *John Mueller mentioned that there is no specific timeout value; however, sites should aim to load within five seconds.
    • *Screaming Frog tests show a correlation to five seconds to render content.
    • *The load event plus five seconds is what Google’s PageSpeed Insights, Mobile Friendliness Tool, and Fetch as Google use; check out Max Prin’s test timer.
  • If there are errors within the JavaScript, both browsers and search engines won’t be able to go through and potentially miss sections of pages if the entire code is not executed.

How to make sure Google and other search engines can get your content

1. TEST

The most popular solution to resolving JavaScript is probably not resolving anything (grab a coffee and let Google work its algorithmic brilliance). Providing Google with the same experience as searchers is Google’s preferred scenario.

Google first announced being able to “better understand the web (i.e., JavaScript)” in May 2014. Industry experts suggested that Google could crawl JavaScript way before this announcement. The iPullRank team offered two great pieces on this in 2011: Googlebot is Chrome and How smart are Googlebots? (thank you, Josh and Mike). Adam Audette’s Google can crawl JavaScript and leverages the DOM in 2015 confirmed. Therefore, if you can see your content in the DOM, chances are your content is being parsed by Google.

adamaudette - I don't always JavaScript, but when I do, I know google can crawl the dom and dynamically generated HTML

Recently, Barry Goralewicz performed a cool experiment testing a combination of various JavaScript libraries and frameworks to determine how Google interacts with the pages (e.g., are they indexing URL/content? How does GSC interact? Etc.). It ultimately showed that Google is able to interact with many forms of JavaScript and highlighted certain frameworks as perhaps more challenging. John Mueller even started a JavaScript search group (from what I’ve read, it’s fairly therapeutic).

All of these studies are amazing and help SEOs understand when to be concerned and take a proactive role. However, before you determine that sitting back is the right solution for your site, I recommend being actively cautious by experimenting with small section Think: Jim Collin’s “bullets, then cannonballs” philosophy from his book Great by Choice:

“A bullet is an empirical test aimed at learning what works and meets three criteria: a bullet must be low-cost, low-risk, and low-distraction… 10Xers use bullets to empirically validate what will actually work. Based on that empirical validation, they then concentrate their resources to fire a cannonball, enabling large returns from concentrated bets.”

Consider testing and reviewing through the following:

  1. Confirm that your content is appearing within the DOM.
  2. Test a subset of pages to see if Google can index content.
  • Manually check quotes from your content.
  • Fetch with Google and see if content appears.
  • Fetch with Google supposedly occurs around the load event or before timeout. It’s a great test to check to see if Google will be able to see your content and whether or not you’re blocking JavaScript in your robots.txt. Although Fetch with Google is not foolproof, it’s a good starting point.
  • Note: If you aren’t verified in GSC, try Technicalseo.com’s Fetch and Render As Any Bot Tool.

After you’ve tested all this, what if something’s not working and search engines and bots are struggling to index and obtain your content? Perhaps you’re concerned about alternative search engines (DuckDuckGo, Facebook, LinkedIn, etc.), or maybe you’re leveraging meta information that needs to be parsed by other bots, such as Twitter summary cards or Facebook Open Graph tags. If any of this is identified in testing or presents itself as a concern, an HTML snapshot may be the only decision.

2. HTML SNAPSHOTS
What are HTmL snapshots?

HTML snapshots are a fully rendered page (as one might see in the DOM) that can be returned to search engine bots (think: a static HTML version of the DOM).

Google introduced HTML snapshots 2009, deprecated (but still supported) them in 2015, and awkwardly mentioned them as an element to “avoid” in late 2016. HTML snapshots are a contentious topic with Google. However, they’re important to understand, because in certain situations they’re necessary.

If search engines (or sites like Facebook) cannot grasp your JavaScript, it’s better to return an HTML snapshot than not to have your content indexed and understood at all. Ideally, your site would leverage some form of user-agent detection on the server side and return the HTML snapshot to the bot.

At the same time, one must recognize that Google wants the same experience as the user (i.e., only provide Google with an HTML snapshot if the tests are dire and the JavaScript search group cannot provide support for your situation).

Considerations

When considering HTML snapshots, you must consider that Google has deprecated this AJAX recommendation. Although Google technically still supports it, Google recommends avoiding it. Yes, Google changed its mind and now want to receive the same experience as the user. This direction makes sense, as it allows the bot to receive an experience more true to the user experience.

A second consideration factor relates to the risk of cloaking. If the HTML snapshots are found to not represent the experience on the page, it’s considered a cloaking risk. Straight from the source:

“The HTML snapshot must contain the same content as the end user would see in a browser. If this is not the case, it may be considered cloaking.”
– Google Developer AJAX Crawling FAQs

Benefits

Despite the considerations, HTML snapshots have powerful advantages:

  1. Knowledge that search engines and crawlers will be able to understand the experience.
    • Certain types of JavaScript may be harder for Google to grasp (cough… Angular (also colloquially referred to as AngularJS 2) …cough).
  2. Other search engines and crawlers (think: Bing, Facebook) will be able to understand the experience.
    • Bing, among other search engines, has not stated that it can crawl and index JavaScript. HTML snapshots may be the only solution for a JavaScript-heavy site. As always, test to make sure that this is the case before diving in.

"It's not just Google understanding your JavaScript. It's also about the speed." -DOM - "It's not just about Google understanding your Javascript. it's also about your perceived latency." -DOM

Site latency

When browsers receive an HTML document and create the DOM (although there is some level of pre-scanning), most resources are loaded as they appear within the HTML document. This means that if you have a huge file toward the top of your HTML document, a browser will load that immense file first.

The concept of Google’s critical rendering path is to load what the user needs as soon as possible, which can be translated to → “get everything above-the-fold in front of the user, ASAP.”

Critical Rendering Path – Optimized Rendering Loads Progressively ASAP:

progressive page rendering

Image source

However, if you have unnecessary resources or JavaScript files clogging up the page’s ability to load, you get “render-blocking JavaScript.” Meaning: your JavaScript is blocking the page’s potential to appear as if it’s loading faster (also called: perceived latency).

Render-blocking JavaScript – Solutions

If you analyze your page speed results (through tools like Page Speed Insights Tool, WebPageTest.org, CatchPoint, etc.) and determine that there is a render-blocking JavaScript issue, here are three potential solutions:

  1. Inline: Add the JavaScript in the HTML document.
  2. Async: Make JavaScript asynchronous (i.e., add “async” attribute to HTML tag).
  3. Defer: By placing JavaScript lower within the HTML.

!!! Important note: It’s important to understand that scripts must be arranged in order of precedence. Scripts that are used to load the above-the-fold content must be prioritized and should not be deferred. Also, any script that references another file can only be used after the referenced file has loaded. Make sure to work closely with your development team to confirm that there are no interruptions to the user’s experience.

Read more: Google Developer’s Speed Documentation


TL;DR – Moral of the story

Crawlers and search engines will do their best to crawl, execute, and interpret your JavaScript, but it is not guaranteed. Make sure your content is crawlable, obtainable, and isn’t developing site latency obstructions. The key = every situation demands testing. Based on the results, evaluate potential solutions.

Thanks: Thank you Max Prin (@maxxeight) for reviewing this content piece and sharing your knowledge, insight, and wisdom. It wouldn’t be the same without you.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

“”

JavaScript & SEO: Making Your Bot Experience As Good As Your User Experience

Posted by webpage development and how to code basic JavaScript.

javacssseo.gif

Image sources: 1, 2, 3

JavaScript is either placed in the HTML document within <script> tags (i.e., it is embedded in the HTML) or linked/referenced. There are currently a plethora of JavaScript libraries and frameworks, including jQuery, AngularJS, ReactJS, EmberJS, etc.

JavaScript libraries and frameworks:

What is AJAX?

AJAX, or Asynchronous JavaScript and XML, is a set of web development techniques combining JavaScript and XML that allows web applications to communicate with a server in the background without interfering with the current page. Asynchronous means that other functions or lines of code can run while the async script is running. XML used to be the primary language to pass data; however, the term AJAX is used for all types of data transfers (including JSON; I guess “AJAJ” doesn’t sound as clean as “AJAX” [pun intended]).

A common use of AJAX is to update the content or layout of a webpage without initiating a full page refresh. Normally, when a page loads, all the assets on the page must be requested and fetched from the server and then rendered on the page. However, with AJAX, only the assets that differ between pages need to be loaded, which improves the user experience as they do not have to refresh the entire page.

One can think of AJAX as mini server calls. A good example of AJAX in action is Google Maps. The page updates without a full page reload (i.e., mini server calls are being used to load content as the user navigates).

Related image

Image source

What is the Document Object Model (DOM)?

As an SEO professional, you need to understand what the DOM is, because it’s what Google is using to analyze and understand webpages.

The DOM is what you see when you “Inspect Element” in a browser. Simply put, you can think of the DOM as the steps the browser takes after receiving the HTML document to render the page.

The first thing the browser receives is the HTML document. After that, it will start parsing the content within this document and fetch additional resources, such as images, CSS, and JavaScript files.

The DOM is what forms from this parsing of information and resources. One can think of it as a structured, organized version of the webpage’s code.

Nowadays the DOM is often very different from the initial HTML document, due to what’s collectively called dynamic HTML. Dynamic HTML is the ability for a page to change its content depending on user input, environmental conditions (e.g. time of day), and other variables, leveraging HTML, CSS, and JavaScript.

Simple example with a <title> tag that is populated through JavaScript:

HTML source

DOM

What is headless browsing?

Headless browsing is simply the action of fetching webpages without the user interface. It is important to understand because Google, and now Baidu, leverage headless browsing to gain a better understanding of the user’s experience and the content of webpages.

PhantomJS and Zombie.js are scripted headless browsers, typically used for automating web interaction for testing purposes, and rendering static HTML snapshots for initial requests (pre-rendering).


Why can JavaScript be challenging for SEO? (and how to fix issues)

There are three (3) primary reasons to be concerned about JavaScript on your site:

  1. Crawlability: Bots’ ability to crawl your site.
  2. Obtainability: Bots’ ability to access information and parse your content.
  3. Perceived site latency: AKA the Critical Rendering Path.

Crawlability

Are bots able to find URLs and understand your site’s architecture? There are two important elements here:

  1. Blocking search engines from your JavaScript (even accidentally).
  2. Proper internal linking, not leveraging JavaScript events as a replacement for HTML tags.

Why is blocking JavaScript such a big deal?

If search engines are blocked from crawling JavaScript, they will not be receiving your site’s full experience. This means search engines are not seeing what the end user is seeing. This can reduce your site’s appeal to search engines and could eventually be considered cloaking (if the intent is indeed malicious).

Fetch as Google and TechnicalSEO.com’s robots.txt and Fetch and Render testing tools can help to identify resources that Googlebot is blocked from.

The easiest way to solve this problem is through providing search engines access to the resources they need to understand your user experience.

!!! Important note: Work with your development team to determine which files should and should not be accessible to search engines.

Internal linking

Internal linking should be implemented with regular anchor tags within the HTML or the DOM (using an a hrefs=”www.example.com” HTML tag) versus leveraging JavaScript functions to allow the user to traverse the site.

Essentially: Don’t use JavaScript’s onclick events as a replacement for internal linking. While end URLs might be found and crawled (through strings in JavaScript code or XML sitemaps), they won’t be associated with the global navigation of the site.

Internal linking is a strong signal to search engines regarding the site’s architecture and importance of pages. In fact, internal links are so strong that they can (in certain situations) override “SEO hints” such as canonical tags.

URL structure

Historically, JavaScript-based websites (aka “AJAX sites”) were using fragment identifiers (#) within URLs.

  • Not recommended:
    • The Lone Hash (#) – The lone pound symbol is not crawlable. It is used to identify anchor link (aka jump links). These are the links that allow one to jump to a piece of content on a page. Anything after the lone hash portion of the URL is never sent to the server and will cause the page to automatically scroll to the first element with a matching ID (or the first <a> element with a name of the following information). Google recommends avoiding the use of “#” in URLs.
    • Hashbang (#!) (and escaped_fragments URLs) – Hashbang URLs were a hack to support crawlers (Google wants to avoid now and only Bing supports). Many a moon ago, Google and Bing developed a complicated AJAX solution, whereby a pretty (#!) URL with the UX co-existed with an equivalent escaped_fragment HTML-based experience for bots. Google has since backtracked on this recommendation, preferring to receive the exact user experience. In escaped fragments, there are two experiences here:
      • Original Experience (aka Pretty URL): This URL must either have a #! (hashbang) within the URL to indicate that there is an escaped fragment or a meta element indicating that an escaped fragment exists (<meta name=”fragment” content=”!”>).
      • Escaped Fragment (aka Ugly URL, HTML snapshot): This URL replace the hashbang (#!) with “_escaped_fragment_” and serves the HTML snapshot. It is called the ugly URL because it’s long and looks like (and for all intents and purposes is) a hack.

Image result

Image source

  • Recommended:
    • pushState History API – PushState is navigation-based and part of the History API (think: your web browsing history). Essentially, pushState updates the URL in the address bar and only what needs to change on the page is updated. It allows JS sites to leverage “clean” URLs. PushState is currently supported by Google, when supporting browser navigation for client-side or hybrid rendering.
      • A good use of pushState is for infinite scroll (i.e., as the user hits new parts of the page the URL will update). Ideally, if the user refreshes the page, the experience will land them in the exact same spot. However, they do not need to refresh the page, as the content updates as they scroll down, while the URL is updated in the address bar.
      • Example: A good example of a search engine-friendly infinite scroll implementation, created by Google’s John Mueller (go figure), can be found here. He technically leverages the replaceState(), which doesn’t include the same back button functionality as pushState.
      • Read more: Mozilla PushState History API Documents

Obtainability

Search engines have been shown to employ headless browsing to render the DOM to gain a better understanding of the user’s experience and the content on page. That is to say, Google can process some JavaScript and uses the DOM (instead of the HTML document).

At the same time, there are situations where search engines struggle to comprehend JavaScript. Nobody wants a Hulu situation to happen to their site or a client’s site. It is crucial to understand how bots are interacting with your onsite content. When you aren’t sure, test.

Assuming we’re talking about a search engine bot that executes JavaScript, there are a few important elements for search engines to be able to obtain content:

  • If the user must interact for something to fire, search engines probably aren’t seeing it.
    • Google is a lazy user. It doesn’t click, it doesn’t scroll, and it doesn’t log in. If the full UX demands action from the user, special precautions should be taken to ensure that bots are receiving an equivalent experience.
  • If the JavaScript occurs after the JavaScript load event fires plus ~5-seconds*, search engines may not be seeing it.
    • *John Mueller mentioned that there is no specific timeout value; however, sites should aim to load within five seconds.
    • *Screaming Frog tests show a correlation to five seconds to render content.
    • *The load event plus five seconds is what Google’s PageSpeed Insights, Mobile Friendliness Tool, and Fetch as Google use; check out Max Prin’s test timer.
  • If there are errors within the JavaScript, both browsers and search engines won’t be able to go through and potentially miss sections of pages if the entire code is not executed.

How to make sure Google and other search engines can get your content

1. TEST

The most popular solution to resolving JavaScript is probably not resolving anything (grab a coffee and let Google work its algorithmic brilliance). Providing Google with the same experience as searchers is Google’s preferred scenario.

Google first announced being able to “better understand the web (i.e., JavaScript)” in May 2014. Industry experts suggested that Google could crawl JavaScript way before this announcement. The iPullRank team offered two great pieces on this in 2011: Googlebot is Chrome and How smart are Googlebots? (thank you, Josh and Mike). Adam Audette’s Google can crawl JavaScript and leverages the DOM in 2015 confirmed. Therefore, if you can see your content in the DOM, chances are your content is being parsed by Google.

adamaudette - I don't always JavaScript, but when I do, I know google can crawl the dom and dynamically generated HTML

Recently, Barry Goralewicz performed a cool experiment testing a combination of various JavaScript libraries and frameworks to determine how Google interacts with the pages (e.g., are they indexing URL/content? How does GSC interact? Etc.). It ultimately showed that Google is able to interact with many forms of JavaScript and highlighted certain frameworks as perhaps more challenging. John Mueller even started a JavaScript search group (from what I’ve read, it’s fairly therapeutic).

All of these studies are amazing and help SEOs understand when to be concerned and take a proactive role. However, before you determine that sitting back is the right solution for your site, I recommend being actively cautious by experimenting with small section Think: Jim Collin’s “bullets, then cannonballs” philosophy from his book Great by Choice:

“A bullet is an empirical test aimed at learning what works and meets three criteria: a bullet must be low-cost, low-risk, and low-distraction… 10Xers use bullets to empirically validate what will actually work. Based on that empirical validation, they then concentrate their resources to fire a cannonball, enabling large returns from concentrated bets.”

Consider testing and reviewing through the following:

  1. Confirm that your content is appearing within the DOM.
  2. Test a subset of pages to see if Google can index content.
  • Manually check quotes from your content.
  • Fetch with Google and see if content appears.
  • Fetch with Google supposedly occurs around the load event or before timeout. It’s a great test to check to see if Google will be able to see your content and whether or not you’re blocking JavaScript in your robots.txt. Although Fetch with Google is not foolproof, it’s a good starting point.
  • Note: If you aren’t verified in GSC, try Technicalseo.com’s Fetch and Render As Any Bot Tool.

After you’ve tested all this, what if something’s not working and search engines and bots are struggling to index and obtain your content? Perhaps you’re concerned about alternative search engines (DuckDuckGo, Facebook, LinkedIn, etc.), or maybe you’re leveraging meta information that needs to be parsed by other bots, such as Twitter summary cards or Facebook Open Graph tags. If any of this is identified in testing or presents itself as a concern, an HTML snapshot may be the only decision.

2. HTML SNAPSHOTS
What are HTmL snapshots?

HTML snapshots are a fully rendered page (as one might see in the DOM) that can be returned to search engine bots (think: a static HTML version of the DOM).

Google introduced HTML snapshots 2009, deprecated (but still supported) them in 2015, and awkwardly mentioned them as an element to “avoid” in late 2016. HTML snapshots are a contentious topic with Google. However, they’re important to understand, because in certain situations they’re necessary.

If search engines (or sites like Facebook) cannot grasp your JavaScript, it’s better to return an HTML snapshot than not to have your content indexed and understood at all. Ideally, your site would leverage some form of user-agent detection on the server side and return the HTML snapshot to the bot.

At the same time, one must recognize that Google wants the same experience as the user (i.e., only provide Google with an HTML snapshot if the tests are dire and the JavaScript search group cannot provide support for your situation).

Considerations

When considering HTML snapshots, you must consider that Google has deprecated this AJAX recommendation. Although Google technically still supports it, Google recommends avoiding it. Yes, Google changed its mind and now want to receive the same experience as the user. This direction makes sense, as it allows the bot to receive an experience more true to the user experience.

A second consideration factor relates to the risk of cloaking. If the HTML snapshots are found to not represent the experience on the page, it’s considered a cloaking risk. Straight from the source:

“The HTML snapshot must contain the same content as the end user would see in a browser. If this is not the case, it may be considered cloaking.”
– Google Developer AJAX Crawling FAQs

Benefits

Despite the considerations, HTML snapshots have powerful advantages:

  1. Knowledge that search engines and crawlers will be able to understand the experience.
    • Certain types of JavaScript may be harder for Google to grasp (cough… Angular (also colloquially referred to as AngularJS 2) …cough).
  2. Other search engines and crawlers (think: Bing, Facebook) will be able to understand the experience.
    • Bing, among other search engines, has not stated that it can crawl and index JavaScript. HTML snapshots may be the only solution for a JavaScript-heavy site. As always, test to make sure that this is the case before diving in.

"It's not just Google understanding your JavaScript. It's also about the speed." -DOM - "It's not just about Google understanding your Javascript. it's also about your perceived latency." -DOM

Site latency

When browsers receive an HTML document and create the DOM (although there is some level of pre-scanning), most resources are loaded as they appear within the HTML document. This means that if you have a huge file toward the top of your HTML document, a browser will load that immense file first.

The concept of Google’s critical rendering path is to load what the user needs as soon as possible, which can be translated to → “get everything above-the-fold in front of the user, ASAP.”

Critical Rendering Path – Optimized Rendering Loads Progressively ASAP:

progressive page rendering

Image source

However, if you have unnecessary resources or JavaScript files clogging up the page’s ability to load, you get “render-blocking JavaScript.” Meaning: your JavaScript is blocking the page’s potential to appear as if it’s loading faster (also called: perceived latency).

Render-blocking JavaScript – Solutions

If you analyze your page speed results (through tools like Page Speed Insights Tool, WebPageTest.org, CatchPoint, etc.) and determine that there is a render-blocking JavaScript issue, here are three potential solutions:

  1. Inline: Add the JavaScript in the HTML document.
  2. Async: Make JavaScript asynchronous (i.e., add “async” attribute to HTML tag).
  3. Defer: By placing JavaScript lower within the HTML.

!!! Important note: It’s important to understand that scripts must be arranged in order of precedence. Scripts that are used to load the above-the-fold content must be prioritized and should not be deferred. Also, any script that references another file can only be used after the referenced file has loaded. Make sure to work closely with your development team to confirm that there are no interruptions to the user’s experience.

Read more: Google Developer’s Speed Documentation


TL;DR – Moral of the story

Crawlers and search engines will do their best to crawl, execute, and interpret your JavaScript, but it is not guaranteed. Make sure your content is crawlable, obtainable, and isn’t developing site latency obstructions. The key = every situation demands testing. Based on the results, evaluate potential solutions.

Thanks: Thank you Max Prin (@maxxeight) for reviewing this content piece and sharing your knowledge, insight, and wisdom. It wouldn’t be the same without you.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Posted by webpage development and how to code basic JavaScript.

javacssseo.gif

Image sources: 1, 2, 3

JavaScript is either placed in the HTML document within <script> tags (i.e., it is embedded in the HTML) or linked/referenced. There are currently a plethora of JavaScript libraries and frameworks, including jQuery, AngularJS, ReactJS, EmberJS, etc.

JavaScript libraries and frameworks:

What is AJAX?

AJAX, or Asynchronous JavaScript and XML, is a set of web development techniques combining JavaScript and XML that allows web applications to communicate with a server in the background without interfering with the current page. Asynchronous means that other functions or lines of code can run while the async script is running. XML used to be the primary language to pass data; however, the term AJAX is used for all types of data transfers (including JSON; I guess “AJAJ” doesn’t sound as clean as “AJAX” [pun intended]).

A common use of AJAX is to update the content or layout of a webpage without initiating a full page refresh. Normally, when a page loads, all the assets on the page must be requested and fetched from the server and then rendered on the page. However, with AJAX, only the assets that differ between pages need to be loaded, which improves the user experience as they do not have to refresh the entire page.

One can think of AJAX as mini server calls. A good example of AJAX in action is Google Maps. The page updates without a full page reload (i.e., mini server calls are being used to load content as the user navigates).

Related image

Image source

What is the Document Object Model (DOM)?

As an SEO professional, you need to understand what the DOM is, because it’s what Google is using to analyze and understand webpages.

The DOM is what you see when you “Inspect Element” in a browser. Simply put, you can think of the DOM as the steps the browser takes after receiving the HTML document to render the page.

The first thing the browser receives is the HTML document. After that, it will start parsing the content within this document and fetch additional resources, such as images, CSS, and JavaScript files.

The DOM is what forms from this parsing of information and resources. One can think of it as a structured, organized version of the webpage’s code.

Nowadays the DOM is often very different from the initial HTML document, due to what’s collectively called dynamic HTML. Dynamic HTML is the ability for a page to change its content depending on user input, environmental conditions (e.g. time of day), and other variables, leveraging HTML, CSS, and JavaScript.

Simple example with a <title> tag that is populated through JavaScript:

HTML source

DOM

What is headless browsing?

Headless browsing is simply the action of fetching webpages without the user interface. It is important to understand because Google, and now Baidu, leverage headless browsing to gain a better understanding of the user’s experience and the content of webpages.

PhantomJS and Zombie.js are scripted headless browsers, typically used for automating web interaction for testing purposes, and rendering static HTML snapshots for initial requests (pre-rendering).


Why can JavaScript be challenging for SEO? (and how to fix issues)

There are three (3) primary reasons to be concerned about JavaScript on your site:

  1. Crawlability: Bots’ ability to crawl your site.
  2. Obtainability: Bots’ ability to access information and parse your content.
  3. Perceived site latency: AKA the Critical Rendering Path.

Crawlability

Are bots able to find URLs and understand your site’s architecture? There are two important elements here:

  1. Blocking search engines from your JavaScript (even accidentally).
  2. Proper internal linking, not leveraging JavaScript events as a replacement for HTML tags.

Why is blocking JavaScript such a big deal?

If search engines are blocked from crawling JavaScript, they will not be receiving your site’s full experience. This means search engines are not seeing what the end user is seeing. This can reduce your site’s appeal to search engines and could eventually be considered cloaking (if the intent is indeed malicious).

Fetch as Google and TechnicalSEO.com’s robots.txt and Fetch and Render testing tools can help to identify resources that Googlebot is blocked from.

The easiest way to solve this problem is through providing search engines access to the resources they need to understand your user experience.

!!! Important note: Work with your development team to determine which files should and should not be accessible to search engines.

Internal linking

Internal linking should be implemented with regular anchor tags within the HTML or the DOM (using an a hrefs=”www.example.com” HTML tag) versus leveraging JavaScript functions to allow the user to traverse the site.

Essentially: Don’t use JavaScript’s onclick events as a replacement for internal linking. While end URLs might be found and crawled (through strings in JavaScript code or XML sitemaps), they won’t be associated with the global navigation of the site.

Internal linking is a strong signal to search engines regarding the site’s architecture and importance of pages. In fact, internal links are so strong that they can (in certain situations) override “SEO hints” such as canonical tags.

URL structure

Historically, JavaScript-based websites (aka “AJAX sites”) were using fragment identifiers (#) within URLs.

  • Not recommended:
    • The Lone Hash (#) – The lone pound symbol is not crawlable. It is used to identify anchor links (aka jump links), which allow one to jump to a piece of content on a page. Anything after the lone hash is never sent to the server; it simply causes the page to scroll to the first element with a matching ID (or the first <a> element with a matching name attribute). Google recommends avoiding the use of “#” in URLs.
    • Hashbang (#!) (and _escaped_fragment_ URLs) – Hashbang URLs were a hack to support crawlers (one Google now wants sites to avoid, and that only Bing still supports). Many a moon ago, Google and Bing developed a complicated AJAX solution, whereby a pretty (#!) URL serving the user experience co-existed with an equivalent escaped_fragment, HTML-based experience for bots. Google has since backtracked on this recommendation, preferring to receive the exact user experience. Under the escaped fragment scheme, there are two experiences (see the example after this list):
      • Original Experience (aka Pretty URL): This URL must either contain a #! (hashbang) to indicate that there is an escaped fragment, or include a meta element indicating that an escaped fragment exists (<meta name="fragment" content="!">).
      • Escaped Fragment (aka Ugly URL, HTML snapshot): This URL replaces the hashbang (#!) with “_escaped_fragment_” and serves the HTML snapshot. It is called the ugly URL because it’s long and looks like (and for all intents and purposes is) a hack.
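For reference, the (now-deprecated) AJAX crawling scheme mapped the pretty URL to the ugly URL roughly as follows (example.com and the path are placeholders):

    Pretty URL (served to users):   https://www.example.com/#!/shoes/red
    Ugly URL (requested by bots):   https://www.example.com/?_escaped_fragment_=/shoes/red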

Image result

Image source

  • Recommended:
    • pushState History API – pushState is navigation-based and part of the History API (think: your web browsing history). Essentially, pushState updates the URL in the address bar, and only what needs to change on the page is updated. It allows JavaScript sites to leverage “clean” URLs, and it is currently supported by Google when it is used to support browser navigation for client-side or hybrid rendering (a minimal sketch follows this list).
      • A good use of pushState is for infinite scroll (i.e., as the user hits new parts of the page the URL will update). Ideally, if the user refreshes the page, the experience will land them in the exact same spot. However, they do not need to refresh the page, as the content updates as they scroll down, while the URL is updated in the address bar.
      • Example: A good example of a search engine-friendly infinite scroll implementation, created by Google’s John Mueller (go figure), can be found here. He technically leverages replaceState(), which doesn’t offer the same back-button behavior as pushState.
      • Read more: Mozilla PushState History API Documents
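As a rough sketch of the infinite-scroll idea above (the /products endpoint and the results container are hypothetical), the page might append content and keep the address bar in sync like this:

    var nextPage = 2;
    var loading = false;

    window.addEventListener('scroll', function () {
      var nearBottom = window.innerHeight + window.scrollY >= document.body.offsetHeight - 200;
      if (!nearBottom || loading) { return; }
      loading = true;

      fetch('/products?page=' + nextPage)            // placeholder endpoint
        .then(function (response) { return response.text(); })
        .then(function (html) {
          document.getElementById('results').insertAdjacentHTML('beforeend', html);
          // Reflect the newly loaded section in the address bar without a full reload
          history.pushState({ page: nextPage }, '', '/products?page=' + nextPage);
          nextPage += 1;
          loading = false;
        });
    });

If the user refreshes, the server should be able to render the equivalent page for that URL, so the experience lands them in the same spot.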

Obtainability

Search engines have been shown to employ headless browsing to render the DOM and gain a better understanding of the user’s experience and the content on the page. That is to say, Google can process some JavaScript and uses the rendered DOM (rather than the initial HTML document).

At the same time, there are situations where search engines struggle to comprehend JavaScript. Nobody wants a Hulu situation to happen to their site or a client’s site. It is crucial to understand how bots are interacting with your onsite content. When you aren’t sure, test.

Assuming we’re talking about a search engine bot that executes JavaScript, there are a few important elements for search engines to be able to obtain content:

  • If the user must interact for something to fire, search engines probably aren’t seeing it.
    • Google is a lazy user. It doesn’t click, it doesn’t scroll, and it doesn’t log in. If the full UX demands action from the user, special precautions should be taken to ensure that bots are receiving an equivalent experience.
  • If the JavaScript executes after the load event fires plus roughly five seconds*, search engines may not be seeing it (a sketch illustrating this and the previous bullet follows this list).
    • *John Mueller mentioned that there is no specific timeout value; however, sites should aim to load within five seconds.
    • *Screaming Frog tests show a correlation to five seconds to render content.
    • *The load event plus five seconds is what Google’s PageSpeed Insights, Mobile Friendliness Tool, and Fetch as Google use; check out Max Prin’s test timer.
  • If there are errors within the JavaScript, both browsers and search engines may stop executing the code partway through and miss sections of the page.
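To illustrate the first two bullets above, here is a sketch of two patterns that tend to be invisible to bots: content injected only after a click, and content injected well after the load event (the element IDs are placeholders):

    // 1) Interaction-gated content: a bot that never clicks will never see this
    document.getElementById('show-reviews').addEventListener('click', function () {
      document.getElementById('reviews').innerHTML = '<p>Great product! Five stars.</p>';
    });

    // 2) Late-injected content: arrives ~10 seconds after the load event,
    //    likely past the point where rendering services stop waiting
    window.addEventListener('load', function () {
      window.setTimeout(function () {
        document.getElementById('related').innerHTML = '<p>You may also like…</p>';
      }, 10000);
    });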

How to make sure Google and other search engines can get your content

1. TEST

The most popular way of handling JavaScript for SEO is probably to not do anything at all (grab a coffee and let Google work its algorithmic brilliance). Providing Google with the same experience as searchers is Google’s preferred scenario.

Google first announced being able to “better understand the web (i.e., JavaScript)” in May 2014. Industry experts suggested that Google could crawl JavaScript well before this announcement. The iPullRank team offered two great pieces on this in 2011: Googlebot is Chrome and How smart are Googlebots? (thank you, Josh and Mike). Adam Audette’s 2015 piece, Google can crawl JavaScript and leverages the DOM, confirmed it. Therefore, if you can see your content in the DOM, chances are it is being parsed by Google.

adamaudette - I don't always JavaScript, but when I do, I know google can crawl the dom and dynamically generated HTML

Recently, Bartosz Goralewicz performed a cool experiment testing a combination of various JavaScript libraries and frameworks to determine how Google interacts with the pages (e.g., is it indexing the URLs/content? How does GSC interact? etc.). It ultimately showed that Google is able to interact with many forms of JavaScript, while highlighting certain frameworks as potentially more challenging. John Mueller even started a JavaScript search group (from what I’ve read, it’s fairly therapeutic).

All of these studies are amazing and help SEOs understand when to be concerned and take a proactive role. However, before you determine that sitting back is the right solution for your site, I recommend being actively cautious by experimenting with small sections. Think: Jim Collins’s “bullets, then cannonballs” philosophy from his book Great by Choice:

“A bullet is an empirical test aimed at learning what works and meets three criteria: a bullet must be low-cost, low-risk, and low-distraction… 10Xers use bullets to empirically validate what will actually work. Based on that empirical validation, they then concentrate their resources to fire a cannonball, enabling large returns from concentrated bets.”

Consider testing and reviewing through the following:

  1. Confirm that your content is appearing within the DOM (a quick check is sketched after this list).
  2. Test a subset of pages to see if Google can index the content.
    • Manually search Google for exact quotes from your content.
    • Fetch as Google and see if your content appears.
    • Fetch as Google supposedly renders around the load event or before its timeout. It’s a great way to check whether Google will be able to see your content and whether or not you’re blocking JavaScript in your robots.txt. Although Fetch as Google is not foolproof, it’s a good starting point.
    • Note: If you aren’t verified in GSC, try TechnicalSEO.com’s Fetch and Render As Any Bot tool.
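For the first check, one quick way to confirm that content is in the rendered DOM (as opposed to only being promised by JavaScript) is to run something like this in the browser console; the selector and the search text are placeholders:

    // Does the rendered DOM actually contain the headline you care about?
    var h1 = document.querySelector('h1');
    console.log('In rendered DOM:', h1 ? h1.textContent : '(missing)');

    // Compare against the raw server response ("view source"): if the text is only
    // in the DOM and not in the raw HTML, JavaScript is generating it
    fetch(window.location.href)
      .then(function (response) { return response.text(); })
      .then(function (rawHtml) {
        console.log('In raw HTML source:', rawHtml.indexOf('Your headline text') !== -1);
      });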

After you’ve tested all this, what if something’s not working and search engines and bots are struggling to index and obtain your content? Perhaps you’re concerned about alternative search engines (DuckDuckGo, Facebook, LinkedIn, etc.), or maybe you’re leveraging meta information that needs to be parsed by other bots, such as Twitter summary cards or Facebook Open Graph tags. If any of this is identified in testing or presents itself as a concern, an HTML snapshot may be the only option.

2. HTML SNAPSHOTS
What are HTML snapshots?

An HTML snapshot is a fully rendered page (as one might see in the DOM) that can be returned to search engine bots (think: a static HTML version of the DOM).

Google introduced HTML snapshots in 2009, deprecated (but still supported) them in 2015, and awkwardly mentioned them as an element to “avoid” in late 2016. HTML snapshots are a contentious topic with Google. However, they’re important to understand, because in certain situations they’re necessary.

If search engines (or sites like Facebook) cannot grasp your JavaScript, it’s better to return an HTML snapshot than not to have your content indexed and understood at all. Ideally, your site would leverage some form of user-agent detection on the server side and return the HTML snapshot to the bot.
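A rough sketch of that idea, assuming a Node.js/Express server and a hypothetical getSnapshot() helper that returns a pre-rendered HTML string for a URL (this is an illustration of the pattern, not a drop-in implementation):

    const express = require('express');
    const app = express();

    // Hypothetical helper that returns pre-rendered HTML for a given URL
    const { getSnapshot } = require('./snapshots');

    const BOT_PATTERN = /googlebot|bingbot|facebookexternalhit|twitterbot|linkedinbot/i;

    app.get('*', async (req, res, next) => {
      const userAgent = req.headers['user-agent'] || '';
      if (BOT_PATTERN.test(userAgent)) {
        // Bots receive the static HTML snapshot of the rendered page
        const html = await getSnapshot(req.originalUrl);
        return res.send(html);
      }
      // Everyone else receives the normal JavaScript-driven experience
      next();
    });

    app.use(express.static('public'));
    app.listen(3000);

Whatever the snapshot returns must mirror what users see, which is exactly the cloaking caveat covered below.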

At the same time, one must recognize that Google wants the same experience as the user (i.e., only provide Google with an HTML snapshot if the tests are dire and the JavaScript search group cannot provide support for your situation).

Considerations

When considering HTML snapshots, you must keep in mind that Google has deprecated this AJAX crawling recommendation. Although Google technically still supports it, Google recommends avoiding it. Yes, Google changed its mind and now wants to receive the same experience as the user. This direction makes sense, as it allows the bot to receive an experience that is truer to the user experience.

A second consideration relates to the risk of cloaking. If the HTML snapshots are found to not represent the experience on the page, it’s considered a cloaking risk. Straight from the source:

“The HTML snapshot must contain the same content as the end user would see in a browser. If this is not the case, it may be considered cloaking.”
– Google Developer AJAX Crawling FAQs

Benefits

Despite the considerations, HTML snapshots have powerful advantages:

  1. Knowledge that search engines and crawlers will be able to understand the experience.
    • Certain types of JavaScript may be harder for Google to grasp (cough… Angular 2, now known simply as Angular …cough).
  2. Other search engines and crawlers (think: Bing, Facebook) will be able to understand the experience.
    • Bing, among other search engines, has not stated that it can crawl and index JavaScript. HTML snapshots may be the only solution for a JavaScript-heavy site. As always, test to make sure that this is the case before diving in.

"It's not just Google understanding your JavaScript. It's also about the speed." -DOM - "It's not just about Google understanding your Javascript. it's also about your perceived latency." -DOM

Site latency

When browsers receive an HTML document and create the DOM (although there is some level of pre-scanning), most resources are loaded as they appear within the HTML document. This means that if you have a huge file toward the top of your HTML document, a browser will load that immense file first.

The concept of Google’s critical rendering path is to load what the user needs as soon as possible, which can be translated to → “get everything above-the-fold in front of the user, ASAP.”

Critical Rendering Path – Optimized Rendering Loads Progressively ASAP:

progressive page rendering

Image source

However, if you have unnecessary resources or JavaScript files clogging up the page’s ability to load, you get “render-blocking JavaScript.” In other words, your JavaScript is blocking the page from appearing to load quickly (hurting perceived latency).

Render-blocking JavaScript – Solutions

If you analyze your page speed results (through tools like the PageSpeed Insights Tool, WebPageTest.org, Catchpoint, etc.) and determine that there is a render-blocking JavaScript issue, here are three potential solutions:

  1. Inline: Add critical JavaScript directly within the HTML document.
  2. Async: Make the JavaScript asynchronous (i.e., add the async attribute to the script tag).
  3. Defer: Defer non-critical JavaScript by adding the defer attribute or by placing it lower within the HTML (a sketch of all three follows this list).
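As a quick sketch of all three (the file names are placeholders):

    <!-- 1. Inline: small, critical JavaScript embedded directly in the HTML -->
    <script>
      document.documentElement.className += ' js';
    </script>

    <!-- 2. Async: fetched in parallel and executed as soon as it arrives -->
    <script async src="/js/analytics.js"></script>

    <!-- 3. Defer: fetched in parallel but executed only after the document is parsed -->
    <script defer src="/js/non-critical.js"></script>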

!!! Important note: It’s important to understand that scripts must be arranged in order of precedence. Scripts that are used to load the above-the-fold content must be prioritized and should not be deferred. Also, any script that references another file can only be used after the referenced file has loaded. Make sure to work closely with your development team to confirm that there are no interruptions to the user’s experience.

Read more: Google Developer’s Speed Documentation


TL;DR – Moral of the story

Crawlers and search engines will do their best to crawl, execute, and interpret your JavaScript, but it is not guaranteed. Make sure your content is crawlable, obtainable, and isn’t developing site latency obstructions. The key = every situation demands testing. Based on the results, evaluate potential solutions.

Thanks: Thank you Max Prin (@maxxeight) for reviewing this content piece and sharing your knowledge, insight, and wisdom. It wouldn’t be the same without you.


“”

JavaScript & SEO: Making Your Bot Experience As Good As Your User Experience

Posted by webpage development and how to code basic JavaScript.

javacssseo.gif

Image sources: 1, 2, 3

JavaScript is either placed in the HTML document within <script> tags (i.e., it is embedded in the HTML) or linked/referenced. There are currently a plethora of JavaScript libraries and frameworks, including jQuery, AngularJS, ReactJS, EmberJS, etc.

JavaScript libraries and frameworks:

What is AJAX?

AJAX, or Asynchronous JavaScript and XML, is a set of web development techniques combining JavaScript and XML that allows web applications to communicate with a server in the background without interfering with the current page. Asynchronous means that other functions or lines of code can run while the async script is running. XML used to be the primary language to pass data; however, the term AJAX is used for all types of data transfers (including JSON; I guess “AJAJ” doesn’t sound as clean as “AJAX” [pun intended]).

A common use of AJAX is to update the content or layout of a webpage without initiating a full page refresh. Normally, when a page loads, all the assets on the page must be requested and fetched from the server and then rendered on the page. However, with AJAX, only the assets that differ between pages need to be loaded, which improves the user experience as they do not have to refresh the entire page.

One can think of AJAX as mini server calls. A good example of AJAX in action is Google Maps. The page updates without a full page reload (i.e., mini server calls are being used to load content as the user navigates).

Related image

Image source

What is the Document Object Model (DOM)?

As an SEO professional, you need to understand what the DOM is, because it’s what Google is using to analyze and understand webpages.

The DOM is what you see when you “Inspect Element” in a browser. Simply put, you can think of the DOM as the steps the browser takes after receiving the HTML document to render the page.

The first thing the browser receives is the HTML document. After that, it will start parsing the content within this document and fetch additional resources, such as images, CSS, and JavaScript files.

The DOM is what forms from this parsing of information and resources. One can think of it as a structured, organized version of the webpage’s code.

Nowadays the DOM is often very different from the initial HTML document, due to what’s collectively called dynamic HTML. Dynamic HTML is the ability for a page to change its content depending on user input, environmental conditions (e.g. time of day), and other variables, leveraging HTML, CSS, and JavaScript.

Simple example with a <title> tag that is populated through JavaScript:

HTML source

DOM

What is headless browsing?

Headless browsing is simply the action of fetching webpages without the user interface. It is important to understand because Google, and now Baidu, leverage headless browsing to gain a better understanding of the user’s experience and the content of webpages.

PhantomJS and Zombie.js are scripted headless browsers, typically used for automating web interaction for testing purposes, and rendering static HTML snapshots for initial requests (pre-rendering).


Why can JavaScript be challenging for SEO? (and how to fix issues)

There are three (3) primary reasons to be concerned about JavaScript on your site:

  1. Crawlability: Bots’ ability to crawl your site.
  2. Obtainability: Bots’ ability to access information and parse your content.
  3. Perceived site latency: AKA the Critical Rendering Path.

Crawlability

Are bots able to find URLs and understand your site’s architecture? There are two important elements here:

  1. Blocking search engines from your JavaScript (even accidentally).
  2. Proper internal linking, not leveraging JavaScript events as a replacement for HTML tags.

Why is blocking JavaScript such a big deal?

If search engines are blocked from crawling JavaScript, they will not be receiving your site’s full experience. This means search engines are not seeing what the end user is seeing. This can reduce your site’s appeal to search engines and could eventually be considered cloaking (if the intent is indeed malicious).

Fetch as Google and TechnicalSEO.com’s robots.txt and Fetch and Render testing tools can help to identify resources that Googlebot is blocked from.

The easiest way to solve this problem is through providing search engines access to the resources they need to understand your user experience.

!!! Important note: Work with your development team to determine which files should and should not be accessible to search engines.

Internal linking

Internal linking should be implemented with regular anchor tags within the HTML or the DOM (using an a hrefs=”www.example.com” HTML tag) versus leveraging JavaScript functions to allow the user to traverse the site.

Essentially: Don’t use JavaScript’s onclick events as a replacement for internal linking. While end URLs might be found and crawled (through strings in JavaScript code or XML sitemaps), they won’t be associated with the global navigation of the site.

Internal linking is a strong signal to search engines regarding the site’s architecture and importance of pages. In fact, internal links are so strong that they can (in certain situations) override “SEO hints” such as canonical tags.

URL structure

Historically, JavaScript-based websites (aka “AJAX sites”) were using fragment identifiers (#) within URLs.

  • Not recommended:
    • The Lone Hash (#) – The lone pound symbol is not crawlable. It is used to identify anchor link (aka jump links). These are the links that allow one to jump to a piece of content on a page. Anything after the lone hash portion of the URL is never sent to the server and will cause the page to automatically scroll to the first element with a matching ID (or the first <a> element with a name of the following information). Google recommends avoiding the use of “#” in URLs.
    • Hashbang (#!) (and escaped_fragments URLs) – Hashbang URLs were a hack to support crawlers (Google wants to avoid now and only Bing supports). Many a moon ago, Google and Bing developed a complicated AJAX solution, whereby a pretty (#!) URL with the UX co-existed with an equivalent escaped_fragment HTML-based experience for bots. Google has since backtracked on this recommendation, preferring to receive the exact user experience. In escaped fragments, there are two experiences here:
      • Original Experience (aka Pretty URL): This URL must either have a #! (hashbang) within the URL to indicate that there is an escaped fragment or a meta element indicating that an escaped fragment exists (<meta name=”fragment” content=”!”>).
      • Escaped Fragment (aka Ugly URL, HTML snapshot): This URL replace the hashbang (#!) with “_escaped_fragment_” and serves the HTML snapshot. It is called the ugly URL because it’s long and looks like (and for all intents and purposes is) a hack.

Image result

Image source

  • Recommended:
    • pushState History API – PushState is navigation-based and part of the History API (think: your web browsing history). Essentially, pushState updates the URL in the address bar and only what needs to change on the page is updated. It allows JS sites to leverage “clean” URLs. PushState is currently supported by Google, when supporting browser navigation for client-side or hybrid rendering.
      • A good use of pushState is for infinite scroll (i.e., as the user hits new parts of the page the URL will update). Ideally, if the user refreshes the page, the experience will land them in the exact same spot. However, they do not need to refresh the page, as the content updates as they scroll down, while the URL is updated in the address bar.
      • Example: A good example of a search engine-friendly infinite scroll implementation, created by Google’s John Mueller (go figure), can be found here. He technically leverages the replaceState(), which doesn’t include the same back button functionality as pushState.
      • Read more: Mozilla PushState History API Documents

Obtainability

Search engines have been shown to employ headless browsing to render the DOM to gain a better understanding of the user’s experience and the content on page. That is to say, Google can process some JavaScript and uses the DOM (instead of the HTML document).

At the same time, there are situations where search engines struggle to comprehend JavaScript. Nobody wants a Hulu situation to happen to their site or a client’s site. It is crucial to understand how bots are interacting with your onsite content. When you aren’t sure, test.

Assuming we’re talking about a search engine bot that executes JavaScript, there are a few important elements for search engines to be able to obtain content:

  • If the user must interact for something to fire, search engines probably aren’t seeing it.
    • Google is a lazy user. It doesn’t click, it doesn’t scroll, and it doesn’t log in. If the full UX demands action from the user, special precautions should be taken to ensure that bots are receiving an equivalent experience.
  • If the JavaScript occurs after the JavaScript load event fires plus ~5-seconds*, search engines may not be seeing it.
    • *John Mueller mentioned that there is no specific timeout value; however, sites should aim to load within five seconds.
    • *Screaming Frog tests show a correlation to five seconds to render content.
    • *The load event plus five seconds is what Google’s PageSpeed Insights, Mobile Friendliness Tool, and Fetch as Google use; check out Max Prin’s test timer.
  • If there are errors within the JavaScript, both browsers and search engines won’t be able to go through and potentially miss sections of pages if the entire code is not executed.

How to make sure Google and other search engines can get your content

1. TEST

The most popular solution to resolving JavaScript is probably not resolving anything (grab a coffee and let Google work its algorithmic brilliance). Providing Google with the same experience as searchers is Google’s preferred scenario.

Google first announced being able to “better understand the web (i.e., JavaScript)” in May 2014. Industry experts suggested that Google could crawl JavaScript way before this announcement. The iPullRank team offered two great pieces on this in 2011: Googlebot is Chrome and How smart are Googlebots? (thank you, Josh and Mike). Adam Audette’s Google can crawl JavaScript and leverages the DOM in 2015 confirmed. Therefore, if you can see your content in the DOM, chances are your content is being parsed by Google.

adamaudette - I don't always JavaScript, but when I do, I know google can crawl the dom and dynamically generated HTML

Recently, Barry Goralewicz performed a cool experiment testing a combination of various JavaScript libraries and frameworks to determine how Google interacts with the pages (e.g., are they indexing URL/content? How does GSC interact? Etc.). It ultimately showed that Google is able to interact with many forms of JavaScript and highlighted certain frameworks as perhaps more challenging. John Mueller even started a JavaScript search group (from what I’ve read, it’s fairly therapeutic).

All of these studies are amazing and help SEOs understand when to be concerned and take a proactive role. However, before you determine that sitting back is the right solution for your site, I recommend being actively cautious by experimenting with small section Think: Jim Collin’s “bullets, then cannonballs” philosophy from his book Great by Choice:

“A bullet is an empirical test aimed at learning what works and meets three criteria: a bullet must be low-cost, low-risk, and low-distraction… 10Xers use bullets to empirically validate what will actually work. Based on that empirical validation, they then concentrate their resources to fire a cannonball, enabling large returns from concentrated bets.”

Consider testing and reviewing through the following:

  1. Confirm that your content is appearing within the DOM.
  2. Test a subset of pages to see if Google can index content.
  • Manually check quotes from your content.
  • Fetch with Google and see if content appears.
  • Fetch with Google supposedly occurs around the load event or before timeout. It’s a great test to check to see if Google will be able to see your content and whether or not you’re blocking JavaScript in your robots.txt. Although Fetch with Google is not foolproof, it’s a good starting point.
  • Note: If you aren’t verified in GSC, try Technicalseo.com’s Fetch and Render As Any Bot Tool.

After you’ve tested all this, what if something’s not working and search engines and bots are struggling to index and obtain your content? Perhaps you’re concerned about alternative search engines (DuckDuckGo, Facebook, LinkedIn, etc.), or maybe you’re leveraging meta information that needs to be parsed by other bots, such as Twitter summary cards or Facebook Open Graph tags. If any of this is identified in testing or presents itself as a concern, an HTML snapshot may be the only decision.

2. HTML SNAPSHOTS
What are HTmL snapshots?

HTML snapshots are a fully rendered page (as one might see in the DOM) that can be returned to search engine bots (think: a static HTML version of the DOM).

Google introduced HTML snapshots 2009, deprecated (but still supported) them in 2015, and awkwardly mentioned them as an element to “avoid” in late 2016. HTML snapshots are a contentious topic with Google. However, they’re important to understand, because in certain situations they’re necessary.

If search engines (or sites like Facebook) cannot grasp your JavaScript, it’s better to return an HTML snapshot than not to have your content indexed and understood at all. Ideally, your site would leverage some form of user-agent detection on the server side and return the HTML snapshot to the bot.

At the same time, one must recognize that Google wants the same experience as the user (i.e., only provide Google with an HTML snapshot if the tests are dire and the JavaScript search group cannot provide support for your situation).

Considerations

When considering HTML snapshots, you must consider that Google has deprecated this AJAX recommendation. Although Google technically still supports it, Google recommends avoiding it. Yes, Google changed its mind and now want to receive the same experience as the user. This direction makes sense, as it allows the bot to receive an experience more true to the user experience.

A second consideration factor relates to the risk of cloaking. If the HTML snapshots are found to not represent the experience on the page, it’s considered a cloaking risk. Straight from the source:

“The HTML snapshot must contain the same content as the end user would see in a browser. If this is not the case, it may be considered cloaking.”
– Google Developer AJAX Crawling FAQs

Benefits

Despite the considerations, HTML snapshots have powerful advantages:

  1. Knowledge that search engines and crawlers will be able to understand the experience.
    • Certain types of JavaScript may be harder for Google to grasp (cough… Angular (also colloquially referred to as AngularJS 2) …cough).
  2. Other search engines and crawlers (think: Bing, Facebook) will be able to understand the experience.
    • Bing, among other search engines, has not stated that it can crawl and index JavaScript. HTML snapshots may be the only solution for a JavaScript-heavy site. As always, test to make sure that this is the case before diving in.

"It's not just Google understanding your JavaScript. It's also about the speed." -DOM - "It's not just about Google understanding your Javascript. it's also about your perceived latency." -DOM

Site latency

When browsers receive an HTML document and create the DOM (although there is some level of pre-scanning), most resources are loaded as they appear within the HTML document. This means that if you have a huge file toward the top of your HTML document, a browser will load that immense file first.

The concept of Google’s critical rendering path is to load what the user needs as soon as possible, which can be translated to → “get everything above-the-fold in front of the user, ASAP.”

Critical Rendering Path – Optimized Rendering Loads Progressively ASAP:

progressive page rendering

Image source

However, if you have unnecessary resources or JavaScript files clogging up the page’s ability to load, you get “render-blocking JavaScript.” Meaning: your JavaScript is blocking the page’s potential to appear as if it’s loading faster (also called: perceived latency).

Render-blocking JavaScript – Solutions

If you analyze your page speed results (through tools like Page Speed Insights Tool, WebPageTest.org, CatchPoint, etc.) and determine that there is a render-blocking JavaScript issue, here are three potential solutions:

  1. Inline: Add the JavaScript in the HTML document.
  2. Async: Make JavaScript asynchronous (i.e., add “async” attribute to HTML tag).
  3. Defer: By placing JavaScript lower within the HTML.

!!! Important note: It’s important to understand that scripts must be arranged in order of precedence. Scripts that are used to load the above-the-fold content must be prioritized and should not be deferred. Also, any script that references another file can only be used after the referenced file has loaded. Make sure to work closely with your development team to confirm that there are no interruptions to the user’s experience.

Read more: Google Developer’s Speed Documentation


TL;DR – Moral of the story

Crawlers and search engines will do their best to crawl, execute, and interpret your JavaScript, but it is not guaranteed. Make sure your content is crawlable, obtainable, and isn’t developing site latency obstructions. The key = every situation demands testing. Based on the results, evaluate potential solutions.

Thanks: Thank you Max Prin (@maxxeight) for reviewing this content piece and sharing your knowledge, insight, and wisdom. It wouldn’t be the same without you.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Posted by webpage development and how to code basic JavaScript.

javacssseo.gif

Image sources: 1, 2, 3

JavaScript is either placed in the HTML document within <script> tags (i.e., it is embedded in the HTML) or linked/referenced. There are currently a plethora of JavaScript libraries and frameworks, including jQuery, AngularJS, ReactJS, EmberJS, etc.

JavaScript libraries and frameworks:

What is AJAX?

AJAX, or Asynchronous JavaScript and XML, is a set of web development techniques combining JavaScript and XML that allows web applications to communicate with a server in the background without interfering with the current page. Asynchronous means that other functions or lines of code can run while the async script is running. XML used to be the primary language to pass data; however, the term AJAX is used for all types of data transfers (including JSON; I guess “AJAJ” doesn’t sound as clean as “AJAX” [pun intended]).

A common use of AJAX is to update the content or layout of a webpage without initiating a full page refresh. Normally, when a page loads, all the assets on the page must be requested and fetched from the server and then rendered on the page. However, with AJAX, only the assets that differ between pages need to be loaded, which improves the user experience as they do not have to refresh the entire page.

One can think of AJAX as mini server calls. A good example of AJAX in action is Google Maps. The page updates without a full page reload (i.e., mini server calls are being used to load content as the user navigates).

Related image

Image source

What is the Document Object Model (DOM)?

As an SEO professional, you need to understand what the DOM is, because it’s what Google is using to analyze and understand webpages.

The DOM is what you see when you “Inspect Element” in a browser. Simply put, you can think of the DOM as the steps the browser takes after receiving the HTML document to render the page.

The first thing the browser receives is the HTML document. After that, it will start parsing the content within this document and fetch additional resources, such as images, CSS, and JavaScript files.

The DOM is what forms from this parsing of information and resources. One can think of it as a structured, organized version of the webpage’s code.

Nowadays the DOM is often very different from the initial HTML document, due to what’s collectively called dynamic HTML. Dynamic HTML is the ability for a page to change its content depending on user input, environmental conditions (e.g. time of day), and other variables, leveraging HTML, CSS, and JavaScript.

Simple example with a <title> tag that is populated through JavaScript:

HTML source

DOM

What is headless browsing?

Headless browsing is simply the action of fetching webpages without the user interface. It is important to understand because Google, and now Baidu, leverage headless browsing to gain a better understanding of the user’s experience and the content of webpages.

PhantomJS and Zombie.js are scripted headless browsers, typically used for automating web interaction for testing purposes, and rendering static HTML snapshots for initial requests (pre-rendering).


Why can JavaScript be challenging for SEO? (and how to fix issues)

There are three (3) primary reasons to be concerned about JavaScript on your site:

  1. Crawlability: Bots’ ability to crawl your site.
  2. Obtainability: Bots’ ability to access information and parse your content.
  3. Perceived site latency: AKA the Critical Rendering Path.

Crawlability

Are bots able to find URLs and understand your site’s architecture? There are two important elements here:

  1. Blocking search engines from your JavaScript (even accidentally).
  2. Proper internal linking, not leveraging JavaScript events as a replacement for HTML tags.

Why is blocking JavaScript such a big deal?

If search engines are blocked from crawling JavaScript, they will not be receiving your site’s full experience. This means search engines are not seeing what the end user is seeing. This can reduce your site’s appeal to search engines and could eventually be considered cloaking (if the intent is indeed malicious).

Fetch as Google and TechnicalSEO.com’s robots.txt and Fetch and Render testing tools can help to identify resources that Googlebot is blocked from.

The easiest way to solve this problem is through providing search engines access to the resources they need to understand your user experience.

!!! Important note: Work with your development team to determine which files should and should not be accessible to search engines.

Internal linking

Internal linking should be implemented with regular anchor tags within the HTML or the DOM (using an a hrefs=”www.example.com” HTML tag) versus leveraging JavaScript functions to allow the user to traverse the site.

Essentially: Don’t use JavaScript’s onclick events as a replacement for internal linking. While end URLs might be found and crawled (through strings in JavaScript code or XML sitemaps), they won’t be associated with the global navigation of the site.

Internal linking is a strong signal to search engines regarding the site’s architecture and importance of pages. In fact, internal links are so strong that they can (in certain situations) override “SEO hints” such as canonical tags.

URL structure

Historically, JavaScript-based websites (aka “AJAX sites”) were using fragment identifiers (#) within URLs.

  • Not recommended:
    • The Lone Hash (#) – The lone pound symbol is not crawlable. It is used to identify anchor link (aka jump links). These are the links that allow one to jump to a piece of content on a page. Anything after the lone hash portion of the URL is never sent to the server and will cause the page to automatically scroll to the first element with a matching ID (or the first <a> element with a name of the following information). Google recommends avoiding the use of “#” in URLs.
    • Hashbang (#!) (and escaped_fragments URLs) – Hashbang URLs were a hack to support crawlers (Google wants to avoid now and only Bing supports). Many a moon ago, Google and Bing developed a complicated AJAX solution, whereby a pretty (#!) URL with the UX co-existed with an equivalent escaped_fragment HTML-based experience for bots. Google has since backtracked on this recommendation, preferring to receive the exact user experience. In escaped fragments, there are two experiences here:
      • Original Experience (aka Pretty URL): This URL must either have a #! (hashbang) within the URL to indicate that there is an escaped fragment or a meta element indicating that an escaped fragment exists (<meta name=”fragment” content=”!”>).
      • Escaped Fragment (aka Ugly URL, HTML snapshot): This URL replace the hashbang (#!) with “_escaped_fragment_” and serves the HTML snapshot. It is called the ugly URL because it’s long and looks like (and for all intents and purposes is) a hack.

Image result

Image source

  • Recommended:
    • pushState History API – PushState is navigation-based and part of the History API (think: your web browsing history). Essentially, pushState updates the URL in the address bar and only what needs to change on the page is updated. It allows JS sites to leverage “clean” URLs. PushState is currently supported by Google, when supporting browser navigation for client-side or hybrid rendering.
      • A good use of pushState is for infinite scroll (i.e., as the user hits new parts of the page the URL will update). Ideally, if the user refreshes the page, the experience will land them in the exact same spot. However, they do not need to refresh the page, as the content updates as they scroll down, while the URL is updated in the address bar.
      • Example: A good example of a search engine-friendly infinite scroll implementation, created by Google’s John Mueller (go figure), can be found here. He technically leverages the replaceState(), which doesn’t include the same back button functionality as pushState.
      • Read more: Mozilla PushState History API Documents

Obtainability

Search engines have been shown to employ headless browsing to render the DOM to gain a better understanding of the user’s experience and the content on page. That is to say, Google can process some JavaScript and uses the DOM (instead of the HTML document).

At the same time, there are situations where search engines struggle to comprehend JavaScript. Nobody wants a Hulu situation to happen to their site or a client’s site. It is crucial to understand how bots are interacting with your onsite content. When you aren’t sure, test.

Assuming we’re talking about a search engine bot that executes JavaScript, there are a few important elements for search engines to be able to obtain content:

  • If the user must interact for something to fire, search engines probably aren’t seeing it.
    • Google is a lazy user. It doesn’t click, it doesn’t scroll, and it doesn’t log in. If the full UX demands action from the user, special precautions should be taken to ensure that bots are receiving an equivalent experience.
  • If the JavaScript occurs after the JavaScript load event fires plus ~5-seconds*, search engines may not be seeing it.
    • *John Mueller mentioned that there is no specific timeout value; however, sites should aim to load within five seconds.
    • *Screaming Frog tests show a correlation to five seconds to render content.
    • *The load event plus five seconds is what Google’s PageSpeed Insights, Mobile Friendliness Tool, and Fetch as Google use; check out Max Prin’s test timer.
  • If there are errors within the JavaScript, both browsers and search engines won’t be able to go through and potentially miss sections of pages if the entire code is not executed.

How to make sure Google and other search engines can get your content

1. TEST

The most popular solution to resolving JavaScript is probably not resolving anything (grab a coffee and let Google work its algorithmic brilliance). Providing Google with the same experience as searchers is Google’s preferred scenario.

Google first announced being able to “better understand the web (i.e., JavaScript)” in May 2014. Industry experts suggested that Google could crawl JavaScript way before this announcement. The iPullRank team offered two great pieces on this in 2011: Googlebot is Chrome and How smart are Googlebots? (thank you, Josh and Mike). Adam Audette’s Google can crawl JavaScript and leverages the DOM in 2015 confirmed. Therefore, if you can see your content in the DOM, chances are your content is being parsed by Google.

adamaudette - I don't always JavaScript, but when I do, I know google can crawl the dom and dynamically generated HTML

Recently, Barry Goralewicz performed a cool experiment testing a combination of various JavaScript libraries and frameworks to determine how Google interacts with the pages (e.g., are they indexing URL/content? How does GSC interact? Etc.). It ultimately showed that Google is able to interact with many forms of JavaScript and highlighted certain frameworks as perhaps more challenging. John Mueller even started a JavaScript search group (from what I’ve read, it’s fairly therapeutic).

All of these studies are amazing and help SEOs understand when to be concerned and take a proactive role. However, before you determine that sitting back is the right solution for your site, I recommend being actively cautious by experimenting with small section Think: Jim Collin’s “bullets, then cannonballs” philosophy from his book Great by Choice:

“A bullet is an empirical test aimed at learning what works and meets three criteria: a bullet must be low-cost, low-risk, and low-distraction… 10Xers use bullets to empirically validate what will actually work. Based on that empirical validation, they then concentrate their resources to fire a cannonball, enabling large returns from concentrated bets.”

Consider testing and reviewing through the following:

  1. Confirm that your content is appearing within the DOM.
  2. Test a subset of pages to see if Google can index content.
  • Manually check quotes from your content.
  • Fetch with Google and see if content appears.
  • Fetch with Google supposedly occurs around the load event or before timeout. It’s a great test to check to see if Google will be able to see your content and whether or not you’re blocking JavaScript in your robots.txt. Although Fetch with Google is not foolproof, it’s a good starting point.
  • Note: If you aren’t verified in GSC, try Technicalseo.com’s Fetch and Render As Any Bot Tool.

After you’ve tested all this, what if something’s not working and search engines and bots are struggling to index and obtain your content? Perhaps you’re concerned about alternative search engines (DuckDuckGo, Facebook, LinkedIn, etc.), or maybe you’re leveraging meta information that needs to be parsed by other bots, such as Twitter summary cards or Facebook Open Graph tags. If any of this is identified in testing or presents itself as a concern, an HTML snapshot may be the only decision.

2. HTML SNAPSHOTS
What are HTmL snapshots?

HTML snapshots are a fully rendered page (as one might see in the DOM) that can be returned to search engine bots (think: a static HTML version of the DOM).

Google introduced HTML snapshots 2009, deprecated (but still supported) them in 2015, and awkwardly mentioned them as an element to “avoid” in late 2016. HTML snapshots are a contentious topic with Google. However, they’re important to understand, because in certain situations they’re necessary.

If search engines (or sites like Facebook) cannot grasp your JavaScript, it’s better to return an HTML snapshot than not to have your content indexed and understood at all. Ideally, your site would leverage some form of user-agent detection on the server side and return the HTML snapshot to the bot.

At the same time, one must recognize that Google wants the same experience as the user (i.e., only provide Google with an HTML snapshot if the tests are dire and the JavaScript search group cannot provide support for your situation).

Considerations

When considering HTML snapshots, you must consider that Google has deprecated this AJAX recommendation. Although Google technically still supports it, Google recommends avoiding it. Yes, Google changed its mind and now want to receive the same experience as the user. This direction makes sense, as it allows the bot to receive an experience more true to the user experience.

A second consideration factor relates to the risk of cloaking. If the HTML snapshots are found to not represent the experience on the page, it’s considered a cloaking risk. Straight from the source:

“The HTML snapshot must contain the same content as the end user would see in a browser. If this is not the case, it may be considered cloaking.”
– Google Developer AJAX Crawling FAQs

Benefits

Despite the considerations, HTML snapshots have powerful advantages:

  1. Knowledge that search engines and crawlers will be able to understand the experience.
    • Certain types of JavaScript may be harder for Google to grasp (cough… Angular (also colloquially referred to as AngularJS 2) …cough).
  2. Other search engines and crawlers (think: Bing, Facebook) will be able to understand the experience.
    • Bing, among other search engines, has not stated that it can crawl and index JavaScript. HTML snapshots may be the only solution for a JavaScript-heavy site. As always, test to make sure that this is the case before diving in.

"It's not just Google understanding your JavaScript. It's also about the speed." -DOM - "It's not just about Google understanding your Javascript. it's also about your perceived latency." -DOM

Site latency

When browsers receive an HTML document and create the DOM (although there is some level of pre-scanning), most resources are loaded as they appear within the HTML document. This means that if you have a huge file toward the top of your HTML document, a browser will load that immense file first.

The concept of Google’s critical rendering path is to load what the user needs as soon as possible, which can be translated to → “get everything above-the-fold in front of the user, ASAP.”

Critical Rendering Path – Optimized Rendering Loads Progressively ASAP:

progressive page rendering

Image source

However, if you have unnecessary resources or JavaScript files clogging up the page’s ability to load, you get “render-blocking JavaScript.” Meaning: your JavaScript is blocking the page’s potential to appear as if it’s loading faster (also called: perceived latency).

Render-blocking JavaScript – Solutions

If you analyze your page speed results (through tools like Page Speed Insights Tool, WebPageTest.org, CatchPoint, etc.) and determine that there is a render-blocking JavaScript issue, here are three potential solutions:

  1. Inline: Add the JavaScript in the HTML document.
  2. Async: Make JavaScript asynchronous (i.e., add “async” attribute to HTML tag).
  3. Defer: By placing JavaScript lower within the HTML.

!!! Important note: It’s important to understand that scripts must be arranged in order of precedence. Scripts that are used to load the above-the-fold content must be prioritized and should not be deferred. Also, any script that references another file can only be used after the referenced file has loaded. Make sure to work closely with your development team to confirm that there are no interruptions to the user’s experience.

Read more: Google Developer’s Speed Documentation


TL;DR – Moral of the story

Crawlers and search engines will do their best to crawl, execute, and interpret your JavaScript, but it is not guaranteed. Make sure your content is crawlable, obtainable, and isn’t developing site latency obstructions. The key = every situation demands testing. Based on the results, evaluate potential solutions.

Thanks: Thank you Max Prin (@maxxeight) for reviewing this content piece and sharing your knowledge, insight, and wisdom. It wouldn’t be the same without you.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

“”

JavaScript & SEO: Making Your Bot Experience As Good As Your User Experience

Posted by webpage development and how to code basic JavaScript.

javacssseo.gif

Image sources: 1, 2, 3

JavaScript is either placed in the HTML document within <script> tags (i.e., it is embedded in the HTML) or linked/referenced. There are currently a plethora of JavaScript libraries and frameworks, including jQuery, AngularJS, ReactJS, EmberJS, etc.

JavaScript libraries and frameworks:

What is AJAX?

AJAX, or Asynchronous JavaScript and XML, is a set of web development techniques combining JavaScript and XML that allows web applications to communicate with a server in the background without interfering with the current page. Asynchronous means that other functions or lines of code can run while the async script is running. XML used to be the primary language to pass data; however, the term AJAX is used for all types of data transfers (including JSON; I guess “AJAJ” doesn’t sound as clean as “AJAX” [pun intended]).

A common use of AJAX is to update the content or layout of a webpage without initiating a full page refresh. Normally, when a page loads, all the assets on the page must be requested and fetched from the server and then rendered on the page. However, with AJAX, only the assets that differ between pages need to be loaded, which improves the user experience as they do not have to refresh the entire page.

One can think of AJAX as mini server calls. A good example of AJAX in action is Google Maps. The page updates without a full page reload (i.e., mini server calls are being used to load content as the user navigates).

Related image

Image source

What is the Document Object Model (DOM)?

As an SEO professional, you need to understand what the DOM is, because it’s what Google is using to analyze and understand webpages.

The DOM is what you see when you “Inspect Element” in a browser. Simply put, you can think of the DOM as the steps the browser takes after receiving the HTML document to render the page.

The first thing the browser receives is the HTML document. After that, it will start parsing the content within this document and fetch additional resources, such as images, CSS, and JavaScript files.

The DOM is what forms from this parsing of information and resources. One can think of it as a structured, organized version of the webpage’s code.

Nowadays the DOM is often very different from the initial HTML document, due to what’s collectively called dynamic HTML. Dynamic HTML is the ability for a page to change its content depending on user input, environmental conditions (e.g. time of day), and other variables, leveraging HTML, CSS, and JavaScript.

Simple example with a <title> tag that is populated through JavaScript:

HTML source

DOM

What is headless browsing?

Headless browsing is simply the action of fetching webpages without the user interface. It is important to understand because Google, and now Baidu, leverage headless browsing to gain a better understanding of the user’s experience and the content of webpages.

PhantomJS and Zombie.js are scripted headless browsers, typically used for automating web interaction for testing purposes, and rendering static HTML snapshots for initial requests (pre-rendering).


Why can JavaScript be challenging for SEO? (and how to fix issues)

There are three (3) primary reasons to be concerned about JavaScript on your site:

  1. Crawlability: Bots’ ability to crawl your site.
  2. Obtainability: Bots’ ability to access information and parse your content.
  3. Perceived site latency: AKA the Critical Rendering Path.

Crawlability

Are bots able to find URLs and understand your site’s architecture? There are two important elements here:

  1. Blocking search engines from your JavaScript (even accidentally).
  2. Proper internal linking, not leveraging JavaScript events as a replacement for HTML tags.

Why is blocking JavaScript such a big deal?

If search engines are blocked from crawling JavaScript, they will not be receiving your site’s full experience. This means search engines are not seeing what the end user is seeing. This can reduce your site’s appeal to search engines and could eventually be considered cloaking (if the intent is indeed malicious).

Fetch as Google and TechnicalSEO.com’s robots.txt and Fetch and Render testing tools can help to identify resources that Googlebot is blocked from.

The easiest way to solve this problem is through providing search engines access to the resources they need to understand your user experience.

!!! Important note: Work with your development team to determine which files should and should not be accessible to search engines.

Internal linking

Internal linking should be implemented with regular anchor tags within the HTML or the DOM (using an a hrefs=”www.example.com” HTML tag) versus leveraging JavaScript functions to allow the user to traverse the site.

Essentially: Don’t use JavaScript’s onclick events as a replacement for internal linking. While end URLs might be found and crawled (through strings in JavaScript code or XML sitemaps), they won’t be associated with the global navigation of the site.

Internal linking is a strong signal to search engines regarding the site’s architecture and importance of pages. In fact, internal links are so strong that they can (in certain situations) override “SEO hints” such as canonical tags.

URL structure

Historically, JavaScript-based websites (aka “AJAX sites”) were using fragment identifiers (#) within URLs.

  • Not recommended:
    • The Lone Hash (#) – The lone pound symbol is not crawlable. It is used to identify anchor link (aka jump links). These are the links that allow one to jump to a piece of content on a page. Anything after the lone hash portion of the URL is never sent to the server and will cause the page to automatically scroll to the first element with a matching ID (or the first <a> element with a name of the following information). Google recommends avoiding the use of “#” in URLs.
    • Hashbang (#!) (and escaped_fragments URLs) – Hashbang URLs were a hack to support crawlers (Google wants to avoid now and only Bing supports). Many a moon ago, Google and Bing developed a complicated AJAX solution, whereby a pretty (#!) URL with the UX co-existed with an equivalent escaped_fragment HTML-based experience for bots. Google has since backtracked on this recommendation, preferring to receive the exact user experience. In escaped fragments, there are two experiences here:
      • Original Experience (aka Pretty URL): This URL must either have a #! (hashbang) within the URL to indicate that there is an escaped fragment or a meta element indicating that an escaped fragment exists (<meta name=”fragment” content=”!”>).
      • Escaped Fragment (aka Ugly URL, HTML snapshot): This URL replace the hashbang (#!) with “_escaped_fragment_” and serves the HTML snapshot. It is called the ugly URL because it’s long and looks like (and for all intents and purposes is) a hack.

Image result

Image source

  • Recommended:
    • pushState History API – PushState is navigation-based and part of the History API (think: your web browsing history). Essentially, pushState updates the URL in the address bar and only what needs to change on the page is updated. It allows JS sites to leverage “clean” URLs. PushState is currently supported by Google, when supporting browser navigation for client-side or hybrid rendering.
      • A good use of pushState is for infinite scroll (i.e., as the user hits new parts of the page the URL will update). Ideally, if the user refreshes the page, the experience will land them in the exact same spot. However, they do not need to refresh the page, as the content updates as they scroll down, while the URL is updated in the address bar.
      • Example: A good example of a search engine-friendly infinite scroll implementation, created by Google’s John Mueller (go figure), can be found here. He technically leverages the replaceState(), which doesn’t include the same back button functionality as pushState.
      • Read more: Mozilla PushState History API Documents

Obtainability

Search engines have been shown to employ headless browsing to render the DOM and gain a better understanding of the user’s experience and the content on the page. That is to say, Google can process some JavaScript and uses the DOM (rather than the raw HTML document).

At the same time, there are situations where search engines struggle to comprehend JavaScript. Nobody wants a Hulu situation to happen to their site or a client’s site. It is crucial to understand how bots are interacting with your onsite content. When you aren’t sure, test.

Assuming we’re talking about a search engine bot that executes JavaScript, there are a few important elements for search engines to be able to obtain content:

  • If the user must interact for something to fire, search engines probably aren’t seeing it.
    • Google is a lazy user. It doesn’t click, it doesn’t scroll, and it doesn’t log in. If the full UX demands action from the user, special precautions should be taken to ensure that bots are receiving an equivalent experience (see the sketch after this list).
  • If the JavaScript executes after the load event fires plus ~5 seconds*, search engines may not be seeing it.
    • *John Mueller mentioned that there is no specific timeout value; however, sites should aim to load within five seconds.
    • *Screaming Frog tests show a correlation to five seconds to render content.
    • *The load event plus five seconds is what Google’s PageSpeed Insights, Mobile Friendliness Tool, and Fetch as Google use; check out Max Prin’s test timer.
  • If there are errors within the JavaScript, both browsers and search engines may stop executing the code partway through and miss sections of the page.
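A quick sketch of the first point above (the /api/reviews endpoint and element IDs are hypothetical): the same content is far more likely to be seen by a bot when it loads automatically rather than behind a click.

    // Unlikely to be indexed: content only appears after a user clicks.
    document.querySelector('#show-reviews').addEventListener('click', () => {
      fetch('/api/reviews')
        .then((r) => r.text())
        .then((html) => { document.querySelector('#reviews').innerHTML = html; });
    });

    // Much better chance of being indexed: the same content is fetched automatically
    // on load, well within the rendering window described above.
    document.addEventListener('DOMContentLoaded', () => {
      fetch('/api/reviews')
        .then((r) => r.text())
        .then((html) => { document.querySelector('#reviews').innerHTML = html; });
    });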

How to make sure Google and other search engines can get your content

1. TEST

The most popular solution for handling JavaScript is probably not doing anything at all (grab a coffee and let Google work its algorithmic brilliance). Providing Google with the same experience as searchers is Google’s preferred scenario.

Google first announced being able to “better understand the web (i.e., JavaScript)” in May 2014. Industry experts suggested that Google could crawl JavaScript way before this announcement. The iPullRank team offered two great pieces on this in 2011: Googlebot is Chrome and How smart are Googlebots? (thank you, Josh and Mike). Adam Audette’s 2015 piece, Google can crawl JavaScript and leverages the DOM, confirmed it. Therefore, if you can see your content in the DOM, chances are your content is being parsed by Google.

adamaudette - I don't always JavaScript, but when I do, I know google can crawl the dom and dynamically generated HTML

Recently, Bartosz Goralewicz performed a cool experiment testing a combination of various JavaScript libraries and frameworks to determine how Google interacts with the pages (e.g., is it indexing the URLs/content? How does GSC interact? Etc.). It ultimately showed that Google is able to interact with many forms of JavaScript and highlighted certain frameworks as perhaps more challenging. John Mueller even started a JavaScript search group (from what I’ve read, it’s fairly therapeutic).

All of these studies are amazing and help SEOs understand when to be concerned and take a proactive role. However, before you determine that sitting back is the right solution for your site, I recommend being actively cautious by experimenting with small sections. Think: Jim Collins’s “bullets, then cannonballs” philosophy from his book Great by Choice:

“A bullet is an empirical test aimed at learning what works and meets three criteria: a bullet must be low-cost, low-risk, and low-distraction… 10Xers use bullets to empirically validate what will actually work. Based on that empirical validation, they then concentrate their resources to fire a cannonball, enabling large returns from concentrated bets.”

Consider testing and reviewing through the following:

  1. Confirm that your content is appearing within the DOM (a quick console check is sketched after this list).
  2. Test a subset of pages to see if Google can index your content.
    • Manually search Google for quotes from your content.
    • Fetch with Google and see if the content appears.
    • Fetch with Google supposedly occurs around the load event or before timeout. It’s a great test to check whether Google will be able to see your content and whether or not you’re blocking JavaScript in your robots.txt. Although Fetch with Google is not foolproof, it’s a good starting point.
    • Note: If you aren’t verified in GSC, try Technicalseo.com’s Fetch and Render As Any Bot Tool.
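One rough way to run the first check is from the browser’s developer console (the phrase is a placeholder): compare the raw HTML response with the rendered DOM to see whether a given piece of content only exists after JavaScript runs.

    // Run in the browser console on the page you're testing.
    const phrase = 'a unique sentence from your content';  // placeholder
    fetch(window.location.href)
      .then((r) => r.text())
      .then((rawHtml) => {
        console.log('In raw HTML:     ', rawHtml.includes(phrase));
        console.log('In rendered DOM: ', document.body.innerHTML.includes(phrase));
      });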

After you’ve tested all this, what if something’s not working and search engines and bots are struggling to index and obtain your content? Perhaps you’re concerned about alternative search engines (DuckDuckGo, Facebook, LinkedIn, etc.), or maybe you’re leveraging meta information that needs to be parsed by other bots, such as Twitter summary cards or Facebook Open Graph tags. If any of this is identified in testing or presents itself as a concern, an HTML snapshot may be the only option.

2. HTML SNAPSHOTS
What are HTML snapshots?

An HTML snapshot is a fully rendered page (as one might see in the DOM) that can be returned to search engine bots (think: a static HTML version of the DOM).

Google introduced HTML snapshots in 2009, deprecated (but still supported) them in 2015, and awkwardly mentioned them as an element to “avoid” in late 2016. HTML snapshots are a contentious topic with Google. However, they’re important to understand, because in certain situations they’re necessary.

If search engines (or sites like Facebook) cannot grasp your JavaScript, it’s better to return an HTML snapshot than not to have your content indexed and understood at all. Ideally, your site would leverage some form of user-agent detection on the server side and return the HTML snapshot to the bot.
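As a minimal sketch of that idea (assuming a Node.js/Express server and snapshots pre-generated by a headless browser into a ./snapshots folder; none of this is prescribed by Google), the server checks the user-agent and returns the static snapshot to known bots, while regular visitors get the normal JavaScript experience:

    const express = require('express');
    const path = require('path');

    const app = express();
    const BOTS = /googlebot|bingbot|baiduspider|facebookexternalhit|twitterbot|linkedinbot/i;

    app.get('*', (req, res, next) => {
      const userAgent = req.headers['user-agent'] || '';
      if (BOTS.test(userAgent)) {
        // Serve the pre-rendered HTML snapshot that mirrors this URL's content.
        return res.sendFile(path.join(__dirname, 'snapshots', req.path, 'index.html'));
      }
      next(); // everyone else gets the regular JavaScript-driven app
    });

    app.use(express.static('public'));
    app.listen(3000);

Keep the cloaking consideration below in mind: the snapshot must contain the same content the user would see.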

At the same time, one must recognize that Google wants the same experience as the user (i.e., only provide Google with an HTML snapshot if the tests are dire and the JavaScript search group cannot provide support for your situation).

Considerations

When considering HTML snapshots, you must keep in mind that Google has deprecated this AJAX crawling recommendation. Although Google technically still supports it, Google recommends avoiding it. Yes, Google changed its mind and now wants to receive the same experience as the user. This direction makes sense, as it gives the bot an experience truer to what the user actually sees.

A second consideration relates to the risk of cloaking. If the HTML snapshots are found to not represent the experience on the page, it’s considered a cloaking risk. Straight from the source:

“The HTML snapshot must contain the same content as the end user would see in a browser. If this is not the case, it may be considered cloaking.”
– Google Developer AJAX Crawling FAQs

Benefits

Despite the considerations, HTML snapshots have powerful advantages:

  1. Knowledge that search engines and crawlers will be able to understand the experience.
    • Certain types of JavaScript may be harder for Google to grasp (cough… Angular 2, the successor to AngularJS …cough).
  2. Other search engines and crawlers (think: Bing, Facebook) will be able to understand the experience.
    • Bing, among other search engines, has not stated that it can crawl and index JavaScript. HTML snapshots may be the only solution for a JavaScript-heavy site. As always, test to make sure that this is the case before diving in.

"It's not just Google understanding your JavaScript. It's also about the speed." -DOM - "It's not just about Google understanding your Javascript. it's also about your perceived latency." -DOM

Site latency

When browsers receive an HTML document and create the DOM (although there is some level of pre-scanning), most resources are loaded as they appear within the HTML document. This means that if you have a huge file toward the top of your HTML document, a browser will load that immense file first.

The concept of Google’s critical rendering path is to load what the user needs as soon as possible, which can be translated to → “get everything above-the-fold in front of the user, ASAP.”

Critical Rendering Path – Optimized Rendering Loads Progressively ASAP:

progressive page rendering

Image source

However, if you have unnecessary resources or JavaScript files clogging up the page’s ability to load, you get “render-blocking JavaScript.” Meaning: your JavaScript is blocking the page from appearing to load quickly, hurting perceived latency.

Render-blocking JavaScript – Solutions

If you analyze your page speed results (through tools like Google’s PageSpeed Insights, WebPageTest.org, CatchPoint, etc.) and determine that there is a render-blocking JavaScript issue, here are three potential solutions (with a brief example after the list):

  1. Inline: Add the JavaScript directly in the HTML document.
  2. Async: Make the JavaScript asynchronous (i.e., add the async attribute to the script tag).
  3. Defer: Defer the JavaScript by placing it lower within the HTML document (or by adding the defer attribute).
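For reference, here is roughly what these options look like in markup; the file names are hypothetical, and the defer attribute is shown as one way to defer execution (placing the script near the end of the body is the other):

    <!-- 1. Inline: embed small, critical JavaScript directly in the document -->
    <script>
      // tiny snippet needed for above-the-fold rendering
    </script>

    <!-- 2. Async: download without blocking parsing; executes as soon as it's ready -->
    <script src="/js/analytics.js" async></script>

    <!-- 3. Defer: download without blocking; executes after the document has been parsed -->
    <script src="/js/below-the-fold.js" defer></script>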

!!! Important note: It’s important to understand that scripts must be arranged in order of precedence. Scripts that are used to load the above-the-fold content must be prioritized and should not be deferred. Also, any script that references another file can only be used after the referenced file has loaded. Make sure to work closely with your development team to confirm that there are no interruptions to the user’s experience.

Read more: Google Developer’s Speed Documentation


TL;DR – Moral of the story

Crawlers and search engines will do their best to crawl, execute, and interpret your JavaScript, but it is not guaranteed. Make sure your content is crawlable and obtainable, and that it isn’t creating site latency obstructions. The key = every situation demands testing. Based on the results, evaluate potential solutions.

Thanks: Thank you Max Prin (@maxxeight) for reviewing this content piece and sharing your knowledge, insight, and wisdom. It wouldn’t be the same without you.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Posted by webpage development and how to code basic JavaScript.

javacssseo.gif

Image sources: 1, 2, 3

JavaScript is either placed in the HTML document within <script> tags (i.e., it is embedded in the HTML) or linked/referenced. There are currently a plethora of JavaScript libraries and frameworks, including jQuery, AngularJS, ReactJS, EmberJS, etc.

JavaScript libraries and frameworks:

What is AJAX?

AJAX, or Asynchronous JavaScript and XML, is a set of web development techniques combining JavaScript and XML that allows web applications to communicate with a server in the background without interfering with the current page. Asynchronous means that other functions or lines of code can run while the async script is running. XML used to be the primary language to pass data; however, the term AJAX is used for all types of data transfers (including JSON; I guess “AJAJ” doesn’t sound as clean as “AJAX” [pun intended]).

A common use of AJAX is to update the content or layout of a webpage without initiating a full page refresh. Normally, when a page loads, all the assets on the page must be requested and fetched from the server and then rendered on the page. However, with AJAX, only the assets that differ between pages need to be loaded, which improves the user experience as they do not have to refresh the entire page.

One can think of AJAX as mini server calls. A good example of AJAX in action is Google Maps. The page updates without a full page reload (i.e., mini server calls are being used to load content as the user navigates).

Related image

Image source

What is the Document Object Model (DOM)?

As an SEO professional, you need to understand what the DOM is, because it’s what Google is using to analyze and understand webpages.

The DOM is what you see when you “Inspect Element” in a browser. Simply put, you can think of the DOM as the steps the browser takes after receiving the HTML document to render the page.

The first thing the browser receives is the HTML document. After that, it will start parsing the content within this document and fetch additional resources, such as images, CSS, and JavaScript files.

The DOM is what forms from this parsing of information and resources. One can think of it as a structured, organized version of the webpage’s code.

Nowadays the DOM is often very different from the initial HTML document, due to what’s collectively called dynamic HTML. Dynamic HTML is the ability for a page to change its content depending on user input, environmental conditions (e.g. time of day), and other variables, leveraging HTML, CSS, and JavaScript.

Simple example with a <title> tag that is populated through JavaScript:

HTML source

DOM

What is headless browsing?

Headless browsing is simply the action of fetching webpages without the user interface. It is important to understand because Google, and now Baidu, leverage headless browsing to gain a better understanding of the user’s experience and the content of webpages.

PhantomJS and Zombie.js are scripted headless browsers, typically used for automating web interaction for testing purposes, and rendering static HTML snapshots for initial requests (pre-rendering).


Why can JavaScript be challenging for SEO? (and how to fix issues)

There are three (3) primary reasons to be concerned about JavaScript on your site:

  1. Crawlability: Bots’ ability to crawl your site.
  2. Obtainability: Bots’ ability to access information and parse your content.
  3. Perceived site latency: AKA the Critical Rendering Path.

Crawlability

Are bots able to find URLs and understand your site’s architecture? There are two important elements here:

  1. Blocking search engines from your JavaScript (even accidentally).
  2. Proper internal linking, not leveraging JavaScript events as a replacement for HTML tags.

Why is blocking JavaScript such a big deal?

If search engines are blocked from crawling JavaScript, they will not be receiving your site’s full experience. This means search engines are not seeing what the end user is seeing. This can reduce your site’s appeal to search engines and could eventually be considered cloaking (if the intent is indeed malicious).

Fetch as Google and TechnicalSEO.com’s robots.txt and Fetch and Render testing tools can help to identify resources that Googlebot is blocked from.

The easiest way to solve this problem is through providing search engines access to the resources they need to understand your user experience.

!!! Important note: Work with your development team to determine which files should and should not be accessible to search engines.

Internal linking

Internal linking should be implemented with regular anchor tags within the HTML or the DOM (using an a hrefs=”www.example.com” HTML tag) versus leveraging JavaScript functions to allow the user to traverse the site.

Essentially: Don’t use JavaScript’s onclick events as a replacement for internal linking. While end URLs might be found and crawled (through strings in JavaScript code or XML sitemaps), they won’t be associated with the global navigation of the site.

Internal linking is a strong signal to search engines regarding the site’s architecture and importance of pages. In fact, internal links are so strong that they can (in certain situations) override “SEO hints” such as canonical tags.

URL structure

Historically, JavaScript-based websites (aka “AJAX sites”) were using fragment identifiers (#) within URLs.

  • Not recommended:
    • The Lone Hash (#) – The lone pound symbol is not crawlable. It is used to identify anchor link (aka jump links). These are the links that allow one to jump to a piece of content on a page. Anything after the lone hash portion of the URL is never sent to the server and will cause the page to automatically scroll to the first element with a matching ID (or the first <a> element with a name of the following information). Google recommends avoiding the use of “#” in URLs.
    • Hashbang (#!) (and escaped_fragments URLs) – Hashbang URLs were a hack to support crawlers (Google wants to avoid now and only Bing supports). Many a moon ago, Google and Bing developed a complicated AJAX solution, whereby a pretty (#!) URL with the UX co-existed with an equivalent escaped_fragment HTML-based experience for bots. Google has since backtracked on this recommendation, preferring to receive the exact user experience. In escaped fragments, there are two experiences here:
      • Original Experience (aka Pretty URL): This URL must either have a #! (hashbang) within the URL to indicate that there is an escaped fragment or a meta element indicating that an escaped fragment exists (<meta name=”fragment” content=”!”>).
      • Escaped Fragment (aka Ugly URL, HTML snapshot): This URL replace the hashbang (#!) with “_escaped_fragment_” and serves the HTML snapshot. It is called the ugly URL because it’s long and looks like (and for all intents and purposes is) a hack.

Image result

Image source

  • Recommended:
    • pushState History API – PushState is navigation-based and part of the History API (think: your web browsing history). Essentially, pushState updates the URL in the address bar and only what needs to change on the page is updated. It allows JS sites to leverage “clean” URLs. PushState is currently supported by Google, when supporting browser navigation for client-side or hybrid rendering.
      • A good use of pushState is for infinite scroll (i.e., as the user hits new parts of the page the URL will update). Ideally, if the user refreshes the page, the experience will land them in the exact same spot. However, they do not need to refresh the page, as the content updates as they scroll down, while the URL is updated in the address bar.
      • Example: A good example of a search engine-friendly infinite scroll implementation, created by Google’s John Mueller (go figure), can be found here. He technically leverages the replaceState(), which doesn’t include the same back button functionality as pushState.
      • Read more: Mozilla PushState History API Documents

Obtainability

Search engines have been shown to employ headless browsing to render the DOM to gain a better understanding of the user’s experience and the content on page. That is to say, Google can process some JavaScript and uses the DOM (instead of the HTML document).

At the same time, there are situations where search engines struggle to comprehend JavaScript. Nobody wants a Hulu situation to happen to their site or a client’s site. It is crucial to understand how bots are interacting with your onsite content. When you aren’t sure, test.

Assuming we’re talking about a search engine bot that executes JavaScript, there are a few important elements for search engines to be able to obtain content:

  • If the user must interact for something to fire, search engines probably aren’t seeing it.
    • Google is a lazy user. It doesn’t click, it doesn’t scroll, and it doesn’t log in. If the full UX demands action from the user, special precautions should be taken to ensure that bots are receiving an equivalent experience.
  • If the JavaScript occurs after the JavaScript load event fires plus ~5-seconds*, search engines may not be seeing it.
    • *John Mueller mentioned that there is no specific timeout value; however, sites should aim to load within five seconds.
    • *Screaming Frog tests show a correlation to five seconds to render content.
    • *The load event plus five seconds is what Google’s PageSpeed Insights, Mobile Friendliness Tool, and Fetch as Google use; check out Max Prin’s test timer.
  • If there are errors within the JavaScript, both browsers and search engines won’t be able to go through and potentially miss sections of pages if the entire code is not executed.

How to make sure Google and other search engines can get your content

1. TEST

The most popular solution to resolving JavaScript is probably not resolving anything (grab a coffee and let Google work its algorithmic brilliance). Providing Google with the same experience as searchers is Google’s preferred scenario.

Google first announced being able to “better understand the web (i.e., JavaScript)” in May 2014. Industry experts suggested that Google could crawl JavaScript way before this announcement. The iPullRank team offered two great pieces on this in 2011: Googlebot is Chrome and How smart are Googlebots? (thank you, Josh and Mike). Adam Audette’s Google can crawl JavaScript and leverages the DOM in 2015 confirmed. Therefore, if you can see your content in the DOM, chances are your content is being parsed by Google.

adamaudette - I don't always JavaScript, but when I do, I know google can crawl the dom and dynamically generated HTML

Recently, Barry Goralewicz performed a cool experiment testing a combination of various JavaScript libraries and frameworks to determine how Google interacts with the pages (e.g., are they indexing URL/content? How does GSC interact? Etc.). It ultimately showed that Google is able to interact with many forms of JavaScript and highlighted certain frameworks as perhaps more challenging. John Mueller even started a JavaScript search group (from what I’ve read, it’s fairly therapeutic).

All of these studies are amazing and help SEOs understand when to be concerned and take a proactive role. However, before you determine that sitting back is the right solution for your site, I recommend being actively cautious by experimenting with small section Think: Jim Collin’s “bullets, then cannonballs” philosophy from his book Great by Choice:

“A bullet is an empirical test aimed at learning what works and meets three criteria: a bullet must be low-cost, low-risk, and low-distraction… 10Xers use bullets to empirically validate what will actually work. Based on that empirical validation, they then concentrate their resources to fire a cannonball, enabling large returns from concentrated bets.”

Consider testing and reviewing through the following:

  1. Confirm that your content is appearing within the DOM.
  2. Test a subset of pages to see if Google can index content.
  • Manually check quotes from your content.
  • Fetch with Google and see if content appears.
  • Fetch with Google supposedly occurs around the load event or before timeout. It’s a great test to check to see if Google will be able to see your content and whether or not you’re blocking JavaScript in your robots.txt. Although Fetch with Google is not foolproof, it’s a good starting point.
  • Note: If you aren’t verified in GSC, try Technicalseo.com’s Fetch and Render As Any Bot Tool.

After you’ve tested all this, what if something’s not working and search engines and bots are struggling to index and obtain your content? Perhaps you’re concerned about alternative search engines (DuckDuckGo, Facebook, LinkedIn, etc.), or maybe you’re leveraging meta information that needs to be parsed by other bots, such as Twitter summary cards or Facebook Open Graph tags. If any of this is identified in testing or presents itself as a concern, an HTML snapshot may be the only decision.

2. HTML SNAPSHOTS
What are HTmL snapshots?

HTML snapshots are a fully rendered page (as one might see in the DOM) that can be returned to search engine bots (think: a static HTML version of the DOM).

Google introduced HTML snapshots 2009, deprecated (but still supported) them in 2015, and awkwardly mentioned them as an element to “avoid” in late 2016. HTML snapshots are a contentious topic with Google. However, they’re important to understand, because in certain situations they’re necessary.

If search engines (or sites like Facebook) cannot grasp your JavaScript, it’s better to return an HTML snapshot than not to have your content indexed and understood at all. Ideally, your site would leverage some form of user-agent detection on the server side and return the HTML snapshot to the bot.

At the same time, one must recognize that Google wants the same experience as the user (i.e., only provide Google with an HTML snapshot if the tests are dire and the JavaScript search group cannot provide support for your situation).

Considerations

When considering HTML snapshots, you must consider that Google has deprecated this AJAX recommendation. Although Google technically still supports it, Google recommends avoiding it. Yes, Google changed its mind and now want to receive the same experience as the user. This direction makes sense, as it allows the bot to receive an experience more true to the user experience.

A second consideration factor relates to the risk of cloaking. If the HTML snapshots are found to not represent the experience on the page, it’s considered a cloaking risk. Straight from the source:

“The HTML snapshot must contain the same content as the end user would see in a browser. If this is not the case, it may be considered cloaking.”
– Google Developer AJAX Crawling FAQs

Benefits

Despite the considerations, HTML snapshots have powerful advantages:

  1. Knowledge that search engines and crawlers will be able to understand the experience.
    • Certain types of JavaScript may be harder for Google to grasp (cough… Angular (also colloquially referred to as AngularJS 2) …cough).
  2. Other search engines and crawlers (think: Bing, Facebook) will be able to understand the experience.
    • Bing, among other search engines, has not stated that it can crawl and index JavaScript. HTML snapshots may be the only solution for a JavaScript-heavy site. As always, test to make sure that this is the case before diving in.

"It's not just Google understanding your JavaScript. It's also about the speed." -DOM - "It's not just about Google understanding your Javascript. it's also about your perceived latency." -DOM

Site latency

When browsers receive an HTML document and create the DOM (although there is some level of pre-scanning), most resources are loaded as they appear within the HTML document. This means that if you have a huge file toward the top of your HTML document, a browser will load that immense file first.

The concept of Google’s critical rendering path is to load what the user needs as soon as possible, which can be translated to → “get everything above-the-fold in front of the user, ASAP.”

Critical Rendering Path – Optimized Rendering Loads Progressively ASAP:

progressive page rendering

Image source

However, if you have unnecessary resources or JavaScript files clogging up the page’s ability to load, you get “render-blocking JavaScript.” Meaning: your JavaScript is blocking the page’s potential to appear as if it’s loading faster (also called: perceived latency).

Render-blocking JavaScript – Solutions

If you analyze your page speed results (through tools like Page Speed Insights Tool, WebPageTest.org, CatchPoint, etc.) and determine that there is a render-blocking JavaScript issue, here are three potential solutions:

  1. Inline: Add the JavaScript in the HTML document.
  2. Async: Make JavaScript asynchronous (i.e., add “async” attribute to HTML tag).
  3. Defer: By placing JavaScript lower within the HTML.

!!! Important note: It’s important to understand that scripts must be arranged in order of precedence. Scripts that are used to load the above-the-fold content must be prioritized and should not be deferred. Also, any script that references another file can only be used after the referenced file has loaded. Make sure to work closely with your development team to confirm that there are no interruptions to the user’s experience.

Read more: Google Developer’s Speed Documentation


TL;DR – Moral of the story

Crawlers and search engines will do their best to crawl, execute, and interpret your JavaScript, but it is not guaranteed. Make sure your content is crawlable, obtainable, and isn’t developing site latency obstructions. The key = every situation demands testing. Based on the results, evaluate potential solutions.

Thanks: Thank you Max Prin (@maxxeight) for reviewing this content piece and sharing your knowledge, insight, and wisdom. It wouldn’t be the same without you.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

“”

JavaScript & SEO: Making Your Bot Experience As Good As Your User Experience

Posted by webpage development and how to code basic JavaScript.

javacssseo.gif

Image sources: 1, 2, 3

JavaScript is either placed in the HTML document within <script> tags (i.e., it is embedded in the HTML) or linked/referenced. There are currently a plethora of JavaScript libraries and frameworks, including jQuery, AngularJS, ReactJS, EmberJS, etc.

JavaScript libraries and frameworks:

What is AJAX?

AJAX, or Asynchronous JavaScript and XML, is a set of web development techniques combining JavaScript and XML that allows web applications to communicate with a server in the background without interfering with the current page. Asynchronous means that other functions or lines of code can run while the async script is running. XML used to be the primary language to pass data; however, the term AJAX is used for all types of data transfers (including JSON; I guess “AJAJ” doesn’t sound as clean as “AJAX” [pun intended]).

A common use of AJAX is to update the content or layout of a webpage without initiating a full page refresh. Normally, when a page loads, all the assets on the page must be requested and fetched from the server and then rendered on the page. However, with AJAX, only the assets that differ between pages need to be loaded, which improves the user experience as they do not have to refresh the entire page.

One can think of AJAX as mini server calls. A good example of AJAX in action is Google Maps. The page updates without a full page reload (i.e., mini server calls are being used to load content as the user navigates).

Related image

Image source

What is the Document Object Model (DOM)?

As an SEO professional, you need to understand what the DOM is, because it’s what Google is using to analyze and understand webpages.

The DOM is what you see when you “Inspect Element” in a browser. Simply put, you can think of the DOM as the steps the browser takes after receiving the HTML document to render the page.

The first thing the browser receives is the HTML document. After that, it will start parsing the content within this document and fetch additional resources, such as images, CSS, and JavaScript files.

The DOM is what forms from this parsing of information and resources. One can think of it as a structured, organized version of the webpage’s code.

Nowadays the DOM is often very different from the initial HTML document, due to what’s collectively called dynamic HTML. Dynamic HTML is the ability for a page to change its content depending on user input, environmental conditions (e.g. time of day), and other variables, leveraging HTML, CSS, and JavaScript.

Simple example with a <title> tag that is populated through JavaScript:

HTML source

DOM

What is headless browsing?

Headless browsing is simply the action of fetching webpages without the user interface. It is important to understand because Google, and now Baidu, leverage headless browsing to gain a better understanding of the user’s experience and the content of webpages.

PhantomJS and Zombie.js are scripted headless browsers, typically used for automating web interaction for testing purposes, and rendering static HTML snapshots for initial requests (pre-rendering).


Why can JavaScript be challenging for SEO? (and how to fix issues)

There are three (3) primary reasons to be concerned about JavaScript on your site:

  1. Crawlability: Bots’ ability to crawl your site.
  2. Obtainability: Bots’ ability to access information and parse your content.
  3. Perceived site latency: AKA the Critical Rendering Path.

Crawlability

Are bots able to find URLs and understand your site’s architecture? There are two important elements here:

  1. Blocking search engines from your JavaScript (even accidentally).
  2. Proper internal linking, not leveraging JavaScript events as a replacement for HTML tags.

Why is blocking JavaScript such a big deal?

If search engines are blocked from crawling JavaScript, they will not be receiving your site’s full experience. This means search engines are not seeing what the end user is seeing. This can reduce your site’s appeal to search engines and could eventually be considered cloaking (if the intent is indeed malicious).

Fetch as Google and TechnicalSEO.com’s robots.txt and Fetch and Render testing tools can help to identify resources that Googlebot is blocked from.

The easiest way to solve this problem is through providing search engines access to the resources they need to understand your user experience.

!!! Important note: Work with your development team to determine which files should and should not be accessible to search engines.

Internal linking

Internal linking should be implemented with regular anchor tags within the HTML or the DOM (using an a hrefs=”www.example.com” HTML tag) versus leveraging JavaScript functions to allow the user to traverse the site.

Essentially: Don’t use JavaScript’s onclick events as a replacement for internal linking. While end URLs might be found and crawled (through strings in JavaScript code or XML sitemaps), they won’t be associated with the global navigation of the site.

Internal linking is a strong signal to search engines regarding the site’s architecture and importance of pages. In fact, internal links are so strong that they can (in certain situations) override “SEO hints” such as canonical tags.

URL structure

Historically, JavaScript-based websites (aka “AJAX sites”) were using fragment identifiers (#) within URLs.

  • Not recommended:
    • The Lone Hash (#) – The lone pound symbol is not crawlable. It is used to identify anchor link (aka jump links). These are the links that allow one to jump to a piece of content on a page. Anything after the lone hash portion of the URL is never sent to the server and will cause the page to automatically scroll to the first element with a matching ID (or the first <a> element with a name of the following information). Google recommends avoiding the use of “#” in URLs.
    • Hashbang (#!) (and escaped_fragments URLs) – Hashbang URLs were a hack to support crawlers (Google wants to avoid now and only Bing supports). Many a moon ago, Google and Bing developed a complicated AJAX solution, whereby a pretty (#!) URL with the UX co-existed with an equivalent escaped_fragment HTML-based experience for bots. Google has since backtracked on this recommendation, preferring to receive the exact user experience. In escaped fragments, there are two experiences here:
      • Original Experience (aka Pretty URL): This URL must either have a #! (hashbang) within the URL to indicate that there is an escaped fragment or a meta element indicating that an escaped fragment exists (<meta name=”fragment” content=”!”>).
      • Escaped Fragment (aka Ugly URL, HTML snapshot): This URL replace the hashbang (#!) with “_escaped_fragment_” and serves the HTML snapshot. It is called the ugly URL because it’s long and looks like (and for all intents and purposes is) a hack.

Image result

Image source

  • Recommended:
    • pushState History API – PushState is navigation-based and part of the History API (think: your web browsing history). Essentially, pushState updates the URL in the address bar and only what needs to change on the page is updated. It allows JS sites to leverage “clean” URLs. PushState is currently supported by Google, when supporting browser navigation for client-side or hybrid rendering.
      • A good use of pushState is for infinite scroll (i.e., as the user hits new parts of the page the URL will update). Ideally, if the user refreshes the page, the experience will land them in the exact same spot. However, they do not need to refresh the page, as the content updates as they scroll down, while the URL is updated in the address bar.
      • Example: A good example of a search engine-friendly infinite scroll implementation, created by Google’s John Mueller (go figure), can be found here. He technically leverages the replaceState(), which doesn’t include the same back button functionality as pushState.
      • Read more: Mozilla PushState History API Documents

Obtainability

Search engines have been shown to employ headless browsing to render the DOM to gain a better understanding of the user’s experience and the content on page. That is to say, Google can process some JavaScript and uses the DOM (instead of the HTML document).

At the same time, there are situations where search engines struggle to comprehend JavaScript. Nobody wants a Hulu situation to happen to their site or a client’s site. It is crucial to understand how bots are interacting with your onsite content. When you aren’t sure, test.

Assuming we’re talking about a search engine bot that executes JavaScript, there are a few important elements for search engines to be able to obtain content:

  • If the user must interact for something to fire, search engines probably aren’t seeing it.
    • Google is a lazy user. It doesn’t click, it doesn’t scroll, and it doesn’t log in. If the full UX demands action from the user, special precautions should be taken to ensure that bots are receiving an equivalent experience.
  • If the JavaScript occurs after the JavaScript load event fires plus ~5-seconds*, search engines may not be seeing it.
    • *John Mueller mentioned that there is no specific timeout value; however, sites should aim to load within five seconds.
    • *Screaming Frog tests show a correlation to five seconds to render content.
    • *The load event plus five seconds is what Google’s PageSpeed Insights, Mobile Friendliness Tool, and Fetch as Google use; check out Max Prin’s test timer.
  • If there are errors within the JavaScript, both browsers and search engines won’t be able to go through and potentially miss sections of pages if the entire code is not executed.

How to make sure Google and other search engines can get your content

1. TEST

The most popular solution to resolving JavaScript is probably not resolving anything (grab a coffee and let Google work its algorithmic brilliance). Providing Google with the same experience as searchers is Google’s preferred scenario.

Google first announced being able to “better understand the web (i.e., JavaScript)” in May 2014. Industry experts suggested that Google could crawl JavaScript way before this announcement. The iPullRank team offered two great pieces on this in 2011: Googlebot is Chrome and How smart are Googlebots? (thank you, Josh and Mike). Adam Audette’s Google can crawl JavaScript and leverages the DOM in 2015 confirmed. Therefore, if you can see your content in the DOM, chances are your content is being parsed by Google.

adamaudette - I don't always JavaScript, but when I do, I know google can crawl the dom and dynamically generated HTML

Recently, Barry Goralewicz performed a cool experiment testing a combination of various JavaScript libraries and frameworks to determine how Google interacts with the pages (e.g., are they indexing URL/content? How does GSC interact? Etc.). It ultimately showed that Google is able to interact with many forms of JavaScript and highlighted certain frameworks as perhaps more challenging. John Mueller even started a JavaScript search group (from what I’ve read, it’s fairly therapeutic).

All of these studies are amazing and help SEOs understand when to be concerned and take a proactive role. However, before you determine that sitting back is the right solution for your site, I recommend being actively cautious by experimenting with small section Think: Jim Collin’s “bullets, then cannonballs” philosophy from his book Great by Choice:

“A bullet is an empirical test aimed at learning what works and meets three criteria: a bullet must be low-cost, low-risk, and low-distraction… 10Xers use bullets to empirically validate what will actually work. Based on that empirical validation, they then concentrate their resources to fire a cannonball, enabling large returns from concentrated bets.”

Consider testing and reviewing through the following:

  1. Confirm that your content is appearing within the DOM.
  2. Test a subset of pages to see if Google can index content.
  • Manually check quotes from your content.
  • Fetch with Google and see if content appears.
  • Fetch with Google supposedly occurs around the load event or before timeout. It’s a great test to check to see if Google will be able to see your content and whether or not you’re blocking JavaScript in your robots.txt. Although Fetch with Google is not foolproof, it’s a good starting point.
  • Note: If you aren’t verified in GSC, try Technicalseo.com’s Fetch and Render As Any Bot Tool.

After you’ve tested all this, what if something’s not working and search engines and bots are struggling to index and obtain your content? Perhaps you’re concerned about alternative search engines (DuckDuckGo, Facebook, LinkedIn, etc.), or maybe you’re leveraging meta information that needs to be parsed by other bots, such as Twitter summary cards or Facebook Open Graph tags. If any of this is identified in testing or presents itself as a concern, an HTML snapshot may be the only decision.

2. HTML SNAPSHOTS
What are HTmL snapshots?

HTML snapshots are a fully rendered page (as one might see in the DOM) that can be returned to search engine bots (think: a static HTML version of the DOM).

Google introduced HTML snapshots 2009, deprecated (but still supported) them in 2015, and awkwardly mentioned them as an element to “avoid” in late 2016. HTML snapshots are a contentious topic with Google. However, they’re important to understand, because in certain situations they’re necessary.

If search engines (or sites like Facebook) cannot grasp your JavaScript, it’s better to return an HTML snapshot than not to have your content indexed and understood at all. Ideally, your site would leverage some form of user-agent detection on the server side and return the HTML snapshot to the bot.

At the same time, one must recognize that Google wants the same experience as the user (i.e., only provide Google with an HTML snapshot if the tests are dire and the JavaScript search group cannot provide support for your situation).

Considerations

When considering HTML snapshots, you must consider that Google has deprecated this AJAX recommendation. Although Google technically still supports it, Google recommends avoiding it. Yes, Google changed its mind and now want to receive the same experience as the user. This direction makes sense, as it allows the bot to receive an experience more true to the user experience.

A second consideration factor relates to the risk of cloaking. If the HTML snapshots are found to not represent the experience on the page, it’s considered a cloaking risk. Straight from the source:

“The HTML snapshot must contain the same content as the end user would see in a browser. If this is not the case, it may be considered cloaking.”
– Google Developer AJAX Crawling FAQs

Benefits

Despite the considerations, HTML snapshots have powerful advantages:

  1. Knowledge that search engines and crawlers will be able to understand the experience.
    • Certain types of JavaScript may be harder for Google to grasp (cough… Angular (also colloquially referred to as AngularJS 2) …cough).
  2. Other search engines and crawlers (think: Bing, Facebook) will be able to understand the experience.
    • Bing, among other search engines, has not stated that it can crawl and index JavaScript. HTML snapshots may be the only solution for a JavaScript-heavy site. As always, test to make sure that this is the case before diving in.

"It's not just Google understanding your JavaScript. It's also about the speed." -DOM - "It's not just about Google understanding your Javascript. it's also about your perceived latency." -DOM

Site latency

When browsers receive an HTML document and create the DOM (although there is some level of pre-scanning), most resources are loaded as they appear within the HTML document. This means that if you have a huge file toward the top of your HTML document, a browser will load that immense file first.

The concept of Google’s critical rendering path is to load what the user needs as soon as possible, which can be translated to → “get everything above-the-fold in front of the user, ASAP.”

Critical Rendering Path – Optimized Rendering Loads Progressively ASAP:

progressive page rendering

Image source

However, if you have unnecessary resources or JavaScript files clogging up the page’s ability to load, you get “render-blocking JavaScript.” Meaning: your JavaScript is blocking the page’s potential to appear as if it’s loading faster (also called: perceived latency).

Render-blocking JavaScript – Solutions

If you analyze your page speed results (through tools like Page Speed Insights Tool, WebPageTest.org, CatchPoint, etc.) and determine that there is a render-blocking JavaScript issue, here are three potential solutions:

  1. Inline: Add the JavaScript in the HTML document.
  2. Async: Make JavaScript asynchronous (i.e., add “async” attribute to HTML tag).
  3. Defer: By placing JavaScript lower within the HTML.

!!! Important note: It’s important to understand that scripts must be arranged in order of precedence. Scripts that are used to load the above-the-fold content must be prioritized and should not be deferred. Also, any script that references another file can only be used after the referenced file has loaded. Make sure to work closely with your development team to confirm that there are no interruptions to the user’s experience.

Read more: Google Developer’s Speed Documentation


TL;DR – Moral of the story

Crawlers and search engines will do their best to crawl, execute, and interpret your JavaScript, but it is not guaranteed. Make sure your content is crawlable, obtainable, and isn’t developing site latency obstructions. The key = every situation demands testing. Based on the results, evaluate potential solutions.

Thanks: Thank you Max Prin (@maxxeight) for reviewing this content piece and sharing your knowledge, insight, and wisdom. It wouldn’t be the same without you.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Posted by webpage development and how to code basic JavaScript.

javacssseo.gif

Image sources: 1, 2, 3

JavaScript is either placed in the HTML document within <script> tags (i.e., it is embedded in the HTML) or linked/referenced. There are currently a plethora of JavaScript libraries and frameworks, including jQuery, AngularJS, ReactJS, EmberJS, etc.

JavaScript libraries and frameworks:

What is AJAX?

AJAX, or Asynchronous JavaScript and XML, is a set of web development techniques combining JavaScript and XML that allows web applications to communicate with a server in the background without interfering with the current page. Asynchronous means that other functions or lines of code can run while the async script is running. XML used to be the primary language to pass data; however, the term AJAX is used for all types of data transfers (including JSON; I guess “AJAJ” doesn’t sound as clean as “AJAX” [pun intended]).

A common use of AJAX is to update the content or layout of a webpage without initiating a full page refresh. Normally, when a page loads, all the assets on the page must be requested and fetched from the server and then rendered on the page. However, with AJAX, only the assets that differ between pages need to be loaded, which improves the user experience as they do not have to refresh the entire page.

One can think of AJAX as mini server calls. A good example of AJAX in action is Google Maps. The page updates without a full page reload (i.e., mini server calls are being used to load content as the user navigates).

Related image

Image source

What is the Document Object Model (DOM)?

As an SEO professional, you need to understand what the DOM is, because it’s what Google is using to analyze and understand webpages.

The DOM is what you see when you “Inspect Element” in a browser. Simply put, you can think of the DOM as the steps the browser takes after receiving the HTML document to render the page.

The first thing the browser receives is the HTML document. After that, it will start parsing the content within this document and fetch additional resources, such as images, CSS, and JavaScript files.

The DOM is what forms from this parsing of information and resources. One can think of it as a structured, organized version of the webpage’s code.

Nowadays the DOM is often very different from the initial HTML document, due to what’s collectively called dynamic HTML. Dynamic HTML is the ability for a page to change its content depending on user input, environmental conditions (e.g. time of day), and other variables, leveraging HTML, CSS, and JavaScript.

Simple example with a <title> tag that is populated through JavaScript:

HTML source

DOM

What is headless browsing?

Headless browsing is simply the action of fetching webpages without the user interface. It is important to understand because Google, and now Baidu, leverage headless browsing to gain a better understanding of the user’s experience and the content of webpages.

PhantomJS and Zombie.js are scripted headless browsers, typically used for automating web interaction for testing purposes, and rendering static HTML snapshots for initial requests (pre-rendering).


Why can JavaScript be challenging for SEO? (and how to fix issues)

There are three (3) primary reasons to be concerned about JavaScript on your site:

  1. Crawlability: Bots’ ability to crawl your site.
  2. Obtainability: Bots’ ability to access information and parse your content.
  3. Perceived site latency: AKA the Critical Rendering Path.

Crawlability

Are bots able to find URLs and understand your site’s architecture? There are two important elements here:

  1. Blocking search engines from your JavaScript (even accidentally).
  2. Proper internal linking, not leveraging JavaScript events as a replacement for HTML tags.

Why is blocking JavaScript such a big deal?

If search engines are blocked from crawling JavaScript, they will not be receiving your site’s full experience. This means search engines are not seeing what the end user is seeing. This can reduce your site’s appeal to search engines and could eventually be considered cloaking (if the intent is indeed malicious).

Fetch as Google and TechnicalSEO.com’s robots.txt and Fetch and Render testing tools can help to identify resources that Googlebot is blocked from.

The easiest way to solve this problem is through providing search engines access to the resources they need to understand your user experience.

!!! Important note: Work with your development team to determine which files should and should not be accessible to search engines.

Internal linking

Internal linking should be implemented with regular anchor tags within the HTML or the DOM (using an a hrefs=”www.example.com” HTML tag) versus leveraging JavaScript functions to allow the user to traverse the site.

Essentially: Don’t use JavaScript’s onclick events as a replacement for internal linking. While end URLs might be found and crawled (through strings in JavaScript code or XML sitemaps), they won’t be associated with the global navigation of the site.

Internal linking is a strong signal to search engines regarding the site’s architecture and importance of pages. In fact, internal links are so strong that they can (in certain situations) override “SEO hints” such as canonical tags.

URL structure

Historically, JavaScript-based websites (aka “AJAX sites”) were using fragment identifiers (#) within URLs.

  • Not recommended:
    • The Lone Hash (#) – The lone pound symbol is not crawlable. It is used to identify anchor link (aka jump links). These are the links that allow one to jump to a piece of content on a page. Anything after the lone hash portion of the URL is never sent to the server and will cause the page to automatically scroll to the first element with a matching ID (or the first <a> element with a name of the following information). Google recommends avoiding the use of “#” in URLs.
    • Hashbang (#!) (and escaped_fragments URLs) – Hashbang URLs were a hack to support crawlers (Google wants to avoid now and only Bing supports). Many a moon ago, Google and Bing developed a complicated AJAX solution, whereby a pretty (#!) URL with the UX co-existed with an equivalent escaped_fragment HTML-based experience for bots. Google has since backtracked on this recommendation, preferring to receive the exact user experience. In escaped fragments, there are two experiences here:
      • Original Experience (aka Pretty URL): This URL must either have a #! (hashbang) within the URL to indicate that there is an escaped fragment or a meta element indicating that an escaped fragment exists (<meta name=”fragment” content=”!”>).
      • Escaped Fragment (aka Ugly URL, HTML snapshot): This URL replace the hashbang (#!) with “_escaped_fragment_” and serves the HTML snapshot. It is called the ugly URL because it’s long and looks like (and for all intents and purposes is) a hack.

Image result

Image source

  • Recommended:
    • pushState History API – PushState is navigation-based and part of the History API (think: your web browsing history). Essentially, pushState updates the URL in the address bar and only what needs to change on the page is updated. It allows JS sites to leverage “clean” URLs. PushState is currently supported by Google, when supporting browser navigation for client-side or hybrid rendering.
      • A good use of pushState is for infinite scroll (i.e., as the user hits new parts of the page the URL will update). Ideally, if the user refreshes the page, the experience will land them in the exact same spot. However, they do not need to refresh the page, as the content updates as they scroll down, while the URL is updated in the address bar.
      • Example: A good example of a search engine-friendly infinite scroll implementation, created by Google’s John Mueller (go figure), can be found here. He technically leverages the replaceState(), which doesn’t include the same back button functionality as pushState.
      • Read more: Mozilla PushState History API Documents

Obtainability

Search engines have been shown to employ headless browsing to render the DOM to gain a better understanding of the user’s experience and the content on page. That is to say, Google can process some JavaScript and uses the DOM (instead of the HTML document).

At the same time, there are situations where search engines struggle to comprehend JavaScript. Nobody wants a Hulu situation to happen to their site or a client’s site. It is crucial to understand how bots are interacting with your onsite content. When you aren’t sure, test.

Assuming we’re talking about a search engine bot that executes JavaScript, there are a few important elements for search engines to be able to obtain content:

  • If the user must interact for something to fire, search engines probably aren’t seeing it.
    • Google is a lazy user. It doesn’t click, it doesn’t scroll, and it doesn’t log in. If the full UX demands action from the user, special precautions should be taken to ensure that bots are receiving an equivalent experience.
  • If the JavaScript occurs after the JavaScript load event fires plus ~5-seconds*, search engines may not be seeing it.
    • *John Mueller mentioned that there is no specific timeout value; however, sites should aim to load within five seconds.
    • *Screaming Frog tests show a correlation to five seconds to render content.
    • *The load event plus five seconds is what Google’s PageSpeed Insights, Mobile Friendliness Tool, and Fetch as Google use; check out Max Prin’s test timer.
  • If there are errors within the JavaScript, both browsers and search engines won’t be able to go through and potentially miss sections of pages if the entire code is not executed.

How to make sure Google and other search engines can get your content

1. TEST

The most popular solution to resolving JavaScript issues is probably not resolving anything at all (grab a coffee and let Google work its algorithmic brilliance). Providing Google with the same experience as searchers is Google’s preferred scenario.

Google first announced being able to “better understand the web (i.e., JavaScript)” in May 2014. Industry experts suggested that Google could crawl JavaScript well before this announcement. The iPullRank team offered two great pieces on this in 2011: Googlebot is Chrome and How smart are Googlebots? (thank you, Josh and Mike). Adam Audette’s 2015 study, Google can crawl JavaScript and leverages the DOM, confirmed it. Therefore, if you can see your content in the DOM, chances are it is being parsed by Google.

adamaudette - I don't always JavaScript, but when I do, I know google can crawl the dom and dynamically generated HTML

Recently, Bartosz Goralewicz performed a cool experiment testing a combination of various JavaScript libraries and frameworks to determine how Google interacts with the pages (e.g., is it indexing the URLs and content? How does GSC report on them? etc.). It ultimately showed that Google is able to interact with many forms of JavaScript, while highlighting certain frameworks as potentially more challenging. John Mueller even started a JavaScript search group (from what I’ve read, it’s fairly therapeutic).

All of these studies are amazing and help SEOs understand when to be concerned and take a proactive role. However, before you decide that sitting back is the right solution for your site, I recommend being actively cautious by experimenting with small sections. Think: Jim Collins’s “bullets, then cannonballs” philosophy from his book Great by Choice:

“A bullet is an empirical test aimed at learning what works and meets three criteria: a bullet must be low-cost, low-risk, and low-distraction… 10Xers use bullets to empirically validate what will actually work. Based on that empirical validation, they then concentrate their resources to fire a cannonball, enabling large returns from concentrated bets.”

Consider testing and reviewing through the following:

  1. Confirm that your content is appearing within the DOM (a quick console check is sketched after this list).
  2. Test a subset of pages to see if Google can index the content:
    • Manually search for exact quotes from your content (in quotation marks).
    • Fetch as Google and see if the content appears.
    • Fetch as Google supposedly captures the page around the load event or before timeout. It’s a great way to check whether Google will be able to see your content and whether or not you’re blocking JavaScript in your robots.txt. Although Fetch as Google is not foolproof, it’s a good starting point.
    • Note: If you aren’t verified in GSC, try Technicalseo.com’s Fetch and Render As Any Bot Tool.
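
For step 1, a quick, informal check (assuming you simply want to confirm that a JavaScript-generated phrase made it into the rendered DOM) is to run something like the following in the browser console on the live page; the phrase itself is whatever snippet of your dynamic content you expect to find.

    // Run in the browser console on the fully rendered page.
    const phrase = 'a sentence your JavaScript generates';

    // Is it in the rendered DOM?
    console.log('In DOM:', document.body.innerText.includes(phrase));

    // Is it in the raw HTML the server returned (i.e., before JavaScript ran)?
    fetch(window.location.href)
      .then(res => res.text())
      .then(html => console.log('In raw HTML:', html.includes(phrase)));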

After you’ve tested all this, what if something’s not working and search engines and bots are struggling to index and obtain your content? Perhaps you’re concerned about alternative search engines (DuckDuckGo, Facebook, LinkedIn, etc.), or maybe you’re leveraging meta information that needs to be parsed by other bots, such as Twitter summary cards or Facebook Open Graph tags. If any of this is identified in testing or presents itself as a concern, an HTML snapshot may be the only option.

2. HTML SNAPSHOTS
What are HTML snapshots?

HTML snapshots are a fully rendered page (as one might see in the DOM) that can be returned to search engine bots (think: a static HTML version of the DOM).

Google introduced HTML snapshots in 2009, deprecated (but still supported) them in 2015, and awkwardly mentioned them as an element to “avoid” in late 2016. HTML snapshots are a contentious topic with Google. However, they’re important to understand, because in certain situations they’re necessary.

If search engines (or sites like Facebook) cannot grasp your JavaScript, it’s better to return an HTML snapshot than not to have your content indexed and understood at all. Ideally, your site would leverage some form of user-agent detection on the server side and return the HTML snapshot to the bot.
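
As a rough sketch only (Node/Express and the bot list are assumptions, not part of the original recommendation, and renderSnapshot() is a placeholder for however you generate the static HTML), server-side user-agent detection might look something like this:

    // Hypothetical Express middleware: serve a pre-rendered HTML snapshot
    // to known bots and the normal JavaScript-driven page to everyone else.
    const express = require('express');
    const app = express();

    const BOT_PATTERN = /googlebot|bingbot|facebookexternalhit|twitterbot|linkedinbot/i;

    // Placeholder: return the static HTML version of the rendered DOM
    // (e.g., from a headless browser or a snapshot cache).
    async function renderSnapshot(url) {
      return '<html><body><!-- pre-rendered content for ' + url + ' --></body></html>';
    }

    app.get('*', async (req, res, next) => {
      const userAgent = req.headers['user-agent'] || '';
      if (BOT_PATTERN.test(userAgent)) {
        return res.send(await renderSnapshot(req.originalUrl));
      }
      next(); // everyone else gets the normal client-side rendered app
    });

    app.listen(3000);

Keep the cloaking warning below in mind: the snapshot must contain the same content a user would see in a browser.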

At the same time, one must recognize that Google wants the same experience as the user (i.e., only provide Google with an HTML snapshot if the tests are dire and the JavaScript search group cannot provide support for your situation).

Considerations

When considering HTML snapshots, you must keep in mind that Google has deprecated this AJAX recommendation. Although Google technically still supports it, Google recommends avoiding it. Yes, Google changed its mind and now wants to receive the same experience as the user. This direction makes sense, as it allows the bot to receive an experience truer to the user’s.

A second consideration relates to the risk of cloaking. If the HTML snapshots are found not to represent the experience on the page, it’s considered a cloaking risk. Straight from the source:

“The HTML snapshot must contain the same content as the end user would see in a browser. If this is not the case, it may be considered cloaking.”
– Google Developer AJAX Crawling FAQs

Benefits

Despite the considerations, HTML snapshots have powerful advantages:

  1. Knowledge that search engines and crawlers will be able to understand the experience.
    • Certain types of JavaScript may be harder for Google to grasp (cough… Angular 2, the successor to AngularJS …cough).
  2. Other search engines and crawlers (think: Bing, Facebook) will be able to understand the experience.
    • Bing, among other search engines, has not stated that it can crawl and index JavaScript. HTML snapshots may be the only solution for a JavaScript-heavy site. As always, test to make sure that this is the case before diving in.

"It's not just Google understanding your JavaScript. It's also about the speed." -DOM - "It's not just about Google understanding your Javascript. it's also about your perceived latency." -DOM

Site latency

When browsers receive an HTML document and create the DOM (although there is some level of pre-scanning), most resources are loaded as they appear within the HTML document. This means that if you have a huge file toward the top of your HTML document, a browser will load that immense file first.

The concept of Google’s critical rendering path is to load what the user needs as soon as possible, which can be translated to → “get everything above-the-fold in front of the user, ASAP.”

Critical Rendering Path – Optimized Rendering Loads Progressively ASAP:

progressive page rendering

Image source

However, if you have unnecessary resources or JavaScript files clogging up the page’s ability to load, you get “render-blocking JavaScript.” Meaning: your JavaScript is blocking the page from appearing to load as quickly as it could (i.e., it hurts perceived latency).

Render-blocking JavaScript – Solutions

If you analyze your page speed results (through tools like PageSpeed Insights, WebPageTest.org, Catchpoint, etc.) and determine that there is a render-blocking JavaScript issue, here are three potential solutions (sketched after the list):

  1. Inline: Add the JavaScript directly within the HTML document.
  2. Async: Make the JavaScript asynchronous (i.e., add the “async” attribute to the script tag).
  3. Defer: Defer the JavaScript by adding the “defer” attribute to the script tag or by placing the script lower within the HTML.
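
To make those three options concrete, here is roughly what they look like in the HTML (the file names are placeholders). Async scripts download in parallel and execute as soon as they arrive, in no guaranteed order, while defer scripts execute in document order after the HTML is parsed, which is why the dependency caveat in the note below matters.

    <!-- 1. Inline: small, critical JavaScript embedded directly in the HTML -->
    <script>
      // Keep this tiny: only what's needed to render above-the-fold content
    </script>

    <!-- 2. Async: fetched in parallel, executes as soon as it downloads
         (order is NOT guaranteed), good for independent scripts -->
    <script async src="analytics.js"></script>

    <!-- 3. Defer: fetched in parallel, executes in document order after
         parsing finishes, safer for scripts that depend on one another -->
    <script defer src="library.js"></script>
    <script defer src="app-that-uses-library.js"></script>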

!!! Important note: Scripts must be arranged in order of precedence. Scripts that load the above-the-fold content must be prioritized and should not be deferred. Also, any script that references another file can only run after the referenced file has loaded. Make sure to work closely with your development team to confirm that there are no interruptions to the user’s experience.

Read more: Google Developer’s Speed Documentation


TL;DR – Moral of the story

Crawlers and search engines will do their best to crawl, execute, and interpret your JavaScript, but it is not guaranteed. Make sure your content is crawlable and obtainable, and that it isn’t creating site latency obstructions. The key = every situation demands testing. Based on the results, evaluate potential solutions.

Thanks: Thank you Max Prin (@maxxeight) for reviewing this content piece and sharing your knowledge, insight, and wisdom. It wouldn’t be the same without you.

