How to Perform the World's Greatest SEO Audit

Now that tax season is over, it's once again safe to say my favorite A-word... audit! That's right. My name is Steve, and I'm an SEO audit junkie.

Like any good junkie, I've read every audit-related article; I've written thousands of lines of audit-related code; and I've performed audits for friends, clients, and pretty much everyone else I know with a website.

All of this research and experience has helped me create an insanely thorough SEO audit process. And today, I'm going to share that process with you.

This is designed to be a comprehensive guide for performing a technical SEO audit. Whether you're auditing your own site, investigating an issue for a client, or just looking for good bathroom reading material, I can assure you that this guide has a little something for everyone. So without further ado, let's begin.

SEO Audit Preparation

When performing an audit, most people want to dive right into the analysis. Although I agree it's a lot more fun to immediately start analyzing, you should resist the urge.

A thorough audit requires at least a little planning to ensure nothing slips through the cracks.

Crawl Before You Walk

Before we can diagnose problems with the site, we have to know exactly what we're dealing with. Therefore, the first (and most important) preparation step is to crawl the entire website.

Crawling Tools

I've written custom crawling and analysis code for my audits, but if you want to avoid coding, I recommend using Screaming Frog's SEO Spider to perform the site crawl (it's free for the first 500 URIs and £99/year after that).

Alternatively, if you want a truly free tool, you can use Xenu's Link Sleuth; however, be forewarned that this tool was designed to crawl a site to find broken links. It displays a site's page titles and meta descriptions, but it was not created to perform the level of analysis we're going to discuss.

For more information about these crawling tools, read Dr. Pete's Crawler Face-off: Xenu vs. Screaming Frog.

Crawling Configuration

Once you've chosen (or developed) a crawling tool, you need to configure it to behave like your favorite search engine crawler (e.g., Googlebot, Bingbot, etc.). First, you should set the crawler's user agent to an appropriate string.

Popular Search Engine User Agents:

Googlebot - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
Bingbot - "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

Next, you should decide how you want the crawler to handle various Web technologies.

There is an ongoing debate about the intelligence of search engine crawlers. It's not entirely clear if they are full-blown headless browsers or simply glorified curl scripts (or something in between).

By default, I suggest disabling cookies, JavaScript, and CSS when crawling a site. If you can diagnose and correct the problems encountered by dumb crawlers, that work can also be applied to most (if not all) of the problems experienced by smarter crawlers.

Then, for situations where a dumb crawler just won't cut it (e.g., pages that are heavily reliant on AJAX), you can switch to a smarter crawler.
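
If you go the custom-code route, the "dumb crawler" defaults above take only a few lines. Here's a minimal sketch in Python (standard library only; example.com is a hypothetical start URL) that requests a page with Googlebot's user-agent string and collects its anchor links without executing JavaScript, applying CSS, or storing cookies:

    from html.parser import HTMLParser
    from urllib.request import Request, urlopen

    GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

    class LinkParser(HTMLParser):
        """Collects href values from anchor tags; no JS, no CSS, no cookies."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def fetch_links(url):
        request = Request(url, headers={"User-Agent": GOOGLEBOT_UA})
        html = urlopen(request).read().decode("utf-8", errors="replace")
        parser = LinkParser()
        parser.feed(html)
        return parser.links

    print(fetch_links("http://example.com/"))  # hypothetical start URL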

Ask the Oracles

The site crawl gives us a wealth of information, but to take this audit to the next level, we need to consult the search engines. Unfortunately, search engines don't like to give unrestricted access to their servers, so we'll just have to settle for the next best thing: webmaster tools.

Most of the major search engines offer a set of diagnostic tools for webmasters, but for our purposes, we'll focus on Google Webmaster Tools and Bing Webmaster Tools. If you still haven't registered your site with these services, now's as good a time as any.

Now that we've consulted the search engines, we also need to get input from the site's visitors. The easiest way to get that input is through the site's analytics.

The Web is being monitored by an ever-expanding list of analytics packages, but for our purposes, it doesn't matter which package your site is using. As long as you can investigate your site's traffic patterns, you're good to go for our upcoming analysis.

At this point, we're not finished collecting data, but we have enough to begin the analysis, so let's get this party started!

SEO Audit Analysis

The actual analysis is broken down into five large sections:

Accessibility
Indexability
On-Page Ranking Factors
Off-Page Ranking Factors
Competitive Analysis

(1) Accessibility

If search engines and users can't access your site, it might as well not exist. With that in mind, let's make sure your site's pages are accessible.

Robots.txt

The robots.txt file is used to restrict search engine crawlers from accessing sections of your website. Although the file is very useful, it's also an easy way to inadvertently block crawlers.

As an extreme example, the following robots.txt entry restricts all crawlers from accessing any part of your site:

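The standard block-everything entry is just two lines:

    User-agent: *
    Disallow: /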

Manually check the robots.txt file, and make sure it's not restricting access to important sections of your site. You can also use your Google Webmaster Tools account to identify URLs that are being blocked by the file.
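
You can script this check, too. The sketch below uses Python's built-in robotparser (the site and URL list are hypothetical; substitute your own important pages) to test whether Googlebot is allowed to fetch them:

    from urllib.robotparser import RobotFileParser

    # Hypothetical site; point this at your own robots.txt.
    parser = RobotFileParser("http://example.com/robots.txt")
    parser.read()

    # Hypothetical list of pages that must remain crawlable.
    important_urls = ["http://example.com/", "http://example.com/products/"]

    for url in important_urls:
        if not parser.can_fetch("Googlebot", url):
            print("Blocked by robots.txt:", url)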

Robots Meta Tags

The robots meta tag is used to tell search engine crawlers if they are allowed to index a specific page and follow its links.

When analyzing your site's accessibility, you want to identify pages that are inadvertently blocking crawlers. Here is an example of a robots meta tag that prevents crawlers from indexing a page and following its links:

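The tag belongs in the page's <head> section:

    <meta name="robots" content="noindex, nofollow">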

HTTP Status Codes

Search engines and users are unable to access your site's content if you have URLs that return errors (i.e., 4xx and 5xx HTTP status codes).

During your site crawl, you should identify and fix any URLs that return errors (this also includes soft 404 errors). If a broken URL's corresponding page is no longer available on your site, redirect the URL to a relevant replacement.
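
If your crawler doesn't report status codes directly, a short script can fill the gap. This sketch (Python standard library; the URL list is a hypothetical export from your crawl) prints every URL that fails or returns a 4xx/5xx code:

    from urllib.error import HTTPError, URLError
    from urllib.request import Request, urlopen

    # Hypothetical list of URLs taken from your site crawl.
    urls = ["http://example.com/", "http://example.com/old-page/"]

    for url in urls:
        try:
            status = urlopen(Request(url, method="HEAD")).status
        except HTTPError as error:
            status = error.code      # 4xx and 5xx responses raise HTTPError
        except URLError:
            status = None            # DNS failures, timeouts, refused connections, etc.
        if status is None or status >= 400:
            print(status, url)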

Speaking of redirection, this is also a great opportunity to inventory your site's redirection techniques. Be sure the site is using 301 HTTP redirects (and not 302 HTTP redirects, meta refresh redirects, or JavaScript-based redirects) because 301s pass the most link juice to their destination pages.
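
Checking a redirect's type is also scriptable. The sketch below (Python standard library; the old URL is hypothetical) issues a request without following redirects, so you can see whether the server answers with a 301 or a 302:

    from http.client import HTTPConnection, HTTPSConnection
    from urllib.parse import urlparse

    def redirect_info(url):
        """Request a URL without following redirects; return (status, Location header)."""
        parts = urlparse(url)
        conn_class = HTTPSConnection if parts.scheme == "https" else HTTPConnection
        connection = conn_class(parts.netloc)
        connection.request("HEAD", parts.path or "/")
        response = connection.getresponse()
        return response.status, response.getheader("Location")

    # Hypothetical URL that should permanently redirect to its replacement.
    status, location = redirect_info("http://example.com/old-url/")
    print(status, "->", location)   # 301 is what you want to see here, not 302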

XML Sitemap

Your site's XML Sitemap provides a roadmap for search engine crawlers to ensure they can easily find all of your site's pages.

Here are a few important questions to answer about your Sitemap:

Is the Sitemap a well-formed XML document? Does it follow the Sitemap protocol? Search engines expect a specific format for Sitemaps; if yours doesn't conform to this format, it might not be processed correctly.

Has the Sitemap been submitted to your webmaster tools accounts? It's possible for search engines to find the Sitemap without your assistance, but you should explicitly notify them about its location.

Did you find pages in the site crawl that do not appear in the Sitemap? You want to make sure the Sitemap presents an up-to-date view of the website.

Are there pages listed in the Sitemap that do not appear in the site crawl? If these pages still exist on the site, they are currently orphaned. Find an appropriate location for them in the site architecture, and make sure they receive at least one internal backlink.
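
The last two questions boil down to comparing two URL sets, which is easy to automate. Here's a sketch (Python standard library; sitemap.xml and crawled_urls.txt are hypothetical input files) that diffs the Sitemap against your crawl:

    import xml.etree.ElementTree as ET

    NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

    # Hypothetical inputs: the site's Sitemap and a plain-text export of crawled URLs.
    sitemap_urls = {loc.text.strip()
                    for loc in ET.parse("sitemap.xml").getroot().findall("sm:url/sm:loc", NS)}
    with open("crawled_urls.txt") as crawl_file:
        crawled_urls = {line.strip() for line in crawl_file if line.strip()}

    print("Crawled but missing from the Sitemap:", sorted(crawled_urls - sitemap_urls))
    print("In the Sitemap but not crawled (possibly orphaned):", sorted(sitemap_urls - crawled_urls))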

Site Architecture

Your site architecture defines the overall structure of your website, including its vertical depth (how many levels it has) as well as its horizontal breadth at each level.

When evaluating your site architecture, identify how many clicks it takes to get from the homepage to other important pages. Also, evaluate how well pages are linking to others in the site's hierarchy, and make sure the most important pages are prioritized in the architecture.

Ideally, you want to strive for a flatter site architecture that takes advantage of both vertical and horizontal linking opportunities.
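
Click depth is easy to compute if you can export your crawl as a link graph (each page plus the pages it links to). This sketch (the adjacency-list dictionary is hypothetical; build it from your own crawl data) runs a breadth-first search from the homepage and reports how many clicks away every reachable page is:

    from collections import deque

    # Hypothetical link graph: page -> pages it links to, built from your crawl export.
    links = {
        "/": ["/products/", "/blog/"],
        "/products/": ["/products/widget/"],
        "/blog/": [],
        "/products/widget/": [],
    }

    def click_depths(graph, start="/"):
        depths = {start: 0}
        queue = deque([start])
        while queue:
            page = queue.popleft()
            for target in graph.get(page, []):
                if target not in depths:           # first time this page is reached
                    depths[target] = depths[page] + 1
                    queue.append(target)
        return depths

    print(click_depths(links))   # pages buried many clicks deep deserve a closer look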

Flash and JavaScript Navigation

The best site architecture in the world can be undermined by navigational elements that are inaccessible to search engines. Although search engine crawlers have become much more intelligent over the years, it is still safer to avoid Flash and JavaScript navigation.

To evaluate your site's usage of JavaScript navigation, you can perform two separate site crawls: one with JavaScript disabled and another with it enabled. Then, you can compare the corresponding link graphs to identify sections of the site that are inaccessible without JavaScript.
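
If both crawls can be exported as plain URL lists, that comparison is a simple set difference. A quick sketch (the file names are hypothetical):

    # Hypothetical exports from the two crawls.
    with open("crawl_js_enabled.txt") as js_file, open("crawl_js_disabled.txt") as nojs_file:
        js_urls = {line.strip() for line in js_file if line.strip()}
        nojs_urls = {line.strip() for line in nojs_file if line.strip()}

    # Pages reachable only when JavaScript runs are invisible to dumber crawlers.
    print(sorted(js_urls - nojs_urls))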

Site Performance

Users have a very limited attention span, and if your site takes too long to load, they will leave. Similarly, search engine crawlers have a limited amount of time that they can allocate to each site on the Internet. Consequently, sites that load quickly are crawled more thoroughly and more consistently than slower ones.

You can evaluate your site's performance with a number of different tools. Google Page Speed and YSlow check a given page using various best practices and then provide helpful suggestions (e.g., enable compression, leverage a content distribution network for heavily used resources, etc.). Pingdom Full Page Test presents an itemized list of the objects loaded by a page, their sizes, and their load times. Here's an excerpt from Pingdom's results for SEOmoz:

Pingdom Results for SEOmoz

These tools help you identify pages (and specific objects on those pages) that are serving as bottlenecks for your site. Then, you can itemize suggestions for optimizing those bottlenecks and improving your site's performance.
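
For a quick first pass before reaching for those tools, you can time the raw HTML download yourself. This sketch (Python standard library; the URLs are hypothetical) measures how long the server takes to deliver each page's HTML, which won't catch slow images or scripts but will flag sluggish server responses:

    import time
    from urllib.request import urlopen

    # Hypothetical pages to test; pull these from your crawl.
    urls = ["http://example.com/", "http://example.com/products/"]

    for url in urls:
        start = time.perf_counter()
        body = urlopen(url).read()        # raw HTML only; no images, CSS, or JavaScript
        elapsed = time.perf_counter() - start
        print(f"{url}: {len(body)} bytes in {elapsed:.2f}s")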

Source: http://moz.com/blog/how-to-perform-the-worlds-greatest-seo-audit