How to get your pages found by search engines and preserve their link power (Technical SEO for Profit, part 2)

Posted by Mark Nunney

Your first technical job as an SEO is to make sure search engines find and index the pages you want them to. Mark Nunney explains how to get the right pages indexed and then keep them that way through inevitable URL changes and broken links.

Technical SEO for Profit

SEO success starts with technical SEO. Based on extracts from Mark Nunney’s best-selling e-book, SEO for Profit, this article is part of a series on technical SEO, including:

1) How to optimize URLs for search engines and people
2) How to get your pages found and preserve their link power
3) How to optimize your code for search engines
4) Tracking response for SEO with Google Analytics
5) Technical SEO checklist

HTML sitemaps for users and search engines

HTML sitemaps are pages on your site that use links to show users and search engines where to find your site’s different pages.

If your site is not too big (say, fewer than 200 pages), one page can link directly to all of your site’s pages.

You should link to your HTML sitemap from your site’s home page.
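
As a rough sketch, an HTML sitemap is just an ordinary page of links, usually grouped by section. The URLs below are hypothetical:

<!-- Example HTML sitemap page (hypothetical URLs) -->
<h1>Site map</h1>
<h2>Products</h2>
<ul>
  <li><a href="/products/blue-widgets/">Blue widgets</a></li>
  <li><a href="/products/red-widgets/">Red widgets</a></li>
</ul>
<h2>Guides</h2>
<ul>
  <li><a href="/guides/choosing-a-widget/">Choosing a widget</a></li>
</ul>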

XML Sitemaps for search engines

XML Sitemaps are submitted to search engines to help them find and index your site’s pages efficiently. As well as showing where your pages are (their URLs) they can give information about:

  • The relative importance of each page (a priority value ranging from 1.0, most important, down to 0.0, least important).
  • How often a page changes (from always, every time it’s accessed, to never).
  • On what date a page was last changed.
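
To illustrate, here's a minimal XML Sitemap with a single entry, following the standard sitemaps.org protocol (the URL and values are hypothetical):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.yoursite.com/products/blue-widgets/</loc>
    <lastmod>2012-06-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>

Only <loc> is required; <lastmod>, <changefreq> and <priority> are the optional extras described above.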

Although Bing prefers that you do, you don’t need to submit an XML Sitemap unless you have pages that struggle to get indexed. If that is the case it’s better to solve the indexing problem first by:

  • Making sure search engines have clear links through your site’s structure to all pages (at least via an HTML Sitemap).
  • Using consistent URLs (one per page, see our clean URLs article).
  • Ensuring links on your site do not require Flash, JavaScript or Ajax to be accessed. That's not to say search engines can't follow such links; it's just easier for them to follow plain HTML links.
  • Giving all pages unique content, page title tags and meta descriptions.
  • Ensuring key category pages (and as many as possible content pages) have their own inbound links from other sites (deep links).

That done, an XML Sitemap is a good ‘belt and braces’ tactic to make sure all your pages get indexed.

More reading about XML Sitemaps

This Google video compares XML and HTML Sitemaps.

Here's a how-to guide to building your own Sitemaps.

And here is a list of Sitemap generators that will help you do the job.

XML Sitemaps can also be generated for specific types of content, including video, images, news and mobile pages.

Robots.txt files - is yours set up correctly?

Robots.txt is a simple text file that sits at your site's root, ie, at this address:

yoursite.com/robots.txt

Use robots.txt to tell search engines (and other spiders and robots) which pages and folders you DON'T want them to crawl.
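
For example, a robots.txt file that blocks crawling of a couple of (hypothetical) private areas but leaves the rest of the site open might look like this:

User-agent: *
Disallow: /admin/
Disallow: /search-results/

Sitemap: http://www.yoursite.com/sitemap.xml

The optional Sitemap line simply tells spiders where to find your XML Sitemap.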

If you’re having indexing problems (pages from your site aren't showing up in Google's results), always check your robots.txt file first.

Recently, Wordtracker audited more than 100 sites. The most common reason why sites weren't ranking well was the content of the robots.txt file. Often, the robots.txt file was set up to tell Google not to crawl any of the site's pages. In effect, those sites were invisible to Google.

So, if your site's pages aren't showing up in Google's results, run a site audit and check that your robots.txt file isn't the cause.
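
A robots.txt file that blocks everything is often just two lines. These tell every well-behaved spider to stay away from the entire site:

User-agent: *
Disallow: /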

You'll find more help at: http://googlewebmastercentral.blogspot.com/2008/03/speaking-language-of-robots.html

That all seems simple enough. If you don’t want pages to be seen by users and search engines then you use robots.txt, right? Not quite ...

Robots.txt will only stop search engines from crawling a page. It won’t stop them finding it or listing it in their search results and worse …

... inbound links to such pages will send their link power down a black hole. Ie, your 'link juice' will be lost.

To fully control page indexing and link power, you need to learn how to use the meta robots tag. The meta robots tag offers more flexibility than robots.txt, allowing you to index/noindex and follow/nofollow individual pages.

Meta robots tag

The meta robots tag sits in the '<head>' part of your page’s code.

Here’s an example of how it looks:

<meta name="robots" content="noindex">

Its main uses are to ask search engines to:

  • Not index a page (using the value ‘noindex’).
  • Not follow any link on the page (using the value ‘nofollow’).

Let’s look in detail at how to use meta robots:

  • Do not index this page. Link power will still be passed through the page's links to the pages they point to. Either of these will do it:

<meta name="robots" content="noindex">
<meta name="robots" content="noindex,follow">

However, there is no need to use the latter (noindex,follow) because by default a page’s links will be followed.

  • Do not index this page and don’t follow the links on it. Link power to the page will be lost. Here are two ways of writing the code for that:

<meta name="robots" content="noindex,nofollow">
<meta name="robots" content="none">

  • Index this page and follow links from it. Code options are:

<meta name="robots" content="index">
<meta name="robots" content="index,follow">

Neither of the above needs to be used because a page's default condition is to be indexed and to have its links followed.

  • Index this page but do not follow links on it. The page is indexed but its link power is going nowhere.

<meta name="robots" content="nofollow">
<meta name="robots" content="index,nofollow">

However, there is no need to use the latter (index,nofollow) because by default a page will be indexed.

Important: nofollow links don’t pass on link power but their share of a page’s link power is not given to other links – it’s gone. Nofollow links are black holes for link power.

One last thing on meta robots: the tag can't work if you've blocked the page from being crawled with robots.txt, because search engines will never fetch the page and see the tag.

Here's more help from Google on the meta robots tag.

The nofollow tag

A rel="nofollow" tag can be applied to individual links on a page to stop those links being followed and passing on a page's link power. The tag looks like this:

<a href="http://www.examplesite.com/" rel="nofollow">this is the link (aka anchor) text</a>

This will override a meta robots 'follow' setting for that link only.

You might want to use a nofollow tag if:

  • You don't trust - or can't be sure of - the content you're linking to.
  • The link has been paid for. Google strongly suggests that paid links should use the nofollow tag.
  • The link points to a page that you don't want Google to crawl.

Here's some more on nofollow tags.

Important: nofollow links are link power black holes. A nofollow tag stops the linked-to page receiving any link power but it doesn’t stop your page losing that link power. So you will not preserve your pages’ link power with nofollow links. This means you can’t use nofollow to ‘sculpt’ your site’s link power into a select number of pages.

Broken links (404s)

A broken link is a link to a page that can't be found. It's often called a 404, after the HTTP 'not found' status code returned (and usually displayed in the browser) when a page can't be found.

Obviously users don’t like 404s because they haven’t found what they are looking for.

Not so obvious is that 404s are an SEO opportunity. Links mean prizes in SEO and if a link to your site gets a 404 then your site doesn’t get the link power. Make that link power yours by either:

  • Making a page with the ‘not found’ URL.
  • Adding a 301 redirect from the 404 URL to a valid URL.
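
As a sketch of the second option, on an Apache server this can be as simple as one line in your .htaccess file (the URLs are hypothetical):

# Pass the broken URL's link power on to the nearest relevant live page
Redirect 301 /old-widget-guide.html http://www.yoursite.com/guides/choosing-a-widget/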

404s might be the easiest links you’ll get. All you have to do is find the 404s, which you can do like this:

  • Use a site audit tool like Screaming Frog to audit your site. You'll be presented with a list of your site's internal broken links (404 errors).
  • Visit Google Webmaster Tools > Diagnostics > Crawl errors > Show URLs: Not found
  • Configure Google Analytics by adding GA tracking code to a custom 404 page. Modify the tracking code's _trackPageview call so that pages with URLs containing /404 (plus the failed URL path) appear in your Top Pages reports. This is a little more long-winded, so you may need the configuration details here:

http://www.google.com/support/analytics/bin/answer.py?answer=86927
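
As a rough sketch of that configuration - assuming the older asynchronous (ga.js) tracking code and that the ga.js loader is already included on the page - the custom 404 page records a virtual pageview containing the failed URL and the referrer:

<script type="text/javascript">
  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-XXXXXX-X']);  // your own GA property ID
  // In place of the usual plain _trackPageview call, record a virtual
  // pageview that keeps the URL that failed and the page that linked to it
  _gaq.push(['_trackPageview',
    '/404.html?page=' + document.location.pathname + document.location.search +
    '&from=' + document.referrer]);
</script>

Those virtual /404.html pageviews then show up in your content reports alongside real pages.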

Check redirects (302s and 301s)

If you are changing a page’s URL then you need to let search engines know about it. Otherwise all the page’s link power will be lost and the page (with its new URL) will need to start building its reputation in Google’s index from scratch.

This is best done with a 301 permanent redirect.

A 302 is a temporary redirect and it should be used when a redirect is, er, temporary.

If you find any 302 redirects that should be permanent then change them to 301s.
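
On Apache, for example, the difference is often just the status code on a Redirect line in .htaccess; change the first form into the second (hypothetical URLs):

# Temporary redirect - link power is not consolidated on the destination
Redirect 302 /spring-sale/ http://www.yoursite.com/offers/

# Permanent redirect - tells search engines to pass the old URL's link power on
Redirect 301 /spring-sale/ http://www.yoursite.com/offers/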

You can check the status of any redirects you find using a server header checker like this one: http://www.seoconsultants.com/tools/headers

We look at 301 redirects in more detail in the sections below on domain name changes.

Site crawls and indexing

Google finds your website’s content with two 'spiders'. Googlebot 'crawls' sites looking for pages for Google's main web index and Google News (think 'desktop' content). Googlebot-Mobile crawls looking for content for Google's mobile index (think 'smartphone' and 'feature phone' content).

There are three ways to see some of what Google’s spiders find:

1) Use a site audit tool such as Screaming Frog. It will crawl your site looking for and highlighting SEO errors.

2) Search Google with site:yoursite.com. This shows you the pages Google has indexed. Look for the following:

  • Pages listed with titles but no description. Do you want these pages indexed? If you see these, check your robots.txt file and the meta robots tags (see above) on those pages, and the nofollow tags on any links to them.
  • Different versions of the same pages, eg https and http, www and non-www, faceted pages, session IDs, tracking parameters. See our post on clean URLs for more information on how to deal with these.

3) Use your Google Webmaster Tools (GWT) account http://www.google.com/webmasters/tools/

In your GWT account, Google summarizes some of the results of its crawls and gives you lots of advice to consider. Check the following ...

Optimization > Sitemaps: This refers to XML Sitemaps. Fix any errors shown.

Health > Blocked URLs > robots.txt analysis: Fix any errors shown.

Configuration > Sitelinks: Here Google shows you which pages (if any) on your site might get shown on Google's results pages with 'sitelinks'.

Sitelinks appear with some sites' listings on Google's results pages for some searches. They look like this:

[Image: sitelinks displayed beneath a site's listing on a Google results page]

Check the sitelinks chosen are relevant and wanted. Block any that aren’t (new ones should soon appear to replace them). Watch out for Christmas links out of season!

Configuration > Settings > Geographic target: If you are targeting a particular country then choose which one. If you are not targeting a particular country then choose 'Unlisted'.

If different parts of the same site are targeting different countries (eg, blue widgets in France and blue widgets in USA) then:

  • Create new GWT accounts for the different parts of the site.
  • Set each of their geographic targets.

This only works with country-neutral top-level domains like .com and .org.

Configuration > Settings > Preferred domain: As discussed above, this is where you let Google know your canonical domain, eg, http://www.yoursite.com or yoursite.com.

Health > Malware: If any warnings show here then act as quickly as possible as you'll be getting no traffic from Google.

Health > Crawl errors: Look for 404s. Implement 301 redirects from them to appropriate pages on your site. These might be the easiest links you'll ever do any work for.

Health > Crawl Stats > Pages crawled per day: Regularly check this report and look for inactivity and changes - and not just drops in the number of pages crawled.

Eg, the report below shows a large increase in the number of pages crawled each day. This was a warning of a problem: a number of rel="canonical" tags on faceted URLs went missing, causing Google to start crawling thousands of duplicate URLs.

[Image: GWT's 'Pages crawled per day' report showing a sudden increase]
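
For reference, the rel="canonical" tag mentioned above sits in the <head> of each faceted URL and points back to the preferred version of the page, something like this (hypothetical URLs):

<!-- On http://www.yoursite.com/blue-widgets/?colour=navy&sort=price -->
<link rel="canonical" href="http://www.yoursite.com/blue-widgets/">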

Health > Crawl Stats > Kilobytes downloaded per day: As with the 'Pages crawled per day' report, use this to spot changes in Googlebot's behavior that might warn of problems.

Eg, the report below is from the same site as above. Note that even though more pages started to be crawled (see graph above), fewer (not more) kilobytes were downloaded each day. Was this because Googlebot found those thousands of duplicate pages but didn’t download them all?

[Image: GWT's 'Kilobytes downloaded per day' report]

Health > Crawl Stats > Time spent downloading a page (page speed): This report is important. Google has declared that page load speed is a ranking factor. They might be bluffing but it's wise to act as if they're not. In any case, slow pages might stop visitors staying on or returning to your site - both of which may be used as ranking signals. Check this report regularly and take action if your page speeds are changing, as in the following example:

[Image: GWT's 'Time spent downloading a page' report]

Below we’ll look at GWT’s ‘Site performance’ report which might look like it’s measuring the same thing but it’s not.

‘Time spent downloading a page’ is the average time it took Googlebot to download a page’s HTML.

‘Site performance’ is the average time it took for site visitors to download everything on visited pages, including images and scripts.

Optimization > HTML Improvements: Both the Screaming Frog site audit tool and Google's Webmaster Tools give reports showing pages with the following problems:

  • Duplicate meta descriptions
  • Long meta descriptions
  • Short meta descriptions
  • Missing page title tags
  • Duplicate page title tags
  • Long page title tags
  • Short page title tags
  • Non-indexable content

Reports here may indicate bigger problems. For example, the following report shows 21,869 pages with duplicate meta descriptions, the result of our old enemy - faceted pages (multiple versions of the same page) - getting indexed.

[Image: GWT's 'HTML Improvements' report showing 21,869 pages with duplicate meta descriptions]

Labs > Instant previews: Instant previews are snapshots of your pages, shown to searchers on Google's results pages. You can see an example here:

[Image: an instant preview shown for a Patagonia search on Google]

Check that your pages' instant previews are looking good. If they are not, it could be because:

  • Your page has too much Flash on it. In which case, make it work without Flash (this will help iPad and iPhone users too).
  • Your site is displaying different content to Googlebot than to users. This might be deemed 'cloaking', which is against Google's Quality Guidelines and can get your site banned.

You can find those guidelines here: https://www.google.com/support/webmasters/bin/answer.py?answer=35769

You can display instant previews for videos too if you submit a Video Sitemap. See: http://www.google.com/support/webmasters/bin/answer.py?answer=1217726

If your domain name has changed

If your domain has been changed in the past, say from www.olddomain.com to www.newdomain.com, then you need to make sure that all the link power is transferred from www.olddomain.com to www.newdomain.com.

Carry out the following checks:

  • Make sure you have ownership, access and control of olddomain.com’s domain name server (DNS) and will continue to do so.
  • Use a server header checker to check that all URLs on olddomain.com 301 redirect to their equivalents on newdomain.com. You can check your site's redirects with the server header checker linked in the 'Check redirects' section above.

To be clear, you want the following to be happening:

[Image: every URL on olddomain.com 301 redirecting to its equivalent URL on newdomain.com]

Eg:

  • www.olddomain.com will 301 permanently redirect to: www.newdomain.com
  • olddomain.com will 301 permanently redirect to: newdomain.com
  • www.olddomain.com/somepage.html will 301 permanently redirect to: www.newdomain.com/somepage.html
  • olddomain.com/somepage.html will 301 permanently redirect to: newdomain.com/somepage.html

Do this check for olddomain.com, www.olddomain.com and any other subdomains (like blog.olddomain.com) that were used on olddomain.com. You could also:

  • Check a sample of deep URLs, eg, www.olddomain.com/folder/folder/deeppage.html
  • Find olddomain.com’s most important inbound links and check they redirect appropriately. Find these links with Google Webmaster Tools.
  • On your server header reports, look for the status '301 Moved Permanently', as shown in the server header report below:

#1 Server Response: http://www.wtatour.com/player/caroline-wozniacki_2257889_12631
HTTP/1.1 301 Moved Permanently
Server: AkamaiGHost
Content-Length: 0
Location: http://www.wtatennis.com/player/caroline-wozniacki_2257889_12631
Date: Sun, 17 Jul 2011 14:39:54 GMT
Connection: keep-alive

  • Watch out for multiple redirects, eg, with this pattern (where A, B and C are different URLs):

A -> B -> C

Change these to two separate 301 redirects like this:

A -> C

B -> C

Google can handle any number of 301s for different pages. But Google struggles with chains of redirects from A to B to C to D, etc. Get to four hops and you have almost no chance of Google following them.

So it’s a good idea never to get close to four.

If your domain name is going to change

If you are planning to change your website’s domain name in the future, you need to make sure that both users and search engines following old URLs find the same pages they would have before the name change.

This keeps users happy. And it preserves all the previous domain’s link power, redirecting it to the new versions of the pages that previously received it.

(Note: redirects don’t quite preserve 100% of link power but you’re unlikely to notice any loss.)

Make sure all olddomain.com links have a 301 permanent redirect to their equivalent newdomain.com URL. So:

[Image: all URLs on olddomain.com, including those on subdomains, 301 redirecting to their equivalents on newdomain.com]

Eg:

  • www.olddomain.com will 301 permanently redirect to: www.newdomain.com
  • olddomain.com will 301 permanently redirect to: newdomain.com
  • www.olddomain.com/somepage.html will 301 permanently redirect to: www.newdomain.com/somepage.html
  • olddomain.com/somepage.html will 301 permanently redirect to: newdomain.com/somepage.html
  • blog.olddomain.com/somepage.html will 301 permanently redirect to: blog.newdomain.com/somepage.html
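
As a sketch, on an Apache server with mod_rewrite, the www version of that rule might sit in olddomain.com's .htaccess file like this (domain names hypothetical); the bare domain and each subdomain need their own equivalent block:

# 301 every request on www.olddomain.com to the same path on www.newdomain.com
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.olddomain\.com$ [NC]
RewriteRule ^(.*)$ http://www.newdomain.com/$1 [R=301,L]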

Test your 301s on a development site.

Make sure all internal links go to the new domain address. Once moved, send in your own spider to check the results for old and broken links.

Open a new GWT account for the new domain. Let Google know about the move using the 'change of address' form in GWT at Configuration > Change of Address.

If your domain change is part of a rebuild or redesign then make the move first, wait until you're sure it's worked, and only then do the rebuild. That way it's easier to diagnose anything that goes wrong (or right).

Update links on your social profiles (eg, Facebook, Twitter, LinkedIn, Google+) and in your email signatures.

Check that all your tracking tools, including Google Analytics, are reconfigured for the new domain.

If you've questions about any other technical aspects of SEO, please let us know using the comments below.

More Technical SEO for Profit

This article is part of a series on technical SEO, based on extracts from Mark Nunney’s best-selling e-book, SEO for Profit:

1) How to optimize URLs for search engines and people
2) How to get your pages found and preserve their link power
3) How to optimize your code for search engines
4) Tracking response for SEO with Google Analytics
5) Technical SEO checklist

Get a free 7-day trial

A subscription to Wordtracker's premium Keywords tool will help you to:

  • Generate thousands of relevant keywords to improve your organic and PPC search campaigns.
  • Optimize your website content by using the most popular keywords for your product and services.
  • Research online markets, find niche opportunities and exploit them before your competitors.

Take a free 7-day trial of Wordtracker’s Keywords tool