Quality Indicator Tip: NoIndex Your WordPress Archive Pages

ryancaldwell
December 8, 2007

I was just reading a fantastic interview with Matt Cutts over at Stone Temple Consulting.

Everyone even remotely interested in SEO and Google search should read that interview. It’s got priceless information, most of which is what I refer to as “2nd order quality indicators”. Quality indicators are on- and off-page factors that Google looks at when evaluating the value of your website.

The overwhelming impression that I get from this interview is that Google does indeed sum the total of all quality indicators, where some indicators (such as a strong link portfolio) are clearly more important than others and are weighted accordingly. Still, the fact that Google sums all quality indicators means that if you pay careful attention to the little details…well, the little details will add up and pay dividends in the SERPs.

My hope is to document some of these 2nd order, minor quality indicators over the next few weeks, so stay tuned. Today, I’m offering you the following tip:

Add the NoIndex META tag to secondary, duplicate content, wherever it exists on your site.

For those of you who use WordPress, the vast majority of duplicate content is going to exist in your Category Archive pages. And because your Archive pages will most likely have stronger PageRank than individual articles, Google is prone to make the mistake of treating the Category Archive content as primary. Thankfully, this can be fixed.

There’s a simple way to tell Google NOT to index your Archive pages and to see individual articles as the primary content (and I’d recommend that everyone with a WordPress blog implement this measure).

In the header.php file of your WordPress theme, simply add the following code:
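
A minimal sketch of such a snippet, assuming WordPress's standard is_archive() conditional tag and a robots meta tag set to "noindex,follow" (the "follow" value is the one suggested in the comments below). Treat it as an illustration, not necessarily the exact code from the original post:

```php
<?php
// Sketch only: place this inside the <head> section of your theme's header.php.
// is_archive() is WordPress's conditional tag for category, tag, author and date archives.
if ( is_archive() ) : ?>
<meta name="robots" content="noindex,follow" />
<?php endif; ?>
```

With "noindex,follow", search engines are asked to keep the archive page itself out of the index while still following its links to the individual articles.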

The good news is that your Archive pages will still retain PageRank…and continue to serve the role of “existence indicators” for your older articles.

Performancing specializes in identifying quality indicators and we can turn your blog into something special. Let us optimize your blog with our Blog Management Services today.

Comments

pholpher says

December 10, 2007 at 1:11 pm

Nice interview with Matt Cutts. I’m glad Matt is okay with using the different meta tags to allocate PageRank on your site. SEOs used to wonder if that was okay because some of them thought that using nofollow on your own site might be interpreted by Google as manipulating their search results.

Markus Merz says

December 9, 2007 at 1:53 pm

> “It is almost impossible to have a .blogspot.com blog NOT rank well.”

A) I also have some other blogs. This is only a very fresh example.
B) If this were the case, then all SERPs would be full of blogspot sites on the 1st page.

Let’s stay on the subject. Your tip “unless you get a bunch of links early on” is true. I wouldn’t say a bunch, but some strong links definitely force Google to index early. Having a nice set of sidekick sites and linking back from there to your fresh site really helps to start a site.

nusuni says

December 9, 2007 at 12:29 pm

jonroth, the solution is to do “follow,noindex” for the meta tag. That tells the SEs “hey, don’t index these archive pages, but follow the links on them” – which will let your individual post pages get indexed and get credit for the content. Otherwise you have internal content duplication – which is a nasty mess to clean up.

However, as with most things in SEO, you really don’t have to worry about content duplication if you have a really powerful site or blog, like Problogger or Blog Herald or even Performancing. Most blogs at that level get so many links to the individual posts that it doesn’t matter if the archives are indexed.

As far as the SEs only indexing archives early on for self-hosted blogs – I do have to agree. Unless you set up a sitemap for the archives or unless you get a bunch of links early on, they probably won’t get indexed.

nusuni says

December 9, 2007 at 12:17 pm

Markus – you’re talking about a blogspot blog. It is almost impossible to have a .blogspot.com blog NOT rank well. I used to run a tech blog on blogspot (about 1 year ago), and within a few days it ranked #3 for sony device on Google – funny thing is it only had one article posted, no links, etc. It ranked well because it leeched off of blogspot’s good rankings. Plus I have a feeling Google has a variable in place to give blogspot blogs a big boost; they don’t do nearly as well on other SEs.

Markus Merz says

December 9, 2007 at 8:11 am

> “Google, for example, often only indexes archive pages for young blogs”

That’s absolutely new to me. In fact, I think it is not true. My experience is that Google indexes relevant single articles for a subject pretty fast.

This example is a very fresh project for a friend of mine and only articles are listed. And the page was indexed almost immediately and ranks very high for the keywords Fischbretter and Rolf Boscheinen. In fact also irrelevant secondary keywords (for the site) like Restaurant Clasenhof rank absolutely high directly from the beginning.

Look at the article structure and you will see why. All articles are content rich and have absolutely relevant outgoing links. Absolutely no magic. Just classical article enhancement. Compare it to the other sites ranking lower and you will understand.

Check out:
- Better writing: Pronouns are evil (NEW: Now with examples!)
- Create a structure for your blog posts

jonroth says

December 8, 2007 at 10:52 pm

I’ll have to read the interview, but my question is whether this is a good idea for young blogs. Google, for example, often only indexes archive pages for young blogs for the first month or longer (depending on many factors). So if you NoIndex your archives, aren’t you screwing yourself in the search engines if your blog is new? I only ever listen to half of what Matt Cutts says, because some of it seems so ridiculous and self-sabotaging.

Markus Merz says

December 8, 2007 at 1:18 pm

“Great interview with a huge load of SEO details. An actual must read article for anybody interested in how Google indexes pages.”

That’s the note I added when I saved the interview to del.icio.us.

Markus Merz says

December 8, 2007 at 10:53 am

I am using the robots.txt file to disallow certain sub-folders (imprint, archive, contact, calendar).

Site-wide navigation links that point to pages without regularly refreshed editorial content are good candidates for ‘disallow’. Example:

User-agent: *
Disallow: /kontakt/
Disallow: /archiv/
Disallow: /kalender/
Disallow: /impressum/

# Because of AdSense content analysis
# Allows all pages
User-agent: Mediapartners-Google
Disallow:

Trackbacks

Identifying Traffic Drop Causes - Performancing says:

October 9, 2015 at 12:00 pm

[…] cause a huge amount of problems if not setup correctly. Make sure that the Googlebot is allowed to crawl the correct pages and parts of your site using the tester tool. Here’s a fantastic guide from YOAST on how to […]
