Unintended Consequences: When Google Punishes Something Good

Correct me if I’m wrong on this, but I’m relatively sure I’ve identified a recurring mistake that Google’s algorithm makes.

Here’s how it goes:

  • You write a really interesting article and it makes the front page of Digg
  • Your article is provocative enough to draw a steady stream of comments for the better part of a week
  • You gain dozens if not hundreds of solid, editorial backlinks to your article from reputable sites
  • A week after publishing, your article disappears from Google’s index, while many of the articles referencing yours remain

I’ve noticed this with my last five articles to make the front page of Digg (i.e. a pattern): each article disappears from Google’s index completely, even on an exact search for the article title:

http://www.google.com/search?q=The+Ten+Worst+Job+Interview+Questions+Ever

The interesting thing to note is that on such a search, 70-100% of the first 10 SERP results are articles that reference the original. In that case, you’d think Google would clearly know the source of the article.

But apparently some penalty filter gets applied. Now, you might think it’s a duplicate content penalty, but hear me out: I don’t think so.

Here’s my theory:

I think that Google will often put a penalty on a web page that changes its content too much. In the example above, the source article received over 300 comments. In other words, the content changed drastically between when Google first indexed the article and when Google re-indexed it multiple times in response to the hundreds of backlinks the article received.

So my theory is basically that a flood of comments, which in human terms is almost always a good sign, can actually hurt your site from an algorithmic perspective (both in terms of content dilution and content transformation).

Google’s algorithm balances a number of factors, and if backlinks were the only indicator, then articles that get widespread attention, hundreds of backlinks and front page Digg recognition would clearly be at the top of any exact-match title search.

Since a large volume of comments is normally a sign of vitality (unless you’re getting spammed), Google needs to find a way to reward that vitality rather than punish it. Unfortunately, despite claims of a looming “true AI,” algorithms still make critical mistakes and punish good online phenomena.

Suggestion: Create an HTML tag that publishers can use to identify *primary* content and another tag to identify *secondary* content. This way, the transformation of content that the comments section naturally produces can be differentiated from the content that originally drew such a reaction in the first place.
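
To make the idea concrete, here’s a rough sketch of what such markup might look like. The tag names are purely hypothetical – nothing like them exists in HTML today – but the point is simply to let crawlers tell the original article apart from the reader-generated discussion:

    <primary-content>
      <h1>The Ten Worst Job Interview Questions Ever</h1>
      <p>The article body – the content that actually earned the links.</p>
    </primary-content>

    <secondary-content>
      <!-- Reader comments: legitimate content, but weighted separately from the article -->
      <p>Comment #1: Great post!</p>
      <p>Comment #2: ...</p>
    </secondary-content>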

9 thoughts on “Unintended Consequences: When Google Punishes Something Good”

  1. Following the thread for a while now…

    > you gradually water down the original keyword relevance of the page?

    Agree with Martin. This was my very first thought. Especially true if the original content just hits the minimum length of 200 to 300 words. Possible solutions:

    • Comments in pop-up?
    • A comment link (nofollow) to a duplicate page (noindex) where comments are open, so the comment form is not directly available on the article page (a rough sketch of this is at the end of this comment).
    • Newspapers tend to have only the last n comments below an article. To read the rest you have to switch to ‘more comments’.

    > what if you were to edit a lot of those comments to work some of your main keywords into that additional content

    • You can delete comments, but editing them is only acceptable if you note your edits inside each comment.
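
    A rough sketch of the second solution above (the URL is made up; it relies only on rel="nofollow" links and the robots noindex meta tag):

      <!-- On the article page: link out to a separate comments page, nofollowed -->
      <a href="/ten-worst-interview-questions/comments/" rel="nofollow">Read or leave a comment</a>

      <!-- In the <head> of the separate comments page: keep it out of the index -->
      <meta name="robots" content="noindex, follow">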
  2. First, let me be in awe! If you’re getting multiple front pages on Digg, then I KNOW I’m doing something wrong. I have yet to get front page #1.

    As for the page falling out of sight, I have to say I like Martin’s theory. It seems as logical as anything else.

  3. Couldn’t you split your comments into multiple pages? That would be a simple fix for you, while Google thinks of some clever solution on their side.

  4. I think Martin Jamieson is probably right – I tend to close off comments on popular posts the minute they start getting repetitive, but I realize now that this is another good reason to do so.

  5. Could it be that as you get more and more comments (especially of the variety ‘great post, love your work’), you gradually water down the original keyword relevance of the page?

    The type of articles you’re talking about probably have far more content in the comments than in the actual article, so when Google rates the content, the text in the comments may be directly affecting its algorithm’s calculations… so not really a penalty, just a dilution effect.

    Out of interest, how relevant are those comments to your article (in terms of keywords)? What if you were to edit a lot of those comments to work some of your main keywords into that additional content? I’d be interested to see if your articles climb back up the rankings for those search terms.

    …also, do any of those articles now rank well for ‘good post’?

  6. Forgive me if I’m being a little naive, but surely putting the comments in their own div element with a specific ID that Googlebot is told to ignore would be the solution. Or is making blog comments indexable essential?

    You can flag parts of your page so that AdSense ignores them when choosing ads for your page, so why not Googlebot?
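
    If I remember the AdSense section-targeting markers correctly, they look roughly like this (as far as I know there’s no equivalent marker that regular Googlebot indexing respects, which is exactly the gap):

      <!-- google_ad_section_start(weight=ignore) -->
      <div id="comments">
        Reader comments that AdSense should ignore when picking ads.
      </div>
      <!-- google_ad_section_end -->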

  7. Seems like if almost every result in the top 10 SERP is pointing to the post, then Google doesn’t need any help identifying the source.

    It shouldn’t be because of the spike in links – or at least, I always understood that Google could recognize when an article naturally gathers a lot of links all at once via link bait/Digg versus unnaturally through spam.

    Could it be that Google is suspicious of the domain itself?

  8. Nice post. I’ve suffered from similar problems recently – a post that hit the front pages of Reddit and Fark and gained loads of backlinks shot to the top of the Google rankings, and has now, two months later, dropped out of sight.

    I like your suggestion of a secondary content tag and, over time, it could probably take off – especially if CMSs like WordPress integrate it into their systems.

    But it also strikes me that there are only a limited number of CMS platforms, and the majority of pages that attract lots of comments are underpinned by one of them. Given that, it can’t be too hard for Google to amend its algorithm to take sites built on these platforms into account.

  9. If you’re right, though, then Google will have to figure out a way to distinguish “spam” content from legitimate content in the secondary section – because if this is actually what’s happening, that’s probably why: they’re trying to find a way to push spam down.

    Even Gmail can’t do that perfectly yet.
