Correct me if I’m wrong on this, but I’m fairly sure that I’ve identified a systematic mistake that Google’s algorithm makes.
Here’s how it goes:
- You write a really interesting article and it makes the front page of Digg
- Your article is provocative enough to draw a steady stream of comments for the better part of a week
- You gain dozens if not hundreds of solid, editorial backlinks to your article from reputable sites
- A week after publishing, your article disappears from Google’s index, while many of the articles referencing yours remain
I’ve noticed this with my last five articles to make the front page of Digg (i.e. a pattern): the article disappears from Google’s index completely, even on an exact search for the article title.
The interesting thing to note is that on such a search, 70-100% of the first 10 results are articles that reference the original. In such a case, you’d think that Google would clearly know the source of the article.
But apparently some penalty filter gets applied. Now, you might think that it’s a duplicate content penalty, but hear me out: I don’t think so.
Here’s my theory:
I think that Google will often put a penalty on a web page that changes its content too much. In the example I used above, the source article received over 300 comments. In other words, the content changed drastically between the time Google first indexed the article and the many times Google re-indexed it in response to the hundreds of backlinks that the article received.
So my theory is basically that too many comments, which, in human terms, is almost always a good sign, can actually hurt your site from an algorithmic perspective (both in terms of content dilution and content transformation).
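To make "dilution" and "transformation" a bit more concrete, here's a rough Python sketch. To be clear, this is not how Google measures anything; the function names, the crude tokenizer and the sample text are all my own assumptions for illustration. It computes what share of a page's words still belong to the original article once comments pile up, and how similar the page's vocabulary is between a first crawl and a later one.

```python
import re

def words(text):
    """Lowercase word tokens; a deliberately crude tokenizer (assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def dilution(article, comments):
    """Fraction of the page's words that belong to the original article."""
    article_words = len(words(article))
    comment_words = sum(len(words(c)) for c in comments)
    total = article_words + comment_words
    return article_words / total if total else 0.0

def jaccard(before, after):
    """Jaccard similarity of the word sets of two page snapshots."""
    a, b = set(words(before)), set(words(after))
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Made-up sample data, not taken from any real article.
article = "google drops popular articles from its index"
comments = ["great post thanks", "i saw the same thing on my blog"]

first_crawl = article                              # page as first indexed
later_crawl = article + " " + " ".join(comments)   # page after comments arrive

print("article share of page:", dilution(article, comments))
print("snapshot similarity:", jaccard(first_crawl, later_crawl))
```

With 300 comments instead of two, both numbers would fall towards zero, which is the kind of drastic change I suspect is tripping a filter.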
Google’s algorithm clearly balances a number of factors. If backlinks were the only indicator, then articles that get widespread attention, hundreds of backlinks and front page Digg recognition would be at the top of any exact-match title search.
Since a large volume of comments is normally a sign of vitality (unless you’re getting spammed), Google needs to find a way to reward such vitality rather than punish it. Unfortunately, despite claims of a looming “true AI”, algorithms still make critical mistakes and punish good online phenomena.
Suggestion: Create an HTML tag that publishers can use to identify *primary* content and another tag to identify *secondary* content. This way, the transformation of content that the comments section naturally produces can be differentiated from the content that originally drew such a reaction in the first place.
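Here's a minimal sketch of how a crawler might use such markup. The `<primary>` and `<secondary>` tags are hypothetical (they're my proposal, not part of any HTML standard), and the class name and sample page are invented for illustration. The sketch uses Python's standard `html.parser` module and assumes the two sections are not nested inside each other.

```python
from html.parser import HTMLParser

class ContentSplitter(HTMLParser):
    """Collects text found under the hypothetical <primary> and <secondary> tags."""

    def __init__(self):
        super().__init__()
        self.inside = None  # name of the section currently open, if any
        self.sections = {"primary": [], "secondary": []}

    def handle_starttag(self, tag, attrs):
        # Only the two hypothetical tags open a section.
        if tag in self.sections:
            self.inside = tag

    def handle_endtag(self, tag):
        if tag == self.inside:
            self.inside = None

    def handle_data(self, data):
        # Keep non-blank text that appears inside an open section.
        if self.inside and data.strip():
            self.sections[self.inside].append(data.strip())

# Made-up page: the article is primary, the comments are secondary.
page = """
<primary><h1>My article</h1><p>The original post.</p></primary>
<secondary><p>First comment.</p><p>Second comment.</p></secondary>
"""

splitter = ContentSplitter()
splitter.feed(page)
primary = " ".join(splitter.sections["primary"])
secondary = " ".join(splitter.sections["secondary"])

print("primary:", primary)
print("secondary:", secondary)
```

A search engine could then judge how much a page has changed by looking at the primary text alone, so a flood of comments wouldn't count against the article that attracted them.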