Google’s Pirate Penalty Adds Another Reason to Not Plagiarize Online

If you didn’t already know that it’s a bad idea to plagiarize content on the Web, you’ve missed a lot of reasons not to do it.

Not only is it ethically dubious, almost always illegal and prone to serious duplicate content penalties from Google, but it’s also a quick way to make enemies of people you should probably be working with rather than fighting against.

However, if none of those reasons are compelling to you, Google has now provided yet another. Late last week, the search engine announced that it will begin factoring the copyright notices it receives into its search rankings, a so-called “pirate penalty.”

What does this mean? Simply that Google will begin demoting sites that are the subject of a large number of copyright notices. Copyright holders, in addition to possibly being able to shut down your site, can now also seriously hurt your search engine rankings more broadly.

Though the move is largely targeted at websites that traffic in pirated content, plagiarists and spammers should take at least some pause, as this new clampdown could easily impact them as well.

The devil is in the details and Google has been very stingy with those to date. However, there is a little that we do know.

The Basics of the Penalty

Under the Digital Millennium Copyright Act (DMCA), Google is obligated to remove links to allegedly infringing content from its search engine when it receives a valid copyright notice. Google has always complied with that, though, historically, the process has been slow and ungainly.

However, Google recently overhauled its DMCA process, asking most filers to use either a Web-based form or a special backend for extremely heavy filers. The result was a streamlined process that enabled quicker takedowns and, most importantly, better record-keeping.

Google put these new records on display when it launched its transparency report, which shows not only how many takedowns it has handled, but also who has been filing them and against which domains.

Still, many copyright holders felt that Google wasn’t doing enough to help keep infringing material out of its index and that it was profiting off of pirated content through Adsense and Adwords. Initially, Google rejected those allegations, saying that it had no way of knowing for certain whether material was infringing before receiving a notice, but it surprised many by announcing the shift in policy last week.

The idea is simple: since Google can now track which domains receive the most copyright notices, it can factor that information in when determining search results, and it plans to do just that. It will use this data to demote sites with a large number of notices and, by proxy, promote legitimate sources.

The move is clearly targeted at sites that encourage file sharing, file downloading and other forms of large-scale copyright infringement. This includes cyberlockers, BitTorrent trackers and communities dedicated to file swapping. But while Google has said that larger, legitimate user-generated content sites, such as Facebook, Twitter and IMDB, have nothing to fear, smaller sites may. The reason is that the number of notices isn’t the only piece of information Google tracks about allegedly infringing domains.

Keeping Track of Percentages

One of the interesting, and often overlooked, items in the transparency report is that Google also monitors the percentage of reported URLs versus the total URLs indexed for a site.

Normally, even for a piracy-focused site, that number is very low, under 5%. This means that even on a site heavily dedicated to copyright infringement, one doing nothing to help remove such works, fewer than 1 out of 20 links are reported to Google in a given time period. Often, that number is much lower still, under 1% or even 0.1% in some cases.

This is because many of these sites have such a large number of URLs that, even with very active reporting and enforcement, the vast majority simply cannot be reported.

However, for a site with far fewer URLs, say a small business site with only a handful of pages, that percentage could be much higher. If you have 12 pages on your site and one gets reported to Google, that’s slightly over an 8% rate, more than any of the major copyright-infringing sites on the Web.

Likewise, if you have a blog with 1,000 URLs and copy/paste ten articles that get reported, that puts you at 1% of your URLs being reported, which, once again, is more than most sites dedicated to piracy.

In short, from a percentage standpoint, it doesn’t take much to make a small-to-medium-sized site look like a piracy haven.
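To see how lopsided the math is, here’s a minimal sketch of the percentage calculation using the hypothetical numbers from above (the large-site figures are illustrative assumptions, not data from Google’s transparency report):

```python
def reported_percentage(reported_urls: int, total_urls: int) -> float:
    """Percentage of a site's indexed URLs that have been reported."""
    return 100.0 * reported_urls / total_urls

# A large piracy-focused site: millions of indexed URLs, heavy reporting
# (hypothetical numbers for illustration)
big_site = reported_percentage(50_000, 5_000_000)   # 1.0%

# A 12-page small business site with a single reported page
small_site = reported_percentage(1, 12)             # ~8.3%

# A blog with 1,000 URLs and 10 copied articles reported
blog = reported_percentage(10, 1_000)               # 1.0%

print(f"{big_site:.1f}% vs {small_site:.1f}% vs {blog:.1f}%")
```

The single reported page on the 12-page site yields a higher rate than a heavily reported piracy hub, which is exactly why small sites shouldn’t assume the percentage metric works in their favor.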

This doesn’t mean that Google will penalize sites based on this information; Google hasn’t said how it will determine which sites get the penalty. But given that Google is keeping close track of the figure and displaying it publicly, that seems to be a hint that it might at least be factored in.

Bottom Line

Most likely, this penalty isn’t going to affect any site that is legitimate or takes copyright issues seriously. If you produce your own content or work to run a community that respects the law, you’ll probably be fine.

Still, it’s worth taking a few precautions. If you accept works uploaded by your visitors, consider registering a DMCA agent for your site and putting the information prominently on your home page. You want to be the point of contact for all copyright complaints, not Google. Also, consider registering for Google Webmaster Tools so you can be notified of any DMCA notices filed against you and respond appropriately.

If you do that and don’t knowingly host infringing material, you can probably rest comfortably that this penalty won’t impact you.

But if you plagiarize or otherwise encourage illegal activity, there’s a very good chance that it could come back to bite you, especially if copyright holders decide to go to Google first in a bid to hurt your search engine rankings.

As such, you can throw this on the laundry list of reasons why plagiarism online is both wrong and bad for your site, not that most webmasters needed another reason.