What To Do When People Steal Your Blog Content

In a recent comment, Sylvia Forester asked:

What can we do to stop Bitacle from stealing our posts and making money off of them?

Rather than responding in the comments, it seemed like a good topic for a full blog post. Copyright and IP law are much too large a topic to cover extensively here, but I can provide a few thoughts on where to start.

I haven’t looked at Bitacle previously, but with a quick scan of a couple pages it appears to me that they do include a link back to the original content when they repost material. This may in fact be a benefit to your blog, as people who use Bitacle for search may find you for the first time and become regular readers… There are a number of sites that I allow to republish content from the TypePad Hacks blog in order to reach a wider audience. They send a fair bit of traffic and I don’t begrudge them a few advertising dollars in exchange. On the other hand, it is possible that your reputation could be harmed by spam blogs harvesting your posts and republishing them on sites that contain offensive or dangerous material.

An important question to ask yourself before taking action is why you object to your content being reused: is it because someone else may be making money from your content or because you don’t want your personal brand to be diluted by appearing in multiple places online? This will help you frame the tone of your response. You should also ask yourself whether your reputation or brand is helped or harmed by broader distribution… If the republishing site contains proper attribution and/or a link back to your site, they may be doing you more good than harm even if they make a buck or two in the process. Remember, Google makes advertising dollars when they list you in their search engine and you wouldn’t want them to stop listing you!

If you prefer to keep your content exclusive to your own site, here are a few options you can exercise.

Step One: Email the site that is republishing your content and politely ask them to remove your content from their site. I find that people respond much better to a request than to a threat, so it is important to start out with an email, phone call or letter that is respectful but firm in tone. This is not a step you want to exercise when you are angry or tired.

As more and more companies and individuals are republishing blog content from RSS feeds, you might want to keep a copy of a stock email that you can send out whenever needed. This will save you a fair amount of time.
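A stock email like this can be kept as a fill-in-the-blanks template. The sketch below is only one way to do it: the wording, the field names and the `build_takedown_email` helper are all made up for illustration, and the 14-day default simply mirrors the "week or two" suggested in this post.

```python
from string import Template

# Hypothetical stock request; adjust the wording to suit your own voice.
TAKEDOWN_TEMPLATE = Template(
    "Subject: Request to remove my content from $site_name\n"
    "\n"
    "Hello,\n"
    "\n"
    "I am the author of $blog_name ($blog_url). I noticed that $site_name\n"
    "is republishing my posts, for example: $example_url\n"
    "\n"
    "I have not given permission for this. Please remove my content from\n"
    "your site within $deadline_days days, and send me a short email once\n"
    "this has been done.\n"
    "\n"
    "Thank you,\n"
    "$author_name\n"
)


def build_takedown_email(site_name, blog_name, blog_url, example_url,
                         author_name, deadline_days=14):
    """Fill in the stock request for one particular republishing site."""
    return TAKEDOWN_TEMPLATE.substitute(
        site_name=site_name,
        blog_name=blog_name,
        blog_url=blog_url,
        example_url=example_url,
        author_name=author_name,
        deadline_days=deadline_days,
    )


if __name__ == "__main__":
    print(build_takedown_email(
        site_name="Example Aggregator",
        blog_name="My Blog",
        blog_url="https://blog.example.com",
        example_url="https://aggregator.example.com/copied-post",
        author_name="A. Blogger",
    ))
```

Keeping the deadline and the request for a confirmation email in the template means you won't forget either one when sending it in a hurry.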

Once you have taken this step, give them at least a week or two to comply… In most cases, the response will not be instant. Whoever is using your content is going to have to take the time to look through their feeds, archives etc. and even when happy to comply, it probably won’t be at the top of their priority list. Remember that your blog is probably one of hundreds or thousands being excerpted or republished. In your initial contact, you might want to suggest a deadline and request that they contact you with a followup email when they have acted upon your request.

Step Two: If the first contact does not get results, send a cease and desist letter threatening legal action if the recipient continues to ignore your request. Although such letters are usually drafted by a lawyer, you can find many examples online that could be tailored to your situation. In many cases, the letter itself will get results without having to go to trial, but if it doesn’t you’ll have to decide whether or not a trial is worth it to you. In most cases, it will be expensive to take legal action even if you win.

If your content has been formally registered with the US Copyright Office, you may be able to recover legal fees when you sue. Even if you have not formally registered copyright, your content is still protected under copyright law. In fact, you don’t even have to label the work as copyrighted in order to seek protection. The main benefit of formally registering copyright is the ability to sue for legal fees, but it is very important to note that even if you win a judgement against someone, that does not guarantee that they will actually pay you. To be eligible for legal fees, the work generally has to be registered before the infringement begins or within three months of first publication.

The only exception to automatic protection under copyright law is if you have chosen to license parts of the content through a Creative Commons license, in which case that license will grant specific rights based on your choices.

Step Three: This article on the Learning Movable Type blog provides a good overview of how to use the DMCA (Digital Millennium Copyright Act) to protect your content by filing a complaint through Google. You can file a DMCA complaint through the AdSense team if the site uses AdSense, or file a general DMCA complaint if not. You might want to include a reference to filing a DMCA complaint through Google in your cease and desist letter.

Step Four: If all the above fails and you still feel strongly that heads must roll, you’ll have to actually bring suit against the offending party in court. At this point, you’ll want to hire a lawyer to handle the proceedings. Personally, I feel that in most cases an actual lawsuit is both overkill and a game of diminishing returns… but of course, many feel differently.

Step Five: Most sites which republish content draw from RSS feeds rather than hand-harvesting content from individual posts… The best way to protect against your content migrating to other sites is to provide a partial RSS feed or no RSS feed at all. BUT— to my thinking, this is really cutting off your nose to spite your face. Sure, it prevents most content theft, but it also limits your readership and exposure.

There’s been a great deal of debate as to whether to offer full or partial RSS feeds of sites in order to curtail article harvesting. The basic gist of the discussion is that full feeds are good for readers but bad for security, while partial feeds keep content more secure but can be frustrating for readers. It’s my feeling that blogs should be more about the reader than the author if they wish to succeed.
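If you do decide to try a partial feed, most blogging platforms have a setting for it. For anyone generating a feed by hand, here is a minimal sketch of the idea: strip the markup from a post and keep only the first few words. The `make_excerpt` function and its parameters are my own illustrative names and do not come from any particular blogging tool.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text between tags and ignores the tags themselves."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def make_excerpt(post_html, max_words=50, permalink=""):
    """Return a plain-text excerpt of a post for use in a partial feed."""
    parser = _TextExtractor()
    parser.feed(post_html)
    parser.close()
    # Joining with spaces keeps adjacent paragraphs from running together.
    words = " ".join(parser.parts).split()
    if len(words) <= max_words:
        return " ".join(words)
    excerpt = " ".join(words[:max_words]) + "..."
    if permalink:
        excerpt += " Read the full post: " + permalink
    return excerpt


if __name__ == "__main__":
    html = "<p>Full feeds are good for readers.</p><p>Partial feeds are not.</p>"
    print(make_excerpt(html, max_words=5, permalink="https://blog.example.com/post"))
```

Note that the excerpt is only as good as your opening lines, which is exactly the trade-off discussed here.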

Some people like partial feeds because they can scan a lot of headlines quickly. Myself, I favor full feeds. If everyone were as good as Brian Clark at writing headlines and starting their posts with engaging copy, I might be more into partial feeds. But in most cases, I prefer to see the whole post so I can give the author the benefit of the doubt when a potentially interesting post starts off a bit slow.

When I have to click through to an actual website or blog in my browser, I get annoyed and eventually stop reading the blog altogether. The only time I want to click through is to leave a comment, and I would be way happier if I could comment on a post from within my RSS reader. The whole point of RSS for me is to be able to collect the info I want to read in one place and manage it by saving the articles I found valuable or marking them as something I wish to link to or comment on, etc. Dropping or limiting the RSS feed makes the entire experience of a blog less useful and less welcoming to me.

When you punish your readers in order to discourage content pirates, you are effectively treating your readers as though they too may be bad people. Partial feeds discourage your reader from getting your full message, from linking to your content appropriately and from joining the discussion in the comments… take a look at this post by Seth Godin about a site that has made doing business with them almost impossible based on a few bad experiences they had with former customers.

21 thoughts on “What To Do When People Steal Your Blog Content”

  1. Why not just offer only truncated feeds? Sell the sizzle, make them click for the steak. If the “scrapers” are legit scrapers, you get good traffic, and backlinks on every post you make, which helps your PR and search engine placement. Which gets you more traffic, better pay for your text links, advertisers… how do you lose?

  2. I’ve seen a couple of really nice programs on the Web that help you learn if anything’s been stolen from you. One of them is http://www.copyscape.com and the other one is new. The author posted a link on my forum. http://www.content-cop.com. Tried it. Does the job. We’ll see what comes up next. ‘Cause I’m tired of getting my content stolen.

  3. True, I oversimplified. The point I was making is that if you decide to incorporate other people’s full-text feeds, do due diligence and contact them to be sure they are okay with it. Some are, some aren’t.

  4. Fair use is a bit more complicated than just letting you use 300 words. It involves the amount used as well as the reason for the use, the commercial aspects of the use, etc. I’m sure Nolo has a book or even online guide.

  5. The sites that carry full text from my blogs either contacted me in advance to ask if I would allow the articles to be republished, or are sites that I contacted to offer republication rights… in both cases, contact was made in advance. Also, none of these sites republish all my posts, just the ones that I offer or the ones that fit their editorial slant. In most cases, there will be at least a little bit of rewriting involved to make the article fit better.

    Performancing is a perfect example: When Nick put out a call for authors, I emailed and offered him some of the articles from one of my blogs. Not all the articles wind up on Performancing, and when they do, there’s usually a bit of rewrite involved. It’s a good relationship that works for both of us, but it’s a lot more like a syndicated column than syndicating a feed.

    Syndicating entire feeds without contacting the author(s) first is not only wrong legally but also amounts to nothing but a splog. There’s a big difference between spam blogs that use content without permission and blog networks which aggregate content by permission.

  6. @jtpratt:

    Basically, fair use allows you 300 words from any collected works, overall. If you apply the term “collected works” to a blog, then you can never print more than 300 words from it, total, without permission. So I beg to differ about the term “steal” when someone reprints a full feed. Summarizing other people’s content is one thing; republishing it full-feed is another. It’s not justified just because you link back.

    That doesn’t mean that everyone minds, but I want someone to actually ask me if they plan to profit off the hard work I put in writing articles. Still, if you are republishing that many feeds, you might do due diligence and contact the owners of the sites you are republishing and see how they feel.

    From a reader point of view, if I visit your site and see a full-text article, I’m not necessarily going to know that you’ve republished it. So why would I follow any link? I don’t agree with your hypothesis. Republishing full feeds without permission, explicit or implicit, is simply not justified. (I hold to this viewpoint so strongly that I decided not to write an ebook about how to do this, that someone requested of me.) I give people the permission to republish some of my content, but certainly not all of it. Similarly, if I publish articles at an article directory, I’ve given my implicit permission to republish.

    The argument, in my opinion, is equal to saying that the door to someone’s house was open, and because it was raining outside, you decided to go in.

  7. There is definitely a fine line. And it’s ok no matter which side of it you’re on, as long as what you perceive as ‘fair use’ of your RSS feed is respected. There are still many bloggers out there that vehemently stay away from all advertising – and despise it with a passion. They see any kind of profiteering as a dilution of the ‘pure information’ they have published. There are those that wouldn’t be blogging at all if there wasn’t money to be made. And then there is everyone in between.

    I have a site that aggregates a dozen+ feeds and 100+ articles a day. It benefits me, and it benefits the sites I aggregate. I am not stealing, and I respect the wishes of any feed owner that contacts me.

    How it benefits me:
    – it adds content to my site
    – it drives more traffic to my site
    – I of course monetize what I can with adsense

    How it benefits the feed owner:
    – I link back to their site for every post (more backlinks for them = greater pagerank for them)
    – I have a master page with all their posts and link back to their site with description of what it’s about (increased branding)
    – they can place advertisements in their feeds, and they’ll show up on my site increasing their revenue

    The relationship of how the feed is produced and how it’s aggregated and used is tricky at best. Both parties have to be aware of their options.

    I’ll give you some examples…

    I aggregate feeds with both partial and full posts. Full posts of course are great as I get more content on my site. But a one line post (like digg.com posts) of course greatly increases the ‘bounce rate’ of users visiting – and then quickly leaving within 30 seconds. So, when I choose a feed to aggregate – I read it for awhile to see if their posting format will fit my site.

    Many of the feed owners place either random or intermittent ads in their RSS feeds, which are anything from text links to full banner ads. It doesn’t bother me that their ads now appear in my site – because they are finding a way of monetizing their content as well. As long as it doesn’t detract from the format of my site, I will never care if feed owners place ads in their feeds, but this is still semi-taboo with many users who despise ads in feeds.

    In the feeds that I aggregate I’ve found that many feed owners are getting smarter with their posts, and what they’ll do is put the name of the blog and the post author at the top of the post, and then put links (back to their site) to comment or reply, or even similar tags and categories. I don’t mind any of this either – as long as I’m still getting quality content from their feed. I’m sure some of my visitors click those links and leave my site to go to theirs. But the trade-off is worth it.

    I’ve also seen a really weird side effect of all this though…and it’s when an RSS feed is aggregated into another site, and then that site’s feed gets aggregated (and that site is aggregated, and that site is aggregated) that problems occur. I call it the ‘regurgitation factor’.

    Let’s hypothetically say that on my site I aggregate the feed for Gizmodo. I like their content and it fits my format. I aggregate 6 other tech-type feeds as well. So people that read my RSS feed get posts from 6 tech sites I like + my original posts. My hybrid feed should be able to pull in some good subscribers, and results in sending both visitors and subscribers to the feeds I aggregate and link back to. Then – bigbadtechsite.com starts aggregating my full feed into his site. So a user reads an article about the new Treo on bigbadtechsite, and clicks through to my site, and then has to click through back to Gizmodo. Imagine if the bigbadtechsite.com feed is aggregated into yet another site? I think if the ‘regurgitation factor’ is more than one – you’ve got a problem.

    I know lots of people will have tons of issues with doing this at all – and all kinds of unscrupulous people will read this post and brainstorm ways to create the ultimate spam site based on only RSS feeds.

    Whatever you do – be considerate of the feed owners’ wishes, be professional, and be careful both when choosing feeds to include in your site and in the ways that you publish and make available your own RSS feed.

  8. When Adsense responded to me, their text actually deterred me from complaining. I felt as if I was doing something wrong. I thought they had a link to a DMCA template, but it’s just to the text of the act. So it’ll be nice to see Jonathan’s DMCA template.

  9. I’ve stated before how strange I think Google’s handling of Adsense complaints is considering that there is no clause in the DMCA that covers advertising networks. Since they technically profit directly from the infringement, they could be held liable for it, at least in part, DMCA be damned.

    I’d wager, in large, that it is an attempt to shield themselves against lawsuits from unhappy partners that get their revenues yanked due to copyright complaints. Still, I don’t know how that would work.

    All in all though, it is the best way to handle Bitacle since they are located in Spain and are immune to regular DMCA notices.

    Still, remember that DMCA notices to Adsense MUST be faxed or mailed and MUST contain a handwritten signature. Very important.

    If anyone needs a DMCA template, I’ll gladly share mine.

  10. I used to do this in every single one of my posts before I started writing for clients. Someone suggested it looked cheesy, and I guess I got swayed and took it out. I actually used to use it to see how quickly my pages were being indexed, but searching out thieves works too. I was going to write a plugin to insert this into the feed. If I ever find time, I’ll do it and announce it here. (It’s simple for WordPress, etc., but time is an issue for me.)

  11. One of my favorite blogs (Inhabitat) has recently added a new copyright notice at the bottom of all articles that come in via the feed. I’m not sure how they set this up, but I do know they are using FeedBurner to serve up the feed so it must be a service I haven’t noticed yet at FB. The notice is worded quite well:

    Copyright © 2006 Inhabitat. This Feed is for personal non-commercial use only. If you are not reading this material in your news aggregator, the site you are looking at is guilty of copyright infringement. If you see someone deliberately infringing our copyright, please email legal@[blog].com to alert us.

    Although this wouldn’t stop people from hijacking content via RSS, it does make them look bad when they do.

    Also, if you worded it very specifically, you could use it to do a search to find offending sites.

  12. For those of you using WordPress, one experimental way of detecting unauthorized content use is by implanting a fingerprint in your content. I have just finished a WordPress plugin that inserts a digital fingerprint into your RSS feeds in addition to helping you monitor the blogosphere (via RSS search) for use of your digital fingerprint. The theory is that any site that uses your unique fingerprint has probably copied your content.

    Unlike AntiLeech (which is great!), this plugin does not aim to prevent spam, but attempts to make blog owners more easily aware of potential problems.

    The plugin is called Digital FingerPrint — [full disclosure: personal link]
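    For the curious, the general idea of a feed fingerprint can be sketched in a few lines. This is not how the plugin above works internally (I have not looked at its code); it is only an illustration under my own assumptions. The function names, the `fp-` prefix and the use of a secret phrase are all hypothetical.

```python
import hashlib


def make_fingerprint(blog_url, secret):
    """Derive a short, unlikely-to-collide token from a private phrase and the blog URL."""
    digest = hashlib.sha256((secret + "|" + blog_url).encode("utf-8")).hexdigest()
    return "fp-" + digest[:16]


def add_fingerprint(item_html, fingerprint):
    """Append the token to one feed item's HTML before the feed is published."""
    return item_html + "\n<p>" + fingerprint + "</p>"


def contains_fingerprint(page_text, fingerprint):
    """Check whether a page you have already fetched includes the token."""
    return fingerprint in page_text


if __name__ == "__main__":
    fp = make_fingerprint("https://blog.example.com", "my private phrase")
    item = add_fingerprint("<p>My original post.</p>", fp)
    print(fp, contains_fingerprint(item, fp))
```

    Because the token is an odd string that appears nowhere else, searching for it should turn up mostly pages that copied your feed, although a scraper that strips trailing paragraphs would slip past it.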

  13. @typetive: while you’re right, Adsense still wants you to fill out a DMCA and snail mail it to them. At least that’s what they told me.

  14. I notice clicks over from Bitacle to my site, but I never investigated how they got there. Now I’m curious to find out.

    I too have struggled with scrapers that republish my full posts with photos (hotlinked, thankyouverymuch) – I’ve found that in the instances so far step one works just fine … if I can find a contact address on the site. (I had problems with coollect.com).

    The other piece of advice is if the site has ads, report the abuse. Adsense is great at suspending accounts when it’s a site made up of scraped content. Take away the money and you’d be amazed at how quickly they cease and desist. I’d put that right on par with step 1 … even if they remove your content, it’s doubtful that they have permission from all the other sites they’ve scraped from and should lose their profit from it.

  15. For those of you with WordPress, Owen Winkler just produced a plugin that helps deal with this situation too. I’d put it in around Step Three and move the rest down.

    His plugin is called AntiLeech.

  16. The first time Bitacle popped up on my radar screen, they had pulled an affiliate link, but I just checked and my Amazon links are intact. How about duplicate content? Should we be concerned about that?

  17. I agree with Filmstalker. It happens everywhere and most of the time it gives you traffic. Searching for or finding news and reporting on it is something different from what Bitacle does. Bitacle just copies everything and monetizes it. Not only the first 100 words, or just a title. You should see it for yourself. There are enough solutions to solve this (htaccess, robots.txt) or to get your links spread in a smart way. We will see where it ends.

  18. I have two different experiences of “stealing” content. The first is the one you mention where other sites copy and paste my content onto their page word for word and then provide a link back. Although it bugs me, it actually does work out for me as these sites are reaching an audience I don’t normally get to, and they get me some click throughs.

    These are sites like Gerard Butler fan sites, or Korean film fan sites, etc.

    It’s annoying, but it’s helping me. For example when I ran a review of the Korean film The Host, Korea went mad for it, and my traffic that month doubled. There were several Korean sites with my entire review in them and a source link, and although it was annoying it was better to have that advertising than not.

    Sure it’s an extreme example, but with the Butler sites it’s the same issue but a much smaller scale. They’re still giving me a few people a day for a few weeks. Surely even that’s worth it?

    The second issue I’m having is a huge grey area. Most of the daily posts I make are reporting and commenting on news stories from other straight news sites. So, for example, I’ll comment on a press release or rumour carried by another site.

    Sometimes I find these stories before the rest of the film community does and I write about it first. Within a few hours the story is appearing on a number of other film sites, some crediting you as a source or “via”, while others are either giving no source or ignoring the fact that they found it through your article.

    In one case I even know a site that is looking at my stories, rewriting them, and then just quoting my source. This site is known within some areas of the film blogging community for doing just that.

    It’s very difficult to prove and a very grey area. After all there’s no need to say that the story was found on a certain site, just what the original source is.

    Yet it’s the same outcome, these sites are using you as an aggregator, using your hard work sourcing and writing content to produce their own, and therefore make money.

    This is much harder to deal with, and is perhaps impossible. I certainly don’t know what to do about it.
