
Hotlinkers stealing Google Image Search traffic

Submitted by redrose on August 23, 2007 - 10:26pm

Google Image Search indexes hotlinked images (including images already in their index from the original sites) as belonging to pages on the hotlinker's site. This diverts image search traffic away from the original image creator's website to the often ad-filled website of the hotlinker. I've even seen these hotlinked images outrank and replace the original images in the image search results.

Blocking hotlinking with .htaccess does not prevent this indexing from occurring. Despite being informed about this massive problem, Google hasn't done anything about it except require a DMCA notice for each individual hotlinked image. Many webmasters apparently don't file these, which leaves the hotlinkers with millions of hotlinked images to use on their sites.

If you want to see the scope of this problem from just one of these many commercial hotlinking sites with millions of hotlinked images, search for the following in Google Image Search (with no quotes):

You are here: Random Image


I have a different twist on this issue

I've been using Flickr to host about half of the images in my blog posts. I use .htaccess to block hotlinking from my domain and have blocked Google Image Search from my site for most of this year.

However, the images that I put on my blog posts from Flickr have been showing up on Google Images, even though I've blocked outside searches from my Flickr account.

I've been trying to figure out how to get those Flickr images off of Google Image search (and of course those folks coming to my blog that way). I hadn't even considered that some splog type creature out there might be using it in some other way.

A Bad Situation

Redrose:

I realize what you're saying, or at least think that I do, but the question is what should Google do? A few years ago, the solution would have been simple: hotlinking was not a common practice, so you would just make sure to show the image on the site it appears on. However, with sites like Photobucket, Flickr and ImageShack making hotlinking images the norm, that isn't an option. Most people would rather have their blog or site show up in the image search than their Flickr account.

It is entirely possible that multiple sites could be hotlinking the same image completely legally and Google has to take a best guess about which is the original and place it in the results. It's not fair and, apparently, quite often wrong, but it is all that Google can do in their situation.

From a copyright standpoint, Google can't act without a DMCA notice lest it face legal challenges from the spammer. Google has to protect its legal interests.

One thing that you can do is report these sites as being spam. It is much faster and easier than a DMCA and seems to work well in cases where the site is doing something against standard SEO practices. You can report them here:

http://www.google.com/contact/spamreport.html

One thing I don't understand, and hope you can elaborate on, is how exactly .htaccess blocking does not work. If the site cannot load the image, why would Google index a broken file? That seems odd to me. If you could shed some light on that, I might be able to offer up some other suggestions.

Typetive:

It is very likely that the problem you're experiencing is due to this kind of hotlinking. It might not be a spammer in every case, but someone is hotlinking the image without permission and that page is being indexed. I would take a look at your raw access logs and see if you can find anything out of the ordinary. It shouldn't be too hard to identify. You can then use .htaccess or scripts to block the site from accessing the image, or simply send them a regular cease and desist.
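As a rough illustration of that log check, here is a minimal Python sketch. It assumes the server writes the common Apache "combined" log format; the function name `external_image_referers`, the extension list and the host names are made up for the example, so adjust them to your own setup.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumed format (Apache "combined"):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)"'
)

# Illustrative list of image extensions; extend as needed.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def external_image_referers(lines, own_hosts):
    """Count image requests per referring host that is not one of own_hosts."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        path = match.group("path").split("?", 1)[0].lower()
        if not path.endswith(IMAGE_EXTENSIONS):
            continue  # only image requests are of interest here
        host = (urlparse(match.group("referer")).hostname or "").lower()
        if host and host not in own_hosts:
            counts[host] += 1
    return counts
```

Feeding it the lines of an access log together with your own host names gives a per-site count of image requests that arrived from other people's pages. Requests with no referrer at all (logged as "-") are skipped, so crawler hits will not show up in this count.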

However, a DMCA notice might not be appropriate in this case, since a copy of the material doesn't reside on their server and their host isn't actually hosting the content. A standard abuse report should be adequate, though; most legitimate hosts do not tolerate this sort of thing.

If you need any help looking through your logs, let me know. I might have some tools that can help!

Re: A Bad Situation

This problem has happened to me even though I have had effective .htaccess hotlink protection enabled for years. The Google crawler does not check whether inline-linked images actually show up on the webpage or instead generate a 403 Forbidden error from the hotlink protection.

I've had to keep having my images removed from hotlinking sites, where the hotlinkers are free to associate any alt text keywords they please (including adult keywords) with my images, which not only allows my hotlinked images to show up for some strange or lucrative searches, but could also get my images filtered as adult content.

Here is what I've seen happen:

1. An image (original-site.com/rose.jpg) on the page (original-site.com/my-pics.html) will be indexed in Google Image Search.

2. A hotlinker will then create an inline link to this image (original-site.com/rose.jpg) on a page on his or her site (hotlinker-site.com/stolen-pics.html). Google will then reindex this same image as also belonging to the hotlinker page, and when you click on it in the image search results, it takes you to the hotlinker's page.

If the same image (same url and everything) is already indexed as belonging to a page (especially when that page domain is the same as that of the image), then Google should not reindex that image for a second time later, especially when the image domain is different from that of the hotlinker page domain.
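To make that proposed rule concrete, here is a small Python sketch of it. This is only an illustration of the suggestion above, not a description of how Google actually attributes images; the function name `pick_source_page` and the fallback to the earliest-indexed page are assumptions added for the example.

```python
from urllib.parse import urlparse


def pick_source_page(image_url, candidate_pages):
    """Pick the page an image should be credited to.

    candidate_pages is assumed to be ordered by when each page was first
    indexed. A page on the same host as the image wins; otherwise the
    earliest-indexed page is kept (an assumed tie-breaker).
    """
    image_host = urlparse(image_url).hostname
    for page in candidate_pages:
        if urlparse(page).hostname == image_host:
            return page
    return candidate_pages[0] if candidate_pages else None
```

With the example URLs from the steps above, the image at original-site.com/rose.jpg stays attributed to original-site.com/my-pics.html even after the hotlinker's page is crawled. Images kept on a separate hosting service have no same-host page, which is why some fallback rule is needed.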

Also, image hosting sites should have an "allowed image-linking sites" option for each account to prevent the unauthorized hotlinking of images (which can also use up the account's allowed monthly bandwidth).

My sites are ad-supported, and Google Image Search not only brings me lots of traffic, it is also a way to market my photos. When my hotlinked photos in Google Image Search outrank and/or replace my original photos and take image searchers to other pages instead of my own, it harms me financially. I'm very upset that I'm left to deal with this endless, growing problem all on my own without any help from Google.

Well, that's just plain stupid...

I have to say that the fact Google is indexing 403 images the same as 200 images is the dumbest thing I've heard about Google since I woke up a few hours ago. That is insanely stupid.

It would stand to reason that, if Google were doing its job, it would make sure that the site was actually displaying said image because not only is that a copyright issue, but also a user experience one. Users will not like being directed to a site where the image can't be found.

As I said above, Google's hands are pretty much tied on the copyright front by the DMCA, and it will always require a notice to deal with alleged infringements. However, you can still report the results as spam, and I would do so. Not only is it easier, but reports from the spam form often get entire sites banned from the search engine, fixing the problem for the longer term.

I can definitely see why this is an issue. I'll have to see about covering it sometime next week. For the moment though, the best thing you can do is report spam results as you see them and file DMCA notices as appropriate. This is clearly an area where Google needs to do some work, as with 302 referral spam and proxy hijacking.

It's getting a bit too easy to game Google.

wow

it's amazing how a second party can reap the rewards of the original website

403 Forbidden Images / HTTP Referrers

Image requests returning the 403 Forbidden HTTP status code will not show up in Google Images (at least not that I know of). In fact, if the server issues an HTTP 403 error code, then no image will be sent. Instead a text/html error message is sent... unless of course someone specifically modified their server to return the image anyway, which would be rather absurd (it would be the functional equivalent of entering an incorrect password but having the system grant access regardless!)

Keep in mind that most .htaccess hotlink-prevention methods rely on referrer blocking. That is, they block image requests (returning a 403) when the referring page belongs to anything other than an authorized site, i.e. your own. The crux of the issue is that Google is very unlikely to ever see such 403 errors, because its robot does not send an HTTP Referer header with its requests. So although Google will crawl an image on your site even when it was "discovered" through a hotlink on another (unauthorized) website, your server never learns where Google's robot found the reference to that image, and therefore cannot tell whether access should be denied (403) or granted (200). If Google sent a Referer value with each request, .htaccess hotlink blocking would be a complete success.
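The referrer check described above can be sketched in a few lines of Python. This is a simplified model rather than real server configuration: the host names and the function name `hotlink_status` are hypothetical, and letting requests with an empty referrer through is an assumption that matches many commonly published .htaccess rules, not necessarily yours.

```python
from urllib.parse import urlparse

# Hypothetical list of sites allowed to embed the images.
ALLOWED_HOSTS = {"original-site.com", "www.original-site.com"}


def hotlink_status(referer, allowed_hosts=ALLOWED_HOSTS):
    """Return the HTTP status a referrer-based rule would give an image request."""
    if not referer:
        # No Referer header at all: the server cannot tell where the link
        # was found, so this sketch lets the request through.
        return 200
    host = (urlparse(referer).hostname or "").lower()
    return 200 if host in allowed_hosts else 403
```

A browser loading the hotlinker's page sends that page as the referrer and gets a 403, while a request with no referrer falls into the first branch and gets the image with a 200, which is the gap described above.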

Although it is technically impossible to actually prevent this problem from occurring (at least until Google sends HTTP referrers), we have been pretty successful in minimizing the collateral damage for the majority of our clients. When the traffic is being "stolen" through Google Images, at least one HTTP request is made to the server hosting the image. That is a great time to employ some sanity checks :)

Regards...
Darrin J. Ward

Hotlinking

With your Google account, sign up for their webmaster tools, which is one of their programmes. In there you can set up site indexes for your site and so on, and you can also set whether you want Google to index the images on the site for Google Images.

Hotlinking - How to break it and still have my images indexed?

I am facing a similar problem on some of my sites. My Google image traffic is being diverted to other sites that have hotlinked my images. I would like my images to stay in the Google image index. What changes should I make in .htaccess to restore my traffic? Thanks for your help.

Why is Google not tweaking its algorithm to prevent this?

I recently noticed a drop in traffic for some of my sites, and I found that the drop was in traffic from Google image search. When I checked further, I noticed that Google image search was showing my images, but when I clicked on them, another site opened up. After doing a little research I found that it was because of hotlinking to images on my site. I wonder why Google is allowing this flaw in their algorithm!

Google image search and .htaccess - does it work?

I've been looking for an explanation of how to use .htaccess to prevent image hotlinking while still allowing Google to index the images in the image search.
When .htaccess is used to block hotlinks, it seems Googlebot, or the image bot (or whatever it's called), can still find these images and display them as thumbnails. There's no way any search engine will store the images, so why doesn't .htaccess work on this occasion?
I'd love to know the answer!

That's a big issue and Google

That's a big issue, and Google should take care of this.

I think it is an absolute joke

As a designer, it makes no sense to me at all that Google, or anyone, feels it is OK to take my work and display it any way they see fit. I create images for a website, and that is the only place those images should be shown. That is the idea. If I were creating work for the whole planet to do whatever with, I'd be a manager at Taco Bell instead and spend oodles of money on graphic design software on the side just so I could give my work away for free.

If I go to the beach, take my shoes off so I can go play in the water, it does not mean that anyone can take off with my shoes. It is stealing!!!!

I am tired of hearing lame-brained reasons why putting images up on the net automatically makes it OK to do whatever with my work. If I could only have a minute with people who think like this.... Then again, these are probably the same type of people who apply graffiti to signs, cars and buildings wherever they go, simply because they can be touched by anyone who passes.

One smeg head..

One smeg head was linking to my images, so I just changed them all to mess up his site. :)

You can tell Google via the webmaster tools not to pick up the images. I don't know if it works or not, though.
