Surfing Disguised as Googlebot

Surely one or another of you has noticed while "googling", especially in the "Images" search mode, that the search index oddly contains pictures and text from access-protected pages. Such results are recognizable by the fact that you get no further on the target page without first logging in or having a user account.

But how did Googlebot get at the content in the first place? After all, it has it in the search index.

The answer lies in the HTTP headers, such as the browser identification (the User-Agent header), which identify the visitor as Googlebot and distinguish it from normal web surfers. Some site admins have therefore cobbled together a browser switch that lets certain bots through but requires normal web surfers to log in.
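A minimal sketch of what such a browser switch might look like on the server side, assuming it checks nothing but the User-Agent header. The names `BOT_MARKERS`, `is_search_bot`, and `browser_switch` are hypothetical and not taken from any particular site; real sites may check more than this.

```python
# Hypothetical list of substrings that mark a visitor as an admitted bot.
# The article does not say which bots a given site lets through.
BOT_MARKERS = ("googlebot",)


def is_search_bot(user_agent: str) -> bool:
    """Return True if the User-Agent string claims to be a listed bot."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def browser_switch(headers: dict, logged_in: bool = False) -> str:
    """Decide what to serve, based only on login state and User-Agent."""
    user_agent = headers.get("User-Agent", "")
    if logged_in or is_search_bot(user_agent):
        return "content"
    return "login required"


# A normal browser is sent to the login page ...
browser_result = browser_switch({"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0"})
# ... while anything that merely claims to be Googlebot is let through.
bot_result = browser_switch(
    {"User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"}
)
```

Because the header is supplied by the client, a switch like this trusts whatever the visitor says about itself, which is exactly the weakness the rest of the article relies on.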

This makes a site's content available to Googlebot without releasing it to the public at the same time. The background for such behavior is, of course, that some sites owe their popularity to search engines, or rather to being findable there, but generate their revenue through subscriptions, pay-per-view, or similar models, which necessarily require a user login.

But even for such browser switches there are elegant ways to bypass them as a normal user. Of course you can manipulate the browser's HTTP headers directly, which is fairly easy with Firefox extensions. That does, however, assume some knowledge and a few manual steps.
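To illustrate what "manipulating the HTTP header" amounts to, here is a small sketch using Python's standard library instead of a browser extension. It only builds the request and does not send it. The helper name `build_request` and the URL `http://example.com/` are placeholders chosen for illustration; the User-Agent string is the one Googlebot is commonly documented as sending.

```python
import urllib.request

# User-Agent string commonly documented for Googlebot.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_request(url: str, user_agent: str = GOOGLEBOT_UA) -> urllib.request.Request:
    """Build (but do not send) a request carrying a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Placeholder URL, not a site mentioned in the article.
request = build_request("http://example.com/")

# urllib stores header names capitalized as "User-agent".
sent_user_agent = request.get_header("User-agent")
```

Passing the request to `urllib.request.urlopen` would send it with this header. Whether a given site then lets the request through depends entirely on how its switch is built.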

A better route is therefore a web proxy such as Be-The-Bot: you do not have to change anything in the browser, and access to the sites in question is "more or less" anonymous through the proxy.

Source

http://www.avivadirectory.com/bethebot/


2 comments.

  1. As interesting as the theory is: it would take a fairly naive site admin to grant Google access to actually protected content in this way. After all, for protected content Google offers the possibility of depositing the corresponding access credentials, so Google can log in itself during its visit and crawl the content.

    Nevertheless, an interesting article, and perhaps one or another site admin really does use nothing more than an "HTTP header" switch of this kind. But I would recommend that no one imitate this. ;-)

  2. Now that's a well-written article, thank you. I will have to digest it first. In general I find the blog quick and easy to get into.
