Surfing Disguised as Googlebot

Some of you will surely have noticed when "Googling", especially in the "Images" search mode, that the search index oddly contains pictures and text from access-protected pages. Such results are notable in that you cannot get any further on the landing page without first logging in or having a user account.

So how did Googlebot get to the content that it ultimately placed in the search index?

The answer lies in the HTTP headers: fields such as the browser identifier (the User-Agent) mark Googlebot as a bot and distinguish it from normal web surfers. Some site admins have therefore built a browser switch that lets certain bots through but requires normal web surfers to log in.
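The kind of browser switch described above can be sketched roughly as follows. This is only an illustration under assumptions: the names `BOT_MARKERS`, `looks_like_whitelisted_bot` and `handle_request` are hypothetical, and the source does not say how any particular site implements its check.

```python
# Hypothetical sketch of a server-side "browser switch".
# Assumption: the site decides purely on the User-Agent request header.

# Name fragments of crawlers that are let through (chosen for illustration).
BOT_MARKERS = ("googlebot",)


def looks_like_whitelisted_bot(headers):
    """Return True if the User-Agent header contains a whitelisted bot name."""
    user_agent = headers.get("User-Agent", "").lower()
    return any(marker in user_agent for marker in BOT_MARKERS)


def handle_request(headers, logged_in=False):
    """Serve the content to logged-in users and whitelisted bots only."""
    if logged_in or looks_like_whitelisted_bot(headers):
        return "content"
    return "login required"


if __name__ == "__main__":
    bot = {"User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"}
    surfer = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"}
    print(handle_request(bot))
    print(handle_request(surfer))
```

Because the decision rests on a header that the client chooses freely, any client that sends the same string is treated like the bot, which is the weakness the rest of the article relies on.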

This makes a site's content available to Googlebot without releasing it to the general public at the same time. The background for such behavior is, of course, that some sites owe their popularity to search engines, or rather to being findable through them, but derive their revenue from subscriptions, pay-per-view or similar models, which necessarily require a user login.

But even for these browser switches there are elegant ways for a normal user to get around them. Of course you can manipulate the browser's HTTP headers directly, which is even fairly easy with Firefox extensions. That does, however, require some knowledge and a few manual steps.
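Outside the browser, setting the header directly might look like the minimal sketch below, which uses Python's standard `urllib.request` module. The helper name `build_request` and the URL `http://example.com/` are placeholders chosen for illustration, and the User-Agent string is the one Googlebot is commonly documented as sending; none of this comes from the original article.

```python
import urllib.request

# User-Agent string commonly documented for Googlebot (assumed here for illustration).
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_request(url, user_agent=GOOGLEBOT_UA):
    """Build (but do not send) a request that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    request = build_request("http://example.com/")  # placeholder URL
    # urllib stores header names capitalized as "User-agent".
    print(request.get_header("User-agent"))
    # Sending it would be: urllib.request.urlopen(request)
```

The sketch only constructs the request. A site that checks more than the User-Agent header, for example the origin of the request, would not be affected by this change alone.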

An easier way is to go through a web proxy such as Be-The-Bot: you do not have to change anything in the browser, and access to the sites in question is "more or less" anonymous because it runs through the proxy.

Source

http://www.avivadirectory.com/bethebot/


2 comments.

  1. As interesting as the theory is: it would have to be a fairly inexperienced site admin who grants Google access to genuinely protected content in this way. Google actually offers a way to define suitable access for protected content, so that Google can log in itself when it visits and crawl the content.

    Nevertheless, an interesting article, and perhaps one or another site admin really does use nothing more than this kind of "HTTP header" switch. But I would not recommend this approach to anyone. 😉

  2. Now that's a well-written article, thanks. It takes a moment to digest. In general I find the blog easy to follow.
