OpenCrypt - Membership Software
Optimising your secure content for Google
Wednesday 22nd October 2008

An important consideration when securing your content is how this will affect your search engine optimisation and PageRank. After all, you work hard on great content, so you want people to be able to find it, even if they have to pay to view it. If Googlebot (the tool Google uses to crawl and index web pages) can't view your secure content, it can't index that content, and your web site may not be ranked as highly as it should be.
 
This article discusses how to enable Googlebot to access your secure content. It also covers how to enable Googlebot to log in to the 'PHP Login Interface' of your OpenCrypt membership software, which lets you control what secure content Google can see and track Googlebot's usage of your website.
 
Google and the .htaccess pop-up login prompt

If you use a .htaccess pop-up prompt to secure your content, the prompt itself cannot reliably detect that the visitor is Googlebot. However, the .htaccess file can check the visitor's IP address or hostname against a stored list and, if it matches, allow the visitor to access the secure content. The issue with doing this is that a user could falsify their own IP information in order to trick your system into thinking they are Googlebot. Google recommends using the IP address to check the authenticity of the visitor: look up the hostname for the visitor's IP address, then look up the IP address for that hostname, and compare the two IP addresses to double check that the visitor isn't providing false headers.
 
Reference:
http://www.google.com/support/webmasters/bin/answer.py?answer=80553

 
To allow Googlebot to access your .htaccess protected secure content, place the following at the end of your .htaccess file:
order deny,allow
deny from all
allow from googlebot.com google.com
satisfy any

Using this is not recommended on sites where security is of importance!
 
One issue with allowing Googlebot to access your secure content is the 'Cached' page feature. You may have noticed a small 'Cached' link next to the search results on Google; this takes you to a copy of the page stored on Google's servers. Google rarely stores many pages for a site, or all of the images, but the cached copy can be useful if a web site goes offline or is very slow. If you allow Googlebot to access your secure content, the 'Cached' link may provide a method for visitors to view your secure content without even visiting your web site!
 
To stop Google from caching your pages, place the following HTML tag in your page header between your <head> and </head> tags:
<meta name="googlebot" content="noarchive">

This tag only removes the 'Cached' link from the search results. Google will continue to index the page and display a snippet.
 
Reference:
http://www.google.com/support/webmasters/bin/answer.py?answer=35306

 
Optimising OpenCrypt's 'PHP Login Interface' for Google

If you use OpenCrypt's 'PHP Login Interface' to secure your content, we can optimise your secure content for Googlebot. We are focusing on OpenCrypt's login interface, but anyone who has a custom PHP login system should be able to adjust this code to suit their needs.
 
OpenCrypt users, simply place the following code after your require "login.php"; statement (this is usually in your /oc/header.php file):
if (($login_successful!="1") && ($dbusername=="")) {
  if (stristr($envbrow,"googlebot")) {
    if ($envip!="") {
      $envaddr = gethostbyaddr($envip);
      if (stristr($envaddr,"googlebot.com")) {
        $envaddrip = gethostbyname($envaddr);
        if ($envip==$envaddrip) {
          $login_successful = "1";
          $result = "4";
          $dbusername = "googlebot";
          $input_username = $dbusername;
          $header_html = "<meta name=\"googlebot\" content=\"noarchive\">";
        }
      }
    }
  }
}

Note that you will need to include the $header_html variable in your HTML headers to output the 'noarchive' tag, so that Google doesn't offer a cached version of the page. To repeat the warning above so you don't miss it: if you allow Googlebot to access your secure content, the 'Cached' link may provide a method for visitors to view that content without even visiting your web site!
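
As a minimal sketch of the note above, the head section of a page template might output $header_html as shown here. The render_head() function and the page title are hypothetical and used only for illustration; $header_html is the variable set by the login code, and it is assumed to be empty for ordinary visitors.

```php
<?php
// Hypothetical helper: builds the <head> section of a page.
// $header_html is assumed to hold the 'noarchive' meta tag when the
// visitor was verified as Googlebot, and to be empty otherwise.
function render_head($title, $header_html = "") {
  $html  = "<head>\n";
  $html .= "<title>" . htmlspecialchars($title) . "</title>\n";
  if ($header_html != "") {
    // Only add the extra tag when the login code has set one.
    $html .= $header_html . "\n";
  }
  $html .= "</head>\n";
  return $html;
}

// Example usage inside a page template:
// print render_head("Members Area", $header_html);
```

If your templates are plain HTML files with embedded PHP, printing $header_html directly between the <head> and </head> tags should have the same effect.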
 
Here is the same code with line by line explanations:
 
if (($login_successful!="1") && ($dbusername=="")) {
Avoid unnecessary checks for logged in users.
 
  if (stristr($envbrow,"googlebot")) {
Check the user agent includes the text 'googlebot'.
 
    if ($envip!="") {
Check IP address is present.
 
      $envaddr = gethostbyaddr($envip);
Get hostname for IP address.
 
      if (stristr($envaddr,"googlebot.com")) {
Check the hostname includes the text 'googlebot.com'.
 
        $envaddrip = gethostbyname($envaddr);
Use the hostname to detect the IP address.
 
        if ($envip==$envaddrip) {
Check the user's IP matches the googlebot.com server IP to verify authenticity.
 
References:
http://googlewebmastercentral.blogspot.com/2006/09/how-to-verify-googlebot.html
http://www.google.com/support/webmasters/bin/answer.py?answer=80553

 
          $login_successful = "1";
          $result = "4";
Success, let Googlebot access your secure content.
 
          $dbusername = "googlebot";
          $input_username = $dbusername;
Set a username for Googlebot to track usage and control what content can be viewed.
 
          $header_html = "<meta name=\"googlebot\" content=\"noarchive\">";
Prevent Google from caching your page so users can't view the secure content via the 'Cached' link on Google. This is very important! This tag only removes the 'Cached' link from the search results; Google will continue to index the page and display a snippet.
 
Reference:
http://www.google.com/support/webmasters/bin/answer.py?answer=35306

 
        }
      }
    }
  }
}
Close the if statements.
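
One thing to be aware of is that a substring test such as stristr($envaddr,"googlebot.com") would also match a hostname like googlebot.com.example.net. The following is a sketch of a stricter variant of the same reverse-then-forward lookup, assuming you only want to accept hostnames that end in .googlebot.com. The function names is_googlebot_hostname() and verify_googlebot() are hypothetical and not part of OpenCrypt, and the two lookup parameters exist only so the logic can be tried without real DNS queries.

```php
<?php
// Returns true when $host ends in ".googlebot.com" (case-insensitive).
// A plain substring test would also accept "googlebot.com.example.net".
function is_googlebot_hostname($host) {
  return (bool) preg_match('/\.googlebot\.com\z/i', $host);
}

// Reverse lookup, hostname check, then forward lookup and IP comparison.
// $reverse and $forward default to PHP's gethostbyaddr() and
// gethostbyname(); passing other callbacks is an assumption of this
// sketch, made so the check can be tested offline.
function verify_googlebot($ip, $reverse = 'gethostbyaddr', $forward = 'gethostbyname') {
  if ($ip == "") {
    return false;
  }
  // gethostbyaddr() returns the IP unchanged if the lookup fails,
  // which will not pass the hostname check below.
  $host = call_user_func($reverse, $ip);
  if (!$host || !is_googlebot_hostname($host)) {
    return false;
  }
  // The hostname must resolve back to the visitor's IP address.
  return call_user_func($forward, $host) === $ip;
}

// Example usage with the variables from the code above:
// if (stristr($envbrow, "googlebot") && verify_googlebot($envip)) { ... }
```

Both lookups involve DNS queries, so it is worth keeping the user agent test first, as in the original code, so ordinary visitors do not trigger them.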
 
Tracking Googlebot's Usage of Your Website

Once you've set up the above code to work with your 'PHP Login Interface', simply create an account in your OpenCrypt system with the username 'googlebot'. When Googlebot visits your web site you will be able to see what pages Googlebot has accessed via the 'Statistics' system. You can of course set up a subscription specifically for the 'googlebot' account to control what content Google can see, or you can use if statements to detect the 'googlebot' username and display different content based on that.
 
For example:
if ($dbusername=="googlebot") {
  print "Some content just for Google";
}

Of course, be very careful when displaying content just to Google, because your ranking can be penalised. This could happen, for example, if you were to detect Googlebot and display blocks of keywords which weren't visible to general website users.
 
Suggestions for Limiting Content Displayed to Google

If you are concerned about Google viewing all of your secure content, you could consider limiting what content is displayed. For example, the OpenCrypt version 1.7 Article Manager add-on provides a method for securing articles for specific subscription groups. This facility can be customised to display article snippets to non-registered users, and could easily be extended to display, for example, the first 500 words of an article to Googlebot to enhance your rankings. Another suggestion would be to display five out of every ten words. This would give Googlebot half the text of the article to rank, but it wouldn't make much sense to a human visitor. Of course, this may cause Google to penalise you.
 
Reference:
http://www.google.com/support/webmasters/bin/answer.py?answer=66355
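
The 500-word snippet idea above could be sketched as follows. This is an illustration only and is not the Article Manager's own code: the first_words() function and the $article_text variable are hypothetical, and the sketch assumes the article is plain text, since cutting HTML by word count could leave tags unclosed.

```php
<?php
// Hypothetical helper: returns the first $limit words of $text.
// Runs of whitespace are collapsed to single spaces, and "..." is
// appended only when the text was actually shortened.
function first_words($text, $limit = 500) {
  $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
  if (count($words) <= $limit) {
    return implode(" ", $words);
  }
  return implode(" ", array_slice($words, 0, $limit)) . "...";
}

// Example usage, assuming $article_text holds the full article and
// $dbusername is set by the login code:
// if ($dbusername == "googlebot") {
//   print first_words($article_text, 500);
// }
```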

 
Visitor Comments

Kerstin-Edelgard 10 out of 10
I cannot believe this will work!

Jessicaved 10 out of 10
Wow! Thank you very much! I always wanted to write in my site something like that

OpenCrypt Team 10 out of 10
Update: OpenCrypt version 1.7 now includes the 'PHP Login Interface' search engine optimisation for Googlebot (Mobile, Image, Ads and Mediapartners), Yahoo Slurp!, SearchMe (Charlotte), MSN (msnbot) and Live.com (livebot).

Average Rating: 9.51 out of 10 (53 Votes)

Copyright © 1999 - 2010 ionix Limited. Affiliates, Contact Us
