The Robots Exclusion Protocol is a mechanism for limiting the indexing of parts of a website that an administrator or content provider does not want robots to visit. It relies on cooperation from the robot and is not guaranteed to work with every robot. Simply stated, it lets website administrators indicate which parts of their site robots should not visit.
Content providers should understand what they need from the Robots Exclusion Protocol. Google keeps working to give content providers more control over the indexing process. Two new features added to the Robots Exclusion Protocol make it more flexible and give you finer control over how Google treats your content.
Firstly, if you know in advance that a page is going to expire, or you have a temporary page that will be removed at the end of the month, the best approach is to let the page appear in Google search results until it expires and then have it removed. The same applies to pages that are available free for a week and are then moved into an archive that users pay to access. Otherwise, regular users of your website will be frustrated when they follow a Google search result to a page they can no longer access.
To make your job easier, Google has introduced a META tag that lets website administrators or content providers state when a page expires and should be removed from the main Google web search results. The tag is named "unavailable_after" and follows a syntax similar to the other Robots Exclusion META tags. For example, the following tag is a removal request:
<META NAME="GOOGLEBOT" CONTENT="unavailable_after: 20-Aug-2007 10:00:00 EST">
According to the tag above, the page will be removed from Google search results at the given date and time (Eastern Standard Time). The date and time are specified in the RFC 850 format. The page stops appearing in Google search results but continues to exist in the system. To remove the page from Google completely, use the existing URL removal tool.
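If pages expire on a schedule, the tag can be generated rather than typed by hand. The following is a minimal Python sketch, not an official tool: the function name unavailable_after_meta and the tz_label parameter are hypothetical, and the date layout simply mirrors the example tag above.

```python
from datetime import datetime

# English month abbreviations are spelled out so the output does not
# depend on the locale of the machine that runs the script.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def unavailable_after_meta(expires: datetime, tz_label: str = "EST") -> str:
    """Build an unavailable_after META tag (hypothetical helper).

    Assumes `expires` is already expressed in the time zone named by
    `tz_label`; no time zone conversion is performed here.
    """
    stamp = (
        f"{expires.day:02d}-{MONTHS[expires.month - 1]}-{expires.year} "
        f"{expires:%H:%M:%S} {tz_label}"
    )
    return f'<META NAME="GOOGLEBOT" CONTENT="unavailable_after: {stamp}">'


if __name__ == "__main__":
    print(unavailable_after_meta(datetime(2007, 8, 20, 10, 0, 0)))
```

Called with 20 August 2007 at 10:00:00, the sketch reproduces the example tag shown above. The time zone label is passed through as plain text, so it is up to the caller to supply a time that matches it.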
Secondly, Robots Exclusion Protocol META tags work only for HTML pages. To control access to other types of documents, such as Adobe PDF files or video and audio files, you can add any supported META tag directive to the new X-Robots-Tag directive in the HTTP header used to serve the file.
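How the header gets added depends on your web server. As one illustration, here is a rough Python sketch built on the standard library's http.server module. The extension-to-directive table, the x_robots_value helper, and the RobotsTagHandler class are all hypothetical names chosen for this example, and a production site would more likely set the header in its server configuration.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler
from typing import Optional

# Hypothetical policy: which directive to send for which file type.
NON_HTML_DIRECTIVES = {
    ".pdf": "noindex, nofollow",
    ".mp3": "unavailable_after: 20-Aug-2007 10:00:00 EST",
}


def x_robots_value(path: str) -> Optional[str]:
    """Return the X-Robots-Tag value for a URL path, or None if no rule matches."""
    lowered = path.lower()
    for extension, value in NON_HTML_DIRECTIVES.items():
        if lowered.endswith(extension):
            return value
    return None


class RobotsTagHandler(SimpleHTTPRequestHandler):
    """Serves files from the current directory, adding X-Robots-Tag where a rule applies."""

    def end_headers(self) -> None:
        # Ignore any query string before matching on the file extension.
        value = x_robots_value(self.path.split("?", 1)[0])
        if value is not None:
            self.send_header("X-Robots-Tag", value)
        super().end_headers()


if __name__ == "__main__":
    # Port 8000 is an arbitrary choice for local experimentation.
    HTTPServer(("localhost", 8000), RobotsTagHandler).serve_forever()
```

With this sketch running, a request for a PDF file would carry the header "X-Robots-Tag: noindex, nofollow" in the response, while HTML pages are served without it and can keep using META tags.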
Use these two new features for more flexible control over indexing and inclusion in Google's search results!