Keeping a Page out of Google’s Index
Contents
- 1 Keeping a Page out of Google's Index (Avoid Indexation)
- 1.1 Password-Protecting your server directories [Best Option for Secure Information]
- 1.2 robots.txt Disallow Directive [Not Recommended]
- 1.3 Indexing Directive Meta Tags [Best Option for Non-Secure Content]
- 2 Removing a Page from the Index (Deindexing a page)
- 2.1 Additional Advice
Keeping a Page out of Google’s Index (Avoid Indexation)
There are many reasons you might want to keep a page out of Google's index. Below are some of the best ways to do it:
Password-Protecting your server directories [Best Option for Secure Information]
If you don't want unauthorised users to see your content, or search engines to crawl it, the best approach is to keep it in password-protected directories. Web crawlers typically can't access content hidden behind password protection.
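As a minimal sketch, basic HTTP authentication on an Apache server can be set up with an `.htaccess` file. The realm name and file paths below are placeholders, not values from this article:

```
# .htaccess in the directory to protect (assumes Apache with mod_authn_file)
AuthType Basic
AuthName "Private Area"
# Password file created beforehand with: htpasswd -c /etc/apache2/.htpasswd username
AuthUserFile /etc/apache2/.htpasswd
Require valid-user
```

A crawler requesting anything in this directory gets a 401 Unauthorized response instead of the content, so there is nothing for it to index.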
robots.txt Disallow Directive [Not Recommended]
Using the "Disallow" directive in robots.txt would normally stop Google from crawling the resource you want to keep out of the index. However, a word of warning here: if that URL is linked to from anywhere that Google CAN crawl, Google WILL still index the URL.
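For reference, a Disallow rule looks like this (the `/private/` path is just an example, not a path from this article):

```
# robots.txt at the site root
User-agent: *        # applies to all crawlers
Disallow: /private/  # asks crawlers not to fetch anything under /private/
```

Note that this only stops crawling: if another crawlable page links to something under `/private/`, the bare URL can still show up in the index.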
Indexing Directive Meta Tags [Best Option for Non-Secure Content]
The best way to keep search engines out of content without password protection is to add the "noindex" meta tag to a page's code. This keeps the page completely out of the index, regardless of whether anything links to it.
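The tag goes in the page's `<head>`:

```
<head>
  <!-- Tells all crawlers not to index this page -->
  <meta name="robots" content="noindex">
  <!-- Or target Google's crawler specifically: -->
  <!-- <meta name="googlebot" content="noindex"> -->
</head>
```

Because Google has to crawl the page to see this tag, the page must NOT be blocked in robots.txt at the same time, which leads to the warning below.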
Warning: If you combine a robots.txt Disallow with the "noindex" directive, Google may still index the URL, because the two directives conflict. With robots.txt you are basically telling Google:

"Don't crawl this page or directory"

So Google never fetches the page, never sees the "noindex" tag, and will index the bare URL if a link to it exists from a source Google can crawl. Here is a practical example:
[Screenshot: Matt Cutts's robots.txt file]

That's Matt Cutts's robots.txt file. Now see the search result:

[Screenshot: the blocked test URL appearing in Google's index]

That is a test page Matt set up a while ago. His page-level "noindex" rule doesn't work because of the robots.txt Disallow, and so the URL is indexed.
Removing a Page from the Index (Deindexing a page)
The best way to remove a page or directory from Google's index is via Google Webmaster Tools:

- Log in to Webmaster Tools
- Select the site profile you want content removed from
- In the left-hand menu, click "Google Index"
- Click "Remove URLs"
- Click "New Removal Request"
- Enter the URL or directory you want removed
- Click "Yes, remove this page"
- Click "Submit Request"
Additional Advice
Your removal request is only valid for 90 days, so you should also use one of the methods suggested above to keep the content out of the index permanently. For virtual folders and non-HTML files, you may want to try the X-Robots-Tag method.
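The X-Robots-Tag is an HTTP response header that carries the same directives as the robots meta tag, which makes it usable for files that have no `<head>`, such as PDFs. A sketch for Apache, assuming mod_headers is enabled (the file pattern is just an example):

```
# Send a noindex header with every PDF response
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>
```

As with the meta tag, Google must be able to crawl the file to see the header, so don't block it in robots.txt.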

