This tutorial will cover how to create a robots.txt file for your WordPress blog, website, Drupal site, or static HTML site.
What is a robots.txt file? A robots.txt file is a text file you can place on your website to instruct robots (also called spiders or crawlers) where to crawl, or more importantly, where not to crawl. This is important because unless you want all of the pages & files on your site to show up in search engine results, you will want to learn how to create a robots.txt file.
Also, some SEO gurus have argued that having a robots.txt file can attract spiders & increase your search engine positioning.
So how does a robots.txt file work, and how do you create one on your website? Well, robots.txt is a simple plain-text file (often created using Notepad) that includes a set of instructions for the search engines.
How to Create a Robots.txt File
These instructions can tell the spiders which pages they are allowed to crawl for indexing, and which ones they should not crawl. You can also give specific instructions for specific search engines, and you can include different commands.
Below is an example of common instructions used:
To allow ALL search engine spiders to crawl your site & to allow ALL files to be indexed, use this command:
User-agent: *
Disallow:
The command above means all spiders can crawl your site & include all of your files in their index. The * means “attention ALL spiders” & leaving the Disallow field blank tells them they may crawl all files.
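If you want to double-check how crawlers interpret these two lines, Python's standard urllib.robotparser module applies the same rules. Here is a small sketch (example.com is just a placeholder domain):

```python
from urllib.robotparser import RobotFileParser

# The allow-all rules from above, fed to the parser line by line
rules = ["User-agent: *", "Disallow:"]

rp = RobotFileParser()
rp.parse(rules)

# With an empty Disallow field, every path is crawlable for every agent
print(rp.can_fetch("*", "http://example.com/index.html"))        # True
print(rp.can_fetch("Googlebot", "http://example.com/private/"))  # True
```

Both checks come back True, confirming that the empty Disallow field places no restrictions on any spider.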
To instruct ALL search engine spiders to stay away from certain folders (for instance, your /images/ folder), you would simply use the following command:
User-agent: *
Disallow: /images/
This instructs ALL robots that visit your site not to crawl or index any of the files in your /images/ folder.
If you would like to include more folders, just keep adding more Disallow commands:
User-agent: *
Disallow: /images/
Disallow: /PDF/
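As a quick sanity check, the same standard-library parser shows the effect of stacking multiple Disallow lines (again, example.com and the file names are just placeholders):

```python
from urllib.robotparser import RobotFileParser

# The two-folder rules from above
rules = [
    "User-agent: *",
    "Disallow: /images/",
    "Disallow: /PDF/",
]

rp = RobotFileParser()
rp.parse(rules)

# Paths under the disallowed folders are blocked; everything else is allowed
print(rp.can_fetch("Googlebot", "http://example.com/images/logo.png"))  # False
print(rp.can_fetch("Googlebot", "http://example.com/PDF/guide.pdf"))    # False
print(rp.can_fetch("Googlebot", "http://example.com/about.html"))       # True
```

Each Disallow line only blocks paths that start with the folder it names, so the rest of the site stays open to the spiders.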
Create a Robots.txt File Using Notepad
To create & upload a robots.txt file to your site, open Windows Notepad. Then type the instructions you want to give the robots into Notepad (use an example above).
Then save the document as “robots” and make sure it has the .txt file extension. Now, go to your website & import (or “upload”) the file to your site’s root folder. Then publish your website. After you publish your website, your robots.txt file should show up. To check, just type in www.YOURSITE.com/robots.txt and see if it shows up. If it does, then it is working & it will keep well-behaved robots away from the files/folders that you do not want crawled.
This is a great way to keep secret files, e-books, personal documents, PDF files, etc. from being indexed and placed on search engines.
Creating a Robots Meta Tag
Robots.txt files are a great way to prevent search engines from crawling entire files or folders, but what about keeping them out of individual web pages?
The solution is to use a special HTML Meta tag that will keep your webpage from showing up on search engines (such as Google, Yahoo, etc.).
To use an HTML Meta Tag to prevent your page from being indexed, simply type the tag below into your HTML code between the <head> tags.
<meta name="robots" content="noindex,nofollow" />
This will allow you to keep individual pages from being included in search engine directories.
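For context, here is where the tag sits in a minimal page (the title & body text are just placeholders):

```html
<!DOCTYPE html>
<html>
<head>
  <title>Private Page</title>
  <!-- Tells compliant spiders: do not index this page, do not follow its links -->
  <meta name="robots" content="noindex,nofollow" />
</head>
<body>
  <p>This page should not appear in search results.</p>
</body>
</html>
```

You can also use "noindex" or "nofollow" on their own if you only want one of the two behaviors.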