Friday 5 July 2013

SEO Question - A Little About Robots.txt

Robots.txt - Robots.txt is a text (not HTML) file you put on your site to let search robots know which pages you would like them not to visit. Robots.txt is by no means mandatory for search engines, but in general search engines obey what they are asked not to do. It is important to clarify that robots.txt is not a way of preventing search engines from crawling your site (i.e. it is not a firewall or a kind of password protection); putting up a robots.txt file is something like putting a note "Please, do not enter" on an unlocked door - you cannot prevent thieves from coming in, but the good guys will not open the door and enter. That's why we say that if you have really sensitive data, it is too naïve to rely on robots.txt to protect it from being indexed and displayed in search results.

Put simply, one of the best ways to let search engines know which files and folders on your Web site to avoid is the Robots meta tag. But since not all search engines read meta tags, the Robots meta tag can simply go unnoticed. A better way to inform search engines of your wishes is to use a robots.txt file.
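
For illustration, here is what the two approaches look like (/private/ is just a placeholder path). The Robots meta tag goes in the <head> of an individual page:

<meta name="robots" content="noindex, nofollow">

while a robots.txt record sits in a plain text file at the root of the site and can cover whole directories:

User-agent: *
Disallow: /private/

The meta tag asks compliant robots not to index that one page or follow its links; the robots.txt record asks all robots to stay out of everything under /private/.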

Meta Robots Tags, About Robots.txt and Search Indexing Robots


Entry:
User-agent: *
Disallow:
Meaning: Because nothing is disallowed, everything is allowed for every robot.

Entry:
User-agent: mybot
Disallow: /
Meaning: The mybot robot may not index anything, because the root path (/) is disallowed.

Entry:
User-agent: *
Allow: /
Meaning: For all user agents, everything is allowed.

Entry:
User-agent: BadBot
Allow: /About/robot-policy.html
Disallow: /
Meaning: The BadBot robot can see the robot policy document, but nothing else. All other user agents are by default allowed to see everything. This only protects a site if "BadBot" follows the directives in robots.txt.

Entry:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private
Meaning: In this example, all robots can visit the whole site, with the exception of the two directories mentioned and any path that starts with "private" at the host root directory, including items in /privatedir/mystuff and the file /privateer.html.

Entry:
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /*/private/*
Meaning: The blank line indicates a new "record" - a new user agent command. BadBot may not crawl anything; all other robots can see everything except any subdirectory named "private" (using the wildcard character).

Entry:
User-agent: WeirdBot
Disallow: /links/listing.html
Disallow: /tmp/
Disallow: /private/

User-agent: *
Allow: /
Disallow: /temp*
Allow: *temperature*
Disallow: /private/
Meaning: This keeps WeirdBot from visiting the listing page in the links directory, the tmp directory and the private directory. All other robots can see everything except temp directories or files, but should crawl files and directories named "temperature", and should not crawl private directories. Note that the robots will use the longest matching string, so "temps" and "temporary" will match the Disallow, while "temperatures" will match the Allow.
Bad Examples - Common Wrong Entries
Use one of the robots.txt checkers to see if your file is malformed.
Entry:
User-agent: googlebot
Disallow /
Meaning: NO! This entry is missing the colon after Disallow.

Entry:
User-agent: sidewiner
Disallow: /tmp/
Meaning: NO! Robots will ignore misspelled User Agent names (it should be "sidewinder"). Check your server logs for User Agent names and consult published listings of User Agent names.

Entry:
User-agent: MSNbot
Disallow: /PRIVATE
Meaning: WARNING! Many robots and web servers are case-sensitive, so this path will not match root-level folders named "private" or "Private".

Entry:
User-agent: *
Disallow: /tmp/

User-agent: Weirdbot
Disallow: /links/listing.html
Disallow: /tmp/
Meaning: Robots generally read from top to bottom and stop when they reach something that applies to them, so Weirdbot would probably stop at the first record, *. If there's a specific User Agent record, robots don't check the * (all user agents) block, so any general directives should be repeated in the specific blocks.
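
To see how that advice plays out, here is one way the last example could be rewritten, with the specific record first and the general directive repeated inside it:

User-agent: Weirdbot
Disallow: /links/listing.html
Disallow: /tmp/

User-agent: *
Disallow: /tmp/

Now a robot reading top to bottom finds the Weirdbot record before the catch-all, and /tmp/ stays disallowed for everyone.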

A Process To Create A Google Sitemap.xml File

Here I am going to tell you how to create a Google Sitemap.xml file.


1. Sign in to Google Webmaster Tools with your Google account at www.google.com/webmasters/sitemaps. If you don't have a Google account, you will need to create one.

2. Add your site to your webmaster tools account.

3. Verify your site, either by creating a blank HTML file with a certain name (generated by Google), or by inserting a specific Meta tag in your site's index page.

4. Go to the Sitemap Generator website at http://www.xml-sitemaps.com/ and auto-generate a sitemap.

5. Save the sitemap file to your desktop, then upload it to the root level of your website (a minimal example of what the finished file looks like appears after these steps).

6. Go back to Google's webmaster tools, and click on the Sitemaps tab.

7. Click "Add a Sitemap".

8. Select General Web Sitemap from the drop-down list.

9. Enter the URL of your sitemap, e.g. sitemap.xml, and click Add Web Sitemap.

10. That's it! It can take up to a couple of days for Google to download your new sitemap.
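
For reference, here is roughly what a minimal sitemap.xml looks like under the sitemaps.org protocol. The example.com URLs and the date are placeholders; only the <loc> tag is required for each URL, while <lastmod>, <changefreq> and <priority> are optional hints:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2013-07-05</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://www.example.com/about.html</loc>
  </url>
</urlset>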