Friday, 5 July 2013

SEO Question - A Little About Robots.txt

Robots.txt - Robots.txt is a text (not HTML) file you put on your site to tell search robots which pages you would like them not to visit. Robots.txt is by no means mandatory for search engines, but in general search engines obey what they are asked not to do. It is important to clarify that robots.txt is not a way of preventing search engines from crawling your site (i.e. it is not a firewall or a kind of password protection). Putting up a robots.txt file is something like putting a note “Please, do not enter” on an unlocked door: you cannot prevent thieves from coming in, but the good guys will not open the door and enter. That's why we say that if you have really sensitive data, it is naïve to rely on robots.txt to protect it from being indexed and displayed in search results.

Put simply, one way to let search engines know which files and folders on your Web site to avoid is the Robots meta tag. But since not all search engines read meta tags, the Robots meta tag can simply go unnoticed. A better way to inform search engines of your wishes is to use a robots.txt file.
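To make the "good guys" side of this concrete, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using Python's standard `urllib.robotparser` module. The bot name `PoliteBot`, the host `example.com`, the helper `may_fetch`, and the rules themselves are assumptions made up for illustration. Nothing forces a crawler to run a check like this, which is exactly why robots.txt is not a security measure.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it
# from the site's /robots.txt URL instead of hard-coding it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if the rules above allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/index.html",
        "https://example.com/private/report.html",
    ):
        verdict = "fetch" if may_fetch("PoliteBot", url) else "skip"
        print(f"{url}: {verdict}")
```

A polite crawler would skip the page under `/private/` and fetch the other one; a rude crawler simply never calls anything like `may_fetch`.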

Meta Robots Tags, About Robots.txt and Search Indexing Robots


Each entry below is followed by its meaning.

Entry:
    User-agent: *
    Disallow:
Meaning: Because nothing is disallowed, everything is allowed for every robot.

Entry:
    User-agent: mybot
    Disallow: /
Meaning: The mybot robot may not index anything, because the root path (/) is disallowed.

Entry:
    User-agent: *
    Allow: /
Meaning: For all user agents, allow.

Entry:
    User-agent: BadBot
    Allow: /About/robot-policy.html
    Disallow: /
Meaning: The BadBot robot can see the robot policy document, but nothing else. All other user-agents are by default allowed to see everything. This only protects a site if "BadBot" follows the directives in robots.txt.

Entry:
    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /tmp/
    Disallow: /private
Meaning: In this example, all robots can visit the whole site, with the exception of the two directories mentioned and any path that starts with private at the host root directory, including items in privatedir/mystuff and the file privateer.html.

Entry:
    User-agent: BadBot
    Disallow: /

    User-agent: *
    Disallow: /*/private/*
Meaning: The blank line indicates a new "record" - a new user agent command. All other robots can see everything except any subdirectory named "private" (using the wildcard character).

Entry:
    User-agent: WeirdBot
    Disallow: /links/listing.html
    Disallow: /tmp/
    Disallow: /private/

    User-agent: *
    Allow: /
    Disallow: /temp*
    Allow: *temperature*
    Disallow: /private/
Meaning: This keeps the WeirdBot from visiting the listing page in the links directory, the tmp directory and the private directory. All other robots can see everything except the temp directories or files, but should crawl files and directories named "temperature", and should not crawl private directories. Note that the robots will use the longest matching string, so temps and temporary will match the Disallow, while temperatures will match the Allow.
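The "longest matching string" rule in the last example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it treats `*` as "any run of characters" and a trailing `$` as "end of path", measures specificity by the length of the pattern, and lets Allow win a tie. The helper names `pattern_to_regex` and `is_allowed` are made up here, and real robots may resolve conflicts differently. Python's built-in `urllib.robotparser` is not used because it matches plain path prefixes rather than wildcards.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Apply the longest matching rule; on a tie, Allow wins."""
    best_len, allowed = -1, True  # no matching rule means allowed
    for directive, pattern in rules:
        if not pattern:  # an empty "Disallow:" blocks nothing
            continue
        if pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            longer = len(pattern) > best_len
            tie_for_allow = len(pattern) == best_len and is_allow
            if longer or tie_for_allow:
                best_len, allowed = len(pattern), is_allow
    return allowed


# Rules from the "User-agent: *" record of the last example.
RULES = [
    ("Allow", "/"),
    ("Disallow", "/temp*"),
    ("Allow", "*temperature*"),
    ("Disallow", "/private/"),
]

if __name__ == "__main__":
    for path in ("/temps", "/temporary", "/temperatures", "/private/a.html"):
        print(path, "allowed" if is_allowed(RULES, path) else "blocked")
```

Under these assumptions, `/temps` and `/temporary` are blocked by `/temp*`, while `/temperatures` is allowed because `*temperature*` is the longer pattern.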
Bad Examples - Common Wrong Entries
Use one of the robots.txt checkers to see if your file is malformed.

Entry:
    User-agent: googlebot
    Disallow /
Problem: NO! This entry is missing the colon after the Disallow.

Entry:
    User-agent: sidewiner
    Disallow: /tmp/
Problem: NO! Robots will ignore misspelled User Agent names (it should be "sidewinder"). Check your server logs for the User Agent name and the listings of User Agent names.

Entry:
    User-agent: MSNbot
    Disallow: /PRIVATE
Problem: WARNING! Many robots and webservers are case-sensitive, so this path will not match any root-level folders named private or Private.

Entry:
    User-agent: *
    Disallow: /tmp/

    User-agent: Weirdbot
    Disallow: /links/listing.html
    Disallow: /tmp/
Problem: Some robots read from top to bottom and stop when they reach something that applies to them, so Weirdbot might stop at the first record, *, and never reach its own rules. Putting the specific records before the * record avoids this. When a robot does find a record for its specific User Agent, it doesn't check the * (all user agents) block, so any general directives should be repeated in the special blocks.
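If you don't have a robots.txt checker at hand, the wrong entries above can be tried out with Python's standard `urllib.robotparser`. This is a rough sketch, and it only shows how that one parser reads the entries; other robots may behave differently. The helper name `check`, the test paths, and the bot name `OtherBot` are assumptions for illustration. The last case is a variation of the Weirdbot entry with the repeated `/tmp/` line deliberately left out.

```python
from urllib.robotparser import RobotFileParser


def check(robots_txt: str, user_agent: str, path: str) -> bool:
    """Parse robots_txt and ask whether user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# Missing colon: the Disallow line is not recognised as a directive.
MISSING_COLON = "User-agent: googlebot\nDisallow /\n"

# Misspelled robot name: the record never applies to "sidewinder".
MISSPELLED = "User-agent: sidewiner\nDisallow: /tmp/\n"

# Upper-case path: does not match the lower-case folder.
WRONG_CASE = "User-agent: MSNbot\nDisallow: /PRIVATE\n"

# Variation on the last entry: the specific record does NOT repeat /tmp/.
TWO_RECORDS = (
    "User-agent: *\n"
    "Disallow: /tmp/\n"
    "\n"
    "User-agent: Weirdbot\n"
    "Disallow: /links/listing.html\n"
)

if __name__ == "__main__":
    print("missing colon blocks googlebot:",
          not check(MISSING_COLON, "googlebot", "/page.html"))
    print("misspelled name blocks sidewinder:",
          not check(MISSPELLED, "sidewinder", "/tmp/a.html"))
    print("/PRIVATE blocks /private/:",
          not check(WRONG_CASE, "MSNbot", "/private/a.html"))
    print("Weirdbot blocked from /tmp/:",
          not check(TWO_RECORDS, "Weirdbot", "/tmp/a.html"))
    print("OtherBot blocked from /tmp/:",
          not check(TWO_RECORDS, "OtherBot", "/tmp/a.html"))
```

With this parser, none of the first three entries blocks what its author intended, and in the last case Weirdbot is not kept out of `/tmp/` because its own record does not repeat that directive, while every other robot is.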
