We all know search engine optimization is a tricky business.
Sometimes we rank well on one engine for a particular keyphrase
and assume that all search engines will like our pages,
and hence that we will rank well for that keyphrase on a number
of engines. Unfortunately this is rarely the case. All the
major search engines differ somewhat, so what gets you
ranked high on one engine may actually lower your
ranking on another.
It is for this reason that some people
like to optimize pages for each particular search engine.
Usually these pages would only be slightly different but
this slight difference could make all the difference when
it comes to ranking high.
However, because search engine spiders crawl
through sites indexing every page they can find, a spider might
come across your engine-specific optimized pages,
and because they are very similar, it may think
you are spamming it and do one of two things: ban your
site altogether or severely punish you in the form of lower
rankings.
So what can you do to stop, say, Google indexing pages that are
meant for Altavista? Well, the solution is really quite simple,
and I'm surprised that more webmasters who optimize
for each search engine don't use it. It's done using
a robots.txt file which resides on your
webspace.
A robots.txt
file is a vital part of any webmaster's battle against getting
banned or punished by the search engines if he or she designs
different pages for different search engines.
The robots.txt file is just a simple text file,
as the file extension suggests. It's created using a simple
text editor like Notepad or WordPad and saved as plain text;
complicated word processors such as Microsoft Word add formatting
that will only corrupt the file.
Here's the code you need to insert into
the file for it to work. The field names User-agent: and Disallow:
are compulsory and never change, while the parts in brackets
are what you will have to change to suit the file and
the engine which you want to keep away from it.
User-agent: (Spider Name)
Disallow: /(File Name)
The User-agent is the name of the search
engine's spider and Disallow is the path, starting with a / from
your root directory, of the file that
you don't want that spider to spider. The field names themselves
are not case sensitive, so User-Agent and user-agent both work,
but file names are, so type each one exactly as it appears
on your server.
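To see how a spider would read such a record, here is a minimal sketch using Python's standard urllib.robotparser module. The page name is only an illustrative placeholder, the helper is_allowed is a hypothetical name of my own, and the leading / on the path is an assumption that follows the usual robots.txt convention.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: keep Google's spider away from a page
# that was optimized for Altavista. The page name is a placeholder.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /internet-marketing-al.html
"""


def is_allowed(robots_txt: str, spider: str, path: str) -> bool:
    """Return True if `spider` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(spider, path)


if __name__ == "__main__":
    page = "/internet-marketing-al.html"
    print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot", page))
    print("Scooter allowed:", is_allowed(ROBOTS_TXT, "Scooter", page))
```

Only the spider named in the record is kept away; any spider without a matching record is free to fetch the page.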
You have to start a new batch of code for
each engine, separated from the previous one by a blank line, but if
you want to list multiple disallowed files
you can put them one under another. For example, for Slurp (Inktomi's spider) -
User-agent: Slurp
Disallow: /internet-marketing-gg.html
Disallow: /internet-marketing-al.html
Disallow: /advertising-secrets-gg.html
Disallow: /advertising-secrets-al.html
In the above code, I have stopped Inktomi
from spidering two pages optimized for Google (internet-marketing-gg.html
& advertising-secrets-gg.html) and two pages optimized
for Altavista (internet-marketing-al.html & advertising-secrets-al.html).
If Inktomi were allowed to spider these pages as well as
the pages specifically made for Inktomi, I would run the risk
of being banned or penalized, so it's always a good idea
to use a robots.txt file.
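If you keep pages for several engines, writing the batches by hand gets tedious. The following is a rough sketch of generating the file from a table of spiders and the pages each should skip. The function name build_robots_txt and the sample mapping are my own assumptions for illustration, and the leading / added to each path follows the usual robots.txt convention.

```python
def build_robots_txt(rules: dict[str, list[str]]) -> str:
    """Build robots.txt text: one batch per spider, one Disallow per page."""
    records = []
    for spider, paths in rules.items():
        lines = [f"User-agent: {spider}"]
        for path in paths:
            # Paths are written relative to the site root, so make sure
            # each one starts with a slash.
            if not path.startswith("/"):
                path = "/" + path
            lines.append(f"Disallow: {path}")
        records.append("\n".join(lines))
    # Batches are separated by a blank line.
    return "\n\n".join(records) + "\n"


if __name__ == "__main__":
    # Illustrative mapping: each spider skips pages made for the others.
    print(build_robots_txt({
        "Slurp": ["internet-marketing-gg.html", "internet-marketing-al.html"],
        "Googlebot": ["internet-marketing-al.html"],
    }))
```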
I mentioned earlier that the robots.txt
file resides on your webspace, but where on your webspace?
The root directory, that's where. If you upload your file
to a sub-directory it will not work. If you want to block
certain engines from certain files that do not reside in
your root directory, you simply need to point to the right
directory and then list the file as normal. For example -
User-agent: Slurp
Disallow: /folder/internet-marketing-gg.html
Disallow: /folder/internet-marketing-al.html
If you wanted to disallow all engines from indexing a file,
you simply use the * character where the engine's name would
usually be. However, be aware that the * character won't work
as a wildcard on the Disallow line.
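As a quick illustration of the * record, this sketch again leans on Python's standard urllib.robotparser. The file name private-page.html and the helper blocked_spiders are hypothetical, chosen only for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical record that applies to every spider.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private-page.html
"""


def blocked_spiders(robots_txt: str, spiders: list[str], path: str) -> list[str]:
    """Return the spiders from `spiders` that may not fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [name for name in spiders if not parser.can_fetch(name, path)]


if __name__ == "__main__":
    names = ["Googlebot", "Scooter", "Slurp"]
    print(blocked_spiders(ROBOTS_TXT, names, "/private-page.html"))
    print(blocked_spiders(ROBOTS_TXT, names, "/index.html"))
```

Every spider is kept away from the listed file, while pages not listed stay open to all of them.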
Here are the User-Agent names of a few of the big engines:
Excite - ArchitextSpider
Altavista - Scooter
Lycos - Lycos_Spider_(T-Rex)
Google - Googlebot
Alltheweb - FAST-WebCrawler/
Be sure to check over the file before uploading
it, as you may have made a simple mistake, which could mean
your pages are indexed by engines you don't want to index
them, or even worse, that none of your pages are indexed at all.
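Part of that checking can be automated. Below is a small sketch of a checker that flags two easy slips: a Disallow line with no User-agent above it, and a path that does not start with a /. The function name find_problems and the exact wording of its messages are my own assumptions; it is nowhere near a full validator.

```python
def find_problems(robots_txt: str) -> list[str]:
    """Return a list of human-readable problems found in robots.txt text."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        if not raw.strip():
            seen_user_agent = False  # a blank line ends the current batch
            continue
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue  # the line held only a comment
        if ":" not in line:
            problems.append(f"line {number}: missing ':' separator")
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()  # field names are not case sensitive
        if field == "user-agent":
            seen_user_agent = True
        elif field == "disallow":
            if not seen_user_agent:
                problems.append(
                    f"line {number}: Disallow with no User-agent above it")
            if value and not value.startswith("/"):
                problems.append(f"line {number}: path should start with '/'")
    return problems


if __name__ == "__main__":
    sample = "User-agent: Slurp\nDisallow: folder/internet-marketing-gg.html\n"
    for problem in find_problems(sample):
        print(problem)
```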
A little
note before I go: I have listed the User-Agent names of
a few of the big search engines, but in reality, it's not
worth creating different pages for more than 6-7 search
engines. It's very time consuming, and the results would be similar
to those you'd get if you created different pages for only the
top five. So more is not always best.
Written by David Callan