Indexing control
You can stop SeznamBot from indexing your pages in two ways:
- Stop all crawling performed by SeznamBot, which stops indexing of the content as well (for details see Crawling Control). Nevertheless, the URL of the page itself can still appear in the search results if other pages link to it.
- Let SeznamBot explore your website and download your pages, but not index specific pages or follow specific links. In this case, if you disallow indexing of a page, that page will not appear in the search results at all.
This page describes the latter option. You can control indexing and link processing on a page-by-page basis, or even control the processing of individual links. Note, however, that it takes some time for the crawler to revisit your page after you have made your changes. Only then can it register any new restrictions and apply them.
Blocking Whole Web Page - Robots Meta Tag
If you want to keep robots, including SeznamBot, from indexing your content or following the links on your whole web page, place the robots HTML meta tag in the head section of the HTML code of the page. Just specify the name and content attributes as described below:
- Value of the name attribute: robots
- Valid values of the content attribute:
| noindex | The content of the web page will not be indexed. |
| index | The content of the web page will be indexed (this is the default). |
| nofollow | The links will not be followed. |
| follow | The links will be followed (this is the default). |
| all | Allow all (same as index, follow). |
Note: You can use two values (index or noindex, and follow or nofollow) separated by a comma.
Example
<html>
<head>
<meta name="robots" content="noindex, nofollow">
<title>Web Page Title</title>
</head>
<body>
...
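To make the meaning of the content values concrete, the following Python sketch reads a robots meta tag from a page and turns it into two flags. It only illustrates the values listed above and is not SeznamBot's actual implementation. The names parse_robots_content and RobotsMetaParser are hypothetical, and ignoring letter case, extra spaces, and unknown values is an assumption.

```python
from html.parser import HTMLParser


def parse_robots_content(content: str) -> dict:
    """Interpret the content attribute of a robots meta tag (illustrative only)."""
    # Defaults from the list of valid values: index and follow.
    flags = {"index": True, "follow": True}
    for raw in content.split(","):
        value = raw.strip().lower()  # assumption: case and spacing are ignored
        if value == "noindex":
            flags["index"] = False
        elif value == "index":
            flags["index"] = True
        elif value == "nofollow":
            flags["follow"] = False
        elif value == "follow":
            flags["follow"] = True
        elif value == "all":
            flags["index"] = True
            flags["follow"] = True
        # Any other value is skipped here (assumption).
    return flags


class RobotsMetaParser(HTMLParser):
    """Collects the flags of the robots meta tag found in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.flags = {"index": True, "follow": True}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            self.flags = parse_robots_content(attr.get("content") or "")


page = '<html><head><meta name="robots" content="noindex, nofollow"></head><body></body></html>'
parser = RobotsMetaParser()
parser.feed(page)
print(parser.flags)
```

For the sample page, both flags come out as False, which matches the "noindex, nofollow" combination in the example.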
Use the robots meta tag when, for example:
- You don't want your page to appear as a result of a web search.
- You do not have full access to your webserver and cannot use a robots.txt file to control robots access through it.
- You are using paid links or advertisements on your page that robots shouldn't follow.
CAUTION
Because the robots meta tag is contained directly in the HTML source of the page, SeznamBot must be allowed to actually download the page in order to read it (see Crawling Control). If you restrict indexing through the robots meta tag and at the same time disallow downloading through the robots.txt file, SeznamBot won't download the page and thus won't be able to see and obey the indexing restriction.
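The conflict described in this caution can be checked before you publish. The sketch below uses urllib.robotparser from Python's standard library as a stand-in: it may not interpret robots.txt exactly as SeznamBot does. The function name meta_tag_is_reachable, the sample rules, and the example URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def meta_tag_is_reachable(robots_txt: str, page_url: str, agent: str = "SeznamBot") -> bool:
    """Return True if the robots.txt rules let the agent download the page.

    Only a downloadable page can have its robots meta tag read and obeyed.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, page_url)


# Hypothetical robots.txt content used only for illustration.
rules = "User-agent: SeznamBot\nDisallow: /private/\n"

for url in ("http://www.example.com/private/page.html",
            "http://www.example.com/public/page.html"):
    if meta_tag_is_reachable(rules, url):
        print(url, "can be downloaded, so a robots meta tag on it will be seen")
    else:
        print(url, "is blocked, so a robots meta tag on it will never be read")
```

If a page must stay out of the search results through noindex, it should fall into the first group, not the second.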
Blocking Individual Links - Nofollow Attribute
You can also apply the nofollow directive at a more fine-grained level through the rel="nofollow" attribute of the <a> HTML tag, which applies to that specific link only. In this case, SeznamBot will process the text of the link (if not restricted through the robots meta tag) but not the content of the linked page or any pages linked solely from it. The crawler will simply stop exploring the web beyond the restricted link. This doesn't mean that the linked page won't be accessed at all (it can still be reached, e.g., through a different link without the nofollow attribute), only that this individual link will not be followed.
It is recommended to use the rel="nofollow" attribute with links that possibly cannot be trusted, e.g. when:
- Your web page contains a blog with public comments.
- Your web page may contain user content that you cannot fully control (e.g. guestbooks, discussion forums, etc.).
- You want to link to a page that you don't fully trust.
Example
The link to the <a href="http://www.example.com" rel="nofollow">example page</a> will not be followed by any complying robot.
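As a rough illustration of how a complying robot might treat such links, the Python sketch below collects only the links it would follow and skips those marked with rel="nofollow". This is a simplified model, not SeznamBot's code. The class name FollowableLinks and the second URL are hypothetical, and treating rel as a space-separated, case-insensitive list of tokens is an assumption based on general HTML conventions.

```python
from html.parser import HTMLParser


class FollowableLinks(HTMLParser):
    """Collects href values of <a> tags that are not marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        # rel may hold several tokens, e.g. rel="external nofollow".
        rel_tokens = (attr.get("rel") or "").lower().split()
        if href and "nofollow" not in rel_tokens:
            self.links.append(href)


snippet = (
    'The link to the <a href="http://www.example.com" rel="nofollow">example page</a> '
    'is skipped, while <a href="http://www.example.org/">this one</a> is kept.'
)
collector = FollowableLinks()
collector.feed(snippet)
print(collector.links)
```

Only the second link ends up in the list; the linked example page could still be reached through some other link that lacks the attribute.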