Wiki source for RobotFriendly


To keep web SearchEngines' spiders from indexing non-content pages, most of WikkaWiki's utility pages (WikiEdit, PageHistoryInfo, and so on) include a robots meta tag that tells spiders not to index them. This keeps the SearchEngines' databases cleaner, at least for your website. This page collects ways to make your website more friendly to robots.

To start, you probably don’t want deleted pages to show up in the SearchEngines’ indices. There are a few ways to do this.


===Robots Meta Tag===
//Contributed by BarkerJr//
This patch adds the robots meta tag to the header of deleted pages. It works well, but some SearchEngines don't support the tag, so deleted pages may still show up in their indexes. Sending 404 (see below) works with all SearchEngines, but can sometimes display incorrectly because of a "feature" in Internet Explorer (see the note below it).

%%(php)diff -ur wiki.orig/actions/header.php wiki/actions/header.php
--- wiki.orig/actions/header.php Tue Feb 15 21:47:56 2005
+++ wiki/actions/header.php Tue Feb 15 21:51:43 2005
@@ -9,7 +9,7 @@
<head>
<title><?php echo $this->GetWakkaName().": ".$this->PageTitle(); ?></title>
<base href="<?php echo $site_base ?>" />
- <?php if ($this->GetMethod() != 'show' || $this->page["latest"] == 'N' || $this->page["tag"] == 'SandBox') echo "<meta name=\"robots\" content=\"noindex, nofollow, noarchive\" />\n"; ?>
+ <?php if ($this->GetMethod() != 'show' || !$this->page || $this->page["latest"] == 'N' || $this->page["tag"] == 'SandBox') echo "<meta name=\"robots\" content=\"noindex, nofollow, noarchive\" />\n"; ?>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<meta name="keywords" content="<?php echo $this->GetConfigValue("meta_keywords") ?>" />
<meta name="description" content="<?php echo $this->GetConfigValue("meta_description") ?>" />%%
To install a patch, place it in a file in your wiki's directory and execute: ##patch -p1 < //filename//##


===Sending 404 Not Found===
//Contributed by DotMG//
[[Ticket:258]]

Modify ./handlers/page/show.php like this:

%%(php;9)if (!$this->page)
{
$httpversion = isset($_SERVER["SERVER_PROTOCOL"]) ? $_SERVER["SERVER_PROTOCOL"] : 'HTTP/1.1';
header("$httpversion 404 Not Found");
print("<p>This page doesn't exist yet. Maybe you want to <a href=\"".$this->Href("edit")."\">create</a> it?</p></div>");
}%%
Note: Internet Explorer replaces short error pages with its own "friendly" error page: if the body of a 404 response is smaller than a threshold (512 bytes for 404s by default), IE displays its own default content instead. Pad the body above that size and the page will display the content we expect (This page doesn't exist yet. Maybe you want to create it?).
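A minimal sketch (in Python, for illustration only; the function name and threshold are assumptions based on IE's documented 512-byte default) of padding an error body so the browser shows it instead of its own page:

```python
def pad_error_body(body, threshold=512):
    """Pad an HTML error body with a comment so that browsers
    (notably old IE) don't substitute their own error page."""
    if len(body) < threshold:
        # Append a harmless HTML comment until we pass the threshold.
        body += "<!-- " + "padding " * ((threshold - len(body)) // 8 + 1) + "-->"
    return body

body = pad_error_body("<p>This page doesn't exist yet.</p>")
```

Bodies already longer than the threshold are returned unchanged.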


===Disabling robots with robots.txt===
//Contributed by DotMG//
The Robots Meta Tag suggested by BarkerJr has the drawback that friendly spiders must load a page before learning that they may not archive it. Here is another solution, in which [[RobotsDotTxt robots.txt]] tells them up front that they aren't allowed to access the page.
The idea is to use another url beginning with **nobot/** for each page that is off-limits to spiders. I.e., all such links on the site will change from ""http://wikkasite/notallowedtorobots"" to ""http://wikkasite/nobot/notallowedtorobots"".

__First__: [[robotsdottxt Robots.txt]] should be available at the url ""http://wikkasite/robots.txt"". Its content should be (note the leading slash, as the robots.txt standard expects):%%User-Agent: *
Disallow: /nobot%%
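You can check the effect of these rules with Python's standard robotparser (the hostname and page names here are illustrative):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
# Feed the robots.txt rules directly instead of fetching them over HTTP.
rp.parse([
    "User-Agent: *",
    "Disallow: /nobot",
])

# nobot/ urls are blocked; plain page urls remain crawlable.
print(rp.can_fetch("*", "http://wikkasite/nobot/HomePage/edit"))  # False
print(rp.can_fetch("*", "http://wikkasite/HomePage"))             # True
```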
__Second__: Create another file nobot.php next to wikka.php; its content will be %%(php)<?php include ("wikka.php"); ?>%% This file will be used when mod_rewrite is disabled.
__Third__: Modify ./.htaccess like this:%%<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^(.*/[^\./]*[^/])$ $1/
RewriteRule ^nobot\/(.*)$ wikka.php?wakka=$1 [QSA,L]
RewriteCond %{REQUEST_URI} !=/favicon.ico
RewriteCond %{REQUEST_URI} !=/robots.txt
RewriteRule ^(.*)$ wikka.php?wakka=$1 [QSA,L]
</IfModule>%%These rules are used if mod_rewrite is enabled. The line ##RewriteRule ^nobot\/(.*)$ wikka.php?wakka=$1 [QSA,L]## removes the ''nobot/'' from the url, so ""http://wikkasite/nobot/HomePage/edit"" and ""http://wikkasite/HomePage/edit"" point to the same location (when mod_rewrite is enabled). Likewise, ""http://wikkasite/nobot.php?wakka=HomePage/edit"" and ""http://wikkasite/wikka.php?wakka=HomePage/edit"" point to the same location.
__Finally__: We modify the ""MiniHref()"" and Href() methods, so that for non-archivable pages, the URL used will contain nobot...
%%(php) function MiniHref($method = "", $tag = "")
{
if (!$tag = trim($tag)) $tag = $this->tag;
//if mod_rewrite enabled, and if method is not show, and if page not found in $this->config['nobot'],
//we prepend nobot/ to link.
if ( ( (($method != "") && ($method != 'show'))
|| (stristr($this->config['nobot'], $tag)))
&& ($this->config["rewrite_mode"]))
$tag = "nobot/$tag";
return $tag.($method ? "/".$method : "");
}
// returns the full url to a page/method.
function Href($method = "", $tag = "", $params = "")
{
$base_url = $this->config["base_url"];
//if mod_rewrite disabled, we use nobot.php?wakka=notallowedtorobots instead of wikka.php?wakka=notallowedtorobots
if (!$this->config["rewrite_mode"])
{
$base_url = preg_replace('/wikka\.php\?wakka=$/', 'nobot.php?wakka=', $base_url);
}
$href = $base_url.$this->MiniHref($method, $tag);
if ($params)
{
$href .= ($this->config["rewrite_mode"] ? "?" : "&").$params;
}
return $href;
}%%
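For illustration, the decision made by the patched ""MiniHref()"" can be sketched in Python (the function and config keys mirror the PHP above; this is not part of Wikka itself):

```python
def mini_href(method="", tag="", config=None):
    """Mirror of the patched MiniHref(): prepend 'nobot/' when
    rewrite mode is on and the target should be hidden from robots."""
    config = config or {}
    # Hidden if the method is anything but 'show', or the tag appears
    # in the 'nobot' config string (case-insensitive, like stristr()).
    hidden = (method not in ("", "show")) or (tag.lower() in config.get("nobot", "").lower())
    if hidden and config.get("rewrite_mode"):
        tag = "nobot/" + tag
    return tag + ("/" + method if method else "")

cfg = {"rewrite_mode": True, "nobot": "SandBox PasswordForgotten"}
print(mini_href("edit", "HomePage", cfg))  # nobot/HomePage/edit
print(mini_href("show", "HomePage", cfg))  # HomePage/show
```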
__Todo__: If wikka is installed in a subdirectory, like ""http://wikkasite/wikkadir/HomePage"", robots.txt must be changed to ##Disallow: /wikkadir/nobot##
When upgrading an existing site, bots already know urls like ""http://wikkasite/HomePage/edit"", so we should send the status **301 Moved Permanently** when a page is requested via such an old url.

~& Wouldn't it be much easier to just add rel="nofollow" to links that should not be followed by search engines (<a href="revisions" rel="nofollow">revisions</a>), and do that for the edit, history, revisions and referrers links by changing actions/footer.php?
~~& Links may be followed from another, non-wiki site, and you cannot expect other sites to add rel="nofollow". So you also have to add "noindex, nofollow, noarchive" in a //robots// meta tag.

~& The above-mentioned nobot mod somehow breaks the loading of the editor images. The server gets trapped in a loop.
~& %%"GET /nobot/3rdparty/plugins/wikiedit/images/images/3rdparty/plugins/wikiedit/images/images/3rdparty/plugins/wikiedit/images/images/[...same path segment repeated dozens of times...]/3rdparty/plugins/wikiedit/images/indent.gif HTTP/1.1" 403 1226 "http://wiki.xxxx.de/nobot/HomePage/edit" "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9"%%
~& Any ideas how to fix this? --DaC
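~~& An untested guess: the catch-all ##RewriteRule ^nobot\/(.*)$ wikka.php?wakka=$1## also captures requests for real files that end up under the nobot/ prefix (such as the editor's relatively-referenced images), so they are fed back into the wiki instead of being served from disk. One possible fix is to strip the nobot/ prefix and serve the file directly whenever the remainder is an existing file, by adding this rule //before// the nobot rule in .htaccess:%%RewriteCond %{DOCUMENT_ROOT}/$1 -f
RewriteRule ^nobot/(.*)$ /$1 [L]%%This is a sketch only; whether it resolves the loop depends on why the image paths repeat in the first place.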

===Sample robots.txt for Google & Yahoo===
Googlebot and Yahoo! Slurp support wildcards in their robots.txt rules. As I prefer blocking the URL over sending a noindex meta tag (merely for bandwidth reasons), I use the robots.txt below to ensure that only the wanted parts are indexed by Google & Yahoo!. Another advantage of this robots.txt is the prevention of duplicate content (e.g. WikkaDocumentation or PasswordForgotten exists a million times on the web and it makes no sense to let it get indexed).

%%User-agent: Slurp
Disallow: /MenuConfig
Disallow: /WikiName
Disallow: /WikkaDocumentation
Disallow: /WikkaReleaseNotes
Disallow: /UserSettings
Disallow: /TextSearch
Disallow: /SysInfo
Disallow: /PasswordForgotten
Disallow: /InterWiki
Disallow: /MyPages
Disallow: /MyChanges
Disallow: /FormattingRules
Disallow: /CategoryWiki
Disallow: /nobot
Disallow: /SandBox
Disallow: /*?
Disallow: /*/edit
Disallow: /*/history
Disallow: /*/revisions
Disallow: /*/acls
Disallow: /*/referrers
Disallow: /*/backlinks
Disallow: /*/recentchanges.xml
Disallow: /*/showcode
Disallow: /*/raw

User-agent: Googlebot
Disallow: /MenuConfig
Disallow: /WikiName
Disallow: /WikkaDocumentation
Disallow: /WikkaReleaseNotes
Disallow: /UserSettings
Disallow: /TextSearch
Disallow: /SysInfo
Disallow: /PasswordForgotten
Disallow: /InterWiki
Disallow: /MyPages
Disallow: /MyChanges
Disallow: /FormattingRules
Disallow: /CategoryWiki
Disallow: /nobot
Disallow: /SandBox
Disallow: /*?
Disallow: /*/edit
Disallow: /*/history
Disallow: /*/revisions
Disallow: /*/acls
Disallow: /*/referrers
Disallow: /*/backlinks
Disallow: /*/recentchanges.xml
Disallow: /*/showcode
Disallow: /*/raw%%
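Standard robots.txt matching is plain prefix matching; the ##/*/edit##-style rules above rely on the wildcard extension that Googlebot and Slurp implement. A rough Python sketch of that wildcard matching (an approximation for checking rules locally, not Google's actual implementation):

```python
import re

def blocked(pattern, path):
    """Approximate Googlebot-style robots.txt matching:
    '*' matches any run of characters, '$' anchors the end,
    and patterns otherwise match as prefixes of the path."""
    regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\$", "$")
    return re.match(regex, path) is not None

print(blocked("/*/edit", "/HomePage/edit"))  # True
print(blocked("/*/edit", "/HomePage"))       # False
print(blocked("/nobot", "/nobot/HomePage"))  # True (prefix match)
```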

----
CategoryUserContributions