To start, you probably don’t want deleted pages to show up in the SearchEngines’ indices. There are a few ways to do this.
Robots Meta Tag
Contributed by BarkerJr
This patch adds the robots meta tag to the header of deleted (that is, nonexistent) pages. This works well, but some SearchEngines don’t support the tag, so those pages may still show up in their indexes. Sending a 404 status (see below) works with all SearchEngines, but the page can sometimes be displayed incorrectly because of a “feature” in Internet Explorer (see the note below it).
diff -ur wiki.orig/actions/header.php wiki/actions/header.php
--- wiki.orig/actions/header.php Tue Feb 15 21:47:56 2005
+++ wiki/actions/header.php Tue Feb 15 21:51:43 2005
@@ -9,7 +9,7 @@
<head>
<title><?php echo $this->GetWakkaName().": ".$this->PageTitle(); ?></title>
<base href="<?php echo $site_base ?>" />
- <?php if ($this->GetMethod() != 'show' || $this->page["latest"] == 'N' || $this->page["tag"] == 'SandBox') echo "<meta name=\"robots\" content=\"noindex, nofollow, noarchive\" />\n"; ?>
+ <?php if ($this->GetMethod() != 'show' || !$this->page || $this->page["latest"] == 'N' || $this->page["tag"] == 'SandBox') echo "<meta name=\"robots\" content=\"noindex, nofollow, noarchive\" />\n"; ?>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<meta name="keywords" content="<?php echo $this->GetConfigValue("meta_keywords") ?>" />
<meta name="description" content="<?php echo $this->GetConfigValue("meta_description") ?>" />
To apply the patch, save it to a file in your wiki's directory and run: patch -p1 < filename
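The patched condition is easier to read when it is pulled out into a function. The following is a minimal sketch, not code from Wikka. The names needs_noindex() and robots_meta_tag() are hypothetical. $page stands for $this->page, which is a page record array, or a falsy value when the page does not exist. The sketch assumes PHP 7 or later. Because the new !$page test comes before the array lookups, short-circuit evaluation never reads 'latest' or 'tag' from a missing page.

```php
<?php
// Hypothetical standalone version of the condition patched into actions/header.php.
// $method: the handler being run ("show", "edit", "history", ...).
// $page:   the page record (array), or null/false when the page does not exist.
function needs_noindex(string $method, $page): bool
{
    return $method !== 'show'          // any handler other than a plain page view
        || !$page                      // missing or deleted page: the check added by the patch
        || $page['latest'] === 'N'     // an old revision of the page
        || $page['tag'] === 'SandBox'; // the SandBox is never indexed
}

// Returns the meta tag to print inside <head>, or an empty string.
function robots_meta_tag(string $method, $page): string
{
    return needs_noindex($method, $page)
        ? "<meta name=\"robots\" content=\"noindex, nofollow, noarchive\" />\n"
        : '';
}

// Example: a request for a page that does not exist gets the noindex tag.
echo robots_meta_tag('show', null);
```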
Sending 404 Not Found
Contributed by DotMG (Ticket:258)
Modify ./handlers/page/show.php like this:
- if (!$this->page)
- {
- print("<p>This page doesn't exist yet. Maybe you want to <a href=\"".$this->Href("edit")."\">create</a> it?</p></div>");
- }
Note: Internet Explorer requires a minimum body size for error responses (512 bytes for a 404 when "Show friendly HTTP error messages" is enabled). If the body is shorter than this limit, IE displays its own default error page instead. Normally, though, the page is long enough, so IE displays the content we expect (This page doesn't exist yet. Maybe you want to create it?).
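You can make sure the 404 page is never replaced by IE's own error page. One option is to pad the message with an invisible HTML comment, so the body always reaches IE's 512-byte threshold for 404 responses. This is a sketch only. missing_page_body() is a hypothetical helper. The commented show.php lines show one possible place for the header() call; they are an assumption, not the patch from Ticket:258. The sketch assumes PHP 7 or later.

```php
<?php
// Hypothetical helper (not part of Wikka): the "page doesn't exist" message,
// padded with an invisible HTML comment so that it is at least $minBytes long
// and IE does not replace the 404 response with its own error page.
function missing_page_body(string $editUrl, int $minBytes = 512): string
{
    $html = '<p>This page doesn\'t exist yet. Maybe you want to <a href="'
        . htmlspecialchars($editUrl) . '">create</a> it?</p>';
    $missing = $minBytes - strlen($html);
    if ($missing > 0) {
        // "<!--" and "-->" add 7 bytes together; fill the rest with spaces.
        $html .= '<!--' . str_repeat(' ', max(0, $missing - 7)) . '-->';
    }
    return $html;
}

// In handlers/page/show.php the status line must be sent before any output, e.g.:
//   if (!$this->page) {
//       header('HTTP/1.0 404 Not Found');
//       print(missing_page_body($this->Href('edit')) . '</div>');
//   }
echo strlen(missing_page_body('http://wikkasite/HomePage/edit')), "\n";
```

The padding sits inside the message itself. This keeps the body above the limit even if the header and footer output were minimal.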
Disabling robots with robots.txt
Contributed by DotMG
The Robots Meta Tag suggested by BarkerJr has the drawback that well-behaved spiders must load a page before they learn that they may not archive it. Here is another solution, in which robots.txt tells them which pages they aren't allowed to access.
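The idea is to use another URL, beginning with the term nobot, for the pages that robots are not allowed to access. […]
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteRule ^(.*/[^\./]*[^/])$ $1/
RewriteRule ^nobot\/(.*)$ wikka.php?wakka=$1 [QSA,L]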
RewriteCond %{REQUEST_URI} !=/favicon.ico
RewriteCond %{REQUEST_URI} !=/robots.txt
RewriteRule ^(.*)$ wikka.php?wakka=$1 [QSA,L]
</IfModule>
These rules are used if mod_rewrite is enabled. The two RewriteCond lines stop the final catch-all rule from handing /favicon.ico and /robots.txt to wikka.php, so the real robots.txt file is served to spiders. The line RewriteRule ^nobot\/(.*)$ wikka.php?wakka=$1 [QSA,L] removes the nobot/ prefix from the URL. So http://wikkasite/nobot/HomePage/edit and http://wikkasite/HomePage/edit point to the same location (with mod_rewrite enabled). Likewise, http://wikkasite/nobot.php?wakka=HomePage/edit and http://wikkasite/wikka.php?wakka=HomePage/edit point to the same location. Finally, we modify the MiniHref() and Href() methods so that, for non-archivable pages, the generated URL contains nobot:
// returns just PageName[/method], prefixed with nobot/ for pages robots must not access.
function MiniHref($method = "", $tag = "")
{
	if (!$tag = trim($tag)) $tag = $this->tag;
	// if mod_rewrite is enabled, and either the method is something other than "" or "show"
	// or the page is listed in $this->config['nobot'], we prepend nobot/ to the link.
	if ($this->config["rewrite_mode"]
		&& ((($method != "") && ($method != "show")) || stristr($this->config["nobot"], $tag)))
	{
		$tag = "nobot/$tag";
	}
	return $tag.($method ? "/".$method : "");
}

// returns the full url to a page/method.
function Href($method = "", $tag = "", $params = "")
{
	$base_url = $this->config["base_url"];
	// if mod_rewrite is disabled, we use nobot.php?wakka=NotAllowedToRobots
	// instead of wikka.php?wakka=NotAllowedToRobots, but only for non-archivable pages
	if (!$this->config["rewrite_mode"])
	{
		$pagetag = trim($tag) ? trim($tag) : $this->tag;
		if ((($method != "") && ($method != "show")) || stristr($this->config["nobot"], $pagetag))
		{
			$base_url = preg_replace('/wikka\.php\?wakka=$/', 'nobot.php?wakka=', $base_url);
		}
	}
	$href = $base_url.$this->MiniHref($method, $tag);
	if ($params)
	{
		$href .= ($this->config["rewrite_mode"] ? "?" : "&").$params;
	}
	return $href;
}

Todo: If Wikka is installed in a subdirectory, as in http://wikkasite/wikkadir/HomePage, the robots.txt rule must be changed to Disallow: /wikkadir/nobot. On an upgraded site, bots already know URLs such as http://wikkasite/HomePage/edit. So we must send a 301 Moved Permanently status when a page is requested with such an old URL.
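For the upgrade case in the Todo, the check for whether an old URL needs a permanent redirect could look like the sketch below. nobot_redirect_target() is hypothetical. It applies only the method rule from MiniHref(), and leaves out pages listed in $this->config['nobot'] for brevity. It expects the request path relative to the wiki root. That path has to come from $_SERVER['REQUEST_URI'] and not from the wakka parameter, because the nobot/ RewriteRule strips the prefix before wikka.php sees the request. The sketch assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: given the request path relative to the wiki root,
// without the query string (e.g. "HomePage/edit" or "nobot/HomePage/edit"),
// return the nobot/ path an old URL should be redirected to, or null if none.
function nobot_redirect_target(string $path): ?string
{
    if (strpos($path, 'nobot/') === 0) {
        return null; // already a nobot URL, nothing to do
    }
    // Split "PageName/method" into the page name and the optional method.
    $parts = explode('/', $path, 2);
    $method = $parts[1] ?? '';
    if ($method === '' || $method === 'show') {
        return null; // a normal page view stays indexable
    }
    return 'nobot/' . $path;
}

// Possible use in the front controller, before any output is sent
// ($base_url is assumed to be the rewrite-mode base URL, e.g. "http://wikkasite/"):
//   if (($target = nobot_redirect_target($path)) !== null) {
//       header('Location: ' . $base_url . $target, true, 301);
//       exit;
//   }
var_dump(nobot_redirect_target('HomePage/edit'));
```

Redirected requests arrive with a REQUEST_URI that starts with nobot/. The helper returns null for them, so the redirect cannot loop.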
- Wouldn't it be much easier to just add rel="nofollow" to links that search engines should not follow (<a href="revisions" rel="nofollow">revisions</a>), and to do that for the edit, history, revisions and referrers links by changing actions/footer.php?
- Links to these pages may also be followed from other, non-wiki sites, and you cannot expect those sites to add rel="nofollow". So you still have to add "noindex, nofollow, noarchive" in a robots meta tag.
- The above-mentioned nobot mod somehow breaks the loading of the editor images. The server gets trapped in a loop. The access log shows a request in which the segment 3rdparty/plugins/wikiedit/images/images/ is repeated many times, answered with 403:
"GET /nobot/3rdparty/plugins/wikiedit/images/images/3rdparty/plugins/wikiedit/images/images/…/3rdparty/plugins/wikiedit/images/indent.gif HTTP/1.1" 403 1226 "http://wiki.xxxx.de/nobot/HomePage/edit" "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9"
- Any ideas how to fix this? --DaC
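The rel="nofollow" suggestion from the discussion above could be implemented in actions/footer.php with a small helper like the following. footer_link() is hypothetical. Its list of handlers is the one named in that suggestion: edit, history, revisions and referrers. As the reply points out, rel="nofollow" only affects links on your own pages, so it complements the robots meta tag rather than replacing it. The sketch assumes PHP 7 or later.

```php
<?php
// Hypothetical helper for actions/footer.php: builds a footer link and adds
// rel="nofollow" when it points to a handler that search engines should not follow.
function footer_link(string $href, string $label, string $method): string
{
    $noFollowMethods = ['edit', 'history', 'revisions', 'referrers'];
    $rel = in_array($method, $noFollowMethods, true) ? ' rel="nofollow"' : '';
    return '<a href="' . htmlspecialchars($href) . '"' . $rel . '>'
        . htmlspecialchars($label) . '</a>';
}

// In footer.php this might be called as, e.g.:
//   echo footer_link($this->Href('edit'), 'Edit page', 'edit');
echo footer_link('http://wikkasite/HomePage/revisions', 'Revisions', 'revisions'), "\n";
```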
Sample robots.txt for Google & Yahoo
Googlebot and Yahoo! Slurp support wildcards in robots.txt. I prefer blocking the URL to sending a noindex meta tag (purely for bandwidth reasons), so I use the following robots.txt to ensure that only the wanted parts are indexed by Google and Yahoo!. Another advantage of this robots.txt is that it prevents duplicate content. Pages such as WikkaDocumentation or PasswordForgotten exist on countless sites, and it makes no sense to have them indexed.
User-agent: Slurp
Disallow: /MenuConfig
Disallow: /WikiName
Disallow: /WikkaDocumentation
Disallow: /WikkaReleaseNotes
Disallow: /UserSettings
Disallow: /TextSearch
Disallow: /SysInfo
Disallow: /PasswordForgotten
Disallow: /InterWiki
Disallow: /MyPages
Disallow: /MyChanges
Disallow: /FormattingRules
Disallow: /CategoryWiki
Disallow: /nobot
Disallow: /SandBox
Disallow: /*?
Disallow: /*/edit
Disallow: /*/history
Disallow: /*/revisions
Disallow: /*/acls
Disallow: /*/referrers
Disallow: /*/backlinks
Disallow: /*/recentchanges.xml
Disallow: /*/showcode
Disallow: /*/raw

User-agent: Googlebot
Disallow: /MenuConfig
Disallow: /WikiName
Disallow: /WikkaDocumentation
Disallow: /WikkaReleaseNotes
Disallow: /UserSettings
Disallow: /TextSearch
Disallow: /SysInfo
Disallow: /PasswordForgotten
Disallow: /InterWiki
Disallow: /MyPages
Disallow: /MyChanges
Disallow: /FormattingRules
Disallow: /CategoryWiki
Disallow: /nobot
Disallow: /SandBox
Disallow: /*?
Disallow: /*/edit
Disallow: /*/history
Disallow: /*/revisions
Disallow: /*/acls
Disallow: /*/referrers
Disallow: /*/backlinks
Disallow: /*/recentchanges.xml
Disallow: /*/showcode
Disallow: /*/raw
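To check what a wildcard rule in this robots.txt actually blocks, Googlebot-style matching can be approximated as below. robots_rule_matches() is a hypothetical helper. It models only plain prefix matching, * (any sequence of characters) and a trailing $ (end of URL). It ignores Allow lines and user-agent group selection. Note the prefix behaviour: Disallow: /SandBox also blocks every page whose name starts with SandBox. The sketch assumes PHP 7 or later.

```php
<?php
// Hypothetical approximation of Googlebot-style robots.txt path matching.
// $rule is the value of a Disallow line; $path is the URL path plus query string.
function robots_rule_matches(string $rule, string $path): bool
{
    // Escape regex metacharacters, then turn each escaped "*" back into ".*".
    $regex = str_replace('\*', '.*', preg_quote($rule, '#'));
    // A trailing "$" means "end of URL"; preg_quote escaped it to "\$".
    if (substr($regex, -2) === '\$') {
        $regex = substr($regex, 0, -2) . '$';
    }
    // Rules are matched from the start of the path (prefix match).
    return preg_match('#^' . $regex . '#', $path) === 1;
}

$rules = ['/*/edit', '/*?', '/SandBox', '/nobot'];
foreach (['/HomePage', '/HomePage/edit', '/TextSearch?phrase=wiki', '/SandBoxTest'] as $path) {
    $blocked = false;
    foreach ($rules as $rule) {
        $blocked = $blocked || robots_rule_matches($rule, $path);
    }
    echo $path, ($blocked ? ' is blocked' : ' is allowed'), "\n";
}
```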
CategoryUserContributions