Configuration via .htaccess
See also:
Documentation: HtaccessConfigInfo.
For documentation of what the current .htaccess files do, see HtaccessConfigInfo; here, we're going to look at what the main .htaccess file might do for us - while also getting rid of the separate .htaccess files in subdirectories.
Each item has a "Prerequisite" box: this specifies what is required in Apache's main configuration (httpd.conf) for the current or higher directory to enable the proposed directive(s) in the .htaccess file. Multiple bulleted items (list) imply 'or': one of these must be specified to enable the directive in an .htaccess file.
This page assumes we're working on Apache; other web servers may have similar mechanisms to exploit (some may even use .htaccess files for that) but we're not dealing with that here.
Users working with a different webserver are welcome to add their own equivalent methods to this page or a separate one for their webserver!
This page is still unfinished; sections where more content will be added are indicated as such.
Efficiency
Default Apache behavior
Normally an Apache server has a DirectoryIndex directive in its server configuration (httpd.conf) that tells it which "index" file(s) to look for, in what order, when a user agent makes a request for a directory. Each of the file names specified will be tried in turn; if none of them is found (or none are defined), the directory content will be shown (if enabled) or a '403 Forbidden' response will be given if directory browsing is disabled (see Security options below).
Getting to Wikka's main file: wikka.php
Wikka operates completely through wikka.php, and it comes with an index.php file in its installation directory which does nothing but redirect to wikka.php - which in its turn will redirect to the declared home page if not given a page parameter. The problem with this method is that it assumes that Apache's configured search path actually contains not just the standard index.html but also index.php - not guaranteed to be the case! But even if it is, going first to index.php only to be redirected to wikka.php is rather inefficient, because it means three browser requests just to get at the home page when the browser requests the installation directory.
Making it more efficient
Prerequisite:
- AllowOverride Indexes
- AllowOverride All
If .htaccess is not enabled on the server, we'll still need that index.php page redirecting to wikka.php (and hope it is declared in the Apache configuration as one of the directory index files to look for) - so we'll have to keep it. But if it is enabled we can save one browser roundtrip by telling Apache to go looking directly for wikka.php when a directory is requested - instead of going through its declared search path until it stumbles onto index.html which does exist in the directory. This will do it:
DirectoryIndex wikka.php
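A slightly more defensive variant, offered as a sketch rather than as the configuration Wikka ships: DirectoryIndex accepts several file names and tries them from left to right, so wikka.php can be listed first with the conventional index files as fallbacks. The fallback names are an assumption about what a given installation directory contains.

```apache
# Sketch only: try wikka.php first, then fall back to the usual index files.
# The fallback names (index.php, index.html) are assumptions about the
# installation directory; trim the list to the files that really exist there.
DirectoryIndex wikka.php index.php index.html
```

Since a DirectoryIndex set in .htaccess replaces the list inherited from the server configuration for that directory, any fallbacks that should still apply have to be repeated here.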
Security options
Combating referrer spam
Prerequisite:
- AllowOverride FileInfo Limit
- AllowOverride All
The default Wikka .htaccess file has something like this (blank line removed and capitalization adapted to conform to the norm):
SetEnvIfNoCase Referer ".*(...|...|...).*" BadReferrer
Order Deny,Allow
Deny from env=BadReferrer
The (...|...|...) part here represents where the substrings to match referrer URLs with would go (separated by pipe characters).
In principle, this will work to stop some URLs from re-appearing in the referrers list (which isn't indexed anyway). But the problem with the current expression (see HtaccessConfigInfo) is that it is old, and probably does not reflect current referrer spam "habits".
So I propose we keep this (syntax rewritten as above) but we should revise the list of substrings, using our own referrer blacklist here as a guideline. Looking at that, I notice two major themes: porn and erotica, and online pharmacies (not surprisingly). Some care should be taken with the actual list of substrings, though. For instance (I'm just making this up) there might be a quite legitimate "gay pride wiki" using Wikka; if such a site exists, it would cause quite legitimate referrer entries because it actually links back to our site. Which indicates we should avoid using generic terms (like 'gay') but look for keywords that suggest porn (or online pharmacies, etc.) instead.
A new list consisting of medicine names and some 'porn' indicators would probably be more effective than what we have now. But most important is to take care that the list would not deny access to sites actually linking to a Wikka site.
- As a note, the current format of the regular expression can potentially cause valid pages to fail. For example, if you have .*(rape|porn).* as the filter, you will be unable to edit valid wiki pages like OperaPerformances or TopOrnamentation. A workaround is to use \b instead of .* --IanAndolina
- Quite right - and it's better to miss stopping some referrers than to stop legitimate referrers. So our more conservative expression would become
\b(...|...|...)\b
While that would miss 'pornography' it would still catch many of the referers we now have in our referrer blacklist. --JavaWoman
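To make the word-boundary form concrete, here is a minimal sketch of the complete block. The two keywords are made-up placeholders, not entries from the real blacklist, and the sketch assumes an Apache 2.x build whose regular expressions are PCRE-based, so that \b is understood as a word boundary.

```apache
# Sketch only. "examplepills" and "examplecasino" are hypothetical
# placeholders standing in for the real list of substrings.
# Assumes a PCRE-based Apache (2.x), where \b matches a word boundary.
SetEnvIfNoCase Referer "\b(examplepills|examplecasino)\b" BadReferrer
Order Deny,Allow
Deny from env=BadReferrer
```

With this form a referrer such as http://www.examplepills.com/ still matches, because the dots on either side of the keyword are word boundaries. A keyword embedded in a longer word, as 'rape' is in OperaPerformances, no longer triggers the filter.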
Preventing directories from being browsed
See also Directory requests below!
We have a nice files action that makes it possible (at least for admins) to upload files and for visitors to download them. Using the files action on a page gives people access to one or all of the files "attached" to that page - and if there is no files action on the page, the attached files are hidden.
But are they? It really depends on the main Apache configuration. On many systems, directory browsing is permitted, which means that when one requests a directory and there is no index file found in that directory, Apache will construct a page which gives access to the (non-hidden) files in that directory. So if Apache is not configured to prevent this, all files will actually be accessible for anyone with a little bit of knowledge of the Wikka directory structure. The same goes for other directories - in fact all that do not have an index file that Apache recognizes.
Generally it's a good idea not to allow access to files via directory browsing (unless you specifically want to give "FTP-like" access to them). We can do this easily via the .htaccess file as well - regardless of what Apache's main setting is.
Method 1: hiding the files
Prerequisite:
- AllowOverride Indexes
- AllowOverride All
The first possible approach is to tell Apache to "ignore" the files you don't want to be visible. You can do that by extension, or simply tell it to ignore "all" by using a '*' wildcard:
IndexIgnore *
- Pro: the visitor gets an index page that doesn't contain any files
- Con: confusing if the visitor already knows there must be files in the directory requested
Method 2: denying access
Prerequisite:
- AllowOverride Options
- AllowOverride All
Another possibility is to simply make it impossible to access a directory. This can be done with another directive:
Options -Indexes
- Pro: the visitor won't be confused by an "empty" list of files
- Con: the result is actually a '403 Forbidden' server response which by itself is not very friendly.
Prerequisite:
- AllowOverride FileInfo
- AllowOverride All
The unfriendly '403' response could be handled by adding a custom error page - or better simply using that mechanism to redirect right back to the homepage:
ErrorDocument 403 /wikka.php
If Wikka is not installed in the root but in a subdirectory, the path here should reflect the true location of the wikka.php file, of course.
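As a sketch of how Method 2 and the custom error page combine for an installation in a subdirectory (the path /wiki is a hypothetical example, not something taken from an actual setup):

```apache
# Sketch only: "/wiki" is a hypothetical subdirectory; use the real URL path.
# Options needs AllowOverride Options, ErrorDocument needs AllowOverride
# FileInfo (or AllowOverride All for both).
Options -Indexes
ErrorDocument 403 /wiki/wikka.php
```

With a local path like this, Apache serves the page through an internal redirect. The visitor sees the Wikka page, but the response keeps its 403 status and the address in the browser does not change.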
Directory requests
The current main .htaccess file makes use of mod_rewrite to ensure directory requests actually end in a slash (as is required); the following code is used:
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^(.*/[^\./]*[^/])$ $1/
RewriteRule ^(.*)$ wikka.php?wakka=$1 [QSA,L]
</IfModule>
While the second RewriteRule (the one that rewrites to wikka.php) is not actually part of the directory request mechanism, it is important to our discussion here, so it's shown anyway.
There are several problems with this mechanism though:
- The RewriteRule would not match directories that happen to have a . (period) in their name, which is quite legal
- If the condition is true and the rewrite rule is applied, Apache is not told to stop there with an 'L' flag, so it continues with the next rewrite rule... leading to totally incorrect URLs. For instance, on this site, try going to:
http://wikka.jsnx.com/3rdparty/plugins
this will lead to a page called "Wikka : 3rdparty" with the error message Unknown method "page/plugins/.php" - clearly not the directory http://wikka.jsnx.com/3rdparty/plugins/ as intended: the double rewrite completely messes things up.
- The major problem, however, is that with these rewrite rules we're trying to do something (badly) with an external module (mod_rewrite) that may not even be installed or enabled, when Apache will do it excellently by itself with a base module (mod_dir) that is extremely unlikely to be disabled. (Base modules are always present and enabled by default; if you really want to you must explicitly disable them - but one of the responsibilities of mod_dir is precisely to take care that requests for directories end in a slash as required, so why would anyone even think of disabling it?)
So: let's get rid of these two lines in our .htaccess file:
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^(.*/[^\./]*[^/])$ $1/
The effect will still be a '301 moved permanently' response - but without the errors.
Friendly URLs
Let's start with the relevant lines in the current .htaccess file (version 1.1.6.0 and earlier):
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteRule ^(.*)$ wikka.php?wakka=$1 [QSA,L]
</IfModule>
We know already we have a problem here, too: as we saw above under Directory requests, Apache will happily apply this rule even when the request was actually for a directory - which is not our intention. This can be easily prevented though with an extra condition:
RewriteCond %{REQUEST_FILENAME} !-d
This is like the condition that was used to recognise a directory (-d) only now it's negated so the RewriteRule will be applied only if the request is for something that is not a directory.
And as explained on HtaccessConfigInfo, if we link to or embed a file in a page, the browser will make a request for that file, and Apache will rewrite that request unless we stop it doing that. We solved that by adding extra .htaccess files to those directories that contain files we need to reference from the generated HTML files (or CSS files). The big disadvantage with this is that it's not very robust: if someone adds an extension to their Wikka installation that uses an extra directory which contains files to be referenced, they would also have to remember to add an extra .htaccess file to make that actually work. We can make our .htaccess mechanism far more robust by directly excluding URLs that refer to actual files from being rewritten. Remember: a single rewrite rule can have any number of RewriteCond conditions. Just like !-d means "not a directory", !-f means "not a file", so we add:
RewriteCond %{REQUEST_FILENAME} !-f
Prerequisite:
- AllowOverride FileInfo
- AllowOverride All
Our resulting mod_rewrite section will now look like this:
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ wikka.php?wakka=$1 [QSA,L]
</IfModule>
And URLs will be rewritten only if they do not match an existing directory or an existing file. We can also do away with all of the extra .htaccess files.
Wikka in a subdirectory
later
Side effects
Using just one .htaccess file to both do URL rewriting and prevent that from happening for actual directories and files actually has advantages as well as disadvantages we need to be aware of (and do something about).
Advantages
- It is simple: just one file to install that takes care of everything. We could actually delete the old .htaccess files in the subdirectories when doing an update install.
- It is robust and resistant to extensions being added to Wikka: authors and implementors of extensions no longer have to worry whether or not a .htaccess would be needed in a newly-created directory or what it should contain.
- It also solves another problem that is raised frequently: uploaded images can now directly be referred to from an image action (and with a relative path). What's more, we could extend our image action so that for an image file "attached" to the current page, only the file name needs to be specified for the url and the image action will know where to find it: this would greatly facilitate the usage of uploaded images for things like screenshots of software, graphs, or maps to support otherwise textual data in pages which wikis traditionally tend to rely on. (Or, for that matter, using a wiki to build an image gallery.)
Disadvantages
- Advantage 3. above actually implies a disadvantage as well: if uploaded images can be directly linked to, so can any uploaded file. But under Preventing directories from being browsed above we were trying to prevent access to uploaded files other than by the files action, to make only download possible, and only from a files action on the page the file was "attached" to. Catch 22.
- Worse, it would now be possible to directly link to any file in Wikka's directory tree. Clearly that was not our intention.
Possible counter measures
- Provide a configuration option to set whether or not files can be linked to, to what extent, or maybe limited by directory. In addition, provide the Formatter with the smarts to detect links and 'bare' URLs pointing to files within the Wikka tree; depending on configuration options, either allow the link or turn the URL into a link, or turn the link into plain text if a link is not allowed.
- The files action itself could also be outfitted with some filters, not merely looking at size, but for instance also at file type (regardless of extension!). There could also be an option to send an email to the WikiAdmin each time a file is uploaded, or holding them for moderation.
- The image action should also check whether the file referred to in the src/url parameter really is an image (see EnhancedImageAction for a nearly-finished solution).
- A more robust solution would be to store uploaded files directly in the database - or in the file system but outside of the document root so the browser cannot get at them - and provide an interface to them only via the database (both files action and image action would then use this interface). Storing them in the database would also make it possible to store meta data along with the files themselves, so for instance the image action could use that to retrieve a (default) alt text for an uploaded file used in the action.
The golden mean
While the database solution is probably the most robust, it is also the most work to implement.
A preliminary solution (much easier to implement) might be to only exclude directories from rewriting in the .htaccess file, and provide a way to designate a specific page and associated upload directory for images; if we provide an interface for this, we could also automatically generate an .htaccess file for that directory so the images there can be used in an image action. This would still require the extra security provided by the EnhancedImageAction to check that it is handling a real image. Unfortunately, we would also have to keep the current separate .htaccess files in the subdirectories, and would gain none of the robustness of having a single file to handle the complete configuration.
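The preliminary solution could look roughly like the sketch below, which shows two separate files in one listing. The content of the generated file is an assumption: it simply switches the rewrite engine off for the designated image directory, which is one common way to exempt a directory from rules set higher up.

```apache
# --- main .htaccess (sketch): exclude only real directories from rewriting ---
<IfModule mod_rewrite.c>
    RewriteEngine on
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^(.*)$ wikka.php?wakka=$1 [QSA,L]
</IfModule>

# --- generated .htaccess in the designated image directory (sketch) ---
# Assumption: turning the engine off here is all the generated file needs,
# so requests for files in this directory are served as-is.
<IfModule mod_rewrite.c>
    RewriteEngine off
</IfModule>
```

Files everywhere else in the Wikka tree would still be rewritten to wikka.php, so only the designated image directory becomes directly reachable.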
Putting it all together
later
Extra measures outside of .htaccess files
All of the above is dependent on .htaccess configuration being enabled - which we cannot be certain of. But some of the mechanisms can be mimicked without that. This section looks at what we can do without having .htaccess available.
Installation
The current installation and update instructions tell the WikiAdmin to go to the installation directory with the browser. Since we cannot be sure that .htaccess configuration is enabled or that index.php is in the Apache index search path, this may not actually work. The instruction should be to go to the wikka.php file in that directory instead.
Directory index
As pointed out above, the file name index.php will not necessarily be declared in Apache's standard DirectoryIndex directive. If this is the case, a browser may actually end up showing the content of the Wikka installation directory - especially since .htaccess configuration is also not guaranteed to be enabled and whatever we put in there to prevent directory browsing may be ineffective. On the other hand, index.html is practically guaranteed to be in the index search path. We could add a minimal index.html file that does a "meta redirect" and - for browsers and search engines that don't support this - provides a link to wikka.php as well:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<title>Redirect to home page</title>
<meta http-equiv="refresh" content="0; url=wikka.php" />
</head>
<body>
<h1>Redirect to home page</h1>
<p>If your browser doesn't automatically redirect, follow this link to take you to the <a href="wikka.php">home page</a></p>
</body>
</html>
Note that we don't make any assumptions about the name of the Wikka site; of course a WikiAdmin can edit the file to reflect the name but we can't do it dynamically: this is a static HTML file.
It's entirely possible that the (root) directory where Wikka is installed already contains an index.html file. To prevent blindly overwriting that file, our file should not come as a physical file in the installation archive, but be generated dynamically from the installer program. The installer can detect the pre-existence of an index.html file and ask the WikiAdmin permission for overwriting it (and renaming the original as a backup!). And when we are generating the file from the installer, we can also insert the name of the Wikka site, giving a warning to the WikiAdmin that this file must be edited if the name is changed in the configuration at some point. (A future user interface for maintaining the configuration could do this automatically, of course.)
more later
External sites
CategoryDevelopmentArchitecture