=====Configuration via .htaccess=====

>>==See also:==
Documentation: HtaccessConfigInfo.>>This is the development page for the .htaccess files (to be) distributed with Wikka.::c::
For documentation of what the current .htaccess files do, see HtaccessConfigInfo; here, we're going to look at what the main .htaccess file //might// do for us - while also getting rid of the separate .htaccess files in subdirectories.
''Each item has a "Prerequisite" box: this specifies what is required in Apache's main configuration (##httpd.conf##) for the current or higher directory to enable the proposed directive(s) in the .htaccess file. Multiple bulleted items (list) imply 'or': **one** of these must be specified to enable the directive in an ##.htaccess## file.''

This page assumes we're working on Apache; other web servers may have similar mechanisms to exploit (some may even use .htaccess files for that) but we're not dealing with those here.
//Users working with a different webserver are welcome to add their own equivalent methods to this page or a separate one for their webserver!//

''//This page is still unfinished; sections where more content will be added are indicated as such.//''


===Default Apache behavior===

Normally an Apache server has a ##""DirectoryIndex""## directive in its server configuration (##httpd.conf##) that tells it which "index" file(s) to look for, in what order, when a user agent makes a request for a directory. Each of the file names specified will be tried in turn; if none of them is found (or none are defined), the directory content will be shown (if enabled) or a '403 Forbidden' response will be given if directory browsing is disabled //(see **Security options** below)//.

==Getting to Wikka's main file: ##wikka.php##==
Wikka operates entirely through ##wikka.php##, and it comes with an ##index.php## file in its installation directory which does nothing but redirect to ##wikka.php## - which in its turn will redirect to the declared home page if not given a page parameter. The problem with this method is that it assumes that Apache's configured search path actually contains not just the standard ##index.html## but also ##index.php## - not guaranteed to be the case! But even if it is, going first to ##index.php## only to be redirected to ##wikka.php## is rather inefficient, because it means **three** browser requests (for the directory, for ##wikka.php## and for the home page) just to get at the home page when the browser requests the installation directory.

===Making it more efficient===

>>==Prerequisite:==
~-##""AllowOverride Indexes""##
~-##""AllowOverride All""##>>If ##.htaccess## is not enabled on the server, we'll still need that ##index.php## page redirecting to ##wikka.php## (and hope it is declared in the Apache configuration as one of the directory index files to look for) - so we'll have to keep it. But if it //is// enabled we can save one browser roundtrip by telling Apache to go looking directly for ##wikka.php## when a directory is requested - instead of going through its declared search path until it stumbles onto ##index.php## which does exist in the directory. This will do it:::c::
%%(apache)DirectoryIndex wikka.php%%
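As a variation (an assumption on my part, not something the current file does): ##""DirectoryIndex""## accepts several file names that are tried in order, so ##wikka.php## could be listed first while the more usual index files remain available as fallbacks, for instance in subdirectories that only contain an ##index.html##. A minimal sketch:
%%(apache)# Sketch only: try wikka.php first, then fall back to the common index files
DirectoryIndex wikka.php index.php index.html%%
Whether the fallbacks are wanted depends on what the subdirectories contain; the single-name version above is the simpler choice.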

====Security options====

===Combating referrer spam===

>>==Prerequisite:==
~-##""AllowOverride FileInfo Limit""##
~-##""AllowOverride All""##>>The default Wikka ##.htaccess## file has something like this (blank line removed and capitalization adapted to conform to the norm):::c::
%%(apache)SetEnvIfNoCase Referer ".*(...|...|...).*" BadReferrer
Order Deny,Allow
Deny from env=BadReferrer%%
The ##(...|...|...)## part here represents where the substrings to match referrer URLs with would go (separated by pipe characters).

In principle, this will work to stop some URLs from re-appearing in the referrers list (which isn't indexed anyway). But the problem with the current expression (see HtaccessConfigInfo) is that it is old, and probably does not reflect current referrer spam "habits".

So I propose we keep this (syntax rewritten as above) but we should revise the list of substrings, using our own referrer blacklist here as a guideline. Looking at that, I notice two major themes: porn and erotica, and online pharmacies (not surprisingly). Some care should be taken with the actual list of substrings, though. For instance (I'm just making this up) there might be a quite legitimate "gay pride wiki" using Wikka; if such a site exists, it //would// cause quite legitimate referrer entries because it actually links back to our site. This indicates we should avoid using generic terms (like 'gay') but look for keywords that suggest porn (or online pharmacies, etc.) instead.

A new list consisting of medicine names and some 'porn' indicators would probably be more effective than what we have now. But most important is to take care that the list would not deny access to sites **actually** linking to a Wikka site.

~&As a note, the current format of the regular expression can potentially cause valid pages to fail. For example if you have ##.*(rape|porn).*## as the filter, you will be unable to edit valid wiki pages like Ope//raPe//rformances or To//pOrn//amentation. A work around is to use \b instead of .* — IanAndolina
~~&Quite right - and it's better to miss stopping some referrers than to stop legitimate referrers. So our more conservative expression would become %%\b(...|...|...)\b%% ---While that would miss 'pornography' it would still catch many of the referers we now have in our referrer blacklist. --JavaWoman
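With word boundaries, the complete rule could look roughly like the sketch below. The two keywords are made-up placeholders (not taken from the actual blacklist) and only illustrate the shape of the expression:
%%(apache)# Sketch only: "examplepill" and "examplecasino" are hypothetical placeholder keywords
# \b matches at a word boundary, so a keyword embedded in a longer word is not matched
SetEnvIfNoCase Referer "\b(examplepill|examplecasino)\b" BadReferrer
Order Deny,Allow
Deny from env=BadReferrer%%
A referrer URL containing ##examplepill## as a separate word (for example between slashes or dots) would be denied, while a page name that merely contains the same letters inside a longer word would not.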

===Preventing directories from being browsed===
//see also **Directory requests** below!//

We have a nice files action that makes it possible (at least for admins) to upload files and for visitors to download them. Using the files action on a page gives people access to one or all of the files "attached" to that page - and if there is no files action on the page, the attached files are hidden.

But are they? It really depends on the main Apache configuration. On many systems, directory browsing is permitted, which means that when one requests a directory and there is no index file found in that directory, Apache will construct a page which gives access to the (non-hidden) files in that directory. So if Apache is not configured to prevent this, all files will actually be accessible for anyone with a little bit of knowledge of the Wikka directory structure. The same goes for other directories - in fact all that do not have an index file that Apache recognizes.

Generally it's a good idea not to allow access to files via directory browsing (unless you specifically want to give "FTP-like" access to them). We can do this easily via the .htaccess file as well - regardless of what Apache's main setting is.

==Method 1: hiding the files==
>>==Prerequisite:==
~-##""AllowOverride Indexes""##
~-##""AllowOverride All""##>>The first possible approach is to tell Apache to "ignore" the files you don't want to be visible. You can do that by extension, or simply tell it to ignore "all" by using a '*' wildcard:::c::
%%(apache)IndexIgnore *%%
~-Pro: the visitor gets an index page that doesn't contain any files
~-Con: confusing if the visitor already **knows** there must be files in the directory requested

==Method 2: denying access==
>>==Prerequisite:==
~-##""AllowOverride Options""##
~-##""AllowOverride All""##>>Another possibility is to simply make it **impossible** to browse a directory. This can be done with another directive:::c::
%%(apache)Options -Indexes%%
~-Pro: the visitor won't be confused by an "empty" list of files
~-Con: the result is actually a '403 Forbidden' server response which by itself is not very friendly.

>>==Prerequisite:==
~-##""AllowOverride FileInfo""##
~-##""AllowOverride All""##>>The unfriendly '403' response could be handled by adding a custom error page - or better, simply using that mechanism to redirect right back to the homepage:::c::
%%(apache)ErrorDocument 403 /wikka.php%%
''If Wikka is not installed in the root but in a subdirectory, the path here should reflect the true location of the ##wikka.php## file, of course.''
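Putting Method 2 and the error page together for a subdirectory install might look like the sketch below. The ##/wikka/## path is a hypothetical example, and the sketch assumes the server permits both the ##Options## and ##""FileInfo""## overrides for this directory:
%%(apache)# Sketch only: assumes Wikka is installed in a hypothetical /wikka/ subdirectory
# Turn off automatically generated directory listings
Options -Indexes
# Hand the resulting '403 Forbidden' to Wikka instead of Apache's default error page
ErrorDocument 403 /wikka/wikka.php%%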

====Directory requests====

The current main .htaccess file makes use of **##mod_rewrite##** to ensure directory requests actually end in a slash (as is required); the following code is used:
%%(apache;7)<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^(.*/[^\./]*[^/])$ $1/
RewriteRule ^(.*)$ wikka.php?wakka=$1 [QSA,L]
</IfModule>%%
''While line 11 is **not** actually part of the directory request mechanism, it **is** important to our discussion here, so it's shown anyway.''
There are several problems with this mechanism though:
~-The ##""RewriteRule""## would not match directories that happen to have a . (period) in their name, which is quite legal
~-If the condition is true and the rewrite rule is applied, Apache is not told to stop at that with an '**##L##**' flag, so it continues with the next rewrite rule... leading to totally incorrect URLs. For instance, on this site, requesting a directory without the trailing slash leads to a page called "Wikka : 3rdparty" with the error message //Unknown method "page/plugins/.php"// - clearly **not** the directory that was intended: the double rewrite completely messes things up.
~-The major problem, however, is that with these rewrite rules we're trying to do (badly), with an external module (##mod_rewrite##) that may not even be installed or enabled, **what Apache will do excellently by itself with a base module** (##mod_dir##) that is //extremely// unlikely to be disabled. (Base modules are always present and enabled by default; if you really want to you must explicitly disable them - but one of the responsibilities of ##mod_dir## is precisely to take care that requests for directories end in a slash as required, so why would anyone even think of disabling it?)

So: let's get rid of these two lines in our ##.htaccess## file:
%%(apache;9)RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^(.*/[^\./]*[^/])$ $1/%%
The effect will still be a '301 moved permanently' response - but without the errors.

====Friendly URLs====

Let's start with the relevant lines in the //current// ##.htaccess## file //(version and earlier)//:
%%(apache)<IfModule mod_rewrite.c>
RewriteEngine on
RewriteRule ^(.*)$ wikka.php?wakka=$1 [QSA,L]
</IfModule>%%

We know already we have a problem here, too: as we saw above under **Directory requests**, Apache will happily apply this rule even when the request was actually for a directory - which is not our intention. This can be easily prevented though with an extra condition:
%%(apache)RewriteCond %{REQUEST_FILENAME} !-d%%
This is like the condition that was used to recognise a directory (##-d##) only now it's negated so the ##""RewriteRule""## will be applied only if the request is for something that is **not** a directory.

And as explained on HtaccessConfigInfo, if we link to or embed a file in a page, the browser will make a request for that file, and Apache will rewrite that request **unless** we stop it doing that. We solved that by adding extra ##.htaccess## files to those directories that contain files we need to reference from the generated HTML files (or CSS files). The big disadvantage with this is that it's not very robust: if someone adds an extension to their Wikka installation that uses an extra directory which contains files to be referenced, they would also have to remember to add an extra ##.htaccess## file to make that actually work. We can make our ##.htaccess## mechanism far more robust by directly excluding URLs that refer to actual files from being rewritten. Remember: a single rewrite rule can have any number of ##""RewriteCond""## conditions. Just like ##**!-d**## means "not a directory", ##**!-f**## means "not a file", so we add:
%%(apache)RewriteCond %{REQUEST_FILENAME} !-f%%

>>==Prerequisite:==
~-##""AllowOverride FileInfo""##
~-##""AllowOverride All""##>>Our resulting ##mod_rewrite## section will now look like this:::c::
%%(apache)<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ wikka.php?wakka=$1 [QSA,L]
</IfModule>%%
And URLs will be rewritten **only** if they do not match an existing directory or an existing file. We can also do away with all of the extra ##.htaccess## files.

===Wikka in a subdirectory===

===Side effects===

Using just one ##.htaccess## file to both do URL rewriting and prevent that from happening for actual directories and files actually has advantages as well as disadvantages we need to be aware of (and do something about).

~1)It is simple: just one file to install that takes care of everything. //We could actually delete the old ##.htaccess## files in the subdirectories when doing an update install.//
~1)It is robust and resistant to extensions being added to Wikka: authors and implementors of extensions no longer have to worry whether or not a ##.htaccess## would be needed in a newly-created directory or what it should contain.
~1)It also solves another problem that is raised frequently: uploaded images can now directly be referred to from an image action (and with a relative path). What's more, we could extend our image action so that for an image file "attached" to the current page, only the file name needs to be specified for the url and the image action will know where to find it: this would greatly facilitate the usage of uploaded images for things like screenshots of software, graphs, or maps to support otherwise textual data in pages which wikis traditionally tend to rely on. (Or, for that matter, using a wiki to build an image gallery.)

~1)Advantage 3. above actually implies a disadvantage as well: if uploaded images can be directly linked to, so can //any// uploaded file. But under **Preventing directories from being browsed** above we were trying to prevent access to uploaded files other than by the files action, to make only download possible, and only from a files action on the page the file was "attached" to. Catch 22.
~1)Worse, it would now be possible to directly link to **any** file in Wikka's directory tree. Clearly that was not our intention.

==Possible counter measures==
~1)Provide a configuration option to set whether or not files can be linked to, to what extent, or maybe limited by directory. In addition, provide the Formatter with the smarts to detect links and 'bare' URLs pointing to files within the Wikka tree; depending on configuration options, either allow the link or turn the URL into a link, or turn the link into plain text if a link is not allowed.
~1)The files action itself could also be outfitted with some filters, not merely looking at size, but for instance also at file type (regardless of extension!). There could also be an option to send an email to the WikiAdmin each time a file is uploaded, or to hold uploads for moderation.
~1)The image action should also check whether the file referred to in the src/url parameter really is an image (see EnhancedImageAction for a nearly-finished solution).
~1)A more robust solution would be to store uploaded files directly in the database - or in the file system but outside of the document root so the browser cannot get at them - and provide an interface to them only via the database (both files action and image action would then use this interface). Storing them in the database would also make it possible to store meta data along with the files themselves so for instance the image action could use that to retrieve a (default) alt text for an uploaded file used in the action.

==The golden mean==
While the database solution is probably the most robust, it is also the most work to implement.
A **preliminary solution** (much easier to implement) might be to only exclude directories from rewriting in the ##.htaccess## file, and provide a way to designate a specific page and associated upload directory for images; if we provide an interface for this, we could also automatically generate an ##.htaccess## file for that directory so the images there can be used in an image action. This would still require the extra security provided by the EnhancedImageAction to check that it is handling a real image. //Unfortunately, we would also have to keep the current separate ##.htaccess## files in the subdirectories, and would gain none of the robustness of having a single file to handle the complete configuration.//
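As a rough sketch of this preliminary solution: the main file would exclude only directories from rewriting, and the generated file in the designated image directory would switch rewriting off there. The content of the generated file is an assumption, not an existing implementation.
%%(apache)# Main .htaccess (sketch): leave real directories alone, rewrite everything else
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ wikka.php?wakka=$1 [QSA,L]
</IfModule>%%
%%(apache)# Generated .htaccess for the designated image directory (hypothetical content):
# switch rewriting off here so the image files are served as they are
<IfModule mod_rewrite.c>
RewriteEngine off
</IfModule>%%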

====Putting it all together====
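
A possible starting point, assembled only from the directives proposed in the sections above. This is a sketch, not the file that will be distributed: it assumes Wikka is installed in the document root, that the server allows all the overrides listed in the prerequisites, and it leaves the ##(...|...|...)## list of referrer substrings unfilled.
%%(apache)# Sketch only, combining the proposals on this page
# Go straight to wikka.php when the installation directory is requested
DirectoryIndex wikka.php
# No generated directory listings; hand the resulting 403 to Wikka
Options -Indexes
ErrorDocument 403 /wikka.php
# Referrer spam: the substring list still has to be filled in
SetEnvIfNoCase Referer "\b(...|...|...)\b" BadReferrer
Order Deny,Allow
Deny from env=BadReferrer
# Friendly URLs: rewrite only what is neither a directory nor a file
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ wikka.php?wakka=$1 [QSA,L]
</IfModule>%%
The side effects discussed under **Friendly URLs** apply to this combination unchanged, so the ##!-f## condition in particular is still open to discussion.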

====Extra measures outside of ##.htaccess## files====

All of the above is dependent on ##.htaccess## configuration being enabled - which we cannot be certain of. But some of the mechanisms can be mimicked without that. This section looks at what we can do //without// having ##.htaccess## available.

The current [[Docs:WikkaInstallation | installation and update instructions]] tell the WikiAdmin to go to the installation **directory** with the browser. Since we cannot be sure that ##.htaccess## configuration is enabled or that ##index.php## is in the Apache index search path, this may not actually work. The instruction should be to go to the **##wikka.php## file in that directory** instead.

==Directory index==
As pointed out above, the file name ##index.php## will not necessarily be declared in Apache's standard ##""DirectoryIndex""## directive. If this is the case, a browser may actually end up showing the content of the Wikka installation directory - especially since ##.htaccess## configuration is also not guaranteed to be enabled and whatever we put in there to prevent directory browsing may be ineffective. On the other hand, ##index.html## is practically guaranteed to be in the index search path. We could add a minimal ##index.html## file that does a "meta redirect" and - for browsers and search engines that don't support this - provides a link to ##wikka.php## as well:
%%(html4strict)<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Redirect to home page</title>
<meta http-equiv="refresh" content="0; url=wikka.php" />
</head>
<body>
<h1>Redirect to home page</h1>
<p>If your browser doesn't automatically redirect, follow this link to take you to the <a href="wikka.php">home page</a></p>
</body></html>%%
Note that we don't make any assumptions about the name of the Wikka site; of course a WikiAdmin can edit the file to reflect the name but we can't do it dynamically: this is a static HTML file.

''It's entirely possible that the (root) directory where Wikka is installed already contains an ##index.html## file. To prevent blindly overwriting that file, our file should not come as a physical file in the installation archive, but be **generated dynamically** from the installer program. The installer can detect the pre-existence of an ##index.html## file and ask the WikiAdmin permission for overwriting it (and renaming the original as a backup!). And when we are **generating** the file from the installer, we can also insert the name of the Wikka site, giving a warning to the WikiAdmin that this file must be edited if the name is changed in the configuration at some point. //(A future user interface for maintaining the configuration could do this automatically, of course.)//''

//more later//

==External sites==
~-[[ | Using .htaccess Files with Apache]]
~-[[ | Apache Tutorial: .htaccess files]]
