How to optimize Wikka?


Some faster queries

If we change SELECT * in LoadAllPages to SELECT tag, owner, we save both time and memory. (We only need $page["tag"] and $page["owner"] in the pages that use LoadAllPages.)
Likewise, LoadRecentlyChanged only needs SELECT tag, time, user, note. --DotMG
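A minimal sketch of the idea, using a hypothetical helper `build_select()` (not part of Wikka) that lists only the columns a caller needs instead of `*`. The `latest = 'Y'` filter is assumed from Wikka's page queries and may differ in your version:

```php
<?php
// Hypothetical helper (not part of Wikka): builds a SELECT that fetches only
// the listed columns instead of "*", so MySQL sends less data and PHP keeps
// smaller row arrays in memory.
function build_select(string $table_prefix, string $table, array $columns, string $tail = ''): string
{
    $cols = $columns ? implode(', ', $columns) : '*';  // fall back to * if no columns given
    $sql = "SELECT $cols FROM " . $table_prefix . $table;
    return $tail === '' ? $sql : "$sql $tail";
}

// LoadAllPages only needs tag and owner:
echo build_select('wikka_', 'pages', ['tag', 'owner'], "WHERE latest = 'Y' ORDER BY tag"), "\n";
// SELECT tag, owner FROM wikka_pages WHERE latest = 'Y' ORDER BY tag

// LoadRecentlyChanged only needs tag, time, user and note:
echo build_select('wikka_', 'pages', ['tag', 'time', 'user', 'note'], "WHERE latest = 'Y' ORDER BY time DESC"), "\n";
```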

See also Efficiency in the getCatMembers() section of CompatibilityCode for a few similar changes. --JavaWoman

Optimizing the Number of Database Queries

OK, I installed Wikka on a new host and observed rather slow page generation times, especially for pages with many wiki words and user names (e.g. RecentChanges). I turned on sql_debugging and, to my horror, saw Wikka performing nearly 60 database queries to build the RecentChanges page! Looking at the code, the reason was immediately obvious: recentchanges.php performs one database query for every edit to check whether the user is registered, and another to check whether the page ACL allows a link to be shown. So with 50 recent changes you can expect at least 100 queries! The answer is to cache the list of users and ACL page names (and page names too, for good measure) the first time a query is performed, and to use the cached version from then on. So I've modified $wakka->ExistsPage and $wakka->LoadAllACLs, and created a new $wakka->ExistsUser function, all of which cache their results:
    function ExistsPage($page)
    {
        if (!isset($this->tagCache))
        {
            $this->tagCache = array();
            $query = "SELECT DISTINCT tag FROM ".$this->config['table_prefix']."pages";
            if ($r = $this->Query($query))
            {
                while($row = mysql_fetch_row($r))
                {
                    $this->tagCache[]=strtolower($row[0]);
                }
                mysql_free_result($r);
            }
        }      
        return is_int(array_search(strtolower($page),$this->tagCache));
    }

    function ExistsUser($name) {
        if (!isset($this->userCache))
        {
            $this->userCache = array();
            $query = "SELECT DISTINCT name FROM ".$this->config['table_prefix']."users";
            if ($r = $this->Query($query))
            {
                while($row = mysql_fetch_row($r))
                {
                    $this->userCache[]=strtolower($row[0]);
                }
                mysql_free_result($r);
            }
        }
        return is_int(array_search(strtolower($name),$this->userCache));
    }
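The caches above store names in a plain list, so every lookup with array_search() scans the whole list. Below is a sketch of an alternative (not the code above) that stores the lowercased names as array keys, so each lookup is a constant-time isset(). The `NameCache` class and its loader callback are hypothetical; the callback stands in for the single SELECT DISTINCT query and mysql_fetch_row() loop:

```php
<?php
// Sketch: a name cache keyed by lowercased name, filled once on first use.
// $loader is a hypothetical stand-in for the single SELECT DISTINCT query.
class NameCache
{
    private $names = null;   // null until the first lookup
    private $loader;
    public $loads = 0;       // counts how often the loader (i.e. the query) ran

    public function __construct(callable $loader)
    {
        $this->loader = $loader;
    }

    public function exists(string $name): bool
    {
        if ($this->names === null) {
            $this->names = [];
            $this->loads++;
            foreach (($this->loader)() as $row) {
                $this->names[strtolower($row)] = true;  // key lookup instead of a list scan
            }
        }
        return isset($this->names[strtolower($name)]);
    }
}

$users = new NameCache(function () { return ['IanAndolina', 'DotMG', 'JavaWoman']; });
var_dump($users->exists('dotmg'));         // bool(true)
var_dump($users->exists('SomeStranger'));  // bool(false)
echo $users->loads, "\n";                  // 1: the "query" ran only once
```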

The new $wakka->LoadAllACLs loads just the page_tag values from the acls table (which only stores values that differ from the defaults). Only if the requested tag is one of the pages with modified ACLs will it load the ACL values; otherwise it uses the defaults and avoids a query. Before this change, it ALWAYS queried the database, even if the page had no ACL entry!
    function LoadAllACLs($tag, $useDefaults = 1)
    {
        if (!isset($this->aclnameCache))
        {
            $this->aclnameCache = array();
            $query = "SELECT page_tag FROM ".$this->config['table_prefix']."acls";
            if ($r = $this->Query($query))
            {
                while($row = mysql_fetch_row($r))
                {
                    $this->aclnameCache[]=strtolower($row[0]);
                }
                mysql_free_result($r);
            }
        }
        if (is_int(array_search(strtolower($tag), $this->aclnameCache)))
        {
            // this page has non-default ACLs stored in the database
            $acl = $this->LoadSingle("SELECT * FROM ".$this->config["table_prefix"]."acls WHERE page_tag = '".mysql_real_escape_string($tag)."' LIMIT 1");
        }
        elseif ($useDefaults)
        {
            // no stored ACLs: use the defaults without querying the database
            $acl = array("page_tag" => $tag, "read_acl" => $this->GetConfigValue("default_read_acl"), "write_acl" => $this->GetConfigValue("default_write_acl"), "comment_acl" => $this->GetConfigValue("default_comment_acl"));
        }
        else
        {
            $acl = NULL;  // no stored ACLs and the caller asked for no defaults
        }
        return $acl;
    }

Normally, $wakka->Link uses $wakka->LoadPage to check whether a page is an existing wiki page. LoadPage does have a cache of sorts, but it caches whole pages, which with many large pages takes up much more memory. Now that we have the lightweight, fast ExistsPage and ExistsUser functions, let's modify $wakka->Link and actions/recentchanges.php and see what we can improve.

$wakka->Link: we just change $this->LoadPage to $this->ExistsPage and $linkedPage['tag'] to $tag
    else
    {
        // it's a wiki link
        if ($_SESSION["linktracking"] && $track) $this->TrackLinkTo($tag);
        $linkedPage = $this->ExistsPage($tag);
        // return ($linkedPage ? "<a href=\"".$this->Href($method, $linkedPage['tag'])."\">".$text."</a>" : "<span class=\"missingpage\">".$text."</span><a href=\"".$this->Href("edit", $tag)."\" title=\"Create this page\">?</a>");
        // ExistsPage() returns a boolean, so test it directly ($linkedPage>=0 would always be true in PHP)
        return ($linkedPage ? "<a href=\"".$this->Href($method, $tag)."\" title=\"$title\">".$text."</a>" : "<a class=\"missingpage\" href=\"".$this->Href("edit", $tag)."\" title=\"Create this page\">".$text."</a>");
    }

And actions/recentchanges.php to use our new ExistsUser:
            $timeformatted = date("H:i T", strtotime($page["time"]));
            $page_edited_by = $page["user"];
            if (!$this->ExistsUser($page_edited_by)) $page_edited_by .= " (unregistered user)";
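To see why this matters, here is a sketch that counts queries for a RecentChanges-like list of 50 edits, once with one lookup per edit and once with a cache filled by a single query. `FakeDb` and both labelling functions are hypothetical stand-ins that only count calls instead of talking to MySQL:

```php
<?php
// Hypothetical stand-in for the database: it only counts "queries".
class FakeDb
{
    public $queries = 0;
    private $users;

    public function __construct(array $users) { $this->users = $users; }

    public function userExists(string $name): bool   // one query per call
    {
        $this->queries++;
        return in_array($name, $this->users, true);
    }

    public function allUsers(): array                 // one query for everything
    {
        $this->queries++;
        return $this->users;
    }
}

// Uncached: one lookup per edit, like the stock recentchanges.php.
function label_edits_uncached(FakeDb $db, array $editors): array
{
    $out = [];
    foreach ($editors as $e) {
        $out[] = $db->userExists($e) ? $e : "$e (unregistered user)";
    }
    return $out;
}

// Cached: fetch all user names once, then reuse them for every edit.
function label_edits_cached(FakeDb $db, array $editors): array
{
    $known = array_flip($db->allUsers());
    $out = [];
    foreach ($editors as $e) {
        $out[] = isset($known[$e]) ? $e : "$e (unregistered user)";
    }
    return $out;
}

$editors = array_merge(array_fill(0, 25, 'IanAndolina'), array_fill(0, 25, '127.0.0.1'));
$a = new FakeDb(['IanAndolina', 'DotMG']);
$b = new FakeDb(['IanAndolina', 'DotMG']);
label_edits_uncached($a, $editors);
label_edits_cached($b, $editors);
echo $a->queries, " vs ", $b->queries, "\n";  // 50 vs 1
```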


Benchmarks:
OK, I have two sites that use the same database: one is a stock Wikka install and the other is my modified Wikka (the database queries are listed at the bottom of each page):

http://nontroppo.dreamhosters.com/wikka2/RecentChanges - the standard wikka
http://nontroppo.dreamhosters.com/wikka/RecentChanges - the optimized wikka

On the RecentChanges page, the optimizations have reduced the database queries from 61 to just 6!

Standard: 61 queries take an average (5 reloads) of 0.8769 seconds
Optimized: 6 queries take an average (5 reloads) of 0.2481 seconds (>70% faster)

On the PageIndex, the changes have reduced the database queries from 29 to just 6!

Standard: 29 queries take an average (5 reloads) of 0.3907 seconds
Optimized: 5 queries take an average (5 reloads) of 0.1628 seconds (>50% faster)
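The percentages can be checked from the averaged times above; "faster" here means the relative reduction in generation time, shown alongside the equivalent speed-up factor:

```latex
\begin{aligned}
\text{RecentChanges:}\quad & \frac{0.8769 - 0.2481}{0.8769} = \frac{0.6288}{0.8769} \approx 0.717 \;(71.7\%\ \text{less time}), & \frac{0.8769}{0.2481} \approx 3.53\times \\
\text{PageIndex:}\quad & \frac{0.3907 - 0.1628}{0.3907} = \frac{0.2279}{0.3907} \approx 0.583 \;(58.3\%\ \text{less time}), & \frac{0.3907}{0.1628} \approx 2.40\times
\end{aligned}
```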

IanAndolina


What if we serve CSS and JavaScript files with Content-Encoding: gzip?

See WikkaOptimizationCompressedStaticFiles for an approach to achieve this.
To save bandwidth, we can use gzip content encoding for text files such as CSS and JavaScript. I tried using the mime_types.txt file distributed with Wikka, but CSS files were then served as application/x-ilinc-pointplus, because the css extension is registered with that content type. Any advice?
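elseif (preg_match('/\.css$/', $this->method))
{
    #header('Location: css/' . $this->method); We replace this with:
    $filename = 'css/' . basename($this->method);  #basename() keeps the request inside css/
    $gzfile = $filename . '.gz';
    #Only send the compressed copy to clients that accept gzip; others get the plain file
    if (isset($_SERVER['HTTP_ACCEPT_ENCODING']) && strpos($_SERVER['HTTP_ACCEPT_ENCODING'], 'gzip') !== FALSE && file_exists($gzfile))
    {
        $filename = $gzfile;
    }
    if (file_exists($filename))
    {
        $etag = '"' . md5($filename . filemtime($filename) . filesize($filename)) . '"'; #Same etag as long as the file is unchanged; RFC 2616 requires the quotes
        $expiry = gmdate('D, d M Y H:i:s', time() + 28512000); #expires after 330 days (about 11 months); HTTP dates need a two-digit day and hour
        header("ETag: $etag");
        header('Vary: Accept-Encoding'); #the response body depends on Accept-Encoding
        if (isset($_SERVER['HTTP_IF_NONE_MATCH']) && strstr($_SERVER['HTTP_IF_NONE_MATCH'], $etag))
        {
            header('HTTP/1.1 304 Not Modified');
            die();
        }
        if ($filename == $gzfile) header('Content-Encoding: gzip');
        header('Content-Length: ' . filesize($filename));
        header("Expires: $expiry GMT");
        header('Cache-Control: public, must-revalidate');
        header('Content-Type: text/css'); #Very important, because php scripts will be served as text/html by default
        readfile($filename);
        die();
    }
    else
    {
        header('HTTP/1.1 404 Not Found');
        die();
    }
}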
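The handler above assumes a pre-compressed css/NAME.css.gz already sits next to each stylesheet. One way to produce those files is sketched below; `precompress_css()` is a hypothetical helper (not part of Wikka) and needs PHP's zlib extension for gzencode():

```php
<?php
// Sketch: write a gzip-compressed copy (file.css.gz) next to each CSS file so
// the handler can serve it with Content-Encoding: gzip.
// precompress_css() is hypothetical; gzencode() requires the zlib extension.
function precompress_css(string $dir): array
{
    $written = [];
    foreach (glob(rtrim($dir, '/') . '/*.css') ?: [] as $css) {
        $gz = $css . '.gz';
        // only (re)compress when the .gz copy is missing or older than the source
        if (!file_exists($gz) || filemtime($gz) < filemtime($css)) {
            file_put_contents($gz, gzencode(file_get_contents($css), 9));
            $written[] = $gz;
        }
    }
    return $written;  // list of .gz files that were (re)written
}

// e.g. run once after deploying or editing stylesheets:
// precompress_css('css');
```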



Wikka's ETag is meaningless

See the code below (found in ./wikka.php):
$etag =  md5($content);
header('ETag: '. $etag);

$content is the content of the page, including the header (action header.php) and the footer (action footer.php). But in footer.php, the phrase 'Generated in x,xxxx seconds' is very rarely the same from one request to the next. Thus a wiki page loaded at time t and reloaded at time t+1 will get two different ETag values.

I think the header and the footer should be excluded when calculating the ETag, i.e. implement the Run method like this:
            print($this->Header());
            $content = $this->Method($this->method);
            echo $content;
            $GLOBALS['ETag'] = md5($content);
            print($this->Footer());
        }
    }
}

and send the ETag header (quoted, as RFC 2616 requires) like this:
header('ETag: "'.$GLOBALS['ETag'].'"');


Another simple way is to use md5 of the date of latest change of the page instead of the content.

$etag = md5("$page_tag : $date_last_change : $date_last_comment"); --DotMG


Question: How does a web server handle the If-Match, If-None-Match and If-Range request headers? Because Wikka sets the ETag header manually, I think it also has to handle these request headers manually.
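For If-None-Match at least, handling it by hand is not much code. The sketch below decides whether a GET can be answered with 304 Not Modified. Unlike a plain strstr(), it compares whole tags, accepts a comma-separated list, treats "*" as matching, and uses weak comparison (ignoring a W/ prefix), as RFC 7232 specifies for If-None-Match. `if_none_match_hits()` is a hypothetical helper, and splitting on commas assumes the tags themselves contain no commas (true for md5-based tags):

```php
<?php
// Sketch: does the If-None-Match request header match the ETag we would send?
// if_none_match_hits() is hypothetical, not existing Wikka code.
function if_none_match_hits(string $header, string $etag): bool
{
    $header = trim($header);
    if ($header === '*') {
        return true;  // "*" matches any current representation
    }
    $strip = function (string $tag): string {
        $tag = trim($tag);
        return strncmp($tag, 'W/', 2) === 0 ? substr($tag, 2) : $tag;  // weak comparison ignores W/
    };
    $mine = $strip($etag);
    foreach (explode(',', $header) as $candidate) {  // assumes no commas inside tags
        if ($strip($candidate) === $mine) {
            return true;
        }
    }
    return false;
}

// typical use:
// if (isset($_SERVER['HTTP_IF_NONE_MATCH']) && if_none_match_hits($_SERVER['HTTP_IF_NONE_MATCH'], $etag)) {
//     header('HTTP/1.1 304 Not Modified');
//     exit();
// }
```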


A Potential Solution for Wikka's Meaningless ETag - Flexible and fast caching!


OK, so based on DotMG's valid critique of the current meaningless ETag output, and wanting to speed up Wikka by only sending pages that have changed, here is some beta code to play with:

Add this to $wakka->Run
    // THE BIG EVIL NASTY ONE!
    function Run($tag, $method = "")
    {
        // do our stuff!
        if (!$this->method = trim($method)) $this->method = "show";
        if (!$this->tag = trim($tag)) $this->Redirect($this->Href("", $this->config["root_page"]));
        if ((!$this->GetUser() && isset($_COOKIE["wikka_user_name"])) && ($user = $this->LoadUser($_COOKIE["wikka_user_name"], $_COOKIE["wikka_pass"]))) $this->SetUser($user);
        $this->SetPage($this->LoadPage($tag, (isset($_REQUEST["time"]) ? $_REQUEST["time"] :'')));
        //This is the new cache mechanism-------------------------------------------------------------
        if (!preg_match($this->config["no_cache"], $tag) && $this->method == "show") //only lets in pages not in the exclusion list
        {
            //the ETag changes when the page revision, its author or the logged-in user changes
            $etag = '"'.md5($this->page["time"].$this->page["user"].$this->GetUserName()).'"';
            $expires = (int) $this->config["cache_age"]; //number of seconds to stay in cache, 0 means check validity each time
            header("ETag: $etag");
            //"private": the page depends on who is logged in, so shared proxies must not reuse it
            header("Cache-Control: private, max-age=".$expires);
            header('Expires: '.gmdate('D, d M Y H:i:s', time() + $expires).' GMT');
            if (isset($_SERVER['HTTP_IF_NONE_MATCH']) && strstr($_SERVER['HTTP_IF_NONE_MATCH'], $etag))
            {
                header("HTTP/1.0 304 Not Modified");
                //ob_end_clean();
                //header('Content-Length: 0');
                die();
            }
        }
        else
        {
            header("Cache-Control: no-cache");
        }
        //Cache mechanism END-------------------------------------------------------------------------

Added to wikka.config.php so an admin can configure this:
"no_cache" => "/(RecentChanges|RecentlyCommented|RecentComments)/",
"cache_age" => "0",


As you can see, a page will only ever return 304 Not Modified if the page date and author haven't changed (and the same user is logged in), the request uses the show method, AND the page doesn't match the regex of pages that should always be served fresh.
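The same conditions, pulled out of Run() into a pure function so they are easy to see and test. `can_send_304()` and the `$page`/`$config` array shapes are assumptions of this sketch, mirroring the fields used in the code above:

```php
<?php
// Sketch: the conditions under which the Run() code answers 304 Not Modified.
// can_send_304() is hypothetical; $page mirrors $this->page (time, user).
function can_send_304(array $config, string $tag, string $method, array $page, string $viewer, string $if_none_match): bool
{
    if ($method !== 'show' || preg_match($config['no_cache'], $tag)) {
        return false;  // other handlers and excluded pages are always served fresh
    }
    $etag = '"' . md5($page['time'] . $page['user'] . $viewer) . '"';
    return strpos($if_none_match, $etag) !== false;
}

$config = ['no_cache' => '/(RecentChanges|RecentlyCommented|RecentComments)/'];
$page   = ['time' => '2005-05-01 10:00:00', 'user' => 'IanAndolina'];
$sent   = '"' . md5($page['time'] . $page['user'] . 'DotMG') . '"';  // ETag from the first visit
var_dump(can_send_304($config, 'HomePage', 'show', $page, 'DotMG', $sent));       // bool(true)
var_dump(can_send_304($config, 'HomePage', 'edit', $page, 'DotMG', $sent));       // bool(false)
var_dump(can_send_304($config, 'RecentChanges', 'show', $page, 'DotMG', $sent));  // bool(false)
```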

cache_age sets the cache validity time in seconds, so a value of 600 lets the client use its cached copy for 10 minutes without revalidating. When it is set to 0, the browser must always send a conditional GET, and it shows the cached content only if the server replies with 304.

The junk at the end of the main wikka.php that sends the current broken headers needs to be removed; with that, you have a simple client-based cache mechanism that serves fresh content when needed. Tested on Opera V8.0build7483 and Firefox V1.01. Someone needs to test it in IE (where angels fear to tread), which has a substantial number of caching bugs... After some superficial testing, IE 6.0 seems to behave like the other browsers!

See it in action here:

http://nontroppo.dreamhosters.com/wikka/HomePage

Problem
The major problem is that if a page is commented on, the cache will not fetch a new page. As DotMG suggested above, a page needs a $date_last_comment, which is then included when computing the ETag. The easiest way would be to add a column to the wikka_pages table and update it with the current date whenever a comment is added to that page. The cache would then always refresh after the latest page change or the latest comment on that page. One could instead query the comments table, but that adds a little overhead and would be slightly slower, so I prefer the new column...


Conditional GET and RSS Feeds


A fair amount of bandwidth is wasted on RSS syndication. Currently Wikka does not manage caching of the recentchanges and revisions RSS feeds. This causes a significant drain, both in bandwidth and in repeated generation of content, and a lot of this waste can be avoided. My suggestions are:

header("Content-type: text/xml");
$etag = md5($this->page["time"].$this->page["user"]);
header('ETag: '.$etag);
header('Cache-Control: cache');
header('Pragma: cache');
if (strstr($_SERVER['HTTP_IF_NONE_MATCH'], $etag))
{
	header('HTTP/1.1 304 Not Modified');
	ob_end_clean();
	exit();
}
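The recentchanges feed appears to list changes from the whole wiki rather than from the current page, so an ETag built from $this->page may not change when another page is edited. A sketch that derives it from the list of changes instead; `feed_etag()` and the row shape (`time`, `tag`, `user`, as LoadRecentlyChanged() is assumed to return) are assumptions:

```php
<?php
// Sketch: ETag for a site-wide recent-changes feed, built from the newest
// change in the list (and the list size) rather than from the current page.
// feed_etag() and the $changes row shape are hypothetical.
function feed_etag(array $changes): string
{
    $newest = '';
    foreach ($changes as $row) {
        // 'Y-m-d H:i:s' timestamps sort correctly as plain strings
        $key = $row['time'] . '|' . $row['tag'] . '|' . $row['user'];
        if (strcmp($key, $newest) > 0) {
            $newest = $key;
        }
    }
    return '"' . md5(count($changes) . '|' . $newest) . '"';
}
```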


These changes will reduce both bandwidth and the resources spent constantly regenerating the RSS feeds. --IanAndolina


(See RFC 2616 for the documentation about ETag; the relevant section is quoted below.)

3.11 Entity Tags
Entity tags are used for comparing two or more entities from the same
requested resource. HTTP/1.1 uses entity tags in the ETag (section
14.19), If-Match (Section 14.24), If-None-Match (Section 14.26), and
If-Range (Section 14.27) header fields. The definition of how they
are used and compared as cache validators is in Section 13.3.3. An
entity tag consists of an opaque quoted string, possibly prefixed by
a weakness indicator.

entity-tag = [ weak ] opaque-tag
weak = "W/"
opaque-tag = quoted-string

A "strong entity tag" MAY be shared by two entities of a resource
only if they are equivalent by octet equality.

A "weak entity tag," indicated by the "W/" prefix, MAY be shared by
two entities of a resource only if the entities are equivalent and
could be substituted for each other with no significant change in
semantics. A weak entity tag can only be used for weak comparison.

An entity tag MUST be unique across all versions of all entities
associated with a particular resource. A given entity tag value MAY
be used for entities obtained by requests on different URIs. The use
of the same entity tag value in conjunction with entities obtained by
requests on different URIs does not imply the equivalence of those
entities.
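Section 13.3.3 of the same RFC defines two ways to compare such tags. A sketch of both, taking tags in their wire form (e.g. "abc" or W/"abc"); the function names are hypothetical. Under these rules, an ETag derived from page metadata describes pages that are equivalent rather than byte-identical (the footer timing still differs), so it could arguably be sent as a weak tag:

```php
<?php
// Sketch of RFC 2616 section 13.3.3 entity-tag comparison.
// All function names here are hypothetical helpers.
function etag_is_weak(string $tag): bool
{
    return strncmp($tag, 'W/', 2) === 0;
}

function etag_opaque(string $tag): string
{
    return etag_is_weak($tag) ? substr($tag, 2) : $tag;  // the quoted-string part
}

// Strong comparison: both tags must be strong and identical.
function etag_strong_equal(string $a, string $b): bool
{
    return !etag_is_weak($a) && !etag_is_weak($b) && $a === $b;
}

// Weak comparison: opaque-tags identical; a W/ prefix on either side is ignored.
function etag_weak_equal(string $a, string $b): bool
{
    return etag_opaque($a) === etag_opaque($b);
}
```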

 




Ticket:133

I have changed all href attributes in anchor tags from absolute to relative URLs with the following modification to the Href function in wikka.php:
	function Href($method = "", $tag = "", $params = "")
	{
		//$href = $this->config["base_url"].$this->MiniHref($method, $tag); //original code
		$href = $this->MiniHref($method, $tag);  //modified code


and added one line to the Redirect function in wikka.php:

	function Redirect($url='', $message='')
	{
		if ($message != '') $_SESSION["redirectmessage"] = $message;
		$url = ($url == '' ) ? $this->Href() : $url;
		$url = $this->config["base_url"].$url;  //added this line 
		header("Location: $url");
		exit;
	}
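HTTP/1.1 (RFC 2616 section 14.30) asks for an absolute URI in the Location header, which is why Redirect has to put base_url back. A variant, sketched here with a hypothetical `make_absolute()` helper, only adds base_url when the target is not already absolute, so callers that still pass a full URL (such as the delete.php and newpage.php calls discussed on this page) would keep working:

```php
<?php
// Sketch: prefix base_url only when $url has no scheme (http://, https://, ...).
// make_absolute() is hypothetical; inside Redirect() it would replace the
// unconditional "$url = $this->config["base_url"].$url;" line.
function make_absolute(string $base_url, string $url): string
{
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        return $url;  // already absolute: leave untouched
    }
    return $base_url . $url;
}
```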

With this change I have reduced the HTML sent to the client by 13% in some cases (the gain can be larger if a page has many internal links and the base URL is long).
Is there any real reason for keeping the absolute URLs, since the base URL is already defined in every page that Wikka sends to the user?
<base href="http://www.example.com/wikka/" />
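A simplified sketch of how a browser resolves such relative links against the page's base href, limited to the plain relative-path form that MiniHref() output would take (no leading "/", "../", "?" or "#"). `resolve_against_base()` is a hypothetical helper and assumes the base contains a "/" and none in its query part. It also shows that whatever follows the last "/" of the base (for example wikka.php?wakka=) is dropped, so relative links like these appear to rely on a base URL that ends in a directory, as with URL rewriting:

```php
<?php
// Sketch: simplified resolution of a relative path against a <base href>.
// resolve_against_base() is hypothetical, for illustration only.
function resolve_against_base(string $base, string $href): string
{
    // keep everything up to and including the last "/" of the base
    $dir = substr($base, 0, strrpos($base, '/') + 1);
    return $dir . $href;
}
```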


ALSO
There are at least two calls to the Redirect function that prepend "base_url" to their argument; they need to be changed if this optimization is applied:

actions/delete.php
        // redirect back to main page
        $this->Redirect($this->config["base_url"], "Page has been deleted!");


should be:
        // redirect back to main page
        $this->Redirect($this->config["root_page"], "Page has been deleted!");

actions/newpage.php
    else
    {
        $url = $this->config['base_url'];
        $this->redirect($url.$pagename.'/edit');
        $showform = FALSE;
    }


should be:
    else
    {
        $this->redirect($pagename.'/edit');
        $showform = FALSE;
    }



All other calls to Redirect seem to work OK.


CategoryDevelopmentArchitecture