=====Fighting spam in Wikka=====

>>**see also:**
~-HideReferrers
~-RemovingUsers
~-WikkaAndEmail
~-DeleteSpamAction
~-AdvancedReferrersHandler
~-SecurityModules
~-EditModeration
>>As it may have dawned on you by now, spam is becoming a problem in wikis. There is the kind of spam that also plagues many blogs as comment spam (in a wiki it can also affect page content), and there is referrer spam. Then there are spambots intent on harvesting email addresses. Wikka sites are no longer an exception (and other WakkaWiki forks seem to be having problems, too).

This page is intended to gather ideas on how to fight spam (of all types) in Wikka, so we can coordinate our efforts and get a spammer-hardened Wikka out there. You will also find some general information about (fighting) wiki spam and about the defense measures Wikka has already implemented.

====Spam in Wikka pages====
~//How to discourage spammers from posting spam links on your pages in the first place, and what to do when your pages have already been spammed.//

===Blocking Agents===
Bad Behavior is a set of PHP scripts which prevents spambots from accessing your site by analyzing their actual HTTP requests and comparing them to profiles from known spambots. (quote from the [[http://www.ioerror.us/software/bad-behavior/ | homepage]])
~&copied to BadBehavior. --NilsLindenberg

===Two Suggestions===
==Content Filter==
Wacko wiki has implemented a content filter based on a word/phrase list. I'm not sure how sophisticated it is (it's not a Bayesian filter), but it uses a list updated from ++chongqed.org++. Read more about it [[http://wackowiki.com/SPAM?v=1dlz | here]]. I thought this might contribute to our conversations about spam fighting. --GmBowen
~&Mike, I see using the blacklist from ++chongqed.org++ mentioned as an option, but I don't see anything saying that this is what has actually been implemented, merely that //some// content filtering has been implemented.
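A word/phrase-list filter of the kind described above can be sketched in a few lines of PHP. This is a minimal, hypothetical sketch, not WackoWiki's or Wikka's actual code: the function names ##loadBlacklist()## and ##findBlacklisted()## are made up for illustration, and it assumes the list is plain text with one word, phrase or domain per line (lines starting with a hash sign are treated as comments). It does a simple case-insensitive substring match against the submitted page body.

%%(php)
<?php
// Hypothetical sketch only: not the WackoWiki or Wikka filter code.
// Assumed list format: one word, phrase or domain per line;
// blank lines and lines starting with '#' are ignored.
function loadBlacklist(string $text): array
{
    $patterns = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue; // skip blank lines and comments
        }
        $patterns[] = strtolower($line);
    }
    return $patterns;
}

// Returns the blacklist entries found in the submitted page body
// (case-insensitive substring match), or an empty array if none match.
function findBlacklisted(string $body, array $patterns): array
{
    $haystack = strtolower($body);
    $hits = [];
    foreach ($patterns as $pattern) {
        if (strpos($haystack, $pattern) !== false) {
            $hits[] = $pattern;
        }
    }
    return $hits;
}

// Example: reject an edit whose body contains a listed domain.
$blacklist = loadBlacklist("# sample list\ncheap-pills.example\nonline casino\n");
$hits = findBlacklisted("Visit [[http://cheap-pills.example | here]]!", $blacklist);
if ($hits) {
    echo "Edit rejected: " . implode(', ', $hits) . "\n";
}
%%

Plain substring matching is cheap but blunt: a short entry can match inside an innocent longer word, so reporting which entries matched makes false positives easier to spot.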
~&The best way to use ++chongqed.org++ is to use their blacklist dynamically. --JavaWoman

Preliminary list of links to (apparent) content-blocking systems in wikis (more as I find them):
~-[[http://esw.w3.org/topic/BadContent | BadContent]] (//MoinMoin//) --JavaWoman

==Bayesian Filter: Focus on the content==
Many of these suggestions will stop a certain amount of spam, but spammers can easily defeat anti-spam measures such as random form tokens (modern spambots can already scan a page for form elements and submit all of them). Therefore, I suggest analyzing the content for what might constitute spam (text frequency, link frequency, blacklist, Bayesian filter) and then assigning a score to the post. If the post has more than, say, a 50% chance of being spam, then email validation, post approval, or a CAPTCHA could be used to further validate the user.

I'm particularly supportive of the Bayesian filter, which many spam-fighting programs use today (e.g. Thunderbird). The Bayesian algorithm is adaptive: it learns, and it works best when used in conjunction with other standard filters. The process might look like this:
~1) A standard filter (e.g. a blacklist) catches a suspicious post. The post is marked for approval.
~1) The admins review the post in the post moderation panel. If the post is "ham", the Bayesian filter adapts so that future posts resembling the approved post are let through. If the post is "spam", the Bayesian filter adapts to block future posts containing those keywords.
A Bayesian filter therefore cannot be used on its own: it requires admin intervention (to help the filter learn) as well as other standard filters. Bayesian filters have been extremely successful, eliminating over 98% of common spam after a few weeks of training.
--MikeXstudios
~&Nice idea, Mike, but do you know of a Bayesian filter implementation in PHP that could be easily integrated with Wikka? Preferably a "lightweight" one, too, as we don't want an anti-spam solution to be weightier than Wikka itself. --JavaWoman
~~&In fact, I do :). I was playing around with [[http://www.phpgeek.com/pragmacms/index.php?layout=main&cslot_1=14 | this small Bayesian filter I found last week]]. It's around 55KB unzipped. Implementing it in Wikka would be less than easy, however. --MikeXstudios

==Adding Random Tokens for Form Submissions?==
// [[Ticket:154]] //
Based on [[http://shiflett.org/archive/96 | this post]], I wonder whether providing randomised session tokens for form submission may add one more step to impede spambots. It is very simple to implement:

wikka.php:
%%(php)
function FormOpen($method = "", $tag = "", $formMethod = "post") {
    if(!isset($_SESSION['token'])) {
        $token = md5(uniqid(rand(), true));
        $_SESSION['token'] = $token;
    }
    $result = "