Revision [3666]
This is an old revision of IncludeRemote made by DarTar on 2004-12-25 19:37:07.
Fetching Remote Wikka Content
FetchRemote v.0.6 available for testing
Download the source and save it as:
actions/fetchremote.php
Feedback is welcome!
FetchRemote Action
Version 0.7
Note:
JavaWoman has done a huge amount of work improving and debugging the link rewrite engine, which now works almost perfectly.
Hope she won't mind if I post the 'debugging version' of the code here ;)
What it does
- Connects to the main Wikka server and fetches Wikka Documentation Pages.
A "raw" handler must be available on the main Wikka server, in order to produce raw wikka-formatted content with header and footer stripped.
- Displays an error message if remote pages do not exist on the server or if a connection is not available.
- Parses the fetched page and rewrites internal links as links to fetchable pages.
- Prints the fetched page locally, together with a header.
- Allows fetched pages to be safely stored on the Wikka client.
- If a page with the same name already exists on the Wikka client, a "see local version" button is displayed instead of the "download" button.
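The link-rewriting step listed above can be sketched as a mark-then-rewrite pass. This is only a minimal illustration, not the action's real code: the patterns are simplified and ASCII-only, the function name `sketch_rewrite_links` and the `$base` parameter are hypothetical, and the output is a plain forced link, whereas the action itself emits an HTML anchor wrapped in a Wikka literal.

```php
<?php
// Sketch only: simplified patterns and hypothetical names, not the action's real code.
const MARKER     = '!!!';                                   // same marker string the action uses
const RE_LITERAL = '"".*?""';                               // Wikka literal: leave untouched
const RE_URL     = '[a-z]+:\/\/\S+';                        // bare URL: leave untouched
const RE_FORCED  = '\[\[([^\s\/\]]+)(?:\s+(.*?))?\]\]';     // [[Page description]]
const RE_CAMEL   = '[A-Z]+[a-z]+[A-Z0-9][A-Za-z0-9]*';      // ASCII-only CamelWord

function sketch_rewrite_links(string $content, string $base): string
{
    // Pass 1: wrap everything that must survive unchanged in markers.
    $content = preg_replace_callback(
        '/' . RE_LITERAL . '|' . RE_URL . '/s',
        fn(array $m): string => MARKER . $m[0] . MARKER,
        $content
    );

    // Pass 2: marked spans are matched first and returned as they are;
    // forced links and CamelWords become links back to the fetching page.
    $content = preg_replace_callback(
        '/' . MARKER . '.*?' . MARKER . '|' . RE_FORCED . '|' . RE_CAMEL . '/s',
        function (array $m) use ($base): string {
            if (strncmp($m[0], MARKER, strlen(MARKER)) === 0) {
                return $m[0];                               // already marked: skip
            }
            $target = (isset($m[1]) && $m[1] !== '') ? $m[1] : $m[0];
            $text   = (isset($m[2]) && $m[2] !== '') ? $m[2] : $target;
            return '[[' . $base . '?page=' . rawurlencode($target) . ' ' . $text . ']]';
        },
        $content
    );

    // Pass 3: strip the markers again.
    return str_replace(MARKER, '', $content);
}
```

Putting the marker pattern first in the second pass is what keeps literals and URLs that contain CamelWords from being rewritten.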
How to use it
Simply add {{fetchremote}} to one of your pages. You can specify a starting page by adding: {{fetchremote page="HomePage"}}
Notes
- Basically, the idea is to make the main Wikka site work as a server providing wikka-formatted content to Wikka-clients. There are several advantages in this approach, compared to merely fetching HTML:
- the fetched content integrates seamlessly with the layout and structure of the Wikka-client;
- the user can choose to download a fetched page locally, so as to make it available on their own Wikka site.
- No MySQL connection to the central database is needed, provided that a method exists for retrieving pure page content with the header and footer stripped;
- Remote fetching of pages through fopen() must be allowed by PHP (the allow_url_fopen setting, which is enabled by default).
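PHP controls URL access for fopen() through the allow_url_fopen configuration setting. A quick way to check it before installing the action might look like the sketch below; the helper name `remote_fopen_allowed` is hypothetical and not part of the action.

```php
<?php
// Hypothetical helper (not part of the action): reports whether URL-aware
// fopen() wrappers are enabled in the running PHP configuration.
function remote_fopen_allowed(): bool
{
    // ini_get() returns the setting as a string such as "1", "On", "0" or "".
    $value = ini_get('allow_url_fopen');
    return filter_var($value, FILTER_VALIDATE_BOOLEAN);
}

if (!remote_fopen_allowed()) {
    echo "allow_url_fopen is disabled: remote pages cannot be fetched with fopen().\n";
}
```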
Long-term development ideas
The potential utility of such a plugin is pretty large. Just think of scenarios in which central Mother-wikis distribute wiki-formatted content to Child-wikis. Providing up-to-date documentation is only one of the possible uses of this plugin.
And now, for something completely different
<mode sci-fi="on">
Imagine that the set of patterns used by the rewrite engine to format the local version of the fetched page were user-configurable and extended beyond link formatting. One day, we could have a plugin to retrieve content from remote 'non-wikka-powered' wikis, translate the wiki content into Wikka syntax and seamlessly integrate/save it locally. Sounds exciting, doesn't it? :)
<mode sci-fi="off">
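To make the idea a little more concrete, here is a purely illustrative sketch of what a user-editable pattern table could look like. Everything in it is an assumption: the `$translation_patterns` table and the `translate_to_wikka` function do not exist in the action, and MediaWiki-style markup is used only as an example of a foreign syntax.

```php
<?php
// Illustration only: a hypothetical, user-editable table mapping
// MediaWiki-style markup to Wikka markup.
$translation_patterns = array(
    "/'''(.+?)'''/"                 => '**$1**',    // bold
    "/''(.+?)''/"                   => '//$1//',    // italic
    '/\[\[([^|\]]+)\|([^\]]+)\]\]/' => '[[$1 $2]]', // piped link -> forced link
);

function translate_to_wikka(string $text, array $patterns): string
{
    // preg_replace() applies the patterns in array order, so bold
    // (three quotes) is handled before italic (two quotes).
    return preg_replace(array_keys($patterns), array_values($patterns), $text);
}
```

The sketch ignores page names containing spaces and anything that needs context (code blocks, literals), which is where a marking pass like the one in the current engine would still be needed.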
The code (actions/fetchremote.php)
Note: I had to modify a line in the code below because it contained two "%" in a row (which broke the code display on this page):
define('PATTERN_CODE', '% %.*?% %'); # ignore code block
Before testing this code please remove the space I added between the two "%":
define('PATTERN_CODE', '%%.*?%%'); # ignore code block
<?php
/**
* Connects to a specified Wikka server, fetches a remote page and formats it for local use.
*
* This action allows the user to locally browse in a Wikka client content fetched from a
* remote Wikka server. It displays an error message if the remote page does not exist
* on the server or if a connection is not available.
* Once a connection is established, the fetched page is parsed for internal links, which
* are rewritten as links to fetchable pages, and printed on the screen.
* Fetched pages can then be safely stored on the Wikka client. If a local version
* of a fetched page is available, a "see local version" button replaces the default
* "download" button.
*
* A "raw" method must be available on the main Wikka server, in order to
* produce raw wikka-formatted content with header and footer stripped.
*
* @package Actions
* @name FetchRemote
*
* @author {@link http://wikka.jsnx.com/DarTar DarTar}
* @author {@link http://wikka.jsnx.com/JavaWoman JavaWoman} - replacing double by single quotes, better patterns
* @version 0.7
* @since Wikka 1.1.X
*
* @input string $page optional: Starting page on the main Wikka server;
* default: WikkaDocumentation
* can be overridden by a $_REQUEST['page'] parameter.
* @output prints fetched documentation pages
*
* @todo -CamelCase link rewriting: check regex for consistency with wikka formatters.
* -Interwiki link rewriting => don't rewrite! just prevent CamelCase rewriting here
*/
// pattern defines
// NOTE: (initial) REs for URL taken from wakka.php formatter - same potential problems.. - now adapted since there WERE indeed problems!
// string used to mark camel words and other strings as "don't replace me"
define('IGNOREMARKER', '!!!');
define('PATTERN_IGNOREMARKER', '!!!'); # @@@ PHP function to escape for RE?
// patterns to be ignored for rewriting
define('PATTERN_IGNORE', PATTERN_IGNOREMARKER.'.*?'.PATTERN_IGNOREMARKER); # string "marked up" to be ignored
//Note: REMOVE spaces between % % in the following line before using the plugin
define('PATTERN_CODE', '% %.*?% %'); # ignore code block
define('PATTERN_LITERAL', '"".*?""'); # ignore Wikka literal
define('PATTERN_ACTION', '{{(?!image).*?}}'); # ignore action _except_ image
define('PATTERN_ATTRIB', '\b(\w*?\s*)(=\s?"[^\n]*?"|=\s?\'[^\n]*?\')'); # attributes (HTML, action)
#define('PATTERN_URL', '\b[a-z]+:\/\/\S+'); # copied from formatter
define('PATTERN_URL', '[a-z]+:\/\/\S+'); # copied from formatter - adapted
#define('PATTERN_URL2', '^([a-z]+:\/\/\S+?)([^[:alnum:]^\/])?$'); # copied from formatter
define('PATTERN_URL2', '\b[a-z]+:\/\/[[:alnum:]][-_[:alnum:]\/@:\.,_\?&;=]+[-_[:alnum:]\/\?&;=]'); # copied from formatter - adapted to recognize more URLs, @@@ not perfect yet
define('PATTERN_INTERWIKI', '\b[A-ZÄÖÜ][A-Za-zÄÖÜßäöü]+[:](?![=_])\S*\b'); # copied from formatter
define('PATTERN_FORCEDURL', '\[\[(?!")'.PATTERN_URL2.'(\s+(.*?))?\]\]'); # forced link with URL (ignore) @@@ (?!") still needed??
// regex pattern for forced links: accept "internal pages" (camelwords) on remote server but ignore URLs
define('PATTERN_FORCED', '\[\[(?!")([^\s\/\]]+)(\s+(.*?))?\]\]'); # forced link not with URL (rewrite) @@@ (?!") still needed??
// regex patterns to recognize a "CamelWord"
#define('PATTERN_CAMELWORD', '[A-Z]+[a-z]+[A-Z][A-Za-z0-9]+'); # @@@ make equivalent to formatter (see below)
#define('PATTERN_CAMELWORD', '\b[A-ZÄÖÜ]+[a-zßäöü]+[A-Z0-9ÄÖÜ][A-Za-z0-9ÄÖÜßäöü]*\b'); # copied from formatter but removed brackets
define('PATTERN_CAMELWORD', '[A-ZÄÖÜ]+[a-zßäöü]+[A-Z0-9ÄÖÜ][A-Za-z0-9ÄÖÜßäöü]*'); # copied from formatter but removed brackets
#define('PATTERN_FREECAMEL', '(\s*)('.PATTERN_CAMELWORD.')'); # @@@ not needed? leave for now
// regex pattern to recognize an image link (image links with URLs are left to the formatter)
define('PATTERN_IMGLINK', 'link="('.PATTERN_CAMELWORD.')"');
/* problems solved so far
forced links:
- a forced link like [[MHM]] just disappeared (see CreateNewPage)
- forced links of the form [[WikiName]]s are misinterpreted (mangled result) (example on WikkaBugsResolved "Interwiki is broken")
- some URLs (in forced links) not recognized but should be ignored (see DarTar)
- forced links on NotifyOnChange not recognized at all (caused by the credits in (single) [] ?) => No: solution: single LinkRewrite!
camelwords:
- JsnX not recognised (see WikkaBugsResolved) => incorrect RE
- Words like Mod040fSmartPageTitles not recognized (see WikkaBugsResolved) => incorrect RE
ignores:
- ignore literals ""[[double bracket]]"" or ""WikiWord"" were rewritten when they shouldn't be (see also CreateNewPage)
- ignore code blocks (may contain forced links or WikiWords)
- ignore URLs that contain camelwords => simply ignore URLs
- URL with embedded camelword on its own on a line: URL not recognized (see Mod039fMindMapMod) => error in preg_replace_callback RE
- ignore InterWiki links (see WikiName for an example; better xmp at WikkaBugsResolved "Interwiki is broken")
- code not recognized on LoggedUsersHomepage and RedirectOnLogin => solution: single LinkRewrite!
- literal not recognized on LoggedUsersHomepage (Camel matched first on ""IntraNet"" - why?) => solution: single LinkRewrite!
- interwiki links broken again in single-function rewrite (see WikkaBugsResolved "Interwiki is broken") => clumsy fix with extra function
- OrphanedPages shows error message:
"Unknown action; an action name can consist only of US-ASCII characters and/or digits." but no page names at all...
=> add ignore for actions
other:
- code blocks may disappear or be broken (see FeedbackAction for an example) => incorrect code block ignore; RE must match over multiple lines
*/
/* outstanding problems
- rewritten image links show up as external links - unavoidable, I think: the image does link to an external URL after rewriting! (see AddingLinks for an example)
*/
/* list of important TEST pages
- CreateNewPage - forced link without description (test correct RE and matching elements for forced links)
- literals that should be ignored (including literal containing URL containing camelword)
- NotifyOnChange - more forced links (test not getting confused by extra [] around forced links)
- WikkaBugsResolved - forced links of the form [[WikiName]]s - see "Interwiki is broken"
- InterWiki links
- camelwords like JsnX and Mod040fSmartPageTitles (test correct RE for camelwords)
- DarTar - forced links with external URLs (test not rewriting such forced links)
- LoggedUsersHomepage - literals to be ignored (such as ""IntraNet"") as well as code blocks
- FeedbackAction - code blocks (containing camelwords, literals and forced links) to be ignored
- FreeMind - forced link with URL containing underscore (test correct URL RE)
- Mod039fMindMapMod - lone camelword on one line followed by lone URL with camelword on next line (test URL RE and preg_replace_callback RE)
- OrphanedPages - action (with camelword!) should be ignored
- AddingLinks - image actions should NOT be ignored
*/
/* (possible) server-side bugs
- XBUG: problem with googleform on UsingActions => cause: bug in googleform itself! => REPORTED on WikkaBugs
- XBUG? OrphanedPages - shown directly starts with an "orphan" '12Action!' (does not exist) followed by page names; database problem?
*/
// SET DEFAULTS
$remote_server_root = 'http://wikka.jsnx.com/'; # set remote server root
//$remote_server_root = "http://test/wikka-1.1.5.0/wikka.php?wakka="; # debug server
$defaultpage = 'WikkaDocumentation'; # define default page to be fetched
if (isset($page)) $defaultpage = $page; # pick up action parameter
if (isset($_REQUEST['page'])) $defaultpage = $_REQUEST['page']; # pick up URL parameter
$page = $defaultpage; # ready to roll
// PERFORM REDIRECTIONS
// redirect to main documentation page
// (the isset() checks avoid "undefined index" notices when no form was submitted)
if (isset($_POST['action']) && $_POST['action'] == 'Return to Wikka Documentation') $this->Redirect($this->GetPageTag());
// redirect to Wikka homepage on disconnection
if (isset($_POST['action']) && $_POST['action'] == 'Disconnect') $this->Redirect($this->GetConfigValue('root_page'));
// switch to local version of the page
if (isset($_POST['action']) && $_POST['action'] == 'See local version') $this->Redirect($page);
// automatically redirect to local page if it exists
// NOTE: the use of this feature is discouraged since it traps users 'locally'
// and prevents them from accessing recently updated versions of the Wikka documentation
//if ($this->LoadPage($page)) $this->Redirect($page);
// SET HEADER & FORM ELEMENTS
// header style
// to be replaced by a CSS selector in the definitive version
$style = 'text-align: center; margin: 30px 25%; border: 1px dotted #333; background-color: #EEE; padding: 5px;';
// build form chunks
$form_local = '<input type="submit" name="action" value="See local version" />'; # i18n
$form_main = '<input type="submit" name="action" value="Return to Wikka Documentation" />'; # i18n
$form_disconnect = '<input type="submit" name="action" value="Disconnect" />'; # i18n
$form_page = '<input type="hidden" name="page" value="'.htmlspecialchars($page).'" />'; # escaped: $page may come from the request
$form_download = '<input type="submit" name="action" value="Download this page" />'; # i18n
// TRY TO CONNECT
$remote_page = fopen($remote_server_root.$page."/raw", "r");
if (!$remote_page) {
// NO CONNECTION AVAILABLE
echo $this->Format('=====Wikka Documentation===== --- Visit the **[[http://wikka.jsnx.com/WikkaDocumentation Wikka Documentation Project]]** --- --- ');
// if a local version of the starting page is available:
if ($this->LoadPage($page)) print $this->FormOpen().$form_local.$this->FormClose();
} else {
// CONNECTION ESTABLISHED
// fetch raw content of remote page
$content = ''; # initialize before appending
while (!feof($remote_page)) {
$content .= fgets($remote_page, 1024);
}
if (!$content)
{
// missing or empty page: show error message
$header = 'Sorry, **';
$header .= '""<a href="'.$this->Href('','','page='.$page).'">'.$page.'</a>""';
$header .= '** cannot be found on the [['.$remote_server_root.$page.' Wikka server]]! --- --- ';
$form = $this->FormOpen().$form_page;
$form .= ($this->LoadPage($page)) ? $form_local : '';
$form .= $form_main.$this->FormClose();
}
else
{
// START LINK-REWRITING ENGINE
// define callback functions
// mark strings to be ignored for rewriting
function MarkIgnore($things)
{
/* DEBUG - remove later
if ('' != $things[0])
{
echo '<br/>START MarkIgnore - $things:<pre>';
print_r($things);
echo '</pre>';
}
/**/
$thing = $things[0];
$output = $thing; # default: return the match unchanged if none of the branches below applies
// ignore things BEFORE looking at forced links or camels
if (
// s modifier to match over multiple lines
// i modifier to make case-insensitive
preg_match('/'.PATTERN_CODE.'/s',$thing) # ignore code block
|| preg_match('/'.PATTERN_LITERAL.'/s',$thing) # ignore literals
|| preg_match('/'.PATTERN_ACTION.'/is',$thing) # ignore actions (keywords are case-insensitive and may be camelword!)
|| preg_match('/'.PATTERN_INTERWIKI.'/',$thing) # ignore Interwiki links
)
{
/* DEBUG - remove later
echo 'CODE, LITERAL or INTERWIKI match: {'.htmlspecialchars($thing).'}<br/>';
/**/
$output = IGNOREMARKER.$thing.IGNOREMARKER; # mark to be ignored
}
// ignore attributes except in image (action) links - MUST come before checking URLs
elseif (preg_match('/'.PATTERN_ATTRIB.'/',$thing,$matches))
{
/* DEBUG - remove later
echo '<br/>ATTRIB match:<pre>';
print_r($matches);
echo '</pre>';
/**/
if ('link' != $matches[1])
{
$output = $matches[1].IGNOREMARKER.$matches[2].IGNOREMARKER;
/* DEBUG - remove later
echo 'ATTRIB output: {'.htmlspecialchars($output).'}<br/>';
/**/
}
else
{
$output = $thing;
/* DEBUG - remove later
echo 'ATTRIB output in image link: {'.htmlspecialchars($output).'}<br/>';
/**/
}
}
// ignore forced links with URLs and 'free' URLs
elseif (
preg_match('/'.PATTERN_FORCEDURL.'/', $thing) # ignore forced links with URLs
|| preg_match('/'.PATTERN_URL2.'/', $thing) # ignore URLs
)
{
/* DEBUG - remove later
if (preg_match('/'.PATTERN_FORCEDURL.'/', $thing)) {
echo '<br/>FORCEDURL or URL match:<pre>';
echo htmlspecialchars($thing);
echo '</pre>';
}
/**/
$output = IGNOREMARKER.$thing.IGNOREMARKER; # mark to be ignored
/* DEBUG - remove later
echo 'REWRITE IGNORE (FORCED) URL - output: {'.htmlentities($output).'}<br/><br/>';
/**/
}
/* DEBUG - remove later
echo 'IGNORE - output: {'.htmlentities($output).'}<br/><br/>';
/**/
return $output;
}
// rewrite links (unless in a to be ignored string)
function RewriteLink($things)
{
/* DEBUG - remove later
if ('' != $things[0])
{
echo '<br/>START RewriteLink - $things:<pre>';
print_r($things);
echo '</pre>';
}
/**/
global $wakka;
$thing = $things[0];
if (preg_match('/'.PATTERN_IGNORE.'/s',$thing)) # already marked as ignore: nothing to do
{
/* DEBUG - remove later
echo 'IGNORE match: {'.htmlspecialchars($thing).'}<br/>';
/**/
$output = $thing;
}
// rewrite forced (non-URL) links
elseif (preg_match('/'.PATTERN_FORCED.'/',$thing,$matches))
{
/* DEBUG - remove later
echo '<br/>FORCED match:<pre>';
print_r($matches);
echo '</pre>';
/**/
if (isset($matches[3]))
#$linktext = preg_replace('/'.PATTERN_CAMELWORD.'/', IGNOREMARKER."$0".IGNOREMARKER, $matches[3]);
$linktext = $matches[3];
else
$linktext = $matches[1]; # use name for forced link without a description (like [[MHM]])
$output = IGNOREMARKER.'""<a href="'.$wakka->Href('','',"page=".$matches[1]).'">'.$linktext.'</a>""'.IGNOREMARKER;
/* DEBUG - remove later
echo 'REWRITE FORCED - output: {'.htmlentities($output).'}<br/><br/>';
/**/
}
// rewrite image links - MUST come before rewriting Camelwords!
elseif (preg_match('/'.PATTERN_IMGLINK.'/',$thing,$matches))
{
/* DEBUG - remove later
echo '<br/>IMGLINK match:<pre>';
print_r($matches);
echo '</pre>';
/**/
$output = 'link="'.$wakka->Href('','',"page=".$matches[1]).'"';
/* DEBUG - remove later
echo 'REWRITE IMGLINK - output: {'.htmlspecialchars($output).'}<br/><br/>';
/**/
}
// rewrite Camelwords
elseif (preg_match('/'.PATTERN_CAMELWORD.'/',$thing,$matches))
{
/* DEBUG - remove later
echo '<br/>CAMEL match:<pre>';
print_r($matches);
echo '</pre>';
/**/
#$output = $matches[1].'""<a href="'.$wakka->Href('','',"page=".$matches[2]).'">'.$matches[2].'</a>""'; # freecamel
$output = '""<a href="'.$wakka->Href('','',"page=".$matches[0]).'">'.$matches[0].'</a>""'; # camelword
/* DEBUG - remove later
echo 'REWRITE CAMEL - output: {'.htmlentities($output).'}<br/><br/>';
/**/
}
// nothing to do
else
{
$output = $thing;
}
return $output;
}
// 1) mark things to be ignored for rewriting (formatter will take care of these when necessary)
$content = preg_replace_callback('/'.
PATTERN_CODE.
'|'.
PATTERN_LITERAL.
'|'.
PATTERN_ACTION.
'|'.
PATTERN_INTERWIKI.
'|'.
PATTERN_FORCEDURL.
'|'.
PATTERN_URL.
'|'.
PATTERN_ATTRIB.
'/s', 'MarkIgnore', $content);
/* DEBUG (!) - remove later
echo '<br/>content before rewriting links:<br/>';
echo '{<pre>'.htmlspecialchars($content).'</pre>}<br/>';
/**/
// 2) rewrite links (unless to be ignored)
$content = preg_replace_callback('/'.
PATTERN_IGNORE. # needed to be able to skip strings to be ignored
'|'.
PATTERN_FORCED. # rewrite
'|'.
PATTERN_IMGLINK. # rewrite
'|'.
PATTERN_CAMELWORD. # rewrite
'/s', 'RewriteLink', $content);
/* DEBUG - remove later
echo '<br/>content before cleaning up ignore markers:<br/>';
echo '{<pre>'.htmlspecialchars($content).'</pre>}<br/>';
/**/
// 3)strip "ignore markers" from content
$content = str_replace(IGNOREMARKER, '', $content);
/* DEBUG - remove later
echo '<br/>content after cleaning up ignore markers:<br/>';
echo '{<pre>'.htmlspecialchars($content).'</pre>}<br/>';
/**/
if (isset($_POST['action']) && "Download this page" == $_POST['action']) # i18n
{
// SAVING FETCHED PAGE
if ($this->LoadPage($page))
{
// local page with this name already exists => display error message
// in the future we might show a form to ask if the local version should be overwritten
$header = 'Sorry, a page named **[['.$page.']]** already exists on this site! --- '; # i18n
$form = $this->FormOpen().$form_main.$form_disconnect.$this->FormClose();
}
else
{
// local page does not exist => proceed
// write page to database and display message
$note = "fetched from the Wikka server"; # i18n
$this->SavePage($page, $content, $note);
$header = 'This page is now available on your site! --- --- '; # i18n
$form = $this->FormOpen().$form_page.$form_local.$form_main.$this->FormClose();
}
}
else
{
// display default header & form # @@@ i18n!!
$header = 'You are currently browsing: **';
$header .= '""<a href="'.$this->Href('','','page='.$page).'">'.$page.'</a>""';
$header .= '** --- from the **[['.$this->GetPageTag().' Wikka Documentation Project]]** --- ';
$header .= '(fetched from the [['.$remote_server_root.$page.' Wikka server]])';
$form = $this->FormOpen().$form_page;
$form .= ($this->LoadPage($page)) ? $form_local : $form_download;
$form .= $form_disconnect.$this->FormClose();
}
}
/* DEBUG - remove later
echo '<br/>content after defining form:<br/>';
echo '|<pre>'.$content.'</pre>|<br/>';
/**/
// PRINT HEADER AND CONTENT
print '<div style="'.$style.'">'.$this->Format($header).$form.'</div>'.$this->Format($content);
}
// CLOSE CONNECTION
if ($remote_page) fclose($remote_page); # only close the handle if the connection was opened
?>
/**
* Connects to a specified Wikka server, fetches a remote page and formats it for local use.
*
* This action allows the user to locally browse in a Wikka client content fetched from a
* remote Wikka server. It displays an error message if the remote page does not exist
* on the server or if a connection is not available.
* Once a connection is established, the fetched page is parsed for internal links, which
* are rewritten as links to fetchable pages, and printed on the screen.
* Fetched pages can then be safely stored on the Wikka client. If a local version
* of a fetched page is available, a "see local version" button replaces the default
* "download" button.
*
* A "raw" method must be available on the main Wikka server, in order to
* produce raw wikka-formatted content with header and footer stripped.
*
* @package Actions
* @name FetchRemote
*
* @author {@link http://wikka.jsnx.com/DarTar DarTar}
* @author {@link http://wikka.jsnx.com/JavaWoman JavaWoman} - replacing double by single quotes, better patterns
* @version 0.7
* @since Wikka 1.1.X
*
* @input string $page optional: Starting page on the main Wikka server;
* default: WikkaDocumentation
* can be overridden by a $_REQUEST['page'] parameter.
* @output prints fetched documentation pages
*
* @todo -CamelCase link rewriting: check regex for consistency with wikka formatters.
* -Interwiki link rewriting => don't rewrite! just prevent CamelCase rewriting here
*/
// pattern defines
// NOTE: (initial) REs for URL taken from wakka.php formatter - same potential problems.. - now adapted since there WERE indeed problems!
// string to mark a "don't replace me" camel words and other strings with
define('IGNOREMARKER', '!!!');
define('PATTERN_IGNOREMARKER', '!!!'); # @@@ PHP function to escape for RE?
// patterns to be ignored for rewriting
define('PATTERN_IGNORE', PATTERN_IGNOREMARKER.'.*?'.PATTERN_IGNOREMARKER); # string "marked up" to be ignored
//Note: REMOVE spaces between % % in the following line before using the plugin
define('PATTERN_CODE', '% %.*?% %'); # ignore code block
define('PATTERN_LITERAL', '"".*?""'); # ignore Wikka literal
define('PATTERN_ACTION', '{{(?!image).*?}}'); # ignore action _except_ image
define('PATTERN_ATTRIB', '\b(\w*?\s*)(=\s?"[^\n]*?"|=\s?\'[^\n]*?\')'); # attributes (HTML, action)
#define('PATTERN_URL', '\b[a-z]+:\/\/\S+'); # copied from formatter
define('PATTERN_URL', '[a-z]+:\/\/\S+'); # copied from formatter - adapted
#define('PATTERN_URL2', '^([a-z]+:\/\/\S+?)([^[:alnum:]^\/])?$'); # copied from formatter
define('PATTERN_URL2', '\b[a-z]+:\/\/[[:alnum:]][-_[:alnum:]\/@:\.,_\?&;=]+[-_[:alnum:]\/\?&;=]'); # copied from formatter - adapted to recognize more URLs, @@@ not perfect yet
define('PATTERN_INTERWIKI', '\b[A-Zƒ÷‹][A-Za-zƒ÷‹?‰ˆ¸]+[:](?![=_])\S*\b'); # copied from formatter
define('PATTERN_FORCEDURL', '\[\[(?!")'.PATTERN_URL2.'(\s+(.*?))?\]\]'); # forced link with URL (ignore) @@@ (?!") still needed??
// regex pattern for forced links: accept "internal pages" (camelwords) on remote server but ignore URLs
define('PATTERN_FORCED', '\[\[(?!")([^\s\/\]]+)(\s+(.*?))?\]\]'); # forced link not with URL (rewrite) @@@ (?!") still needed??
// regex patterns to recognize a "CamelWord"
#define('PATTERN_CAMELWORD', '[A-Z]+[a-z]+[A-Z][A-Za-z0-9]+'); # @@@ make equivalent to formatter (see below)
#define('PATTERN_CAMELWORD', '\b[A-Zƒ÷‹]+[a-z?‰ˆ¸]+[A-Z0-9ƒ÷‹][A-Za-z0-9ƒ÷‹?‰ˆ¸]*\b'); # copied from formatter but removed brackets
define('PATTERN_CAMELWORD', '[A-Zƒ÷‹]+[a-z?‰ˆ¸]+[A-Z0-9ƒ÷‹][A-Za-z0-9ƒ÷‹?‰ˆ¸]*'); # copied from formatter but removed brackets
#define('PATTERN_FREECAMEL', '(\s*)('.PATTERN_CAMELWORD.')'); # @@@ not needed? leave for now
// regex pattern to recognize an image link (imaghe links with URLs are left to the formatter)
define('PATTERN_IMGLINK', 'link="('.PATTERN_CAMELWORD.')"');
/* problems solved so far
forced links:
- a forced link like [[MHM]] just disappeared (see CreateNewPage)
- forced links of the form [[WikiName]]s are misinterpreted (mangled result) (example on WikkaBugsResolved "Interwiki is broken")
- some URLs (in forced links) not recognized but should be ignored (see DarTar)
- forced links on NotifyOnChange not recognized at all (caused by the credits in (single) [] ?) => No: solution: single LinkRewrite!
camelwords:
- JsnX not recognised (see WikkaBugsResolved) => incorrect RE
- Words like Mod040fSmartPageTitles not recognized (see WikkaBugsResolved) => incorrect RE
ignores:
- ignore literals ""[[double bracket]]"" or ""WikiWord"" were rewritten when they shouldn't be (see also CreateNewPage)
- ignore code blocks (may contain forced links or WikiWords)
- ignore URLs that contain camelwords => simply ignore URLs
- URL with embedded camelword on its own on a line: URL not recognized (see Mod039fMindMapMod) => error in preg_replace_callback RE
- ignore InterWiki links (see WikiName for an example; better xmp at WikkaBugsResolved "Interwiki is broken")
- code not recognized on LoggedUsersHomepage and RedirectOnLogin => solution: single LinkRewrite!
- literal not recognized on LoggedUsersHomepage (Camel matched first on ""IntraNet"" - why?) => solution: single LinkRewrite!
- interwiki links broken again in single-function rewrite (see WikkaBugsResolved "Interwiki is broken") => clumsy fix with extra function
- OrphanedPages shows error message:
"Unknown action; an action name can consist only of US-ASCII characters and/or digits." but no page names at all...
=> add ignore for actions
other:
- code blocks may disappear or be broken (see FeedbackAction for an example) => incorrect code block ignore; RE must be match over multiple lines
*/
/* outstanding problems
- rewritten image links show up as external links - unavoidable, I think: the image does link to an external URL after rewriting! (see AddingLinks for an example)
*/
/* list of important TEST pages
- CreateNewPage - forced link without description (test correct RE and matching elements for forced links)
- literals that should be ignored (including literal containing URL containing camelword)
- NotifyOnChange - more forced links (test not getting confused by extra [] around forced links)
- WikkaBugsResolved - forced links of the form [[WikiName]]s - see "Interwiki is broken"
- InterWiki links
- camelwords like JsnX and Mod040fSmartPageTitles (test correct RE for camelwords)
- DarTar - forced links with external URLs (test not rewriting such forced links)
- LoggedUsersHomepage - literals to be ignored (such as ""IntraNet"") as well as code blocks
- FeedbackAction - code blocks (containing camelwords, literals and forced links) to be ignored
- FreeMind - forced link with URL containing underscore (test correct URL RE)
- Mod039fMindMapMod - lone camelword on one line followed by lone URL with camelword on next line (test URL RE and preg_replace_callback RE)
- OrphanedPages - action (with camelword!) should be ignored
- AddingLinks - image actions should NOT be ignored
*/
/* (possible) server-side bugs
- XBUG: problem with googleform on UsingActions => cause: bug in googleform itself! => REPORTED on WikkaBugs
- XBUG? OrphanedPages - shown directly starts with an "orphan" '12Action!' (does not exist) followed by page names; database problem?
*/
// SET DEFAULTS
$remote_server_root = 'http://wikka.jsnx.com/'; # set remote server root
//$remote_server_root = "http://test/wikka-1.1.5.0/wikka.php?wakka="; # debug server
$defaultpage = 'WikkaDocumentation'; # define default page to be fetched
if (isset($page)) $defaultpage = $page; # pick up action parameter
if (isset($_REQUEST['page'])) $defaultpage = $_REQUEST['page']; # pick up URL parameter
$page = $defaultpage; # ready to roll
// PERFORM REDIRECTIONS
// redirect to main documentation page
if ($_POST['action'] == 'Return to Wikka Documentation') $this->Redirect($this->GetPageTag());
// redirect to Wikka homepage on disconnection
if ($_POST['action'] == 'Disconnect') $this->Redirect($this->GetConfigValue('root_page'));
// switch to local version of the page
if ($_POST['action'] == 'See local version') $this->Redirect($page);
// automatically redirect to local page if it exists
// NOTE: the use of this feature is discouraged since it traps users 'locally'
// and prevents them from accessing recently updated versions of the Wikka documentation
//if ($this->LoadPage($page)) $this->Redirect($page);
// SET HEADER & FORM ELEMENTS
// header style
// to be replaced by a CSS selector in the definitive version
$style = 'text-align: center; margin: 30px 25%; border: 1px dotted #333; background-color: #EEE; padding: 5px;';
// build form chunks
$form_local = '<input type="submit" name="action" value="See local version" />'; # i18n
$form_main = '<input type="submit" name="action" value="Return to Wikka Documentation" />'; # i18n
$form_disconnect = '<input type="submit" name="action" value="Disconnect" />'; # i18n
$form_page = '<input type="hidden" name="page" value="'.$page.'" />';
$form_download = '<input type="submit" name="action" value="Download this page" />'; # i18n
// TRY TO CONNECT
$remote_page = fopen($remote_server_root.$page."/raw", "r");
if (!$remote_page) {
// NO CONNECTION AVAILABLE
echo $this->Format('=====Wikka Documentation===== --- Visit the **[[http://wikka.jsnx.com/WikkaDocumentation Wikka Documentation Project]]** --- --- ');
// if a local version of the starting page is available:
if ($this->LoadPage($page)) print $this->FormOpen().$form_local.$this->FormClose();
} else {
// CONNECTION ESTABLISHED
// fetch raw content of remote page
while (!feof($remote_page)) {
$content .= fgets($remote_page, 1024);
}
if (!$content)
{
// missing or empty page: show error message
$header = 'Sorry, **';
$header .= '""<a href="'.$this->Href('','','page='.$page).'">'.$page.'</a>""';
$header .= '** cannot be found on the [['.$remote_server_root.$page.' Wikka server]]! --- --- ';
$form = $this->FormOpen().$form_page;
$form .= ($this->LoadPage($page)) ? $form_local : '';
$form .= $form_main.$this->FormClose();
}
else
{
// START LINK-REWRITING ENGINE
// define callback functions
// mark strings to be ignored for rewriting
function MarkIgnore($things)
{
/* DEBUG - remove later
if ('' != $things[0])
{
echo '<br/>START MarkIgnore - $things:<pre>';
print_r($things);
echo '</pre>';
}
/**/
$thing = $things[0];
// ignore things BEFORE looking at forced links or camels
if (
// s modifier to match over multiple lines
// i modifier to make case-insensitive
preg_match('/'.PATTERN_CODE.'/s',$thing) # ignore code block
|| preg_match('/'.PATTERN_LITERAL.'/s',$thing) # ignore literals
|| preg_match('/'.PATTERN_ACTION.'/is',$thing) # ignore actions (keywords are case-insensitive and may be camelword!)
|| preg_match('/'.PATTERN_INTERWIKI.'/',$thing) # ignore Interwiki links
)
{
/* DEBUG - remove later
echo 'CODE, LITERAL or INTERWIKI match: {'.htmlspecialchars($thing).'}<br/>';
/**/
$output = IGNOREMARKER.$thing.IGNOREMARKER; # mark to be ignored
}
// ignore attributes except in image (action) links - MUST come before checking URLs
elseif (preg_match('/'.PATTERN_ATTRIB.'/',$thing,$matches))
{
/* DEBUG - remove later
echo '<br/>ATTRIB match:<pre>';
print_r($matches);
echo '</pre>';
/**/
if ('link' != $matches[1])
{
$output = $matches[1].IGNOREMARKER.$matches[2].IGNOREMARKER;
/* DEBUG - remove later
echo 'ATTRIB output: {'.htmlspecialchars($output).'}<br/>';
/**/
}
else
{
$output = $thing;
/* DEBUG - remove later
echo 'ATTRIB output in image link: {'.htmlspecialchars($output).'}<br/>';
/**/
}
}
// ignore forced links with URLs and 'free' URLs
elseif (
preg_match('/'.PATTERN_FORCEDURL.'/', $thing) # ignore forced links with URLs
|| preg_match('/'.PATTERN_URL2.'/', $thing) # ignore URLs
)
{
/* DEBUG - remove later
if (preg_match('/'.PATTERN_FORCEDURL.'/', $thing)) {
echo '<br/>FORCEDURL or URL match:<pre>';
echo htmlspecialchars($thing);
echo '</pre>';
}
/**/
$output = IGNOREMARKER.$thing.IGNOREMARKER; # mark to be ignored
/* DEBUG - remove later
echo 'REWRITE IGNORE (FORCED) URL - output: {'.htmlentities($output).'}<br/><br/>';
/**/
}
/* DEBUG - remove later
echo 'IGNORE - output: {'.htmlentities($output).'}<br/><br/>';
/**/
return $output;
}
// rewrite links (unless in a to be ignored string)
function RewriteLink($things)
{
/* DEBUG - remove later
if ('' != $things[0])
{
echo '<br/>START RewriteLink - $things:<pre>';
print_r($things);
echo '</pre>';
}
/**/
global $wakka;
$thing = $things[0];
if (preg_match('/'.PATTERN_IGNORE.'/s',$thing)) # already marked as ignore: nothing to do
{
/* DEBUG - remove later
echo 'IGNORE match: {'.htmlspecialchars($thing).'}<br/>';
/**/
$output = $thing;
}
// rewrite forced (non-URL) links
elseif (preg_match('/'.PATTERN_FORCED.'/',$thing,$matches))
{
/* DEBUG - remove later
echo '<br/>FORCED match:<pre>';
print_r($matches);
echo '</pre>';
/**/
if (isset($matches[3]))
#$linktext = preg_replace('/'.PATTERN_CAMELWORD.'/', IGNOREMARKER."$0".IGNOREMARKER, $matches[3]);
$linktext = $matches[3];
else
$linktext = $matches[1]; # use name for forced link without a description (like [[MHM]])
$output = IGNOREMARKER.'""<a href="'.$wakka->Href('','',"page=".$matches[1]).'">'.$linktext.'</a>""'.IGNOREMARKER;
/* DEBUG - remove later
echo 'REWRITE FORCED - output: {'.htmlentities($output).'}<br/><br/>';
/**/
}
// rewrite image links - MUST come before rewriting Camelwords!
elseif (preg_match('/'.PATTERN_IMGLINK.'/',$thing,$matches))
{
/* DEBUG - remove later
echo '<br/>IMGLINK match:<pre>';
print_r($matches);
echo '</pre>';
/**/
$output = 'link="'.$wakka->Href('','',"page=".$matches[1]).'"';
/* DEBUG - remove later
echo 'REWRITE IMGLINK - output: {'.htmlspecialchars($output).'}<br/><br/>';
/**/
}
// rewrite Camelwords
elseif (preg_match('/'.PATTERN_CAMELWORD.'/',$thing,$matches))
{
/* DEBUG - remove later
echo '<br/>CAMEL match:<pre>';
print_r($matches);
echo '</pre>';
/**/
#$output = $matches[1].'""<a href="'.$wakka->Href('','',"page=".$matches[2]).'">'.$matches[2].'</a>""'; # freecamel
$output = '""<a href="'.$wakka->Href('','',"page=".$matches[0]).'">'.$matches[0].'</a>""'; # camelword
/* DEBUG - remove later
echo 'REWRITE CAMEL - output: {'.htmlentities($output).'}<br/><br/>';
/**/
}
// nothing to do
else
{
$output = $thing;
}
return $output;
}
// 1) mark things to be ignored for rewriting (formatter will take care of these when necessary)
$content = preg_replace_callback('/'.
PATTERN_CODE.
'|'.
PATTERN_LITERAL.
'|'.
PATTERN_ACTION.
'|'.
PATTERN_INTERWIKI.
'|'.
PATTERN_FORCEDURL.
'|'.
PATTERN_URL.
'|'.
PATTERN_ATTRIB.
'/s', 'MarkIgnore', $content);
/* DEBUG (!) - remove later
echo '<br/>content before rewriting links:<br/>';
echo '{<pre>'.htmlspecialchars($content).'</pre>}<br/>';
/**/
// 2) rewrite links (unless to be ignored)
$content = preg_replace_callback('/'.
PATTERN_IGNORE. # needed to be able to skip strings to be ignored
'|'.
PATTERN_FORCED. # rewrite
'|'.
PATTERN_IMGLINK. # rewrite
'|'.
PATTERN_CAMELWORD. # rewrite
'/s', 'RewriteLink', $content);
/* DEBUG - remove later
echo '<br/>content before cleaning up ignore markers:<br/>';
echo '{<pre>'.htmlspecialchars($content).'</pre>}<br/>';
/**/
// 3) strip "ignore markers" from content
$content = str_replace(IGNOREMARKER, '', $content);
/* DEBUG - remove later
echo '<br/>content after cleaning up ignore markers:<br/>';
echo '{<pre>'.htmlspecialchars($content).'</pre>}<br/>';
/**/
if (isset($_POST['action']) && "Download this page" == $_POST['action']) # i18n
{
// SAVING FETCHED PAGE
if ($this->LoadPage($page))
{
// local page with this name already exists => display error message
// in the future we might show a form to ask if the local version should be overwritten
$header = 'Sorry, a page named **[['.$page.']]** already exists on this site! --- '; # i18n
$form = $this->FormOpen().$form_main.$form_disconnect.$this->FormClose();
}
else
{
// local page does not exist => proceed
// write page to database and display message
$note = "fetched from the Wikka server"; # i18n
$this->SavePage($page, $content, $note);
$header = 'This page is now available on your site! --- --- '; # i18n
$form = $this->FormOpen().$form_page.$form_local.$form_main.$this->FormClose();
}
}
else
{
// display default header & form # @@@ i18n!!
$header = 'You are currently browsing: **';
$header .= '""<a href="'.$this->Href('','','page='.$page).'">'.$page.'</a>""';
$header .= '** --- from the **[['.$this->GetPageTag().' Wikka Documentation Project]]** --- ';
$header .= '(fetched from the [['.$remote_server_root.$page.' Wikka server]])';
$form = $this->FormOpen().$form_page;
$form .= ($this->LoadPage($page)) ? $form_local : $form_download;
$form .= $form_disconnect.$this->FormClose();
}
}
/* DEBUG - remove later
echo '<br/>content after defining form:<br/>';
echo '|<pre>'.$content.'</pre>|<br/>';
/**/
// PRINT HEADER AND CONTENT
print '<div style="'.$style.'">'.$this->Format($header).$form.'</div>'.$this->Format($content);
}
// CLOSE CONNECTION
fclose($remote_page);
?>
-- DarTar
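The three passes of the link-rewriting engine (mark what must be ignored, rewrite what is left, strip the markers) can be tried in isolation with the sketch below. It is only an illustration under simplifying assumptions: the DEMO_* constants, the marker character and demo_rewrite_links() are hypothetical stand-ins and are not part of the action, and the patterns are far cruder than the real PATTERN_* definitions in the source above.

```php
<?php
// Hypothetical, simplified stand-ins for IGNOREMARKER and the PATTERN_* constants.
define('DEMO_MARKER', "\x01");                                  // assumption: never occurs in page text
define('DEMO_PATTERN_LITERAL', '"".*?""');                      // ""literal text""
define('DEMO_PATTERN_URL', '\bhttps?:\/\/\S+');                 // free URL
define('DEMO_PATTERN_IGNORE', DEMO_MARKER.'.*?'.DEMO_MARKER);   // anything already marked
define('DEMO_PATTERN_CAMELWORD', '\b[A-Z][a-z]+(?:[A-Z][a-z]+)+\b');

function demo_rewrite_links($content, $base)
{
    // 1) wrap everything that must not be rewritten in ignore markers
    $content = preg_replace_callback(
        '/'.DEMO_PATTERN_LITERAL.'|'.DEMO_PATTERN_URL.'/s',
        function ($m) { return DEMO_MARKER.$m[0].DEMO_MARKER; },
        $content
    );
    // 2) rewrite CamelWords; marked spans match the first alternative
    //    and are handed back untouched, so nothing inside them is rewritten
    $content = preg_replace_callback(
        '/'.DEMO_PATTERN_IGNORE.'|'.DEMO_PATTERN_CAMELWORD.'/s',
        function ($m) use ($base) {
            if ($m[0][0] === DEMO_MARKER) {
                return $m[0];
            }
            return '""<a href="'.$base.rawurlencode($m[0]).'">'.$m[0].'</a>""';
        },
        $content
    );
    // 3) strip the ignore markers again
    return str_replace(DEMO_MARKER, '', $content);
}

echo demo_rewrite_links('See HomePage, not ""SomeLiteral"" or http://example.com/FooBar', '?page='), "\n";
// prints: See ""<a href="?page=HomePage">HomePage</a>"", not ""SomeLiteral"" or http://example.com/FooBar
```

Only HomePage is rewritten: the literal and the URL both contain CamelWords, but they were marked in the first pass, so the second pass skips them.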
The code contains references to HelpInfo which has now disappeared and been replaced by WikkaDocumentation - I haven't updated your code here, but I am updating the copy I'm working on... --JavaWoman
done -- DarTar
CategoryDevelopment