Revision [3666]

This is an old revision of IncludeRemote made by DarTar on 2004-12-25 19:37:07.

 

Fetching Remote Wikka Content

Last edited by DarTar:
Posting JW's code and sci-fi ideas
Sat, 25 Dec 2004 19:37 UTC


FetchRemote v.0.6 available for testing
Download the source and save it as:
actions/fetchremote.php
Feedback is welcome!
 


FetchRemote Action
Version 0.7

Note:
JavaWoman has done a huge amount of work improving and debugging the link rewrite engine, which now works almost perfectly.
I hope she won't mind if I post the 'debugging version' of the code here ;)

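What it does
This action connects to a remote Wikka server (http://wikka.jsnx.com/ by default), fetches the raw content of a page, rewrites its internal links so that they point to pages that can in turn be fetched through the action, and displays the result on your own Wikka site. If the remote page does not exist or no connection is available, an error message is shown instead. The remote server must provide a "raw" handler that returns the Wikka source of a page with header and footer stripped.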

How to use it
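Simply add {{fetchremote}} to one of your pages.
You can specify a starting page with the page parameter: {{fetchremote page="HomePage"}}
Without it, the action starts from WikkaDocumentation; a page parameter in the request URL overrides both.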
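A rough PHP sketch of how the starting page is resolved, following the order used in the action code (default WikkaDocumentation, then the action parameter, then the URL parameter). resolve_start_page() is a hypothetical helper written only for illustration; the action itself simply reassigns $page inline.

```php
<?php
// Hypothetical helper mirroring the action's inline logic:
// default page < {{fetchremote page="..."}} parameter < page parameter in the URL.
function resolve_start_page(?string $actionParam, array $request, string $default = 'WikkaDocumentation'): string
{
    $page = $default;                    // nothing given: start from the default page
    if ($actionParam !== null) {
        $page = $actionParam;            // action parameter
    }
    if (isset($request['page'])) {
        $page = $request['page'];        // URL parameter overrides both
    }
    return $page;
}

echo resolve_start_page(null, []), "\n";                            // WikkaDocumentation
echo resolve_start_page('HomePage', []), "\n";                      // HomePage
echo resolve_start_page('HomePage', ['page' => 'FreeMind']), "\n";  // FreeMind
```

In the action, $request corresponds to $_REQUEST.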

Notes
    1. the fetched content integrates seamlessly with the layout and structure of the Wikka client;
    1. the user can choose to download a fetched page, making it available locally on their own Wikka site.
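The seamless integration comes from rewriting internal links so that they point back to the page hosting the action. Here is a minimal sketch for forced links that reuses the action's PATTERN_FORCED regex. The base URL is a made-up placeholder for what $this->Href('', '', 'page=...') returns, and rewrite_forced_links() is a hypothetical helper; unlike the action, this sketch also escapes the URL and the link text.

```php
<?php
// Sketch: rewrite [[Page]] and [[Page description]] into links back to the
// local page hosting the action. PATTERN_FORCED is copied from the action;
// rewrite_forced_links() and the base URL are hypothetical.
const PATTERN_FORCED = '\[\[(?!")([^\s\/\]]+)(\s+(.*?))?\]\]';

function rewrite_forced_links(string $text, string $base): string
{
    return preg_replace_callback(
        '/' . PATTERN_FORCED . '/',
        function (array $m) use ($base): string {
            $target = $m[1];
            // no description (like [[MHM]]): fall back to the page name
            $label = (isset($m[3]) && $m[3] !== '') ? $m[3] : $target;
            $href = htmlspecialchars($base . urlencode($target));
            return '<a href="' . $href . '">' . htmlspecialchars($label) . '</a>';
        },
        $text
    );
}

$base = 'http://example.org/wikka.php?wakka=DocBrowser&page=';
echo rewrite_forced_links('[[MHM]], [[CreateNewPage new pages]], [[http://example.org Example]]', $base), "\n";
```

Forced links to URLs, such as [[http://example.org Example]], cannot match because the page-name group stops at the first "/", so they are left for the formatter.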

Long-term development ideas
The potential utility of such a plugin is pretty large. Just think of scenarios in which central Mother-wikis distribute wiki-formatted content to Child-wikis.
Providing up-to-date documentation is only one of the possible uses of this plugin.
And now, for something completely different
<mode sci-fi="on">
Imagine that the set of patterns used by the rewrite engine to format the local version of the fetched page might be user-configurable and extended beyond link formatting. One day, we could have a plugin to retrieve content from remote 'non-wikka-powered' wikis, translate the wiki content into Wikka syntax and seamlessly integrate/save it locally. Sounds exciting, doesn't it? :)
<mode sci-fi="off">
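To make the sci-fi idea slightly more concrete: the rewrite engine could be driven by an ordered, user-editable table of regex => replacement rules. The two rules below translate MediaWiki-style bold and italics into Wikka's **bold** and //italics//. They are illustrative assumptions only, and translate_markup() is a hypothetical function, not part of the action.

```php
<?php
// Ordered rule table: pattern => replacement. Order matters, since '''bold'''
// must be handled before ''italic'' would eat its quotes. Illustrative rules only.
$rules = [
    "/'''(.+?)'''/" => '**$1**',   // bold:   '''text''' -> **text**
    "/''(.+?)''/"   => '//$1//',   // italic: ''text''   -> //text//
];

function translate_markup(string $text, array $rules): string
{
    foreach ($rules as $pattern => $replacement) {
        $text = preg_replace($pattern, $replacement, $text);
    }
    return $text;
}

echo translate_markup("'''Wikka''' is ''fast''.", $rules), "\n";  // **Wikka** is //fast//.
```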

The code (actions/fetchremote.php)

Note: I had to modify one line in the code below because it contained two "%" in a row (which broke the code display on this page):

define('PATTERN_CODE', '% %.*?% %'); # ignore code block

Before testing this code, please remove the spaces I added between each pair of "%":

define('PATTERN_CODE', '%%.*?%%'); # ignore code block
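Why the pattern looks like this: the lazy .*? stops at the first closing marker, so two code blocks on the same page are matched separately, and the /s modifier (used with PATTERN_CODE in the action) lets "." cross line breaks. A quick PHP check, building the double percent with str_repeat() only so that it does not appear literally on this page:

```php
<?php
// PATTERN_CODE rebuilt from its parts: two percent signs, a lazy .*?, two percent signs.
$pp = str_repeat('%', 2);
$pattern = $pp . '.*?' . $pp;

$text = "Intro WikiWord\n"
      . $pp . "\n[[ForcedLink]] inside code\n" . $pp . "\n"
      . "Middle\n"
      . $pp . 'AnotherWord' . $pp . "\n";

// with /s the multi-line block matches too; the lazy .*? keeps the two blocks apart
preg_match_all('/' . $pattern . '/s', $text, $blocks);
echo count($blocks[0]), "\n";   // 2

// without /s, "." cannot cross a newline, so only the one-line block is found
preg_match_all('/' . $pattern . '/', $text, $noS);
echo count($noS[0]), "\n";      // 1
```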


<?php

/**
 * Connects to a specified Wikka server, fetches a remote page and formats it for local use.
 *
 * This action allows the user to locally browse in a Wikka client content fetched from a
 * remote Wikka server. It displays an error message if the remote page does not exist
 * on the server or if a connection is not available.
 * Once a connection is established, the fetched page is parsed for internal links, which
 * are rewritten as links to fetchable pages, and printed on the screen.
 * Fetched pages can then be safely stored on the Wikka client. If a local version
 * of a fetched page is available, a "see local version" button replaces the default
 * "download" button.
 *
 * A "raw" method must be available on the main Wikka server, in order to
 * produce raw wikka-formatted content with header and footer stripped.
 *
 * @package     Actions
 * @name        FetchRemote
 *
 * @author      {@link http://wikka.jsnx.com/DarTar DarTar}
 * @author      {@link http://wikka.jsnx.com/JavaWoman JavaWoman} - replacing double by single quotes, better patterns
 * @version     0.7
 * @since       Wikka 1.1.X
 *
 * @input       string  $page   optional: Starting page on the main Wikka server;
 *              default: WikkaDocumentation
 *              can be overridden by a $_REQUEST['page'] parameter.
 * @output      prints fetched documentation pages
 *
 * @todo        -CamelCase link rewriting: check regex for consistency with wikka formatters.
 *              -Interwiki link rewriting => don't rewrite! just prevent CamelCase rewriting here
 */



// pattern defines
    // NOTE: (initial) REs for URL taken from wakka.php formatter - same potential problems.. - now adapted since there WERE indeed problems!
// string to mark a "don't replace me" camel words and other strings with
define('IGNOREMARKER', '!!!');
define('PATTERN_IGNOREMARKER', preg_quote(IGNOREMARKER, '/'));   # marker escaped for use in /-delimited REs

// patterns to be ignored for rewriting
define('PATTERN_IGNORE', PATTERN_IGNOREMARKER.'.*?'.PATTERN_IGNOREMARKER);  # string "marked up" to be ignored
//Note: REMOVE spaces between % % in the following line before using the plugin
define('PATTERN_CODE', '% %.*?% %');            # ignore code block
define('PATTERN_LITERAL', '"".*?""');                                       # ignore Wikka literal
define('PATTERN_ACTION', '{{(?!image).*?}}');                               # ignore action _except_ image
define('PATTERN_ATTRIB', '\b(\w*?\s*)(=\s?"[^\n]*?"|=\s?\'[^\n]*?\')');     # attributes (HTML, action)
#define('PATTERN_URL', '\b[a-z]+:\/\/\S+');                                                         # copied from formatter
define('PATTERN_URL', '[a-z]+:\/\/\S+');                                                            # copied from formatter - adapted
#define('PATTERN_URL2', '^([a-z]+:\/\/\S+?)([^[:alnum:]^\/])?$');                                   # copied from formatter
define('PATTERN_URL2', '\b[a-z]+:\/\/[[:alnum:]][-_[:alnum:]\/@:\.,_\?&;=]+[-_[:alnum:]\/\?&;=]');  # copied from formatter - adapted to recognize more URLs, @@@ not perfect yet
define('PATTERN_INTERWIKI', '\b[A-ZÄÖÜ][A-Za-zÄÖÜßäöü]+[:](?![=_])\S*\b');                          # copied from formatter
define('PATTERN_FORCEDURL', '\[\[(?!")'.PATTERN_URL2.'(\s+(.*?))?\]\]');    # forced link with URL (ignore)         @@@ (?!") still needed??

// regex pattern for forced links: accept "internal pages" (camelwords) on remote server but ignore URLs
define('PATTERN_FORCED', '\[\[(?!")([^\s\/\]]+)(\s+(.*?))?\]\]');           # forced link not with URL (rewrite)    @@@ (?!") still needed??

// regex patterns to recognize a "CamelWord"
#define('PATTERN_CAMELWORD', '[A-Z]+[a-z]+[A-Z][A-Za-z0-9]+');                                      # @@@ make equivalent to formatter (see below)
#define('PATTERN_CAMELWORD', '\b[A-ZÄÖÜ]+[a-zßäöü]+[A-Z0-9ÄÖÜ][A-Za-z0-9ÄÖÜßäöü]*\b');              # copied from formatter but removed brackets
define('PATTERN_CAMELWORD', '[A-ZÄÖÜ]+[a-zßäöü]+[A-Z0-9ÄÖÜ][A-Za-z0-9ÄÖÜßäöü]*');                   # copied from formatter but removed brackets
#define('PATTERN_FREECAMEL', '(\s*)('.PATTERN_CAMELWORD.')');                                       # @@@ not needed? leave for now

// regex pattern to recognize an image link (image links with URLs are left to the formatter)
define('PATTERN_IMGLINK', 'link="('.PATTERN_CAMELWORD.')"');

/* problems solved so far
forced links:
- a forced link like [[MHM]] just disappeared (see CreateNewPage)
- forced links of the form [[WikiName]]s are misinterpreted (mangled result) (example on WikkaBugsResolved "Interwiki is broken")
- some URLs (in forced links) not recognized but should be ignored (see DarTar)
- forced links on NotifyOnChange not recognized at all (caused by the credits in (single) [] ?) => No: solution: single LinkRewrite!
camelwords:
- JsnX not recognised (see WikkaBugsResolved) => incorrect RE
- Words like Mod040fSmartPageTitles not recognized (see WikkaBugsResolved) => incorrect RE
ignores:
- ignore literals ""[[double bracket]]"" or ""WikiWord"" were rewritten when they shouldn't be (see also CreateNewPage)
- ignore code blocks (may contain forced links or WikiWords)
- ignore URLs that contain camelwords => simply ignore URLs
- URL with embedded camelword on its own on a line: URL not recognized (see Mod039fMindMapMod) => error in preg_replace_callback RE
- ignore InterWiki links (see WikiName for an example; better example at WikkaBugsResolved "Interwiki is broken")
- code not recognized on LoggedUsersHomepage and RedirectOnLogin => solution: single LinkRewrite!
- literal not recognized on LoggedUsersHomepage (Camel matched first on ""IntraNet"" - why?) => solution: single LinkRewrite!
- interwiki links broken again in single-function rewrite (see WikkaBugsResolved "Interwiki is broken") => clumsy fix with extra function
- OrphanedPages shows error message:
    "Unknown action; an action name can consist only of US-ASCII characters and/or digits." but no page names at all...
    => add ignore for actions
other:
- code blocks may disappear or be broken (see FeedbackAction for an example) => incorrect code block ignore; RE must match over multiple lines
*/


/* outstanding problems
- rewritten image links show up as external links - unavoidable, I think: the image does link to an external URL after rewriting! (see AddingLinks for an example)
*/


/* list of important TEST pages
- CreateNewPage         - forced link without description (test correct RE and matching elements for forced links)
                        - literals that should be ignored (including literal containing URL containing camelword)
- NotifyOnChange        - more forced links (test not getting confused by extra [] around forced links)
- WikkaBugsResolved     - forced links of the form [[WikiName]]s - see "Interwiki is broken"
                        - InterWiki links
                        - camelwords like JsnX and Mod040fSmartPageTitles (test correct RE for camelwords)
- DarTar                - forced links with external URLs (test not rewriting such forced links)
- LoggedUsersHomepage   - literals to be ignored (such as ""IntraNet"") as well as code blocks
- FeedbackAction        - code blocks (containing camelwords, literals and forced links) to be ignored
- FreeMind              - forced link with URL containing underscore (test correct URL RE)
- Mod039fMindMapMod     - lone camelword on one line followed by lone URL with camelword on next line (test URL RE and preg_replace_callback RE)
- OrphanedPages         - action (with camelword!) should be ignored
- AddingLinks           - image actions should NOT be ignored
*/


/* (possible) server-side bugs
- XBUG: problem with googleform on UsingActions => cause: bug in googleform itself! => REPORTED on WikkaBugs
- XBUG? OrphanedPages   - when shown directly, the list starts with an "orphan" '12Action!' (which does not exist) followed by page names; database problem?
*/


// SET DEFAULTS

$remote_server_root = 'http://wikka.jsnx.com/'; # set remote server root
//$remote_server_root = "http://test/wikka-1.1.5.0/wikka.php?wakka="; # debug server

$defaultpage = 'WikkaDocumentation'; # define default page to be fetched
if (isset($page)) $defaultpage = $page; # pick up action parameter
if (isset($_REQUEST['page'])) $defaultpage = $_REQUEST['page']; # pick up URL parameter
$page = $defaultpage; # ready to roll

// PERFORM REDIRECTIONS

// redirect to main documentation page
if ($_POST['action'] == 'Return to Wikka Documentation') $this->Redirect($this->GetPageTag());

// redirect to Wikka homepage on disconnection
if ($_POST['action'] == 'Disconnect') $this->Redirect($this->GetConfigValue('root_page'));

// switch to local version of the page
if ($_POST['action'] == 'See local version') $this->Redirect($page);

// automatically redirect to local page if it exists
// NOTE: the use of this feature is discouraged since it traps users 'locally'
// and prevents them from accessing recently updated versions of the Wikka documentation
//if ($this->LoadPage($page)) $this->Redirect($page);

// SET HEADER & FORM ELEMENTS

// header style
// to be replaced by a CSS selector in the definitive version
$style = 'text-align: center; margin: 30px 25%; border: 1px dotted #333; background-color: #EEE; padding: 5px;';

// build form chunks
$form_local = '<input type="submit" name="action" value="See local version" />';            # i18n
$form_main = '<input type="submit" name="action" value="Return to Wikka Documentation" />'; # i18n
$form_disconnect = '<input type="submit" name="action" value="Disconnect" />';              # i18n
$form_page = '<input type="hidden" name="page" value="'.$page.'" />';
$form_download = '<input type="submit" name="action" value="Download this page" />';        # i18n


// TRY TO CONNECT
$remote_page = @fopen($remote_server_root.$page.'/raw', 'r'); # @ suppresses PHP's own warning; a failed connection is handled below

if (!$remote_page) {

    // NO CONNECTION AVAILABLE
    echo $this->Format('=====Wikka Documentation===== --- Visit the **[[http://wikka.jsnx.com/WikkaDocumentation Wikka Documentation Project]]** --- --- ');
    // if a local version of the starting page is available:
    if ($this->LoadPage($page)) print $this->FormOpen().$form_local.$this->FormClose();

} else {

    // CONNECTION ESTABLISHED

    // fetch raw content of remote page
    $content = '';
    while (!feof($remote_page)) {
        $content .= fgets($remote_page, 1024);
    }

    if (!$content)
    {
        // missing or empty page: show error message
        $header = 'Sorry, **';
        $header .=  '""<a href="'.$this->Href('','','page='.$page).'">'.$page.'</a>""';
        $header .= '** cannot be found on the [['.$remote_server_root.$page.' Wikka server]]! --- --- ';
        $form = $this->FormOpen().$form_page;
        $form .= ($this->LoadPage($page)) ? $form_local : '';
        $form .= $form_main.$this->FormClose();
    }
    else
    {

        // START LINK-REWRITING ENGINE

        // define callback functions
        // mark strings to be ignored for rewriting
        function MarkIgnore($things)
        {
/* DEBUG - remove later
if ('' != $things[0])
{
    echo '<br/>START MarkIgnore - $things:<pre>';
    print_r($things);
    echo '</pre>';
}
/**/

            $thing = $things[0];
            // ignore things BEFORE looking at forced links or camels
            if (
                // s modifier to match over multiple lines
                // i modifier to make case-insensitive
                    preg_match('/'.PATTERN_CODE.'/s',$thing)                            # ignore code block
                ||  preg_match('/'.PATTERN_LITERAL.'/s',$thing)                         # ignore literals
                ||  preg_match('/'.PATTERN_ACTION.'/is',$thing)                         # ignore actions (keywords are case-insensitive and may be camelword!)
                ||  preg_match('/'.PATTERN_INTERWIKI.'/',$thing)                        # ignore Interwiki links
                )
            {
/* DEBUG - remove later
echo 'CODE, LITERAL or INTERWIKI match: {'.htmlspecialchars($thing).'}<br/>';
/**/

                $output = IGNOREMARKER.$thing.IGNOREMARKER;                             # mark to be ignored
            }
            // ignore attributes except in image (action) links - MUST come before checking URLs
            elseif (preg_match('/'.PATTERN_ATTRIB.'/',$thing,$matches))
            {
/* DEBUG - remove later
echo '<br/>ATTRIB match:<pre>';
print_r($matches);
echo '</pre>';
/**/

                if ('link' != $matches[1])
                {
                    $output = $matches[1].IGNOREMARKER.$matches[2].IGNOREMARKER;
/* DEBUG - remove later
echo 'ATTRIB output: {'.htmlspecialchars($output).'}<br/>';
/**/

                }
                else
                {
                    $output = $thing;
/* DEBUG - remove later
echo 'ATTRIB output in image link: {'.htmlspecialchars($output).'}<br/>';
/**/

                }
            }
            // ignore forced links with URLs and 'free' URLs
            elseif (
                    preg_match('/'.PATTERN_FORCEDURL.'/', $thing)                       # ignore forced links with URLs
                ||  preg_match('/'.PATTERN_URL2.'/', $thing)                            # ignore URLs
                )
            {
/* DEBUG - remove later
if (preg_match('/'.PATTERN_FORCEDURL.'/', $thing)) {
    echo '<br/>FORCEDURL or URL match:<pre>';
    echo htmlspecialchars($thing);
    echo '</pre>';
}
/**/

                $output = IGNOREMARKER.$thing.IGNOREMARKER;                             # mark to be ignored
/* DEBUG - remove later
echo 'REWRITE IGNORE (FORCED) URL - output: {'.htmlentities($output).'}<br/><br/>';
/**/

            }
            else
            {
                $output = $thing;                                                       # no ignore rule applies: leave unchanged
            }
/* DEBUG - remove later
echo 'IGNORE - output: {'.htmlentities($output).'}<br/><br/>';
/**/

            return $output;
        }

        // rewrite links (unless in a to be ignored string)
        function RewriteLink($things)
        {
/* DEBUG - remove later
if ('' != $things[0])
{
    echo '<br/>START RewriteLink - $things:<pre>';
    print_r($things);
    echo '</pre>';
}
/**/

            global $wakka;
            $thing = $things[0];
            if (preg_match('/'.PATTERN_IGNORE.'/s',$thing))                         # already marked as ignore: nothing to do
            {
/* DEBUG - remove later
echo 'IGNORE match: {'.htmlspecialchars($thing).'}<br/>';
/**/

                $output = $thing;
            }
            // rewrite forced (non-URL) links
            elseif (preg_match('/'.PATTERN_FORCED.'/',$thing,$matches))
            {
/* DEBUG - remove later
echo '<br/>FORCED match:<pre>';
print_r($matches);
echo '</pre>';
/**/

                if (isset($matches[3]))
                    #$linktext = preg_replace('/'.PATTERN_CAMELWORD.'/', IGNOREMARKER."$0".IGNOREMARKER, $matches[3]);
                    $linktext = $matches[3];
                else
                    $linktext = $matches[1];                                            # use name for forced link without a description (like [[MHM]])
                $output = IGNOREMARKER.'""<a href="'.$wakka->Href('','',"page=".$matches[1]).'">'.$linktext.'</a>""'.IGNOREMARKER;
/* DEBUG - remove later
echo 'REWRITE FORCED - output: {'.htmlentities($output).'}<br/><br/>';
/**/

            }
            // rewrite image links -  MUST come before rewriting Camelwords!
            elseif (preg_match('/'.PATTERN_IMGLINK.'/',$thing,$matches))
            {
/* DEBUG - remove later
echo '<br/>IMGLINK match:<pre>';
print_r($matches);
echo '</pre>';
/**/

                $output = 'link="'.$wakka->Href('','',"page=".$matches[1]).'"';
/* DEBUG - remove later
echo 'REWRITE IMGLINK - output: {'.htmlspecialchars($output).'}<br/><br/>';
/**/

            }
            // rewrite Camelwords
            elseif (preg_match('/'.PATTERN_CAMELWORD.'/',$thing,$matches))
            {
/* DEBUG - remove later
echo '<br/>CAMEL match:<pre>';
print_r($matches);
echo '</pre>';
/**/

                #$output = $matches[1].'""<a href="'.$wakka->Href('','',"page=".$matches[2]).'">'.$matches[2].'</a>""'; # freecamel
                $output = '""<a href="'.$wakka->Href('','',"page=".$matches[0]).'">'.$matches[0].'</a>""';              # camelword
/* DEBUG - remove later
echo 'REWRITE CAMEL - output: {'.htmlentities($output).'}<br/><br/>';
/**/

            }
            // nothing to do
            else
            {
                $output = $thing;
            }
            return $output;
        }

        // 1) mark things to be ignored for rewriting (the formatter will take care of these when necessary)
        $content = preg_replace_callback('/'.
            PATTERN_CODE.
            '|'.
            PATTERN_LITERAL.
            '|'.
            PATTERN_ACTION.
            '|'.
            PATTERN_INTERWIKI.
            '|'.
            PATTERN_FORCEDURL.
            '|'.
            PATTERN_URL.
            '|'.
            PATTERN_ATTRIB.
            '/s', 'MarkIgnore', $content);

/* DEBUG (!) - remove later
echo '<br/>content before rewriting links:<br/>';
echo '{<pre>'.htmlspecialchars($content).'</pre>}<br/>';
/**/


        // 2) rewrite links (unless to be ignored)
        $content = preg_replace_callback('/'.
            PATTERN_IGNORE.                                                         # needed to be able to skip strings to be ignored
            '|'.
            PATTERN_FORCED.                                                         # rewrite
            '|'.
            PATTERN_IMGLINK.                                                        # rewrite
            '|'.
            PATTERN_CAMELWORD.                                                      # rewrite
            '/s', 'RewriteLink', $content);

/* DEBUG - remove later
echo '<br/>content before cleaning up ignore markers:<br/>';
echo '{<pre>'.htmlspecialchars($content).'</pre>}<br/>';
/**/


        // 3) strip "ignore markers" from content
        $content = str_replace(IGNOREMARKER, '', $content);

/* DEBUG - remove later
echo '<br/>content after cleaning up ignore markers:<br/>';
echo '{<pre>'.htmlspecialchars($content).'</pre>}<br/>';
/**/


        if ("Download this page" == $_POST['action'])                                   # i18n
        {
            // SAVING FETCHED PAGE
            if ($this->LoadPage($page))
            {
                // local page with this name already exists => display error message
                // in the future we might show a form to ask if the local version should be overwritten
                $header = 'Sorry, a page named **[['.$page.']]** already exists on this site! --- ';    # i18n
                $form = $this->FormOpen().$form_main.$form_disconnect.$this->FormClose();
            }
            else
            {
                // local page does not exist => proceed
                // write page to database and display message
                $note = "fetched from the Wikka server";                                # i18n
                $this->SavePage($page, $content, $note);
                $header = 'This page is now available on your site! --- --- ';          # i18n
                $form = $this->FormOpen().$form_page.$form_local.$form_main.$this->FormClose();
            }
        }
        else
        {
            // display default header & form                                            # @@@ i18n!!
            $header  = 'You are currently browsing: **';
            $header .=  '""<a href="'.$this->Href('','','page='.$page).'">'.$page.'</a>""';
            $header .= '** --- from the **[['.$this->GetPageTag().' Wikka Documentation Project]]** --- ';
            $header .= '(fetched from the [['.$remote_server_root.$page.' Wikka server]])';
            $form  = $this->FormOpen().$form_page;
            $form .= ($this->LoadPage($page)) ? $form_local : $form_download;
            $form .= $form_disconnect.$this->FormClose();
        }
    }
/* DEBUG - remove later
echo '<br/>content after defining form:<br/>';
echo '|<pre>'.$content.'</pre>|<br/>';
/**/


    // PRINT HEADER AND CONTENT
    print '<div style="'.$style.'">'.$this->Format($header).$form.'</div>'.$this->Format($content);
}

// CLOSE CONNECTION (only if one was actually opened)
if ($remote_page) fclose($remote_page);
?>


-- DarTar
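The rewrite engine works in three passes: mark everything that must be left alone, rewrite links in a single combined pass that skips marked spans, then strip the markers. Below is a stripped-down PHP sketch of the same idea, assuming only two rules (Wikka literals are protected, ASCII CamelWords are linked). rewrite() and the '?page=' base are hypothetical; the real action handles many more patterns (code blocks, actions, InterWiki links, URLs, forced links, image links).

```php
<?php
// Three-pass sketch, ASCII-only and with just two rules.
const MARK = '!!!';                                   // same marker as IGNOREMARKER
const P_LITERAL = '"".*?""';                          // Wikka literal, as in PATTERN_LITERAL
const P_CAMEL = '[A-Z]+[a-z]+[A-Z0-9][A-Za-z0-9]*';   // simplified PATTERN_CAMELWORD

function rewrite(string $content, string $base): string
{
    $mark = preg_quote(MARK, '/');

    // 1) wrap strings that must not be rewritten in ignore markers
    $content = preg_replace_callback('/' . P_LITERAL . '/s',
        fn(array $x): string => MARK . $x[0] . MARK,
        $content);

    // 2) single combined pass: a marked span is matched as a whole and
    //    returned unchanged, so CamelWords inside it are never rewritten
    $content = preg_replace_callback('/' . $mark . '.*?' . $mark . '|' . P_CAMEL . '/s',
        function (array $x) use ($base): string {
            if (strpos($x[0], MARK) === 0) {
                return $x[0];
            }
            // "" around the link lets Wikka's formatter pass the HTML through, as in the action
            return '""<a href="' . $base . $x[0] . '">' . $x[0] . '</a>""';
        },
        $content);

    // 3) strip the ignore markers
    return str_replace(MARK, '', $content);
}

echo rewrite('See WikkaDocumentation, but not ""WikiWord"".', '?page='), "\n";
```

Without step 1, the CamelWord alternative would also fire inside ""WikiWord"", which is the literal bug listed under "problems solved so far".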

The code contains references to HelpInfo which has now disappeared and been replaced by WikkaDocumentation - I haven't updated your code here, but I am updating the copy I'm working on... --JavaWoman

done -- DarTar

CategoryDevelopment