Revision [2760]
This is an old revision of ValidPageNames made by JavaWoman on 2004-12-02 12:36:21.
Pagename validation and link formatters
I am opening this page to discuss problems with pagename validation and the underlying regexes needed to validate and format both CamelCase and forced links.
Current pattern for valid pagetags
$validtag = "/^[A-Z,a-z,ÄÖÜ,ßäöü]+[A-Z,a-z,0-9,ÄÖÜ,ßäöü]*$/s";
Some considerations off the cuff:
- The German Eszett (ß) can't appear at the beginning of a word in any language, so we might drop it from the first character class.
- If we are to allow accented characters in valid page tags (are we?), we should also consider allowing other characters, for instance èéêëñç, that are part of the ISO-8859-1 charset.
- We should prevent non-escaped URIs from being parsed as pagetags, or at least encode them before applying a validator: http://wikka.jsnx.com/ÄrgerMich is correctly encoded, but what if a user pastes this URL directly into the address field of a browser?
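A minimal sketch of the encode-before-validate idea from the last point. It assumes the tag arrives as ISO-8859-1 bytes, and `normalizeTag()` is a hypothetical helper, not an existing Wikka function. The only point is that the validator should always see the same raw form of the tag, whether or not the URL was escaped.

```php
<?php
// Hypothetical helper: undo any percent-encoding so that the validator
// always sees the raw tag, whether or not the URL was escaped.
function normalizeTag(string $tagFromUrl): string
{
    return rawurldecode($tagFromUrl);
}

$raw = "\xC4rgerMich";          // "ÄrgerMich" as ISO-8859-1 bytes (assumption)
$escaped = rawurlencode($raw);  // percent-encodes the non-ASCII byte

var_dump($escaped);                         // "%C4rgerMich"
var_dump(normalizeTag($escaped) === $raw);  // bool(true)
var_dump(normalizeTag($raw) === $raw);      // bool(true): raw input passes through unchanged
```

A tag containing a literal `%` would be altered by the decoding step, but such a tag could not pass the validator anyway.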
Apart from a possible "German" origin, I never understood the bias here towards allowing German characters but not the non-ASCII characters used in other languages. That said, I don't think an RE should look for a "word" but merely for a "string-consisting-of-letters-and-digits-and-starting-with-a-letter". By using a hex encoding inside the RE for "letters" we would also make this encoding-independent, and thus not limited to ISO-8859-1 (why not a Turkish Wiki with Turkish page (and user) names?).
Also, the commas in that RE are puzzling - do we allow a Wiki name to start with or contain a comma? I think not - and in that case they should go.
Give me a moment and I'll come up with an alternative RE to match what I propose... --JavaWoman
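As a rough illustration of the two points above (not the alternative RE promised here), the sketch below first shows that the commas are literal members of the character classes, then tries a "letter, then letters or digits" pattern with the letters written as hex byte ranges. The ranges assume ISO-8859-1; another single-byte charset would need its own ranges. `$letters`, `$proposed` and `isValidTag()` are hypothetical names.

```php
<?php
// The current pattern, reduced to its ASCII part so the commas stand out.
$current = '/^[A-Z,a-z]+[A-Z,a-z,0-9]*$/s';

// Inside a character class a comma is an ordinary member, not a separator.
var_dump(preg_match($current, 'Home,Page'));  // int(1)
var_dump(preg_match($current, ',,,'));        // int(1)

// Assumption: ISO-8859-1 letters as hex byte ranges
// (0xD7 and 0xF7 are skipped: they are the multiplication and division signs).
$letters = 'A-Za-z\xC0-\xD6\xD8-\xF6\xF8-\xFF';

// Hypothetical replacement: one letter, then any number of letters or digits.
// The D modifier makes $ match only at the very end of the string.
$proposed = '/^[' . $letters . '][' . $letters . '0-9]*$/D';

function isValidTag(string $tag, string $pattern): bool
{
    return preg_match($pattern, $tag) === 1;
}

var_dump(isValidTag("BelleFran\xE7aise", $proposed));  // bool(true)
var_dump(isValidTag("Ni\xF1aHermosa", $proposed));     // bool(true)
var_dump(isValidTag('Home,Page', $proposed));          // bool(false)
var_dump(isValidTag('9Lives', $proposed));             // bool(false)
```

Because the pattern is written with hex escapes, the source file stays pure ASCII and does not depend on the encoding the file is saved in.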
Current pattern for valid usernames
JavaWoman pointed out that Wikka currently restricts valid usernames to camelcase-formatted WikiNames. Is this consistent with the fact that we actually do allow valid pagetags in forced links beyond the camelcase format? And what about special characters in usernames?
- DarTar is an allowed username and it is correctly parsed as a link.
- SchönesMädchen is an allowed username (you can actually register with this name) and is parsed as a link, also if you force it as [[SchönesMädchen]]: SchönesMädchen.
- Because of the currently used validation pattern, French and Spanish users are discriminated against while German users aren't :) - SchönesMädchen is allowed (with the above restrictions) while BelleFrançaise or NiñaHermosa aren't (look BTW at the incorrect WikiName segmentation produced by the cedilla). On the other hand, they produce inconsistent links if you force them as [[BelleFrançaise]] and [[NiñaHermosa]]: BelleFrançaise NiñaHermosa. IMO this should be fixed as soon as possible.
A better formatter for forced internal links
I think that the current forced link formatter should be improved to allow GET parameters, anchors and titles to be parsed as part of valid internal links.
For example it would be nice if we could not only use forced links like:
[[HomePage Internal forced link]]
or
[[http://www.google.com External forced link]]
but also the following:
- Forced internal link with URL parameter
[[HomePage (? "par1=ba,par2=bo") Internal forced link]]
- Forced internal link with anchor
[[HomePage (# "this") Internal forced link]]
=> http://wikka.jsnx.com/HomePage#this
- Forced internal link with Title
[[HomePage (§ "This is a link to the HomePage") Internal forced link]]
But I don't have a clue how to modify the current formatter to pass all of this to the Link() function.
-- DarTar
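One possible starting point, offered only as a sketch: a small parser that splits the proposed syntax into its parts, which a formatter could then hand on to Link(). `parseForcedLink()` and the keys of the array it returns are hypothetical, the input is assumed to be UTF-8, and how the parts would map onto Link()'s arguments is left open.

```php
<?php
// Hypothetical parser for the proposed syntax; not part of the current formatter.
// Assumes UTF-8 input (the u modifier); \x{A7} is the section sign used for titles.
function parseForcedLink(string $markup): ?array
{
    // [[Tag (symbol "value")... optional link text]]
    $pattern = '/^\[\[(\S+)((?:\s+\([?#\x{A7}]\s+"[^"]*"\))*)(?:\s+(.+))?\]\]$/u';
    if (preg_match($pattern, $markup, $m) !== 1) {
        return null; // not a forced link in this syntax
    }

    $link = [
        'tag'    => $m[1],
        'params' => '',
        'anchor' => '',
        'title'  => '',
        'text'   => $m[3] ?? '', // the text group is absent when no link text is given
    ];

    // Map each modifier symbol to the part of the link it fills in.
    $keys = ['?' => 'params', '#' => 'anchor', "\u{A7}" => 'title'];
    preg_match_all('/\(([?#\x{A7}])\s+"([^"]*)"\)/u', $m[2], $mods, PREG_SET_ORDER);
    foreach ($mods as $mod) {
        $link[$keys[$mod[1]]] = $mod[2];
    }

    return $link;
}

print_r(parseForcedLink('[[HomePage (# "this") Internal forced link]]'));
print_r(parseForcedLink('[[HomePage (? "par1=ba,par2=bo") Internal forced link]]'));
```

The parameter string is returned exactly as written; whether the comma in "par1=ba,par2=bo" should become an ampersand in the final URL is a decision the sketch does not make.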
CategoryDevelopment CategoryRegex