Bug fix for Syntaxer.pmod
It didn’t take too long to notice that I had f–ked up the HTMLParser class a little bit. It was the way entities were handled that didn’t work as expected: entities in tag attributes were duplicated and inserted into the tag content. The good side of it is that I learned about the context() method of the Parser.HTML Pike class. I only want to match entities in the data section, i.e. the tag content, and not in attributes, and the context() method tells you, as the name implies, in what context the entity was found. So my entity callback function now looks like this:
- //! Entity callback
- protected void ecb(Parser.HTML p, string _data)
- {
-   if (p->context() == "data")
-     line += colorize(entify(_data), "entity");
- }
which hopefully is completely bug-free now. So, one down, 854 to go!
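Here is a minimal standalone sketch of the context() check, separate from Syntaxer. It uses only Parser.HTML. The function name data_entities is made up for the example, and the function just collects the entities instead of colorizing them the way the real callback does.

```pike
// Hypothetical helper: return the entities found in tag content
// ("data" context), ignoring any the parser reports from inside tags.
array(string) data_entities(string html)
{
  array(string) found = ({});
  Parser.HTML p = Parser.HTML();

  p->_set_entity_callback(lambda(Parser.HTML parser, string entity) {
      // context() is "data" for plain content; inside a tag or one of
      // its attributes it returns something else.
      if (parser->context() == "data")
        found += ({ entity });
      return 0; // pass the entity through as-is
    });

  p->feed(html);
  p->finish();
  return found;
}
```

When called with `<a title="x &amp; y">1 &lt; 2</a>`, it should report only the &lt; entity from the tag content.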
Codify RXML tag and Syntaxer.pmod 17:31, Sat 17 October 2009 :: 183.9 kB
A Pike syntax highlighting module
So I thought I’d try to port my syntax highlighting script Syntaxer, written in PHP, to Pike. Mostly for the fun of it, but also to improve my knowledge of string handling in Pike. The greatest concern here is that PHP is a dynamic language and Pike is not (at least not in the same sense), and the PHP version of Syntaxer depends heavily on dynamic loading of PHP files. The reason for this is that I generate the “syntax maps” dynamically from the syntax files of Edit+. That means that if you want support for a new language, you just drop a .stx file in the right location and there you go. My script converts it into a static PHP file, so that the conversion only needs to be done once, and loads that file on the fly when that particular language is requested.
I thought that this method would be hard to implement in Pike, although it might be possible, so I had to come up with a slightly different approach. Frankly, it’s not that often you alter the .stx files or add support for new languages, so my solution is to create the definitions for each language manually. I still use the .stx files from Edit+, though, even if it takes a bit of copy and paste.
In the Pike solution each language is its own class that inherits the master class Hilite. Pretty much the only thing you need to put in the derived class is a few members, such as the keyword sets, the comment markers, and the colors and styles, that specify what is what in the language. For example, the C++ definition looks like this:
- inherit .Hilite;
- public string title = "C++";
- //| Override the keywords mapping
- private mapping(string:multiset(string)) keywords = ([
- "keywords" : (<
- "auto","bool","break","case","catch","char","cerr","cin",
- "class","const","continue","cout","default","delete","do",
- "double","else","enum","explicit","extern","float","for",
- "friend","goto","if","inline","int","long","namespace","new",
- "operator","private","protected","public","register","return",
- "short","signed","sizeof","static","struct","switch","template",
- "this","throw","try","typedef","union","unsigned","virtual",
- "void","volatile","while","__asm","__fastcall","__based",
- "__cdecl","__pascal","__inline","__multiple_inheritance",
- "__single_inheritance" >),
- "compiler" : (<
- "define","error","include","elif","if","line","else","ifdef","pragma" >)
- ]);
- //| Override the default since # doesn't start a line comment in C++
- protected array(string) linecomments = ({ "//" });
- void create()
- {
-   ::create();
-   colors += ([ "compiler" : "#060" ]);
-   styles += ([ "compiler" : ({ "<b>", "</b>" }) ]);
- }
And you really don’t need to make it any fancier than that. For most C-like languages the definitions in the master class Hilite are enough: just add the keywords to the keywords mapping and it looks better than nothing.
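To illustrate the inheritance pattern, here is a heavily simplified, self-contained sketch. MiniHilite and MiniCpp are hypothetical names, and MiniHilite is not the real Hilite class; it only wraps keywords in colored spans. The point is that the language class just overrides the keywords mapping and extends colors in create(), in the same spirit as the C++ definition above.

```pike
// Hypothetical, heavily simplified stand-in for the Hilite master class:
// it only wraps known keywords in colored <span> tags.
class MiniHilite
{
  // Keyword groups; language classes override this mapping.
  protected mapping(string:multiset(string)) keywords = ([]);
  // Color per keyword group; language classes may extend it.
  protected mapping(string:string) colors = ([ "keywords" : "#00f" ]);

  void create() {}

  // Naive highlighter: splits on single spaces, ignores comments/strings.
  string highlight(string code)
  {
    array(string) words = code / " ";
    foreach (words; int i; string w) {
      foreach (keywords; string group; multiset(string) group_words) {
        if (group_words[w]) {
          words[i] = sprintf("<span style='color:%s'>%s</span>",
                             colors[group] || "#000", w);
          break;
        }
      }
    }
    return words * " ";
  }
}

// A language definition: override the keywords, extend the colors.
class MiniCpp
{
  inherit MiniHilite;

  protected mapping(string:multiset(string)) keywords = ([
    "keywords" : (< "int", "return" >),
    "compiler" : (< "include", "define" >)
  ]);

  void create()
  {
    ::create();
    colors += ([ "compiler" : "#060" ]);
  }
}
```

This relies on Pike identifiers being overridable by default, so highlight() in the base class uses the keywords mapping declared in the derived class.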
HTML parser
One thing that differs from the PHP version of Syntaxer is that SGML-based, or tag-based, languages are run through an HTML parser. The downside of the PHP version is that tag content gets highlighted as well, which of course isn’t what we want. Pike, however, has a decent HTML parser, Parser.HTML, that behaves much like a SAX parser, so I wrote a class, HTMLParser, that uses it for highlighting tag-based stuff. The HTMLParser class also inherits Hilite, so the methods and members are the same.
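Here is a minimal sketch of that SAX-like approach using nothing but Parser.HTML. Tags are wrapped in a span, while the text between them is only escaped. highlight_markup, esc and the "tag" CSS class are made up for this example; the real highlighting class does a lot more.

```pike
// Escape text so it shows up literally in an HTML page.
string esc(string s)
{
  return replace(s, ({ "&", "<", ">" }), ({ "&amp;", "&lt;", "&gt;" }));
}

// Hypothetical sketch: color the markup, leave the content alone.
string highlight_markup(string src)
{
  Parser.HTML p = Parser.HTML();

  // Called for every tag (start and end tags alike) that has no
  // callback of its own; the argument is the whole tag, e.g. "<b>".
  p->_set_tag_callback(lambda(Parser.HTML parser, string tag) {
      return ({ "<span class=\"tag\">" + esc(tag) + "</span>" });
    });

  // Called for the text between tags: escaped, but never colored.
  p->_set_data_callback(lambda(Parser.HTML parser, string data) {
      return ({ esc(data) });
    });

  p->feed(src);
  p->finish();
  return p->read();
}
```

Because the parser hands tags and content to separate callbacks, a word like "if" between two tags is never mistaken for a keyword.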
I wonder why there’s no built-in HTML parser in PHP?
A Roxen tag module
Of course I had to write a Roxen tag module so that we can highlight source code in Roxen web pages; that was the reason for writing the Pike module in the first place. The tag’s name might not be the most innovative one, but what the heck! The beauty of it is that I made it possible to create a surrounding HTML template for the output in the module settings tab. When you run some code through the parser you get the highlighted source code as well as the name of the language and the number of lines that were highlighted, and it might be nice to present that too (just like the code blocks on this site). Writing that surrounding HTML every time is tedious, so now you just put it in the settings once and the code blocks will always look the same.
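The template idea can be sketched in plain Pike. The placeholder names #code#, #language# and #lines#, as well as wrap_code itself, are hypothetical and not taken from the module. Pike's replace() with a mapping substitutes all placeholders in a single pass.

```pike
// Hypothetical template as it might be entered in the module settings.
// The placeholder names are made up for this sketch.
constant TEMPLATE =
  "<div class=\"code\"><p>#language# (#lines# lines)</p>"
  "<pre>#code#</pre></div>";

// Fill the template with the highlighted code and its metadata.
string wrap_code(string template, string highlighted, string language)
{
  // Counts newline-separated pieces, so a trailing newline adds one.
  int lines = sizeof(highlighted / "\n");
  return replace(template, ([
    "#code#"     : highlighted,
    "#language#" : language,
    "#lines#"    : (string)lines
  ]));
}
```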
Finally
There’s still some stuff left to do, but the code works well enough to be usable. And I must say that the Pike version is something like a thousand times faster than the PHP version!
Oh, and I have implemented support for the following languages:
- ActionScript
- C
- C++
- C#
- CSS
- Java
- JavaScript
- HTML
- Perl
- PHP
- Pike
- Python
- Ruby
- RXML
- XSL
And that’s that for now.
A Pike module and a Roxen tag
When you’re used to one programming language and start learning a new one, you sometimes miss some features from the former. That happened to me a couple of years ago when we started using Roxen at work. What I missed was the trim functions from PHP. Sure, Pike and RXML can trim whitespace from strings, but the beauty of trim, ltrim and rtrim in PHP is that you can trim arbitrary characters as well as whitespace. And to my knowledge there’s no equivalent to ltrim and rtrim in Pike, that is, functions that trim only the left or the right side of the string respectively.
So I wrote a Pike module that did that and also made an RXML tag of it. Just yesterday I rewrote that code, since I think I’ve become a better programmer and have come to know Pike a lot better since my first try.
I also wrote a function that shortens a string from the center outwards, i.e. ctrim.
So here are some examples of usage:
- import .trim;
- string path = "/this/is/a/path/";
- write("%s\n", rtrim(path, "/"));
- // Will output: /this/is/a/path
- write("%s\n", trim(path, "/"));
- // Will output: this/is/a/path
- string long_str = "This is some string that's too long for us";
- write("%s\n", ctrim(long_str, 20));
- // Will output: This is...for us
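For readers without the module at hand, here is a minimal sketch of PHP-style ltrim, rtrim and trim in Pike. It is not the code from the Trim module, just one way to do it. The default character set (space, tab, CR, LF) is an assumption, and ctrim is left out because its word-aware shortening isn't specified.

```pike
// Sketch only, not the Trim module: default characters are an assumption.
constant WHITESPACE = " \t\r\n";

// Lookup set of character codes to strip.
multiset char_set(string|void chars)
{
  return (multiset)values(chars || WHITESPACE);
}

// Strip the given characters from the start of the string (like PHP ltrim).
string ltrim(string s, string|void chars)
{
  multiset strip = char_set(chars);
  int i = 0;
  while (i < sizeof(s) && strip[s[i]])
    i++;
  return s[i..];
}

// Strip the given characters from the end of the string (like PHP rtrim).
string rtrim(string s, string|void chars)
{
  multiset strip = char_set(chars);
  int j = sizeof(s) - 1;
  while (j >= 0 && strip[s[j]])
    j--;
  return s[..j];
}

// Strip from both ends (like PHP trim).
string trim(string s, string|void chars)
{
  return ltrim(rtrim(s, chars), chars);
}
```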
And here’s the RXML implementation:
- <trim right="" char="/">/this/is/a/path/</trim>
- <!-- /this/is/a/path -->
- <trim center="" length="20">This is some string that's too long for us</trim>
- <!-- This is...for us -->
And that’s that.
Trim Pike module 17:31, Sat 17 October 2009 :: 4 kB
Trim Roxen tag module 17:31, Sat 17 October 2009 :: 4.8 kB
Pike multiset
It’s fascinating: I’ve been using Pike for a little over two years now and I had never really understood the Pike data type “multiset”. A multiset is like the keys of an associative array (a mapping, as they are called in Pike, or a hash in Perl, or a Hashtable in C#) with the values left out. So if you have a Pike mapping that looks like ([ "key1" : 1, "key2" : 2, "key3" : 3 ]), a multiset of its keys would look like (< "key1", "key2", "key3" >) and an array of them would be ({ "key1", "key2", "key3" }). Mappings and arrays I have used a lot, of course, but only quite recently did it dawn on me what the multiset is good for!
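The relationship between the three types can be shown with indices(), which returns a mapping's keys as an array. Casting that array to a multiset gives the set of keys. keys_as_multiset and sorted_keys are just illustrative names.

```pike
// The keys of a mapping as a multiset: a set of keys, values left out.
multiset keys_as_multiset(mapping m)
{
  return (multiset)indices(m);
}

// The keys as an array. indices() returns them in no particular order,
// so sort them if the order matters.
array sorted_keys(mapping m)
{
  return sort(indices(m));
}
```

Indexing the multiset, as in keys["key2"], gives 1 when the key is there and 0 when it isn't. The membership test below relies on exactly that.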
Let’s say you have a function that takes a string as argument, and that argument can have something like twelve different values, but you only want some action to take place if the value is one of three of the twelve possibilities. In many languages that could be written like this Pike example:
- string my_function(string arg)
- {
-   if (arg != "a_value" && arg != "b_value" && arg != "g_value")
-     return "no";
-   return "yes";
- }
I don’t know how many times I’ve written code like that. But here the Pike multiset really shines. This is how you could use the multiset:
- string my_function(string arg)
- {
-   if (!(< "a_value", "b_value", "g_value" >)[arg])
-     return "no";
-   return "yes";
- }
I think that’s pretty nice. And that’s probably not the only thing the multiset is useful for.
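Multisets are also handy as sets: |, & and - give union, intersection and difference. A small sketch with made-up function names follows.

```pike
// Keys present in both arrays (intersection).
multiset common(array(string) a, array(string) b)
{
  return (multiset)a & (multiset)b;
}

// Keys present in the first array but not in the second (difference).
multiset only_in_first(array(string) a, array(string) b)
{
  return (multiset)a - (multiset)b;
}

// Keys present in either array (union).
multiset in_either(array(string) a, array(string) b)
{
  return (multiset)a | (multiset)b;
}
```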
While I'm at it, what about these nice syntactic sugar flakes of Pike:
- string str = "one two three four five six";
- array a_str = str/" ";
- str = a_str*", ";
The same in PHP:
- $str = "one two three four five six";
- $a_str = explode(" ", $str);
- $str = implode(", ", $a_str);
Even though I've used PHP for 7-8 years, I still have trouble remembering in what order the arguments are supposed to come in those function calls.
Beautiful Roxen
I’ve been really busy at work lately with various things, developing a blogging system for starters. Most of the stuff you need for a blogging system is already available in Roxen CMS, which we use at work. But one thing I needed that isn’t available was commenting functionality. This is something you could implement quite easily in the templates: Roxen uses XSLT for templating, or rather its own extension of XSLT that lets you write RXML code in the XSL templates.
RXML handles database-driven things quite all right, but if you develop things in this layer, the presentation layer, they get harder to maintain or to deploy on a different server. What you can do instead is write a Roxen module in Pike. You can create different types of modules (provider modules, RXML tag modules and so on), and you can also combine two or more modules into one. All RXML tags in Roxen are just wrappers for underlying Pike code, so you hide the logic in Pike and provide simple interfaces in RXML. First, let me give an example of how you loop over a database record set in RXML:
- <emit source="sql" host="myhost" query="SELECT name, email FROM employees">
- <a href="mailto:&_.email;">&_.name;</a>
- </emit><else>
- No records found
- </else>
It doesn’t get simpler than that! Another great thing about modules is that you can run arbitrary functions when a module is loaded, which means you can create databases and database tables the first time the module is loaded. Roxen runs on MySQL internally and provides APIs to the internal MySQL database. This means that if you create database-driven modules, you just install the module and the database and its tables are created. No fuss!
Roxen without the CMS is a standalone open source web server and supports the same kind of modules as the CMS does, but there’s more to the CMS!
The CMS modules
I’ve been poking around in the Roxen CMS source code for a while now, and I have written some pure RXML tag modules (which is quite simple), but now I saw the opportunity to extend my knowledge and make a real CMS module. Since the CMS provides access control, you don’t want emit tags to list stuff that the user doesn’t have access to. For instance, there’s an emit source, site-news, that you can use to list the most recent pages for a given path or paths:
- <h2>Latest news</h2>
- <ul class='newslist'>
- <emit source="site-news" unique-paths="" maxrows="20"
- path="/comp-a/news/*/*.xml,/comp-b/news/*/*.xml,/secret/*/*.xml"
- >
- <li><a href='&_.path;'>
- <date type='iso' date='' iso-time='&_.published;' /> |
- <span>&_.title;</span>
- </a></li>
- </emit>
- </ul>
Now, you don’t want unauthorized users to see the items from the /secret/ path, do you? This emit source in particular won’t show them, and I wanted the same kind of functionality in my own set of emit tags. So I poked around some more!
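Roxen CMS has its own access control API, and this sketch doesn't reproduce it. As a stand-in, it assumes a user's read access can be expressed as a list of path globs (a hypothetical simplification) and drops the rows the user may not read before anything is emitted. may_read and visible_rows are made-up names.

```pike
// Hypothetical permission check, NOT the Roxen CMS API: a user may read
// a path if it matches one of the globs they have been granted.
int(0..1) may_read(array(string) allowed, string path)
{
  foreach (allowed, string pattern)
    if (glob(pattern, path))
      return 1;
  return 0;
}

// Keep only the rows (one mapping per item, like an emit result) that
// the user is allowed to see.
array(mapping(string:string)) visible_rows(array(mapping(string:string)) rows,
                                           array(string) allowed)
{
  return filter(rows, lambda(mapping(string:string) row) {
      return may_read(allowed, row->path);
    });
}
```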
Another thing: each file in Roxen CMS is version controlled through CVS, and I needed some information about the first published version, so I had to find out how to access the CVS logs. The CVS thing also means that the files in Roxen CMS are saved to a virtual file system with an actual path, so what happens if a page has comments and that page gets moved? Thankfully Roxen CMS provides an API for running your own code when some kind of action takes place on a particular file: if it gets moved, edited, deleted, purged and so forth. And I came to understand that I needed to poke around even more.
Anyway, the more you dig around on the dark side of Roxen, the more impressed you get by how clean the code is, how nice the APIs are and so on. I’m no Roxen beginner by now, but until now I hadn’t really used the APIs to the internal workings of Roxen, and I must say that, after all, it was bliss.
The comments module
So what will all this babbling lead up to? Well, what my module now does is:
- Upon load it creates an internal database with the table in which the comments are stored. Here you can also make a dump of the comments DB or load a dump. In the settings you can set the default value for how long commenting should be enabled, e.g. no further commenting after 30 days. You can also set a default author name if you allow anonymous commenting.
- It listens to the CMS notifications about file actions, which means that when actions take place on web pages in the CMS, the module is notified and takes action if necessary. If a page has comments and the page gets moved, the paths in the comments table are updated to the new path, so that comments always stick to their page even if it’s moved. Also, if a page gets deleted, so do its comments.
- It provides a set of RXML tags: one for listing comments (either for the page they belong to, or through globs so that you can make lists of “latest comments”), one for listing a particular comment for, let’s say, editing, one tag each for adding, updating and deleting a comment, one tag for checking if a user has admin permission for a given comment, one tag to check whether the commenting form has expired, and finally one tag for counting the comments for a given page.
- Not least: I followed through and really documented the usage of the RXML tags through Roxen’s module documentation functionality.
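The bookkeeping described in the list above can be sketched in plain Pike. This is a hypothetical in-memory stand-in for the comments table, not the module's code and not Roxen's API. CommentStore, page_moved(), page_deleted() and commenting_open() are made-up names. In the real module these would be driven by the CMS file notifications and backed by the database.

```pike
// Hypothetical in-memory stand-in for the comments table: path -> comments.
class CommentStore
{
  protected mapping(string:array(string)) by_path = ([]);

  void add(string path, string text)
  {
    by_path[path] = (by_path[path] || ({})) + ({ text });
  }

  // A page was moved: its comments follow it to the new path.
  void page_moved(string old_path, string new_path)
  {
    array(string) moved = m_delete(by_path, old_path);
    if (moved)
      by_path[new_path] = (by_path[new_path] || ({})) + moved;
  }

  // A page was deleted: its comments go too.
  void page_deleted(string path)
  {
    m_delete(by_path, path);
  }

  int count(string path)
  {
    return sizeof(by_path[path] || ({}));
  }

  // Paths with comments matching a glob, e.g. for "latest comments" lists.
  array(string) paths_matching(string pattern)
  {
    return sort(filter(indices(by_path),
                       lambda(string p) { return glob(pattern, p); }));
  }
}

// Commenting is open for `days` days after `published` (Unix time).
int(0..1) commenting_open(int published, int days, int now)
{
  return now < published + days * 24 * 3600;
}
```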
So I think it covers what you’d expect a “commenting system” to be capable of. There may be some small things left to solve, as always when you develop stuff, but it seems to work quite satisfactorily.
For anyone curious, here’s the source code:
Comments – Roxen CMS Module 17:31, Sat 17 October 2009 :: 36.3 kB