Tuesday, November 30, 2010

A syntax highlighting specification

In light of the Skywriter-Cloud9 collaboration, I've been working on a specification for cross-browser syntax highlighting engines. I now have something to announce, and I'm thrilled to have been able to work with not only the Skywriter team, but also Fabian Jakobs of Cloud9 and Marijn Haverbeke of CodeMirror fame, to develop a unified syntax highlighting specification for JavaScript code editors.

The idea is that people can share syntax highlighting engines across a variety of JavaScript code editors and viewers. Syntax highlighters are simple to create: each one is essentially a state machine built from JavaScript regexes. Here's an example of a simple diff/patch highlighter in this format:

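// Metadata an editor can use to decide when this highlighter applies.
exports.getInfo = function() {
    return {
        name: "diff",
        fileexts: [ "diff", "patch" ],
        mimetypes: [ "application/x-diff" ]
    };
};

// The state machine: each state maps to an ordered list of rules.
exports.getRules = function() {
    return {
        start: [
            { regex: /\+.*/, token: 'addition.diff' },
            { regex: /-.*/, token: 'deletion.diff' },
            { regex: /.*/, token: 'plain' }
        ]
    };
};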
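
To give a feel for how an editor might consume the metadata from getInfo, here is a minimal sketch that picks a highlighter by file extension. The names `highlighters` and `findHighlighter` are invented for this illustration and are not part of the specification; a real editor would likely also consult the MIME types.

```javascript
// A highlighter module shaped like the diff example (metadata only).
var diffHighlighter = {
    getInfo: function() {
        return {
            name: "diff",
            fileexts: [ "diff", "patch" ],
            mimetypes: [ "application/x-diff" ]
        };
    }
};

// Hypothetical registry of loaded highlighters (an assumption of this sketch).
var highlighters = [ diffHighlighter ];

// Return the first registered highlighter whose fileexts list contains the
// extension of `filename`, or null if there is none.
function findHighlighter(filename) {
    var dot = filename.lastIndexOf(".");
    if (dot < 0) {
        return null; // no extension to go on
    }
    var ext = filename.slice(dot + 1).toLowerCase();
    for (var i = 0; i < highlighters.length; i++) {
        if (highlighters[i].getInfo().fileexts.indexOf(ext) !== -1) {
            return highlighters[i];
        }
    }
    return null;
}
```

With this in place, `findHighlighter("fix.patch")` returns the diff highlighter, while a file with an unregistered extension yields null.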

As you can see, the format is very simple to get started with. But it's also fairly powerful: it supports nested syntax highlighting modes (for JavaScript inside HTML and the like), multiple states (so that strings and comments can be highlighted correctly), and all the features of JavaScript regexes.
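
To make the "multiple states" idea concrete, here is a rough sketch of a two-state rule set for double-quoted strings, plus a tiny tokenizer that walks it. The `next` property used to switch states and the `tokenizeLine` function are assumptions made for this illustration; the actual specification may name or structure these differently.

```javascript
// Two states: `start` for ordinary text, `string` for the inside of a
// double-quoted string. `next` (a hypothetical property) names the state to
// enter after a rule matches.
var rules = {
    start: [
        { regex: /"/, token: 'string', next: 'string' },
        { regex: /[^"]+/, token: 'plain' }
    ],
    string: [
        { regex: /"/, token: 'string', next: 'start' },
        { regex: /[^"]+/, token: 'string' }
    ]
};

// Tokenize one line beginning in `state`. Rules are tried in order at the
// current position and the first non-empty match wins. The final state is
// returned so that the caller can feed it into the next line.
function tokenizeLine(rules, line, state) {
    var tokens = [];
    var pos = 0;
    while (pos < line.length) {
        var stateRules = rules[state];
        var rest = line.slice(pos);
        var matched = false;
        for (var i = 0; i < stateRules.length; i++) {
            var rule = stateRules[i];
            // Anchor the rule's pattern at the current position.
            var anchored = new RegExp('^(?:' + rule.regex.source + ')',
                                      rule.regex.ignoreCase ? 'i' : '');
            var m = anchored.exec(rest);
            if (m && m[0].length > 0) {
                tokens.push({ token: rule.token, value: m[0] });
                pos += m[0].length;
                if (rule.next) {
                    state = rule.next;
                }
                matched = true;
                break;
            }
        }
        if (!matched) {
            // No rule applied: emit one character as plain text and move on.
            tokens.push({ token: 'plain', value: line.charAt(pos) });
            pos += 1;
        }
    }
    return { tokens: tokens, state: state };
}
```

Calling `tokenizeLine(rules, 'a "b" c', 'start')` splits the line into plain and string tokens and ends back in the `start` state. A line with an unterminated quote ends in the `string` state, which is what lets a string or comment that spans several lines be highlighted correctly.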

It's my hope that this format will allow greater sharing among code editors. We're hoping to implement this soon in the Skywriter/Cloud9 codebase. The specification can be found here in an EtherPad, and I'd be very grateful for any and all feedback!


jviereck said...

Focusing power to bring up one standard for syntax highlighters just sounds right :)

One small question: The current spec doesn't mention a tag like "addition.diff", "deletion.diff". Should these get added to the spec?

Patrick Walton said...

The spec lists the names that TextMate uses. I just used "addition" and "deletion" as quick examples: I'm not sure what TextMate's names for "addition" and "deletion" are, but a real diff highlighter should use those instead.

Anonymous said...

Great work! But, is your format already implemented in CodeMirror and Ace?

Samuel Williams said...

Hi, have you considered jQuery.Syntax which already supports a wide variety of programming languages?

gggeek said...

Another candidate for an existing highlighting syntax format: the one used by geshi. The tool is written in php, but it should be possible to implement its functionality in js - and get 80+ languages for free