Welcome to the first article in an ongoing series, in which I attempt to actually explain the stuff that I usually blather on about without backreference or elucidation of any sort.
This is a high-level overview of the extant design for the Carrot processing flow. For a basic overview of the language itself, see the 10km document.
The Grammarian
Compilation of a Carrot document is a two-phase process. In the first phase, a subsystem called the Grammarian examines the document for well-formedness. That is, it makes sure that the document is written in a grammar of Carrot. The Grammarian proper is a message-passer/traffic cop among a small flock of components, which handle most of the actual work of this phase of processing.
The Lexer does all input filehandling and decomposes the resultant stream of text into tokens. There are only three kinds of tokens: whitespace strings, nonwhitespace strings, and newlines.
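To make the three token kinds concrete, here is a minimal sketch in Python. It is not the actual Carrot Lexer: the names `Token` and `lex` are made up for illustration, it skips file handling entirely and works on a string, and it assumes that "newline" means `\n`, `\r\n`, or `\r`.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str   # "newline", "whitespace", or "text" (hypothetical names)
    value: str  # the exact characters the token covers


# The newline alternative comes first so a line break is never absorbed
# into a whitespace run. [^\S\r\n] means "whitespace other than \r or \n".
_TOKEN_RE = re.compile(
    r"(?P<newline>\r\n|\r|\n)|(?P<whitespace>[^\S\r\n]+)|(?P<text>\S+)"
)


def lex(source: str) -> Iterator[Token]:
    """Split source into newline, whitespace, and nonwhitespace tokens."""
    for match in _TOKEN_RE.finditer(source):
        # lastgroup is the name of the alternative that matched.
        yield Token(match.lastgroup, match.group())
```

Because every character falls into exactly one of the three classes, joining the token values back together reproduces the input, which is handy when the later stages need to report errors against the original text.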
Tokens from the Lexer are examined by a group of Handler components. There is one handler for each type of Carrot markup, and one each for special cases like the document and include triggers (the pragma). These handlers construct nodes from the tokens; each node is a lexical chunk of the document plus all available metadata for it.
These nodes are then passed to the DOM (Document Object Model) component, which does the bulk of well-formedness checking and then stows properly constructed nodes into the document tree (which the DOM also provides read access to).
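A rough sketch of that division of labor, in Python, might look like the following. Everything here is an assumption made for illustration: the `Node` fields, the `DOM` method names, and especially the single well-formedness rule shown (a scope must be closed by the name that opened it), which stands in for whatever checks the real DOM component performs.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                      # hypothetical kinds: "document", "trigger", "text"
    name: str = ""
    attributes: dict = field(default_factory=dict)
    line: int = 0                  # source metadata kept for error reporting
    children: list = field(default_factory=list)


class WellFormednessError(Exception):
    pass


class DOM:
    def __init__(self):
        self.root = Node(kind="document")
        self._open = [self.root]   # stack of scopes that are still open

    def open(self, node):
        """Stow a node and make it the current scope."""
        self._open[-1].children.append(node)
        self._open.append(node)

    def add(self, node):
        """Stow a leaf node in the current scope."""
        self._open[-1].children.append(node)

    def close(self, name, line):
        """Close the current scope, rejecting a mismatched close."""
        current = self._open[-1]
        if current is self.root or current.name != name:
            raise WellFormednessError(f"line {line}: unexpected close of {name!r}")
        self._open.pop()

    def walk(self, node=None, depth=0):
        """Read access: yield (depth, node) pairs in document order."""
        if node is None:
            node = self.root
        yield depth, node
        for child in node.children:
            yield from self.walk(child, depth + 1)
```

The point of the sketch is the shape: handlers only build nodes, while the tree decides whether a node may be stowed and keeps enough metadata (here just a line number) to say where things went wrong.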
Assuming all goes well, the entire source document will be transformed into a data structure which encodes the content and structure of the original, along with explicit and inferred metadata for later processing and (possible) error reporting.
The Ontologist
Phase 2 begins when another subsystem, called the Ontologist, starts a directed walk of the DOM's tree. It is the Ontologist's job to make sure that the document is written in a valid dialect of Carrot. This is necessary because, in its purest form, Carrot is, like XML, a language construction kit.
The Ontologist examines each node in isolation (This node says it's a trigger and its element type is "foobar". Is there a foobar trigger in my current dialect? This node contains an attribute "baz". Is that attribute valid for this element, and is its value of the defined type and within any defined bounds?) and in relation to other nodes (Is this element allowed within the current scope(s)?).
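Those questions map fairly directly onto a recursive check. The Python sketch below is only an illustration under stated assumptions: the `DIALECT` table is a hypothetical stand-in for real dialect rules, the element and attribute names are borrowed from the examples above, and "scope" is simplified to "immediate parent".

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


# Hypothetical dialect: for each element, which parents it may appear in
# and, per attribute, the (type, lowest value, highest value) allowed.
DIALECT = {
    "document": {"parents": {None}, "attributes": {}},
    "section": {"parents": {"document", "section"},
                "attributes": {"level": (int, 1, 6)}},
    "foobar": {"parents": {"section"},
               "attributes": {"baz": (int, 0, 10)}},
}


def validate(node, parent=None, errors=None):
    """Walk the tree and collect validity errors instead of stopping at the first."""
    errors = [] if errors is None else errors
    rule = DIALECT.get(node.name)
    if rule is None:
        # In isolation: is there such an element in the current dialect?
        errors.append(f"unknown element {node.name!r}")
    else:
        # In relation to other nodes: is it allowed in this scope?
        parent_name = parent.name if parent is not None else None
        if parent_name not in rule["parents"]:
            errors.append(f"{node.name!r} not allowed inside {parent_name!r}")
        # In isolation: are the attributes known, typed, and within bounds?
        for attr, value in node.attributes.items():
            spec = rule["attributes"].get(attr)
            if spec is None:
                errors.append(f"{node.name!r} has no attribute {attr!r}")
                continue
            kind, low, high = spec
            if not isinstance(value, kind) or not (low <= value <= high):
                errors.append(f"{node.name!r}.{attr} has invalid value {value!r}")
    for child in node.children:
        validate(child, node, errors)
    return errors
```

An empty list from `validate(root)` means the tree is valid under this toy dialect; anything else is a list of human-readable complaints.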
The Ontologist's validity rules are expressed in a Carrot grammar called Radish, which will be the topic of an entire post later on. (Yes, Carrot bootstraps itself, using itself.)
If the Ontologist successfully walks the entire tree, then the document is both well-formed and valid, and compilation is complete. From this point on, it's about what the user wants to do with the document. And that'll be the topic of the next How Carrot Works post.