Carrot: August 2008

Thursday, August 21, 2008

Tricksy

After my last post I sat around thinking about the implementation of various things, and I started to come up with some pretty clever ways of doing things. Of course, we all know how I feel about clever, so I'm really thinking hard about these. Also, how would it look if the Voice of Dissent hauled off and started doing things just because he could?

Magic Document Triggers

The first thing I thought about was the document trigger. Since the day I thought of the mechanism which it embodies, it's been declared Mandatory; you can't have a Carrot document without a document trigger. But a tiered configuration system has also been on the plan from early times, and I realized that this, combined with the fact that the processor can treat arbitrary strings as files to be decomposed into tokens (thanks to PerlIO), meant that a synthetic file could be assembled which contained a document pragma, built on-the-fly from a user's personal configuration.

This synthetic file would then be injected into the input list before processing began, putting it right where it should have been all along. Document export would take care of generating proper, sharable source files, and authors wouldn't have to type the same block into the beginning of every document they wrote. The problem comes when people don't export before sharing, and the person on the other end gets a document which contains entities their system doesn't recognize, or is written against a version of Carrot they don't have installed or...
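A rough sketch of the injection idea, in Python rather than Carrot's Perl, with `io.StringIO` standing in for the in-memory filehandles PerlIO provides. Everything here is an assumption made for illustration: the `[[document ... ]]` pragma syntax, the configuration keys, and the helper names are invented, not Carrot's.

```python
import io

def merge_tiers(*tiers):
    """Merge configuration tiers; later tiers override earlier ones."""
    merged = {}
    for tier in tiers:
        merged.update(tier)
    return merged

def synthetic_pragma(config):
    """Render a (made-up) document pragma as an in-memory file."""
    lines = ["[[document"]
    lines += [f"  {key}: {value}" for key, value in sorted(config.items())]
    lines.append("]]")
    return io.StringIO("\n".join(lines) + "\n")

def build_inputs(real_inputs, config):
    """Put the synthetic file first so it is read before any real source."""
    return [synthetic_pragma(config)] + list(real_inputs)

# Hypothetical tiers: system-wide defaults, then the user's own settings.
system = {"carrot_version": "0.1", "encoding": "utf-8"}
user = {"author": "example", "encoding": "latin-1"}

inputs = build_inputs([io.StringIO("Body text\n")], merge_tiers(system, user))
first_file_text = inputs[0].read()
```

The processor never needs to know that the first entry in its input list was not a file on disk, which is also why an un-exported document is a hazard: the pragma exists only on the author's machine.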

Decision: if implemented, this should not be default behavior. It should be a power user feature which must be enabled in one's configuration.

Positionally Independent Data Pragma

The data pragma is an interesting thing, I think. It's basically the mechanism through which BLOB-equivalents are streamed into Carrot documents. As such, I've considered many uses for it but the two which will see use first are really different sides of the same coin: embedding media files and embedding Carrot extensions (both tied to Document Export).

The idea is that when you want to share a document, you tell Carrot to "export" you a copy -- to generate a version for the use of other people, who may not have access to all the resources you have locally. Since the system knows what's "stock", it examines the document for everything that isn't, and these bits are turned into data pragma which are written into the new copy of the document, and references to the original resources are rewritten to point at the data segments (where applicable).

The data segments themselves, whether they contain an image or an audio file or a Radish module, are compressed (again, where applicable) and BASE64 encoded. The BASE64 representation is then stowed as the content of a data pragma -- and so all resources become local to the document, at the cost of storage (but who cares about that these days?)
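The compress-then-encode step can be sketched as below. This is Python for illustration only, and the choice of zlib is an assumption; the post does not say which compression Carrot uses. The `compress` flag stands in for "where applicable", since already-compressed media gains little from a second pass.

```python
import base64
import zlib

def encode_segment(raw: bytes, compress: bool = True) -> str:
    """Optionally compress a resource, then Base64-encode it as ASCII text."""
    payload = zlib.compress(raw) if compress else raw
    return base64.b64encode(payload).decode("ascii")

def decode_segment(text: str, compressed: bool = True) -> bytes:
    """Reverse encode_segment: Base64-decode, then decompress if needed."""
    payload = base64.b64decode(text)
    return zlib.decompress(payload) if compressed else payload

original = b"pretend this is an image " * 20
encoded = encode_segment(original)
roundtrip = decode_segment(encoded)
```

Base64 alone grows the payload by roughly a third, which is the storage cost mentioned above.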

Here's the new bit: I realized a good while back that the Grammarian has to read the document pragma before the rest of its processing chain can be instantiated (because the document pragma contains directives which may affect those tools). I realized this morning that I could use that delay to actually do 2 full passes over the document, scanning only for document and data pragma on the first pass. This would remove the requirement that data pragma appear before (that is: physically above in the source) any references to their content.

This sounds pretty sweet, but it increases processing time, and it brings up the new wrinkle of how not to hit the data pragma on the second pass. Do you special-case them and ignore? Do you physically remove them from the file during the first pass by doing a copy-on-read and excepting the pragma you're catching the first time 'round?
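For the record, the "special-case them and ignore" option might look something like this. It is a line-based Python sketch with an invented `[[data name ... ]]` syntax; the real pragma markup and the Grammarian's token-level machinery are not shown in this post, so treat every name here as hypothetical.

```python
def first_pass(lines):
    """Collect data pragma bodies by name; ignore everything else."""
    segments, name, body = {}, None, []
    for line in lines:
        if name is None and line.startswith("[[data "):
            name = line[len("[[data "):].strip()
        elif name is not None and line.strip() == "]]":
            segments[name] = "".join(body)
            name, body = None, []
        elif name is not None:
            body.append(line.strip())
    return segments

def second_pass(lines):
    """Yield only the lines that fall outside data pragma."""
    inside = False
    for line in lines:
        if not inside and line.startswith("[[data "):
            inside = True
        elif inside and line.strip() == "]]":
            inside = False
        elif not inside:
            yield line

# The reference to "logo" sits above the pragma that defines it.
source = [
    "See image logo\n",
    "[[data logo\n",
    "aGVsbG8=\n",
    "]]\n",
    "Closing text\n",
]
segments = first_pass(source)
body_lines = list(second_pass(source))
```

Both passes have to agree on where a pragma starts and ends, so the detection logic is effectively duplicated. That duplication is a fair picture of the extra cost weighed in the decision below.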

Decision: too big a problem for too small a win. Not gonna do it, for now.

Wednesday, August 20, 2008

Frighteningly close

Phase I of Carrot development is, I have realized, very close to completion. The only things left undone are the document pragma and the Grammarian -- and both of these have been at least partially implemented or prototyped already.

On the one hand, this is a very pleasant thought. On the other, it means that the rough, unyielding wall of language design -- which had, to this point, merely been staring me in the face -- is now pressing hard against my shoulders.

The pragma, due to the way they operate (at the Grammarian level), must have their validation routines hardcoded. And, as the document pragma's job is to set data about the document itself and to modify the initial behavior of the Carrot processing chain, its contents and their validation have a lot of bearing on everything that follows.

This will all make more sense when (1) there's a demo and (2) Radish comes along in Phase II, I swear :)

Friday, August 15, 2008

Problem solved; next up

I chose to create a new class, VerbatimText, which is solely responsible for handling text inside Verbatim chunks. This adds a small amount of simple special-casing to the Grammarian, but the alternative was to add probably-complex special-casing to the Text handler and to come up with a mechanism for it to know what's going on at the Grammarian's level.

This felt like the far poorer choice, because as things stand, no modules are aware of what's happening "above" them in the stack -- things can only see down into the modules they're using. I think this is Good.
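The shape of that choice, sketched in Python rather than Carrot's Perl. The class bodies, the `|]` closing markup, and the `pick_text_handler` helper are all assumptions for illustration; only the names Text and VerbatimText come from the actual code.

```python
class Text:
    """Ordinary text: stops at the first markup character (assumed '[' or ']')."""

    def read(self, source, pos):
        end = pos
        while end < len(source) and source[end] not in "[]":
            end += 1
        return source[pos:end], end


class VerbatimText:
    """Text inside a Verbatim chunk: stops only at the closing markup."""

    def __init__(self, close):
        self.close = close

    def read(self, source, pos):
        end = source.find(self.close, pos)
        if end == -1:
            end = len(source)
        return source[pos:end], end


def pick_text_handler(context_stack, verbatim_close="|]"):
    """The only special-casing lives here, at the Grammarian's level."""
    if context_stack and context_stack[-1] == "Verbatim":
        return VerbatimText(verbatim_close)
    return Text()


plain, _ = pick_text_handler([]).read("a [b] c", 0)
raw, _ = pick_text_handler(["Document", "Verbatim"]).read("a [b] c |] rest", 0)
```

Neither handler looks at the context stack. The caller decides which one to use, so nothing has to see "up".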

Anyway, that's done, and it's tested and working. Onward to the Pragma!

Tuesday, August 12, 2008

Oops (smaller)

Integration testing has revealed that I forgot to tell the text handler how to parse things inside Verbatim chunks.

I can think of a number of ways to handle this, mostly kludgey, but it strikes me just this moment that a pretty clean way would be the creation of a second Text handler, with modified opening (undef) and closing (Verbatim close) markups, to be called from the Grammarian whenever Verbatim is atop the context stack.

Hmmmmmmmmmm

Saturday, August 9, 2008

Huge step forward

Carrot now has a small set of handler integration tests, and they all pass. This necessitated the creation of a "minigrammarian", which, while having no document intelligence as the real Grammarian will, validates the concept of the Grammarian -- and the tests validate the "flock of handlers" concept.

The test was to parse the string

"preceding text [tag content content content]]]\nNew para"

which was properly rendered into

A TEXT chunk, 'preceding text'
A TAG
A second TEXT chunk, 'content content content'
A terminal TAG markup (']')
A terminal TRIGGER markup (']]')
And a third TEXT chunk, 'New para'
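To make the breakdown above concrete, here is a toy stand-in for the minigrammarian, written in Python and not the real thing. The rules are guesses inferred from this one test: `[name ` opens a TAG, a `]` closes a TAG only while one is open, `]]` otherwise ends a TRIGGER, and surrounding whitespace is stripped from TEXT chunks. The chunk labels `TAG_END` and `TRIGGER_END` are invented for the sketch.

```python
import re

TAG_OPEN = re.compile(r"\[(\w+)\s*")

def minigrammarian(source):
    """Split source into (kind, value) chunks using the guessed rules."""
    chunks, pos, open_tags = [], 0, 0
    while pos < len(source):
        match = TAG_OPEN.match(source, pos)
        if match:
            chunks.append(("TAG", match.group(1)))
            open_tags += 1
            pos = match.end()
        elif source[pos] == "]" and open_tags:
            chunks.append(("TAG_END", "]"))
            open_tags -= 1
            pos += 1
        elif source.startswith("]]", pos):
            chunks.append(("TRIGGER_END", "]]"))
            pos += 2
        else:
            end = pos
            while end < len(source) and source[end] not in "[]":
                end += 1
            if end == pos:
                end = pos + 1  # stray markup character: treat it as text
            text = source[pos:end].strip()
            if text:
                chunks.append(("TEXT", text))
            pos = end
    return chunks

chunks = minigrammarian("preceding text [tag content content content]]]\nNew para")
```

Under those assumptions the six chunks come out in the same order as the list above. The interesting part is `]]]`: it only splits into `]` plus `]]` because the sketch tracks whether a tag is still open.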

This is awesome.
