Thursday, August 21, 2008

Tricksy

After my last post I sat around thinking about the implementation of various things, and I started to come up with some pretty clever ways of doing things. Of course, we all know how I feel about clever, so I'm really thinking hard about these. Also, how would it look if the Voice of Dissent hauled off and started doing things just because he could?

Magic Document Triggers

The first thing I thought about was the document trigger. Since the day I thought of the mechanism it embodies, it's been declared Mandatory; you can't have a Carrot document without a document trigger. But a tiered configuration system has also been on the plan from early on. I realized that this, combined with the fact that the processor can treat arbitrary strings as files to be decomposed into tokens (thanks to PerlIO), meant that a synthetic file containing a document pragma could be assembled on the fly from a user's personal configuration.

This synthetic file would then be injected into the input list before processing began, putting it right where it should have been all along. Document export would take care of generating proper, sharable source files, and authors wouldn't have to type the same block into the beginning of every document they wrote. The problem comes when people don't export before sharing, and the person on the other end gets a document which contains entities their system doesn't recognize, or is written against a version of Carrot they don't have installed or...
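To make the injection idea concrete, here's a rough sketch in Python (not Carrot's actual implementation language, just something easy to read). Everything in it is an assumption for illustration: the `#document` / `#end` pragma syntax, the configuration keys, and the function names are all invented. It merges configuration tiers, builds an in-memory "file" holding a document pragma, and puts it at the front of the input list only when the feature is switched on.

```python
import io

# Hypothetical configuration tiers, lowest priority first.
SYSTEM_CONFIG = {"version": "1.0", "entities": "stock"}
USER_CONFIG = {"entities": "stock,personal", "author": "A. Writer"}


def merge_tiers(*tiers):
    """Merge configuration tiers; later tiers override earlier ones."""
    merged = {}
    for tier in tiers:
        merged.update(tier)
    return merged


def synthetic_trigger(config):
    """Build an in-memory 'file' holding an invented document pragma."""
    lines = ["#document"]
    lines += [f"  {key}: {value}" for key, value in sorted(config.items())]
    lines.append("#end")
    return io.StringIO("\n".join(lines) + "\n")


def build_input_list(sources, config, inject=False):
    """Prepend the synthetic file only when the feature is enabled."""
    inputs = list(sources)
    if inject:
        inputs.insert(0, synthetic_trigger(config))
    return inputs
```

Defaulting `inject` to `False` mirrors the idea that this would be an opt-in feature rather than default behavior.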

Decision: if implemented, this should not be default behavior. It should be a power user feature which must be enabled in one's configuration.

Positionally Independent Data Pragma

The data pragma is an interesting thing, I think. It's basically the mechanism through which BLOB-equivalents are streamed into Carrot documents. As such, I've considered many uses for it but the two which will see use first are really different sides of the same coin: embedding media files and embedding Carrot extensions (both tied to Document Export).

The idea is that when you want to share a document, you tell Carrot to "export" you a copy -- to generate a version for the use of other people, who may not have access to all the resources you have locally. Since the system knows what's "stock", it examines the document for everything that isn't, and these bits are turned into data pragma which are written into the new copy of the document, and references to the original resources are rewritten to point at the data segments (where applicable).

The data segments themselves, whether they contain an image or an audio file or a Radish module, are compressed (again, where applicable) and BASE64 encoded. The BASE64 representation is then stowed as the content of a data pragma -- and so all resources become local to the document, at the cost of storage (but who cares about that these days?)
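A minimal sketch of that encoding step, again in Python and again with invented details: the `#data name compression=...` header and the `#end` terminator are placeholders, not real Carrot syntax, and zlib is just one plausible choice of compressor. The `zlib` and `base64` calls themselves are standard library.

```python
import base64
import zlib


def encode_segment(raw: bytes, compress: bool = True) -> str:
    """Optionally compress the bytes, then Base64-encode them as ASCII."""
    payload = zlib.compress(raw) if compress else raw
    return base64.b64encode(payload).decode("ascii")


def decode_segment(text: str, compressed: bool = True) -> bytes:
    """Reverse encode_segment: Base64-decode, then decompress if needed."""
    payload = base64.b64decode(text)
    return zlib.decompress(payload) if compressed else payload


def data_pragma(name: str, raw: bytes, compress: bool = True) -> str:
    """Wrap an encoded segment in a hypothetical data pragma."""
    flag = "zlib" if compress else "none"
    body = encode_segment(raw, compress)
    return f"#data {name} compression={flag}\n{body}\n#end\n"
```

The `compress` switch stands in for the "where applicable" part: formats that are already compressed, like most image and audio files, gain little from a second round.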

Here's the new bit: I realized a good while back that the Grammarian has to read the document pragma before the rest of its processing chain can be instantiated (because the document pragma contains directives which may affect those tools). I realized this morning that I could use that delay to actually do 2 full passes over the document, scanning only for document and data pragma on the first pass. This would remove the requirement that data pragma appear before (that is: physically above in the source) any references to their content.
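Roughly, the two passes might look like the Python sketch below. The `#data name` / `#end` markers and the `{ref:name}` reference form are invented for the example, and the document pragma is left out to keep it short. The first pass only gathers data segments; the second resolves references and simply skips over the pragma it has already seen, so a reference can sit above the data it points at.

```python
import re

DATA_OPEN = re.compile(r"^#data (\w+)$")   # invented pragma syntax
DATA_CLOSE = "#end"                        # invented terminator
REFERENCE = re.compile(r"\{ref:(\w+)\}")   # invented reference syntax


def first_pass(lines):
    """Collect data segments by name; nothing else is interpreted."""
    segments, current, buffer = {}, None, []
    for line in lines:
        match = DATA_OPEN.match(line)
        if current is None and match:
            current, buffer = match.group(1), []
        elif current is not None and line == DATA_CLOSE:
            segments[current] = "\n".join(buffer)
            current = None
        elif current is not None:
            buffer.append(line)
    return segments


def second_pass(lines, segments):
    """Resolve references, skipping data pragma caught on the first pass."""
    output, inside = [], False
    for line in lines:
        if not inside and DATA_OPEN.match(line):
            inside = True
        elif inside:
            inside = line != DATA_CLOSE
        else:
            output.append(REFERENCE.sub(lambda m: segments[m.group(1)], line))
    return output
```

Both functions walk the whole document, which is exactly where the extra processing time comes from.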

This sounds pretty sweet, but it increases processing time, and it brings up the new wrinkle of how not to hit the data pragma on the second pass. Do you special-case them and ignore them? Do you physically remove them from the file during the first pass by doing a copy-on-read that leaves out the pragma you're catching the first time 'round?
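For the copy-on-read option, a sketch under the same invented `#data` / `#end` syntax: one read splits the source into the captured pragma lines and a pragma-free copy, so the second pass never sees them. The function name and default markers are hypothetical.

```python
def split_on_read(lines, opener="#data ", closer="#end"):
    """Return (pragma_lines, remaining_lines) from a single read."""
    captured, remaining, inside = [], [], False
    for line in lines:
        if not inside and line.startswith(opener):
            inside = True
            captured.append(line)
        elif inside:
            captured.append(line)
            inside = line != closer
        else:
            remaining.append(line)
    return captured, remaining
```

The trade here is memory for simplicity: the second pass needs no special cases, but the document effectively exists twice while it is being processed.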

Decision: too big a problem for too small a win. Not gonna do it, for now.
