Sunday, September 14, 2008

300

Changeset 300 was just committed to the Carrot repo. It was a minor change to Handler.pm with decently large consequences. Following on from the last post, I was worried about another untested case: immediately nested tags, like this...

[foo [bar a;b blah blah]]

I was worried this might turn up an edge case in valid_ending_pos, and it did, but not in the manner I had anticipated. I thought the immediate nesting, with no intervening attributes or content, might break something. That part worked just fine, but the cuddled terminators weren't behaving properly at all.

What should happen is:
  1. The Text handler sees ']]', quits processing, and returns
  2. The Tag handler is called, sees ']]', which begins with its markup terminator, and claims responsibility
  3. Tag consumes its terminating markup (']'), and pushes the remainder (also ']') back onto the Lexer's next-token stack
  4. Tag is called again and the final ] is consumed
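
The intended flow above can be sketched as a toy model. This is Python rather than Carrot's Perl, and the names (consume_terminator, the plain list standing in for the next-token stack) are made up for illustration; none of it is taken from Handler.pm.

```python
TERMINATOR = "]"  # tag terminator, as in the [foo [bar ...]] example

def consume_terminator(token, stack):
    """Consume one leading terminator and push any remainder back on the stack.

    Hypothetical helper: returns True if the token closed a tag.
    """
    if not token.startswith(TERMINATOR):
        return False
    remainder = token[len(TERMINATOR):]
    if remainder:
        stack.append(remainder)  # e.g. the second ']' of a cuddled ']]'
    return True

stack = ["]]"]  # what the Text handler leaves behind for Tag
closed = 0
while stack:
    token = stack.pop()
    if consume_terminator(token, stack):
        closed += 1

print(closed)  # prints 2: one tag closed per call
```

Each call takes exactly one terminator off the front of the token, so the two cuddled brackets close two tags across two calls.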
Instead, the first call to Tag was seeing ']]' and seemingly consuming both brackets. But only seemingly; what actually happened requires a little more granularity:
  1. Text sees ']]', pushes it onto the Lexer's next-token stack, quits processing, and returns
  2. Lexer, knowing that it is at physical EOF, sets its status to 1 (EOF, but I have synthetic tokens in stack)
  3. Tag is called, consuming a token from the stack, which happens to be the only token in the stack.
  4. Lexer gives Tag the token and sets its status to 99 (EOF, no synthetic tokens)
  5. Tag calls valid_ending_pos which has the statement 'return 1 if $l->status == 99' right up top. This is meant to let Triggers and Verbatim contexts always have a valid ending at EOF.
  6. Lexer's status is 99
  7. Because of that early return, Tag never reaches the code at the bottom of valid_ending_pos, which separates the terminating markup from the current token and pushes the remainder back onto Lexer's next-token stack (which would have reset its status to 1). That code is what allows cuddled terminators (and sloppiness like 'foo]bar').
  8. The current token, containing the terminators for both open tags, is thrown away
  9. The test suite calls Tag again to process the second terminator.
  10. Lexer returns undef; tests fail
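
The failure sequence above can be reproduced in a small toy model. Again this is Python, not the actual Perl. The status codes 1 and 99 come from the description above, but the Lexer class, the check_eof_first flag, and the reordering it toggles are assumptions made for illustration. In particular, splitting the token before the EOF check is only one plausible way around the bug, not necessarily what changeset 300 did.

```python
EOF_WITH_SYNTHETIC = 1  # EOF, but synthetic tokens remain in the stack
EOF_EMPTY = 99          # EOF, no synthetic tokens

class Lexer:
    """Toy lexer already at physical EOF; only synthetic tokens remain."""

    def __init__(self):
        self.stack = []
        self.status = EOF_EMPTY

    def push(self, token):
        self.stack.append(token)
        self.status = EOF_WITH_SYNTHETIC

    def next_token(self):
        if not self.stack:
            return None  # stands in for Perl's undef
        token = self.stack.pop()
        if not self.stack:
            self.status = EOF_EMPTY
        return token

def valid_ending_pos(lexer, token, check_eof_first):
    # With check_eof_first=True this mimics 'return 1 if $l->status == 99'
    # sitting at the top: the remainder is never pushed back.
    if check_eof_first and lexer.status == EOF_EMPTY:
        return True
    if token.startswith("]"):
        if token[1:]:
            lexer.push(token[1:])  # resets status to 1
        return True
    return lexer.status == EOF_EMPTY

def count_closed_tags(check_eof_first):
    lexer = Lexer()
    lexer.push("]]")       # Text pushes the cuddled terminators
    closed = 0
    for _ in range(2):     # the test suite calls Tag twice
        token = lexer.next_token()
        if token is None:  # second call finds nothing to consume
            break
        if valid_ending_pos(lexer, token, check_eof_first):
            closed += 1
    return closed

print(count_closed_tags(True), count_closed_tags(False))  # prints: 1 2
```

With the EOF check first, only one tag is closed and the second call gets nothing back, which matches steps 8 through 10. With the split done first, both terminators are consumed.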
Any sufficiently complex system will always be able to bite you in the ass. Testing is your only defense. So write more tests, today! Buy war bonds! Victory!
