RFC format

From XionKB
Revision as of 09:09, 21 February 2025 by Alexander

The RFC format is a renderer-agnostic structured document format built using ASCII C0 control codes as a superset of the TTY text format. It uses C0 control codes that printers do not employ (and that TTY therefore does not know) to create the rich document structure otherwise attained with advanced systems such as Troff and LaTeX. The format takes this approach, instead of providing a meta syntax like Markdown, RTF or HTML, to satisfy its design constraint: a document must remain legible as TTY-formatted ASCII text once all superset control codes are stripped out blindly, that is, without any parsing knowledge.

The recommended file extension for these documents is .rfc. The magic number all RFC files should begin with is, in hexadecimal, 06 15 06 15, representing two interleaved pairs of ASCII ACK and NAK C0 control codes. The RFC format reserves all C0 control codes not used by the TTY format except for NUL (0x00), BEL (0x07), SUB (0x1A) and ESC (0x1B).
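The magic number check and the blind stripping pass can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: the names `has_rfc_magic` and `strip_superset` are hypothetical, and the set of superset codes is assumed to be exactly the twenty control codes this page names (SOH, STX, ACK, NAK and the sixteen embed codes).

```python
# ACK NAK ACK NAK, the magic number given above.
RFC_MAGIC = bytes([0x06, 0x15, 0x06, 0x15])

# The sixteen control codes used by the embed encoding.
EMBED_CODES = bytes([
    0x03, 0x04, 0x05, 0x10, 0x11, 0x12, 0x13, 0x14,
    0x16, 0x17, 0x18, 0x19, 0x1C, 0x1D, 0x1E, 0x1F,
])

# Assumption: the superset codes are SOH, STX, ACK and NAK plus the embed codes.
SUPERSET_CODES = bytes([0x01, 0x02, 0x06, 0x15]) + EMBED_CODES


def has_rfc_magic(data: bytes) -> bool:
    """Return True if the data begins with the four-byte magic number."""
    return data.startswith(RFC_MAGIC)


def strip_superset(data: bytes) -> bytes:
    """Delete every superset control code without parsing anything."""
    return data.translate(None, SUPERSET_CODES)
```

For example, `strip_superset(b"\x06\x15\x06\x15Hello\x1c\x14 world")` returns `b"Hello world"`: the magic number and the two embedded control characters disappear, and the text is otherwise untouched.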

The RFC format is so named because it is designed to mimic the production of the IETF's RFC XML. It is not technically related to any of the IETF's tools for producing or validating RFCs.

Embed encoding

Sixteen extraneous ASCII C0 control codes are hijacked as a binary encoding medium so that each 7-bit ASCII character provides 4 bits of arbitrary binary data:

Character name             Code  Enc.  Dec.
End of Text                ETX   0x03  0x0
End of Transmission        EOT   0x04  0x1
Enquiry                    ENQ   0x05  0x2
Data Link Escape           DLE   0x10  0x3
Device Control 1           DC1   0x11  0x4
Device Control 2           DC2   0x12  0x5
Device Control 3           DC3   0x13  0x6
Device Control 4           DC4   0x14  0x7
Synchronous Idle           SYN   0x16  0x8
End of Transmission Block  ETB   0x17  0x9
Cancel                     CAN   0x18  0xA
End of Medium              EM    0x19  0xB
File Separator             FS    0x1C  0xC
Group Separator            GS    0x1D  0xD
Record Separator           RS    0x1E  0xE
Unit Separator             US    0x1F  0xF

This 4-bit medium then carries a variable-length integer encoding with an MSB sentinel: each 7-bit ASCII character contributes 3 bits of meaningful information, while the high fourth bit indicates whether the control character immediately following the current one should be collated with it into a single number. As with most variable-width binary encodings, improperly encoded control sequences are invalid and their handling is undefined. A valid sequence always consists of zero or more of the above sixteen control characters with the high bit set, in direct sequence, followed by exactly one such control character with the high bit clear.
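The encoding can be illustrated with a short Python sketch. The text above does not say in which order the 3-bit groups are collated, so this sketch assumes the most significant group comes first; that ordering, the function names `encode_number` and `decode_number`, and the choice to raise `ValueError` on malformed input are all assumptions made for illustration.

```python
from typing import Tuple

# Index in this sequence = decoded 4-bit value, per the table above.
EMBED_CODES = bytes([
    0x03, 0x04, 0x05, 0x10, 0x11, 0x12, 0x13, 0x14,
    0x16, 0x17, 0x18, 0x19, 0x1C, 0x1D, 0x1E, 0x1F,
])
NIBBLE_OF = {code: nibble for nibble, code in enumerate(EMBED_CODES)}


def encode_number(value: int) -> bytes:
    """Encode a non-negative integer as a run of embed control codes."""
    if value < 0:
        raise ValueError("only non-negative integers can be encoded")
    groups = [value & 0b111]
    value >>= 3
    while value:
        groups.append(value & 0b111)
        value >>= 3
    groups.reverse()  # assumption: most significant 3-bit group first
    out = bytearray()
    for index, group in enumerate(groups):
        # The high bit is set on every character except the last.
        sentinel = 0b1000 if index < len(groups) - 1 else 0
        out.append(EMBED_CODES[sentinel | group])
    return bytes(out)


def decode_number(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one number starting at pos; return (value, next position)."""
    value = 0
    while True:
        if pos >= len(data) or data[pos] not in NIBBLE_OF:
            raise ValueError("truncated or malformed embed sequence")
        nibble = NIBBLE_OF[data[pos]]
        pos += 1
        value = (value << 3) | (nibble & 0b111)
        if not nibble & 0b1000:  # high bit clear: last character
            return value, pos
```

Under these assumptions the number 047 (octal) becomes the two characters FS DC4 (`b"\x1c\x14"`): FS carries the group 100 with the high bit set, and DC4 carries the group 111 with the high bit clear.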

Rich format

The rich format is paragraph-oriented and does not recognise any semantic distinction between whitespace characters; Line Feed (0x0A), Carriage Return (0x0D) and Space (0x20) are all equivalent to one another. Inside paragraphs, parsing collapses runs of such characters into a single space and joins a source line spanning many physical lines into one logical line, which is then rendered as monospaced text again at 72 characters per line.
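As a rough illustration of this normalisation, the following Python sketch collapses the three whitespace characters and re-wraps the result. The function name `render_paragraph` is hypothetical, and the use of `textwrap.fill` with its default settings (which, among other things, break words longer than the line width) is an assumption; the page does not specify how overlong words are handled.

```python
import re
import textwrap


def render_paragraph(source: str, width: int = 72) -> str:
    """Collapse LF, CR and Space runs, then re-wrap at the given width."""
    # Treat Line Feed, Carriage Return and Space as one whitespace class.
    logical_line = re.sub(r"[\n\r ]+", " ", source).strip(" ")
    # Re-flow the single logical line into lines of at most `width` columns.
    return textwrap.fill(logical_line, width=width)
```

With this sketch, `render_paragraph("a\r\nb   c")` yields `"a b c"`, and a long paragraph comes back as lines of at most 72 characters.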

Header for metadata

If an RFC format parser encounters an ASCII Start of Heading (SOH, 0x01) before seeing any visible characters (including spacing) in the stream, it parses what follows as a heading metadata block. Fields are simple ASCII text separated by CRLF pairs, making them dumb-printable:

  1. Full document title
  2. Author name
  3. Author address
  4. Author telephone
  5. Author e-mail
  6. Copyright date
  7. Copyright assignment
  8. Licence

A field is omitted by leaving it empty. Fields are parsed in order, so a heading metadata block should always contain exactly seven CRLF pairs. The block is terminated by an ASCII Start of Text (STX, 0x02) character, immediately after which the normal document text, with whatever command formatting it bears, runs until EOF. The header as a whole is optional.
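A parser for this block might look like the Python sketch below. The function name `split_header` and the field keys are hypothetical, the input is assumed to have had the magic number removed already, and rejecting a block with the wrong number of fields is one possible policy rather than something the format mandates.

```python
from typing import Dict, Optional, Tuple

SOH = b"\x01"  # Start of Heading
STX = b"\x02"  # Start of Text

# Hypothetical keys for the eight fields, in the order listed above.
FIELD_NAMES = (
    "title", "author_name", "author_address", "author_telephone",
    "author_email", "copyright_date", "copyright_assignment", "licence",
)


def split_header(data: bytes) -> Tuple[Optional[Dict[str, str]], bytes]:
    """Return (header fields or None, remaining document body)."""
    if not data.startswith(SOH):
        return None, data  # the header is optional
    end = data.find(STX)
    if end == -1:
        raise ValueError("heading block is missing its STX terminator")
    fields = data[1:end].split(b"\r\n")
    if len(fields) != len(FIELD_NAMES):
        raise ValueError("expected exactly seven CRLF pairs")
    header = {
        name: field.decode("ascii")
        for name, field in zip(FIELD_NAMES, fields)
    }
    return header, data[end + 1:]
```

Omitted fields come back as empty strings, so a caller can tell "no header at all" (`None`) apart from "header present but field left empty".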

General formatting

A number encoded with the embed encoding is, once decoded and in memory, interpreted as a formatting command. Parsing at this point should be forgiving; for example, excess closing commands should be ignored. The commands and their numbers, in octal, are:

Command description                  Number
Begin heading, level 1               001
Begin heading, level 2               002
Begin heading, level 3               003
Begin heading, level 4               004
Begin heading, level 5               005
Begin heading, level 6               006
End heading (relative)               007
Begin code block                     010
End code block                       011
Begin block quote                    012
End block quote                      013
Begin block quote credit             014
End block quote credit               015
Begin hard shadow                    016
End hard shadow                      017
Begin hyperlink display text         020
End hyperlink display text           021
Begin hyperlink shadow bridge        022
End hyperlink shadow bridge          023
Begin hyperlink URI                  024
End hyperlink URI                    025
Begin centred text                   026
Begin right-aligned text             027
Begin justified text                 030
Revert to left-aligned text          031
Render table of contents             032
Begin internal link target ident     033
End internal link target ident       034
Begin internal link display text     035
End internal link display text       036
Begin internal link shadow bridge    037
End internal link shadow bridge      040
Begin internal link reference ident  041
End internal link reference ident    042
Paragraph break                      043
Line break                           044
Begin all capitals                   045
End all capitals                     046
Whole paragraph indent               047
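To show how command numbers sit inside running text, here is a minimal Python tokeniser sketch that splits a document body into text runs and decoded command numbers. The name `tokenize` is hypothetical, the body is assumed to have had any magic number and heading block removed, and the 3-bit groups are assumed to be collated most significant first, which the page does not state.

```python
from typing import Iterator, Union

# Index in this sequence = decoded 4-bit value of the embed encoding.
EMBED_CODES = bytes([
    0x03, 0x04, 0x05, 0x10, 0x11, 0x12, 0x13, 0x14,
    0x16, 0x17, 0x18, 0x19, 0x1C, 0x1D, 0x1E, 0x1F,
])
NIBBLE_OF = {code: nibble for nibble, code in enumerate(EMBED_CODES)}


def tokenize(body: bytes) -> Iterator[Union[str, int]]:
    """Yield text runs as str and formatting commands as int."""
    text = bytearray()
    pos = 0
    while pos < len(body):
        if body[pos] not in NIBBLE_OF:
            text.append(body[pos])  # ordinary text byte
            pos += 1
            continue
        if text:
            yield text.decode("ascii")
            text.clear()
        value = 0
        while True:
            if pos >= len(body) or body[pos] not in NIBBLE_OF:
                raise ValueError("truncated or malformed command")
            nibble = NIBBLE_OF[body[pos]]
            pos += 1
            # Assumption: most significant 3-bit group comes first.
            value = (value << 3) | (nibble & 0b111)
            if not nibble & 0b1000:
                break
        yield value
    if text:
        yield text.decode("ascii")
```

Under these assumptions, `list(tokenize(b"\x04Title\x14Body text\x1c\x10"))` gives `[1, "Title", 7, "Body text", 35]`, that is: begin heading level 1, the heading text, end heading, a text run, and a paragraph break (043 octal).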

Shadows

The RFC format employs a simple rendering device called shadows. The basic action of a shadow is to hide the text enclosed between its beginning and end marks from rich renderings of the document; this is called a hard shadow. A softer variant, the shadow bridge, links the display text of a hyperlink or internal link to its target destination; it provides a semantic connection and hides punctuation that plain text needs for legibility, such as spacing and parentheses. Shadows thus provide a concise way to hide plain-text boilerplate from rich renderings of RFC documents while keeping it available to dumb formatting strippers, which can produce plain-text renditions as the authors intended without any high-level reconstruction.
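The difference between the two renditions can be sketched in Python over an already tokenised stream, where text runs are strings and commands are integers from the command table. Everything here is illustrative: the function names are hypothetical, the nesting counter is an assumption (the page does not say whether shadows nest), and the sample hyperlink layout, with the closing parenthesis placed in a hard shadow, is a guess at how an author might mark up a link rather than a documented pattern.

```python
from typing import List, Union

Token = Union[str, int]

# Commands that open or close a hidden span (hard shadows and both bridges).
SHADOW_BEGIN = {0o016, 0o022, 0o037}
SHADOW_END = {0o017, 0o023, 0o040}


def plain_text(tokens: List[Token]) -> str:
    """What a dumb stripper produces: all text, no commands."""
    return "".join(t for t in tokens if isinstance(t, str))


def rich_visible_runs(tokens: List[Token]) -> List[str]:
    """Text runs that survive in a rich rendering (shadowed text dropped)."""
    depth = 0
    runs = []
    for token in tokens:
        if isinstance(token, str):
            if depth == 0:
                runs.append(token)
        elif token in SHADOW_BEGIN:
            depth += 1
        elif token in SHADOW_END and depth > 0:
            depth -= 1  # forgiving: excess closing commands are ignored
    return runs


# Hypothetical markup for a hyperlink; the URL is a placeholder.
link = [
    0o020, "XionKB", 0o021,               # display text
    0o022, " (", 0o023,                   # shadow bridge
    0o024, "https://example.org", 0o025,  # URI
    0o016, ")", 0o017,                    # hard shadow
]
```

Here `plain_text(link)` gives `"XionKB (https://example.org)"`, while `rich_visible_runs(link)` gives `["XionKB", "https://example.org"]`: the bridge and the closing parenthesis are hidden, and a real renderer would presumably attach the URI to the display text as its link target instead of printing it.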