RFC format: Difference between revisions
Revision as of 13:54, 21 February 2025
The RFC format is a renderer-agnostic structured document format built using ASCII C0 control codes as a superset of the TTY text format. It uses other C0 control codes not employed by printers (and thereby not known to TTY) to create rich document structure otherwise attained with advanced systems such as Troff and LaTeX. The format does this instead of providing a meta syntax like Markdown, RTF or HTML so that it can satisfy its design constraint of being legible as TTY formatted ASCII text if all superset control codes are blindly (as in, without any parsing knowledge required) stripped out.
The recommended file extension for these documents is .rfc. The magic number all RFC files should begin with is, in hexadecimal, 06 15 06 15, representing two interleaved pairs of ASCII ACK and NAK C0 control codes. The RFC format reserves all C0 control codes not used by the TTY format except for NUL (0x00), BEL (0x07), SUB (0x1A) and ESC (0x1B).
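A minimal sketch of checking for the magic number follows. The function name `has_rfc_magic` is made up for illustration and is not part of the format.

```python
# ACK (0x06) and NAK (0x15), interleaved twice, as described above.
MAGIC = bytes([0x06, 0x15, 0x06, 0x15])


def has_rfc_magic(data: bytes) -> bool:
    """Return True if the buffer begins with the RFC magic number."""
    return data.startswith(MAGIC)
```

A reader would typically call this on the first four bytes of a file before attempting any further parsing.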
The RFC format is so named because it is designed to mimic the production of the IETF's RFC XML. It is not technically related to any of the IETF's tools for producing or validating RFCs.
Embed encoding
Sixteen extraneous ASCII C0 control codes are hijacked as a binary encoding medium so that each 7-bit ASCII character provides 4 bits of arbitrary binary data:
| Character name | Abbreviation | Encoded byte | Decoded value |
|---|---|---|---|
| End of Text | ETX | 0x03 | 0x0 |
| End of Transmission | EOT | 0x04 | 0x1 |
| Enquiry | ENQ | 0x05 | 0x2 |
| Data Link Escape | DLE | 0x10 | 0x3 |
| Device Control 1 | DC1 | 0x11 | 0x4 |
| Device Control 2 | DC2 | 0x12 | 0x5 |
| Device Control 3 | DC3 | 0x13 | 0x6 |
| Device Control 4 | DC4 | 0x14 | 0x7 |
| Synchronous Idle | SYN | 0x16 | 0x8 |
| End of Transmission Block | ETB | 0x17 | 0x9 |
| Cancel | CAN | 0x18 | 0xA |
| End of Medium | EM | 0x19 | 0xB |
| File Separator | FS | 0x1C | 0xC |
| Group Separator | GS | 0x1D | 0xD |
| Record Separator | RS | 0x1E | 0xE |
| Unit Separator | US | 0x1F | 0xF |
This 4-bit medium then carries a variable-length integer encoding with an MSB sentinel: each control character contributes 3 bits of meaningful information, while the high (fourth) bit of its decoded value indicates whether the control character immediately following it should be collated with it into a single number. As with most variable-width binary encodings, an improperly formed sequence is invalid and its interpretation is undefined. A valid sequence always consists of zero or more of the above sixteen control characters with the high bit set, in direct succession, followed by exactly one such control character whose high bit is clear.
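The encoding can be sketched as follows. The text above does not state the order in which the 3-bit groups are written, so this sketch assumes the most significant group comes first; the names `encode_number` and `decode_number` are likewise illustrative only.

```python
# Encoded byte for each 4-bit value, indexed by that value (from the table).
NIBBLE_TO_BYTE = bytes([
    0x03, 0x04, 0x05, 0x10, 0x11, 0x12, 0x13, 0x14,
    0x16, 0x17, 0x18, 0x19, 0x1C, 0x1D, 0x1E, 0x1F,
])
BYTE_TO_NIBBLE = {byte: nibble for nibble, byte in enumerate(NIBBLE_TO_BYTE)}

CONTINUE = 0b1000  # high bit: another control character follows
PAYLOAD = 0b0111   # low three bits: part of the number


def encode_number(value: int) -> bytes:
    """Encode a non-negative integer as a run of embed control characters."""
    if value < 0:
        raise ValueError("only non-negative integers can be encoded")
    # Collect 3-bit groups, least significant first.
    groups = [value & PAYLOAD]
    value >>= 3
    while value:
        groups.append(value & PAYLOAD)
        value >>= 3
    groups.reverse()  # ASSUMPTION: most significant group is written first
    out = bytearray()
    for index, group in enumerate(groups):
        is_last = index == len(groups) - 1
        out.append(NIBBLE_TO_BYTE[group if is_last else group | CONTINUE])
    return bytes(out)


def decode_number(data: bytes) -> int:
    """Decode exactly one well-formed sequence; raise ValueError otherwise."""
    value = 0
    for index, byte in enumerate(data):
        if byte not in BYTE_TO_NIBBLE:
            raise ValueError(f"0x{byte:02X} is not an embed control character")
        nibble = BYTE_TO_NIBBLE[byte]
        value = (value << 3) | (nibble & PAYLOAD)
        if not nibble & CONTINUE:
            if index != len(data) - 1:
                raise ValueError("extra bytes after the terminating character")
            return value
    raise ValueError("sequence ends without a terminating character")
```

Under the stated assumption, the number 020 (octal) becomes the two bytes CAN (0x18) and ETX (0x03): the first carries the bits 010 with the high bit set, the second carries 000 with the high bit clear.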
Rich format
The rich format is paragraph-oriented and does not recognise any semantic distinction between whitespace characters; Line Feed 0x0A, Carriage Return 0x0D and Space 0x20 are all equivalent to one another. Inside paragraphs, parsing collapses runs of such characters into a single space and joins source text spanning many physical lines into one logical line, which is then rendered as monospaced text again at 72 characters per line.
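The whitespace handling described above might look like this for a single paragraph. It is only a sketch: the name `reflow` is hypothetical, and Python's `textwrap` module stands in for whatever line-breaking rules a real renderer would apply.

```python
import re
import textwrap


def reflow(paragraph: str, width: int = 72) -> list[str]:
    """Collapse LF, CR and Space runs, then re-wrap to the given width."""
    # Treat LF, CR and Space as interchangeable and collapse each run to one space.
    logical_line = re.sub(r"[ \r\n]+", " ", paragraph).strip(" ")
    # Re-render the single logical line as physical lines of at most `width`.
    return textwrap.wrap(logical_line, width=width)
```

For instance, `reflow("one\r\ntwo   three\n")` yields the single line `"one two three"`.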
Header for metadata
If an RFC format parser encounters an ASCII Start of Heading (SOH) character before seeing any visible characters (including spacing) in the stream, it parses what follows as a heading metadata block. Fields are simple ASCII text separated by CRLF pairs, making them dumb-printable:
- Full document title
- Author name
- Author address
- Author telephone
- Author e-mail
- Copyright date
- Copyright assignment
- Licence
A field is omitted by leaving it empty. Fields are parsed in order, so a heading metadata block should always contain exactly seven CRLF pairs, one between each of the eight fields. The block is terminated by an ASCII Start of Text (STX) character, after which the normal document text, with whatever command formatting it bears, follows immediately until EOF. As a whole, this header is optional.
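A parser for this block could be sketched as below. Several things here are assumptions rather than part of the format description: the dictionary key names, the function name `parse_header`, the choice to raise `ValueError` on a malformed block, and the idea that the magic number may precede SOH (it contains no visible characters).

```python
MAGIC = b"\x06\x15\x06\x15"
SOH = b"\x01"
STX = b"\x02"

# Hypothetical key names for the eight fields, in the order listed above.
FIELD_NAMES = (
    "title", "author_name", "author_address", "author_telephone",
    "author_email", "copyright_date", "copyright_assignment", "licence",
)


def parse_header(data: bytes):
    """Return (header dict or None, remaining document bytes)."""
    body = data[len(MAGIC):] if data.startswith(MAGIC) else data
    if not body.startswith(SOH):
        return None, body  # the header is optional
    end = body.find(STX)
    if end == -1:
        raise ValueError("heading metadata block has no STX terminator")
    # Eight fields separated by exactly seven CRLF pairs.
    raw_fields = body[len(SOH):end].split(b"\r\n")
    if len(raw_fields) != len(FIELD_NAMES):
        raise ValueError("heading metadata block must contain seven CRLF pairs")
    header = {
        name: raw.decode("ascii")
        for name, raw in zip(FIELD_NAMES, raw_fields)
    }
    return header, body[end + len(STX):]
```

Empty fields come back as empty strings, which mirrors the rule that omitted fields are simply left blank.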
General formatting
A number encoded with the embed encoding is, once decoded and in memory, interpreted as a formatting command. Parsing at this point should be forgiving; for example, excess closing commands should be ignored. The following table lists the commands with their numbers in octal:
| Command description | Number |
|---|---|
| Begin heading, level 1 | 001 |
| Begin heading, level 2 | 002 |
| Begin heading, level 3 | 003 |
| Begin heading, level 4 | 004 |
| Begin heading, level 5 | 005 |
| Begin heading, level 6 | 006 |
| End heading (relative) | 007 |
| Begin code block | 010 |
| End code block | 011 |
| Begin block quote | 012 |
| End block quote | 013 |
| Begin block quote credit | 014 |
| End block quote credit | 015 |
| Begin hard shadow | 016 |
| End hard shadow | 017 |
| Begin hyperlink display text | 020 |
| End hyperlink display text | 021 |
| Begin hyperlink shadow bridge | 022 |
| End hyperlink shadow bridge | 023 |
| Begin hyperlink URI | 024 |
| End hyperlink URI | 025 |
| Begin centred text | 026 |
| Begin right-aligned text | 027 |
| Begin justified text | 030 |
| Revert to left-aligned text | 031 |
| Render table of contents | 032 |
| Begin internal link target ident | 033 |
| End internal link target ident | 034 |
| Begin internal link display text | 035 |
| End internal link display text | 036 |
| Begin internal link shadow bridge | 037 |
| End internal link shadow bridge | 040 |
| Begin internal link reference ident | 041 |
| End internal link reference ident | 042 |
| Paragraph break | 043 |
| Line break | 044 |
| Begin all capitals | 045 |
| End all capitals | 046 |
| Whole paragraph indent | 047 |
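The sketch below splits a document body into text runs and command numbers, then drops closing commands that have no matching opener, as one possible reading of "forgiving". It assumes the 3-bit groups of a command number are written most significant first, which the article does not state; the names `tokenize` and `drop_excess_ends` are illustrative only.

```python
# The sixteen embed control characters, indexed by their decoded 4-bit value.
EMBED_BYTES = bytes([
    0x03, 0x04, 0x05, 0x10, 0x11, 0x12, 0x13, 0x14,
    0x16, 0x17, 0x18, 0x19, 0x1C, 0x1D, 0x1E, 0x1F,
])
BYTE_TO_NIBBLE = {byte: nibble for nibble, byte in enumerate(EMBED_BYTES)}

# End command -> matching begin command, taken from the table (octal).
END_TO_BEGIN = {
    0o11: 0o10, 0o13: 0o12, 0o15: 0o14, 0o17: 0o16, 0o21: 0o20, 0o23: 0o22,
    0o25: 0o24, 0o34: 0o33, 0o36: 0o35, 0o40: 0o37, 0o42: 0o41, 0o46: 0o45,
}
BEGIN_COMMANDS = set(END_TO_BEGIN.values())


def tokenize(data: bytes):
    """Yield ("text", str) and ("cmd", int) items in stream order."""
    text = bytearray()
    value = 0
    for byte in data:
        nibble = BYTE_TO_NIBBLE.get(byte)
        if nibble is None:
            value = 0  # forgiving: abandon a truncated command number
            text.append(byte)
            continue
        if text:
            yield ("text", text.decode("ascii", errors="replace"))
            text.clear()
        # ASSUMPTION: earlier characters hold the more significant bits.
        value = (value << 3) | (nibble & 0b0111)
        if not nibble & 0b1000:  # high bit clear: the number is complete
            yield ("cmd", value)
            value = 0
    if text:
        yield ("text", text.decode("ascii", errors="replace"))


def drop_excess_ends(tokens):
    """Pass tokens through, ignoring end commands with no open counterpart."""
    open_counts = {}
    for kind, value in tokens:
        if kind == "cmd" and value in BEGIN_COMMANDS:
            open_counts[value] = open_counts.get(value, 0) + 1
        elif kind == "cmd" and value in END_TO_BEGIN:
            begin = END_TO_BEGIN[value]
            if open_counts.get(begin, 0) == 0:
                continue  # excess closing command: ignore it
            open_counts[begin] -= 1
        yield (kind, value)
```

Headings are left alone here because their single "End heading (relative)" command does not pair one-to-one with a specific begin command.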
Shadows
The RFC format employs a simple rendering mechanism called shadows. The basic action of a shadow is to hide the text between its beginning mark and its end mark from rich renderings of the document; this is called a hard shadow. A softer variant, the shadow bridge, links the display text of a hyperlink or internal link to its target destination. It provides a semantic connection while hiding the punctuation that plain text needs for legibility, such as spacing and parentheses. Shadows are a concise and simple way to hide plain-text boilerplate from rich renderings of RFC documents while still leaving it in place for dumb formatting strippers, which can then produce plain-text renditions as the authors intended them without any high-level reconstruction.
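A dumb formatting stripper of the kind mentioned above can be sketched in a few lines. The set of bytes removed is an assumption: it contains only the control codes this article names explicitly (SOH, STX, ACK, NAK and the sixteen embed codes). The sample hyperlink is also hypothetical, including its use of a hard shadow for the closing parenthesis and the most-significant-first ordering of its command bytes.

```python
# Control codes named in this article as belonging to the RFC superset.
RFC_ONLY = bytes([
    0x01, 0x02, 0x06, 0x15,                          # SOH, STX, ACK, NAK
    0x03, 0x04, 0x05, 0x10, 0x11, 0x12, 0x13, 0x14,  # embed codes 0x0-0x7
    0x16, 0x17, 0x18, 0x19, 0x1C, 0x1D, 0x1E, 0x1F,  # embed codes 0x8-0xF
])


def strip_to_plain(data: bytes) -> bytes:
    """Blindly delete every RFC-only control code; no parsing is involved."""
    return data.translate(None, delete=RFC_ONLY)


# Hypothetical hyperlink: display text, shadow bridge, URI, hard shadow.
SAMPLE = (
    b"See "
    b"\x18\x03" b"the spec" b"\x18\x04"             # 020 ... 021 display text
    b"\x18\x05" b" (" b"\x18\x10"                   # 022 ... 023 shadow bridge
    b"\x18\x11" b"http://example.org/" b"\x18\x12"  # 024 ... 025 URI
    b"\x17\x13" b")" b"\x17\x14"                    # 016 ... 017 hard shadow
    b"."
)
```

Stripping `SAMPLE` leaves `See the spec (http://example.org/).`, whereas a rich renderer would hide the bridge and shadow text and present only the linked display text.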
Rendering
Regardless of whether the target medium is print or digital, the RFC format is still meant to be a monospaced, paper-friendly medium that never exceeds 72 columns in width. Since RFC is page-aware, unlike TTY, it sets a maximum page height of 55 rows – this leaves 2 header rows with 1 spacer row and 1 footer row with 1 spacer row for 50 rows of content per page.
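The page arithmetic can be illustrated with a small paginator. The order of rows within a page (header, spacer, content, spacer, footer), the blank padding of a short final page, and the name `paginate` are assumptions made for this sketch; the header and footer contents are left to the caller.

```python
PAGE_ROWS = 55
HEADER_ROWS = 2
FOOTER_ROWS = 1
SPACER_ROWS = 1  # one spacer below the header, one above the footer
CONTENT_ROWS = PAGE_ROWS - HEADER_ROWS - FOOTER_ROWS - 2 * SPACER_ROWS  # 50


def paginate(lines, header=("", ""), footer=""):
    """Split rendered lines into pages of exactly PAGE_ROWS rows each."""
    if len(header) != HEADER_ROWS:
        raise ValueError("the header must supply exactly two rows")
    pages = []
    for start in range(0, len(lines), CONTENT_ROWS):
        content = list(lines[start:start + CONTENT_ROWS])
        # Pad a short final page with blank rows so every page is full height.
        content += [""] * (CONTENT_ROWS - len(content))
        pages.append([*header, "", *content, "", footer])
    return pages
```

With 120 rendered lines this produces three pages of 55 rows, the last of which carries 20 content rows followed by blank padding.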