RFC format: Difference between revisions

From XionKB

Revision as of 06:02, 3 March 2025

The RFC format is a renderer-agnostic structured document format built from ASCII C0 control codes as a superset of the TTY text format. It uses C0 control codes that printers do not employ (and that are therefore unknown to TTY) to create the kind of rich document structure otherwise attained with advanced systems such as Troff and LaTeX. The format does this instead of providing a meta syntax like Markdown, RTF or HTML in order to satisfy its design constraint: a document must remain legible as TTY formatted ASCII text if all superset control codes are stripped out blindly, that is, without any parsing knowledge.

The recommended file extension for these documents is .rfc. Every RFC file should begin with the magic number 06 15 06 15 (hexadecimal), which is the pair of ASCII C0 control codes ACK and NAK given twice in alternation. The RFC format reserves all C0 control codes not used by the TTY format except for NUL (0x00), BEL (0x07), SUB (0x1A) and ESC (0x1B).
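
As an illustration of the magic number described above, the following is a minimal sketch of a signature check. The names RFC_MAGIC and has_rfc_magic are hypothetical and not part of the format.

```python
# ACK (0x06) and NAK (0x15), alternating twice, as given in the text.
RFC_MAGIC = bytes([0x06, 0x15, 0x06, 0x15])


def has_rfc_magic(data: bytes) -> bool:
    """Return True if the byte string begins with the RFC magic number."""
    return data.startswith(RFC_MAGIC)
```

A reader might call has_rfc_magic on the first four bytes of a file before attempting any further parsing.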

The RFC format is so named because it is designed to mimic the production of the IETF's RFC XML. It is not technically related to any of the IETF's tools for producing or validating RFCs.

Embed encoding

Sixteen extraneous ASCII C0 control codes are hijacked as a binary encoding medium so that each 7-bit ASCII character provides 4 bits of arbitrary binary data:

Character name             Code  Encoded  Decoded
End of Text                ETX   0x03     0x0
End of Transmission        EOT   0x04     0x1
Enquiry                    ENQ   0x05     0x2
Data Link Escape           DLE   0x10     0x3
Device Control 1           DC1   0x11     0x4
Device Control 2           DC2   0x12     0x5
Device Control 3           DC3   0x13     0x6
Device Control 4           DC4   0x14     0x7
Synchronous Idle           SYN   0x16     0x8
End of Transmission Block  ETB   0x17     0x9
Cancel                     CAN   0x18     0xA
End of Medium              EM    0x19     0xB
File Separator             FS    0x1C     0xC
Group Separator            GS    0x1D     0xD
Record Separator           RS    0x1E     0xE
Unit Separator             US    0x1F     0xF

This 4-bit medium is then employed to carry an MSB sentinel variable-length integer encoding: each control character contributes 3 bits of meaningful information, while the high (fourth) bit indicates whether the control character immediately following should be collated with the current one into a single number. As with most variable-width binary encodings, an improperly encoded control sequence is invalid and its meaning undefined. A valid sequence always consists of zero or more of the above sixteen control characters with the high bit set, in direct sequence, followed by exactly one such control character with the high bit clear.
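
The encoding described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say in which order the 3-bit groups are collated, so the sketch assumes the most significant group comes first, and it always emits the shortest possible sequence. The names CARRIERS, NIBBLE_OF, encode_number and decode_number are hypothetical.

```python
from typing import Tuple

# The sixteen carrier codes from the table above, indexed by decoded value.
CARRIERS = bytes([0x03, 0x04, 0x05, 0x10, 0x11, 0x12, 0x13, 0x14,
                  0x16, 0x17, 0x18, 0x19, 0x1C, 0x1D, 0x1E, 0x1F])
NIBBLE_OF = {code: value for value, code in enumerate(CARRIERS)}


def encode_number(n: int) -> bytes:
    """Encode a non-negative integer as a run of carrier control codes."""
    if n < 0:
        raise ValueError("only non-negative numbers can be encoded")
    groups = [n & 0o7]
    n >>= 3
    while n:
        groups.append(n & 0o7)
        n >>= 3
    groups.reverse()  # ASSUMPTION: most significant 3-bit group first
    out = bytearray()
    for i, group in enumerate(groups):
        # High bit set on every character except the last one.
        more = 0x8 if i < len(groups) - 1 else 0x0
        out.append(CARRIERS[more | group])
    return bytes(out)


def decode_number(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one number starting at pos; return (value, next position)."""
    value = 0
    while True:
        if pos >= len(data) or data[pos] not in NIBBLE_OF:
            raise ValueError("truncated or malformed control sequence")
        nibble = NIBBLE_OF[data[pos]]
        pos += 1
        value = (value << 3) | (nibble & 0o7)
        if not nibble & 0x8:  # high bit clear: this was the last character
            return value, pos
```

Under these assumptions, the number 0016 (octal) becomes the two characters ETB (0x17) and DC3 (0x13): the first carries the value 0x9 (high bit set, digit 1) and the second carries 0x6 (high bit clear, digit 6).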

Rich format

The rich format is paragraph-oriented and does not recognise any semantic distinction between whitespace characters; Line Feed (0x0A), Carriage Return (0x0D) and Space (0x20) are all equivalent to one another. Within a paragraph, the parser collapses any run of these characters into a single space and joins the physical source lines into one logical line, which is then rendered in monotype again at 72 characters per line.
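
A minimal sketch of this whitespace handling for a single paragraph is given below. It assumes greedy line filling as performed by Python's textwrap module, which the format itself does not prescribe, and the function name normalise_paragraph is hypothetical.

```python
import re
import textwrap


def normalise_paragraph(source: str, width: int = 72) -> str:
    """Collapse LF, CR and Space runs, then re-wrap to the given width."""
    # Line Feed, Carriage Return and Space are treated as equivalent.
    logical_line = re.sub(r"[ \r\n]+", " ", source).strip(" ")
    # ASSUMPTION: greedy wrapping; the text only fixes the line width.
    return textwrap.fill(logical_line, width=width)
```

For example, normalise_paragraph("a  b\r\n c") yields "a b c".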

Header for metadata

If an RFC format parser encounters an ASCII Start of Heading (SOH) before seeing any visible characters (including spacing) in the stream, it parses what follows as a heading metadata block. Fields are simple ASCII text separated by CRLF pairs, making them dumb-printable:

  1. Full document title
  2. Author name
  3. Author address
  4. Author telephone
  5. Author e-mail
  6. Copyright date
  7. Copyright assignment
  8. Licence

A field is omitted by leaving it empty. Fields are parsed in order, and therefore there should always be exactly seven CRLF pairs in a heading metadata block. The block is terminated by an ASCII Start of Text (STX) character, after which the normal document text, with whatever command formatting it bears, follows immediately and runs until EOF. The header as a whole is optional.
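
The header rules above can be sketched as a small parser. This is an illustration only: it assumes that any magic number has already been consumed, that the first byte passed in is where SOH would appear, and that no field contains STX. The field keys and the name parse_header are hypothetical.

```python
SOH, STX = 0x01, 0x02

# Hypothetical keys for the eight fields, in the order listed above.
FIELDS = ("title", "author_name", "author_address", "author_telephone",
          "author_email", "copyright_date", "copyright_assignment",
          "licence")


def parse_header(data: bytes):
    """Return (fields or None, body) for a stream with an optional header."""
    if not data or data[0] != SOH:
        return None, data  # the header is optional
    end = data.find(bytes([STX]), 1)
    if end == -1:
        raise ValueError("heading metadata block is not terminated by STX")
    parts = data[1:end].split(b"\r\n")
    if len(parts) != len(FIELDS):  # seven CRLF pairs give eight fields
        raise ValueError("expected exactly seven CRLF pairs in the header")
    fields = {name: part.decode("ascii") for name, part in zip(FIELDS, parts)}
    return fields, data[end + 1:]
```

Empty strings in the returned dictionary correspond to omitted fields.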

General formatting

A number encoded with the embed encoding described above is, once decoded and in memory, interpreted as a formatting command. Parsing at this point should be forgiving; for example, excess closing commands should be ignored. The following table lists the commands with their numbers in octal:

Command description                  Number
No-op                                0000
Begin heading, level 1               0001
Begin heading, level 2               0002
Begin heading, level 3               0003
Begin heading, level 4               0004
Begin heading, level 5               0005
Begin heading, level 6               0006
End heading (relative)               0007
Begin code block                     0010
End code block                       0011
Begin block quote                    0012
End block quote                      0013
Begin block quote credit             0014
End block quote credit               0015
Begin hard shadow                    0016
End hard shadow                      0017
Begin hyperlink display text         0020
End hyperlink display text           0021
Begin hyperlink shadow bridge        0022
End hyperlink shadow bridge          0023
Begin hyperlink URI                  0024
End hyperlink URI                    0025
Begin centred text                   0026
Begin right-aligned text             0027
Begin justified text                 0030
Revert to left-aligned text          0031
Render table of contents             0032
Begin internal link target ident     0033
End internal link target ident       0034
Begin internal link display text     0035
End internal link display text       0036
Begin internal link shadow bridge    0037
End internal link shadow bridge      0040
Begin internal link reference ident  0041
End internal link reference ident    0042
Paragraph break                      0043
Line break                           0044
Begin all capitals                   0045
End all capitals                     0046
Whole paragraph indent               0047
Figure 5 lines, ¼ width              0100
Figure 10 lines, ¼ width             0101
Figure 15 lines, ¼ width             0102
Figure 20 lines, ¼ width             0103
Figure 25 lines, ¼ width             0104
Figure 30 lines, ¼ width             0105
Figure 40 lines, ¼ width             0106
Figure 50 lines, ¼ width             0107
Figure 5 lines, ½ width              0110
Figure 10 lines, ½ width             0111
Figure 15 lines, ½ width             0112
Figure 20 lines, ½ width             0113
Figure 25 lines, ½ width             0114
Figure 30 lines, ½ width             0115
Figure 40 lines, ½ width             0116
Figure 50 lines, ½ width             0117
Figure 5 lines, ¾ width              0120
Figure 10 lines, ¾ width             0121
Figure 15 lines, ¾ width             0122
Figure 20 lines, ¾ width             0123
Figure 25 lines, ¾ width             0124
Figure 30 lines, ¾ width             0125
Figure 40 lines, ¾ width             0126
Figure 50 lines, ¾ width             0127
Figure 5 lines, full width           0130
Figure 10 lines, full width          0131
Figure 15 lines, full width          0132
Figure 20 lines, full width          0133
Figure 25 lines, full width          0134
Figure 30 lines, full width          0135
Figure 40 lines, full width          0136
Figure 50 lines, full width          0137

Shadows

The RFC format employs a simple rendering device called shadows. From its beginning mark to its end mark, a shadow hides the enclosed text from rich renderings of the document; this basic form is called a hard shadow. A softer variant, the shadow bridge, links the display text of a hyperlink or internal link to its target destination, providing a semantic connection while hiding the punctuation that plain text needs for legibility, such as spacing and parentheses. Shadows are thus a concise and simple way to hide plain text boilerplate from rich renderings of RFC documents while keeping it available to dumb formatting strippers, which can create plain text renditions as the authors intended them without any high-level reconstruction.
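
The difference between the two renditions can be sketched on an already decoded stream. This illustration assumes the stream has been split into text runs (str) and command numbers (int); it handles shadows only and ignores every other command. The depth counter for nested shadows is an assumption, since the text does not say whether shadows may nest, and the names plain_text and rich_text are hypothetical.

```python
from typing import Iterable, List, Union

Token = Union[str, int]  # a text run, or a decoded command number

# Hard shadow, hyperlink shadow bridge, internal link shadow bridge.
SHADOW_BEGIN = {0o16, 0o22, 0o37}
SHADOW_END = {0o17, 0o23, 0o40}


def plain_text(tokens: Iterable[Token]) -> str:
    """Dumb stripping: drop every command and keep all of the text."""
    return "".join(t for t in tokens if isinstance(t, str))


def rich_text(tokens: Iterable[Token]) -> str:
    """Keep only the text that lies outside every shadow."""
    depth = 0
    visible: List[str] = []
    for token in tokens:
        if isinstance(token, str):
            if depth == 0:
                visible.append(token)
        elif token in SHADOW_BEGIN:
            depth += 1
        elif token in SHADOW_END:
            depth = max(0, depth - 1)  # excess closing commands are ignored
    return "".join(visible)
```

For the stream ["Title", 0o16, " [draft]", 0o17, " text"], the plain rendition is "Title [draft] text" while the rich rendition is "Title text".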

Figures

The RFC format does not provide any direct means of embedding figure data (beyond the aforementioned ASCII tables, which are not true 'figures' anyway). However, it does provide a comprehensive command set for accommodating figures within the rigid and portable geometry of RFC documents. True figures can then be placed into this 'saved space' during rendering or, if that is not practical, the space can be left empty in a visually acceptable way. Commands 0100-0137 are provisioned for this purpose and form a simple matrix of figure sizes: the height may be 5, 10, 15, 20, 25, 30, 40 or 50 lines, and the width may be ¼ (17 columns), ½ (35 columns), ¾ (51 columns) or full (68 columns). Figures are always centred; ¼ and ¾ width figures have their remainder column on the right. The top of a figure is the line on which the figure command first appears. Rendering a figure involves an implicit carriage return before centring and, as usual, must respect the overwriting behaviour of the TTY format, which makes it possible to overlay text on top of figures directly in the document. Document authors should navigate around an inserted figure as necessary according to its prescribed size.
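
The size matrix can be read straight off the command number: in the command table, the lowest octal digit selects the height and the next digit selects the width. The following sketch decodes a figure command on that basis. It assumes figures are centred within the 72-column line, and the names HEIGHTS, WIDTHS and figure_geometry are hypothetical.

```python
HEIGHTS = (5, 10, 15, 20, 25, 30, 40, 50)  # lines, by lowest octal digit
WIDTHS = (17, 35, 51, 68)                  # columns, as stated in the text
PAGE_COLUMNS = 72                          # ASSUMPTION: centring width


def figure_geometry(command: int):
    """Return (height, width, left margin, right margin) for a figure."""
    if not 0o100 <= command <= 0o137:
        raise ValueError("not a figure command")
    index = command - 0o100
    height = HEIGHTS[index & 0o7]
    width = WIDTHS[index >> 3]
    left = (PAGE_COLUMNS - width) // 2
    right = PAGE_COLUMNS - width - left  # any remainder column goes right
    return height, width, left, right
```

For example, command 0123 decodes to a figure 20 lines high and 51 columns wide, with 10 columns of margin on the left and 11 on the right.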

Rendering

Regardless of whether the target medium is print or digital, the RFC format is meant to be a monospaced, paper-friendly medium that never exceeds 72 columns in width. Unlike TTY, RFC is page-aware: it sets a maximum page height of 55 rows, of which 2 header rows with 1 spacer row and 1 footer row with 1 spacer row are reserved, leaving 50 rows of content per page.
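
The page arithmetic above can be sketched as follows. The splitting of already wrapped lines into pages is a simplification that ignores figures and any rule about where pages may break; the constant names and the function paginate are hypothetical.

```python
from typing import List

PAGE_ROWS = 55
HEADER_ROWS, FOOTER_ROWS, SPACER_ROWS = 2, 1, 1

# 55 - (2 + 1) - (1 + 1) = 50 rows of content per page.
CONTENT_ROWS = (PAGE_ROWS - (HEADER_ROWS + SPACER_ROWS)
                - (FOOTER_ROWS + SPACER_ROWS))


def paginate(lines: List[str]) -> List[List[str]]:
    """Split rendered lines into pages of at most CONTENT_ROWS rows."""
    return [lines[i:i + CONTENT_ROWS]
            for i in range(0, len(lines), CONTENT_ROWS)]
```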