C*: Difference between revisions

From XionKB
|image=C*.svg
|caption=Flavour image for the C* logo.
|paradigm={{wp|Imperative programming|imperative}}, {{wp|Procedural programming|procedural}}, {{wp|Structured programming|structured}}
|by=[[User:Alexander|Alexander Nicholi]]
|appeared=December, 2020
|typing={{wp|Type system#Static type checking|static}}, {{wp|Strong and weak typing|strong}}, {{wp|Manifest typing|manifest}}, {{wp|Nominal type system|nominal}}, {{wp|Abstract type|concrete}}
|exts=<code>.cst</code>, <code>.hst</code>
|influencedby={{wp|Ada (programming language)|Ada}}, {{wp|C (programming language)|C}}, {{wp|C*|Thinking Machines C*}}, {{wp|D (programming language)|D}}, {{wp|Go (programming language)|Go}}
|influenced=[[C~]], [[C♭]]
}}'''C*''' (pronounced ''C star'') is an {{wp|Imperative programming|imperative}}, {{wp|Procedural programming|procedural}}, [[mechanicalism|mechanicalist]] {{wp|systems programming language}} created by [[User:Alexander|Alexander Nicholi]]. It facilitates comprehensive {{wp|compile time}} guarantees of fully arbitrary {{wp|Immutable object|mutability}} of {{wp|State (computer science)|state}}. Work on it began in early 2020, the first publications appeared towards the end of that year, and development has been ongoing ever since. The name C* is meant to "point to" the aspects of C that have been overlooked or even derided by programming language theorists, chiefly its embodiment of data-oriented design and self-evident semantics.


==Background==
C* was created as a result of informatics research conducted by its creator that uncovered a new paradigm of programming called [[mechanicalism]], a school of thought about computing architecture that draws on such concepts as data-oriented design in direct contrast to [[functionalism]], the generic, extensible kind of programming taken for granted as universal before. C* leans into a property of C called ''communicativity'' by researcher Stephen Kell<ref>"[//dl.acm.org/doi/10.1145/3133850.3133867 Some were meant for C: the endurance of an unmanageable language."] Association for Computing Machinery. Retrieved 2024-02-01.</ref>, radically reforming its abstract machine model and introducing several new features that provide programmers more expressive power without compromise to the bit-precise yet portable niche that C occupies. In a nutshell, it is a more canonical language for generalised bare-metal software, such as drivers and kernels.
C*'s legacy begins in the earliest time I can remember, before I attended school. Our house got its first computer, and I fell into a deep love with it that I never shook away. Much fun there was with gaming, especially with old god games like SimCity and Caesar III. As I became an adolescent I began to want to command the computer to do things existing programs wouldn't do, and began learning how to program with Visual BASIC .NET. Over the next several years I learned many programming languages and paradigms, and learned baremetal programming through the Game Boy Advance when I was seventeen – but I began to notice that something was wrong with how things were going. Computers weren't always doing what they were supposed to, and I spent a few years figuring out why.


In my early twenties I figured out an explanation for the essence of [https://www.stilldrinking.org/programming-sucks this blog post] by Peter Welch. The truth is, most computer scientists don't know what they're doing, and I'm not just referring to hustlers writing JavaScript and overselling themselves. Even our industry leaders are beginning to pale in comparison to their priors. The most crucial kind of programming, systems programming, is stagnating in a major way, and almost nobody is even noticing this, let alone understanding why, and ''let alone'' finding anything to do to stop it. It's partly because systems programming is hard. Also because it's relatively new. Also because universities are too busy making money to care. At the end of the day, the outcome is simple: systems are made generically, and the quality of those systems suffers. Peter Welch's bridge analogy has less hyperbole as the years wear on.
==Overview==
Practicality of complexity before now has always been achieved through genericism. C* rejects this prescription, and instead capitalises on the explicit semantics of C. Genericism is anathema to systems programming, because it inherently obfuscates a program as a means to compartmentalise complexity. This does not address the complexity in a way that programmers can positively appreciate, rather trying to "do away" with it and let them pretend it is something abstract when it is not. The complexity itself is already more than enough for a human brain to handle – this abstract metaprogramming is surely a denial of the system in any real mind and makes for bad systems.
 
Instead, C* capitalises on C's communicativity to make obvious and clear the details of a system. It then provides a slew of new semantic mechanisms for constraining valid state, and a specification oriented around bits alone instead of abstract objects or octets of any length. This is called '''law &amp; order''' and it is the key feature of C*.
 
===General semantics===
There are many common words in informatics that primary literature on C* has to be careful with. Such effort plays a large part in substantiating the design philosophy of the language as well as its general adherence to mechanicalism as a school of thought. Among other terms, this includes:
* avoiding the term ''function'' to refer to callables, instead using ''routine''
* using ''octet'' to refer to the magnitude of data, reserving ''byte'' only for the mass of data (see [[Octet, not byte]])
 
C* also adds many new terms that build upon the existing lexicon of our field, including:
* deeper elucidation of the term ''marshalling'' with regard to data validation in addition to mere serialisation
* a new term ''suite'' to refer to semantically parallelisable statements joined by the comma operator in place of statement terminators
* ''segment routines'', or simply ''segroutines'', referring to labels inside routines with external visibility for jumping into
 
===Changes from C===
Like C, C* is an imperative programming language in the {{wp|ALGOL}} tradition. C* was derived specifically from ANSI C, that is, the C language as standardised by the {{wp|American National Standards Institute}}'s working group X3J11<ref>"[//web.archive.org/web/20160304062427/http://publicaa.ansi.org/sites/apdl/Documents/Standards%20Action/2005%20PDFs/SAV3648.pdf ANSI Standards Action Vol. 36, #48"] (PDF). American National Standards Institute. 2005-12-02. Archived from [http://publicaa.ansi.org/sites/apdl/Documents/Standards%20Action/2005%20PDFs/SAV3648.pdf the original] on 2016-03-04. Retrieved 2009-08-06.</ref>. From C, it inherits the following characteristics:
 
* a full set of {{wp|control flow}} keywords
* all arithmetic and bitwise operators present in C
* subroutines and procedures
* the {{expl|CPP|C Preprocessor}}
* the concept of the "compilation unit"


My own conviction led me to reject this, fuelled no doubt by the emergent work culture of cynicism and manager-blaming that has resulted from this pitiful state of affairs. Also motivating me to keep going is my conviction against the major ''false solutions'' to these problems of system misbehaviour, chief among them the Rust programming language and its false claim to "memory safety". The evangelism for it is the worst I have ever seen. I know there is an important problem trying to be solved by Rust, and it is in a sense that same problem that C* solves: system correctness made practical. Unfortunately for Rust folks, their error was made [https://nicholatian.com/safety at the drawing board].
However, C is often more illustratively described by what you might expect out of a language ''that it lacks,'' and C* is characteristically no different. Among other things, there are many high-level constructs that will never be provisioned by the C* language, including:


==Addressing complexity in systems==
* nested subroutine declarations
* object-oriented programming facilities, including
** classes (or any form of non-POD structure really)
** parameter polymorphism
** operator overloading
** constructors/destructors
** methods
* garbage collection
* lambdas
* templates or generics
* reflection
* concurrency
* module declaration system (imports)
* test harnessing
* line comments
* strong typing
 
C* provides many great additions and changes to ANSI C instead. Changes and removals include:
 
* removal of all built-in types save for <code>void</code>
* removal of support for all source encodings other than ASCII{{expl|*|with a doc comment exception}}
* removal of trigraphs
* redefinition of <code>sizeof(&nbsp;)</code> to be denominated in bits rather than octets


Instead, C* capitalises on C's ''communicativity'' to make obvious and clear the details of a system. It then provides a slew of new semantic mechanisms for constraining valid state, and a specification oriented around bits alone instead of abstract objects or bytes of any length.
The changes are modest compared to the many fantastic additions C* brings to the language, including:


My essay entitled [https://nicholatian.com/lawandorder Law & Order in Software Engineering with C*] proselytises for C* through one of its key features: '''law & order'''. This is the part of C* that provides ''fully arbitrary mutability of state''.
* law &amp; order
** marshalling for run-time law enforcement
** transient variable lifetime traversal for compile-time law enforcement
* one ''fundamental primitive'' type, the <code>bit</code>
* bit-oriented struct definitions
* attributes for declaring complex behaviour about types for the compiler to implement
* the underscore pronoun, which serves many purposes
* flex structs
* explicit padding
* struct synonyms
* legal enums
* enumerated unions
* union alignment
* union punning
* Unicode literals
* binary numeric literals
* transmogrification
* multiple return values
* explicit inlining
* code transclusion
* several new arithmetic operators


==Law &amp; order==
===Laws===
C* provides this through a few new keywords and a key concept. First among these is the <code>law</code> keyword, which defines and optionally names constraints to be used on data types. This is semantically accomplished through a series of boolean expressions, like so:
C* provides law &amp; order through a few new keywords and a key concept. First among these is, of course, the <code>law</code> keyword, which defines and optionally names constraints to be used on data types. This is semantically accomplished through a series of boolean expressions, like so:
 
<div class="mw-code"><span style="color:rgba(24,24,24,0.333)">/* anonymous law applied to type */</span><br/><span style="color:hsl(212,60%,40%)">law</span> <span style="color:rgba(24,24,24,0.667)">:</span> <span style="color:hsl(155,45%,55%)">s32</span><br/><span style="color:rgba(24,24,24,0.667)">{</span><br/>&#9;_ <span style="color:rgba(24,24,24,0.667)">&gt;=</span> <span style="color:hsl(212,55%,55%)">0</span><span style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;_ <span style="color:rgba(24,24,24,0.667)">&lt;</span> <span style="color:hsl(212,55%,55%)">1000</span><span style="color:rgba(24,24,24,0.667)">;<br/>};</span><br/><br/><span style="color:rgba(24,24,24,0.333)">/* named law */</span><br/><span style="color:hsl(212,60%,40%)">law</span> leet<br/><span style="color:rgba(24,24,24,0.667)">{</span><br/>&#9;_ <span style="color:rgba(24,24,24,0.667)">==</span> <span style="color:hsl(212,55%,55%)">1337</span><span style="color:rgba(24,24,24,0.667)">;<br/>};</span><br/><br/><span style="color:rgba(24,24,24,0.333)">/* applying previously declared law */</span><br/><span style="color:hsl(212,60%,40%)">law</span> leet <span style="color:rgba(24,24,24,0.667)">:</span> <span style="color:hsl(155,45%,55%)">u16</span><span style="color:rgba(24,24,24,0.667)">;</span></div>
 
These laws are enforced upon the data types they apply to at compile time through an exhaustive program analysis. The compiler works backwards to create a control flow tree representing a transient variable lifetime, and exhaustively validates the initialisation and modifications of that transient variable against the laws enacted upon it. This is made practical by formalising the boundaries of the compilation unit as a border between "native" and "foreign" code, which in the essay is called the total system. Data which is confined to this total system gains the performance benefit of fully arbitrary validity checking at compile time.
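For readers without a C* toolchain, the constraints expressed by the laws above can be approximated in plain C as run-time predicates. This is only a rough emulation under stated assumptions: the helper names <code>law_s32_holds</code>, <code>law_leet_holds</code> and <code>s32_store</code> are hypothetical and not part of C*, and C can only check at run time what C* is described as proving at compile time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mirroring the anonymous law on s32:
 * _ >= 0; _ < 1000; */
static bool law_s32_holds(int32_t value)
{
	return value >= 0 && value < 1000;
}

/* Hypothetical helper mirroring the named law "leet" applied to u16. */
static bool law_leet_holds(uint16_t value)
{
	return value == 1337;
}

/* Stores value into *dst only if the law holds and reports whether it did.
 * C* is described as rejecting an offending store at compile time instead. */
static bool s32_store(int32_t *dst, int32_t value)
{
	assert(dst != NULL);
	if (!law_s32_holds(value))
		return false;
	*dst = value;
	return true;
}
```

A store such as <code>s32_store(&x, 1000)</code> leaves <code>x</code> untouched and returns <code>false</code>, which is the run-time analogue of the compile-time rejection described above.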


<pre>
===Marshalling===
/* anonymous law applied to type */
To deal with foreign code, C* provides a mechanism called '''marshalling'''. This is a definition of marshalling expanded from {{wp|Marshalling (computer science)|its current meaning}} in computer science as a synonym for serialisation, to also include the act of validating data being serialised according to arbitrary schemas, or in the case of C*, arbitrary ''laws.'' All subroutines that are callable from outside the total system must provide marshalling blocks for validating their variables, like so:
law : s32
{
  _ >= 0;
  _ < 1000;
};


/* named law */
<div class="mw-code"><span style="color:hsl(212,60%,40%)">typedef</span> <span style="color:hsl(155,55%,40%)">bit</span><span style="color:rgba(24,24,24,0.667)">[</span><span style="color:hsl(212,55%,55%)">8</span><span style="color:rgba(24,24,24,0.667)">]</span> <span style="color:hsl(155,45%,55%)">mybyte</span><span style="color:rgba(24,24,24,0.667)">;</span><br/><span style="color:hsl(212,60%,40%)">typedef</span> <span style="color:hsl(155,55%,40%)">bit</span><span style="color:rgba(24,24,24,0.667)">[</span><span style="color:hsl(212,55%,55%)">32</span><span style="color:rgba(24,24,24,0.667)">]</span> <span style="color:hsl(155,45%,55%)">u32</span><span style="color:rgba(24,24,24,0.667)">;</span><br/><br/><span style="color:hsl(212,60%,40%)">law</span> <span style="color:rgba(24,24,24,0.667)">:</span> <span style="color:hsl(155,45%,55%)">mybyte</span><br/><span style="color:rgba(24,24,24,0.667)">{</span><br/>&#9;_ <span style="color:rgba(24,24,24,0.667)">&lt;</span> <span style="color:hsl(212,55%,55%)">255</span><span style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;_ <span style="color:rgba(24,24,24,0.667)">!=</span> <span style="color:hsl(212,55%,55%)">0</span><span style="color:rgba(24,24,24,0.667)">;<br/>};</span><br/><br/><span style="color:hsl(155,55%,40%)">void</span> <span style="color:hsl(86,35%,55%)">foo</span><span style="color:rgba(24,24,24,0.667)">(</span> <span style="color:hsl(155,45%,55%)">mybyte</span> a<span style="color:rgba(24,24,24,0.667)">,</span> <span style="color:hsl(155,45%,55%)">u32</span> c <span style="color:rgba(24,24,24,0.667)">)<br/>{</span><br/>&#9;<span style="color:rgba(24,24,24,0.333)">/* marshalling happens one parameter at a time */</span><br/>&#9;<span style="color:hsl(212,60%,40%)">marshal</span> a<br/>&#9;<span style="color:rgba(24,24,24,0.667)">{</span><br/>&#9;&#9;<span style="color:hsl(212,60%,40%)">if</span><span style="color:rgba(24,24,24,0.667)">(</span>a <span style="color:rgba(24,24,24,0.667)">==</span> <span 
style="color:hsl(212,55%,55%)">0</span><span style="color:rgba(24,24,24,0.667)">)<br/>&#9;&#9;{</span><br/>&#9;&#9;&#9;<span style="color:rgba(24,24,24,0.333)">/* a MUST be set to a valid value through marshalling<br/>&#9;&#9;&#9; * but, we can check around that, smartly */</span><br/>&#9;&#9;&#9;a <span style="color:rgba(24,24,24,0.667)">=</span> <span style="color:hsl(212,55%,55%)">1</span><span style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;&#9;&#9;<span style="color:hsl(212,60%,40%)">break</span><span style="color:rgba(24,24,24,0.667)">;<br/>&#9;&#9;}</span><br/><br/>&#9;&#9;<span style="color:rgba(24,24,24,0.333)">/* exit the routine otherwise */</span><br/>&#9;&#9;<span style="color:hsl(212,60%,40%)">return</span><span style="color:rgba(24,24,24,0.667)">;<br/>&#9;}</span><br/><br/>&#9;<span style="color:rgba(24,24,24,0.333)">/* this is the minimum required<br/>&#9; * if ANY laws enacted upon u32, this will fail to compile */</span><br/>&#9;<span style="color:hsl(212,60%,40%)">marshal</span> c<br/>&#9;<span style="color:rgba(24,24,24,0.667)">{ }</span><br/><br/>&#9;<span style="color:rgba(24,24,24,0.333)">/* alternatively, this minimal marshalling will do law checks<br/>&#9; * and return upon any failures, since marshal blocks are only<br/>&#9; * entered when the runtime checks for the laws fail */</span><br/>&#9;<span style="color:hsl(212,60%,40%)">marshal</span> c<br/>&#9;<span style="color:rgba(24,24,24,0.667)">{</span><br/>&#9;&#9;<span style="color:hsl(212,60%,40%)">return</span><span style="color:rgba(24,24,24,0.667)">;<br/>&#9;}<br/>}</span></div>
law leet
{
  _ == 1337;
};


/* applying previously declared law */
Marshal blocks can only reference the parameter they are marshalling. They may declare and modify local variables with automatic storage duration, and may only call pure routines with such parameters.
law leet : u16;
</pre>


===Transient variable lifetime traversal===
'''Transient variable lifetime''' is a term coined to refer to the ephemeral object of interest in performing the C* compiler's most valuable task: ''compile-time law enforcement.'' It refers to the exhaustive graphing of data as it flows through various names in all possible call graphs of a program. In a nutshell, we imagine a "variable" as a kind of ephemeral object that "travels" around the program, being modified and passed on. Consider the following C* code:


===Marshalling===
<div class="mw-code"><span style="color:hsl(212,60%,40%)">typedef</span> <span style="color:hsl(155,55%,40%)">bit</span><span style="color:rgba(24,24,24,0.667)">[</span><span style="color:hsl(212,55%,55%)">32</span><span style="color:rgba(24,24,24,0.667)">]</span> <span style="color:hsl(155,45%,55%)">myu32</span><span style="color:rgba(24,24,24,0.667)">;</span><br/><span style="color:hsl(212,60%,40%)">typedef</span> <span style="color:hsl(155,55%,40%)">bit</span><span style="color:rgba(24,24,24,0.667)">[</span><span style="color:hsl(212,55%,55%)">32</span><span style="color:rgba(24,24,24,0.667)">]</span> <span style="color:hsl(155,45%,55%)">u32</span><span style="color:rgba(24,24,24,0.667)">;</span><br/><br/><span style="color:rgba(24,24,24,0.333)">/* Must be less than 100 and cannot ever equal 17 */</span><br/><span style="color:hsl(212,60%,40%)">law</span> <span style="color:rgba(24,24,24,0.667)">:</span> <span style="color:hsl(155,45%,55%)">myu32</span><br/><span style="color:rgba(24,24,24,0.667)">{</span><br/>&#9;_ <span style="color:rgba(24,24,24,0.667)">&lt;</span> <span style="color:hsl(212,55%,55%)">100</span><span style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;_ <span style="color:rgba(24,24,24,0.667)">!=</span> <span style="color:hsl(212,55%,55%)">17</span><span style="color:rgba(24,24,24,0.667)">;<br/>};</span><br/><br/><span style="color:rgba(24,24,24,0.333)">/* Fibonacci sequence will satisfy both of those constraints, but how do we know? 
*/</span><br/><span style="color:hsl(155,55%,40%)">void</span> <span style="color:hsl(86,35%,55%)">fibonacci</span><span style="color:rgba(24,24,24,0.667)">(</span> <span style="color:hsl(155,55%,40%)">void</span> <span style="color:rgba(24,24,24,0.667)">)<br/>{</span><br/>&#9;<span style="color:hsl(155,45%,55%)">u32</span> i<span style="color:rgba(24,24,24,0.667)">,</span> n<span style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;<span style="color:hsl(155,45%,55%)">myu32</span> t0<span style="color:rgba(24,24,24,0.667)">,</span> t1<span style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;<span style="color:hsl(155,45%,55%)">u32</span> tn<span style="color:rgba(24,24,24,0.667)">;</span><br/><br/>&#9;t0 <span style="color:rgba(24,24,24,0.667)">=</span> <span style="color:hsl(212,55%,55%)">0</span><span style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;t1 <span style="color:rgba(24,24,24,0.667)">=</span> <span style="color:hsl(212,55%,55%)">1</span><span style="color:rgba(24,24,24,0.667)">;</span><br/><br/>&#9;<span style="color:rgba(24,24,24,0.333)">/* print the first two terms */</span><br/>&#9;<span style="color:hsl(86,30%,65%)">fprintf</span><span style="color:rgba(24,24,24,0.667)">(</span> stdout<span style="color:rgba(24,24,24,0.667)">,</span> <span style="color:hsl(3,50%,55%)">"</span><span style="color:hsl(3,55%,40%)">Fibonacci series: %d, %d</span><span style="color:hsl(3,50%,55%)">"</span><span style="color:rgba(24,24,24,0.667)">,</span> t0<span style="color:rgba(24,24,24,0.667)">,</span> t1 <span style="color:rgba(24,24,24,0.667)">);</span><br/><br/>&#9;<span style="color:rgba(24,24,24,0.333)">/* print 3rd to 12th terms */</span><br/>&#9;<span style="color:hsl(212,60%,40%)">for</span><span style="color:rgba(24,24,24,0.667)">(</span><span style="color:hsl(155,45%,55%)">u32</span>&nbsp;i <span style="color:rgba(24,24,24,0.667)">=</span> <span style="color:hsl(212,55%,55%)">2</span><span style="color:rgba(24,24,24,0.667)">;</span> i <span 
style="color:rgba(24,24,24,0.667)">&lt;</span> <span style="color:hsl(212,55%,55%)">12</span><span style="color:rgba(24,24,24,0.667)">; ++</span>i<span style="color:rgba(24,24,24,0.667)">)<br/>&#9;{</span><br/>&#9;&#9;tn <span style="color:rgba(24,24,24,0.667)">=</span> t0 <span style="color:rgba(24,24,24,0.667)">+</span> t1<span style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;&#9;<span style="color:hsl(86,30%,65%)">fprintf</span><span style="color:rgba(24,24,24,0.667)">(</span> stdout<span style="color:rgba(24,24,24,0.667)">,</span> <span style="color:hsl(3,50%,55%)">"</span><span style="color:hsl(3,55%,40%)">, %d</span><span style="color:hsl(3,50%,55%)">"</span><span style="color:rgba(24,24,24,0.667)">,</span> tn <span style="color:rgba(24,24,24,0.667)">);</span><br/>&#9;&#9;t0 <span style="color:rgba(24,24,24,0.667)">=</span> t1<span style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;&#9;t1 <span style="color:rgba(24,24,24,0.667)">=</span> tn<span style="color:rgba(24,24,24,0.667)">;<br/>&#9;}<br/>}</span></div>
To deal with foreign code, C* provides a mechanism called '''marshalling'''. This is a definition of marshalling expanded from its current meaning in computer science as a synonym for ''serialisation'', to also include the act of ''validating'' data being serialised according to arbitrary schemas, or in the case of C*, arbitrary laws. All subroutines that are callable from outside the total system must provide marshalling blocks for validating their variables, like so:
 
Going through the Fibonacci sequence, we know that if we limit the number of terms to 12, we will never reach 100. But how does the C* compiler break this down?
 
It evaluates the possible values of each variable term that it is enforcing at every point they are modified, in an exhaustive recursive fashion. This means that the algorithmic complexity of verification is proportional to the algorithmic complexity of the program being verified. The verification algorithm will first minimise the possible program space by factoring in all constant values, which in the routine above is very helpful.
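The idea of checking every point of modification can be illustrated in plain C by replaying the Fibonacci loop with its constant bounds and testing each value assigned to the law-bound variables. This is a minimal sketch, assuming the law on <code>myu32</code> shown earlier; the names <code>law_myu32_holds</code> and <code>fibonacci_obeys_law</code> are hypothetical, and the sketch executes the loop concretely rather than reproducing whatever analysis the C* compiler actually performs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the law on myu32: _ < 100; _ != 17; */
static bool law_myu32_holds(uint32_t value)
{
	return value < 100 && value != 17;
}

/* Replays the loop from the fibonacci routine for a constant number of
 * terms and checks the law at every assignment to t0 and t1.
 * Returns false as soon as any assigned value would break the law. */
static bool fibonacci_obeys_law(uint32_t terms)
{
	uint32_t t0 = 0;
	uint32_t t1 = 1;

	if (!law_myu32_holds(t0) || !law_myu32_holds(t1))
		return false;

	for (uint32_t i = 2; i < terms; ++i) {
		uint32_t tn = t0 + t1; /* tn is a plain u32, so no law applies */

		t0 = t1;
		t1 = tn;
		if (!law_myu32_holds(t0) || !law_myu32_holds(t1))
			return false;
	}
	return true;
}
```

With the loop bound fixed at 12 the largest value assigned is 89, so the check passes; raising the bound to 13 would assign 144 and fail. The cost of the check follows the cost of the loop itself, in line with the complexity claim above.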


<pre>
In cases where the output of the routine depends on outside variables, the laws applied to the incoming parameters are assumed to hold either directly or by marshalling, but beyond that, it will assume worst values for the type's size. In the case of complex algorithms, it will often happen that it is not trivial to guarantee the validity of a given combination of laws; for example, if a foreign <code>n</code> was given of type <code>u32</code>, it may require brute force search to ensure that some other variable dependent on n never equals 17.
typedef u8 mybyte;


law : mybyte
The default behaviour of the C* compiler in situations like these is to error out, asking the programmer to give it more certainty about the data it is dealing with. Practically speaking, this involves creating more concise types with more permissible laws. For instance, if you want to be sure a 40-bit integer never overflows via multiplication, you need to make sure the types multiplied to create it have bit sizes that, summed together, do not exceed 40 bits, like so:
{
  _ < 255;
  _ != 0;
};


void foo( mybyte a, int c )
<div class="mw-code"><span style="color:hsl(212,60%,40%)">typedef</span> <span style="color:hsl(155,55%,40%)">bit</span><span style="color:rgba(24,24,24,0.667)">[</span><span style="color:hsl(212,55%,55%)">64</span><span style="color:rgba(24,24,24,0.667)">]</span> <span style="color:hsl(155,45%,55%)">outint</span><span style="color:rgba(24,24,24,0.667)">;</span><br/><span style="color:hsl(212,60%,40%)">typedef</span> <span style="color:hsl(155,55%,40%)">bit</span><span style="color:rgba(24,24,24,0.667)">[</span><span style="color:hsl(212,55%,55%)">64</span><span style="color:rgba(24,24,24,0.667)">]</span> <span style="color:hsl(155,45%,55%)">term0</span><span style="color:rgba(24,24,24,0.667)">;</span><br/><span style="color:hsl(212,60%,40%)">typedef</span> <span style="color:hsl(155,55%,40%)">bit</span><span style="color:rgba(24,24,24,0.667)">[</span><span style="color:hsl(212,55%,55%)">64</span><span style="color:rgba(24,24,24,0.667)">]</span> <span style="color:hsl(155,45%,55%)">term1</span><span style="color:rgba(24,24,24,0.667)">;</span><br/><br/><span style="color:hsl(212,60%,40%)">law</span> <span style="color:rgba(24,24,24,0.667)">:</span> <span style="color:hsl(155,45%,55%)">outint</span><br/><span style="color:rgba(24,24,24,0.667)">{</span><br/>&#9;_ <span style="color:rgba(24,24,24,0.667)">&lt;=</span> <span style="color:hsl(212,55%,55%)">0xFFFFFFFFFF</span><span style="color:rgba(24,24,24,0.667)">;<br/>};</span><br/><br/><span style="color:hsl(212,60%,40%)">law</span> <span style="color:rgba(24,24,24,0.667)">:</span> <span style="color:hsl(155,45%,55%)">term0</span><br/><span style="color:rgba(24,24,24,0.667)">{</span><br/>&#9;_ <span style="color:rgba(24,24,24,0.667)">&lt;=</span> <span style="color:hsl(212,55%,55%)">0xFFFFFF</span><span style="color:rgba(24,24,24,0.667)">;<br/>};</span><br/><br/><span style="color:hsl(212,60%,40%)">law</span> <span style="color:rgba(24,24,24,0.667)">:</span> <span style="color:hsl(155,45%,55%)">term1</span><br/><span 
style="color:rgba(24,24,24,0.667)">{</span><br/>&#9;_ <span style="color:rgba(24,24,24,0.667)">&lt;=</span> <span style="color:hsl(212,55%,55%)">0xFFFF</span><span style="color:rgba(24,24,24,0.667)">;<br/>};</span><br/><br/><span style="color:hsl(155,55%,40%)">void</span> <span style="color:hsl(86,30%,65%)">mysubroutine</span><span style="color:rgba(24,24,24,0.667)">(</span> <span style="color:hsl(155,55%,40%)">void</span> <span style="color:rgba(24,24,24,0.667)">)<br/>{</span><br/>&#9;<span style="color:hsl(155,45%,55%)">outint</span> a<span style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;<span style="color:hsl(155,45%,55%)">term0</span> b <span style="color:rgba(24,24,24,0.667)">=</span> <span style="color:rgba(24,24,24,0.333)">/* ... */</span><span style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;<span style="color:hsl(155,45%,55%)">term1</span> c <span style="color:rgba(24,24,24,0.667)">=</span> <span style="color:rgba(24,24,24,0.333)">/* ... */</span><span style="color:rgba(24,24,24,0.667)">;</span><br/><br/>&#9;<span style="color:rgba(24,24,24,0.333)">/* This is OK */</span><br/>&#9;a <span style="color:rgba(24,24,24,0.667)">=</span> b <span style="color:rgba(24,24,24,0.667)">*</span> c<span style="color:rgba(24,24,24,0.667)">;<br/>}</span></div>
{
  /* marshalling happens one parameter at a time */
  marshal a
  {
      if( a == 0 )
      {
        /* a MUST be set to a valid value through marshalling
          * but, we can check around that, smartly */
        a = 1;
        break;
      }


      /* exit the function otherwise */
If the above code were modified to have laws that permit any valid addition or subtraction but not multiplication (ergo, the laws are only enough to allow linear mixing, not quadratic), then <code>a = b + c</code> would still be valid, but the compiler would error out if it found <code>a = b * c</code>. The precautionary principle is in play.
      return;
  }


  /* this is the minimum required
However, it will be possible to put the compiler into that brute force mode, potentially at great computational cost, in order to arrive definitively at an answer to that question. This is accomplished using a framework of satisfiability solver programs, which provide a bitcode proof that can be saved by a programmer for trivial verification of its satisfiability once the solution is found.
    * if any laws enacted upon s32, this will fail to compile */
  marshal c
  { }


  /* alternatively, this minimal marshalling will do law checks
Introducing the transient variable lifetime to this approach means that we transcend callsite boundaries within the total system to thoroughly simulate all subroutines in a program as one big meta-routine. This means that we can get more information about possible states than is possible when marshalling without attached formal proofs. Data confined within a total system has a far smaller number of possible states. More precisely, the number of possible states it has is directly proportional to the number of changes it has. The larger the program, the longer it takes to validate, but that does not scale exponentially in its own right. It merely follows the algorithmic complexity of the program being validated.
    * and return upon any failures, since marshal blocks are only
    * entered when the runtime checks for the laws fail */
  marshal c
  {
      return;
  }
}
</pre>


Marshal blocks can only reference the parameter they are marshalling. They may declare and modify local variables with automatic storage duration, and may only call <code>pure</code> functions with such parameters.
==Concrete type system==
C* has no abstract type system, not even a {{wp|Strong and weak typing|weak}} one as provided by ANSI C. Instead, it has a simple yet rigorous concrete type system based on three fundamental primitive types: <code>bit</code>, <code>void</code> and <code>fifo</code>. They are considered ''fundamental'' because they are built into the language, and ''primitive'' as they are elementary types (as opposed to complex ones created by structure and bifurcated by dot notation). More generally, the <code>bit</code> is "something", while <code>void</code> is "nothing", and <code>fifo</code> is a secret third thing currently only valid in the context of [[#Transmogrification|transmogrification]] for the transient transit of data.


==Fundamental model of data==
C* uses its radically simplified set of primitive types as a basis for a powerfully expressive complex type system that far outshines that provided in C. Enumerations, structures, and unions have all received major semantic changes at the outset, and on top of this provide a host of new expressions that are not possible in C. Many expressions are entirely new to the imperative paradigm thanks to the conceptual distinction between [[functionalism]] and mechanicalism mentioned already. In other words, techniques and concepts previously only possible in the abstract through {{wp|functional programming}} are now accessible in a concrete way.
C* models state in terms of ''bits''. This is reflected in the type system: C* has only one ''fundamental primitive type'', the <code>bit</code>. All other primitives are modelled with explicit and exact <code>struct</code>s of bits coupled with various attributes that the compiler understands the meaning of.


===Memory model===
===Enumerations===
C* has a simpler and more natural memory model improved upon from the memory model of C. There are three ''types'' of memory and two ''jurisdictions'' of memory in the language.
Enumerations have received comparatively modest treatment in the design of C*. They still behave as they do in C, with one major conceptual difference: enumerations do not have an implicit typing of <code>int</code> (or any implicit typing at all for that matter). Instead, the values of enumerations in C* hold fully arbitrary integers and floating-point numbers, using a big integer implementation for the former and a {{wp|Unum (number format)#Type I Unum|Type I Unum}} implementation for the latter. This is made practical by the architecture of the [[Oración]] assembler backend that the [[Sirius C*]] compiler will use. Enumerations do not have a C♯ style "namespacing" effect, so they put their symbols into the main symbol namespace like everything else does in a C compilation unit.


{| width="100%"
====Concrete enumerations====
|-
Since enums in C* are ephemeral by default, it is useful then to have a non-ephemeral enum variant that does indeed carry a concrete type (ergo, a definite size). We call these '''concrete enumerations''' and they are denoted in a familiar syntax borrowed from C++:
! colspan="1" width="50%" | <big>Types of memory</big>
! colspan="1" width="50%" | <big>Jurisdictions of memory</big>
|-
| colspan="1" width="50%" valign="top" |
; Private memory
: Same as in OpenCL parlance; in CUDA terms it may be called "registers" or "local memory"; in general-purpose CPU terms it is thread-local storage. Writable, and only accessible from a single execution context.
; Shared memory
: Same as in CUDA parlance; in OpenCL terms it is called "local memory"; in CPU terms it is typical, often heap-allocated memory. Writable and accessible from potentially multiple concurrent execution contexts; this is the only memory category that demands manual synchronisation.
; Constant memory
: Shared memory that is read-only for all execution contexts. This memory can be shared and used across multiple execution contexts without need for synchronisation, as it is written only at the moment it is declared.
| colspan="1" width="50%" valign="top" |
; Native memory
: Memory for which its entire lifetime and all of its access is confined to the total system. Laws can be enacted upon types of objects residing in native memory. Variable declarations default to this jurisdiction.
; Foreign memory
: Memory that may be accessed, created and/or destroyed outside of the total system. All laws enacted on a type of object in foreign memory do not apply; instead, the user must apply pacts and marshalling code to validate such data. Use of either the extern or volatile storage modifiers declares a variable as residing in foreign memory.
|}


===New enum semantics===
<div class="mw-code"><span style="color:hsl(212,60%,40%)">enum</span> <span style="color:rgba(24,24,24,0.667)">:</span> <span style="color:hsl(155,45%,55%)">u32</span><br/><span style="color:rgba(24,24,24,0.667)">{</span><br/>&#9;<span style="color:hsl(200,20%,45%)">PRIMA</span><span style="color:rgba(24,24,24,0.667)">,</span><br/>&#9;<span style="color:hsl(200,20%,45%)">SECUNDA</span><span style="color:rgba(24,24,24,0.667)">,</span><br/>&#9;<span style="color:hsl(200,20%,45%)">MAX_MYENUM</span><br/><span style="color:rgba(24,24,24,0.667)">};</span><br/><span style="color:rgba(24,24,24,0.333)">/* all of the above enum identifiers are u32s */</span></div>
C* provides a new kind of <code>enum</code> called legal enums, distinguished by their composite keyword syntax of <code>enum law</code>. Observe:


<pre>
Concrete enumerations have a set of anonymous members that fill in every possible number representable by their underlying type not already named. They are semantically interchangeable with each other so long as they have the same underlying type, unless they are also legal enumerations.
enum law types
{
  FIRST,
  SECOND,
  THIRD
};


enum law err
====Legal enumerations====
{
C* introduces a variant of the typical C enumeration called the '''legal enumeration''', distinguished by the composite opening keyword <code>enum law</code> as opposed to just <code>enum</code>. These work the same as normal enumerations with one major difference: the values of the enumeration's members cannot be set. This means that legal enumerations always start at zero, increment by one, and never hold non-integers. This has two benefits: first, it aids in compile-time law enforcement, and second, it enables the <code>sizeof(&nbsp;)</code> expression to be taken from the enumeration, yielding a headcount of how many members it has. This obviates the need for manually specifying the common pattern of <code>MAX_*</code> as the final member of an enumeration denoting its size. Observe:
  /* Not legal: */
  FIRST = 0,
  /* also not legal: */
  FOOBAR = 42
  /* legal enums cannot have their members set to arbitrary values */
};


/* this gives you the common pattern of a final enum member MAX_* */
<div class="mw-code"><span style="color:hsl(212,60%,40%)">enum law</span> types<br/><span style="color:rgba(24,24,24,0.667)">{</span><br/>&#9;<span style="color:hsl(200,20%,45%)">FIRST</span><span style="color:rgba(24,24,24,0.667)">,</span><br/>&#9;<span style="color:hsl(200,20%,45%)">SECOND</span><span style="color:rgba(24,24,24,0.667)">,</span><br/>&#9;<span style="color:hsl(200,20%,45%)">THIRD</span><br/><span style="color:rgba(24,24,24,0.667)">};</span><br/><br/><span style="color:hsl(212,60%,40%)">enum law</span> err<br/><span style="color:rgba(24,24,24,0.667)">{</span><br/>&#9;<span style="color:rgba(24,24,24,0.333)">/* Not legal: */</span><br/>&#9;<span style="color:hsl(200,20%,45%)">FIRST</span> <span style="color:rgba(24,24,24,0.667)">=</span> <span style="color:hsl(212,55%,55%)">0</span><span style="color:rgba(24,24,24,0.667)">,</span><br/>&#9;<span style="color:rgba(24,24,24,0.333)">/* also not legal: */</span><br/>&#9;<span style="color:hsl(200,20%,45%)">FOOBAR</span> <span style="color:rgba(24,24,24,0.667)">=</span> <span style="color:hsl(212,55%,55%)">42</span><br/>&#9;<span style="color:rgba(24,24,24,0.333)">/* legal enums cannot have their members set to arbitrary values */</span><br/><span style="color:rgba(24,24,24,0.667)">};</span><br/><br/><span style="color:rgba(24,24,24,0.333)">/* this gives you the common pattern of a final enum member MAX_* */<br/>/* sizeof(enum law types) == 3 */</span></div>
/* sizeof(enum law types) == 3 */</pre>


===New struct semantics===
Legal enumerations can also be combined with concrete enumerations to further aid the transient variable lifetime analyser and achieve more comprehensive law enforcement.
C* provides a new framing of <code>struct</code>s to more precisely declare the anatomy of a data type. Whereas C's <code>struct</code>s are a mechanism for declaring new ''complex'' data types, C*'s <code>struct</code>s are a mechanism for declaring new complex ''and primitive'' data types. This is how it is possible for C* to have only one fundamental primitive built-in to the language (the <code>bit</code>): the other primitives programmers are used to having are declared in header files through <code>struct</code>s.


<code>struct</code>s of primitives are often comprised of statically-sized arrays of <code>bit</code>s or other types. For example, a <code>u8</code> is defined like so:
===Structures===
C* has radically changed the semantics of structures to be oriented and denominated in bits, rather than members and octets with implicit padding.


<pre>
====Inline structures====
typedef struct
Inline structures are syntactic sugar that makes it practical to define new primitive types. They are constituted by a struct definition that has one ''and only one'' member, the name of which is the pronoun <code>_</code>. With this, all data typed to such a structure will not use dot notation to access the data, but will access it directly as a primitive type. Observe:
{
  bit _[8];
}
u8;
</pre>


This uses a feature of C* called ''inline <code>struct</code>s'', which allows users to declare a struct type with a single member named <code>_</code>, and access its instances directly by name without any dot notation, like a primitive. It combines this with <code>typedef</code>s to make it fully look and act like a primitive data type.
<div class="mw-code"><span style="color:hsl(212,60%,40%)">typedef struct</span><br/><span style="color:rgba(24,24,24,0.667)">{</span><br/>&#9;<span style="color:hsl(155,55%,40%)">bit</span><span style="color:rgba(24,24,24,0.667)">[</span><span style="color:hsl(212,55%,55%)">8</span><span style="color:rgba(24,24,24,0.667)">]</span> _<span style="color:rgba(24,24,24,0.667)">;<br/>}</span><br/><span style="color:hsl(155,55%,40%)">u8</span><span style="color:rgba(24,24,24,0.667)">;</span><br/><br/><span style="color:rgba(24,24,24,0.333)">/* this is how we do it */</span><br/><span style="color:hsl(155,45%,55%)">u8</span> foo <span style="color:rgba(24,24,24,0.667)">=</span> <span style="color:hsl(212,55%,55%)">255</span><span style="color:rgba(24,24,24,0.667)">;</span><br/>foo <span style="color:rgba(24,24,24,0.667)">=</span> <span style="color:hsl(212,55%,55%)">254</span><span style="color:rgba(24,24,24,0.667)">;</span></div>


====Explicit padding====
If a <code>struct</code> has more than one member, fields with the name <code>_</code> are treated as padding bits for the struct. This is useful since C* makes all <code>struct</code>s fully bit-packed, requiring programmers to make padding explicit. This accomplishes the same thing that C ABIs do while keeping the programmer in control of the specifics, instead of leaving them up to the compiler implementation.
This is one of the more radical departures C* makes not just from C but even from related languages based on C: implicit "padding" does not exist in the abstract machine model for C*. Since the compiler will never be permitted to insert padding surreptitiously on its own, it is up to the programmer to perform this explicitly. Explicit padding is constituted by structure members named by the pronoun <code>_</code> where there is more than one member.
 
====Structure synonyms====
C* provides a way to declare distinct structs to be "synonyms" of each other, meaning that they can be treated interchangeably in subroutine calls. This is only permitted in cases where, differences in ordering and explicit padding aside, they are semantically identical. Therefore, struct synonyms are a way to automate logic-free transmogrification of structured data. The syntax is as follows:
 
<div class="mw-code"><span style="color:hsl(212,60%,40%)">struct</span> <span style="color:hsl(155,55%,40%)">prima</span><br/><span style="color:rgba(24,24,24,0.667)">{</span><br/>&#9;<span style="color:hsl(155,55%,40%)">bit</span><span style="color:rgba(24,24,24,0.667)">[</span><span style="color:hsl(212,55%,55%)">8</span><span style="color:rgba(24,24,24,0.667)">]</span> octet<span style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;<span style="color:hsl(155,55%,40%)">bit</span><span style="color:rgba(24,24,24,0.667)">[</span><span style="color:hsl(212,55%,55%)">24</span><span style="color:rgba(24,24,24,0.667)">]</span> _<span style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;<span style="color:hsl(155,55%,40%)">bit</span><span style="color:rgba(24,24,24,0.667)">[</span><span style="color:hsl(212,55%,55%)">32</span><span style="color:rgba(24,24,24,0.667)">]</span> alpha<span style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;<span style="color:hsl(155,55%,40%)">bit</span><span style="color:rgba(24,24,24,0.667)">[</span><span style="color:hsl(212,55%,55%)">64</span><span style="color:rgba(24,24,24,0.667)">]</span> beta<span style="color:rgba(24,24,24,0.667)">;<br/>};</span><br/><br/><span style="color:hsl(212,60%,40%)">struct</span> <span style="color:hsl(155,55%,40%)">secunda</span>
<span style="color:rgba(24,24,24,0.667)">{</span><br/>&#9;<span style="color:hsl(155,55%,40%)">bit</span><span style="color:rgba(24,24,24,0.667)">[</span><span style="color:hsl(212,55%,55%)">64</span><span style="color:rgba(24,24,24,0.667)">]</span> beta<span style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;<span style="color:hsl(155,55%,40%)">bit</span><span style="color:rgba(24,24,24,0.667)">[</span><span style="color:hsl(212,55%,55%)">32</span><span style="color:rgba(24,24,24,0.667)">]</span> alpha<span style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;<span style="color:hsl(155,55%,40%)">bit</span><span style="color:rgba(24,24,24,0.667)">[</span><span style="color:hsl(212,55%,55%)">8</span><span style="color:rgba(24,24,24,0.667)">]</span> octet<span style="color:rgba(24,24,24,0.667)">;<br/>};</span><br/><br/><span style="color:rgba(24,24,24,0.333)">/* declare them synonyms */</span><br/><span style="color:hsl(155,55%,40%)">struct</span> <span style="color:hsl(155,45%,55%)">secunda</span> <span style="color:rgba(24,24,24,0.667)">:</span> <span style="color:hsl(155,55%,40%)">struct</span> <span style="color:hsl(155,45%,55%)">prima</span><span style="color:rgba(24,24,24,0.667)">;</span><br/><br/><span style="color:rgba(24,24,24,0.333)">/* directionality does not matter */</span><br/><span style="color:hsl(155,55%,40%)">struct</span> <span style="color:hsl(155,45%,55%)">prima</span> <span style="color:rgba(24,24,24,0.667)">:</span> <span style="color:hsl(155,55%,40%)">struct</span> <span style="color:hsl(155,45%,55%)">secunda</span><span style="color:rgba(24,24,24,0.667)">;</span></div>


====Attributes====
An implementation will have a known list of ''attributes'' that convey certain information about a new primitive. These are construed through a braced list of string literals at the end of the <code>typedef</code>'s body, before the name, like so:
An implementation will have a known list of attributes that convey certain information about a new primitive. These are construed through a braced list of string literals at the end of the typedef's body, before the name, like so:


<pre>
<div class="mw-code"><span style="color:hsl(212,60%,40%)">typedef struct</span><br/><span style="color:rgba(24,24,24,0.667)">{</span><br/>&#9;<span style="color:hsl(155,55%,40%)">bit</span> _<span style="color:rgba(24,24,24,0.667)">[</span><span style="color:hsl(212,55%,55%)">8</span><span style="color:rgba(24,24,24,0.667)">]</span> <span style="color:rgba(24,24,24,0.667)">{</span> <span style="color:hsl(3,50%,55%)">"</span><span style="color:hsl(3,55%,40%)">signed2</span><span style="color:hsl(3,50%,55%)">"</span> <span style="color:rgba(24,24,24,0.667)">};</span><br/><span style="color:rgba(24,24,24,0.667)">}</span><br/><span style="color:hsl(155,55%,40%)">s8</span><span style="color:rgba(24,24,24,0.667)">;</span></div>
typedef struct
{
  bit _[8] { "signed2" };
}
s8;
</pre>


The attribute <code>signed2</code> conveys that the number is signed using two's complement. This means the most significant bit of the type will be treated as a sign bit by the implementation using two's complement.
In this example, the attribute <code>signed2</code> conveys that the number is signed using two's complement. This means the most significant bit of the type will be treated as a sign bit by the implementation using two's complement.


Some attributes and their provisions include:


{| style="border:2px solid rgba(48,48,48,0.75);background-color:rgba(240,240,240,0.75);border-radius:5px 5px 10px 10px;border-bottom:none;padding:4px;border-spacing:2px"
|-
! style="background-color:rgba(184,184,184,0.75);border-radius:4px 0 0 4px;padding:4px;text-align:left" | Attribute
! style="background-color:rgba(184,184,184,0.75);border-radius:0 4px 4px 0;padding:4px;text-align:left" | Description
|-
| style="background-color:rgba(224,224,224,0.75);border-radius:4px 0 0 4px;padding:4px" | <tt>signed1</tt>
| style="background-color:rgba(224,224,224,0.75);border-radius:0 4px 4px 0;padding:4px" | Signed integers using one's complement
|-
| style="background-color:rgba(224,224,224,0.75);border-radius:4px 0 0 4px;padding:4px" | <tt>signed2</tt>
| style="background-color:rgba(224,224,224,0.75);border-radius:0 4px 4px 0;padding:4px" | Signed integers using two's complement
|-
| style="background-color:rgba(224,224,224,0.75);border-radius:4px 0 0 4px;padding:4px" | <tt>ieee-bin16</tt>
| style="background-color:rgba(224,224,224,0.75);border-radius:0 4px 4px 0;padding:4px" | IEEE 754 floating point binary16
|-
| style="background-color:rgba(224,224,224,0.75);border-radius:4px 0 0 4px;padding:4px" | <tt>ieee-bin32</tt>
| style="background-color:rgba(224,224,224,0.75);border-radius:0 4px 4px 0;padding:4px" | IEEE 754 floating point binary32
|-
| style="background-color:rgba(224,224,224,0.75);border-radius:4px 0 0 4px;padding:4px" | <tt>ieee-bin64</tt>
| style="background-color:rgba(224,224,224,0.75);border-radius:0 4px 4px 0;padding:4px" | IEEE 754 floating point binary64
|-
| style="background-color:rgba(224,224,224,0.75);border-radius:4px 0 0 4px;padding:4px" | <tt>ieee-bin128</tt>
| style="background-color:rgba(224,224,224,0.75);border-radius:0 4px 4px 0;padding:4px" | IEEE 754 floating point binary128
|-
| style="background-color:rgba(224,224,224,0.75);border-radius:4px 0 0 4px;padding:4px" | <tt>ieee-bin256</tt>
| style="background-color:rgba(224,224,224,0.75);border-radius:0 4px 4px 0;padding:4px" | IEEE 754 floating point binary256
|-
| style="background-color:rgba(224,224,224,0.75);border-radius:4px 0 0 4px;padding:4px" | <tt>bigint</tt>
| style="background-color:rgba(224,224,224,0.75);border-radius:0 4px 4px 0;padding:4px" | Unlimited precision integer
|}


====Flex structs====
One of the major limitations of C is its inability to parameterise the size of members of <code>struct</code> declarations as part of the variable's type signature. C has always been able to do this with constant expressions in its array syntax, and since C99 it can do this dynamically with VLAs. It just cannot do this with <code>struct</code>s, but C* can. We call these '''flex structs'''.
One of the major limitations of C is its inability to parameterise the size of structure members as part of the overlying type signature. C has always been able to do this with constant expressions in its array syntax, and since C99 it can do this dynamically with VLAs. It just cannot do this with structures, but C* can. We call these '''flex structs'''.
 
Below is a really good example of how flex structs prove useful. It is an abbreviated implementation of a singly linked list node, first using the C-compatible pointer approach, and then using the C*-specific inline approach.
 
<pre>struct uni_1ll_node;
 
struct uni_1ll_node
{
  u8 * data;
  struct uni_1ll_node * next;
};
 
void foo( void *, struct uni_1ll_node );
 
struct uni_1ll_inode[_];
/* this is also valid and equivalent */
struct uni_1ll_inode[];


struct uni_1ll_inode[x]
Below is an example of how flex structs prove useful. It is an abbreviated implementation of a singly linked list node, first using the C-compatible pointer approach, and then using the C*-specific inline approach.
{
  u8 data[x];
  struct uni_1ll_inode[x] * next;
};


void fooi( void *, struct uni_1ll_inode[64] );
<div class="mw-code"><span style="color:hsl(212,60%,40%)">struct</span> <span style="color:hsl(155,55%,40%)">uni_1ll_node</span><span style="color:rgba(24,24,24,0.667)">;</span><br/><br/><span style="color:hsl(212,60%,40%)">struct</span> <span style="color:hsl(155,55%,40%)">uni_1ll_node</span><br/><span style="color:rgba(24,24,24,0.667)">{</span><br/>&#9;<span style="color:hsl(155,45%,55%)">u8</span> <span style="color:rgba(24,24,24,0.667)">*</span> data<span style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;<span style="color:hsl(155,55%,40%)">struct</span> <span style="color:hsl(155,45%,55%)">uni_1ll_node</span> <span style="color:rgba(24,24,24,0.667)">*</span> next<span style="color:rgba(24,24,24,0.667)">;<br/>};</span><br/><br/><span style="color:hsl(155,55%,40%)">void</span> <span style="color:hsl(86,35%,55%)">foo</span><span style="color:rgba(24,24,24,0.667)">(</span> <span style="color:hsl(155,55%,40%)">void</span> <span style="color:rgba(24,24,24,0.667)">*,</span> <span style="color:hsl(155,55%,40%)">struct</span> <span style="color:hsl(155,45%,55%)">uni_1ll_node</span> <span style="color:rgba(24,24,24,0.667)">);</span><br/><br/><span style="color:hsl(212,60%,40%)">struct</span> <span style="color:hsl(155,55%,40%)">uni_1ll_inode</span><span style="color:rgba(24,24,24,0.667)">[</span>_<span style="color:rgba(24,24,24,0.667)">];</span><br/><br/><span style="color:hsl(212,60%,40%)">struct</span> <span style="color:hsl(155,55%,40%)">uni_1ll_inode</span><span style="color:rgba(24,24,24,0.667)">[</span>x<span style="color:rgba(24,24,24,0.667)">]<br/>{</span><br/>&#9;<span style="color:hsl(155,45%,55%)">u8</span> data<span style="color:rgba(24,24,24,0.667)">[</span>x<span style="color:rgba(24,24,24,0.667)">];</span><br/>&#9;<span style="color:hsl(155,55%,40%)">struct</span> <span style="color:hsl(155,45%,55%)">uni_1ll_inode</span><span style="color:rgba(24,24,24,0.667)">[</span>x<span style="color:rgba(24,24,24,0.667)">] *</span> next<span style="color:rgba(24,24,24,0.667)">;<br/>};</span><br/><br/><span style="color:hsl(155,55%,40%)">void</span> <span style="color:hsl(86,35%,55%)">fooi</span><span style="color:rgba(24,24,24,0.667)">(</span> <span style="color:hsl(155,55%,40%)">void</span> <span style="color:rgba(24,24,24,0.667)">*,</span> <span style="color:hsl(155,55%,40%)">struct</span> <span style="color:hsl(155,45%,55%)">uni_1ll_inode</span><span style="color:rgba(24,24,24,0.667)">[</span><span style="color:hsl(212,55%,55%)">64</span><span style="color:rgba(24,24,24,0.667)">]</span> <span style="color:rgba(24,24,24,0.667)">);</span></div>
</pre>


These semantics make it possible to avoid the indirection of pointers without resorting to abuse of the preprocessor or template/macro metaprogramming. The number, being a compile-time constant, is simply worked into the final definition of the type as if it were present in the struct body itself, dictating its ultimate size and the resulting ABI requirements.


====Struct synonyms====
====Structure punning====
C* provides a way to declare distinct <code>struct</code>s to be "synonyms" of each other, meaning that they can be treated interchangeably in subroutine calls. This is only permitted in cases where, differences in explicit padding aside, they are semantically identical. Therefore, <code>struct</code> synonyms are a way to automate logic-free [[#Transmogrification|transmogrification]] of structured data. The syntax is as follows:
C* provides a feature called structure punning, which allows members of a structure definition to be "faked out" for constant data instead of holding a variable value as they usually do. The programmer can also decide whether to hold space for such data in memory, allowing it to be cast to a variable later on. Observe:
 
<pre>struct prima
{
  bit[8] octet;
  bit[24] _;
  bit[32] alpha;
  bit[64] beta;
};
 
struct secunda
{
  bit[64] beta;
  bit[32] alpha;
  bit[8] octet;
};
 
/* declare them synonyms */
struct secunda : struct prima;
 
/* directionality does not matter */
struct prima : struct secunda;</pre>
 
===New union semantics===
C* also provides several new semantic features for unions, most of which help achieve common high-performance optimisation patterns that C programmers would ordinarily be forced to rely on messy CPP macros to achieve.
 
====Enumerated unions====
C* provides a way to enumerate unions, providing some scaffolding to leverage unions in the conventional manner without imposing the semantic restrictions of "tagged unions" typical to other languages.
 
<pre>/* using the enum law example code from above ... */
 
struct a
{
  enum law types t;
  union b : t
  {
      u16 x;
      u32 y;
      u64 z;
  };
};
 
/* access syntax: */
a.t = FIRST;
/* this is accessing x within */
a.b = 65535;
/* you can bypass the enumeration of the union and modify directly */
a.b.y = 0xFFFFFFFF;
/* a.b.x will equal 0xFFFF */</pre>


Notably, this functionality can be used to create a kind of synthetic "optional" data type that respects C*'s paradigm:
<div class="mw-code"><span style="color:hsl(212,60%,40%)">struct</span> <span style="color:hsl(155,55%,40%)">foo</span><br/><span style="color:rgba(24,24,24,0.667)">{</span><br/>&#9;<span style="color:hsl(155,45%,55%)">u16</span> a<span style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;<span style="color:hsl(155,45%,55%)">u16</span> b<span style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;<span style="color:rgba(24,24,24,0.333)">/* punned without storage */</span><br/>&#9;<span style="color:hsl(155,45%,55%)">u32</span> sig <span style="color:rgba(24,24,24,0.667)">=</span> <span style="color:hsl(212,55%,55%)">0xDEADBEEF</span><span style="color:rgba(24,24,24,0.667)">;<br/>};</span><br/><br/><span style="color:hsl(212,60%,40%)">struct</span> <span style="color:hsl(155,55%,40%)">bar</span><br/><span style="color:rgba(24,24,24,0.667)">{</span><br/>&#9;<span style="color:hsl(155,45%,55%)">u16</span> a<span style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;<span style="color:hsl(155,45%,55%)">u16</span> b<span style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;<span style="color:rgba(24,24,24,0.333)">/* punned with storage */</span><br/>&#9;<span style="color:hsl(155,45%,55%)">u32</span> sig <span style="color:rgba(24,24,24,0.667)">:=</span> <span style="color:hsl(212,55%,55%)">0xDEADBEEF</span><span style="color:rgba(24,24,24,0.667)">;<br/>};</span><br/><br/><span style="color:hsl(212,60%,40%)">struct</span> <span style="color:hsl(155,55%,40%)">baz</span><br/><span style="color:rgba(24,24,24,0.667)">{</span><br/>&#9;<span style="color:hsl(155,45%,55%)">u16</span> a<span style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;<span style="color:hsl(155,45%,55%)">u16</span> b<span style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;<span style="color:rgba(24,24,24,0.333)">/* not punned */</span><br/>&#9;<span style="color:hsl(155,45%,55%)">u32</span> sig<span style="color:rgba(24,24,24,0.667)">;<br/>};</span><br/><br/><span style="color:hsl(212,60%,40%)">extern</span> 
<span style="color:hsl(155,55%,40%)">struct</span> <span style="color:hsl(155,45%,55%)">foo</span> x1<span style="color:rgba(24,24,24,0.667)">;</span><br/><span style="color:rgba(24,24,24,0.333)">/* this would cause UB due to potential lack of storage allocated to y */</span><br/><span style="color:hsl(155,55%,40%)">struct</span> <span style="color:hsl(155,45%,55%)">bar</span> y1 <span style="color:rgba(24,24,24,0.667)">= (</span><span style="color:hsl(155,55%,40%)">struct</span> <span style="color:hsl(155,45%,55%)">bar</span><span style="color:rgba(24,24,24,0.667)">)</span>x1<span style="color:rgba(24,24,24,0.667)">;</span><br/><span style="color:rgba(24,24,24,0.333)">/* same problem */</span><br/><span style="color:hsl(155,55%,40%)">struct</span> <span style="color:hsl(155,45%,55%)">baz</span> z1 <span style="color:rgba(24,24,24,0.667)">= (</span><span style="color:hsl(155,55%,40%)">struct</span> <span style="color:hsl(155,45%,55%)">baz</span><span style="color:rgba(24,24,24,0.667)">)</span>x1<span style="color:rgba(24,24,24,0.667)">;</span><br/><br/><span style="color:hsl(212,60%,40%)">extern</span> <span style="color:hsl(155,55%,40%)">struct</span> <span style="color:hsl(155,45%,55%)">bar</span> x2<span style="color:rgba(24,24,24,0.667)">;</span><br/><span style="color:rgba(24,24,24,0.333)">/* this is not UB as it merely leaves inaccessible the sig data<br/> * however it can be confounding as .sig is no longer being accessed<br/> * from memory as it was */</span><br/><span style="color:hsl(155,55%,40%)">struct</span> <span style="color:hsl(155,45%,55%)">foo</span> y2 <span style="color:rgba(24,24,24,0.667)">= (</span><span style="color:hsl(155,55%,40%)">struct</span> <span style="color:hsl(155,45%,55%)">foo</span><span style="color:rgba(24,24,24,0.667)">)</span>x2<span style="color:rgba(24,24,24,0.667)">;</span><br/><span style="color:rgba(24,24,24,0.333)">/* this is USEFUL as it makes the backed pun not punny */</span><br/><span 
style="color:hsl(155,55%,40%)">struct</span> <span style="color:hsl(155,45%,55%)">baz</span> z2 <span style="color:rgba(24,24,24,0.667)">= (</span><span style="color:hsl(155,55%,40%)">struct</span> <span style="color:hsl(155,45%,55%)">baz</span><span style="color:rgba(24,24,24,0.667)">)</span>x2<span style="color:rgba(24,24,24,0.667)">;</span><br/><br/><span style="color:hsl(212,60%,40%)">extern</span> <span style="color:hsl(155,55%,40%)">struct</span> <span style="color:hsl(155,45%,55%)">baz</span> x3<span style="color:rgba(24,24,24,0.667)">;</span><br/><span style="color:rgba(24,24,24,0.333)">/* this is not UB as it merely leaves inaccessible the sig data<br/> * however it can be confounding as .sig is no longer being accessed<br/> * from memory as it was */</span><br/><span style="color:hsl(155,55%,40%)">struct</span> <span style="color:hsl(155,45%,55%)">foo</span> y3 <span style="color:rgba(24,24,24,0.667)">= (</span><span style="color:hsl(155,55%,40%)">struct</span> <span style="color:hsl(155,45%,55%)">foo</span><span style="color:rgba(24,24,24,0.667)">)</span>x3<span style="color:rgba(24,24,24,0.667)">;</span><br/><span style="color:rgba(24,24,24,0.333)">/* this WILL cause .sig to be overwritten with 0xDEADBEEF, thereby<br/> * destroying whatever variable data was stored there */</span><br/><span style="color:hsl(155,55%,40%)">struct</span> <span style="color:hsl(155,45%,55%)">bar</span> z3 <span style="color:rgba(24,24,24,0.667)">= (</span><span style="color:hsl(155,55%,40%)">struct</span> <span style="color:hsl(155,45%,55%)">bar</span><span style="color:rgba(24,24,24,0.667)">)</span>x3<span style="color:rgba(24,24,24,0.667)">;</span></div>


<pre>enum law bool
===Unions===
{
C* also provides several new semantics for unions, most of which help achieve common high-performance optimisation patterns for which C programmers would ordinarily be forced to rely on messy {{expl|CPP|C preprocessor}} macros or assembly code.
  FALSE,
  TRUE
};
 
struct optional_foo
{
  enum law bool is;
  union val : is
  {
      void;
      struct foo data;
  };
};</pre>


====Union alignment====
It is helpful to be able to align members of a <code>union</code> relative to one another on a bit level, similar to paragraph alignment to the left or right. This obviates the need for cumbersome <code>struct</code> boilerplate that creates artificial alignment with other union members without being defined in direct reference to them; declaring the alignment directly is more semantically straightforward. C* uses the <code>&gt;_&gt;</code> symbol following the member type name to signify right alignment (i.e. towards the least significant bit) with respect to the largest member, and likewise <code>&lt;_&lt;</code> to signify left alignment (i.e. towards the most significant bit).
It is helpful to be able to align members of a union relative to one another on a bit level, similar to paragraph alignment to the left or right. This obviates the need for cumbersome struct boilerplate that creates artificial alignment with other union members without being defined in direct reference to them; declaring the alignment directly is more semantically straightforward. C* uses the <code>&gt;_&gt;</code> symbol following the member type name to signify rightward alignment (i.e. towards the least significant bit) with respect to the largest member, and likewise <code>&lt;_&lt;</code> to signify leftward alignment (i.e. towards the most significant bit). The default alignment (i.e. indeterminate alignment) can be denoted explicitly using the <code>&gt;_&lt;</code> symbol if desired.


<pre>typedef bit[16] myu16;
<div class="mw-code"><span style="color:hsl(212,60%,40%)">typedef</span> <span style="color:hsl(155,55%,40%)">bit</span><span style="color:rgba(24,24,24,0.667)">[</span><span style="color:hsl(212,55%,55%)">16</span><span style="color:rgba(24,24,24,0.667)">]</span> <span style="color:hsl(155,45%,55%)">myu16</span><span style="color:rgba(24,24,24,0.667)">;</span><br/><span style="color:hsl(212,60%,40%)">typedef</span> <span style="color:hsl(155,55%,40%)">bit</span><span style="color:rgba(24,24,24,0.667)">[</span><span style="color:hsl(212,55%,55%)">9</span><span style="color:rgba(24,24,24,0.667)">]</span> <span style="color:hsl(155,45%,55%)">myu9</span><span style="color:rgba(24,24,24,0.667)">;</span><br/><br/><span style="color:hsl(212,60%,40%)">union</span><br/><span style="color:rgba(24,24,24,0.667)">{</span><br/>&#9;<span style="color:rgba(24,24,24,0.333)">/* largest member, no alignment needed, but given anyway */</span><br/>&#9;<span style="color:hsl(155,45%,55%)">myu16</span> <span style="color:rgba(24,24,24,0.667)">&gt;_&lt;</span> prima<span style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;<span style="color:rgba(24,24,24,0.333)">/* right align as carets point right */</span><br/>&#9;<span style="color:hsl(155,45%,55%)">myu9</span> <span style="color:rgba(24,24,24,0.667)">&gt;_&gt;</span> secunda<span style="color:rgba(24,24,24,0.667)">;<br/>};</span></div>
typedef bit[9] myu9;


union
Although alignment is not explicitly required by the language like padding is, the default alignment is indeterminate and the compiler reserves the right to align members however it pleases unless instructed otherwise with the above symbols.
{
  myu16 prima; /* largest member, no alignment needed */
  myu9 &gt;_&gt; secunda; /* right align as carets point right */
};</pre>
 
Although alignment is not explicitly required by the language like padding is, the default alignment is '''indeterminate''' and the compiler reserves the right to align members however it pleases unless instructed otherwise with the above symbols.
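For comparison, the effect of right-aligning a 9-bit member against a 16-bit one is usually written by hand in plain C with masks. The following is a minimal sketch of that emulation, not C* itself. The names <code>SECUNDA_BITS</code>, <code>secunda_get</code> and <code>secunda_set</code> are hypothetical, and bit 0 is assumed to be the least significant bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C emulation of the union above: prima is the full 16-bit
 * field, and secunda is a 9-bit view aligned to its least significant end. */
#define SECUNDA_BITS 9u
#define SECUNDA_MASK ((uint16_t)((1u << SECUNDA_BITS) - 1u)) /* 0x01FF */

static_assert(SECUNDA_BITS < 16u, "secunda must be narrower than prima");

/* Read secunda: the low 9 bits of prima. */
static uint16_t secunda_get(uint16_t prima)
{
	return (uint16_t)(prima & SECUNDA_MASK);
}

/* Write secunda: replace the low 9 bits of prima, keep the upper 7. */
static uint16_t secunda_set(uint16_t prima, uint16_t value)
{
	return (uint16_t)((prima & (uint16_t)~SECUNDA_MASK) | (value & SECUNDA_MASK));
}
```

With explicit alignment in the language, this mask arithmetic would presumably be derived by the compiler from the declaration instead of being maintained by hand.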


====Union punning====
It is very useful to be able to pun the values of other union members in order to overload the bitfield in cases where one or more bits of a field may be zero and therefore usable for other purposes. A common example of this is flag storage in pointers, where a pointer may offer 1 or more bits on the least significant end that are always zero (guaranteed by either the hardware or by the allocator). Punning requires explicit union member alignment. Here is an example of a pointer type where the alignment is assumed to be at a minimum of 4, giving us two bits to use as flags:


<pre>typedef bit[32] ptri;
<div class="mw-code"><span style="color:hsl(212,60%,40%)">typedef</span> <span style="color:hsl(155,55%,40%)">bit</span><span style="color:rgba(24,24,24,0.667)">[</span><span style="color:hsl(212,55%,55%)">32</span><span style="color:rgba(24,24,24,0.667)">]</span> <span style="color:hsl(155,45%,55%)">ptri</span><span style="color:rgba(24,24,24,0.667)">;</span><br/><br/><span style="color:hsl(212,60%,40%)">union</span><br/><span style="color:rgba(24,24,24,0.667)">{</span><br/>&#9;<span style="color:hsl(155,45%,55%)">ptri</span> ptr <span style="color:rgba(24,24,24,0.667)">{</span> _<span style="color:rgba(24,24,24,0.667)">[</span><span style="color:hsl(212,55%,55%)">0</span><span style="color:rgba(24,24,24,0.667)">:</span><span style="color:hsl(212,55%,55%)">1</span><span style="color:rgba(24,24,24,0.667)">] =</span> <span style="color:hsl(212,55%,55%)">0b00</span><span style="color:rgba(24,24,24,0.667)">,</span> flags <span style="color:rgba(24,24,24,0.667)">=</span> <span style="color:hsl(212,55%,55%)">0b00</span> <span style="color:rgba(24,24,24,0.667)">};</span><br/>&#9;<span style="color:hsl(155,55%,40%)">bit</span><span style="color:rgba(24,24,24,0.667)">[</span><span style="color:hsl(212,55%,55%)">2</span><span style="color:rgba(24,24,24,0.667)">] &gt;_&gt;</span> flags<span style="color:rgba(24,24,24,0.667)">;<br/>};</span></div>
 
union
{
  ptri ptr { _[0:1] = 0b00, flags = 0b00 };
  bit[2] &gt;_&gt; flags;
};</pre>


This example is redundant but fully explains the feature at work here:
Line 349: Line 250:
#* it is explicitly aligned to the right, so that its bits correspond to the least significant bits of <code>ptr</code>
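The pointer-flag pattern described above is typically written in plain C as follows. This is a sketch under stated assumptions, not C* code: the names <code>tag_pack</code>, <code>tag_ptr</code> and <code>tag_flags</code> are hypothetical, pointers are assumed to be aligned to at least 4 bytes, and the round trip through <code>uintptr_t</code> is assumed to preserve the pointer, which holds on mainstream platforms but is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Two low bits are free when every pointer is at least 4-byte aligned. */
#define FLAG_MASK ((uintptr_t)0x3u)

/* Combine a pointer and a 2-bit flag value into one machine word. */
static uintptr_t tag_pack(void *ptr, unsigned flags)
{
	/* The punning is only valid if the low bits really are zero. */
	assert(((uintptr_t)ptr & FLAG_MASK) == 0);
	return (uintptr_t)ptr | ((uintptr_t)flags & FLAG_MASK);
}

/* Recover the pointer by clearing the flag bits. */
static void *tag_ptr(uintptr_t word)
{
	return (void *)(word & ~FLAG_MASK);
}

/* Recover the flags from the least significant end. */
static unsigned tag_flags(uintptr_t word)
{
	return (unsigned)(word & FLAG_MASK);
}
```

The run-time <code>assert</code> stands in for the guarantee that the C* declaration expresses statically through the <code>_[0:1] = 0b00</code> constraint.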


===A comprehensive fluid ABI===
====Enumerated unions====
C* strives to provide the maximum possible power to exactly specify the function of a system. If a programmer needs a certain number, they need only to declare as much specificity as they need, and the implementation has the responsibility of discerning the best way to map that specificity onto the targeted machine. When writing C, general-purpose "catch-all" primitives like <code>int</code> and <code>long</code> often prove to be brittle, as they are, in fact, boiled down into concrete types with more specificity than the programmer may have wanted. C* takes the ''spirit'' of these built-in C primitives, as in their original intended meaning from their names, and formalises that fluidity so it can be properly taken advantage of by the programmer. If a programmer only needs so many bits of information, they can cap it. If they need truly unbounded numbers, they can use an attribute to ask the implementation for a number that can be extended as large as needed.
C* provides a way to enumerate unions, providing some scaffolding to leverage unions in the conventional manner without imposing the semantic restrictions of "tagged unions" typical to other languages.
 
<div class="mw-code"><span style="color:rgba(24,24,24,0.333)">/* using the enum law example code from above ... */</span><br/><br/><span style="color:hsl(212,60%,40%)">struct</span> <span style="color:hsl(155,55%,40%)">a</span><br/><span style="color:rgba(24,24,24,0.667)">{</span><br/>&#9;<span style="color:hsl(155,55%,40%)">enum law</span> <span style="color:hsl(155,45%,55%)">types</span> t<span style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;<span style="color:hsl(155,55%,40%)">union</span> <span style="color:hsl(155,45%,55%)">b</span> <span style="color:rgba(24,24,24,0.667)">:</span> t<br/>&#9;<span style="color:rgba(24,24,24,0.667)">{</span><br/>&#9;&#9;<span style="color:hsl(155,45%,55%)">u16</span> <span style="color:rgba(24,24,24,0.667)">&gt;_&gt;</span> x<span style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;&#9;<span style="color:hsl(155,45%,55%)">u32</span> <span style="color:rgba(24,24,24,0.667)">&gt;_&gt;</span> y<span style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;&#9;<span style="color:hsl(155,45%,55%)">u64</span> <span style="color:rgba(24,24,24,0.667)">&gt;_&lt;</span> z<span style="color:rgba(24,24,24,0.667)">;<br/>&#9;};<br/>};</span><br/><span style="color:rgba(24,24,24,0.333)">/* access syntax: */</span><br/>a<span style="color:rgba(24,24,24,0.667)">.</span>t <span style="color:rgba(24,24,24,0.667)">=</span> <span style="color:hsl(200,20%,45%)">FIRST</span><span style="color:rgba(24,24,24,0.667)">;</span><br/><span style="color:rgba(24,24,24,0.333)">/* this is accessing x within */</span><br/>a<span style="color:rgba(24,24,24,0.667)">.</span>b <span style="color:rgba(24,24,24,0.667)">=</span> <span style="color:hsl(212,55%,55%)">0xFFDD</span><span style="color:rgba(24,24,24,0.667)">;</span><br/><span style="color:rgba(24,24,24,0.333)">/* you can bypass the enumeration of the union and modify directly */</span><br/>a<span style="color:rgba(24,24,24,0.667)">.</span>b<span style="color:rgba(24,24,24,0.667)">.</span>y <span 
style="color:rgba(24,24,24,0.667)">=</span> <span style="color:hsl(212,55%,55%)">0xFFFFFFFF</span><span style="color:rgba(24,24,24,0.667)">;</span><br/><span style="color:rgba(24,24,24,0.333)">/* a.b.x will then be equal to 0xFFFF, not 0xFFDD */</span></div>
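The overlap noted in the last comment can be reproduced in plain C by treating each member as a masked view of the same 64-bit storage. This is only an illustrative emulation: the enumerators <code>SECOND</code> and <code>THIRD</code>, and the helpers <code>view_mask</code>, <code>a_store</code> and <code>a_load</code>, are hypothetical names that do not come from the C* example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FIRST appears in the example above; SECOND and THIRD are assumed names. */
enum types { FIRST, SECOND, THIRD };

struct a {
	enum types t; /* selects which member a plain access to b means */
	uint64_t b;   /* storage sized for the largest member, z */
};

/* Mask covering the member selected by t, right-aligned in the storage. */
static uint64_t view_mask(enum types t)
{
	switch (t) {
	case FIRST:  return UINT64_C(0xFFFF);     /* x: u16 */
	case SECOND: return UINT64_C(0xFFFFFFFF); /* y: u32 */
	default:     return UINT64_MAX;           /* z: u64 */
	}
}

/* Store through one view, leaving the bits outside it untouched. */
static void a_store(struct a *s, enum types as, uint64_t v)
{
	uint64_t m = view_mask(as);
	assert(s != NULL);
	s->b = (s->b & ~m) | (v & m);
}

/* Load through one view. */
static uint64_t a_load(const struct a *s, enum types as)
{
	assert(s != NULL);
	return s->b & view_mask(as);
}
```

Storing <code>0xFFFFFFFF</code> through the 32-bit view and then loading through the 16-bit view yields <code>0xFFFF</code>, matching the behaviour the comment describes for <code>a.b.x</code>.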
 
Notably, this functionality can be used to create a kind of synthetic "optional" data type that respects C*'s paradigm:
 
<div class="mw-code"><span style="color:hsl(212,60%,40%)">enum law</span> <span style="color:hsl(155,55%,40%)">bool</span><br/><span style="color:rgba(24,24,24,0.667)">{</span><br/>&#9;<span style="color:hsl(200,20%,45%)">FALSE</span><span style="color:rgba(24,24,24,0.667)">,</span><br/>&#9;<span style="color:hsl(200,20%,45%)">TRUE</span><br/><span style="color:rgba(24,24,24,0.667)">};</span><br/><br/><span style="color:hsl(212,60%,40%)">struct</span> <span style="color:hsl(155,55%,40%)">optional_foo</span><br/><span style="color:rgba(24,24,24,0.667)">{</span><br/>&#9;<span style="color:hsl(155,55%,40%)">enum law</span> <span style="color:hsl(155,45%,55%)">bool</span> is<span style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;<span style="color:hsl(155,55%,40%)">union</span> <span style="color:hsl(155,45%,55%)">val</span> <span style="color:rgba(24,24,24,0.667)">:</span> is<br/>&#9;<span style="color:rgba(24,24,24,0.667)">{</span><br/>&#9;&#9;<span style="color:hsl(155,55%,40%)">bit</span> nop <span style="color:rgba(24,24,24,0.667)">{</span> _ <span style="color:rgba(24,24,24,0.667)">=</span> <span style="color:hsl(212,55%,55%)">0</span> <span style="color:rgba(24,24,24,0.667)">};</span><br/>&#9;&#9;<span style="color:hsl(155,55%,40%)">struct</span> <span style="color:hsl(155,45%,55%)">foo</span> data<span style="color:rgba(24,24,24,0.667)">;<br/>&#9;};<br/>};</span><br/><br/><span style="color:rgba(24,24,24,0.333)">/* optional_foo would then be accessed like so: */</span><br/><span style="color:hsl(212,60%,40%)">extern</span> <span style="color:hsl(155,55%,40%)">struct</span> <span style="color:hsl(155,45%,55%)">optional_foo</span> fooey<span style="color:rgba(24,24,24,0.667)">;</span><br/><br/><span style="color:hsl(212,60%,40%)">switch</span><span style="color:rgba(24,24,24,0.667)">(</span>fooey<span style="color:rgba(24,24,24,0.667)">.</span>is<span style="color:rgba(24,24,24,0.667)">)<br/>{</span><br/><span style="color:hsl(212,60%,40%)">case</span> <span 
style="color:hsl(200,20%,45%)">FALSE</span><span style="color:rgba(24,24,24,0.667)">:</span><br/>&#9;<span style="color:rgba(24,24,24,0.333)">/* fooey.val is a bit that is always zero */</span><br/>&#9;<span style="color:hsl(212,60%,40%)">break</span><span style="color:rgba(24,24,24,0.667)">;</span><br/><span style="color:hsl(212,60%,40%)">case</span> <span style="color:hsl(200,20%,45%)">TRUE</span><span style="color:rgba(24,24,24,0.667)">:</span><br/>&#9;<span style="color:rgba(24,24,24,0.333)">/* fooey.val is a struct foo */</span><br/>&#9;<span style="color:hsl(212,60%,40%)">break</span><span style="color:rgba(24,24,24,0.667)">;<br/>}</span></div>
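For readers more familiar with C, the same "optional" shape can be approximated with a discriminant and an ordinary union. The sketch below is an approximation only. <code>struct foo</code> is given a placeholder definition because its real layout is not shown here, and <code>opt_tag</code>, <code>foo_none</code>, <code>foo_some</code> and <code>foo_value_or</code> are hypothetical names. Unlike the C* version, nothing in C ties the union to its discriminant, so the discipline is enforced only by the helper routines.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder payload; the real struct foo is not shown in this section. */
struct foo { int value; };

enum opt_tag { OPT_FALSE, OPT_TRUE };

struct optional_foo {
	enum opt_tag is;       /* the discriminant */
	union {
		unsigned char nop; /* stands in for the always-zero bit */
		struct foo data;
	} val;
};

/* Construct the empty case. */
static struct optional_foo foo_none(void)
{
	struct optional_foo o;
	o.is = OPT_FALSE;
	o.val.nop = 0;
	return o;
}

/* Construct the populated case. */
static struct optional_foo foo_some(struct foo f)
{
	struct optional_foo o;
	o.is = OPT_TRUE;
	o.val.data = f;
	return o;
}

/* Read the payload if present, otherwise return the fallback. */
static int foo_value_or(const struct optional_foo *o, int fallback)
{
	assert(o != NULL);
	switch (o->is) {
	case OPT_TRUE:
		return o->val.data.value;
	case OPT_FALSE:
		break;
	}
	return fallback;
}
```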


The implementation has the responsibility of applying these definitions to data through hardware-provided mechanisms for the sake of performance. However, it is prudent for the system engineer to know the provisions and limits of the hardware they are targeting, as C* will transparently fall back to software emulation of unavailable features, in much the same way GCC has fallback implementations of IEEE 754 floating-point arithmetic. Ultimately, system engineers must fine-tune their code if they desire the maximum possible performance on a given machine.
==Dealing with data==
C* has made many useful departures from the archaic models of C in how it conceptualises data for the programmer. It sports a new and much-simplified abstract model of computer memory. It also has new semantics for string and character literals that not only add Unicode support but do so in an encoding-agnostic manner that fully leverages C*'s powerful new type system. Encoding of literals and numeric constant data in general has been supercharged by the inclusion of a pushdown automaton that performs transmogrification of data in source format into its desired binary form in the final program.


The benefit of doing this in C* is that they simply need to understand the hardware their code runs on; they do not always need to fall back to inline assembly, whether it is to work around ABI formalities, or to have their algorithms take advantage of important instructions like SSE or NEON when a compiler might not emit them. Since the C* compiler models the program–machine relationship so much better, it can be much more certain when things like these are appropriate.
===Abstract memory model===
The language considers three broad categories of memory. All storage falls into one of these categories, regardless of its mechanism of storage. In other words, this categorisation is separate from the familiar distinction between "automatic" stack-allocated memory and "manual" heap-allocated memory.


==Encoding data==
{| style="border:2px solid rgba(48,48,48,0.75);background-color:rgba(240,240,240,0.75);border-radius:5px 5px 10px 10px;border-bottom:none;padding:4px;border-spacing:2px;max-width:900px"
C* revolves quite heavily around the spirit of data-oriented design. Accordingly, much thought has gone into how the language treats data, and many tough decisions have been made along the way. This includes data encoding mechanisms, rules and limits upon literals in source, and more.
|+ style="border:2px solid rgba(48,48,48,0.25);border-radius:4px 4px 0 0;margin:0 4px 0;border-bottom:none;padding:4px;font-size:120%;font-weight:bold;background-color:rgba(224,224,224,0.25)" | Types of memory
|-
! style="background-color:rgba(184,184,184,0.75);border-radius:4px 0 0 4px;padding:4px;text-align:left" | Private memory
| style="background-color:rgba(224,224,224,0.75);border-radius:0 4px 4px 0;padding:4px" | Same as in OpenCL parlance; in CUDA terms it may be called "registers" or "local memory"; in general-purpose CPU terms it is thread-local storage. Writable, and only accessible from a single execution context.
|-
! style="background-color:rgba(184,184,184,0.75);border-radius:4px 0 0 4px;padding:4px;text-align:left" | Shared memory
| style="background-color:rgba(224,224,224,0.75);border-radius:0 4px 4px 0;padding:4px" | Same as in CUDA parlance; in OpenCL terms it is called "local memory"; in CPU terms it is typical, often heap-allocated memory. Writable and accessible from potentially multiple concurrent execution contexts; this is the only memory category that demands manual synchronisation.
|-
! style="background-color:rgba(184,184,184,0.75);border-radius:4px 0 0 4px;padding:4px;text-align:left" | Constant memory
| style="background-color:rgba(224,224,224,0.75);border-radius:0 4px 4px 0;padding:4px" | Shared memory that is read-only for all execution contexts. This memory can be shared and used without need for synchronisation across multiple execution contexts, but is only modified at the point it was declared and initialised.
|}
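As a rough point of reference, the three categories in the table can be approximated with C11 storage facilities. The mapping below is an assumption made for illustration, not something C* specifies: <code>_Thread_local</code> stands in for private memory, an atomic object for synchronised shared memory, and a <code>static const</code> table for constant memory. The names <code>popcount4</code>, <code>shared_total</code>, <code>private_count</code> and <code>count_bits</code> are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Constant memory: initialised where declared, read-only for every
 * execution context, so no synchronisation is needed to read it. */
static const uint8_t popcount4[16] = {
	0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
};
static_assert(sizeof popcount4 == 16, "one entry per 4-bit value");

/* Shared memory: writable from several contexts, so every access is
 * synchronised (here with an atomic rather than a lock). */
static atomic_uint shared_total;

/* Private memory: one instance per thread, accessible from that thread only. */
static _Thread_local unsigned private_count;

/* Count the set bits of one byte and record the result in both the
 * private and the shared tally. */
static void count_bits(uint8_t byte)
{
	unsigned n = (unsigned)popcount4[byte & 0x0Fu] + popcount4[byte >> 4];
	private_count += n;                 /* no synchronisation required */
	atomic_fetch_add(&shared_total, n); /* synchronised update */
}
```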


===Literals in source code===
===Literals===
C* imposes several strong measures to help contain complexity in systems programming. Many of these show up in the specifics of construing data literally within source code using "literals".


The language requires all source code to be ASCII-compliant in its raw form. No other encodings of source text are supported, although there is the [[#Doc comment exception|doc comment exception]].
C* does provide a binary literal notation that is identical to that of C++ and many other languages:
 
<div class="mw-code"><span style="color:hsl(155,45%,55%)">u32</span> foo <span style="color:rgba(24,24,24,0.667)">=</span> <span style="color:hsl(212,55%,55%)">0b10110011</span><span style="color:rgba(24,24,24,0.667)">;</span></div>


C* does not provide binary literal notation, nor does it deviate from C in its syntax for octal literals. C* certainly dispels any argument against this on grounds of better code, and so the more historically prevalent opinion is chosen out of pragmatism.
C* does not deviate from C in its syntax for octal literals.


There are literal notations for two kinds of "text": 7-bit "narrow" ASCII, and 21-bit "wide" Unicode. As in C, single quotes are used to construe character literals, and double quotes are used to construe string literals. C* uses the <code>@</code> symbol prefixed to opening quotes to denote the literal as being Unicode instead of ASCII. Observe:
There are literal notations for two kinds of "text": 7-bit "narrow" ASCII, and 21-bit "wide" Unicode. As in C, single quotes are used to construe character literals, and double quotes are used to construe string literals. C* uses the @ symbol prefixed to the opening marks to denote the literal as being Unicode instead of ASCII. Observe:


<pre>
<div class="mw-code"><span style="color:hsl(3,50%,55%)">'</span><span style="color:hsl(3,55%,40%)">a</span><span style="color:hsl(3,50%,55%)">'</span><span style="color:rgba(24,24,24,0.667)">;</span> <span style="color:rgba(24,24,24,0.333)">/* literal ASCII lowercase A (number 97) */</span><br/><span style="color:hsl(3,50%,55%)">@'</span><span style="color:hsl(3,55%,40%)">a</span><span style="color:hsl(3,50%,55%)">'</span><span style="color:rgba(24,24,24,0.667)">;</span> <span style="color:rgba(24,24,24,0.333)">/* literal Unicode lowercase A (U+0061) */</span><br/><span style="color:hsl(3,50%,55%)">'</span><span style="color:hsl(3,55%,40%)">\177</span><span style="color:hsl(3,50%,55%)">'</span><span style="color:rgba(24,24,24,0.667)">;</span> <span style="color:rgba(24,24,24,0.333)">/* ASCII DEL (number 127) */</span><br/><span style="color:hsl(3,50%,55%)">@'</span><span style="color:hsl(3,55%,40%)">\u2018</span><span style="color:hsl(3,50%,55%)">'</span><span style="color:rgba(24,24,24,0.667)">;</span> <span style="color:rgba(24,24,24,0.333)">/* Unicode opening single quote (U+2018) */</span><br/><span style="color:hsl(3,50%,55%)">"</span><span style="color:hsl(3,55%,40%)">Good morning, Vietnam!\n</span><span style="color:hsl(3,50%,55%)">"</span><span style="color:rgba(24,24,24,0.667)">;</span> <span style="color:rgba(24,24,24,0.333)">/* literal ASCII string */</span><br/><span style="color:hsl(3,50%,55%)">@"</span><span style="color:hsl(3,55%,40%)">Good morning, Vietnam!\n</span><span style="color:hsl(3,50%,55%)">"</span><span style="color:rgba(24,24,24,0.667)">;</span> <span style="color:rgba(24,24,24,0.333)">/* literal Unicode string */</span></div>
'a'; /* literal ASCII lowercase A (number 97) */
@'a'; /* literal Unicode lowercase A (U+0061) */
'\177'; /* ASCII DEL (number 127) */
@'\u2018'; /* Unicode opening single quote (U+2018) */
"Good morning, Vietnam!\n"; /* literal ASCII string */
@"Good morning, Vietnam!\n"; /* literal Unicode string */
</pre>


===Transmogrification===
C*'s provisions for literals are usually not going to translate into their ideal storage medium as-is. Everything defaults to being bit-packed, including ASCII as 7-bit and Unicode as 21-bit, which is hostile to most CPU architectures. In order to help programmers work through such problems without the follies of metaprogramming, C* provides syntax for a kind of ''transmogrifier'' function that is worked through to transform literals into their final form within the program.
{{notice|This feature is under heavy redevelopment. Its final form will probably be quite different from the work-in-progress you see here.}}
C*'s provisions for literals are usually not going to translate into their ideal storage medium as-is. Everything defaults to being bit-packed, including ASCII as 7-bit and Unicode as 21-bit, which is hostile to most CPU architectures. In order to help programmers work through such problems without the follies of metaprogramming, C* provides syntax for a kind of '''transmogrifier subroutine''' that is worked through to transform literals into their final form within the program.


Transmogrifier functions are the sole context for C*'s third fundamental primitive type, the <code>fifo</code>, as well as two operators, <code>&lt;-</code> and <code>-&gt;</code>. Definitionally, a function is a ''transmogrifier function'' if its return "type" is <code>fifo</code> and its sole parameter is an anonymous <code>fifo</code> "type". The <code>&lt;-</code> is the ''input operator,'' streaming <code>bit</code>s in from the input FIFO as the function progresses; the <code>-&gt;</code> is the ''output operator,'' streaming <code>bit</code>s out to the output FIFO as the function progresses.
Transmogrifier subroutines are the sole context for C*'s third fundamental primitive type, <code>fifo</code>, as well as two operators, <code>&lt;-</code> and <code>-&gt;</code>. Definitionally, a subroutine is a ''transmogrifier subroutine'' if its return type is <code>fifo</code> and its sole parameter is an anonymous <code>fifo</code> type. The <code>&lt;-</code> is the input operator, streaming bits in from the input FIFO as the routine progresses; the <code>-&gt;</code> is the output operator, streaming bits out to the output FIFO as the routine progresses.
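To make the streaming model concrete, here is a plain C sketch of the same idea: a routine that takes bit-packed 21-bit code points from an input FIFO and emits UTF-8 bytes. It is an emulation under stated assumptions rather than C* semantics. The type <code>struct bit_fifo</code> and the routines <code>fifo_take</code>, <code>utf8_put</code> and <code>ustr_to_utf8</code> are hypothetical, bits are assumed to be packed most significant bit first, and surrogate code points are not rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical input FIFO: a bit cursor over a packed buffer, MSB first. */
struct bit_fifo {
	const uint8_t *data;
	size_t nbits; /* total bits available */
	size_t pos;   /* index of the next bit to take */
};

/* Take up to n bits; *got reports how many were actually available. */
static uint32_t fifo_take(struct bit_fifo *f, unsigned n, unsigned *got)
{
	uint32_t v = 0;
	unsigned i = 0;
	assert(n <= 32u);
	while (i < n && f->pos < f->nbits) {
		unsigned bit = (f->data[f->pos >> 3] >> (7u - (f->pos & 7u))) & 1u;
		v = (v << 1) | bit;
		f->pos++;
		i++;
	}
	*got = i;
	return v;
}

/* Encode one code point as UTF-8; returns the number of bytes written. */
static size_t utf8_put(uint32_t cp, uint8_t out[4])
{
	if (cp < 0x80u) {
		out[0] = (uint8_t)cp;
		return 1;
	}
	if (cp < 0x800u) {
		out[0] = (uint8_t)(0xC0u | (cp >> 6));
		out[1] = (uint8_t)(0x80u | (cp & 0x3Fu));
		return 2;
	}
	if (cp < 0x10000u) {
		out[0] = (uint8_t)(0xE0u | (cp >> 12));
		out[1] = (uint8_t)(0x80u | ((cp >> 6) & 0x3Fu));
		out[2] = (uint8_t)(0x80u | (cp & 0x3Fu));
		return 3;
	}
	out[0] = (uint8_t)(0xF0u | ((cp >> 18) & 0x07u));
	out[1] = (uint8_t)(0x80u | ((cp >> 12) & 0x3Fu));
	out[2] = (uint8_t)(0x80u | ((cp >> 6) & 0x3Fu));
	out[3] = (uint8_t)(0x80u | (cp & 0x3Fu));
	return 4;
}

/* Drain whole 21-bit code points from in, writing UTF-8 into out.
 * Stops on empty or truncated input, or when out is full. */
static size_t ustr_to_utf8(struct bit_fifo *in, uint8_t *out, size_t cap)
{
	size_t len = 0;
	for (;;) {
		unsigned got;
		uint8_t tmp[4];
		size_t n, i;
		uint32_t cp = fifo_take(in, 21u, &got);
		if (got < 21u)
			break;
		n = utf8_put(cp, tmp);
		if (len + n > cap)
			break;
		for (i = 0; i < n; i++)
			out[len++] = tmp[i];
	}
	return len;
}
```

The count returned through <code>got</code> plays the role of a "how many bits were available" report, letting the routine distinguish an empty FIFO from a truncated one.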


With these, it is possible, for example, to write a transmogrifier that takes a Unicode string literal and outputs it as UTF-8:


<pre>  
<div class="mw-code"><span style="color:hsl(155,55%,40%)">fifo</span> <span style="color:hsl(86,35%,55%)">ustr2utf8</span><span style="color:rgba(24,24,24,0.667)">(</span> <span style="color:hsl(155,55%,40%)">fifo</span> <span style="color:rgba(24,24,24,0.667)">)<br/>{</span><br/>&#9;<span style="color:hsl(155,55%,40%)">bit</span> c<span style="color:rgba(24,24,24,0.667)">[</span><span style="color:hsl(212,55%,55%)">21</span><span style="color:rgba(24,24,24,0.667)">];</span><br/>&#9;<span style="color:hsl(155,45%,55%)">u8</span> n<span style="color:rgba(24,24,24,0.667)">;</span><br/><br/>&#9;<span style="color:rgba(24,24,24,0.333)">/* take in 21 bits from input FIFO<br/>&#9; * n reports how many bits were available */</span><br/>&#9;c<span style="color:rgba(24,24,24,0.667)">,</span> n <span style="color:rgba(24,24,24,0.667)">&lt;-</span> <span style="color:hsl(212,55%,55%)">21</span><span style="color:rgba(24,24,24,0.667)">;</span><br/><br/>&#9;<span style="color:hsl(212,60%,40%)">if</span><span style="color:rgba(24,24,24,0.667)">(</span>n <span style="color:rgba(24,24,24,0.667)">&lt;</span> <span style="color:hsl(212,55%,55%)">21</span><span style="color:rgba(24,24,24,0.667)">)<br/>&#9;{</span><br/>&#9;&#9;<span style="color:rgba(24,24,24,0.333)">/* do something different, potentially */</span><br/>&#9;<span style="color:rgba(24,24,24,0.667)">}</span><br/><br/>&#9;<span style="color:rgba(24,24,24,0.333)">/* ... implementation ... 
*/<br/><br/>&#9;/* send out 8 bits */</span><br/>&#9;c <span style="color:rgba(24,24,24,0.667)">&amp;</span> <span style="color:hsl(212,55%,55%)">0xFF</span> <span style="color:rgba(24,24,24,0.667)">-&gt;</span> <span style="color:hsl(212,55%,55%)">8</span><span style="color:rgba(24,24,24,0.667)">;<br/>}</span><br/><br/><span style="color:hsl(155,55%,40%)">fifo</span> <span style="color:hsl(86,35%,55%)">str2utf8</span><span style="color:rgba(24,24,24,0.667)">(</span> <span style="color:hsl(155,55%,40%)">fifo</span> <span style="color:rgba(24,24,24,0.667)">)<br/>{</span><br/>&#9;<span style="color:hsl(155,55%,40%)">bit</span> c<span style="color:rgba(24,24,24,0.667)">[</span><span style="color:hsl(212,55%,55%)">7</span><span style="color:rgba(24,24,24,0.667)">];</span><br/>&#9;<span style="color:hsl(155,45%,55%)">u8</span> n<span style="color:rgba(24,24,24,0.667)">;</span><br/><br/>&#9;c<span style="color:rgba(24,24,24,0.667)">,</span> n <span style="color:rgba(24,24,24,0.667)">&lt;-</span> <span style="color:hsl(212,55%,55%)">7</span><span style="color:rgba(24,24,24,0.667)">;</span><br/><br/>&#9;<span style="color:hsl(212,60%,40%)">if</span><span style="color:rgba(24,24,24,0.667)">(</span>n <span style="color:rgba(24,24,24,0.667)">==</span> <span style="color:hsl(212,55%,55%)">0</span><span style="color:rgba(24,24,24,0.667)">)<br/>&#9;{</span><br/>&#9;&#9;<span style="color:hsl(212,60%,40%)">return</span><span style="color:rgba(24,24,24,0.667)">;<br/>&#9;}</span><br/>&#9;<span style="color:hsl(212,60%,40%)">else if</span><span style="color:rgba(24,24,24,0.667)">(</span>n <span style="color:rgba(24,24,24,0.667)">&lt;</span> <span style="color:hsl(212,55%,55%)">7</span><span style="color:rgba(24,24,24,0.667)">)<br/>&#9;{</span><br/>&#9;&#9;<span style="color:rgba(24,24,24,0.333)">/* Houston... 
we have a problem */</span><br/>&#9;<span style="color:rgba(24,24,24,0.667)">}</span><br/><br/>&#9;<span style="color:rgba(24,24,24,0.333)">/* 7-bit ASCII into 8-bit stream */</span><br/>&#9;<span style="color:hsl(212,55%,55%)">0</span> <span style="color:rgba(24,24,24,0.667)">-&gt;</span> <span style="color:hsl(212,55%,55%)">1</span><span style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;c <span style="color:rgba(24,24,24,0.667)">-&gt;</span> <span style="color:hsl(212,55%,55%)">7</span><span style="color:rgba(24,24,24,0.667)">;<br/>}</span><br/><br/><span style="color:hsl(155,55%,40%)">const</span> <span style="color:hsl(155,45%,55%)">u8</span> <span style="color:rgba(24,24,24,0.667)">*</span> my_unicode <span style="color:rgba(24,24,24,0.667)">=</span> <span style="color:hsl(18,60%,55%)">ustr2utf8</span><span style="color:hsl(3,50%,55%)">@"</span><span style="color:hsl(3,55%,40%)">\u201CBlah blah\u201D</span><span style="color:hsl(3,50%,55%)">"</span><span style="color:rgba(24,24,24,0.667)">;</span><br/><span style="color:hsl(155,55%,40%)">const</span> <span style="color:hsl(155,45%,55%)">u8</span> <span style="color:rgba(24,24,24,0.667)">*</span> my_ascii <span style="color:rgba(24,24,24,0.667)">=</span> <span style="color:hsl(18,60%,55%)">str2utf8</span><span style="color:hsl(3,50%,55%)">"</span><span style="color:hsl(3,55%,40%)">Good morning, Vietnam!</span><span style="color:hsl(3,50%,55%)">"</span><span style="color:rgba(24,24,24,0.667)">;</span></div>
fifo ustr2utf8( fifo )
{
  bit c[21];
  u8 n;


  /* take in 21 bits from input FIFO
==Other features==
    * n reports how many bits were available */
===Domains===
  c, n <- 21;
What other languages often call "modules" or "namespaces" are provided by C* as '''domains'''. Domains are a simple semantic grouping tool for making coherent collections of symbols and identifiers. In contrast to C++ namespaces, they are not lexically "grouping", that is, they are merely declared to exist, and used in other declarations directly as desired. Observe:


  if(n < 21)
<div class="mw-code"><span style="color:hsl(212,60%,40%)">domain</span> sys<span style="color:rgba(24,24,24,0.667)">;</span><br/><span style="color:hsl(212,60%,40%)">domain</span> sys<span style="color:rgba(24,24,24,0.667)">.</span>io<span style="color:rgba(24,24,24,0.667)">,</span> sys<span style="color:rgba(24,24,24,0.667)">.</span>mem<span style="color:rgba(24,24,24,0.667)">,</span> sys<span style="color:rgba(24,24,24,0.667)">.</span>utf<span style="color:rgba(24,24,24,0.667)">;</span><br/><br/><span style="color:hsl(212,60%,40%)">using</span> sys<span style="color:rgba(24,24,24,0.667)">.</span>io<span style="color:rgba(24,24,24,0.667)">.</span><span style="color:hsl(86,35%,55%)">printf</span><span style="color:rgba(24,24,24,0.667)">;</span> <span style="color:rgba(24,24,24,0.333)">/* printf is now in scope unqualified */</span><br/><span style="color:hsl(212,60%,40%)">using</span> sys<span style="color:rgba(24,24,24,0.667)">.</span>io<span style="color:rgba(24,24,24,0.667)">.</span><span style="color:hsl(86,35%,55%)">printf</span> <span style="color:rgba(24,24,24,0.667)">=</span> <span style="color:hsl(86,35%,55%)">p</span><span style="color:rgba(24,24,24,0.667)">;</span> <span style="color:rgba(24,24,24,0.333)">/* p refers to sys.io.printf now */</span><br/><span style="color:hsl(212,60%,40%)">using</span> <span style="color:hsl(155,55%,40%)">struct</span> sys<span style="color:rgba(24,24,24,0.667)">.</span>io<span style="color:rgba(24,24,24,0.667)">.</span><span style="color:hsl(155,45%,55%)">file</span><span style="color:rgba(24,24,24,0.667)">;</span> <span style="color:rgba(24,24,24,0.333)">/* now struct file is in scope */</span><br/><span style="color:hsl(212,60%,40%)">using</span> <span style="color:hsl(155,55%,40%)">struct</span> sys<span style="color:rgba(24,24,24,0.667)">.</span>io<span style="color:rgba(24,24,24,0.667)">.</span><span style="color:hsl(155,45%,55%)">file</span> <span style="color:rgba(24,24,24,0.667)">=</span> <span 
style="color:hsl(155,55%,40%)">struct</span> <span style="color:hsl(155,45%,55%)">f</span><span style="color:rgba(24,24,24,0.667)">;</span> <span style="color:rgba(24,24,24,0.333)">/* struct f declared */</span><br/><span style="color:hsl(212,60%,40%)">using</span> <span style="color:hsl(155,55%,40%)">struct</span> sys<span style="color:rgba(24,24,24,0.667)">.</span>io<span style="color:rgba(24,24,24,0.667)">.</span><span style="color:hsl(155,45%,55%)">file</span> <span style="color:rgba(24,24,24,0.667)">=</span> f<span style="color:rgba(24,24,24,0.667)">;</span> <span style="color:rgba(24,24,24,0.333)">/* ERROR: cannot cross namespaces */</span><br/><span style="color:hsl(212,60%,40%)">typedef</span> <span style="color:hsl(155,55%,40%)">struct</span> <span style="color:hsl(155,45%,55%)">f f</span><span style="color:rgba(24,24,24,0.667)">;</span> <span style="color:rgba(24,24,24,0.333)">/* if you really wanted to do that, this is how */</span></div>
  {
      /* do something different, potentially */
  }


  /* ... implementation ... */
Taken in a vacuum, C*'s domains hardly justify their existence, given that ordinary symbols suffice in ANSI C. Their utility lies in how they make possible smarter contextualisation of parameters for routine calls and structure initialisation, like so:


  /* send out 8 bits */
<div class="mw-code"><span style="color:hsl(212,60%,40%)">domain</span> mylib<span style="color:rgba(24,24,24,0.667)">;</span><br/><br/><span style="color:hsl(212,60%,40%)">enum</span> mylib<span style="color:rgba(24,24,24,0.667)">.</span><span style="color:hsl(155,55%,40%)">foo</span><br/><span style="color:rgba(24,24,24,0.667)">{</span><br/>&#9;<span style="color:hsl(200,20%,45%)">PRIMA</span><span style="color:rgba(24,24,24,0.667)">,</span><br/>&#9;<span style="color:hsl(200,20%,45%)">SECUNDA</span><br/><span style="color:rgba(24,24,24,0.667)">};</span><br/><br/><span style="color:hsl(155,55%,40%)">void</span> mylib<span style="color:rgba(24,24,24,0.667)">.</span><span style="color:hsl(86,35%,55%)">bar</span><span style="color:rgba(24,24,24,0.667)">(</span> <span style="color:hsl(155,55%,40%)">enum</span> mylib<span style="color:rgba(24,24,24,0.667)">.</span><span style="color:hsl(155,45%,55%)">foo</span> e <span style="color:rgba(24,24,24,0.667)">);</span><br/><br/><span style="color:rgba(24,24,24,0.333)">/* regardless of the presence of using statements, the enum would be<br/> * contextualised in the routine call so it never needs qualifying */</span><br/>mylib<span style="color:rgba(24,24,24,0.667)">.</span><span style="color:hsl(86,30%,65%)">bar</span><span style="color:rgba(24,24,24,0.667)">(</span> <span style="color:hsl(200,20%,45%)">PRIMA</span> <span style="color:rgba(24,24,24,0.667)">);</span><br/><br/><span style="color:rgba(24,24,24,0.333)">/* or, with using */</span><br/><span style="color:hsl(212,60%,40%)">using</span> mylib<span style="color:rgba(24,24,24,0.667)">.</span><span style="color:hsl(86,35%,55%)">bar</span><span style="color:rgba(24,24,24,0.667)">;</span><br/><span style="color:hsl(86,30%,65%)">bar</span><span style="color:rgba(24,24,24,0.667)">(</span> <span style="color:hsl(200,20%,45%)">SECUNDA</span> <span style="color:rgba(24,24,24,0.667)">);</span><br/><span style="color:rgba(24,24,24,0.333)">/* never brought in enum mylib.foo 
directly */<br/><br/>/* this can be avoided by globalising the call with a leading dot */</span><br/><span style="color:rgba(24,24,24,0.667)">.</span>mylib<span style="color:rgba(24,24,24,0.667)">.</span><span style="color:hsl(86,30%,65%)">bar</span><span style="color:rgba(24,24,24,0.667)">(</span> mylib<span style="color:rgba(24,24,24,0.667)">.</span><span style="color:hsl(200,20%,45%)">PRIMA</span> <span style="color:rgba(24,24,24,0.667)">);</span><br/><br/><span style="color:rgba(24,24,24,0.333)">/* or, with using */</span><br/><span style="color:hsl(212,60%,40%)">using</span> mylib<span style="color:rgba(24,24,24,0.667)">.</span><span style="color:hsl(86,35%,55%)">bar</span><span style="color:rgba(24,24,24,0.667)">;<br/>.</span><span style="color:hsl(86,30%,65%)">bar</span><span style="color:rgba(24,24,24,0.667)">(</span> mylib<span style="color:rgba(24,24,24,0.667)">.</span><span style="color:hsl(200,20%,45%)">SECUNDA</span> <span style="color:rgba(24,24,24,0.667)">);</span></div>
  c & 0xFF -> 8;
}


fifo str2utf8( fifo )
The main danger of domains is obfuscation of interface – for this reason, C* disallows <code>using</code> statements outside of block scope, and additionally forbids any form of "wildcard" selectors in <code>using</code> statements entirely. Since the above feature of soft contextualisation applies to all identifiers in a given domain, <code>using</code> statements are not needed for general "decluttering" and are refitted solely as a tool for bringing desired subroutines into scope. In this spirit, C* mandates that <code>using</code> statements are hoisted to the top of the block scope, before all variable declarations.
{
  bit c[7];
  u8 n;


  c, n <- 7;
===Segment routines===
C* provides a way to export labels inside routine bodies as ABI symbols, giving a routine multiple points of entry. This is useful for bypassing certain kinds of housekeeping code for performance reasons when one knows that the invariants established by such boilerplate hold without it executing. Consider this:


  if( n == 0 )
<div class="mw-code"><span style="color:hsl(155,55%,40%)">void</span> <span style="color:hsl(86,35%,55%)">foo</span><span style="color:rgba(24,24,24,0.667)">(</span> <span style="color:hsl(155,45%,55%)">knot20</span> <span style="color:rgba(24,24,24,0.667)">*,</span> <span style="color:hsl(155,45%,55%)">u32</span> <span style="color:rgba(24,24,24,0.667)">);</span><br/><span style="color:hsl(155,55%,40%)">void</span> <span style="color:hsl(86,35%,55%)">foo</span>:<span style="color:hsl(86,30%,65%)">quick</span><span style="color:rgba(24,24,24,0.667)">( );</span><br/><br/><span style="color:hsl(155,55%,40%)">void</span> <span style="color:hsl(86,35%,55%)">foo</span><span style="color:rgba(24,24,24,0.667)">(</span> <span style="color:hsl(155,45%,55%)">knot20</span> <span style="color:rgba(24,24,24,0.667)">*</span> cord<span style="color:rgba(24,24,24,0.667)">,</span> <span style="color:hsl(155,45%,55%)">u32</span> knotcount <span style="color:rgba(24,24,24,0.667)">)<br/>{</span><br/>&#9;<span style="color:hsl(155,45%,55%)">u32</span> i<span style="color:rgba(24,24,24,0.667)">,</span> j<span style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;<span style="color:hsl(155,45%,55%)">u32</span> olimit <span style="color:rgba(24,24,24,0.667)">=</span> knotcount <span style="color:rgba(24,24,24,0.667)">-</span> <span style="color:hsl(212,55%,55%)">1</span><span style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;<span style="color:hsl(155,45%,55%)">u32</span> ilimit <span style="color:rgba(24,24,24,0.667)">=</span> <span style="color:hsl(212,55%,55%)">0x40000</span><span style="color:rgba(24,24,24,0.667)">;</span><br/><br/>&#9;<span style="color:hsl(212,60%,40%)">goto</span> <span style="color:hsl(86,30%,65%)">algo</span><span style="color:rgba(24,24,24,0.667)">;</span><br/><br/><span style="color:hsl(86,35%,55%)">quick</span>::<br/>&#9;olimit <span style="color:rgba(24,24,24,0.667)">=</span> <span style="color:hsl(212,55%,55%)">0</span><span 
style="color:rgba(24,24,24,0.667)">;</span><br/>&#9;ilimit <span style="color:rgba(24,24,24,0.667)">=</span> knotcount <span style="color:rgba(24,24,24,0.667)">*</span> <span style="color:hsl(212,55%,55%)">0x40000</span><span style="color:rgba(24,24,24,0.667)">;</span><br/><br/><span style="color:hsl(86,30%,65%)">algo</span>:<br/>&#9;<span style="color:hsl(212,60%,40%)">for</span><span style="color:rgba(24,24,24,0.667)">(</span>j <span style="color:rgba(24,24,24,0.667)">=</span> <span style="color:hsl(212,55%,55%)">0</span><span style="color:rgba(24,24,24,0.667)">;</span> j <span style="color:rgba(24,24,24,0.667)">&lt;=</span> olimit<span style="color:rgba(24,24,24,0.667)">; ++</span>j<span style="color:rgba(24,24,24,0.667)">)<br/>&#9;{</span><br/>&#9;&#9;<span style="color:hsl(155,45%,55%)">u32</span> <span style="color:rgba(24,24,24,0.667)">*</span> const d <span style="color:rgba(24,24,24,0.667)">= (</span><span style="color:hsl(155,45%,55%)">u32</span> <span style="color:rgba(24,24,24,0.667)">*)</span>cord<span style="color:rgba(24,24,24,0.667)">[</span>j<span style="color:rgba(24,24,24,0.667)">];</span><br/><br/>&#9;&#9;<span style="color:hsl(212,60%,40%)">for</span><span style="color:rgba(24,24,24,0.667)">(</span>i <span style="color:rgba(24,24,24,0.667)">=</span> <span style="color:hsl(212,55%,55%)">0</span><span style="color:rgba(24,24,24,0.667)">;</span> i <span style="color:rgba(24,24,24,0.667)">&lt;</span> ilimit<span style="color:rgba(24,24,24,0.667)">; ++</span>i<span style="color:rgba(24,24,24,0.667)">)<br/>&#9;&#9;{</span><br/>&#9;&#9;&#9;d<span style="color:rgba(24,24,24,0.667)">[</span>i<span style="color:rgba(24,24,24,0.667)">]</span> <span style="color:rgba(24,24,24,0.667)">^=</span> d<span style="color:rgba(24,24,24,0.667)">[</span>i<span style="color:rgba(24,24,24,0.667)">];<br/>&#9;&#9;}<br/>&#9;}<br/>}</span></div>
  {
      return;
  }
  else if( n < 7 )
  {
      /* Houston... we have a problem */
  }


  /* 7-bit ASCII into 8-bit stream */
There are several things being described here. On the high level, we are conceptually dealing with an algorithm that can work with ''modular memory'' – that is, memory that has been intelligently segmented to be digestible on processors with small memories (think 16-bit). This routine was heavily modified to algebraically move all of the differences in execution into different starting variables that cause the desired behaviour. The algorithm inside is illustrative: it merely XORs the input data with itself, clearing it to zero. The idea is to support bookkeeping that advances the algorithm's work linearly over one knot, and then adjusts the data pointer to work on the ''next knot,'' but with a catch: if we instead call into the <code>:quick(&nbsp;)</code> segroutine, it will treat the first knot in the cord as the start of a ''contiguous block of knots'', skipping all of the overhead of advancing from one knot to the next because we are told they immediately follow one another in memory.
  0 -> 1;
  c -> 7;
}


const u8 * my_unicode = ustr2utf8@"\u201CBlah blah\u201D";
Some other details that are important include:
const u8 * my_ascii = str2utf8"Good morning, Vietnam!";
* forward declaration of segroutines ''must always'' have an empty parameter list
</pre>
** segroutines ''always'' take the same number and types of parameters as their parent routine
* at the assembly level, segroutine labels in the implementation imply a hidden stack allocation to make space for all of the variables hoisted and declared at the top of the routine
** beware of compound declaration-definitions! at the start of a segroutine label, the variables are ''only'' declared, '''not''' initialised
* the hidden stack allocation also implies a hidden <code>goto</code> inserted immediately before it, targeted to the position immediately after it
** therefore, explicit <code>goto</code>s like in the example above incur no performance penalty, and give the programmer full control over expression differentiation in the rest of the routine


==Optimisation==
While this example merely shows different initial values of stack variables for the purposes of illustration, a more real-world implementation of a modular memory aware algorithm may instead insert machine-specific plumbing code, such as incrementing a segment register or switching active banks, while offering the segroutine as a bypass to this potentially costly part of execution in cases where it is known to not be needed.
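As a rough point of comparison, the two entry points can be approximated in plain C as two ordinary routines that share one loop. This is only a sketch under stated assumptions: C has no segroutines, the text does not define <code>knot20</code>, so a knot is modelled here as a pointer to <code>KNOT_WORDS</code> 32-bit words, and the names <code>clear_words</code> and <code>foo_quick</code> are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit words per knot, mirroring the 0x40000 constant in the example.
 * Assumption: a knot is a block of this many words. */
#define KNOT_WORDS 0x40000u

/* Shared loop body: XOR each word with itself, which clears it. */
static void clear_words(uint32_t *d, size_t count)
{
    size_t i;

    for (i = 0; i < count; ++i) {
        d[i] ^= d[i];
    }
}

/* General entry point: knots may live anywhere, so the routine
 * advances from one knot pointer to the next. */
void foo(uint32_t **cord, uint32_t knotcount)
{
    uint32_t j;

    assert(cord != NULL);
    for (j = 0; j < knotcount; ++j) {
        clear_words(cord[j], KNOT_WORDS);
    }
}

/* "Quick" entry point (hypothetical stand-in for foo:quick): the caller
 * promises that the knots follow one another in memory, so only the
 * first knot pointer is used and the per-knot bookkeeping is skipped. */
void foo_quick(uint32_t **cord, uint32_t knotcount)
{
    assert(cord != NULL && knotcount > 0);
    clear_words(cord[0], (size_t)knotcount * KNOT_WORDS);
}
```

Unlike a segroutine, the C version needs a separate helper routine to share the loop, whereas the C* form keeps both entries inside one routine body.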
Conceptual work is still being done on how C* provides for optimisation. Because optimisation comes so much later in the development process, it largely has to be modelled in that order as well. Much of the explicit semantics of the language carry manual optimisation for the programmer's use, but there are still other model changes that will be made to help programmers make better programs while maintaining portability and avoiding genericism and preprocessing.


===Inlining===
===Flexible anonymous typing===
C* models function inlining in reverse of C and most other languages. Rather than dictating this intent at the callee's declaration site, it is dictated at the caller's. As long as the function is within the total system, it can be inlined using this technique. Furthermore, it is desirable to have a feeble compiler that ''always'' inlines upon request, and ''never'' does so otherwise, the opposite of what most C compilers do with the <code>inline</code> keyword (treat it as a hint that may be ignored). Experienced systems programmers know this all too well, and in the real world, profile-guided manual optimisation is the name of the game anyway. So, this is a tool for that kind of task.
Since C* embodies the maxim of "data is all we have," it does not trip up the programmer when they use a variety of different phrasings of what boils down to the same underlying bit structure. In other words, it lacks the abstract type system enforcement typical of C, which might confound or prevent a programmer from working with their data in a self-evident way. C* will cause an error when two different types are assigned to one another, unless they are structure synonyms or there is an explicit cast. The compiler should also warn the programmer if they are casting a smaller variable into a larger one, as this may cause {{expl|UB|undefined behaviour}} due to lack of allocated memory. Since the type system is so self-evident, detecting this is almost always easy to do. This self-evident approach to type identity is only constrained by the facilities of law and order which act upon the type names they are applied to.


To achieve this, C* uses the back tick symbol (<code>`</code>) to prefix the function identifier at the call site, like so:
===Multiple return values===
Even though C*'s concrete type system is highly syntactically flexible, so much so that it is easy to write up an anonymous structure as a return type and handle it without issue, the language nonetheless provides multiple return values directly without such boilerplate, in much the same fashion as seen in other programming languages. Types are comma separated, and the subroutine call site can receive any variety of them by comma, dropping unneeded values using the pronoun <code>_</code>.


<pre>
===Statement suites===
void foo( void )
C* provides a high-level semantic parallelism with what are called ''statement suites'', or simply ''suites''. This is a maximisation of C's lack of ordering of expression evaluation (not to be confused with order of operator precedence): entire statements can be conjoined or "separated" using commas <code>,</code> instead of semicolons <code>;</code>, destroying the ordering of their execution in the program semantics and allowing the statements to be executed in arbitrary time (ergo, in any order ''or all at once''). Naturally, this precludes routines so grouped from having any data interdependency, so one cannot use the output or parameters of one function to feed another in the same suite. Despite this, statement suites prove to be the fundamental building block of fine-grained parallel computing in C* – they are conceptually analogous to the machinations of {{expl|VLIW|Very Long Instruction Word}} processors that dispatch several orders of logic at once.
{
  /* ... */
}


void bar( void )
<div class="mw-code"><span style="color:rgba(24,24,24,0.333)"><span style="color:hsl(155,45%,55%)">int</span> <span style="color:hsl(86,35%,55%)">f1</span><span style="color:rgba(24,24,24,0.667)">(</span> <span style="color:hsl(155,55%,40%)">void</span> <span style="color:rgba(24,24,24,0.667)">);</span><br/><span style="color:hsl(155,45%,55%)">int</span><span style="color:rgba(24,24,24,0.667)">,</span> <span style="color:hsl(155,45%,55%)">int</span> <span style="color:hsl(86,35%,55%)">f2</span><span style="color:rgba(24,24,24,0.667)">(</span> <span style="color:hsl(155,55%,40%)">void</span> <span style="color:rgba(24,24,24,0.667)">);</span><br/><span style="color:hsl(155,45%,55%)">int</span> <span style="color:hsl(86,35%,55%)">f3</span><span style="color:rgba(24,24,24,0.667)">(</span> <span style="color:hsl(155,55%,40%)">void</span> <span style="color:rgba(24,24,24,0.667)">);</span><br/><br/>/* ANSI C approach: the compiler must guess it is parallelisable */</span><br/><span style="color:hsl(86,30%,65%)">f1</span><span style="color:rgba(24,24,24,0.667)">( );</span> <span style="color:hsl(86,30%,65%)">f2</span><span style="color:rgba(24,24,24,0.667)">( );</span> <span style="color:hsl(86,30%,65%)">f3</span><span style="color:rgba(24,24,24,0.667)">( );</span><br/><br/><span style="color:rgba(24,24,24,0.333)">/* C* statement suite approach: we say these can happen in any order */</span><br/><br/><span style="color:hsl(86,30%,65%)">f1</span><span style="color:rgba(24,24,24,0.667)">( ),</span> <span style="color:hsl(86,30%,65%)">f2</span><span style="color:rgba(24,24,24,0.667)">( ),</span> <span style="color:hsl(86,30%,65%)">f3</span><span style="color:rgba(24,24,24,0.667)">( );</span></div>
{
  /* this calls a proper separate function implemented elsewhere */
  foo( );


  /* this inlines foo( ) right here, always */
To cope with ambiguity in other situations where commas are used, one can use parentheses to disambiguate, like so:
  `foo( );
}
</pre>


<!-- XXX: Needs work
<div class="mw-code"><span style="color:rgba(24,24,24,0.333)">/* capturing multiple return values */</span><br/><span style="color:hsl(86,30%,65%)">f1</span><span style="color:rgba(24,24,24,0.667)">( ), (</span>a<span style="color:rgba(24,24,24,0.667)">,</span> b<span style="color:rgba(24,24,24,0.667)">) =</span> <span style="color:hsl(86,30%,65%)">f2</span><span style="color:rgba(24,24,24,0.667)">( ),</span> <span style="color:hsl(86,30%,65%)">f3</span><span style="color:rgba(24,24,24,0.667)">( );</span><br/><br/><span style="color:rgba(24,24,24,0.333)">/* the parentheses of routine calls also disambiguates */</span><br/><span style="color:hsl(155,55%,40%)">void</span> <span style="color:hsl(86,35%,55%)">g1</span><span style="color:rgba(24,24,24,0.667)">(</span> <span style="color:hsl(155,45%,55%)">int</span><span style="color:rgba(24,24,24,0.667)">,</span> <span style="color:hsl(155,45%,55%)">int</span> <span style="color:rgba(24,24,24,0.667)">);</span><br/><span style="color:hsl(86,30%,65%)">g1</span><span style="color:rgba(24,24,24,0.667)">(</span> <span style="color:hsl(86,30%,65%)">f1</span><span style="color:rgba(24,24,24,0.667)">( ),</span> <span style="color:hsl(86,30%,65%)">f3</span><span style="color:rgba(24,24,24,0.667)">( ) );</span><br/><span style="color:rgba(24,24,24,0.333)">/* temporarily storing return values is necessary to forward multiple<br/>&nbsp;* return values into later subroutine calls */</span><br/>a<span style="color:rgba(24,24,24,0.667)">,</span> b <span style="color:rgba(24,24,24,0.667)">=</span> <span style="color:hsl(86,30%,65%)">f2</span><span style="color:rgba(24,24,24,0.667)">( );</span><br/><span style="color:hsl(86,30%,65%)">g1</span><span style="color:rgba(24,24,24,0.667)">(</span> a<span style="color:rgba(24,24,24,0.667)">,</span> b <span style="color:rgba(24,24,24,0.667)">);</span> <span style="color:rgba(24,24,24,0.333)">/* OK */</span><br/><span style="color:hsl(86,30%,65%)">g1</span><span 
style="color:rgba(24,24,24,0.667)">(</span> <span style="color:hsl(86,30%,65%)">f2</span><span style="color:rgba(24,24,24,0.667)">( ) );</span> <span style="color:rgba(24,24,24,0.333)">/* error, g1( ) expects 2 arguments, got only the first<br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;* value from f2( ) */</span></div>
==Potential gotchas for C programmers==
In C*, the <code>sizeof</code> operator does not return its value in terms of bytes, but rather bits. This decision was made deliberately in the spirit of C*'s sole fundamental denomination of sizing being the bit, and in consideration that erroneous use is simple enough that reasoning it out in a debugger or through static analysis is not too difficult.


C* is derived from ANSI C, as published by ANSI in 1989 and republished by ISO in 1990. In the interest of hygiene, C* does not permit mixing declarations and code like C99 does, nor does it support line comments.
===Explicit inlining===
C* models subroutine inlining in reverse of C and most other languages. Rather than dictating the intent to inline at the callee's site, it is dictated at the caller's. As long as the subroutine is within the total system, it can be inlined using this technique. Furthermore, it is desirable to have a feeble compiler that always inlines upon request, and never does so otherwise, the opposite of what most C compilers do with the inline keyword (treat it as a hint that may be ignored). Experienced systems programmers know this all too well, and in the real world, profile-guided manual optimisation is the name of the game anyway. So, this is a tool for that kind of task.


In the interest of explicitness, C* packs all structs by default, leaving padding bits to the purview of the programmer to allocate or not. C* compilers need not pad the ending of structs to bytes either, rather leaving it to the allocators of the stack and heap to decide according to their own constraints.
To achieve this, C* uses the back tick symbol (<code>`</code>) to prefix the subroutine identifier at the call site, like so:


-->
<div class="mw-code"><span style="color:hsl(155,55%,40%)">void</span> <span style="color:hsl(86,35%,55%)">foo</span><span style="color:rgba(24,24,24,0.667)">(</span> <span style="color:hsl(155,55%,40%)">void</span> <span style="color:rgba(24,24,24,0.667)">)<br/>{</span><br/>&#9;<span style="color:rgba(24,24,24,0.333)">/* ... */</span><br/><span style="color:rgba(24,24,24,0.667)">}</span><br/><br/><span style="color:hsl(155,55%,40%)">void</span> <span style="color:hsl(86,35%,55%)">bar</span><span style="color:rgba(24,24,24,0.667)">(</span> <span style="color:hsl(155,55%,40%)">void</span> <span style="color:rgba(24,24,24,0.667)">)<br/>{</span><br/>&#9;<span style="color:rgba(24,24,24,0.333)">/* this calls a proper separate subroutine implemented elsewhere */</span><br/>&#9;<span style="color:hsl(86,30%,65%)">foo</span><span style="color:rgba(24,24,24,0.667)">(&nbsp;);</span><br/><br/>&#9;<span style="color:rgba(24,24,24,0.333)">/* this inlines foo(&nbsp;) right here, always */</span><br/>&#9;<span style="color:rgba(24,24,24,0.667)">`</span><span style="color:hsl(86,30%,65%)">foo</span><span style="color:rgba(24,24,24,0.667)">(&nbsp;);<br/>}</span></div>
==Other features==
C* provides some other more typical features that are often found in other languages. The language introduces keywords for function signatures, among them <code>noreturn</code> and <code>pure</code>. The <code>noreturn</code> keyword can be used as a return type in place of <code>void</code>, to signify a function not only does not return a value (as <code>void</code> suggests), but does not even return execution flow at all. The <code>pure</code> keyword can prefix the type in a function signature to invoke enforcement upon it as a ''pure function'', that is, a function that does not read or write to any external state, and thus always gives the same output for a given set of inputs. Currently there are no plans to provide decorators that influence optimisation, such as those handling inlining, cold/hot spots, or leaf functions.


===Code transclusion===
===Code transclusion===
C has long struggled to cope with the problem of inline assembly code, given the diversity of architectures and dialects, as well as the lack of a viable path to standardisation. C* attempts to solve this with a feature it calls '''code transclusion''', which looks like so:
C has long struggled to cope with the problem of inline assembly code, given the diversity of architectures and dialects, as well as the lack of a viable path to standardisation. C* attempts to solve this with a feature it calls '''code transclusion'''. Observe:


<pre>
<div class="mw-code"><span style="color:hsl(155,55%,40%)">void</span> <span style="color:hsl(86,35%,55%)">foo</span><span style="color:rgba(24,24,24,0.667)">(</span> <span style="color:hsl(155,55%,40%)">void</span> <span style="color:rgba(24,24,24,0.667)">)<br/>{</span><br/>&#9;<span style="color:rgba(24,24,24,0.333)">/* ... */</span><br/><br/>&#9;<span style="color:hsl(86,30%,65%)">bar</span><span style="color:rgba(24,24,24,0.667)">!(&nbsp;);<br/>}</span></div>
void foo( void )
{
  /* ... */


  bar!( );
In this code, <code>bar</code> is a symbol resolved like other routines and data. However, it is middled with an exclamation point <code>!</code> because it is not a routine call and so carries none of the usual implications for calling conventions. The contents of <code>bar</code> are transcluded into the point in <code>foo(&nbsp;)</code> where it appears, which some programmers might call "naked" assembly in the old parlance. Since transclusions can never take parameters or offer return values, their forward declarations are neither necessary nor permitted.
}</pre>
 
In this code, <code>bar</code> is a symbol resolved like other functions and data. However, it is middled with an exclamation point <code>!</code> because it is not a function call and so carries none of the usual implications for calling conventions. The contents of <code>bar</code> are transcluded into the point in <code>foo( )</code> where it appears, which some programmers might call "naked" assembly in the old parlance. Since transclusions can never take parameters or offer return values, their forward declarations are neither necessary nor permitted.


In practice, <code>bar</code> might be written in a proper assembly language source file, and integrated in the build step along with the C* source and other sources.
In practice, <code>bar</code> might be written in a proper assembly language source file, and integrated in the build step along with the C* source and other sources.


===New operators===
===New operators===
C* introduces several new operators:
C* introduces a menagerie of new arithmetic and logical operators.


* the minimum operator <code>&lt;?</code> and its assignment variant <code>&lt;?=</code>
{| style="border:2px solid rgba(48,48,48,0.75);background-color:rgba(240,240,240,0.75);border-radius:5px 5px 10px 10px;border-bottom:none;padding:4px;border-spacing:2px;max-width:900px"
** the assignment variant has short-circuit logic: if the destination variable is smaller, it is left unchanged; otherwise, it is set to the smaller incoming value
|-
* the maximum operator <code>&gt;?</code> and its assignment variant <code>&gt;?=</code>
! style="background-color:rgba(184,184,184,0.75);padding:4px" | Name
** the assignment variant has short-circuit logic: if the destination variable is larger, it is left unchanged; otherwise, it is set to the larger incoming value
! style="background-color:rgba(184,184,184,0.75);padding:4px" | Symbol
* the count leading zeroes unary operator <code>^?</code>
! style="background-color:rgba(184,184,184,0.75);padding:4px" | Variant
* the count trailing zeroes unary operator <code>?^</code>
! style="background-color:rgba(184,184,184,0.75);padding:4px" | Notes
* the population count unary operator <code>^^</code>
|-
* the arithmetic (signed) shift right operator <code>&gt;&gt;&gt;</code> and its assignment variant <code>&gt;&gt;&gt;=</code>
| style="background-color:rgba(224,224,224,0.75);padding:4px" | three-way compare
* the rotate left operator <code>&lt;&lt;&lt;</code> and its assignment variant <code>&lt;&lt;&lt;=</code>
| style="background-color:rgba(224,224,224,0.75);padding:4px;text-align:center" | <tt>&lt;=&gt;</tt>
* the short-circuit logical <tt>AND</tt> assignment operator <code>&&=</code>
| style="background-color:rgba(224,224,224,0.75);padding:4px;text-align:center" | &nbsp;
** this kind of assignment statement only sets the left-hand side variable if its contents are nonzero (truthy)
| style="background-color:rgba(224,224,224,0.75);padding:4px" | the return value of this comparator is balanced tri-state logic represented as an ephemeral enumeration of <math>(-1, 0, 1)</math>, corresponding to open (high-Z), low and high circuit states respectively
* the short-circuit logical <tt>OR</tt> assignment operator <code>||=</code>
|-
** this kind of assignment statement only sets the left-hand side variable if its contents are zero (falsey)
| style="background-color:rgba(224,224,224,0.75);padding:4px" | minimum
| style="background-color:rgba(224,224,224,0.75);padding:4px;text-align:center" | <tt>&lt;?</tt>
| style="background-color:rgba(224,224,224,0.75);padding:4px;text-align:center" | <tt>&lt;?=</tt>
| style="background-color:rgba(224,224,224,0.75);padding:4px" | the assignment variant has short-circuit logic: if the destination variable is smaller, it is left unchanged; otherwise, it is set to the smaller incoming value
|-
| style="background-color:rgba(224,224,224,0.75);padding:4px" | maximum
| style="background-color:rgba(224,224,224,0.75);padding:4px;text-align:center" | <tt>&gt;?</tt>
| style="background-color:rgba(224,224,224,0.75);padding:4px;text-align:center" | <tt>&gt;?=</tt>
| style="background-color:rgba(224,224,224,0.75);padding:4px" | the assignment variant has short-circuit logic: if the destination variable is larger, it is left unchanged; otherwise, it is set to the larger incoming value
|-
| style="background-color:rgba(224,224,224,0.75);padding:4px" | count leading zeroes (unary)
| style="background-color:rgba(224,224,224,0.75);padding:4px;text-align:center" | <tt>^?</tt>
| style="background-color:rgba(224,224,224,0.75);padding:4px;text-align:center" | &nbsp;
| style="background-color:rgba(224,224,224,0.75);padding:4px" | &nbsp;
|-
| style="background-color:rgba(224,224,224,0.75);padding:4px" | count trailing zeroes (unary)
| style="background-color:rgba(224,224,224,0.75);padding:4px;text-align:center" | <tt>?^</tt>
| style="background-color:rgba(224,224,224,0.75);padding:4px;text-align:center" | &nbsp;
| style="background-color:rgba(224,224,224,0.75);padding:4px" | &nbsp;
|-
| style="background-color:rgba(224,224,224,0.75);padding:4px" | population count (unary)
| style="background-color:rgba(224,224,224,0.75);padding:4px;text-align:center" | <tt>^^</tt>
| style="background-color:rgba(224,224,224,0.75);padding:4px;text-align:center" | &nbsp;
| style="background-color:rgba(224,224,224,0.75);padding:4px" | &nbsp;
|-
| style="background-color:rgba(224,224,224,0.75);padding:4px" | arithmetic (signed) shift right
| style="background-color:rgba(224,224,224,0.75);padding:4px;text-align:center" | <tt>&gt;&gt;&gt;</tt>
| style="background-color:rgba(224,224,224,0.75);padding:4px;text-align:center" | <tt>&gt;&gt;&gt;=</tt>
| style="background-color:rgba(224,224,224,0.75);padding:4px" | &nbsp;
|-
| style="background-color:rgba(224,224,224,0.75);padding:4px" | rotate left
| style="background-color:rgba(224,224,224,0.75);padding:4px;text-align:center" | <tt>&lt;&lt;&lt;</tt>
| style="background-color:rgba(224,224,224,0.75);padding:4px;text-align:center" | <tt>&lt;&lt;&lt;=</tt>
| style="background-color:rgba(224,224,224,0.75);padding:4px" | &nbsp;
|-
| style="background-color:rgba(224,224,224,0.75);padding:4px" | short-circuit logical AND assignment
| style="background-color:rgba(224,224,224,0.75);padding:4px;text-align:center" | &nbsp;
| style="background-color:rgba(224,224,224,0.75);padding:4px;text-align:center" | <tt>&amp;&amp;=</tt>
| style="background-color:rgba(224,224,224,0.75);padding:4px" | this kind of assignment statement only sets the left-hand side variable if its contents are nonzero (truthy)
|-
| style="background-color:rgba(224,224,224,0.75);padding:4px" | short-circuit logical OR assignment
| style="background-color:rgba(224,224,224,0.75);padding:4px;text-align:center" | &nbsp;
| style="background-color:rgba(224,224,224,0.75);padding:4px;text-align:center" | <tt>&vert;&vert;=</tt>
| style="background-color:rgba(224,224,224,0.75);padding:4px" | this kind of assignment statement only sets the left-hand side variable if its contents are zero (falsey)
|}
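For readers who think in ANSI C terms, the operators in the table can be approximated with small portable routines. The following is a sketch rather than a definition of C*'s semantics: every routine name is hypothetical, the operands are fixed at 32 bits for brevity, and returning 32 for a zero input to the leading and trailing zero counts is an assumption, since the table does not say what happens in that case.

```c
#include <assert.h>
#include <stdint.h>

/* x <=> y : three-way compare yielding -1, 0 or 1 */
int cmp3_u32(uint32_t x, uint32_t y)
{
    return (x > y) - (x < y);
}

/* x <? y and x >? y : minimum and maximum */
uint32_t min_u32(uint32_t x, uint32_t y) { return x < y ? x : y; }
uint32_t max_u32(uint32_t x, uint32_t y) { return x > y ? x : y; }

/* dst <?= src : dst is written only when src is smaller */
void min_assign_u32(uint32_t *dst, uint32_t src)
{
    assert(dst != 0);
    if (src < *dst) {
        *dst = src;
    }
}

/* ^? x : count leading zeroes (assumed to be 32 for x == 0) */
unsigned clz_u32(uint32_t x)
{
    unsigned n = 0;

    if (x == 0) {
        return 32;
    }
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        ++n;
    }
    return n;
}

/* ?^ x : count trailing zeroes (assumed to be 32 for x == 0) */
unsigned ctz_u32(uint32_t x)
{
    unsigned n = 0;

    if (x == 0) {
        return 32;
    }
    while ((x & 1u) == 0) {
        x >>= 1;
        ++n;
    }
    return n;
}

/* ^^ x : population count, clearing the lowest set bit each pass */
unsigned popcount_u32(uint32_t x)
{
    unsigned n = 0;

    while (x != 0) {
        x &= x - 1;
        ++n;
    }
    return n;
}

/* x <<< n : rotate left, with the count reduced modulo 32 */
uint32_t rotl_u32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n == 0 ? x : (uint32_t)((x << n) | (x >> (32u - n)));
}

/* x >>> n : arithmetic shift right that avoids shifting a negative
 * value directly, which is implementation-defined in C */
int32_t asr_i32(int32_t x, unsigned n)
{
    assert(n < 32);
    return x < 0 ? ~(~x >> n) : x >> n;
}
```

Having these as operators rather than library routines lets a compiler map each one directly onto the corresponding machine instruction where one exists.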


====Division and modulus====
====Division and modulus====
C* also overloads the meaning of both the division operator <code>/</code> and the modulus operator <code>%</code> in a way that maintains semantic compatibility with C. It uses the multiple return values feature of C* borrowed from Go to make the following semantic equivalences:
C* also overloads the meaning of both the division operator <code>/</code> and the modulus operator <code>%</code> in a way that maintains semantic compatibility with C. It uses the multiple return values feature of C* borrowed from Go to make the following semantic equivalences:


<pre>
/* these all have the same effect */
a = x / y;
a, _ = x / y;
b = x % y;
b, _ = x % y;

a, b = x / y;
b, a = x % y;
</pre>

This was done because it is virtually universal that division is performed as a single operation with two output values (the quotient and the remainder). It is prudent to have the language reflect that mechanical reality.

Additionally, C* also irons out the semantics of division and modulus, so that integer division will always round towards zero, and modulus will behave consistently so that the result always carries the sign of the second operand.

===Source encoding===
The language requires all source code to be ASCII compliant in its raw form. No other encodings of source text are supported, with one exception for doc comments: inside comments that begin with <code>/**</code> and end with <code>*/</code>, non-ASCII octets are permitted and are ignored like the rest of the comment's content. This makes it possible to encode UTF-8 text in comments, for example, which is important for non-English languages.

===Identifier limits===
When C was originally standardised by ANSI in the 1980s, the standard came with some very conservative translation limits on symbols and other identifiers:
* 31 significant initial characters in an internal identifier or a macro name
* 6 significant initial characters in an external identifier
* 511 external identifiers in one translation unit
* 127 identifiers with block scope declared in one block
* 1024 macro identifiers simultaneously defined in one preprocessing translation unit

In the 1999 update ratified by ISO, the limits were increased:
* 63 significant initial characters in an internal identifier or a macro name
* 31 significant initial characters in an external identifier
* 4095 external identifiers in one translation unit
* 511 identifiers with block scope declared in one block
* 4095 macro identifiers simultaneously defined in one preprocessing translation unit

As Mike Kinghan explained on Stack Overflow<ref>[//stackoverflow.com/questions/38035628/c-why-did-ansi-only-specify-six-characters-for-the-minimum-number-of-significa/38042724#38042724 "Why did ANSI only specify six characters for the minimum number of significant characters in an external identifier?".] Stack Overflow. 2016-06-26. Archived from [//stackoverflow.com/questions/38035628/c-why-did-ansi-only-specify-six-characters-for-the-minimum-number-of-significa/38042724#38042724 the original] on 2024-11-14. Retrieved 2024-11-14.</ref>:

<blockquote>There weren't any pitchforks on the lawn of the ANSI C committee when it stipulated 6 initial significant characters for external identifiers. That meant a conforming compiler could be implemented on IBM mainframes; and it need not be one to which the PDP-11 assembler would be inadequate and need not be able to emit code that couldn't even be linked with Fortan 77. It was a wholly unsensational choice.</blockquote>

Moreover:

<blockquote>An IBM 3380E hard disc unit, 1985, had a capacity of 5.0GB and cost around $120K; $270K in today's money. It had a transfer rate of 24Mbps, about 2% of what my laptop's HD delivers. With parameters like that, every byte that the system had to store, read or write, every disc rotation, every clock cycle, weighed on the bottom line. And this had always been the case, only more so. A miser-like economy of storage, at byte granularity, was ingrained in programming practice and those short public symbol names was just one ingrained expression of it.

The problem was not, of course, that the puny, fabulously expensive mainframes and minis that dominated the culture and the counsels of the 1980s could not have supported languages, compilers, linkers and programming practices in which this miserly economy of storage (and everything else) was tossed away. Of course they could, if everybody had one, like a laptop or a mobile phone. What they couldn't do, without it, was support the huge multi-user workloads that they were bought to run. The software needed to be excruciatingly lean to do so much with so little.</blockquote>

Doing so much with so little was a practical matter in the 1980s, and while it has been outmoded by uncritically functionalist programming styles today, we understand [[mechanicalism]] as this very same practice ''as a principle.'' It does not matter that a Nexus smartphone is a hundred times faster and a hundred times cheaper today than a mainframe was in 1985; waste is still waste.

As the preeminent mechanicalist systems programming language, C* also imposes limits on symbols and other identifiers. Specifically:

* up to 4 levels of <code>domain</code> hierarchy including the final symbol
* 15 significant initial characters in an internal identifier or a macro name
* 15 significant initial characters in an externally visible identifier
** with a maximised <code>domain</code> hierarchy usage this makes the maximum "fully-qualified name" size 60 characters.
* 255 identifiers with block scope declared in one block
* 65535 external identifiers in one translation unit
* 65535 macro identifiers simultaneously defined in one preprocessing translation unit

Furthermore, C* imposes a somewhat stricter rule on the ''meaning'' of these limits: conforming implementations '''must not''' permit symbols or other identifiers that exceed the limits defined above. Interworking with foreign code is provided by the <code>extern</code> ABI feature.

====extern ABI====
While C* provides the <code>extern</code> linkage modifier as it exists in C, and implies it onto non-<code>static</code> functions as C does, it also provides a C++-like ABI specifier suffix to this keyword. Not only does this allow implementations to expose different symbol mangling regimes opaquely to the programmer, in C* it also serves as the veneer to incorporate long foreign symbols into C*'s constraints and, depending on the API design at hand, its domain module system. <code>extern</code> ABI symbols are exempted from the identifier limits imposed in the rest of the language; they can be as long as the compiling machine's memory permits. At the minimum, conforming compilers must support the <code>extern "C"</code> ABI, but may opt to support other ones such as their default C++ ABIs.

==Work to be done==
===A comprehensive ABI===
C* strives to provide the maximum possible power to exactly specify the function of a system. While there are many facilities for this "within the reservation", so to speak, much conceptual work still needs to be done about the Application Binary Interface ''in the popular sense of the term.'' Building up [[Oración]] should help finalise a comprehensive solution to this, so that it is easy for C* programmers to exactly specify the ''lingua franca'' of their programs in a machine-agnostic way.


===What C* is not===
Otherwise known as "criticisms from the dustbin", this is to be a collection of common criticisms and my answers thereof. Programming language theory has been a notorious hotbed of intellectual rot, so creating a kind of critic's FAQ will help immensely in pre-empting the handful of questions that will no doubt be asked a thousand times over before it is all said and done. There are very good reasons for why everything in C* is the way that it is.


===Glossary===
A glossary of terms can help readers familiarise themselves with the radically different approach that C* takes in dealing with computing and systems theory. It can also serve as a stimulus for further expansion on such topics by the writers.


==References==
<references />

Latest revision as of 22:31, 21 December 2024

C*
Flavour image for the C* logo.
Paradigm: imperative, procedural, structured
Designed by: Alexander Nicholi
First appeared: December, 2020
Typing discipline: static, strong, manifest, nominal, concrete
Filename extensions: .cst, .hst
Influenced by: Ada, C, Thinking Machines C*, D, Go
Influenced: C~, C♭

C* (pronounced C star) is an imperative, procedural, mechanicalist systems programming language created by Alexander Nicholi. It facilitates comprehensive compile time guarantees of fully arbitrary mutability of state. Work on it began in early 2020, the first publications appeared towards the end of that year, and development has been ongoing ever since. The name C* is meant to "point to" the aspects of C which have been overlooked or even derided by the field of programming language theorists, chiefly its embodiment of data-oriented design and self-evident semantics.

C* was created as a result of informatics research conducted by its creator that uncovered a new paradigm of programming called mechanicalism, a school of thought about computing architecture that draws on such concepts as data-oriented design in direct contrast to functionalism, the generic, extensible kind of programming taken for granted as universal before. C* leans into a property of C called communicativity by researcher Stephen Kell[1], radically reforming its abstract machine model and introducing several new features that provide programmers more expressive power without compromise to the bit-precise yet portable niche that C occupies. In a nutshell, it is a more canonical language for generalised bare-metal software, such as drivers and kernels.

Overview

Practicality of complexity before now has always been achieved through genericism. C* rejects this prescription, and instead capitalises on the explicit semantics of C. Genericism is anathema to systems programming, because it inherently obfuscates a program as a means to compartmentalise complexity. This does not address the complexity in a way that programmers can positively appreciate, rather trying to "do away" with it and let them pretend it is something abstract when it is not. The complexity itself is already more than enough for a human brain to handle – this abstract metaprogramming is surely a denial of the system in any real mind and makes for bad systems.

Instead, C* capitalises on C's communicativity to make obvious and clear the details of a system. It then provides a slew of new semantic mechanisms for constraining valid state, and a specification oriented around bits alone instead of abstract objects or octets of any length. This is called law & order and it is the key feature of C*.

General semantics

There are many common words in informatics that primary literature on C* has to be careful with. Such effort plays a large part in substantiating the design philosophy of the language as well as its general adherence to mechanicalism as a school of thought. Among other terms, this includes:

  • avoiding the term function to refer to callables, instead using routine
  • using octet to refer to the magnitude of data, reserving byte only for the mass of data (see Octet, not byte)

C* also adds many new terms that build upon the existing lexicon of our field, including:

  • deeper elucidation of the term marshalling with regard to data validation in addition to mere serialisation
  • a new term suite to refer to semantically parallelisable statements joined by the comma operator in place of statement terminators
  • segment routines, or simply segroutines, referring to labels inside routines with external visibility for jumping into

Changes from C

Like C, C* is an imperative programming language in the ALGOL tradition. C* was derived specifically from ANSI C, that is, the C language as standardised by the American National Standards Institute's working group X3J11[2]. From C, it inherits the following characteristics:

  • a full set of control flow keywords
  • all arithmetic and bitwise operators present in C
  • subroutines and procedures
  • the CPP
  • the concept of the "compilation unit"

However, C is often more illustratively described by what you might expect out of a language that it lacks, and C* is characteristically no different. Among other things, there are many high-level constructs that will never be provisioned by the C* language, including:

  • nested subroutine declarations
  • object-oriented programming facilities, including
    • classes (or any form of non-POD structure really)
    • parameter polymorphism
    • operator overloading
    • constructors/destructors
    • methods
  • garbage collection
  • lambdas
  • templates or generics
  • reflection
  • concurrency
  • module declaration system (imports)
  • test harnessing
  • line comments
  • strong typing

C* provides many great additions and changes to ANSI C instead. Changes and removals include:

  • removal of all built-in types save for void
  • removal of all support for all source encodings other than ASCII*
  • removal of trigraphs
  • change the meaning of sizeof( ) to be denominated in bits rather than octets

The changes are modest compared to the many fantastic additions C* brings to the language, including:

  • law & order
    • marshalling for run-time law enforcement
    • transient variable lifetime traversal for compile-time law enforcement
  • one fundamental primitive type, the bit
  • bit-oriented struct definitions
  • attributes for declaring complex behaviour about types for the compiler to implement
  • the underscore pronoun, which serves many purposes
  • flex structs
  • explicit padding
  • struct synonyms
  • legal enums
  • enumerated unions
  • union alignment
  • union punning
  • Unicode literals
  • binary numeric literals
  • transmogrification
  • multiple return values
  • explicit inlining
  • code transclusion
  • several new arithmetic operators

Law & order

Laws

C* provides law & order through a few new keywords and a key concept. First among these is, of course, the law keyword, which defines and optionally names constraints to be used on data types. This is semantically accomplished through a series of boolean expressions, like so:

/* anonymous law applied to type */
law : s32
{
    _ >= 0;
    _ < 1000;
};

/* named law */
law leet
{
    _ == 1337;
};

/* applying previously declared law */
law leet : u16;

These laws are enforced upon the data types they apply to at compile time through an exhaustive program analysis. The compiler works backwards to create a control flow tree representing a transient variable lifetime, and exhaustively validates the initialisation and modifications of that transient variable against the laws enacted upon it. This is made practical by formalising the boundaries of the compilation unit as a border between "native" and "foreign" code, which in the essay is called the total system. Data which is confined to this total system gains the performance benefit of fully arbitrary validity checking at compile time.
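
To make the idea of a law as "a series of boolean expressions" concrete, here is a minimal sketch in plain C of the anonymous law on s32 shown earlier, written as a run-time predicate. C cannot perform the compile-time analysis described here, so this only illustrates what is being checked; the name <code>s32_law_holds</code> and the mapping of s32 onto <code>int32_t</code> are assumptions made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int32_t s32;

/* Hypothetical run-time rendering of the anonymous law on s32:
 * every expression in the law body must hold for the value `_`. */
static bool s32_law_holds( s32 value )
{
    return value >= 0 && value < 1000;
}
```

A C* compiler would aim to prove that this predicate holds at every initialisation and modification of such a variable, rather than calling it at run time.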

Marshalling

To deal with foreign code, C* provides a mechanism called marshalling. This is a definition of marshalling expanded from its current meaning in computer science as a synonym for serialisation, to also include the act of validating data being serialised according to arbitrary schemas, or in the case of C*, arbitrary laws. All subroutines that are callable from outside the total system must provide marshalling blocks for validating their variables, like so:

typedef bit[8] mybyte;
typedef bit[32] u32;

law : mybyte
{
    _ < 255;
    _ != 0;
};

void foo( mybyte a, u32 c )
{
    /* marshalling happens one parameter at a time */
    marshal a
    {
        if(a == 0)
        {
            /* a MUST be set to a valid value through marshalling
             * but, we can check around that, smartly */
            a = 1;
            break;
        }

        /* exit the routine otherwise */
        return;
    }

    /* this is the minimum required
     * if ANY laws enacted upon u32, this will fail to compile */
    marshal c
    { }

    /* alternatively, this minimal marshalling will do law checks
     * and return upon any failures, since marshal blocks are only
     * entered when the runtime checks for the laws fail */
    marshal c
    {
        return;
    }
}

Marshal blocks can only reference the parameter they are marshalling. They may declare and modify local variables with automatic storage duration, and may only call pure routines with such parameters.
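
The control flow of the <code>marshal a</code> block in the example can be approximated in plain C as a check-then-repair helper. This is only a sketch of the described behaviour, assuming the block is entered solely when the law fails; the names <code>mybyte_law_holds</code> and <code>marshal_a</code> are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint8_t mybyte;

/* Law on mybyte from the example: less than 255 and not zero. */
static bool mybyte_law_holds( mybyte v )
{
    return v < 255 && v != 0;
}

/* Hypothetical C rendering of `marshal a { ... }`.
 * Returns true if the routine body may proceed with a lawful *a,
 * false if the routine should return early. */
static bool marshal_a( mybyte * a )
{
    if( mybyte_law_holds( *a ) )
    {
        return true; /* law already satisfied: the block is not entered */
    }

    if( *a == 0 )
    {
        *a = 1; /* repair the value, as the `break` path does */
        return true;
    }

    return false; /* any other violation: exit the routine */
}
```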

Transient variable lifetime traversal

Transient variable lifetime is a term coined to refer to the ephemeral object of interest in performing the C* compiler's most valuable task: compile-time law enforcement. It refers to the exhaustive graphing of data as it flows through various names in all possible call graphs of a program. In a nutshell, we imagine a "variable" as a kind of ephemeral object that "travels" around the program, being modified and passed on. Consider the following C* code:

typedef bit[32] myu32;
typedef bit[32] u32;

/* Must be less than 100 and cannot ever equal 17 */
law : myu32
{
    _ < 100;
    _ != 17;
};

/* Fibonacci sequence will satisfy both of those constraints, but how do we know? */
void fibonacci( void )
{
    u32 i;
    myu32 t0, t1;
    u32 tn;

    t0 = 0;
    t1 = 1;

    /* print the first two terms */
    fprintf( stdout, "Fibonacci series: %d, %d", t0, t1 );

    /* print 3rd to 12th terms */
    for(i = 2; i < 12; ++i)
    {
        tn = t0 + t1;
        fprintf( stdout, ", %d", tn );
        t0 = t1;
        t1 = tn;
    }
}

Going through the Fibonacci sequence, we know that if we limit the number of terms to 12, we will never reach 100. But how does the C* compiler break this down?

It evaluates the possible values of each variable term that it is enforcing at every point they are modified, in an exhaustive recursive fashion. This means that the algorithmic complexity of verification is proportional to the algorithmic complexity of the program being verified. The verification algorithm will first minimise the possible program space by factoring in all constant values, which in the routine above is very helpful.
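
As a rough illustration of what "evaluating every point of modification" means for the Fibonacci routine, the following plain C sketch replays the loop and checks the law after each assignment to t0 and t1. It is a run-time stand-in for an analysis that C* performs at compile time; the function names and the <code>terms</code> parameter are assumptions added for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Law on myu32 from the example: less than 100 and never 17. */
static bool myu32_law_holds( uint32_t v )
{
    return v < 100 && v != 17;
}

/* Replays the loop for the given number of terms and checks the law
 * at every assignment to t0 and t1. Returns true if none violates it. */
static bool fibonacci_is_lawful( uint32_t terms )
{
    uint32_t t0 = 0, t1 = 1, tn, i;

    if( !myu32_law_holds( t0 ) || !myu32_law_holds( t1 ) )
    {
        return false;
    }

    for( i = 2; i < terms; ++i )
    {
        tn = t0 + t1;
        t0 = t1;
        t1 = tn;

        if( !myu32_law_holds( t0 ) || !myu32_law_holds( t1 ) )
        {
            return false;
        }
    }

    return true;
}
```

With the constant bound of 12 from the routine the check passes, since the largest term produced is 89; raising the bound to 13 would produce 144 and break the law.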

In cases where the output of the routine depends on outside variables, the laws applied to the incoming parameters are assumed to hold either directly or by marshalling, but beyond that, it will assume worst values for the type's size. In the case of complex algorithms, it will often happen that it is not trivial to guarantee the validity of a given combination of laws; for example, if a foreign n was given of type u32, it may require brute force search to ensure that some other variable dependent on n never equals 17.

The default behaviour of the C* compiler in situations like these is to error out, asking the programmer to give it more certainty about the data it is dealing with. Practically speaking, this involves creating narrower types with stricter laws. For instance, if you want to be sure a 40 bit integer never overflows via multiplication, you need to make sure the types multiplied to create it have bit sizes that, summed together, do not exceed 40 bits. Like so:

typedef bit[64] outint;
typedef bit[64] term0;
typedef bit[64] term1;

law : outint
{
    _ <= 0xFFFFFFFFFF;
};

law : term0
{
    _ <= 0xFFFFFF;
};

law : term1
{
    _ <= 0xFFFF;
};

void mysubroutine( void )
{
    outint a;
    term0 b = /* ... */;
    term1 c = /* ... */;

    /* This is OK */
    a = b * c;
}

If the above code was modified to have laws that permit any valid addition or subtraction but not multiplication (ergo, the laws are only enough to allow linear mixing, not quadratic), then a = b + c would still be valid, but the compiler would error out if it found a = b * c. The precautionary principle is in play.
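
The arithmetic behind the multiplication example can be checked directly: a 24-bit maximum times a 16-bit maximum stays below the 40-bit limit. The following plain C sketch states this as a compile-time assertion and adds a small helper for trying other bounds; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OUTINT_MAX UINT64_C(0xFFFFFFFFFF) /* 40 bits */
#define TERM0_MAX  UINT64_C(0xFFFFFF)     /* 24 bits */
#define TERM1_MAX  UINT64_C(0xFFFF)       /* 16 bits */

/* 24 + 16 = 40 bits, so the largest possible product obeys outint's law. */
_Static_assert( TERM0_MAX * TERM1_MAX <= OUTINT_MAX,
    "product of the largest lawful terms must fit in outint" );

/* Hypothetical helper: does the product of two upper bounds stay
 * within outint's law? Division is used to avoid overflowing. */
static bool product_fits( uint64_t b_max, uint64_t c_max )
{
    return c_max == 0 || b_max <= OUTINT_MAX / c_max;
}
```

Widening either term by a single bit is already enough for the worst-case product to exceed 40 bits, which is the situation in which the compiler would refuse the multiplication.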

However, it will be possible to put the compiler into that brute force mode, potentially at great computational cost, in order to arrive definitively at an answer to that question. This is accomplished using a framework of satisfiability solver programs, which provide a bitcode proof that can be saved by a programmer for trivial verification of its satisfiability once the solution is found.

Introducing the transient variable lifetime to this approach means that we transcend callsite boundaries within the total system to thoroughly simulate all subroutines in a program as one big meta-routine. This means that we can get more information about possible states than is possible when marshalling without attached formal proofs. Data confined within a total system has a far smaller number of possible states. More precisely, the number of possible states it has is directly proportional to the number of changes it has. The larger the program, the longer it takes to validate, but that does not scale exponentially in its own right. It merely follows the algorithmic complexity of the program being validated.

Concrete type system

C* has no abstract type system, not even a weak one as provided by ANSI C. Instead, it has a simple yet rigorous concrete type system based on three fundamental primitive types: bit, void and fifo. They are considered fundamental because they are built into the language, and primitive as they are elementary types (as opposed to complex ones created by structure and bifurcated by dot notation). More generally, the bit is "something", while void is "nothing", and fifo is a secret third thing currently only valid in the context of transmogrification for the transient transit of data.

C* uses its radically simplified set of primitive types as a basis for a powerfully expressive complex type system that far outshines that provided in C. Enumerations, structures, and unions have all received major semantic changes at the outset, and on top of this provide a host of new expressions that are not possible in C. Many expressions are entirely new to the imperative paradigm thanks to the conceptual distinction between functionalism and mechanicalism mentioned already. In other words, techniques and concepts previously only possible in the abstract through functional programming are now accessible in a concrete way.

Enumerations

Enumerations have received comparatively modest treatment in the design of C*. They still behave as they do in C, with one major conceptual difference: enumerations do not have an implicit typing of int (or any implicit typing at all for that matter). Instead, the values of enumerations in C* hold fully arbitrary integers and floating-point numbers, using a big integer implementation for the former and a Type I Unum implementation for the latter. This is made practical by the architecture of the Oración assembler backend that the Sirius C* compiler will use. Enumerations do not have a C♯ style "namespacing" effect, so they put their symbols into the main symbol namespace like everything else does in a C compilation unit.

Concrete enumerations

Since enums in C* are ephemeral by default, it is useful then to have a non-ephemeral enum variant that does indeed carry a concrete type (ergo, a definite size). We call these concrete enumerations and they are denoted in a familiar syntax borrowed from C++:

enum : u32
{
PRIMA,
SECUNDA,
MAX_MYENUM
};
/* all of the above enum identifiers are u32s */

Concrete enumerations have a set of anonymous members that fill in every possible number representable by their underlying type not already named. They are semantically interchangeable with each other so long as they have the same underlying type, unless they are also legal enumerations.

Legal enumerations

C* introduces a variant of the typical C enumeration called the legal enumeration, distinguished by the composite opening keyword enum law as opposed to just enum. These work the same as normal enumerations with one major difference: the values of the enumeration's members cannot be set. This means that legal enumerations always start at zero, increment by one, and never hold non-integers. This has two benefits: first, it aids in compile-time law enforcement, and second, it enables the sizeof( ) expression to be taken from the enumeration, yielding a headcount of how many members it has. This obviates the need for manually specifying the common pattern of MAX_* as the final member of an enumeration denoting its size. Observe:

enum law types
{
FIRST,
SECOND,
THIRD
};

enum law err
{
/* Not legal: */
FIRST = 0,
/* also not legal: */
FOOBAR = 42
/* legal enums cannot have their members set to arbitrary values */
};

/* this gives you the common pattern of a final enum member MAX_* */
/* sizeof(enum law types) == 3 */

Legal enumerations can also be combined with concrete enumerations to further aid the transient variable lifetime analyser and achieve more comprehensive law enforcement.
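
For comparison, this is the plain C pattern that legal enumerations are meant to replace: a trailing sentinel member that doubles as a headcount. The enum and table names below are illustrative only.

```c
#include <assert.h>

/* Plain C: the member count has to be carried by a trailing sentinel. */
enum types
{
    FIRST,
    SECOND,
    THIRD,
    MAX_TYPES /* manual headcount; wrong as soon as a member is given a value */
};

/* Typical use: sizing a lookup table by the number of members. */
static const char * const type_names[MAX_TYPES] =
{
    "first",
    "second",
    "third"
};
```

The sentinel only equals the member count because no member is assigned an explicit value, which is exactly the restriction that <code>enum law</code> turns into a rule.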

Structures

C* has radically changed the semantics of structures to be oriented and denominated in bits, rather than members and octets with implicit padding.

Inline structures

Inline structures are a syntactic sugar that makes it practical to define new primitive types. They are constituted by a struct definition that has one and only one member, the name of which is the pronoun _. With this, all data typed to such a structure will not use dot notation to access the data, but will access it directly as a primitive type. Observe:

typedef struct
{
    bit[8] _;
}
u8;

/* this is how we do it */
u8 foo = 255;
foo = 254;

Explicit padding

This is one of the more radical departures C* makes not just from C but even from related languages based on C: implicit "padding" does not exist in the abstract machine model for C*. Since the compiler will never be permitted to insert padding surreptitiously on its own, it is up to the programmer to perform this explicitly. Explicit padding is constituted by structure members named by the pronoun _ where there is more than one member.
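
The contrast with C can be sketched as follows. In the first struct a C compiler is free to insert padding after the first member; in the second the padding is written out by hand, which is roughly what an unnamed <code>_</code> member expresses in C*. The struct and member names are hypothetical, and the exact layout of the first struct depends on the platform ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The compiler may insert padding between `octet` and `alpha`. */
struct implicit
{
    uint8_t  octet;
    uint32_t alpha;
};

/* Padding spelled out by hand, standing in for a C* `_` member. */
struct explicit_pad
{
    uint8_t  octet;
    uint8_t  pad_[3];
    uint32_t alpha;
};
```

On common ABIs both structs end up with the same layout, but only the second one says so in the source.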

Structure synonyms

C* provides a way to declare distinct structs to be "synonyms" of each other, meaning that they can be treated interchangeably in subroutine calls. This is only permitted in cases where, differences in ordering and explicit padding aside, they are semantically identical. Therefore, struct synonyms are a way to automate logic-free transmogrification of structured data. The syntax is as follows:

struct prima
{
bit[8] octet;
bit[24] _;
bit[32] alpha;
bit[64] beta;
};


struct secunda {
bit[64] beta;
bit[32] alpha;
bit[8] octet;
};


/* declare them synonyms */
struct secunda : struct prima;

/* directionality does not matter */
struct prima : struct secunda;
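
Without synonyms, the equivalent conversion in plain C has to be written by hand, member by member. The sketch below shows that boilerplate for the two structs above, with octet-sized C types standing in for the bit-denominated members; <code>secunda_from_prima</code> and <code>pad_</code> are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

struct prima
{
    uint8_t  octet;
    uint8_t  pad_[3]; /* stands in for the 24 bits of explicit padding */
    uint32_t alpha;
    uint64_t beta;
};

struct secunda
{
    uint64_t beta;
    uint32_t alpha;
    uint8_t  octet;
};

/* Hand-written conversion: same named members, different order. */
static struct secunda secunda_from_prima( const struct prima * p )
{
    struct secunda s;

    s.beta  = p->beta;
    s.alpha = p->alpha;
    s.octet = p->octet;

    return s;
}
```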

Attributes

An implementation will have a known list of attributes that convey certain information about a new primitive. These are construed through a braced list of string literals at the end of the typedef's body, before the name, like so:

typedef struct
{
bit _[8] { "signed2" };
}
s8;

In this example, the attribute signed2 conveys that the number is signed using two's complement. This means the most significant bit of the type will be treated as a sign bit by the implementation using two's complement.
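
What the signed2 attribute asks of an implementation can be sketched in plain C as a reinterpretation of 8 raw bits. The helper name <code>s8_from_bits</code> is hypothetical; the arithmetic is the standard two's complement reading, in which the most significant bit has a weight of -128.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret 8 raw bits as a two's complement integer: the most
 * significant bit carries a weight of -128 instead of +128. */
static int s8_from_bits( uint8_t raw )
{
    return (int)(raw & 0x7Fu) - (int)(raw & 0x80u);
}
```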

Some attributes and their provisions include:

Attribute      Description
signed1        Signed integers using one's complement
signed2        Signed integers using two's complement
ieee-bin16     IEEE 754 floating point binary16
ieee-bin32     IEEE 754 floating point binary32
ieee-bin64     IEEE 754 floating point binary64
ieee-bin128    IEEE 754 floating point binary128
ieee-bin256    IEEE 754 floating point binary256
bigint         Unlimited precision integer

Flex structs

One of the major limitations of C is its inability to parameterise the size of structure members as part of the overlying type signature. C has always been able to do this with constant expressions in its array syntax, and since C99 it can do this dynamically with VLAs. It just cannot do this with structures, but C* can. We call these flex structs.

Below is an example of how flex structs prove useful. It is an abbreviated implementation of a singly linked list node, first using the C-compatible pointer approach, and then using the C*-specific inline approach.

struct uni_1ll_node;

struct uni_1ll_node
{
    u8 * data;
    struct uni_1ll_node * next;
};

void foo( void *, struct uni_1ll_node );

struct uni_1ll_inode[_];

struct uni_1ll_inode[x]
{
    u8 data[x];
    struct uni_1ll_inode[x] * next;
};

void fooi( void *, struct uni_1ll_inode[64] );

The value of these semantics is obvious, as it makes it possible to avoid the indirection of having pointers, without requiring abuse of the preprocessor or template/macro metaprogramming. The number, being a compile-time constant, is simply worked into the final definition of the type as if it were present in the struct body itself, dictating its ultimate size and the resulting ABI requirements.
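
For contrast, this is roughly the kind of preprocessor workaround that plain C needs to get an inline payload of a chosen size: a macro that stamps out one struct type per size. The macro name and the generated type names are hypothetical and serve only to show what flex structs make unnecessary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro that stamps out one node type per payload size. */
#define DECLARE_UNI_1LL_INODE( n ) \
    struct uni_1ll_inode_##n \
    { \
        uint8_t data[n]; \
        struct uni_1ll_inode_##n * next; \
    }

DECLARE_UNI_1LL_INODE( 64 );
DECLARE_UNI_1LL_INODE( 16 );
```

Each size has to be declared ahead of use and gets an unrelated type name, whereas the flex struct keeps a single definition with the size as part of the type.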

Structure punning

C* provides a feature called structure punning, which allows members of a structure definition to be "faked out" for constant data instead of holding a variable value as they usually do. The programmer can also decide whether to hold space for such data in memory, allowing it to be cast variable later on. Observe:

struct foo
{
u16 a;
u16 b;
/* punned without storage */
u32 sig = 0xDEADBEEF;
};


struct bar
{
u16 a;
u16 b;
/* punned with storage */
u32 sig := 0xDEADBEEF;
};


struct baz
{
u16 a;
u16 b;
/* not punned */
u32 sig;
};


extern struct foo x1;
/* this would cause UB due to potential lack of storage allocated to y */
struct bar y1 = (struct bar)x1;
/* same problem */
struct baz z1 = (struct baz)x1;

extern struct bar x2;
/* this is not UB as it merely leaves inaccessible the sig data
* however it can be confounding as .sig is no longer being accessed
* from memory as it was */

struct foo y2 = (struct foo)x2;
/* this is USEFUL as it makes the backed pun not punny */
struct baz z2 = (struct baz)x2;

extern struct baz x3;
/* this is not UB as it merely leaves inaccessible the sig data
* however it can be confounding as .sig is no longer being accessed
* from memory as it was */

struct foo y3 = (struct foo)x3;
/* this WILL cause .sig to be overwritten with 0xDEADBEEF, thereby
* destroying whatever variable data was stored there */

struct bar z3 = (struct bar)x3;
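
The storage difference between the three forms can be approximated in plain C. This is a rough sketch under the assumption that a pun without storage behaves like a constant attached to the type; the names foo_c, bar_c, SIG_VALUE, foo_c_sig and bar_c_from are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The punned constant; the macro name is hypothetical. */
#define SIG_VALUE 0xDEADBEEFu

/* Mirrors struct foo: sig is a constant with no storage behind it. */
struct foo_c
{
	uint16_t a;
	uint16_t b;
};

/* Mirrors struct bar and struct baz: sig occupies 32 bits of storage. */
struct bar_c
{
	uint16_t a;
	uint16_t b;
	uint32_t sig;
};

/* Reading .sig from the storage-free form never touches memory. */
uint32_t foo_c_sig( const struct foo_c * f )
{
	assert( f != 0 );

	return SIG_VALUE;
}

/* Mirrors the cast to struct bar: whatever was stored in sig is
 * replaced by the punned constant. */
struct bar_c bar_c_from( uint16_t a, uint16_t b, uint32_t old_sig )
{
	struct bar_c out;

	(void)old_sig; /* discarded, as in the x3 to z3 case above */
	out.a = a;
	out.b = b;
	out.sig = SIG_VALUE;

	return out;
}
```

The size gap between foo_c and bar_c is what makes the casts from x1 hazardous: the wider type expects bytes that the narrower one never reserved.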

Unions

C* also provides several new semantics for unions, most of which help achieve common high-performance optimisation patterns that C programmers would ordinarily be forced to rely on messy CPP macros or assembly code to achieve.

Union alignment

It is helpful to be able to align members of a union relative to one another at the bit level, similar to aligning a paragraph to the left or right. This obviates the cumbersome struct boilerplate otherwise needed to line a member up artificially against the other union members, and is more semantically straightforward. C* uses the >_> symbol following the member type name to signify rightward alignment (i.e. towards the least significant bit) with respect to the largest member, and likewise <_< to signify leftward alignment (i.e. towards the most significant bit). The default alignment (i.e. indeterminate alignment) can be denoted explicitly using the >_< symbol if desired.

typedef bit[16] myu16;
typedef bit[9] myu9;

union
{
/* largest member, no alignment needed, but given anyway */
myu16 >_< prima;
/* right align, as the carets point right */
myu9 >_> secunda;
};

Although alignment is not explicitly required by the language like padding is, the default alignment is indeterminate and the compiler reserves the right to align members however it pleases unless instructed otherwise with the above symbols.
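
In plain C the same overlay has to be spelled out with shifts and masks. The following sketch assumes a 16-bit largest member, as in the example above; the helper names read_right and read_left are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reads a member of the given bit width that is aligned rightward
 * (towards the least significant bit) within a 16-bit member. */
uint16_t read_right( uint16_t whole, unsigned width )
{
	assert( width >= 1 && width <= 16 );

	return (uint16_t)( whole & ( 0xFFFFu >> ( 16 - width ) ) );
}

/* Reads a member of the given bit width that is aligned leftward
 * (towards the most significant bit) within a 16-bit member. */
uint16_t read_left( uint16_t whole, unsigned width )
{
	assert( width >= 1 && width <= 16 );

	return (uint16_t)( whole >> ( 16 - width ) );
}
```

With prima holding 0xABCD, a rightward-aligned secunda would read as read_right( 0xABCD, 9 ), and a leftward-aligned one as read_left( 0xABCD, 9 ).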

Union punning

It is very useful to be able to pun the values of other union members in order to overload the bitfield in cases where one or more bits of a field may be zero and therefore usable for other purposes. A common example of this is flag storage in pointers, where a pointer may offer 1 or more bits on the least significant end that are always zero (guaranteed by either the hardware or by the allocator). Punning requires explicit union member alignment. Here is an example of a pointer type where the alignment is assumed to be at a minimum of 4, giving us two bits to use as flags:

typedef bit[32] ptri;

union
{
ptri ptr { _[0:1] = 0b00, flags = 0b00 };
bit[2] >_> flags;
};

This example is redundant but fully explains the feature at work here:

  1. we have the pointer itself, ptr
    • it's 32 bits wide as defined by the typedef for the sake of example
    • being the largest member, it needs no explicit alignment
  2. following its declaration is a braced list, which contains several items
    1. the pronoun, _, which refers to itself, ptr
      • the subscripting of the pronoun defines which range of bits we are dealing with
      • the "assignment" of the two least significant bits (as denoted before) to zero, meaning:
        • these bits of ptr will always read back as zero
        • writing to these bits has no effect
    2. flags, which refers to the member of that name in the union
      • setting it to zero, which causes ptr to be treated as if flags is always zero regardless of its actual value
        • this is redundant with the pronoun field given previously
        • if given by itself, it would cause reads of the two least significant bits to always return zero
        • however, it alone would not stop writing nonzero bits to it as part of a write to the ptr field
  3. finally we have the flags member field, which is a bit[2]
    • it lacks any punning annotations in a braced list, unlike ptr
    • it is explicitly aligned to the right, so that its bits correspond to the least significant bits of ptr
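
The behaviour enumerated above can be emulated in plain C with explicit masking. This is only a sketch of the semantics as described, with hypothetical helper names; the 32-bit ptri typedef is taken from the example.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t ptri; /* 32 bits wide, as in the example above */

/* Reading ptr: the two least significant bits always read back as zero. */
ptri tagged_ptr( ptri word )
{
	return (ptri)( word & ~(ptri)3u );
}

/* Reading flags: the two least significant bits of the same storage. */
unsigned tagged_flags( ptri word )
{
	return (unsigned)( word & 3u );
}

/* Writing ptr: writes to the two low bits have no effect, so the
 * flags already stored there survive. */
ptri tagged_set_ptr( ptri word, ptri ptr )
{
	return (ptri)( ( ptr & ~(ptri)3u ) | ( word & 3u ) );
}

/* Writing flags: only the two low bits change. */
ptri tagged_set_flags( ptri word, unsigned flags )
{
	assert( flags <= 3u );

	return (ptri)( ( word & ~(ptri)3u ) | flags );
}
```

Every accessor here is boilerplate that the braced punning list is meant to let the compiler derive from the union declaration.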

Enumerated unions

C* provides a way to enumerate unions, providing some scaffolding to leverage unions in the conventional manner without imposing the semantic restrictions of "tagged unions" typical to other languages.

/* using the enum law example code from above ... */

struct a
{
enum law types t;
union b : t
{
u16 >_> x;
u32 >_> y;
u64 >_< z;
};
};

/* access syntax: */
a.t = FIRST;
/* this is accessing x within */
a.b = 0xFFDD;
/* you can bypass the enumeration of the union and modify directly */
a.b.y = 0xFFFFFFFF;
/* a.b.x will then be equal to 0xFFFF, not 0xFFDD */

Notably, this functionality can be used to create a kind of synthetic "optional" data type that respects C*'s paradigm:

enum law bool
{
FALSE,
TRUE
};

struct optional_foo
{
enum law bool is;
union val : is
{
bit nop { _ = 0 };
struct foo data;
};
};


/* optional_foo would then be accessed like so: */
extern struct optional_foo fooey;

switch(fooey.is)
{

case FALSE:
/* fooey.val is bit that is always zero */
break;
case TRUE:
/* fooey.val is a struct foo */
break;
}
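
For readers more familiar with C, the closest hand-written counterpart is a struct carrying a discriminant next to a union. The sketch below uses hypothetical names (foo_c, bool_c, optional_foo_c and the three helpers) and a two-member stand-in for struct foo; unlike the C* form, nothing ties the union to its discriminant except convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct foo_c
{
	uint16_t a;
	uint16_t b;
};

enum bool_c
{
	FALSE_C,
	TRUE_C
};

struct optional_foo_c
{
	enum bool_c is;
	union
	{
		uint8_t nop; /* stands in for the always-zero bit */
		struct foo_c data;
	} val;
};

struct optional_foo_c optional_foo_none( void )
{
	struct optional_foo_c o;

	o.is = FALSE_C;
	o.val.nop = 0;

	return o;
}

struct optional_foo_c optional_foo_some( struct foo_c f )
{
	struct optional_foo_c o;

	o.is = TRUE_C;
	o.val.data = f;

	return o;
}

/* Returns data.a when a value is present, otherwise the fallback. */
uint16_t optional_foo_a_or( const struct optional_foo_c * o, uint16_t fallback )
{
	assert( o != NULL );

	switch( o->is )
	{
	case TRUE_C:
		return o->val.data.a;
	case FALSE_C:
	default:
		return fallback;
	}
}
```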

Dealing with data

C* has made many useful departures from the archaic models of C in how it conceptualises data for the programmer. It sports a new and much-simplified abstract model of computer memory. It also has new semantics for string and character literals that not only add Unicode support but do so in an encoding-agnostic manner that fully leverages C*'s powerful new type system. Encoding of literals and numeric constant data in general has been supercharged by the inclusion of a pushdown automaton that performs transmogrification of data in source format into its desired binary form in the final program.

Abstract memory model

The language considers three broad categories of memory. All storage falls into one of these categories, regardless of its mechanism of storage. In other words, this classification is separate from the familiar distinction between "automatic" stack-allocated memory and "manual" heap-allocated memory.

Types of memory
Private memory: Same as in OpenCL parlance; in CUDA terms it may be called "registers" or "local memory"; in general-purpose CPU terms it is thread-local storage. Writable, and only accessible from a single execution context.
Shared memory: Same as in CUDA parlance; in OpenCL terms it is called "local memory"; in CPU terms it is typical, often heap-allocated memory. Writable and accessible from potentially multiple concurrent execution contexts; this is the only memory category that demands manual synchronisation.
Constant memory: Shared memory that is read-only for all execution contexts. This memory can be shared and used without need for synchronisation across multiple execution contexts, but is only modified at the point it was declared and initialised.

Literals

C* imposes several strong measures to help contain complexity in systems programming. Many of these show up in the specifics of construing data literally within source code using "literals".

C* does provide a binary literal notation that is identical to that of C++ and many other languages:

u32 foo = 0b10110011;

C* does not deviate from C in its syntax for octal literals.

There are literal notations for two kinds of "text": 7-bit "narrow" ASCII, and 21-bit "wide" Unicode. As in C, single quotes are used to construe character literals, and double quotes are used to construe string literals. C* uses the @ symbol prefixed to the opening marks to denote the literal as being Unicode instead of ASCII. Observe:

'a'; /* literal ASCII lowercase A (number 97) */
@'a'; /* literal Unicode lowercase A (U+0061) */
'\177'; /* ASCII DEL (number 127) */
@'\u2018'; /* Unicode opening single quote (U+2018) */
"Good morning, Vietnam!\n"; /* literal ASCII string */
@"Good morning, Vietnam!\n"; /* literal Unicode string */

Transmogrification

Notice
This feature is under heavy redevelopment. Its final form will probably be quite different from the work-in-progress you see here.

C*'s provisions for literals are usually not going to translate into their ideal storage medium as-is. Everything defaults to being bit-packed, including ASCII as 7-bit and Unicode as 21-bit, which is hostile to most CPU architectures. In order to help programmers work through such problems without the follies of metaprogramming, C* provides syntax for a kind of transmogrifier subroutine that is worked through to transform literals into their final form within the program.

Transmogrifier subroutines are the sole context for C*'s third fundamental primitive type, fifo, as well as two operators, <- and ->. Definitionally, a subroutine is a transmogrifier subroutine if its return type is fifo and its sole parameter is an anonymous fifo type. The <- is the input operator, streaming bits in from the input FIFO as the routine progresses; the -> is the output operator, streaming bits out to the output FIFO as the routine progresses.

With these, it is possible, for example, to write a transmogrifier that takes a Unicode string literal and outputs it as UTF-8:

fifo ustr2utf8( fifo )
{

bit c[21];
u8 n;

/* take in 21 bits from input FIFO
* n reports how many bits were available */

c, n <- 21;

if(n < 21)
{

/* do something different, potentially */
}

/* ... implementation ... */

/* send out 8 bits */

c & 0xFF -> 8;
}


fifo str2utf8( fifo )
{

bit c[7];
u8 n;

c, n <- 7;

if(n == 0)
{

return;
}

else if(n < 7)
{

/* Houston... we have a problem */
}

/* 7-bit ASCII into 8-bit stream */
0 -> 1;
c -> 7;
}


const u8 * my_unicode = ustr2utf8@"\u201CBlah blah\u201D";
const u8 * my_ascii = str2utf8"Good morning, Vietnam!";
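
The elided body of ustr2utf8 would presumably need to apply the standard UTF-8 bit layout to each 21-bit code point. As a point of reference, here is that layout as an ordinary runtime function in plain C. It is not a transmogrifier and does not use fifo; the name utf8_encode and its signature are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes one Unicode code point as UTF-8 into out[].
 * Returns the number of bytes written (1 to 4), or 0 if the value is
 * not encodable (beyond U+10FFFF, or a surrogate). */
size_t utf8_encode( uint32_t cp, uint8_t out[4] )
{
	assert( out != NULL );

	if( cp > 0x10FFFFu || ( cp >= 0xD800u && cp <= 0xDFFFu ) )
	{
		return 0;
	}

	if( cp < 0x80u )
	{
		/* 7 bits: 0xxxxxxx */
		out[0] = (uint8_t)cp;
		return 1;
	}

	if( cp < 0x800u )
	{
		/* 11 bits: 110xxxxx 10xxxxxx */
		out[0] = (uint8_t)( 0xC0u | ( cp >> 6 ) );
		out[1] = (uint8_t)( 0x80u | ( cp & 0x3Fu ) );
		return 2;
	}

	if( cp < 0x10000u )
	{
		/* 16 bits: 1110xxxx 10xxxxxx 10xxxxxx */
		out[0] = (uint8_t)( 0xE0u | ( cp >> 12 ) );
		out[1] = (uint8_t)( 0x80u | ( ( cp >> 6 ) & 0x3Fu ) );
		out[2] = (uint8_t)( 0x80u | ( cp & 0x3Fu ) );
		return 3;
	}

	/* 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
	out[0] = (uint8_t)( 0xF0u | ( cp >> 18 ) );
	out[1] = (uint8_t)( 0x80u | ( ( cp >> 12 ) & 0x3Fu ) );
	out[2] = (uint8_t)( 0x80u | ( ( cp >> 6 ) & 0x3Fu ) );
	out[3] = (uint8_t)( 0x80u | ( cp & 0x3Fu ) );
	return 4;
}
```

In the transmogrifier form, each of the byte stores above would become an 8-bit write to the output FIFO.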

Other features

Domains

What other languages often call "modules" or "namespaces" are provided by C* as domains. Domains are a simple semantic grouping tool for making coherent collections of symbols and identifiers. In contrast to C++ namespaces, they are not lexically "grouping", that is, they are merely declared to exist, and used in other declarations directly as desired. Observe:

domain sys;
domain sys.io, sys.mem, sys.utf;

using sys.io.printf; /* printf is now in scope unqualified */
using sys.io.printf = p; /* p refers to sys.io.printf now */
using struct sys.io.file; /* now struct file is in scope */
using struct sys.io.file = struct f; /* struct f declared */
using struct sys.io.file = f; /* ERROR: cannot cross namespaces */
typedef struct f f; /* if you really wanted to do that, this is how */

While in a vacuum, C*'s domains hardly justify their existence in light of the sufficiency of normal symbols as in ANSI C, the utility can be realised in how it makes possible smarter contextualisation of parameters for routine calls and structure initialisation, like so:

domain mylib;

enum mylib.foo
{
PRIMA,
SECUNDA
};

void mylib.bar( enum mylib.foo e );

/* regardless of the presence of using statements, the enum would be
* contextualised in the routine call so it never needs qualifying */

mylib.bar( PRIMA );

/* or, with using */
using mylib.bar;
bar( SECUNDA );
/* never brought in enum mylib.foo directly */

/* this can be avoided by globalising the call with a leading dot */

.mylib.bar( mylib.PRIMA );

/* or, with using */
using mylib.bar;
.bar( mylib.SECUNDA );

The main danger of domains is obfuscation of interface – for this reason, C* disallows using statements outside of block scope, and additionally forbids any form of "wildcard" selectors in using statements entirely. Since the above feature of soft contextualisation applies to all identifiers in a given domain, the application of using statements as a general "decluttering" is avoided and refitted solely as a tool for bringing desired subroutines into scope. In this spirit, C* mandates that using statements are hoisted to the top of the block scope, before all variable declarations.

Segment routines

C* provides a way to export labels inside routine bodies as ABI symbols, giving the routine multiple points of entry. This is useful for bypassing certain kinds of housekeeping code for performance reasons when one knows that the invariants held by such boilerplate hold without it executing. Consider this:

void foo( knot20 *, u32 );
void foo:quick( );

void foo( knot20 * cord, u32 knotcount )
{

u32 i, j;
u32 olimit = knotcount - 1;
u32 ilimit = 0x40000;

goto algo;

quick::
olimit = 0;
ilimit = knotcount * 0x40000;

algo:
for(j = 0; j <= olimit; ++j)
{

u32 * const d = (u32 *)cord[j];

for(i = 0; i < ilimit; ++i)
{

d[i] ^= d[i];
}
}
}

There are several things being described here. At a high level, we are conceptually dealing with an algorithm that can work with modular memory – that is, memory that has been intelligently segmented to be digestible on processors with small memories (think 16-bit). This routine was heavily modified to algebraically move all of the differences in execution into different starting variables that cause the desired behaviour. The algorithm inside is illustrative: it merely XORs the input data with itself, zeroing it. The idea is to support bookkeeping that advances the algorithm's work linearly over one knot, and then adjusts the data pointer to work on the next knot, but with a catch: if we instead call into the :quick( ) segroutine, it will treat the first knot in the cord as the start of a contiguous block of knots, skipping all of the overhead of advancing from one knot to the next because we are told they immediately follow one another in memory.

Some other details that are important include:

  • forward declaration of segroutines must always have an empty parameter list
    • segroutines always take the same number and types of parameters as their parent routine
  • at the assembly level, segroutine labels in the implementation imply a hidden stack allocation to make space for all of the variables hoisted and declared at the top of the routine
    • beware of compound declaration-definitions! at the start of a segroutine label, the variables are only declared, not initialised
  • the hidden stack allocation also implies a hidden goto inserted immediately before it, targeted to the position immediately after it
    • therefore, explicit gotos like in the example above incur no performance penalty, and give the programmer full control over expression differentiation in the rest of the routine

While this example merely shows different initial values of stack variables for the purposes of illustration, a more real-world implementation of a modular memory aware algorithm may instead insert machine-specific plumbing code, such as incrementing a segment register or switching active banks, while offering the segroutine as a bypass to this potentially costly part of execution in cases where it is known to not be needed.
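
Standard C has no way to export a label as a second entry point, so the nearest emulation is two thin wrappers around a shared body. The sketch below makes that trade explicit; all names are hypothetical, each knot is modelled as a plain pointer to its words, and KNOT_WORDS is shrunk to 4 purely so the example is easy to exercise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Words per knot. Hypothetical value; the example above uses 0x40000. */
#define KNOT_WORDS 4u

/* Shared body, parameterised by the values that each entry point
 * would otherwise set up before jumping to the algo label. */
static void zero_body( uint32_t * const * cord, uint32_t knots, uint32_t words )
{
	uint32_t i, j;

	for( j = 0; j < knots; ++j )
	{
		uint32_t * const d = cord[j];

		for( i = 0; i < words; ++i )
		{
			d[i] ^= d[i]; /* XOR with itself clears the word */
		}
	}
}

/* Normal entry: advance knot by knot. */
void zero_cord( uint32_t * const * cord, uint32_t knotcount )
{
	assert( cord != NULL );

	zero_body( cord, knotcount, KNOT_WORDS );
}

/* "Quick" entry: the caller promises the knots are contiguous, so the
 * first knot is treated as the start of one long block. */
void zero_cord_quick( uint32_t * const * cord, uint32_t knotcount )
{
	assert( cord != NULL );

	if( knotcount == 0 )
	{
		return;
	}

	zero_body( cord, 1, knotcount * KNOT_WORDS );
}
```

The wrappers cost an extra call (or rely on the compiler to inline the body), which is the overhead that a segroutine label avoids by entering the same stack frame directly.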

Flexible anonymous typing

Since C* embodies the maxim of "data is all we have," it does not trip up the programmer when they use a variety of different phrasings of what boils down to the same underlying bit structure. In other words, it lacks the abstract type system enforcement typical of C, which might confound or prevent a programmer from working with their data in a self-evident way. C* will cause an error when two different types are assigned to one another, unless they are structure synonyms or there is an explicit cast. The compiler should also warn the programmer if they are casting a smaller variable into a larger one, as this may cause UB due to lack of allocated memory. Since the type system is so self-evident, detecting this is almost always easy to do. This self-evident approach to type identity is only constrained by the facilities of law and order which act upon the type names they are applied to.

Multiple return values

Even though C*'s concrete type system is highly syntactically flexible, so much so that it is easy to write up an anonymous structure as a return type and handle it without issue, the language nonetheless provides multiple return values directly without such boilerplate, in much the same fashion as seen in other programming languages. Types are comma separated, and the subroutine call site can receive any variety of them by comma, dropping unneeded values using the pronoun _.

Statement suites

C* provides a high-level semantic parallelism with what are called statement suites, or simply suites. This is a maximisation of C's lack of ordering of expression evaluation (not to be confused with order of operator precedence): entire statements can be conjoined or "separated" using commas , instead of semicolons ;, destroying the ordering of their execution in the program semantics and allowing the statements to be executed in arbitrary time (ergo, in any order or all at once). Naturally, this precludes routines so grouped from having any data interdependency, so one cannot use the output or parameters of one function to feed another in the same suite. Despite this, statement suites prove to be the fundamental building block of fine-grained parallel computing in C* – they are conceptually analogous to the machinations of VLIW processors that dispatch several orders of logic at once.

int f1( void );
int, int f2( void );
int f3( void );

/* ANSI C approach: the compiler must guess it is parallelisable */

f1( ); f2( ); f3( );

/* C* statement suite approach: we say these can happen in any order */

f1( ), f2( ), f3( );

To cope with ambiguity in other situations where commas are used, one can use parentheses to disambiguate, like so:

/* capturing multiple return values */
f1( ), (a, b) = f2( ), f3( );

/* the parentheses of routine calls also disambiguates */
void g1( int, int );
g1( f1( ), f3( ) );
/* temporarily storing return values is necessary to forward multiple
 * return values into later subroutine calls */

a, b = f2( );
g1( a, b ); /* OK */
g1( f2( ) ); /* error, g1( ) expects 2 arguments, got only the first
              * value from f2( ) */

Explicit inlining

C* models subroutine inlining in reverse of C and most other languages. Instead of dictating the intent to inline at the callee's site, it is instead dictated at the caller's. As long as the subroutine is within the total system, it can be inlined using this technique. Furthermore, it is desirable to have a feeble compiler that always inlines upon request, and never does so otherwise, the opposite of what most C compilers do with the inline keyword (ignore it). Experienced systems programmers know this all too well, and in the real world, profile-guided manual optimisation is the name of the game anyway. So, this is a tool for that kind of task.

To achieve this, C* uses the back tick symbol (`) to prefix the subroutine identifier at the call site, like so:

void foo( void )
{

/* ... */
}

void bar( void )
{

/* this calls a proper separate subroutine implemented elsewhere */
foo( );

/* this inlines foo( ) right here, always */
`foo( );
}

Code transclusion

C has long struggled to cope with the problem of inline assembly code, given the diversity of architectures and dialects, as well as the lack of a viable path to standardisation. C* attempts to solve this with a feature it calls code transclusion. Observe:

void foo( void )
{

/* ... */

bar!( );
}

In this code, bar is a symbol resolved like other routines and data. However, it is middled with an exclamation point !, as it is not a routine call with the usual implications for calling conventions. The contents of bar are transcluded into the point in foo( ) where it appears, which some programmers might call "naked" assembly in the old parlance. Since transclusions can never take parameters or offer return values, their forward declarations are neither necessary nor permitted.

In practice, bar might be written in a proper assembly language source file, and integrated in the build step along with the C* source and other sources.

New operators

C* introduces a menagerie of new arithmetic and logical operators.

Name                                  Symbol  Variant  Notes
three-way compare                     <=>     -        the return value of this comparator is balanced tri-state logic represented as an ephemeral enumeration whose values correspond to the open (high-Z), low and high circuit states
minimum                               <?      <?=      the assignment variant has short-circuit logic: if the destination variable is smaller, it is left unchanged; otherwise, it is set to the smaller incoming value
maximum                               >?      >?=      the assignment variant has short-circuit logic: if the destination variable is larger, it is left unchanged; otherwise, it is set to the larger incoming value
count leading zeroes (unary)          ^?      -        -
count trailing zeroes (unary)         ?^      -        -
population count (unary)              ^^      -        -
arithmetic (signed) shift right       >>>     >>>=     -
rotate left                           <<<     <<<=     -
short-circuit logical AND assignment  -       &&=      this kind of assignment statement only sets the left-hand side variable if its contents are nonzero (truthy)
short-circuit logical OR assignment   -       ||=      this kind of assignment statement only sets the left-hand side variable if its contents are zero (falsey)
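
For readers who want to pin down what several of these operators compute, here are portable plain C equivalents for 32-bit operands. The function names are hypothetical, and the result of 32 for a zero input to the two zero-counting helpers is an assumption, since the table does not say what C* does in that case. The three-way compare is omitted because its enumeration values are not given above.

```c
#include <assert.h>
#include <stdint.h>

/* ^? : count leading zeroes (assumed to be 32 for a zero input) */
unsigned clz32( uint32_t x )
{
	unsigned n = 0;

	if( x == 0 )
	{
		return 32;
	}

	while( !( x & 0x80000000u ) )
	{
		x <<= 1;
		++n;
	}

	return n;
}

/* ?^ : count trailing zeroes (assumed to be 32 for a zero input) */
unsigned ctz32( uint32_t x )
{
	unsigned n = 0;

	if( x == 0 )
	{
		return 32;
	}

	while( !( x & 1u ) )
	{
		x >>= 1;
		++n;
	}

	return n;
}

/* ^^ : population count, clearing the lowest set bit each round */
unsigned popcount32( uint32_t x )
{
	unsigned n = 0;

	while( x != 0 )
	{
		x &= x - 1;
		++n;
	}

	return n;
}

/* <<< : rotate left */
uint32_t rotl32( uint32_t x, unsigned r )
{
	r &= 31u;

	return r ? (uint32_t)( ( x << r ) | ( x >> ( 32u - r ) ) ) : x;
}

/* >>> : arithmetic shift right, written so that it does not depend
 * on how the C implementation shifts negative values */
int32_t asr32( int32_t x, unsigned s )
{
	assert( s < 32 );

	return (int32_t)( x < 0 ? ~( ~x >> s ) : x >> s );
}

/* <? and >? : minimum and maximum */
int32_t min32( int32_t a, int32_t b ) { return a < b ? a : b; }
int32_t max32( int32_t a, int32_t b ) { return a > b ? a : b; }
```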

Division and modulus

C* also overloads the meaning of both the division operator / and the modulus operator % in a way that maintains semantic compatibility with C. It uses the multiple return values feature of C* borrowed from Go to make the following semantic equivalences:

/* these all have the same effect */
a = x / y;
a, _ = x / y;
b = x % y;
b, _ = x % y;

a, b = x / y;
b, a = x % y;

This was done because it is virtually universal that division is performed as a single operation with two output values (the quotient and the remainder). It is prudent to have the language reflect that mechanical reality.

Additionally, C* irons out the semantics of division and modulus, so that integer division will always round towards zero, and modulus will behave consistently so that the result always carries the sign of the second operand.
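
One literal reading of these rules can be sketched in plain C by returning both results from a single call. The names divmod and cstar_divmod are hypothetical, and the sign adjustment is an interpretation of the sentence above rather than a confirmed specification.

```c
#include <assert.h>
#include <limits.h>

/* Both results of one division, standing in for "a, b = x / y". */
struct divmod
{
	int quot;
	int mod;
};

struct divmod cstar_divmod( int x, int y )
{
	struct divmod r;

	assert( y != 0 );
	assert( !( x == INT_MIN && y == -1 ) ); /* would overflow */

	r.quot = x / y; /* C99 and later: rounds towards zero */
	r.mod = x % y;  /* C99 and later: sign follows x */

	/* assumed adjustment: make a nonzero result carry the sign of y */
	if( r.mod != 0 && ( ( r.mod < 0 ) != ( y < 0 ) ) )
	{
		r.mod += y;
	}

	return r;
}
```

Under this reading, x == quot * y + mod holds whenever the operands share a sign or the division is exact; with mixed signs the two results come from different rounding conventions, for example -7 and 2 give a quotient of -3 and a modulus of 1.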

Source encoding

The language requires all source code to be ASCII compliant in its raw form. No other encodings of source text are supported, although there is the doc comment exception. This basically means that inside of what C* considers doc comments—that is, comments that begin with /** and end with */—non-ASCII octets are permitted and will be ignored like the rest of the content of the comment. This makes it possible to encode UTF-8 text in comments, for example, which is important for non-English languages.

Identifier limits

When C was originally standardised by ANSI in the 1980s, the standard came with some very conservative translation limits on symbols and other identifiers:

  • 31 significant initial characters in an internal identifier or a macro name
  • 6 significant initial characters in an external identifier
  • 511 external identifiers in one translation unit
  • 127 identifiers with block scope declared in one block
  • 1024 macro identifiers simultaneously defined in one preprocessing translation unit

In the 1999 update ratified by ISO, the limits were increased:

  • 63 significant initial characters in an internal identifier or a macro name
  • 31 significant initial characters in an external identifier
  • 4095 external identifiers in one translation unit
  • 511 identifiers with block scope declared in one block
  • 4095 macro identifiers simultaneously defined in one preprocessing translation unit

As Mike Kinghan explained on Stack Overflow[3]:

There weren't any pitchforks on the lawn of the ANSI C committee when it stipulated 6 initial significant characters for external identifiers. That meant a conforming compiler could be implemented on IBM mainframes; and it need not be one to which the PDP-11 assembler would be inadequate and need not be able to emit code that couldn't even be linked with Fortan 77. It was a wholly unsensational choice.

Moreover:

An IBM 3380E hard disc unit, 1985, had a capacity of 5.0GB and cost around $120K; $270K in today's money. It had a transfer rate of 24Mbps, about 2% of what my laptop's HD delivers. With parameters like that, every byte that the system had to store, read or write, every disc rotation, every clock cycle, weighed on the bottom line. And this had always been the case, only more so. A miser-like economy of storage, at byte granularity, was ingrained in programming practice and those short public symbol names was just one ingrained expression of it. The problem was not, of course, that the puny, fabulously expensive mainframes and minis that dominated the culture and the counsels of the 1980s could not have supported languages, compilers, linkers and programming practices in which this miserly economy of storage (and everything else) was tossed away. Of course they could, if everybody had one, like a laptop or a mobile phone. What they couldn't do, without it, was support the huge multi-user workloads that they were bought to run. The software needed to be excruciatingly lean to do so much with so little.

Doing so much with so little was a practical matter in the 1980s, and while it has been outmoded by uncritically functionalist programming styles today, we understand mechanicalism as this very same practice as a principle. It does not matter that a Nexus smartphone is a hundred times faster and a hundred times cheaper today than a mainframe was in 1985; waste is still waste.

As the preeminent mechanicalist systems programming language, C* also imposes limits on symbols and other identifiers. Specifically:

  • up to 4 levels of domain hierarchy including the final symbol
  • 15 significant initial characters in an internal identifier or a macro name
  • 15 significant initial characters in an externally visible identifier
    • with a maximised domain hierarchy usage this makes the maximum "fully-qualified name" size 60 characters.
  • 255 identifiers with block scope declared in one block
  • 65535 external identifiers in one translation unit
  • 65535 macro identifiers simultaneously defined in one preprocessing translation unit

Furthermore, C* imposes a somewhat stricter rule on the meaning of these limits: conforming implementations must not permit symbols or other identifiers that exceed the limits defined above. Interworking with foreign code is provided by the extern ABI feature.
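
As an illustration of how mechanical these limits are to enforce, here is a small plain C check of a dot-separated fully-qualified name against the 4-level and 15-character limits. The function name and macros are hypothetical, and the check ignores which characters are legal in an identifier.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LEVELS 4u  /* domain levels, including the final symbol */
#define MAX_CHARS  15u /* characters per identifier */

/* Returns 1 if the dot-separated name fits within the limits, else 0. */
int name_within_limits( const char * name )
{
	size_t len = 0;
	unsigned levels = 1;

	assert( name != NULL );

	for( ; *name != '\0'; ++name )
	{
		if( *name == '.' )
		{
			if( len == 0 )
			{
				return 0; /* empty component */
			}

			++levels;
			len = 0;
		}
		else if( ++len > MAX_CHARS )
		{
			return 0; /* component too long */
		}
	}

	return len != 0 && levels <= MAX_LEVELS;
}
```

A name such as sys.io.printf passes, while a fifth level or a 16-character component is rejected outright rather than truncated, in line with the stricter rule stated above.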

extern ABI

While C* provides the extern linkage modifier as it exists in C, and implies it onto non-static functions as C does, it also provides a C++-like ABI specifier suffix to this keyword. Not only does this allow implementations to expose different symbol mangling regimes opaquely to the programmer, in C* it also serves as the veneer to incorporate long foreign symbols into C*'s constraints and, depending on the API design at hand, its domain module system. extern ABI symbols are exempted from the identifier limits imposed in the rest of the language; they can be as long as the compiling machine's memory permits. At the minimum, conforming compilers must support the extern "C" ABI, but may opt to support other ones such as their default C++ ABIs.

Work to be done

A comprehensive ABI

C* strives to provide the maximum possible power to exactly specify the function of a system. While there are many facilities for this "within the reservation", so to speak, much conceptual work still needs to be done about the Application Binary Interface in the popular sense of the term. Building up Oración should help finalise a comprehensive solution to this, so that it is easy for C* programmers to exactly specify the lingua franca of their programs in a machine-agnostic way.

What C* is not

Otherwise known as "criticisms from the dustbin", this is to be a collection of common criticisms and my answers to them. Programming language theory has been a notorious hotbed of intellectual rot, so creating a kind of critic's FAQ will help immensely in pre-empting the handful of questions that will no doubt be asked a thousand times over before it is all said and done. There are very good reasons for why everything in C* is the way that it is.

Glossary

A glossary of terms can help readers familiarise themselves with the radically different approach that C* takes in dealing with computing and systems theory. It can also serve as a stimulus for further expansion on such topics by the writers.

References

  1. "Some were meant for C: the endurance of an unmanageable language." Association for Computing Machinery. Retrieved 2024-02-01.
  2. "ANSI Standards Action Vol. 36, #48" (PDF). American National Standards Institute. 2005-12-02. Archived from the original on 2016-03-04. Retrieved 2009-08-06.
  3. "Why did ANSI only specify six characters for the minimum number of significant characters in an external identifier?". Stack Overflow. 2016-06-26. Archived from the original on 2024-11-14. Retrieved 2024-11-14.