The Odin Programming Language Specification

Introduction #

This is a reference manual for the Odin programming language.

Odin is a general-purpose language designed for systems programming. It is a strongly typed language with manual memory management. Programs are constructed from packages.

NOTE: THIS SPECIFICATION IS UNFINISHED AND VERY OUT OF DATE.

Notation #

The syntax is specified using Extended Backus-Naur Form (EBNF):

Production  = production_name "=" [ Expression ] "." .
Expression  = Alternative { "|" Alternative } .
Alternative = Term { Term } .
Term        = production_name | token [ "…" token ] | Group | Option | Repetition .
Group       = "(" Expression ")" .
Option      = "[" Expression "]" .
Repetition  = "{" Expression "}" .

Productions are expressions constructed from terms and the following operators, in increasing precedence:

|   alternation
()  grouping
[]  option (0 or 1 times)
{}  repetition (0 to n times)

Source code representation #

Source code is Unicode text encoded in UTF-8. The text is not canonicalized, so a single accented code point is distinct from the same character constructed from combining an accent and a letter; those are treated as two separate code points. In this document, the term character will be used to refer to a Unicode code point in the source text.

Each code point is distinct; for instance, uppercase and lowercase letters are different characters.

Implementation restriction: A compiler must disallow the NUL character (U+0000) in the source text.

Implementation restriction: A compiler may ignore a UTF-8-encoded byte order mark (U+FEFF) if it is the first Unicode code point in the source text. A byte order mark must be disallowed anywhere else in the source text.
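
As an illustration only, the two restrictions above could be checked as in the following Python sketch. The function name load_source and the choice to raise ValueError are assumptions made for this example; they are not part of Odin or its compiler. Note that no normalization is applied, so precomposed and decomposed forms of the same character remain distinct.

```python
BOM = "\ufeff"  # byte order mark, U+FEFF


def load_source(data: bytes) -> str:
    """Decode UTF-8 source bytes and apply the restrictions described above.

    Hypothetical helper for illustration; not the Odin compiler's API.
    """
    # The plain "utf-8" codec keeps a leading BOM as U+FEFF and raises
    # UnicodeDecodeError on invalid UTF-8.
    text = data.decode("utf-8")

    # A byte order mark may be ignored if it is the first code point.
    if text.startswith(BOM):
        text = text[1:]

    # Anywhere else, a byte order mark must be rejected.
    if BOM in text:
        raise ValueError("byte order mark is only allowed at the start of the source text")

    # The NUL character must be rejected.
    if "\x00" in text:
        raise ValueError("NUL character (U+0000) is not allowed in the source text")

    # The text is deliberately not canonicalized.
    return text
```

With this sketch, a source that begins with the bytes EF BB BF loads without the mark, while a NUL byte or a later byte order mark raises an error.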

Characters #

The following terms are used to denote specific Unicode character classes:

newline        = /* the Unicode code point U+000A */
unicode_char   = /* an arbitrary Unicode code point except newline */
unicode_letter = /* a Unicode code point classified as "Letter" */
unicode_digit  = /* a Unicode code point classified as "Number, decimal digit" */

In The Unicode Standard 8.0, Section 4.5 “General Category” defines a set of character categories. Odin treats all characters in any of the Letter categories Lu, Ll, Lt, Lm, or Lo as Unicode letters, and those in the Number category Nd as Unicode digits.
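
The classification above can be sketched in Python using the standard unicodedata module. The function names are assumptions chosen for this example, and Python's Unicode tables follow the version bundled with the interpreter, which is not necessarily 8.0, so edge cases may differ from a conforming implementation.

```python
import unicodedata

# General categories treated as letters, as listed above.
LETTER_CATEGORIES = frozenset({"Lu", "Ll", "Lt", "Lm", "Lo"})


def is_unicode_letter(ch: str) -> bool:
    """Report whether the single code point ch is in a Letter category."""
    return unicodedata.category(ch) in LETTER_CATEGORIES


def is_unicode_digit(ch: str) -> bool:
    """Report whether the single code point ch is in category Nd."""
    return unicodedata.category(ch) == "Nd"
```

Under this sketch, "Σ" and "日" are letters and "٣" (U+0663) is a digit, whereas "²" (category No) and "Ⅷ" (category Nl) are neither.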

Letters and digits #

The underscore character _ (U+005F) is considered a letter.

letter        = unicode_letter | "_" .
binary_digit  = "0" … "1" .
octal_digit   = "0" … "7" .
decimal_digit = "0" … "9" .
dozenal_digit = "0" … "9" | "A" … "B" | "a" … "b" .
hex_digit     = "0" … "9" | "A" … "F" | "a" … "f" .

binary_char  = binary_digit  | "_" .
octal_char   = octal_digit   | "_" .
decimal_char = decimal_digit | "_" .
dozenal_char = dozenal_digit | "_" .
hex_char     = hex_digit     | "_" .
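
For illustration, the productions above can be mirrored by simple membership tests. This is a minimal Python sketch; the table DIGITS and the functions is_letter and is_digit_char are names assumed for this example only.

```python
import unicodedata

# Digit sets corresponding to the *_digit productions above.
DIGITS = {
    "binary": frozenset("01"),
    "octal": frozenset("01234567"),
    "decimal": frozenset("0123456789"),
    "dozenal": frozenset("0123456789ABab"),
    "hex": frozenset("0123456789ABCDEFabcdef"),
}


def is_letter(ch: str) -> bool:
    """letter = unicode_letter | "_" ."""
    return ch == "_" or unicodedata.category(ch) in {"Lu", "Ll", "Lt", "Lm", "Lo"}


def is_digit_char(ch: str, base: str) -> bool:
    """The *_char productions: a digit of the given class, or an underscore."""
    return ch == "_" or ch in DIGITS[base]
```

For example, is_digit_char("b", "dozenal") holds, while is_digit_char("c", "dozenal") and is_digit_char("8", "octal") do not.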

Lexical elements #

Comments #

Comments serve as program documentation. There are three forms:

  1. Line comments start with the character sequence // and stop at the end of the line.
  2. General comments start with the character sequence /* and stop with the matching character sequence */. General comments may be nested.
  3. Hash-bang comments start with the character sequence #! and stop at the end of the line.

A comment cannot start inside a rune or string literal, or inside a line or hash-bang comment.
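
Because general comments nest, a scanner cannot stop at the first */ it sees; it has to track nesting depth. The following Python sketch shows one way to do that. The function skip_general_comment is hypothetical, and the sketch does not attempt to decide how other comment forms inside a general comment are treated.

```python
def skip_general_comment(src: str, start: int) -> int:
    """Return the index just past the general comment beginning at src[start].

    Hypothetical helper for illustration; raises ValueError if no general
    comment starts at that position or if the comment is never closed.
    """
    if not src.startswith("/*", start):
        raise ValueError("no general comment starts at this position")

    depth = 0
    i = start
    while i < len(src):
        if src.startswith("/*", i):
            depth += 1  # entering a (possibly nested) comment
            i += 2
        elif src.startswith("*/", i):
            depth -= 1  # leaving the innermost open comment
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated general comment")
```

Given the text "/* a /* b */ c */ x", the sketch skips the whole outer comment and leaves " x" to be scanned.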

Tokens #

Tokens form the vocabulary of the Odin language. There are four classes: identifiers, keywords, operators and punctuation, and literals. White space, formed from spaces (U+0020), horizontal tabs (U+0009), carriage returns (U+000D), and newlines (U+000A), is ignored except as it separates tokens that would otherwise combine into a single token.
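
As a small illustration of the white space rule, a scanner might advance past exactly these four code points before reading the next token. This Python sketch uses an explicit set rather than str.isspace, which accepts many more characters; the names WHITE_SPACE and skip_white_space are assumptions for this example.

```python
# The four white space code points named above.
WHITE_SPACE = frozenset(" \t\r\n")


def skip_white_space(src: str, i: int) -> int:
    """Return the index of the first non-white-space character at or after i."""
    while i < len(src) and src[i] in WHITE_SPACE:
        i += 1
    return i
```

Other space-like code points, such as the no-break space U+00A0, are not skipped by this sketch.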