
3. Lexical structure

The ESL lexer is a hand-written tokenizer in kernel/src/esl/lexer.rs. Whitespace and comments are discarded; everything else becomes a Token. ESL is whitespace-insensitive — line breaks are not significant — so source layout is purely cosmetic.

3.1. Whitespace and comments

Spaces, tabs, carriage returns, and newlines separate tokens but carry no meaning. ESL has two comment forms:

// line comment to end of line
/* block comment;
may span multiple lines */

Block comments do not nest: the first */ closes the comment, even if another /* appeared inside it.
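The whitespace and comment rules above can be sketched as a single skipping routine. This is a minimal illustration, not the code in lexer.rs: the function name skip_trivia, the byte-offset interface, and the error returned for an unterminated block comment are all assumptions made for the example.

```rust
/// Skips whitespace and comments starting at byte offset `pos` and returns
/// the offset of the next significant character (or `src.len()` at the end).
/// Hypothetical helper; the real lexer may be organised differently.
fn skip_trivia(src: &str, mut pos: usize) -> Result<usize, String> {
    let bytes = src.as_bytes();
    loop {
        // Spaces, tabs, carriage returns, and newlines carry no meaning.
        while pos < bytes.len() && matches!(bytes[pos], b' ' | b'\t' | b'\r' | b'\n') {
            pos += 1;
        }
        if bytes[pos..].starts_with(b"//") {
            // Line comment: drop everything up to (not including) the newline.
            while pos < bytes.len() && bytes[pos] != b'\n' {
                pos += 1;
            }
        } else if bytes[pos..].starts_with(b"/*") {
            // Block comment: no depth counter, so the first "*/" ends it.
            match src[pos + 2..].find("*/") {
                Some(rel) => pos += 2 + rel + 2,
                // Assumed behaviour: an unterminated comment is an error.
                None => return Err(format!("unterminated block comment at byte {}", pos)),
            }
        } else {
            return Ok(pos);
        }
    }
}
```

Because there is no nesting depth, calling this on `/* a /* b */ c */` stops at `c`, and the trailing `*/` is left for the rest of the lexer to deal with.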

3.2. Keywords

Keywords are case-sensitive lowercase, except for Construct which is title-case (it occupies an expression position alongside identifiers and is capitalised to match the convention for type-construction operations).

Top-level (declaration) keywords mark the start of a top-level form:

namespace class property resource program data codata

Expression keywords appear inside program bodies:

let case match returning Construct map reduce corecord

Boolean literals lex as BoolLit tokens:

true false

There is no null/undefined literal — Eigon-JSON uses property absence to express “no value”.
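A fixed keyword table of this kind is usually applied after a whole word has been scanned. The sketch below shows one way to do that; the Word enum, its Keyword variant, and the classify function are hypothetical names for illustration (only BoolLit and Ident are token names taken from this chapter).

```rust
#[derive(Debug, PartialEq)]
enum Word {
    Keyword(&'static str), // hypothetical variant; one per table entry in practice
    BoolLit(bool),
    Ident(String),
}

// The seven declaration keywords followed by the eight expression keywords.
const KEYWORDS: [&str; 15] = [
    "namespace", "class", "property", "resource", "program", "data", "codata",
    "let", "case", "match", "returning", "Construct", "map", "reduce", "corecord",
];

/// Classifies an already-scanned word. The comparison is exact, so the
/// lookup is case-sensitive: "Construct" is a keyword, "construct" is not.
fn classify(word: &str) -> Word {
    match word {
        "true" => Word::BoolLit(true),
        "false" => Word::BoolLit(false),
        _ => match KEYWORDS.iter().copied().find(|k| *k == word) {
            Some(k) => Word::Keyword(k),
            None => Word::Ident(word.to_string()),
        },
    }
}
```

With an exact-match table, `Let` and `construct` fall through to Ident, which is the case-sensitivity rule stated at the top of this section.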

3.3. Identifiers and qualified names

IDENT ::= [a-zA-Z_] [a-zA-Z0-9_]*

A bare identifier (Dog, name, short_name) is one Ident token. There’s no distinction at the lexer level between class names, property names, variable names, and component names — the parser disambiguates by position.

A qualified name like core:string or ex:Dog is three tokens: Ident("core") Colon Ident("string"). The parser stitches the three back into a single QualifiedName AST node. This is the same convention as EigenQL (EigenQL §3.3) and it has the same consequence: the colon is also a standalone punctuation token (used in type annotations like let x : T = ...), so the parser disambiguates core:string from let x : string purely by what tokens precede the colon.

Qualified names resolve through namespace aliases declared with namespace (chapter 4 §4.1). A bare identifier in a position that expects an IRI either resolves through context (e.g., a component name resolves to a registered component IRI) or is a plain field name.
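The split between lexer and parser for qualified names can be sketched as follows. Everything here is a simplified assumption for illustration: the Token and Name types, the lex function (which only knows identifiers, let, and the colon), and the two parser helpers are hypothetical and much smaller than the real grammar.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Let,
    Ident(String),
    Colon,
    Eof,
}

#[derive(Debug, PartialEq)]
enum Name {
    Bare(String),
    Qualified { prefix: String, local: String },
}

/// Simplified lexer: identifiers, the `let` keyword, and `:` only.
fn lex(src: &str) -> Vec<Token> {
    let mut out = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c == ':' {
            chars.next();
            out.push(Token::Colon);
        } else if c.is_ascii_alphabetic() || c == '_' {
            // IDENT ::= [a-zA-Z_] [a-zA-Z0-9_]*
            let mut word = String::new();
            while let Some(&d) = chars.peek() {
                if d.is_ascii_alphanumeric() || d == '_' {
                    word.push(d);
                    chars.next();
                } else {
                    break;
                }
            }
            out.push(if word == "let" { Token::Let } else { Token::Ident(word) });
        } else {
            chars.next(); // whitespace and anything outside this sketch
        }
    }
    out.push(Token::Eof);
    out
}

/// Parses a name at `pos`, stitching Ident Colon Ident into one node.
/// Returns the name and the index of the first token after it.
fn parse_name(toks: &[Token], pos: usize) -> Option<(Name, usize)> {
    match (toks.get(pos), toks.get(pos + 1), toks.get(pos + 2)) {
        (Some(Token::Ident(p)), Some(Token::Colon), Some(Token::Ident(l))) => Some((
            Name::Qualified { prefix: p.clone(), local: l.clone() },
            pos + 3,
        )),
        (Some(Token::Ident(n)), _, _) => Some((Name::Bare(n.clone()), pos + 1)),
        _ => None,
    }
}

/// Parses `let <binder> : <type>`. The binder position accepts a single
/// Ident, so the colon after it is read as a type annotation.
fn parse_let_header(toks: &[Token]) -> Option<(String, Name)> {
    match (toks.get(0), toks.get(1), toks.get(2)) {
        (Some(Token::Let), Some(Token::Ident(binder)), Some(Token::Colon)) => {
            let (ty, _) = parse_name(toks, 3)?;
            Some((binder.clone(), ty))
        }
        _ => None,
    }
}
```

In this sketch the same Colon token plays both roles: parse_name stitches it into a qualified name, while parse_let_header consumes it as the annotation separator because a `let` binder came first.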

3.4. Literals

STRING ::= '"' (ESC | [^"\\])* '"'
ESC ::= '\"' | '\\' | '\n' | '\r' | '\t'
INTEGER ::= '-'? [0-9]+
FLOAT ::= '-'? [0-9]+ '.' [0-9]+ ([eE] ('+'|'-')? [0-9]+)?
BOOLEAN ::= 'true' | 'false'

String escapes are limited to the five forms above. Numbers may begin with a leading - (parsed as part of the literal); subtraction operators are not currently in the expression grammar.
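Since a leading - belongs to the literal while -> is a separate token, a number scanner has to check what follows the minus sign. The sketch below reads the INTEGER and FLOAT productions literally; the Num type, the lex_number name, and the choice of i64 and f64 as value types are assumptions, and the real lexer may treat edge cases differently.

```rust
#[derive(Debug, PartialEq)]
enum Num {
    Int(i64),
    Float(f64),
}

/// Tries to read an INTEGER or FLOAT at the start of `src`.
/// Returns the value and the number of bytes consumed, or None when `src`
/// does not start with a number (for example when it starts with "->").
fn lex_number(src: &str) -> Option<(Num, usize)> {
    let b = src.as_bytes();
    // Returns the index just past the run of ASCII digits starting at `i`.
    let digits = |mut i: usize| {
        while i < b.len() && b[i].is_ascii_digit() {
            i += 1;
        }
        i
    };

    let start = if b.first() == Some(&b'-') { 1 } else { 0 };
    let int_end = digits(start);
    if int_end == start {
        return None; // '-' not followed by a digit, or no digit at all
    }
    let mut end = int_end;
    let mut is_float = false;
    // Fraction: '.' must be followed by at least one digit.
    if b.get(end) == Some(&b'.') {
        let frac_end = digits(end + 1);
        if frac_end > end + 1 {
            end = frac_end;
            is_float = true;
            // Optional exponent; the grammar only allows it after a fraction.
            if matches!(b.get(end), Some(b'e') | Some(b'E')) {
                let mut e = end + 1;
                if matches!(b.get(e), Some(b'+') | Some(b'-')) {
                    e += 1;
                }
                let exp_end = digits(e);
                if exp_end > e {
                    end = exp_end;
                }
            }
        }
    }
    let text = &src[..end];
    if is_float {
        text.parse::<f64>().ok().map(|v| (Num::Float(v), end))
    } else {
        text.parse::<i64>().ok().map(|v| (Num::Int(v), end))
    }
}
```

Under this reading, `7.name` yields the integer 7 followed by a separate . token, because the FLOAT production requires digits on both sides of the point.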

3.5. Operators and punctuation

Token   Meaning
=       Assignment in let, field bindings in Construct, namespace declarations
->      Function-type arrow, used in program ... : T -> U and codata observation types like {j < i} -> Stream(A, j)
\       Lambda introducer (ASCII), e.g. \x -> e
λ       Lambda introducer (Unicode, U+03BB), e.g. λx -> e
.       Property projection (input.ex:name)
;       Statement separator (between lets, between observations)
:       Type annotation, qualified-name separator, parent class in class C : Parent
,       List separator
<       Size bound in bounded binders ({j < i}, {j : core:Size < i})
( )     Function call args, parameter telescopes
{ }     Block delimiters: declaration bodies, expression blocks, bounded binders
[ ]     Reserved (currently unused in expressions)

The two lambda forms \ and λ are interchangeable. \ is the ASCII escape hatch for keyboards without easy Unicode entry; λ is the canonical form.
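One way to make the two lambda forms interchangeable is to map both characters to the same token kind. The sketch below does that for the whole punctuation table; the Punct enum and lex_punct function are hypothetical names, and whether the real lexer uses one token kind for both spellings is an assumption.

```rust
#[derive(Debug, PartialEq)]
enum Punct {
    Assign, Arrow, Lambda, Dot, Semi, Colon, Comma, Lt,
    LParen, RParen, LBrace, RBrace, LBracket, RBracket,
}

/// Reads one punctuation token at the start of `src`.
/// Returns the token and the number of bytes consumed.
fn lex_punct(src: &str) -> Option<(Punct, usize)> {
    // "->" is the only two-character token, so test for it first.
    if src.starts_with("->") {
        return Some((Punct::Arrow, 2));
    }
    let c = src.chars().next()?;
    let tok = match c {
        '=' => Punct::Assign,
        // Both spellings produce the same token in this sketch.
        '\\' | 'λ' => Punct::Lambda,
        '.' => Punct::Dot,
        ';' => Punct::Semi,
        ':' => Punct::Colon,
        ',' => Punct::Comma,
        '<' => Punct::Lt,
        '(' => Punct::LParen,
        ')' => Punct::RParen,
        '{' => Punct::LBrace,
        '}' => Punct::RBrace,
        '[' => Punct::LBracket,
        ']' => Punct::RBracket,
        _ => return None,
    };
    // λ (U+03BB) is two bytes in UTF-8; every other token here is one.
    Some((tok, c.len_utf8()))
}
```

The byte count matters when the lexer tracks offsets: `\x` and `λx` give the same token, but the identifier that follows starts one byte later in the Unicode spelling.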

3.6. End-of-input

The lexer always emits a trailing Eof token. Parsers that consume all tokens up to Eof finish cleanly; consumers that stop early can use the position carried by Eof for end-of-file diagnostics.
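The role of the trailing Eof token can be illustrated with a toy lexer that splits on whitespace. The Kind and Token types, the lex_words and expect_word functions, and the use of a byte offset as the position are all assumptions for this sketch; the chapter only states that Eof carries a position.

```rust
#[derive(Debug, PartialEq)]
enum Kind {
    Word(String),
    Eof,
}

#[derive(Debug, PartialEq)]
struct Token {
    kind: Kind,
    offset: usize, // byte offset of the token start (assumed representation)
}

/// Toy lexer: whitespace-separated words, always terminated by Eof.
fn lex_words(src: &str) -> Vec<Token> {
    let mut out = Vec::new();
    let mut start: Option<usize> = None;
    for (i, c) in src.char_indices() {
        if c.is_whitespace() {
            if let Some(s) = start.take() {
                out.push(Token { kind: Kind::Word(src[s..i].to_string()), offset: s });
            }
        } else if start.is_none() {
            start = Some(i);
        }
    }
    if let Some(s) = start {
        out.push(Token { kind: Kind::Word(src[s..].to_string()), offset: s });
    }
    // Emitted even for empty input, so consumers never run off the end.
    out.push(Token { kind: Kind::Eof, offset: src.len() });
    out
}

/// Returns the word at index `i`, or an end-of-file diagnostic that
/// reuses the position stored in the Eof token.
fn expect_word(toks: &[Token], i: usize) -> Result<&str, String> {
    match &toks[i].kind {
        Kind::Word(w) => Ok(w.as_str()),
        Kind::Eof => Err(format!("unexpected end of input at byte {}", toks[i].offset)),
    }
}
```

Because Eof is always present, a consumer can index one token past the last word and still get a meaningful error location instead of an out-of-bounds access.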

3.7. What the lexer does not do

The lexer is intentionally minimal:

  • No keyword resolution beyond the fixed table. Dog is Ident("Dog"); whether it’s a class or a variable name is decided later.
  • No qualified-name composition. core:string is three tokens; the parser combines them.
  • No size-bound parsing. The < token is emitted whenever it appears; the parser decides whether < is a size bound (inside {...}) or some other use.
  • No bracket matching. Mismatched braces surface as parser errors, not lexer errors.

3.8. Comparison with EigenQL’s lexer

For readers coming from the EigenQL guide, the differences are:

  • ESL has no ? variable prefix — variables and property names look identical to identifiers (parser disambiguates by position).
  • ESL keywords are lowercase (except Construct); EigenQL keywords are uppercase (MATCH, WHERE, etc.).
  • ESL has two lambda forms (\ and λ); EigenQL has no anonymous functions.
  • ESL has the < token for size bounds; EigenQL uses < only as a comparison operator.
  • ESL has an explicit Construct keyword; EigenQL relies on RETURN [Class] { ... } for typed result construction.
