Shift of focus
Usual ways
Problems
Two sides of the same coin
So why is reading parse nodes so much more complicated than writing them? It is curious: reading and writing are two sides of the same coin, the two directions of a bijection.
Ideally, it would be possible to simply "state" the bijection between syntax and internal representation. Then, read/write functions would be built upon this declaration.
This is exactly what we will do. A syntax declaration and associated read/write functions could look like this:
syntax-def s
    FunDef ::= "function" name args "=" expr
    args ::= List "(" Variable "," ")"
read = readSyntax(s)
write = writeSyntax(s)
syntax-def is a customized sub-language that produces a Syntax value. readSyntax and writeSyntax are then simply functions that build a parsing or display function from the Syntax value given above.
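To make the idea concrete, here is a minimal Python sketch of one declaration driving both directions. It is not the language described here: the item classes `Lit`, `Field` and `ListOf`, the tokenizer, and the use of plain dicts as nodes are all assumptions made for illustration, and `expr` is reduced to a single token.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Lit:            # a literal token such as "function" or "="
    text: str

@dataclass(frozen=True)
class Field:          # a named slot holding exactly one token
    name: str

@dataclass(frozen=True)
class ListOf:         # left elem sep elem ... right, stored under `name`
    name: str
    left: str
    sep: str
    right: str

# Hypothetical, simplified counterpart of the FunDef rule above.
FUN_DEF = [Lit("function"), Field("name"),
           ListOf("args", "(", ",", ")"), Lit("="), Field("expr")]

def tokenize(text):
    # Words, or single punctuation characters; whitespace is dropped.
    return re.findall(r"\w+|[^\w\s]", text)

def read_syntax(syntax):
    """Build a parsing function from the declaration."""
    def read(text):
        tokens, pos, node = tokenize(text), 0, {}

        def take():
            nonlocal pos
            if pos >= len(tokens):
                raise ValueError("unexpected end of input")
            pos += 1
            return tokens[pos - 1]

        def expect(tok):
            got = take()
            if got != tok:
                raise ValueError(f"expected {tok!r}, got {got!r}")

        for item in syntax:
            if isinstance(item, Lit):
                expect(item.text)
            elif isinstance(item, Field):
                node[item.name] = take()
            elif isinstance(item, ListOf):
                expect(item.left)
                elems = []
                while pos < len(tokens) and tokens[pos] != item.right:
                    if elems:
                        expect(item.sep)
                    elems.append(take())
                expect(item.right)
                node[item.name] = elems
        if pos != len(tokens):
            raise ValueError("trailing input")
        return node
    return read

def write_syntax(syntax):
    """Build a display function from the same declaration."""
    def write(node):
        parts = []
        for item in syntax:
            if isinstance(item, Lit):
                parts.append(item.text)
            elif isinstance(item, Field):
                parts.append(node[item.name])
            elif isinstance(item, ListOf):
                body = (item.sep + " ").join(node[item.name])
                parts.append(item.left + body + item.right)
        return " ".join(parts)
    return write

read = read_syntax(FUN_DEF)
write = write_syntax(FUN_DEF)
```

Because both functions walk the same list, a change to the declaration changes the reader and the writer together, which is the point of stating the bijection once.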
In practice
To make this possible, the language needs to be able to support various concepts.
The first question is on how the Syntax entity is represented. After all, it is just declarative data made of:
- Result data
- rules
  - strings ("function", "=",...)
  - symbols (FunDef, name, args,...)
  - special constructs (List, Either, Maybe,...)
Moreover, it is important to notice that the symbols used in the rules are either attributes of the result data or the name of another rule.
So what do we need?
- Symbols themselves as valid values
- Full introspection:
  - Access to the list of attributes of a type
  - Given a constructor symbol and attribute symbols, build an instance of the target type (*)
- Ability to have "build" functions based on data, i.e. functions that return new functions.
The last point is not a problem, as it is a basic concept in functional programming. The first point is a little subtle but depends mainly on early design choices of the language, as in Scheme. Getting the list of attributes of a type is commonplace as well. The difficult point is (*): handling symbol manipulation to this extent while still providing type safety is a delicate matter. See the manual pages on symbolic data manipulation for more information.
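As a rough illustration of the two introspection requirements, the Python sketch below lists the attributes of a type and builds an instance from a constructor symbol plus attribute symbols. The `CONSTRUCTORS` registry, the helper names, and the use of plain strings as symbols are assumptions for the example. Note that the check happens at run time here, so this does not deliver the static type safety that makes (*) delicate.

```python
from dataclasses import dataclass, fields

@dataclass
class FunDef:
    name: str
    args: list
    expr: str

# Hypothetical registry: constructor symbol -> type.
CONSTRUCTORS = {"FunDef": FunDef}

def attributes_of(type_symbol):
    """Return the attribute symbols of the type named by type_symbol."""
    return [f.name for f in fields(CONSTRUCTORS[type_symbol])]

def build(type_symbol, values):
    """Build an instance from a constructor symbol and {attribute symbol: value}."""
    expected = set(attributes_of(type_symbol))
    if set(values) != expected:
        raise TypeError(f"{type_symbol} expects attributes {sorted(expected)}")
    return CONSTRUCTORS[type_symbol](**values)
```

A reader generated from a syntax declaration would call something like `build` once it has collected a value for every attribute symbol in the rule.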
Structured example
Below is an example of how syntax definition data could be structured:
data Syntax
    result :: Data
    content :: Content
    rules :: {} of (Symbol -> Content)
type Content = [] of (String | Symbol | Construct)
type Construct = List | Either | Maybe
data List
    left :: String
    elem :: Data
    sep :: String
    right :: String
Using these definitions, the initial:
syntax-def s
    FunDef ::= "function" name args "=" expr
    args ::= List "(" Variable "," ")"
...would be transformed into:
Syntax s
    result = 'FunDef'
    content = ["function", 'name', 'args', "=", 'expr']
    rules = {
        'args' -> [List("(", 'Variable', ",", ")")]
    }
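For comparison, the same value can be written down in Python. This is only a sketch: `Symbol`, `ListConstruct` and `classify` are names invented for the example, and symbols are modelled as a small wrapper class so that they stay distinct from strings. `classify` illustrates the earlier remark that every symbol in a rule is either the name of another rule or an attribute of the result data.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Symbol:
    name: str

@dataclass(frozen=True)
class ListConstruct:      # stands in for the List construct
    left: str
    elem: Symbol
    sep: str
    right: str

@dataclass
class Syntax:
    result: Symbol
    content: list                              # of str | Symbol | ListConstruct
    rules: dict = field(default_factory=dict)  # Symbol -> content list

s = Syntax(
    result=Symbol("FunDef"),
    content=["function", Symbol("name"), Symbol("args"), "=", Symbol("expr")],
    rules={Symbol("args"): [ListConstruct("(", Symbol("Variable"), ",", ")")]},
)

def classify(syntax):
    """Split the symbols of the main rule into rule names and attributes."""
    rule_names, attributes = [], []
    for item in syntax.content:
        if isinstance(item, Symbol):
            target = rule_names if item in syntax.rules else attributes
            target.append(item)
    return rule_names, attributes
```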
Doesn't this break type safety?
Yes yes yes!!!
Symbols are typeless; they are just symbols, not connected to any identifier!
The following is crap:
It must be clear that one can only create valid symbols. That is, symbols that are defined in the scope. Since they are defined, the symbols are also well typed. Therefore, any manipulation of them is well typed too.
This brings in a kind of meta-programming, in the sense that code is basically data and can be manipulated just like any other data.
The last question is how to distinguish what can be preprocessed at compile time from what has to be run at runtime.
--------------------------------------------------------------------------
Examples
- It should be able to store the type 'FunDef' (the type itself, not an instance of it)
- It should be able to store the symbol 'name' (the symbol itself, and not the value it stands for)
So it should be possible to write:
Person joe
    name = "Joe"
    age = 45
x = joe # the person defined above
x° = 'joe' # the symbol 'joe'
y = age # error: no age defined in scope
y° = 'age' # the symbol 'age'
x == $x° # True
joe.age # 45
x.y # error: x is a Person and has no member attribute named 'y'
x°.y° # error: x° is a symbol and has no member named 'y°'
$x°.$y° # 45
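The distinction between a value, a symbol, and a dereferenced symbol in the lines above can be mimicked in Python. In this sketch the scope is an explicit dict, `deref` plays the role of `$`, and `member` plays the role of `$x°.$y°`; all three are assumptions for illustration, not part of the language.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Symbol:
    name: str

@dataclass
class Person:
    name: str
    age: int

# Hypothetical scope: identifier name -> value.
scope = {"joe": Person("Joe", 45)}

def deref(sym, env):
    """Counterpart of $sym: the value the symbol stands for in env."""
    if not isinstance(sym, Symbol):
        raise TypeError(f"{sym!r} is not a symbol")
    if sym.name not in env:
        raise NameError(f"no {sym.name} defined in scope")
    return env[sym.name]

def member(x, y, env):
    """Counterpart of $x.$y: resolve x in env, then use y as an attribute name."""
    obj = deref(x, env)
    if not isinstance(y, Symbol):
        raise TypeError(f"{y!r} is not a symbol")
    return getattr(obj, y.name)

x = scope["joe"]          # the person itself
x_sym = Symbol("joe")     # the symbol 'joe'
y_sym = Symbol("age")     # the symbol 'age'
```

Here `Symbol("age")` can be created even though no `age` exists in the scope, which is the case the type-safety discussion above is concerned with.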
macros:
macro getAttr x y
    x.y
getAttr joe age # 45
getAttr 'joe' 'age' # error: symbol 'joe' has no member named 'age'
function getAttr2 (x,y) -> z
    $x.$y
getAttr2 joe age # error: "joe" is not a symbol, "age" is not a symbol
getAttr2 'joe' 'age' # 45
function getAttr (x :: Symbol, y :: Symbol) -> z
    z = x.y
It works as follows:
syntax syntax-def syn
syn = Syntax
readSyntax node (e:es)
function (str :: Stream) -> (
NO!!!
sub-syntaxes must be compile time transformations!!!
# defines a struct (read, write)
syntax syntax-def = (read, write)
    assert
        syntax-def.header == []
        size(syntax-def.body) > 0
        for line in syntax-def.body
            size(line) >= 3
            line.(1) is Symbol
            line.(2) == '::='
    rules = map line->rule syntax-def.body
    write = function (node) -> ans
        ans = intersperse ++ (map toStr rule.(result))
        function toStr (x) = str
            if x is String
                str = x
            if x is Symbol
                if x in rules
                    str = write(x)
                else
                    str = write(rules.(x))
            if x is Construct
                ...