Symbol graph
WARNING: this text is written as personal notes / draft.
In a sequential program, everything is a chain of statements: a list, a sequence of instructions. Here, in the declarative Arplan programming style, it is just the opposite. You state that things are equal to other things. This is also called referential transparency, because each symbol can be replaced by its right hand side while keeping the program exactly equivalent.
That is, there is no notion of instruction, nor of order. Something is a combination of other things, which in turn are combinations of other things, and so on. It is like expanding a definition into sub-parts over and over until you reach basic values & functions. It does not matter in which order you put the definitions; you could mix them all up, it is strictly equivalent.
Therefore, the internal representation is not a list of statements. It is a graph of symbols. Each symbol is linked to an expression, containing other symbols, linked to their respective expressions, and so on. The aim of this section is to build this graph based on the parse tree, and to verify its correctness afterwards. The parse tree is a representation of what was written, syntactically checked. The symbol graph is the final result, conveying what exactly the program is.
In a program, each symbol stands for something. It may be a value, a function, a data type, etc. If the symbol stands for a given expression (or function, or...), then it is highly probable that this right hand side also contains symbols. What does each symbol mean? Well, that depends on the scope and the imported modules.
The parse tree is in fact a forest of parse trees. This forest will be transformed into a graph. For instance:
a = b + c
c = 5
b = c + 2
c = 4
is a graph with 4 nodes: a, b, c & c'.
a points to b + c' where c' points to 5
b points to c + 2 where c points to 4
...
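As a minimal sketch (in Python, with hypothetical names, not the actual Arplan representation), this small graph could be stored as a mapping from symbol nodes to their expressions:

from dataclasses import dataclass

@dataclass(frozen=True)
class Sym:
    name: str
    version: int = 0  # distinguishes the second definition c' from c

# Expressions are either literals or operator applications over symbol nodes.
graph = {
    Sym("a"): ("+", Sym("b"), Sym("c", 1)),  # a points to b + c'
    Sym("c", 1): 5,                          # c' points to 5
    Sym("b"): ("+", Sym("c"), 2),            # b points to c + 2
    Sym("c"): 4,                             # c points to 4
}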
The following rules must be enforced (a small validation sketch follows the list):
- a symbol can only be defined once at a given level
- all symbols used in the "right hand side" must be defined
- no cycles in the graph
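Here is a sketch of these three checks in Python, assuming one level is represented as a list of (name, expression) pairs and expressions are shaped like the tuples above; this is illustrative only, not the actual implementation.

def referenced(expr):
    # Collect the symbol names used by an expression
    # (a string is a symbol reference, a tuple is an operator/function application).
    if isinstance(expr, str):
        return {expr}
    if isinstance(expr, tuple):
        return set().union(*(referenced(arg) for arg in expr[1:]))
    return set()  # literal

def check(definitions):
    # definitions: list of (name, expression) pairs for one level.
    names = [name for name, _ in definitions]
    duplicates = {n for n in names if names.count(n) > 1}            # rule 1
    graph = dict(definitions)
    used = set().union(*(referenced(e) for e in graph.values()))
    undefined = used - set(graph)                                    # rule 2
    cycles, state = [], {}                                           # rule 3: depth-first search
    def visit(name, path):
        if state.get(name) == "visiting":
            cycles.append(path + [name])
            return
        if state.get(name) == "done" or name not in graph:
            return
        state[name] = "visiting"
        for dep in referenced(graph[name]):
            visit(dep, path + [name])
        state[name] = "done"
    for name in graph:
        visit(name, [])
    return duplicates, undefined, cycles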
But this is just part of the story. There is more to it than pure symbol definitions. For example, handling conditionals:
if (condition-1-a)
    val = "x"
    if (condition-2-a)
        a = 1
    if (condition-2-b)
        a = 2
    else
        a = 3
else
    a = 4
How to handle this? Shall we make conditional definitions? Like:
a =
    condition 1-a & 2-a -> 1
    condition 1-a & 2-b -> 2
    condition 1-a -> 3
    true -> 4
val =
    condition 1-a -> "x"
    true -> undefined value under these conditions!
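One possible in-memory shape for such conditional definitions, as a Python sketch (hypothetical representation, not Arplan syntax; the first alternative whose guards all hold wins):

# A conditional definition as an ordered list of (guards, expression) alternatives.
a_def = [
    (("1-a", "2-a"), 1),
    (("1-a", "2-b"), 2),
    (("1-a",), 3),
    ((), 4),               # empty guard list means "true"
]
val_def = [
    (("1-a",), "x"),
    # no unconditional alternative: val is undefined when 1-a does not hold
]

def lookup(definition, active_conditions):
    # Return the expression of the first alternative whose guards all hold.
    for guards, expr in definition:
        if all(g in active_conditions for g in guards):
            return expr
    raise LookupError("undefined value under these conditions")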
But what if I write the following code:
if cond-1
    a = b + c
else
    a = 5
    b = 4
    c = 3
Here, the result would be 3 connected symbols a, b and c, leading to:
a =
    cond-1 -> b + c
    true -> 5
b = not(cond-1) -> 4
c = not(cond-1) -> 3
Such a result is however totally inconsistent: whenever cond-1 is satisfied, the symbol a references b & c, which are undefined in that case.
Determining the correctness of these conditional expressions afterwards is both difficult and impractical. It is much easier to check the content of each if-block directly, to see whether the symbols connect correctly.
So shall we connect symbols only at run time?
...
If we do, we nevertheless have to check at compile time that it will be doable at run time. It is a bit like checking every case independently, preconnecting everything in all possible cases to ensure each one is correct on its own.
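A naive Python sketch of this per-case checking, reusing the check function sketched earlier (assumed shapes; enumerating the product of all branches is of course exponential in the number of conditional blocks):

from itertools import product

def check_all_cases(top_level_defs, conditional_blocks):
    # conditional_blocks: one list of alternatives per "if" block,
    # each alternative being the list of (name, expression) pairs it contributes.
    problems = []
    for case in product(*conditional_blocks):
        defs = list(top_level_defs)
        for alternative in case:
            defs.extend(alternative)
        duplicates, undefined, cycles = check(defs)   # the checks sketched earlier
        if duplicates or undefined or cycles:
            problems.append((case, duplicates, undefined, cycles))
    return problems

# For the cond-1 example above, the first alternative only defines a = b + c,
# so b and c are reported as undefined in that case; the else alternative is fine.
problems = check_all_cases([], [[
    [("a", ("+", "b", "c"))],
    [("a", 5), ("b", 4), ("c", 3)],
]])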
How to do that efficiently?
In sequential programming, it is easy: every symbol used in an expression must be declared or defined beforehand. Here, in a declarative paradigm, it is a little more subtle, since symbols can be defined before as well as after, inside a sub-scope or in a parent scope.
In fact, the inside of an "if" can be considered a scope of its own. After all, these are arbitrary design decisions. The symbol definitions common to all branches of the "if" would then be considered as the symbols seen outside the "if"; all others would be local to the "if" block content.
...this is a bit nasty ...but I see no better way.
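A small Python sketch of that rule, assuming each branch is summarised by the set of symbol names it defines (a missing else counts as an empty set):

def exported_symbols(branches):
    # Symbols visible outside the "if" are those defined in every branch;
    # everything else stays local to the branch that defines it.
    if not branches:
        return set()
    common = set(branches[0])
    for branch in branches[1:]:
        common &= set(branch)
    return common

# For the cond-1 example: only a escapes, b and c stay local to the else branch.
assert exported_symbols([{"a"}, {"a", "b", "c"}]) == {"a"}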
Some changes also appear in the translation from parse tree to symbol graph. For example, named functions and definitions of anonymous functions become exactly the same thing. Function calls and operator calls are also the same. And many other things. However, by homogenizing and simplifying things we obviously lose information about how the program was formatted, which can make compile-time or run-time error reporting a bit trickier. Keeping the original forms, however, would make the interpretation/compilation more redundant and complex.
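As a sketch of this homogenization (Python, hypothetical node shape, not the actual Arplan graph nodes), an operator call and a function call collapse into the same "apply" node, and a named function becomes an ordinary symbol bound to a "lambda" node; keeping a source span on each node is one way to preserve error reporting despite the lost formatting:

from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                       # "apply", "lambda", "symbol" or "literal"
    children: list = field(default_factory=list)
    span: tuple = None              # (line, column) in the source, kept for diagnostics

# x + y and add(x, y) become the same kind of node, only the callee differs:
plus_call = Node("apply", [Node("symbol", ["+"]), Node("symbol", ["x"]), Node("symbol", ["y"])])
add_call = Node("apply", [Node("symbol", ["add"]), Node("symbol", ["x"]), Node("symbol", ["y"])])

# A named function definition and the equivalent anonymous-function binding
# both end up as the symbol "f" linked to the same lambda node:
f_body = Node("apply", [Node("symbol", ["+"]), Node("symbol", ["x"]), Node("literal", [1])])
f_lambda = Node("lambda", [Node("symbol", ["x"]), f_body])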