Parse trees -> symbol graph
Connecting symbols is not as easy as it first looks. Let's first look at some sources of difficulty.
Difficulty one: different ways to link symbols
In the end, every symbol stands for "something", may it be a value, a function call, an error... However, a symbol cannot always be directly connected to a "something", there are 3 types of connections:
- Direct connection: when the symbol is directly defined somewhere
- Param connection: this symbol is a parameter, an open slot whose value will be filled later for each function call.
- Conditional connection: we don't know yet to what it will be linked to, it depends on some condition rules (case ... of ... -> ...)
Direct connection (of x and y):
a = x + y
x = 34 + 56
y = f(78)
Param connection (of x and y):
function f(x,y) -> ans
ans = 3.14 * (x + y) ^ 2
Conditional connection (of x):
x = case aaa of
True -> "Yeah!"
False -> case bbb of
1 -> "Hmm..."
2 -> 456 + 567
? -> Error("???")
In this latter case, we don't know yet what x will be linked to, it depends on the values that aaa and bbb will take.
Difficulty two: nested anonymous functions
Let's take an example:
function strange(x,y) -> combine(function(x) -> x * y, function(y) -> x * y)
This is exactly equivalent to:
function strange(x,y) -> combine(function(x') -> x' * y, function(y') -> x * y')
In other words, parameters can overshadow themselves. Distinct identical symbols may connect to different "somethings" depending where they are in the sentence.
...one possibility though would be to forbid using same parameter names in nested functions.
Difficulty three: ambiguity
What is the result of... ?
fun = function(x) -> 3 * x
x = 4
A function that returns 3 times the param or a function that always returns 12?
To avoid such ambiguity, we simply forbid such things. For instance, it is forbidden to redefine params in function bodies. ...Of course, all this must be checked.
Difficulty four: closures
function mult(m) -> ans
ans = function(x) -> m * x
mult4 = mult(4)
x = mult4(5) # x is equal to 20
...is that really a source of difficulty. I don't know exactly...
The output of the parsing is a list of parse trees. That is: symbol definitions, but also named functions, if-else blocks, etc...
For each list of parse trees:
- make a hashtable<symbol, expression> for
- definitions
- named functions (transform in anonymous function)
- named anything
- for if-else blocks, investigate them deeper:
- explore recursively the definitons in the if-else's and make conditional expressions out of it
- for prints & do's, see later
- report doubly defined symbols!
That way, we transform the tree in a context.
Each level is a context, containing:
- the hashtable of all symbols -> (expression, child-context)
- the parent context
For each expression defined by a symbol: inside the expression, connect the symbols as follows:
- Check if it is present in the child-context
- Check if it is present in the current context
- Check in the parent and move up recursively
- All imports are declared as "level 0"