Liscript — implementing TCO



In my last article Write a Lisp interpreter in Java I summarize and briefly told her about what I wrote a few interpreters Lisp-like language called Liscript — Haskell and Java. Nothing really unique and outstanding in this, but to me it was pleasant, interesting and informative pastime. Among other features, I mentioned about the implementation of TCO (tail call optimization) — optimizing interpreter, tail calls to functions. This question has aroused the interest of individual community members, and received an offer to reveal its details in a separate article, which I tried to do. Interested please under cat.

First, I will allow myself to briefly outline some of the basics, because I will in the future refer to them for completeness.

eval Apply no — no as Miklukho Maklay


The simplest expression interpreter languages of the Lisp family can be implemented in the form of a pair of functions (pseudocode):

the
eval (expr, env) // expr - expression env - environment
if typeof(expr) = Atom // type of the expression is the atom
return lookup(expr, env) // search for a value by name in the environment
elseif typeof(expr) = List // expression type - list
evalexpr = mapeval(expr, env) // calculate each value in the list
// recursively calling eval
// and forming a new list
return apply(evalexpr, env) // calculate the list value
else
return expr // return the input value

apply (expr, env) // expr - expression env - environment
op = head(expr) // operation - the first  element  of the list
args = tail(expr) // arguments - the rest of the list
case op of
.................. // here the calculation is done
.................. // depending on the type of operation 
.................. // or special form 
.................. // their composition defines a set of primitive
... // operations and commands of the language
Function f: // the operation type function
newenv = subenv(args, f.clojure) // new environment - the child frame
return eval(f.body, newenv) // calculate the function body
// in this new environment

In the above code, much simplified, for example, we do not need and even incorrectly calculate the entire list, if the type of its operations, the conditional expression, but this is detail. Could not allocate in a separate apply function implementing a case on operations directly in the body of eval, it does not matter. Interestingly, in these functions, there are no cycles and destructive assignment and change the values of variables, so in this form, the interpreter they can be implemented in any programming language that supports function calls, recursion and having (or having the ability to create) several data types such as linked list, tree of dictionaries for the environment and types of works (structures/classes with fields). Also, if you leave aside the interaction with the user (input-output) during the computation, these functions are "pure" in functional programming terms. But even if you enable the internal I / o in the middle of the calculation, the version of the interpreter is directly implemented even in a pure functional language such as Haskell, or any imperative language with sufficient minimum capacity. But in the latter case may limit the calculation function makes extensive use of recursion, and with a little stack depth it will overflow. The situation is especially worse, if the implementation language does not support tail-call optimisation — TCO.

Problem


It is a stack overflow problem I faced when I implemented the interpreter in the Java language. Every calculation I have run in a separate thread, and the default stack size for single thread is very small. I've seen the following solutions:
the
    the
  • increase the thread stack size is set using a special key when the program starts. Not fully solve the problem — the maximum size of the stack is also limited, but does not require any cost.
  • the
  • implement another structure of the interpreter, for example, on a register machine (see SICP). Not tried it, but I think this will solve the problem only tail recursion.
  • the
  • sequential computation in multiple threads. With the realization faced with the complexity to write a reliable option.
  • the
  • transfer calculation from the stack in a bunch of jumps, etc. a Naive implementation of the iterative transformation of the entire AST has been very slow, and the best will not do.

Most likely there is a good and reliable solution to this problem, but in the end I settled on the compromise realized in the above simple version of the interpreter TCO, and, together with the possibility of increasing the thread stack size this allows in many cases to avoid overflow. The interpreter was conceived as a toy, pet project, so it is a compromise solution yet I like. But I do not exclude the possibility that in the future will come up with a better solution.

Solution


The key to the solution is to, when possible, not to call again and again recursively eval, plunging deeper into the stack, and to arrange within it a clear cycle in which to make the necessary calculations. For this, you can do so is to divide the calculation result of applying the function to the arguments on 2 options — let's call them conventionally "strict" and "lazy". With simple calculation, the function will continue to return the final result of the calculation, and when the lazy — some intermediate object, which will contain the actual function (with her own body and surroundings that are part of its attributes) and the list computed values of the arguments with which it must be summoned. Let's call this object FuncCall, here's the code class and function FuncCall (no JavaDocs because it is an internal private class, but its essence trivial and self-explanatory):

the
 /** the type of language Liscript function */
public static class Func {
/** linked list of parameter names of the function */
public ConsList pars;
/** function body */
public Object body;
/** the environment in which the created function */
public Env clojure;

/** Constructor
* @param p a unidirectional list of the names of the function parameters
* @param b the body of the function
* @param c the environment where you created the function
*/
Func(ConsList p, Object b, Env c) { pars = p; body = b; clojure = c; }

/** @return the string representation of the function */
@Override
public String toString() { return showVal(this); }
}

private static class FuncCall {
public Func f;
public HashMap<String, Object> args;

FuncCall(Func _f, HashMap<String, Object> _a) { f = _f; args = _a; }

@Override
public String toString() { return "FUNCALL:" + args.toString(); }
}

To complete the picture under the spoiler code class Env implements a hierarchical environment, with the methods:

class Env
/**
* The hierarchical structure of the environment - vocabulary connections: a string key value and a link to
* the parent environment. Structure for storing the dictionary is NOT thread safe.
*/
public class Env {
/** dictionary ties the string key object value */
public HashMap<String, Object> map;
/** reference to the parent environment */
public Env parent;

/** The constructor with a dictionary and a parent.
* @param m dictionary of string key-value
* @param p the parent environment
*/
Env (HashMap<String, Object> m, Env p) { map = m; parent = p; }

/** Constructor without parameters. Returns the environment with the empty dictionary and parent
* entourage.
*/
Env () { this(new HashMap<String, Object>(), null); }

/** sets the value at key in the next dictionary from the hierarchical structure, where
* a value exists with the given key.
* @param var string key
* @param value a value object
*/
public void setVar(String var, Object value) {
Env env = this;
while (env != null) {
if (env.map.containsKey(var)) {env.map.put(var, value); break;}

}
}

/** gets the value of a key in the next dictionary from the hierarchical structure, where
* a value exists with the given key. If it does not find - returns the key itself as
* values.
* @param var string key
* @return a value object
*/
public Object getVar(String var) {
Env env = this;
while (env != null) {
if (env.map.containsKey(var)) return env.map.get(var);
env = env.parent;
}
return var;
}

/** sets the value for key in dictionary tekumel
* @param var string key
* @param value a value object
*/
public void defVar(String var, Object value) {
this.map.put(var, value);
}

/** returns true if the given key is associated with a value in any dictionary of hierarchy up
* from the current one.
* @param var string key
* @return true/false
*/
public boolean isBounded(String var) {
Env env = this;
while (env != null) {
if (env.map.containsKey(var)) return true;
env = env.parent;
}
return false;
}
}


The logic of the work was the following: in the text of the code, the programmer explicitly marks (in my case using an additional special form of) the tail function calls. When evaluating these calls returns a lazy FuncCall, and then in the loop is the calculation, while the type of the returned result is all the same FuncCall, and as soon as we get in the calculation of another type — return its as a result. In my implementation in a loop every time you create new objects, but takes care of them built-in Java garbage collector. But we implemented an iterative imperative loop inside a recursive function is evaluated, and we stopped growing the stack at tail calls. Everything worked, the only inconvenience was that in the text of the code required to explicitly specify tail calls. Without this, the calculations have been made as usual, with immersion in the stack. Like to the interpreter itself was determined by tail calls, and figured them in a cycle. This was implemented using an additional parameter to eval is a Boolean flag strict, strict / lazy evaluation of functions. It works as follows, regardless of the flag value when calculating the function first creates an object with the given FuncCall function and the calculated values of arguments. But simple calculation the object is then calculated in the cycle but with lazy computation, while the result is of type FuncCall. If the lazy computation returns immediately created FuncCall as a result. Anywhere else the value of this flag determines the logic, but had to choose which nested calls to calculate strictly and the passed flag value in the incoming parameter (to preserve the strictness / laziness of the external evaluation). But it was not difficult — the choice is always strict computations in addition to computing the last element of the list, the result of the conditional expression (the conditions are evaluated strictly) and Rustamovich macros — these 3 case are computed from the incoming value of the flag in all examples, the interpreter correctly identified the tail calls and optimize them, including cross-recursion. In code, this looks much more concise verbal descriptions:

the
 else if (op instanceof Func) {
Func f = (Func)op;
// calculate argument values and create an object FuncCall
FuncCall fcall = new FuncCall(f, getMapArgsVals(d, io, env, f.pars, ls, true));
if (strict) {
v = fcall;
while (v instanceof FuncCall) {
FuncCall fc = (FuncCall) v;
v = eval(d, false, io, new Env(fc.args, fc.f.clojure), fc.f.body);
}
return v;
} else return fcall;

An example of the interpreter (functions foldl and foldr are defined in the standard library, here their definitions are duplicated for clarity):

the
def a (list-from-to 1 100)
=> OK
def b (list-from-to 1 100000)
=> OK
defn foldl (f a l)
(cond (null? l) a
(foldl f (f (car l) a) (cdr l)) )
=> OK
defn foldr (f a l)
(cond (null? l) a
(f (car l) (foldr f a (cdr l))) )
=> OK
foldl + 0 a
=> 5050
foldr + 0 a
=> 5050
foldl + 0.0 b
=> 5.00005E9
foldr + 0.0 b
=> java.lang.StackOverflowError

Among the special forms of language there is a service team tray outputting the print the call stack with an indication of the step computation and level of nesting of the recursive call, it can be viewed "in the context of" what challenges are obtained by different variants:
factorial (implementation with battery) - without TCO
defn f (n a) (cond (


parity Checking (cross recursion) without TCO
defn is-even (n) (cond (= n 0) true (is-odd (- n 1)) )
=> OK
defn is-odd (n) (cond (= n 0) false (is-even (- n 1)) )
=> OK
tray (is-even 5)
=> 
1 ← (is-even 5)
2 ← is-even
2 → (lambda (n) (cond (= n 0) true (is-odd (- n 1))))
2 ← (cond (= n 0) true (is-odd (- n 1)))
3 ← (= n 0)
4 ← n
4 → 5
3 → false
3 ← (is-odd (- n 1))
4 ← is-odd
4 → (lambda (n) (cond (= n 0) false (is-even (- n 1))))
4 ← (- n 1)
5 ← n
5 → 5
4 → 4
4 ← (cond (= n 0) false (is-even (- n 1)))
5 ← (= n 0)
6 ← n
6 → 4
5 → false
5 ← (is-even (- n 1))
6 ← is-even
6 → (lambda (n) (cond (= n 0) true (is-odd (- n 1))))
6 ← (- n 1)
7 ← n
7 → 4
6 → 3
6 ← (cond (= n 0) true (is-odd (- n 1)))
7 ← (= n 0)
8 ← n
8 → 3
7 → false
7 ← (is-odd (- n 1))
8 ← is-odd
8 → (lambda (n) (cond (= n 0) false (is-even (- n 1))))
8 ← (- n 1)
9 ← n
9 → 3
8 → 2
8 ← (cond (= n 0) false (is-even (- n 1)))
9 ← (= n 0)
10 ← n
10 → 2
9 → false
9 ← (is-even (- n 1))
10 ← is-even
10 → (lambda (n) (cond (= n 0) true (is-odd (- n 1))))
10 ← (- n 1)
11 ← n
11 → 2
10 → 1
10 ← (cond (= n 0) true (is-odd (- n 1)))
11 ← (= n 0)
12 ← n
12 → 1
11 → false
11 ← (is-odd (- n 1))
12 ← is-odd
12 → (lambda (n) (cond (= n 0) false (is-even (- n 1))))
12 ← (- n 1)
13 ← n
13 → 1
12 → 0
12 ← (cond (= n 0) false (is-even (- n 1)))
13 ← (= n 0)
14 ← n
14 → 0
13 → true
12 → false
11 → false
10 → false
9 → false
8 → false
7 → false
6 → false
5 → false
4 → false
3 → false
2 → false
1 → false
false


factorial (implementation with battery) - TCO c
defn f (n a) (cond (


parity Checking (cross recursion) - c TCO
defn is-even (n) (cond (= n 0) true (is-odd (- n 1)) )
=> OK
defn is-odd (n) (cond (= n 0) false (is-even (- n 1)) )
=> OK
tray (is-even 5)
=> 
1 ← (is-even 5)
2 ← is-even
2 → (lambda (n) (cond (= n 0) true (is-odd (- n 1))))
2 ← (cond (= n 0) true (is-odd (- n 1)))
3 ← (= n 0)
4 ← n
4 → 5
3 → false
3 ← (is-odd (- n 1))
4 ← is-odd
4 → (lambda (n) (cond (= n 0) false (is-even (- n 1))))
4 ← (- n 1)
5 ← n
5 → 5
4 → 4
3 → FUNCALL: {n=4}
2 → FUNCALL: {n=4}
2 ← (cond (= n 0) false (is-even (- n 1)))
3 ← (= n 0)
4 ← n
4 → 4
3 → false
3 ← (is-even (- n 1))
4 ← is-even
4 → (lambda (n) (cond (= n 0) true (is-odd (- n 1))))
4 ← (- n 1)
5 ← n
5 → 4
4 → 3
3 → FUNCALL: {n=3}
2 → FUNCALL: {n=3}
2 ← (cond (= n 0) true (is-odd (- n 1)))
3 ← (= n 0)
4 ← n
4 → 3
3 → false
3 ← (is-odd (- n 1))
4 ← is-odd
4 → (lambda (n) (cond (= n 0) false (is-even (- n 1))))
4 ← (- n 1)
5 ← n
5 → 3
4 → 2
3 → FUNCALL: {n=2}
2 → FUNCALL: {n=2}
2 ← (cond (= n 0) false (is-even (- n 1)))
3 ← (= n 0)
4 ← n
4 → 2
3 → false
3 ← (is-even (- n 1))
4 ← is-even
4 → (lambda (n) (cond (= n 0) true (is-odd (- n 1))))
4 ← (- n 1)
5 ← n
5 → 2
4 → 1
3 → FUNCALL: {n=1}
2 → FUNCALL: {n=1}
2 ← (cond (= n 0) true (is-odd (- n 1)))
3 ← (= n 0)
4 ← n
4 → 1
3 → false
3 ← (is-odd (- n 1))
4 ← is-odd
4 → (lambda (n) (cond (= n 0) false (is-even (- n 1))))
4 ← (- n 1)
5 ← n
5 → 1
4 → 0
3 → FUNCALL: {n=0}
2 → FUNCALL: {n=0}
2 ← (cond (= n 0) false (is-even (- n 1)))
3 ← (= n 0)
4 ← n
4 → 0
3 → true
2 → false
1 → false
false


The source code of the interpreter is still available in my repository on Github.
Article based on information from habrahabr.ru

Комментарии

Популярные сообщения из этого блога

Automatically create Liquibase migrations for PostgreSQL

Vkontakte sync with address book for iPhone. How it was done

What part of the archived web