LISP interpreter with a clean C
I love C for its simplicity and efficiency. However, it cannot be called flexible and extensible. There is another simple language with unprecedented flexibility and extensibility, but loses C in the efficiency of resource use. I mean LISP. Both languages used for system programming and have a long and glorious history.
Long enough I'm thinking about the idea of bringing together the approaches of both these languages. Its essence is to implement a programming language based on LISP that solves the same tasks as C: ensure a high degree of control over the equipment (including low-level memory access). In practice this is a system of LISP macros, generating binary code. LISP capabilities for preprocessing the source code, I think, will provide unprecedented flexibility in comparison with the C preprocessor or C++ templates, while retaining the original simplicity of the language. This will enable this DSL to build new extensions that increase the speed and ease of development. In particular, this language can be itself a LISP system.
Writing a compiler requires the code generator, and ultimately to the assembler. Therefore, practical research should start with the implementation of the assembler (for a subset of instructions of the target processor). I was wondering to minimize any dependence on specific technologies, programming languages and operating system. So I decided from scratch to implement in C a simple impromptu interpreter for a LISP dialect, and also write to it the system macrodactylini allows you to encode a subset of the x86 Assembly language. The crown of my efforts, the result should be the boot image that prints "Hello world!" in real mode CPU.
At the moment I have implemented a working interpreter (int file.c, about 900 lines of C code) and a set of basic functions and macros (file lib.l, about 100 lines of LISP code). Interesting how it will run LISP code, and implementation details of the interpreter, I ask a cat.
The basic unit of LISP computing is a dotted pair (dotted pair). In classic McCarthy Lisp dotted pair, and symbol — only two data types. In practical implementations, this set has to expand at least. In addition, to basic types also add strings and arrays (the first are kind of a second). The desire to simplify it is tempting to consider the string as a list of numbers, but I deliberately refused from this idea as from sharply limiting language in the real world. As a container for numbers, decided to use double.
So we have the following basic data types: point pair, symbol, number, string (pascal style, because this will give the ability to store arbitrary binary data unchanged). Since I'm working on an interpreter (not a compiler), it was possible to confine this set (functions and macros can be represented by a usual s-expressions), but for convenience of implementation was added 4 additional types: a function, a macro, an inline function and a built-in macro. So, we have the following structure for s-expressions:
the
This structure is not optimal from the standpoint of saving of resources and productivity, but I do not set myself the goal to build an efficient implementation. First of all, was important simplicity and brevity of code. Even had to abandon the memory management: all memory is allocated without releasing. In fact, my practical problem is the solution is valid: the interpreter will not work for a long time: his job is only to code translation in binary form.
As you can see from the above code, the function (and macro) refer to the structure l_env. It is a basic element of the lexical environment stored in a list. Of course, this is inefficient as it involves sequential access to the characters. But it is a very simple and convenient structure for support of local variables: they are added to the head of the list when global is in the tail. From local variables very easy to get rid of (when you exit from the function or from the let block), simply ignoring the front of this list. Own lexical environment of the function allows to implement the circuit.
On the basis of the above structure of s-expressions is easy to construct a function of its computation:
the
If a calculated expression is a symbol, we're just looking for its meaning in the current lexical environment (find_symbol). If the function call is: first, calculated the actual parameters, using the current lexical environment (map_eval), then tie them to the symbols of the formal parameters (apply_args) in the lexical environment of the function itself. Next, the computed elements of the body based on the lexical environment, returning the value of the last expression (eval_list). The macro order of evaluation is somewhat different. The actual parameters are not evaluated but passed unchanged. In addition, the resulting macro expression (macro expansion) is subjected to further calculation. Numbers, strings, functions, and macros are evaluated.
I decided to introduce a more concise name for the basic and arbitrary of functions and macros. In classical LISP (and especially Common Lisp) me a little the verbosity of the underlying primitives. On the one hand, I didn't want to complicate the parser, because a quote and backquote syntax they are not supported, only the bracket notation. On the other hand, sought to compensate for the excess sobocinski extensive use of special symbols for brevity. To some it will seem highly controversial decision.
Names I tried to choose according to their associative:
the
Accordingly, the names of derivatives of functions and macros are largely derived from the names of the base:
the
Now consider the derivative definition. First, let's define basic abbreviations:
the
Note the dot notation of the formal argument list. The symbol after the dot captures the remaining actual parameters. The case when all the arguments are optional, describes a special notation: (_ . rest-args). Then we define the classical map and the two pairs split the list:
the
Determined by two options let:
the
Classic reverse and the left convolution:
the
Now the logical operators on the basis if:
the
Finally, the comparison operators based on the built in > (greater):
the
Please note that the last block of the definitions explicitly use the circuit.
So, the interpreter and most of the primitives are ready to write a DSL assembler. I will try...
Article based on information from habrahabr.ru
Long enough I'm thinking about the idea of bringing together the approaches of both these languages. Its essence is to implement a programming language based on LISP that solves the same tasks as C: ensure a high degree of control over the equipment (including low-level memory access). In practice this is a system of LISP macros, generating binary code. LISP capabilities for preprocessing the source code, I think, will provide unprecedented flexibility in comparison with the C preprocessor or C++ templates, while retaining the original simplicity of the language. This will enable this DSL to build new extensions that increase the speed and ease of development. In particular, this language can be itself a LISP system.
Writing a compiler requires the code generator, and ultimately to the assembler. Therefore, practical research should start with the implementation of the assembler (for a subset of instructions of the target processor). I was wondering to minimize any dependence on specific technologies, programming languages and operating system. So I decided from scratch to implement in C a simple impromptu interpreter for a LISP dialect, and also write to it the system macrodactylini allows you to encode a subset of the x86 Assembly language. The crown of my efforts, the result should be the boot image that prints "Hello world!" in real mode CPU.
At the moment I have implemented a working interpreter (int file.c, about 900 lines of C code) and a set of basic functions and macros (file lib.l, about 100 lines of LISP code). Interesting how it will run LISP code, and implementation details of the interpreter, I ask a cat.
The basic unit of LISP computing is a dotted pair (dotted pair). In classic McCarthy Lisp dotted pair, and symbol — only two data types. In practical implementations, this set has to expand at least. In addition, to basic types also add strings and arrays (the first are kind of a second). The desire to simplify it is tempting to consider the string as a list of numbers, but I deliberately refused from this idea as from sharply limiting language in the real world. As a container for numbers, decided to use double.
So we have the following basic data types: point pair, symbol, number, string (pascal style, because this will give the ability to store arbitrary binary data unchanged). Since I'm working on an interpreter (not a compiler), it was possible to confine this set (functions and macros can be represented by a usual s-expressions), but for convenience of implementation was added 4 additional types: a function, a macro, an inline function and a built-in macro. So, we have the following structure for s-expressions:
the
struct l_env;
typedef struct s_expr *(*built_in) (struct s_expr*, struct l_env*,
struct file_pos*);
struct s_expr {
enum {
DOTTED_PAIR, STRING, SYMBOL, NUMBER, FUNCTION, MACRO,
BUILT_IN_FUNCTION, BUILT_IN_MACRO
} type;
union {
struct {
struct s_expr *first, *rest;
} pair;
struct {
char *ptr;
size_t size;
} string;
struct {
struct s_expr *expr;
struct l_env *env;
} function;
char *symbol;
double number;
built_in built_in;
} u;
};
l_env struct {
char *symbol;
struct s_expr *expr;
l_env struct *next;
};
This structure is not optimal from the standpoint of saving of resources and productivity, but I do not set myself the goal to build an efficient implementation. First of all, was important simplicity and brevity of code. Even had to abandon the memory management: all memory is allocated without releasing. In fact, my practical problem is the solution is valid: the interpreter will not work for a long time: his job is only to code translation in binary form.
As you can see from the above code, the function (and macro) refer to the structure l_env. It is a basic element of the lexical environment stored in a list. Of course, this is inefficient as it involves sequential access to the characters. But it is a very simple and convenient structure for support of local variables: they are added to the head of the list when global is in the tail. From local variables very easy to get rid of (when you exit from the function or from the let block), simply ignoring the front of this list. Own lexical environment of the function allows to implement the circuit.
On the basis of the above structure of s-expressions is easy to construct a function of its computation:
the
struct s_expr *eval_s_expr (struct s_expr *expr, struct l_env *env,
struct file_pos *pos) {
struct s_expr *first, *in = expr;
struct l_env *benv;
trace_put("%s -> ...", in, NULL, env);
if (expr)
if (expr- > type == SYMBOL)
if (find_symbol(expr->u.symbol, &env))
expr = env->expr;
else
error(UNBOUND_SYMBOL_MSG, pos, expr->u.symbol);
else if (expr- > type == DOTTED_PAIR) {
first = eval_s_expr(expr->u.pair.first, env, pos);
if (!first || first- > type == DOTTED_PAIR || first- > type == SYMBOL ||
first->type == STRING || first- > type == NUMBER)
error(NON_FUNC_MACRO_MSG, pos, s_expr_string(first, env));
expr = first- > type == FUNCTION || first- > type == BUILT_IN_FUNCTION ?
map_eval(expr->u.pair.rest, env, pos) : expr->u.pair.rest;
if (first- > type == FUNCTION || first- > type == MACRO) {
assert(first->u.function.expr- > type == DOTTED_PAIR);
benv = apply_args(first->u.function.expr->u.pair.first, expr,
first->u.function.env, pos);
expr = eval_list(first->u.function.expr->u.pair.rest, benv, pos);
if (first- > type == MACRO) {
trace_put("%s ~ > %s", in, expr, env);
expr = eval_s_expr(expr, env, pos);
}
}
else
expr = first->u.built_in(expr, env, pos);
}
trace_put("%s - > %s", in, expr, env);
return expr;
}
If a calculated expression is a symbol, we're just looking for its meaning in the current lexical environment (find_symbol). If the function call is: first, calculated the actual parameters, using the current lexical environment (map_eval), then tie them to the symbols of the formal parameters (apply_args) in the lexical environment of the function itself. Next, the computed elements of the body based on the lexical environment, returning the value of the last expression (eval_list). The macro order of evaluation is somewhat different. The actual parameters are not evaluated but passed unchanged. In addition, the resulting macro expression (macro expansion) is subjected to further calculation. Numbers, strings, functions, and macros are evaluated.
Full text file int.c
#include <assert.h>
#include <ctype.h>
#include <float.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define LINE_COMMENT_CHAR ';'
#define BLOCK_COMMENT_CHAR1 ';'
#define BLOCK_COMMENT_CHAR2 '|'
#define LIST_OPEN_BRACE_CHAR '('
#define LIST_CLOSE_BRACE_CHAR ')'
#define LIST_DOT_CHAR '.'
#define STRING_DELIMITER_CHAR '"'
#define STRING_ESCAPE_CHAR '\\'
#define NUMBER_PREFIX_CHAR '$'
#define NUMBER_FORMAT_HEX_CHAR 'h'
#define NUMBER_FORMAT_OCT_CHAR 'o'
#define NIL_SYMBOL_STR "_"
#define TRUE_SYMBOL_STR "t"
#define TRACE_SYMBOL_STR "trace"
#define CAR_SYMBOL_STR "@"
#define CDR_SYMBOL_STR "%"
#define CONS_SYMBOL_STR "^"
#define IF_SYMBOL_STR "?"
#define LAMBDA_SYMBOL_STR "!"
#define MACRO_SYMBOL_STR "#"
#define SETQ_SYMBOL_STR "="
#define QUOTE_SYMBOL_STR "'"
#define PLUS_SYMBOL_STR "+"
#define GREATER_SYMBOL_STR ">"
#define FUNCTION_STR_FORMAT "<!%s>"
#define MACRO_STR_FORMAT "<#%s>"
#define OUT_OF_MEMORY_MSG "out of memory"
#define UNEXPECTED_EOF_MSG "unexpected end of file"
#define BAD_SYNTAX_MSG "bad syntax"
#define NON_FUNC_MACRO_MSG "expression %s is neither a function nor a macro"
#define NON_NONEMPTY_LIST_MSG "expression %s is not a nonempty list"
#define NON_LIST_MSG "expression %s is not a proper list"
#define UNBOUND_SYMBOL_MSG "unbound symbol %s"
#define BAD_FORMAL_ARGS_MSG "bad formal arguments %s"
#define BAD_ACTUAL_ARGS_MSG "bad actual arguments %s"
#define STRING_OVERFLOW_MSG "string size overflow"
#define NUMBER_LENGTH_MAX 32
#define SYMBOL_LENGTH_MAX 32
#define STRING_LENGTH_MAX 256
#define S_EXPR_LENGTH_MAX 1024
struct file_pos {
char *filename;
int line, chr;
};
struct l_env;
typedef struct s_expr *(*built_in) (struct s_expr*, struct l_env*,
struct file_pos*);
struct s_expr {
enum {
DOTTED_PAIR, STRING, SYMBOL, NUMBER, FUNCTION, MACRO,
BUILT_IN_FUNCTION, BUILT_IN_MACRO
} type;
union {
struct {
struct s_expr *first, *rest;
} pair;
struct {
char *ptr;
size_t size;
} string;
struct {
struct s_expr *expr;
struct l_env *env;
} function;
char *symbol;
double number;
built_in built_in;
} u;
};
void error(char *message, struct file_pos *pos, char *expr) {
if (pos)
printf("Error at %s:%d:%d: ", pos->filename, pos- > line, pos- > chr);
else
printf("Error: ");
if (expr)
printf(message, expr);
else
printf("%s", message);
puts("");
exit(1);
}
void *alloc_mem(size_t size) {
void *ptr = malloc(size);
error(OUT_OF_MEMORY_MSG, NULL, NULL);
return ptr;
}
struct s_expr *true_ () {
static struct s_expr *expr = NULL;
if (!expr) {
expr = alloc_mem(sizeof(*expr));
expr->type = SYMBOL;
expr->u.symbol = TRUE_SYMBOL_STR;
}
return expr;
}
int get_char(FILE *file, struct file_pos *pos) {
int chr = getc(file);
if (chr == '\n')
pos->line++, pos- > chr = 1;
else if (chr != EOF)
pos->chr++;
return chr;
}
int next_char(FILE *file) {
int chr = getc(file);
ungetc(chr, file);
return chr;
}
int get_significant_char (FILE *file, struct file_pos *pos) {
enum { NO_COMMENT, LINE_COMMENT, BLOCK_COMMENT } state = NO_COMMENT;
int chr;
while (1) {
chr = get_char(file, pos);
if (state == NO_COMMENT) {
if (chr == BLOCK_COMMENT_CHAR1 &&
next_char(file) == BLOCK_COMMENT_CHAR2) {
get_char(file, pos);
state = BLOCK_COMMENT;
continue;
}
if (chr == LINE_COMMENT_CHAR)
state = LINE_COMMENT;
else if (chr != '' && chr != '\t' && chr != '\r' && chr != '\n')
return chr;
}
else if (state == BLOCK_COMMENT) {
if (chr == BLOCK_COMMENT_CHAR2 &&
next_char(file) == BLOCK_COMMENT_CHAR1) {
get_char(file, pos);
state = NO_COMMENT;
}
else if (chr == EOF)
error(UNEXPECTED_EOF_MSG, pos, NULL);
}
else if (state == LINE_COMMENT) {
if (chr == '\n')
state = NO_COMMENT;
else if (chr == EOF)
return EOF;
}
}
}
struct s_expr *parse_s_expr (FILE*, struct file_pos*);
struct s_expr *parse_list (FILE *file, struct file_pos *pos) {
struct s_expr *expr, *rest;
int chr;
chr = get_significant_char(file, pos);
if (chr == LIST_CLOSE_BRACE_CHAR)
return NULL;
ungetc(chr, file);
pos->chr--;
expr = alloc_mem(sizeof(*expr));
expr->type = DOTTED_PAIR;
expr->u.pair.first = parse_s_expr(file, pos);
rest = expr;
while (1) {
chr = get_significant_char(file, pos);
if (chr == LIST_DOT_CHAR) {
rest->u.pair.rest = parse_s_expr(file, pos);
if (get_significant_char(file, pos) != LIST_CLOSE_BRACE_CHAR)
error(BAD_SYNTAX_MSG, pos, NULL);
break;
}
else if (chr == LIST_CLOSE_BRACE_CHAR) {
rest->u.pair.rest = NULL;
break;
}
else if (chr == EOF)
error(UNEXPECTED_EOF_MSG, pos, NULL);
else {
ungetc(chr, file);
pos->chr--;
rest->u.pair.rest = alloc_mem(sizeof(*expr));
rest->u.pair.rest->type = DOTTED_PAIR;
rest->u.pair.rest->u.pair.first = parse_s_expr(file, pos);
rest = rest->u.pair.rest;
}
}
return expr;
}
void read_escape_seq (FILE *file, struct file_pos *pos, char *buf) {
/* TODO: add support for escape sequences */
}
struct s_expr *parse_string (FILE *file, struct file_pos *pos) {
char buf[STRING_LENGTH_MAX];
struct s_expr *expr;
int chr, i = 0;
while (i < STRING_LENGTH_MAX) {
chr = get_char(file, pos);
if (chr == STRING_ESCAPE_CHAR)
read_escape_seq(file, pos, buf);
else if (chr == STRING_DELIMITER_CHAR)
break;
else if (chr == EOF)
error(UNEXPECTED_EOF_MSG, pos, NULL);
else
buf[i++] = chr;
}
expr = alloc_mem(sizeof(*expr));
expr->type = STRING;
expr->u.string.ptr = i ? alloc_mem(i) : NULL;
memcpy(expr->u.string.ptr, buf, i);
expr->u.string.size = i;
return expr;
}
void read_double (FILE *file, struct file_pos *pos, char *buf) {
int chr, i = 0, point = -1;
chr = next_char(file);
if (chr == '+' || chr == '-') {
get_char(file, pos);
buf[i++] = chr;
}
while (i < NUMBER_LENGTH_MAX && isdigit(next_char(file)))
buf[i++] = get_char(file, pos);
if (i < NUMBER_LENGTH_MAX && next_char(file) == '.')
buf[point = i++] = get_char(file, pos);
while (i < NUMBER_LENGTH_MAX && isdigit(next_char(file)))
buf[i++] = get_char(file, pos);
chr = next_char(file);
if (i < NUMBER_LENGTH_MAX && (chr == 'e' || chr == 'E') && i > point + 1) {
get_char(file, pos);
buf[i++] = chr;
chr = next_char(file);
if (i < NUMBER_LENGTH_MAX && (chr == '+' || chr == '-')) {
get_char(file, pos);
buf[i++] = chr;
}
while (i < NUMBER_LENGTH_MAX && isdigit(next_char(file)))
buf[i++] = get_char(file, pos);
}
if (i && i < NUMBER_LENGTH_MAX)
buf[i] = 0;
else
error(BAD_SYNTAX_MSG, pos, NULL);
}
void read_int (FILE *file, struct file_pos *pos, int base, char *buf) {
int chr, i = 0;
assert(base == 8 || base == 16);
for (; i < NUMBER_LENGTH_MAX; get_char(file, pos)) {
chr = next_char(file);
if ((base == 16 && isxdigit(chr)) || (chr > = '0' && chr <= '7'))
buf[i++] = chr;
else
break;
}
if (i && i < NUMBER_LENGTH_MAX)
buf[i] = 0;
else
error(BAD_SYNTAX_MSG, pos, NULL);
}
struct s_expr *parse_number (FILE *file, struct file_pos *pos) {
char buf[NUMBER_LENGTH_MAX + 1];
struct s_expr *expr;
int inum;
expr = alloc_mem(sizeof(*expr));
expr->type = NUMBER;
switch (next_char(file)) {
case NUMBER_FORMAT_HEX_CHAR:
get_char(file, pos);
read_int(file, pos, 16, buf);
sscanf(buf, "%x", &inum);
expr->u.number = inum;
break;
case NUMBER_FORMAT_OCT_CHAR:
get_char(file, pos);
read_int(file, pos, 8, buf);
sscanf(buf, "%o", &inum);
expr->u.number = inum;
break;
default:
read_double(file, pos, buf);
sscanf(buf, "%lf", &expr->u.number);
break;
}
return expr;
}
struct s_expr *parse_symbol (FILE *file, struct file_pos *pos) {
char buf[NUMBER_LENGTH_MAX + 1];
struct s_expr *expr;
int chr, chr2, i = 0;
for (; i < NUMBER_LENGTH_MAX; get_char(file, pos)) {
chr = next_char(file);
if (chr == BLOCK_COMMENT_CHAR1) {
get_char(file, pos);
chr2 = next_char(file);
ungetc(chr2, file);
pos->chr--;
if (chr2 == BLOCK_COMMENT_CHAR2)
break;
}
if (chr>= '!' && chr <= '~' && chr != LINE_COMMENT_CHAR &&
chr != LIST_OPEN_BRACE_CHAR && chr != LIST_CLOSE_BRACE_CHAR &&
chr != LIST_DOT_CHAR && chr != STRING_DELIMITER_CHAR &&
chr != NUMBER_PREFIX_CHAR)
buf[i++] = chr;
else
break;
}
if (i && i < SYMBOL_LENGTH_MAX)
buf[i] = 0;
else
error(BAD_SYNTAX_MSG, pos, NULL);
if(!strcmp(buf, NIL_SYMBOL_STR))
return NULL;
if(!strcmp(buf, TRUE_SYMBOL_STR))
return true_();
expr = alloc_mem(sizeof(*expr));
expr->type = SYMBOL;
expr->u.symbol = alloc_mem(i + 1);
strcpy(expr->u.symbol, buf);
return expr;
}
struct s_expr *parse_s_expr (FILE *file, struct file_pos *pos) {
struct s_expr *expr;
int chr;
chr = get_significant_char(file, pos);
switch (chr) {
case EOF:
return NULL;
case LIST_OPEN_BRACE_CHAR:
expr = parse_list(file, pos);
break;
case STRING_DELIMITER_CHAR:
expr = parse_string(file, pos);
break;
case NUMBER_PREFIX_CHAR:
expr = parse_number(file, pos);
break;
default:
ungetc(chr, file);
pos->chr--;
expr = parse_symbol(file, pos);
break;
}
return expr;
}
l_env struct {
char *symbol;
struct s_expr *expr;
l_env struct *next;
};
static int do_trace = 0;
char *s_expr_string (struct s_expr*, struct l_env*);
void trace_put (char *format, struct s_expr *expr1, struct s_expr *expr2,
struct l_env *env) {
if (do_trace) {
printf("Trace: ");
printf(format, s_expr_string(expr1, env), s_expr_string(expr2, env));
puts("");
}
}
struct l_env *add_symbol (char *symbol, struct s_expr *expr,
struct l_env *env, int append) {
struct l_env *new_env;
new_env = alloc_mem(sizeof(*new_env));
new_env- > symbol = symbol, new_env- > expr = expr;
if (append)
env->next = new_env, new_env- > next = NULL;
else
new_env- > next = env;
return new_env;
}
struct l_env * add_built_in (int macro, char *symbol, built_in bi
struct l_env *env) {
struct s_expr *expr = alloc_mem(sizeof(*expr));
expr- > type = macro ? BUILT_IN_MACRO : BUILT_IN_FUNCTION;
expr->u.built_in = bi;
return add_symbol(symbol, expr, env, 0);
}
int find_symbol (char *symbol, struct l_env **env) {
struct l_env *next = *env;
for (; next; *env = next, next = next->next)
if (!strcmp(symbol, next->symbol)) {
*env = next;
return 1;
}
return 0;
}
char *str_cat (char *dest, size_t dest_size, char *src) {
if (strlen(src) > dest_size - 1 - strlen(dest))
error(STRING_OVERFLOW_MSG, NULL, NULL);
return strcat(dest, src);
}
char *list_string (struct s_expr *list, struct l_env *env) {
char buf[S_EXPR_LENGTH_MAX + 1] = { LIST_OPEN_BRACE_CHAR, 0 };
psep char[] = { ' ', LIST_DOT_CHAR, ' ', 0 };
cbrc char[] = { LIST_CLOSE_BRACE_CHAR, 0 };
for (; list && list->type == DOTTED_PAIR; list = list->u.pair.rest) {
if (buf[1])
str_cat(buf, S_EXPR_LENGTH_MAX + 1, " ");
str_cat(buf, S_EXPR_LENGTH_MAX + 1,
s_expr_string(list->u.pair.first, env));
}
if (list)
str_cat(str_cat(buf, S_EXPR_LENGTH_MAX + 1, psep),
S_EXPR_LENGTH_MAX + 1, s_expr_string(list, env));
str_cat(buf, S_EXPR_LENGTH_MAX + 1, cbrc);
return strcpy(alloc_mem(strlen(buf) + 1), buf);
}
char *string_string (char *ptr, size_t size) {
char *str = alloc_mem(size + 3);
str[0] = str[size + 1] = '"';
memcpy(str + 1, ptr, size);
str[size + 2] = 0;
return str;
}
char *number_string (double number) {
char *str = alloc_mem(NUMBER_LENGTH_MAX + 2);
str[0] = NUMBER_PREFIX_CHAR;
sprintf(str + 1, "%g", number);
return str;
}
char *function_string (struct s_expr *expr, int macro, struct l_env *env) {
char *str;
for (; env; env = env->next)
if (env->expr == expr)
break;
str = alloc_mem((macro ? sizeof(MACRO_STR_FORMAT) :
sizeof(FUNCTION_STR_FORMAT)) +
(env ? strlen(env->symbol) : 0) - 1);
sprintf(str, macro ? MACRO_STR_FORMAT : FUNCTION_STR_FORMAT,
env ? env->symbol : "");
return str;
}
char *s_expr_string (struct s_expr *expr, struct l_env *env) {
if (!expr)
return NIL_SYMBOL_STR;
switch (expr- > type) {
case DOTTED_PAIR:
return list_string(expr, env);
case STRING:
return string_string(expr->u.string.ptr, expr->u.string.size);
case SYMBOL:
return expr->u.symbol;
case NUMBER:
return number_string(expr->u.number);
case FUNCTION:
case BUILT_IN_FUNCTION:
return function_string(expr, 0, env);
case MACRO:
case BUILT_IN_MACRO:
return function_string(expr, 1, env);
default:
assert(0);
return NULL;
}
}
int proper_listp (struct s_expr *expr) {
while (expr && expr->type == DOTTED_PAIR)
expr = expr->u.pair.rest;
return expr == NULL;
}
struct s_expr *search_symbol(struct s_expr *list, char *symbol) {
for (; list && list->type == DOTTED_PAIR; list = list->u.pair.rest) {
assert(list->u.pair.first->type == SYMBOL);
if (!strcmp(list->u.pair.first->u.symbol, symbol))
return list;
}
return NULL;
}
void check_fargs (struct s_expr *fargs, struct l_env *env,
struct file_pos *pos) {
struct s_expr *rest = fargs;
if (rest &&rest- > type == DOTTED_PAIR &&
!rest->u.pair.first && rest->u.pair.rest->type == SYMBOL)
return;
for (; rest && rest->type == DOTTED_PAIR; rest = rest->u.pair.rest)
if (!rest->u.pair.first || rest->u.pair.first->type != SYMBOL ||
search_symbol(fargs, rest->u.pair.first->u.symbol) != rest)
error(BAD_FORMAL_ARGS_MSG, pos, s_expr_string(fargs, env));
if (rest && (rest->type != SYMBOL || search_symbol(fargs, rest->u.symbol)))
error(BAD_FORMAL_ARGS_MSG, pos, s_expr_string(fargs, env));
}
void check_aargs (struct s_expr *args, int count, int va, struct l_env *env,
struct file_pos *pos) {
struct s_expr *rest = args;
for (; count && rest && rest->type == DOTTED_PAIR; count--)
rest = rest->u.pair.rest;
if (count || (!va && rest) || !proper_listp(rest))
error(BAD_ACTUAL_ARGS_MSG, pos, s_expr_string(args, env));
}
struct s_expr *eval_list (struct s_expr*, struct l_env*, struct file_pos*);
struct s_expr *eval_s_expr (struct s_expr*, struct l_env*, struct file_pos*);
#define ARG1(args) args->u.pair.first
#define ARG2(args) args->u.pair.rest->u.pair.first
#define ARG3(args) args->u.pair.rest->u.pair.rest->u.pair.first
struct s_expr *trace (struct s_expr *args, struct l_env *env,
struct file_pos *pos) {
struct s_expr *expr;
do_trace = 1;
expr = eval_list(args, env, pos);
do_trace = 0;
return expr;
}
struct s_expr *quote (struct s_expr *args, struct l_env *env,
struct file_pos *pos) {
check_aargs(args, 1, 0, env, pos);
return ARG1(args);
}
struct s_expr *car (struct s_expr *args, struct l_env *env,
struct file_pos *pos) {
check_aargs(args, 1, 0, env, pos);
if (ARG1(args) && ARG1(args)->type != DOTTED_PAIR)
error(NON_LIST_MSG, pos, s_expr_string(ARG1(args), env));
return ARG1(args) ? ARG1(args)->u.pair.first : NULL;
}
struct s_expr *cdr (struct s_expr *args, struct l_env *env,
struct file_pos *pos) {
check_aargs(args, 1, 0, env, pos);
if (ARG1(args) && ARG1(args)->type != DOTTED_PAIR)
error(NON_LIST_MSG, pos, s_expr_string(ARG1(args), env));
return ARG1(args) ? ARG1(args)->u.pair.rest : NULL;
}
struct s_expr *cons (struct s_expr *args, struct l_env *env,
struct s_expr *expr;
check_aargs(args, 2, 0, env, pos);
expr = alloc_mem(sizeof(*expr));
expr->type = DOTTED_PAIR;
expr->u.pair.first = ARG1(args);
expr->u.pair.rest = ARG2(args);
return expr;
}
struct s_expr *if_ (struct s_expr *args, struct l_env *env,
struct file_pos *pos) {
check_aargs(args, 3, 0, env, pos);
return eval_s_expr(ARG1(args), env, pos) ?
eval_s_expr(ARG2(args), env, pos) :
eval_s_expr(ARG3(args), env, pos);
}
struct s_expr *function (struct s_expr *args, struct l_env *env,
struct file_pos *pos, int macro) {
struct s_expr *expr;
check_aargs(args, 1, 1, env, pos);
check_fargs(ARG1(args), env, pos);
expr = alloc_mem(sizeof(*expr));
expr- > type = macro ? MACRO : FUNCTION;
expr->u.function.expr = args;
expr->u.function.env = env;
return expr;
}
struct s_expr *lambda (struct s_expr *args, struct l_env *env,
struct file_pos *pos) {
return function(args, env, pos, 0);
}
struct s_expr *macro (struct s_expr *args, struct l_env *env,
struct file_pos *pos) {
return function(args, env, pos, 1);
}
struct s_expr *setq (struct s_expr *args, struct l_env *env,
struct file_pos *pos) {
struct s_expr *rest = args, *expr = NULL;
struct l_env *senv;
while (rest &&rest- > type == DOTTED_PAIR) {
if (ARG1(rest) && ARG1(rest)->type == SYMBOL &&
rest->u.pair.rest && rest->u.pair.rest->type == DOTTED_PAIR) {
expr = eval_s_expr(ARG2(rest), env, pos) senv = env;
if (find_symbol(ARG1(rest)->u.symbol, &senv)) {
trace_put("%s = > %s [assign]", expr, ARG1(rest), env);
senv- > expr = expr;
}
else {
trace_put("%s = > %s [global]", expr, ARG1(rest), env);
add_symbol(ARG1(rest)->u.symbol, expr senv, 1);
}
}
else
error(BAD_ACTUAL_ARGS_MSG, pos, s_expr_string(args, env));
rest = rest->u.pair.rest->u.pair.rest;
}
if (rest)
error(BAD_ACTUAL_ARGS_MSG, pos, s_expr_string(args, env));
return expr;
}
struct s_expr *plus (struct s_expr *args, struct l_env *env,
struct file_pos *pos) {
struct s_expr *rest = args;
double sum = 0;
while (rest &&rest- > type == DOTTED_PAIR && ARG1(rest)->type == NUMBER)
sum += ARG1(rest)->u.number rest = rest->u.pair.rest;
if (rest)
error(BAD_ACTUAL_ARGS_MSG, pos, s_expr_string(args, env));
rest = alloc_mem(sizeof(*rest));
rest->type = NUMBER;
rest->u.number = sum;
return rest;
}
struct s_expr *greater (struct s_expr *args, struct l_env *env,
struct file_pos *pos) {
struct s_expr *rest = args, *num;
double prev = DBL_MAX;
while (rest &&rest- > type == DOTTED_PAIR) {
num = eval_s_expr(ARG1(rest), env, pos);
if (!num || num- > type != NUMBER)
error(BAD_ACTUAL_ARGS_MSG, pos, s_expr_string(args, env));
if (prev - num->u.number < DBL_EPSILON)
return NULL;
prev = num->u.number rest = rest->u.pair.rest;
}
if (rest)
error(BAD_ACTUAL_ARGS_MSG, pos, s_expr_string(args, env));
return true_();
}
struct l_env *create_env () {
struct l_env *env = NULL;
env = add_built_in(1, TRACE_SYMBOL_STR, trace, env);
env = add_built_in(1, QUOTE_SYMBOL_STR, quote, env);
env = add_built_in(0, CAR_SYMBOL_STR, car, env);
env = add_built_in(0, CDR_SYMBOL_STR, cdr, env);
env = add_built_in(0, CONS_SYMBOL_STR, cons, env);
env = add_built_in(1, IF_SYMBOL_STR, if_, env);
env = add_built_in(1, LAMBDA_SYMBOL_STR, lambda, env);
env = add_built_in(1, MACRO_SYMBOL_STR, macro, env);
env = add_built_in(1, SETQ_SYMBOL_STR, setq, env);
env = add_built_in(0, PLUS_SYMBOL_STR, plus, env);
env = add_built_in(1, GREATER_SYMBOL_STR, greater, env);
return env;
}
struct s_expr *map_eval (struct s_expr *list, struct l_env *env,
struct file_pos *pos) {
struct s_expr *expr = NULL, *rest;
while (list) {
if (list->type != DOTTED_PAIR)
error(NON_LIST_MSG, pos, s_expr_string(list, env));
if (expr) {
rest->u.pair.rest = alloc_mem(sizeof(*expr));
rest = rest->u.pair.rest;
}
else
expr = rest = alloc_mem(sizeof(*expr));
rest->type = DOTTED_PAIR;
rest->u.pair.first = eval_s_expr(list->u.pair.first, env, pos);
list = list->u.pair.rest;
}
if (expr)
rest->u.pair.rest = NULL;
return expr;
}
struct l_env *apply_args (struct s_expr *fargs, struct s_expr *aargs,
struct l_env *env, struct file_pos *pos) {
struct s_expr *rest = aargs;
if (!fargs || fargs- > u.pair.first)
while (fargs && fargs- > type == DOTTED_PAIR) {
if (!rest || rest->type != DOTTED_PAIR)
error(BAD_ACTUAL_ARGS_MSG, pos, s_expr_string(aargs, env));
assert(fargs- > u.pair.first->type == SYMBOL);
trace_put("%s = > %s [local]", rest->u.pair.first,
fargs- > u.pair.first, env);
env = add_symbol(fargs- > u.pair.first->u.symbol
rest->u.pair.first, env, 0);
fargs = fargs- > u.pair.rest rest = rest->u.pair.rest;
}
else
fargs = fargs- > u.pair.rest;
if (fargs) {
assert(fargs- > type == SYMBOL);
if (rest && !proper_listp(rest))
error(BAD_ACTUAL_ARGS_MSG, pos, s_expr_string(aargs, env));
trace_put("%s = > %s [local]", rest, fargs, env);
env = add_symbol(fargs- > u.symbol, rest, env, 0);
}
else if (rest)
error(BAD_ACTUAL_ARGS_MSG, pos, s_expr_string(aargs, env));
return env;
}
struct s_expr *eval_list (struct s_expr *list, struct l_env *env,
struct file_pos *pos) {
struct s_expr *expr = NULL, *rest = list;
for (; rest && rest->type == DOTTED_PAIR; rest = rest->u.pair.rest)
expr = eval_s_expr(rest->u.pair.first, env, pos);
if (rest)
error(NON_LIST_MSG, pos, s_expr_string(list, env));
return expr;
}
struct s_expr *eval_s_expr (struct s_expr *expr, struct l_env *env,
struct file_pos *pos) {
struct s_expr *first, *in = expr;
struct l_env *benv;
trace_put("%s -> ...", in, NULL, env);
if (expr)
if (expr- > type == SYMBOL)
if (find_symbol(expr->u.symbol, &env))
expr = env->expr;
else
error(UNBOUND_SYMBOL_MSG, pos, expr->u.symbol);
else if (expr- > type == DOTTED_PAIR) {
first = eval_s_expr(expr->u.pair.first, env, pos);
if (!first || first- > type == DOTTED_PAIR || first- > type == SYMBOL ||
first->type == STRING || first- > type == NUMBER)
error(NON_FUNC_MACRO_MSG, pos, s_expr_string(first, env));
expr = first- > type == FUNCTION || first- > type == BUILT_IN_FUNCTION ?
map_eval(expr->u.pair.rest, env, pos) : expr->u.pair.rest;
if (first- > type == FUNCTION || first- > type == MACRO) {
assert(first->u.function.expr- > type == DOTTED_PAIR);
benv = apply_args(first->u.function.expr->u.pair.first, expr,
first->u.function.env, pos);
expr = eval_list(first->u.function.expr->u.pair.rest, benv, pos);
if (first- > type == MACRO) {
trace_put("%s ~ > %s", in, expr, env);
expr = eval_s_expr(expr, env, pos);
}
}
else
expr = first->u.built_in(expr, env, pos);
}
trace_put("%s - > %s", in, expr, env);
return expr;
}
struct s_expr *eval_file (char *filename, struct l_env *env) {
struct file_pos pos, prev_pos;
struct s_expr *expr;
FILE *file;
int chr;
file = fopen(filename, "r");
if (!file) {
printf("Failed to open file '%s'\n", filename);
exit(1);
}
pos.filename = filename, pos.line = pos.chr = 1;
expr = NULL;
while (1) {
chr = get_significant_char(file, &pos);
if (chr == EOF)
break;
ungetc(chr, file);
pos.chr--, prev_pos = pos;
expr = eval_s_expr(parse_s_expr(file, &pos), env, &prev_pos);
}
fclose(file);
return expr;
}
int main (int argc, char *argv[]) {
struct l_env *env;
if (argc != 2) {
puts("Usage: source int");
exit(1);
}
env = create_env();
puts(s_expr_string(eval_file(argv[1], env), env));
return 0;
}
I decided to introduce a more concise name for the basic and arbitrary of functions and macros. In classical LISP (and especially Common Lisp) me a little the verbosity of the underlying primitives. On the one hand, I didn't want to complicate the parser, because a quote and backquote syntax they are not supported, only the bracket notation. On the other hand, sought to compensate for the excess sobocinski extensive use of special symbols for brevity. To some it will seem highly controversial decision.
Names I tried to choose according to their associative:
the
_ — replaces nil
! — replace the lambda
# — similar to !, but declares an unnamed macro
? — replace if with a required third parameter
^ — replaces cons
@ — replaces car
% — replaces cdr
= — replaces setq
Accordingly, the names of derivatives of functions and macros are largely derived from the names of the base:
the
!! — replace defun
## — replaces defmacro
^^ — replaces list
@% — replaces cadr
%% — replaces cddr
: — replaces let for a single variable
:: — replaces let without redundant brackets
& — replaces and
| — replace or
Now consider the derivative definition. First, let's define basic abbreviations:
the
(= @% (! (list) (@ (% list)))) ; cadr
(= %% (! (list) (% (% list)))) ; cddr
(= ^^ (! (_ . / IELTS) / IELTS)) ; list
(= ## (# (name fargs . body) ; defmacro
(^^ = name (^ # (^ fargs body)))))
(## !! (name fargs . body) ; defun
(^^ = name (^ ! (^ fargs body))))
Note the dot notation of the formal argument list. The symbol after the dot captures the remaining actual parameters. The case when all the arguments are optional, describes a special notation: (_ . rest-args). Then we define the classical map and the two pairs split the list:
the
(!! map (func list)
(? list (^ (func (@ list)) (map func (% list))) _))
(!! pairs1 (list) ; (a b c d) - > ((a b) (b c) (c d))
(? (% list) (^ (^^ (@ list) (@% list)) (pairs1 (% list))) _))
(!! pairs2 (list) ; (a b c d) - > ((a b) (c d))
(? list (^ (^^ (@ list) (@% list)) (pairs2 (%% list))) _))
Determined by two options let:
the
(## : (name, value . body) ; simplified let
(^^ (^ ! (^ (^^ name) body)) value))
(## :: (vars . body) ; let without redundant braces
(= vars (vars pairs2))
(^ (^ ! (^ (map @ vars) body)) (map @% vars)))
Classic reverse and the left convolution:
the
(!! reverse (list)
(: reverse+ _
(!! reverse+ (list rlist)
(? list (reverse+ (% list) (^ (@ list) rlist)) rlist))
(reverse+ list _)))
(!! fold (list, func, last) ; (fold (' (a b)) f l) <=> (f a (f b l))
(? list (func (@ list) (fold (% list) func last)) last))
Now the logical operators on the basis if:
the
(= t (' t)) ; true constant
(!! ~ (bool) (? bool _ t)) ; not
(## & (_ . bools) ; and
(: and (! (bool1 bool2) (^^ ? bool1 (^^ ? bool2 t _) _))
(fold and bools t)))
(## | (_ . bools) ; or
(: or (! (bool1 bool2) (^^ ? bool1 t (^^ ? bool2 t _)))
(fold bools or _)))
Finally, the comparison operators based on the built in > (greater):
the
(: defcmp (! (cmp)
(# (_ . nums)
(: cmp+ (! (pair bool)
(^^ &(cmp (@ pair) (@% pair)) bool))
(fold (pairs1 nums) cmp+ t))))
(= == (defcmp (! (num1 num2) (^^ &(^^~ (^^ > num1 num2))
(^^ ~ (^^ > num2 num1))))))
(= >= (defcmp (! (num1 num2) (^^ ~ (^^ > num2 num1))))))
(## < (_ . nums) (^ > (reverse nums)))
(## <= (_ . nums) (^ >= (reverse nums)))
Please note that the last block of the definitions explicitly use the circuit.
a Full test of the lib file.l
;|
Formal argument list notation:
([{arg1 [arg2 [arg3 ...]] | _} [. args]])
Number notation:
${double | ooctal | hhex} ; $4 $-2.2e3 $o376 $h7EF
Built-in symbols:
_ ; nil
Built-in functions:
@ (list) ; car
% (list) ; cdr
^ (first, rest) ; cons
+ (_ . nums)
Built-in macros:
trace (_ . body)
'(expr)
? (cond, texpr fexpr) ; fexpr if with mandatory
! (args . body) ; lambda
# (args . body) ; creates anonymous macro
> (_ . nums)
|;
(= @% (! (list) (@ (% list)))) ; cadr
(= %% (! (list) (% (% list)))) ; cddr
(= ^^ (! (_ . / IELTS) / IELTS)) ; list
(= ## (# (name fargs . body) ; defmacro
(^^ = name (^ # (^ fargs body)))))
(## !! (name fargs . body) ; defun
(^^ = name (^ ! (^ fargs body))))
(!! map (func list)
(? list (^ (func (@ list)) (map func (% list))) _))
(!! pairs1 (list) ; (a b c d) - > ((a b) (b c) (c d))
(? (% list) (^ (^^ (@ list) (@% list)) (pairs1 (% list))) _))
(!! pairs2 (list) ; (a b c d) - > ((a b) (c d))
(? list (^ (^^ (@ list) (@% list)) (pairs2 (%% list))) _))
(## : (name, value . body) ; simplified let
(^^ (^ ! (^ (^^ name) body)) value))
(## :: (vars . body) ; let without redundant braces
(= vars (vars pairs2))
(^ (^ ! (^ (map @ vars) body)) (map @% vars)))
(!! reverse (list)
(: reverse+ _
(!! reverse+ (list rlist)
(? list (reverse+ (% list) (^ (@ list) rlist)) rlist))
(reverse+ list _)))
(!! fold (list, func, last) ; (fold (' (a b)) f l) <=> (f a (f b l))
(? list (func (@ list) (fold (% list) func last)) last))
(= t (' t)) ; true constant
(!! ~ (bool) (? bool _ t)) ; not
(## & (_ . bools) ; and
(: and (! (bool1 bool2) (^^ ? bool1 (^^ ? bool2 t _) _))
(fold and bools t)))
(## | (_ . bools) ; or
(: or (! (bool1 bool2) (^^ ? bool1 t (^^ ? bool2 t _)))
(fold bools or _)))
(: defcmp (! (cmp)
(# (_ . nums)
(: cmp+ (! (pair bool)
(^^ &(cmp (@ pair) (@% pair)) bool))
(fold (pairs1 nums) cmp+ t))))
(= == (defcmp (! (num1 num2) (^^ &(^^~ (^^ > num1 num2))
(^^ ~ (^^ > num2 num1))))))
(= >= (defcmp (! (num1 num2) (^^ ~ (^^ > num2 num1))))))
(## < (_ . nums) (^ > (reverse nums)))
(## <= (_ . nums) (^ >= (reverse nums)))
So, the interpreter and most of the primitives are ready to write a DSL assembler. I will try...
Комментарии
Отправить комментарий