概述
语义分析对上一步生成的 AST 进行进一步的检查,同时对每个 AST
结点生成相应的语义信息,以便后续的代码生成。通常来说语义分析的过程可以拆分为多遍的
AST 遍历。
虽然 CFG 能够表示许多语言,但编程语言中的某些结构仍然无法使用 CFG
表示,例如下面这个语言:
\[\{wcw \mid w\in L((a \mid
b)*)\}\]
一个类似的例子是在许多编程语言中,变量需要先定义后使用。这些语法结构将由语义分析器检查,并筛出不正确的代码。
语义分析是编译器前端部分的最后一步,需要负责语言层面的剩余的所有检查,通常有作用域检查和类型检查。经过语义分析后的代码会被认为是一个合法的程序,由后端生成汇编代码。
作用域
一个标识符的作用域(scope) 是指程序的一个部分,在该部分内标识符是可访问的(accessible)。
绝大多数编程语言中作用域是静态的(static),意思是标识符的作用域仅与代码有关,而与运行时的行为无关,这种作用域便可以通过语义分析来识别;然而有一部分于语言的作用域是动态的(dynamic),这类语言通常会用运行时的方法来检查,比如将未定义的变量默认为空。
同一个标识符在不同的代码部分中代表的含义可能不同。在大多数编程语言中,大多数标识符遵循最近嵌套原则(most-closely
nested
rule) :代码由许多嵌套的代码块组成,对于一个标识符的引用,从该引用所处的代码块由里向外寻找该标识符的定义。
但是也有许多例外,例如在许多编程语言中,函数(或者方法)与类可以在声明之前使用;在大多数面向对象的语言中,类中的方法可以不在该类所在的块中定义而定义在其基类中,同时方法还可以由子类重定义。这些例外大多需要特殊处理。
由于涉及作用域的问题,所以符号表(symbol
table) 的结构不能像前两个步骤一样仅由一个哈希表完成。一般需要支持如下的操作:
进入新的作用域;
在当前符号表中寻找某个符号(标识符);
在当前作用域中寻找某个符号;
添加符号;
退出作用域;
其中第三个操作是因为一些语言不允许同一个作用域内以及函数的形参重定义同名的变量,也由于该操作所以通常符号表会设计成包含哈希表的栈。
对于例外的那些情况,例如全局定义的类或方法,常用的方法是先遍历一遍收集这些符号,再重新遍历一遍检查符号的引用。
类型
类型(type) 的定义在各种编程语言中不尽相同。但通常来说,类型是一类能够进行相同操作的变量集合,以及一些在该集合上可以进行的操作的统称。对于面向对象的语言来说,类型(type)和类(class)基本上是同义词。
汇编语言没有类型的概念,所以对变量进行类型检查,保证对变量的操作合法是编译器的责任。
类型系统的实现有两种方式:静态类型(static
typed)语言是指大多数或者绝大多数对类型的检查在编译期完成的语言,如
C、Java 等语言;动态类型(dynamic
typed)语言是指绝大多数对类型的检查在运行时完成的语言,如
Scheme、Python、Javascript 等语言。
通常类型检查会写成命题逻辑的形式,例如对加法表达式进行类型检查可以写成如下形式:
\[(e_1:\mathrm{Int} \land
e_2:\mathrm{Int}) \Rightarrow (e_1 + e_2):\mathrm{Int}\]
冒号表示某个表达式是哪个类型。传统上也会写成如下形式:
\[
\frac{
\begin{gathered}
O,M,C\vdash e_1:\mathrm{Int}\\
O,M,C\vdash e_2:\mathrm{Int}\\
\end{gathered}
}{
O,M,C\vdash (e_1+e_2):\mathrm{Int}
}
\]
前面的 \(O,M,C\) 称为类型环境(Type
environments) 。
类型环境
对于一部分表达式,仅靠命题逻辑形式本身是无法进行类型检测的,需要给出外部条件(例如对变量类型的检查),这类外部条件称为类型环境 。
有了类型环境,便可以写出变量的类型检查逻辑:
\[\frac{O,M,C\vdash O(x) = T}{O,M,C\vdash
x:T}\]
其中 \(O\) 代表
Object,是一个从变量标识符到类型的映射,可以使用之前所述的符号表来维护。
变量定义的类型检查逻辑可以写为(COOL 语法):
\[\frac{O[T_0/x],M,C\vdash
e_1:T_1}{O,M,C\vdash (\mathrm{let}\ x:T_0\ \mathrm{in}\
e_1):T_1}\]
其中中括号记号的含义如下: \[
f[y_0/x_0](x) = \begin{cases}
y_0 &, x=x_0 \\
f(x) &, \mathrm{otherwise}
\end{cases}
\]
至于 COOL 语言中的其它的类型环境,\(M\) 代表
Method,接受方法名与类名为参数,返回各形参类型与返回值类型;\(C\) 代表 Class,代表当前表达式所在的类,与
self 类型有关。
子类型
对于面向对象的语言来说有子类型(subtype) 的概念。有子类型就需要类型系统满足里氏替换原则,即所有基类可以出现的地方都可以出现其子类型。从集合的角度来说子类型应该是基类的子集。
可以在类型上建立如下的偏序关系 \(\leq\) 来描述这个概念:
当类型(类)\(X\) 继承自类型(类)\(Y\) 时,有 \(X \leq
Y\) ;
自反性:对于类型 \(X, X \leq
X\) ;
传递性:\((X\leq Y \land Y\leq Z)
\Rightarrow X\leq Z\) 。
有了该关系便可以对一些需要里氏替换原则的表达式进行类型检测。例如,对于赋值语句:
\[
\frac{
\begin{gathered}
O(x) = T_0 \\
O, M, C \vdash e_1 : T_1 \\
T_1 \leq T_0
\end{gathered}
}{
O, M, C \vdash (x \leftarrow e_1) : T_1
}
\]
同时可以定义一个重要的概念:最小上限(Least Upper Bound, abbr.
lub)。形式化地:
\[\begin{aligned}
& (X \leq Z \land Y \leq Z) \land (\forall Z' \in \{W \mid X
\leq W \land Y \leq W\}. Z \leq Z') \\
\Rightarrow & \mathrm{lub}(X, Y) = Z
\end{aligned}\]
对于绝大多数只有单继承的语言(如 Java)来说,其继承关系是一棵树,则
lub 即为两个类在树继承树上的最近公共祖先(Least Common Ancestor, abbr.
LCA);对于有多继承的语言(如
C++)则要复杂许多,其继承关系是有向无环图(Directed Acyclic Graph, abbr.
DAG),lub 为两个点的最近公共支配点。
有了最小上限的概念则可以对分支结构进行类型检查:
\[
\frac{
\begin{gathered}
O, M, C \vdash e_0 : \mathrm{Bool} \\
O, M, C \vdash e_1 : T_1 \\
O, M, C \vdash e_2 : T_2
\end{gathered}
}{
O, M, C \vdash (\mathrm{if}\ e_0\ \mathrm{then}\ e_1\ \mathrm{else}\
e_2) : \mathrm{lub}(T_1, T_2)
}
\]
因为运行前并不知道 \(e_0\) 求值的结果是真是假,所以在编译期这里只能做最保险的假设,也即 \(\mathrm{lub}(T_1, T_2)\) 。
self 类型
有的时候方法可能需要将自身作为返回值返回。对于没有继承特性的语言来说没有什么问题,但对于许多面向对象的语言来说,如果这个方法在基类当中,则对返回值类型的检查可能会造成问题。因此需要引入
self 类型。
对于一个方法,如果声明其返回值为 self
类型,则返回值的类型与调用对象的类型相同;而调用对象的类必须是方法定义所在的类或其子类,因此
self
类型所代表的是某个类及其子类的一个集合,对于没有多继承的语言来说即为继承树上的一个子树。
令 \(S_C\) 为出现在类 \(C\) 中的 self 类型,则有如下关系:
\(S_C \leq C, S_C \leq S_C\) ;
\(C \leq T \Rightarrow S_C \leq
T\) ;
\(\neg(T \leq S_C)\) ;
对于两个不同的类,其 self 类型的关系未定义。
第三条规则是因为 \(S_C\) 代表了 \(C\) 的所有子类,无法保证某个类型 \(T\) 与 \(C\) 的所有子类的关系;第四条规则是因为两个不同类的
self 类型不会同时出现,因此这样的关系没有意义。
定义了偏序关系后同样可以定义 self 类型的 lub:
\(\mathrm{lub}(S_C, S_C) =
S_C\) ;
\(\mathrm{lub}(S_C, T) = \mathrm{lub}(T,
S_C) = \mathrm{lub}(C, T)\) 。
对于 self 类型声明可能出现的位置,唯一需要注意的是 self
类型声明不能出现在方法形参的类型声明中。因为对于方法调用有如下规则:
\[
\frac{
\begin{gathered}
O, M, C \vdash e_0 : T_0 \\
O, M, C \vdash e_1 : T_1 \\
\vdots \\
O, M, C \vdash e_n : T_n \\
M(T_0, f) = (T'_1, T'_2, \cdots, T'_n, T_{n+1}) \\
\forall 1 \leq i \leq n. T_i \leq T'_i
\end{gathered}
}{
O, M, C \vdash (e_0.f(e_1, e_2, \cdots, e_n)) : T_{n+1}
}
\]
然而由于上面的规则,如果形参的类型声明中出现 self
类型,则最后一条 \(T_i\leq
T'_i\) 对该形参为恒假,因此 self
类型声明不能出现在方法形参的类型声明中。
错误恢复
同样地,语义分析器也需要错误恢复来尽可能多的发现代码中的错误。不过由于已经建立起了
AST,因此相比前面的步骤,在语义分析中进行错误恢复要更简单,例如对于未定义或重定义的变量,直接报告错误即可。
语义分析中的错误恢复的重点是类型检查中的错误恢复。一个朴素的解决方法是将类型检查错误的表达式指定为
Object
类型,即所有类型的根类型。但这样会导致级联报错,因为作为根类型,绝大多数的操作是无法对
Object 类进行的,而这又会导致错误并被指定为 Object 类型。
另一个方案是引入一个新的类型 No_type,并定义对所有类型 \(C\) ,有 \(\mathrm{No\\_type} \leq C\) 。这样所有操作对
No_type 均有定义,报错时只会报告发现错误的位置。
但引入 No_type 有一个问题:对于无多继承的语言来说,引入 No_type
后其继承关系将会由树变为一个
DAG,这会使得对类型的处理变得十分复杂。一个解决方法是将 No_type
视为一个特例,对其它的类型仍然使用基于树的算法。
复合类型
许多语言都有复合类型的概念,例如 C 中的数组、指针,C++
的模板类等。这些复合类型的声明需要使用类型表达式(type
expression),且需要语义分析器具有识别类型表达式的功能,并给出合适的类型签名。
对于数组、指针这类的类型,可能会导致别名(aliasing)问题:基类的指针可以指向子类指针所指向的空间并使用基类的初始化函数,此时子类指针想要调用基类中没有的方法便会产生错误。
一个方法是对这些类型的类型检查中禁用子类替换,这可能会损失一部分表达性;另一种方法是类似于
Java 使用运行时的机制来检测这类问题。
类型转换
许多语言都有不同类型的 “数”,例如有符号和无符号整数、不同长度的整数、浮点数等等。这些类型之间在进行算术运算的时候往往并不是通过定义每个类型之间的运算来实现,而是通过类型转换(type
conversion)来实现的。
某些类型转换必须要通过某种声明来触发,这些类型转换称为显式类型转换(explicit
~ /
cast);还有一些类型转换可以由编译器来识别并进行,称为隐式类型转换(implicit
~ /coercion)。
通常能够进行隐式类型转换的关系可以表示为一棵树,若两个被运算数的类型不同则将两个数都转换到该树的
LCA 上。
Lab
作业四需要使用 C++ 或 Java 来实现 COOL 语言的语义分析器。
首先在头文件中定义继承树的结点类:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 typedef InheritsNode_class* InheritsNode;typedef list_node<InheritsNode> InheritsNodes_class;typedef InheritsNodes_class* InheritsNodes;InheritsNodes nil_InheritsNodes () ;InheritsNodes single_InheritsNodes (InheritsNode) ;InheritsNodes append_InheritsNodes (InheritsNodes, InheritsNodes) ;class InheritsNode_class {private : Class_ cls; int depth, tree_size; InheritsNode parent; InheritsNode link_top, heavy_son; InheritsNodes sons; bool flag_errtype, flag_prim; void calc (int dep) ; void link () ; InheritsNode_class (const InheritsNode_class& rhs): cls (rhs.cls), depth (rhs.depth), tree_size (rhs.tree_size), parent (rhs.parent), link_top (rhs.link_top), heavy_son (rhs.heavy_son), sons (rhs.sons), flag_errtype (rhs.flag_errtype), flag_prim (rhs.flag_prim) {} int put_sym (attr_class*, SymbolTable<Symbol,InheritsNode_class>*) ; int put_sym (method_class*, SymbolTable<Symbol,InheritsNode_class>*) ; int put_sym (Feature, SymbolTable<Symbol,InheritsNode_class>*) ; public : InheritsNode_class (Class_ cls): cls (cls), depth (0 ), tree_size (0 ), parent (nullptr ), link_top (this ), heavy_son (nullptr ), sons (nil_InheritsNodes ()), flag_errtype (false ), flag_prim (false ) {} SymbolTable<Symbol, InheritsOprand> obj_symtab; SymbolTable<Symbol, Types_class> meth_symtab; void set_parent (InheritsNode) ; bool is_inited () { return depth > 0 ; } bool is_errtype () { return flag_errtype; } bool is_prim () { return flag_prim; } void dump (std::ostream&, int ) ; InheritsNode copy () { return new InheritsNode_class (*this ); }; Class_ get_class () { return cls; }; InheritsNode get_parent () { return parent; } InheritsNodes get_sons () { return sons; } friend InheritsNode Node_errtype (Class_) ; friend InheritsNode Node_prim () ; friend InheritsNode Node_object (Class_) ; friend InheritsNode lca (InheritsNode, InheritsNode) ; friend bool le (InheritsNode, InheritsNode) ; Types get_meth_type (Symbol) ; Type get_obj_type (Symbol) ; int gather_symbols (SymbolTable<Symbol,InheritsNode_class>*) ; int check (ClassTable*) ; void init () ; };
flag_errtype
为 true
代表该结点为 Err_type
(即为前文所述的
No_type,由于作业中要求将 no_expr 的类型标记为 No_type
因此改了个名字);flag_prim
为 true
代表该节点为特殊类型 prim_slot
(其实去掉好像也行);其它的属性为与树结构相关的信息,方便后续的树上操作。
然后定义判断类型关系与求最近公共祖先的函数。由于有 self 类型和
Err_type,为了方便起见先定义一个类 InheritsOperand
,对于类型关系之间的操作都在该类上实现:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 class InheritsOprand ;typedef InheritsOprand Type_class;typedef InheritsOprand* Type;typedef list_node<Type> Types_class;typedef Types_class* Types;class InheritsOprand {private : InheritsNode node; bool flag_self; public : InheritsOprand (InheritsNode node, bool is_self=false ): node (node), flag_self (is_self) {} InheritsOprand (const InheritsOprand& rhs): node (rhs.node), flag_self (rhs.flag_self) {} InheritsNode get_node () { return node; } bool is_self () { return flag_self; } Type trans_self (Type clsenv) { return flag_self ? clsenv : this ; } Symbol get_name () ; friend bool operator <= (InheritsOprand, InheritsOprand); void dump (std::ostream&, int ) ; Type copy () { return new InheritsOprand (*this ); }; }; InheritsOprand lca (InheritsOprand, InheritsOprand) ;Types nil_Types () ;Types single_Types (Type) ;Types append_Types (Types, Types) ;
对类型关系的操作使用了重链剖分 。首先需要初始化结点中的相关信息:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 void InheritsNode_class::calc (int dep=1 ) { this ->depth = dep; this ->tree_size = 1 ; for (auto i = sons->first (); sons->more (i); i = sons->next (i)) { auto son = sons->nth (i); son->calc (dep + 1 ); this ->tree_size += son->tree_size; if (!this ->heavy_son || this ->heavy_son->tree_size < son->tree_size) { this ->heavy_son = son; } } } void InheritsNode_class::link () { if (this ->heavy_son) { this ->heavy_son->link_top = this ->link_top; this ->heavy_son->link (); } for (auto i = sons->first (); sons->more (i); i = sons->next (i)) { auto son = sons->nth (i); if (this ->heavy_son != son) { son->link_top = son; son->link (); } } } void InheritsNode_class::init () { if (flag_errtype) { return ; } this ->calc (); this ->link (); }
calc()
用于计算各结点深度与子树大小,以及选择重儿子;link()
用于连接重链,标记重链链顶;init()
仅需根结点调用,用于对整棵树进行初始化。
接下来便是实现关系判定以及求 LCA 的函数:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 InheritsNode lca (InheritsNode lhs, InheritsNode rhs) { if (!lhs || lhs->is_errtype ()) { return rhs; } if (!rhs || rhs->is_errtype ()) { return lhs; } while (lhs->link_top != rhs->link_top) { if (lhs->link_top->depth < rhs->link_top->depth) { auto chs = lhs; lhs = rhs; rhs = chs; } lhs = lhs->link_top->parent; } return lhs->depth <= rhs->depth ? lhs : rhs; } bool le (InheritsNode lhs, InheritsNode rhs) { if (lhs->is_errtype ()) { return true ; } if (rhs->is_errtype () || lhs->depth < rhs->depth) { return false ; } while (lhs->link_top != rhs->link_top && lhs->depth >= rhs->depth) { lhs = lhs->link_top->parent; } return lhs->depth >= rhs->depth; } InheritsOprand lca (InheritsOprand lhs, InheritsOprand rhs) { return { lca (lhs.get_node (), rhs.get_node ()), lhs.is_self () && rhs.is_self () }; } bool operator <= (InheritsOprand lhs, InheritsOprand rhs) { if (lhs.is_self () && rhs.is_self ()) { return true ; } if (rhs.is_self () && !lhs.get_node ()->is_errtype ()) { return false ; } return le (lhs.get_node (), rhs.get_node ()); }
然后修改一下 ClassTable
的定义使其支持手写的继承树:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 class ClassTable {private : int semant_errors; void install_basic_classes () ; InheritsNode node_obj, node_err; ostream& error_stream; InheritsNode search_node (Symbol) ; public : ClassTable (Classes); SymbolTable<Symbol, InheritsNode_class>* inherits_tree; InheritsNode get_node_obj () { return node_obj; } InheritsNode get_node_err () { return node_err; } void gather_features (InheritsNode) ; void check (InheritsNode node) ; Type get_sym_type (Symbol, Type) ; Types get_meth_type (Type,Symbol) ; Type get_obj_type (Type,Symbol) ; int errors () { return semant_errors; } ostream& semant_error () ; ostream& semant_error (Class_ c) ; ostream& semant_error (Symbol filename, tree_node *t) ; };
其中的 inherits_tree
即为继承树的结点,node_obj
为继承树的根,node_err
代表特殊的
Err_type 结点。
然后修改一下入口函数 program_class::semant()
的定义来确定语义分析的步骤:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 void program_class::semant () { initialize_constants (); ClassTable *classtable = new ClassTable (classes); auto halt_if_err = [&]() { if (classtable->errors ()) { cerr << "Compilation halted due to static semantic errors." << endl; exit (1 ); } }; halt_if_err (); classtable->gather_features (classtable->get_node_obj ()); halt_if_err (); classtable->check (classtable->get_node_obj ()); halt_if_err (); }
语义分析的步骤首先是建立继承树,然后收集属性和方法名,接着根据继承树和属性与方法符号表进行检查。
然后是定义 ClassTable
的构造函数,根据入口函数 program_class::semant()
的注释,该构造函数作用是检测各个类的声明是否合法并构造继承树:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 ClassTable::ClassTable (Classes classes) : semant_errors (0 ) , error_stream (cerr) { inherits_tree = new SymbolTable <Symbol, InheritsNode_class>(); inherits_tree->enterscope (); install_basic_classes (); for (auto i = classes->first (); classes->more (i); i = classes->next (i)) { Class_ cls = classes->nth (i); Symbol name = cls->get_name (); if (inherits_tree->probe (name)) { semant_error (cls) << "Redefinition of class " << name->get_string () << std::endl; } else { InheritsNode node_cls = new InheritsNode_class (cls); inherits_tree->addid (name, node_cls); } } for (auto i = classes->first (); classes->more (i); i = classes->next (i)) { Class_ cls = classes->nth (i); InheritsNode node_cls = inherits_tree->probe (cls->get_name ()), node_parent = inherits_tree->probe (cls->get_parent ()); if (node_parent == NULL ) { semant_error (cls) << "Parent for class " << cls->get_name ()->get_string () << " not found" << std::endl; node_cls->set_parent (node_obj); } else if ( cls->get_parent ()->equal_string ("Int" , 3 ) || cls->get_parent ()->equal_string ("Bool" , 4 ) || cls->get_parent ()->equal_string ("String" , 6 ) ) { semant_error (cls) << "Class " << cls->get_parent ()->get_string () << " cannot be inherited" << std::endl; node_cls->set_parent (node_parent); } else { node_cls->set_parent (node_parent); } } node_obj->init (); for (auto i = classes->first (); classes->more (i); i = classes->next (i)) { Class_ cls = classes->nth (i); InheritsNode node_cls = inherits_tree->probe (cls->get_name ()); if (!node_cls->is_inited ()) { semant_error (cls) << "Class " << cls->get_name ()->get_string () << " is in cyclic inheritance" << std::endl; } } }
在 install_basic_classes()
中也需要添加语句将原始类型接到继承树上:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 void ClassTable::install_basic_classes () { Class_ Err_class = class_ (Err_type, No_class, nil_Features (), filename); this ->node_obj = Node_object (Object_class); this ->node_err = Node_errtype (Err_class); InheritsNode node_int = new InheritsNode_class (Int_class), node_bool = new InheritsNode_class (Bool_class), node_str = new InheritsNode_class (Str_class), node_io = new InheritsNode_class (IO_class), node_prim = Node_prim (); inherits_tree->addid (Object, node_obj); inherits_tree->addid (Err_type, node_err); inherits_tree->addid (prim_slot, node_prim); inherits_tree->addid (IO, node_io); inherits_tree->addid (Int, node_int); inherits_tree->addid (Bool, node_bool); inherits_tree->addid (Str, node_str); node_io->set_parent (node_obj); node_int->set_parent (node_obj); node_bool->set_parent (node_obj); node_str->set_parent (node_obj); }
然后是收集符号,将属性和方法分别收集到每个类的变量符号表和方法符号表中:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 void ClassTable::gather_features (InheritsNode node) { this ->semant_errors += node->gather_symbols (inherits_tree); for (auto i = node->get_sons ()->first (); node->get_sons ()->more (i); i = node->get_sons ()->next (i)) { gather_features (node->get_sons ()->nth (i)); } } int InheritsNode_class::gather_symbols (SymbolTable<Symbol,InheritsNode_class>* inh_tree) { this ->obj_symtab.enterscope (); this ->meth_symtab.enterscope (); this ->obj_symtab.addid (self, new InheritsOprand (this , true )); Features fs = this ->cls->get_features (); int errs = 0 ; for (auto i = fs->first (); fs->more (i); i = fs->next (i)) { Feature f = fs->nth (i); errs += put_sym (f, inh_tree); } return errs; } int InheritsNode_class::put_sym (attr_class* feature, SymbolTable<Symbol,InheritsNode_class>* inh_tree) { if (obj_symtab.probe (feature->get_name ())) { semant_error (this ->cls, feature) << "Redefinition for attribute " << feature->get_name () << endl; return 1 ; } if (feature->get_type_decl () == SELF_TYPE) { obj_symtab.addid (feature->get_name (), new InheritsOprand (this ,true )); } else { InheritsNode type = inh_tree->lookup (feature->get_type_decl ()); if (type) { obj_symtab.addid (feature->get_name (), new InheritsOprand (type)); } else { semant_error (this ->cls, feature) << "Not found type for type declaration in attribute " << feature->get_name () << endl; obj_symtab.addid (feature->get_name (), new InheritsOprand (inh_tree->lookup (Err_type))); return 1 ; } } return 0 ; } int InheritsNode_class::put_sym (method_class* feature, SymbolTable<Symbol,InheritsNode_class>* inh_tree) { Symbol name = feature->get_name (); if (meth_symtab.probe (name)) { semant_error (this ->cls, feature) << "Redefinition for method " << name << endl; return 1 ; } Formals fs = feature->get_formals (); Types ts = nil_Types (); int errs = 0 ; auto fn_append_ts = [&](Symbol type_decl) -> bool { if (type_decl == SELF_TYPE) { ts = append_Types (ts, single_Types (new InheritsOprand (this , true ))); return true ; } else { InheritsNode type = inh_tree->lookup (type_decl); if (type) { ts = append_Types (ts, single_Types (new InheritsOprand (type))); return true ; } else { ts = append_Types (ts, single_Types (new InheritsOprand (inh_tree->lookup (Err_type)))); ++errs; return false ; } } }; for (auto i = fs->first (); fs->more (i); i = fs->next (i)) { Formal f = fs->nth (i); Symbol type_decl = f->get_type_decl (); if (type_decl == SELF_TYPE) { semant_error (this ->cls, feature) << "Formal type cannot be SELF_TYPE" << endl << pad (2 ) << "In method " << name << endl; ++errs; continue ; } if (!fn_append_ts (type_decl)) { semant_error (this ->cls, feature) << "Not found type name for type declaration in formal " << f->get_name () << endl << pad (2 ) << "In method " << name << endl; } } if (!fn_append_ts (feature->get_return_type ())) { semant_error (this ->cls, feature) << "Not found type name for type declaration in return type" << endl << pad (2 ) << "In method " << name << endl; } if (!errs) { meth_symtab.addid (name, ts); } return errs; } int InheritsNode_class::put_sym (Feature feature, SymbolTable<Symbol,InheritsNode_class>* inh_tree) { if (feature->is_attr ()) { return put_sym (dynamic_cast <attr_class*>(feature), inh_tree); } else { return put_sym (dynamic_cast <method_class*>(feature), inh_tree); } }
然后是类型检查,根据手册上给出的类型检查规则写即可:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 void ClassTable::check (InheritsNode node) { this ->semant_errors += node->check (this ); for (auto i = node->get_sons ()->first (); node->get_sons ()->more (i); i = node->get_sons ()->next (i)) { check (node->get_sons ()->nth (i)); } } Types ClassTable::get_meth_type (Type t, Symbol s) { auto node = t->get_node (); if (node->is_errtype () || node->is_prim ()) { return NULL ; } while (node != node_obj) { Types ts = node->get_meth_type (s); if (!ts) { node = node->get_parent (); } else { return ts; } } return node->get_meth_type (s); } Type ClassTable::get_obj_type (Type start_t , Symbol s) { auto node = start_t ->get_node (); if (node->is_errtype () || node->is_prim ()) { return nullptr ; } while (node != node_obj) { Type ret_t = node->get_obj_type (s); if (!ret_t ) { node = node->get_parent (); } else { return ret_t ; } } return node->get_obj_type (s); } InheritsNode ClassTable::search_node (Symbol s) { return inherits_tree->probe (s); }Type ClassTable::get_sym_type (Symbol s, Type clsenv) { if (s == SELF_TYPE) { return clsenv; } else { auto node = search_node (s); return node ? new Type_class (node) : nullptr ; } } Types InheritsNode_class::get_meth_type (Symbol s) { return this ->meth_symtab.lookup (s); }Type InheritsNode_class::get_obj_type (Symbol s) { return this ->obj_symtab.lookup (s); }int attr_class::check (ClassTable* clstab, Type clsenv) { InheritsNode node = clsenv->get_node (); int errs = this ->init->check (clstab, clsenv); if (init->get_type () == No_type) { return errs; } Type init_t = clstab->get_sym_type (init->get_type (), clsenv), decl_t = node->get_obj_type (this ->name)->trans_self (clsenv); if (!(*init_t <= *decl_t )) { semant_error (node->get_class (), this ) << "Type mismatch in attribute declaration" << endl << pad (2 ) << "Type declared as " << decl_t ->get_name () << " while initialed by " << init_t ->get_name () << endl; ++errs; } return errs; } int method_class::check (ClassTable* clstab, Type clsenv) { InheritsNode node = clsenv->get_node (); int errs = 0 ; node->obj_symtab.enterscope (); Types meth_t = node->meth_symtab.probe (this ->name); int i = this ->formals->first (); for (; this ->formals->more (i); i = this ->formals->next (i)) { Formal f = formals->nth (i); if (f->get_name () == self) { semant_error (node->get_class (), this ) << "Formal parameter name cannot be self" << endl; ++errs; continue ; } if (node->obj_symtab.probe (f->get_name ())) { semant_error (node->get_class (), this ) << "Redefinition of formal parameter " << f->get_name () << endl; ++errs; continue ; } node->obj_symtab.addid (f->get_name (), meth_t ->nth (i)->trans_self (clsenv)); } if (errs) { return errs; } errs += this ->expr->check (clstab, clsenv); if (this ->expr->get_type () != No_type) { Type expr_t = clstab->get_sym_type (this ->expr->get_type (), clsenv), ret_t = meth_t ->nth (i)->trans_self (clsenv); if (!(*expr_t <= *ret_t )) { semant_error (node->get_class (), this ) << "Type mismatch in method declaration" << endl << pad (2 ) << "Return type declared as " << ret_t ->get_name () << " while type of body is " << expr_t ->get_name () << endl; ++errs; } } node->obj_symtab.exitscope (); return errs; } int no_expr_class::check (ClassTable* clstab, Type clsenv) { this ->set_type (No_type); return 0 ; }int int_const_class::check (ClassTable* clstab, Type clsenv) { this ->set_type (Int); return 0 ; }int bool_const_class::check (ClassTable* clstab, Type clsenv) { this ->set_type (Bool); return 0 ; }int string_const_class::check (ClassTable* clstab, Type clsenv) { this ->set_type (Str); return 0 ; }int isvoid_class::check (ClassTable* clstab, Type clsenv) { int errs = this ->e1->check (clstab, clsenv); this ->set_type (Bool); return errs; } int new__class::check (ClassTable* clstab, Type clsenv) { if (clstab->get_sym_type (this ->type_name, clsenv)) { this ->set_type (this ->type_name); return 0 ; } else { semant_error (clsenv->get_node ()->get_class (), this ) << "Type name " << this ->type_name << " undefined" << endl; this ->set_type (Err_type); return 1 ; } } int loop_class::check (ClassTable* clstab, Type clsenv) { int errs = this ->pred->check (clstab, clsenv) + this ->body->check (clstab, clsenv); if (this ->pred->get_type () == Bool) { this ->set_type (Object); } else { semant_error (clsenv->get_node ()->get_class (), this ) << "Type mismatch in loop clause" << endl << pad (2 ) << "Bool required in prelude but found " << pred->get_type () << endl; this ->set_type (Err_type); ++errs; } return errs; } int comp_class::check (ClassTable* clstab, Type clsenv) { int errs = this ->e1->check (clstab, clsenv); if (e1->get_type () == Bool) { this ->set_type (Bool); } else { semant_error (clsenv->get_node ()->get_class (), this ) << "Type mismatch in complement operation" << endl << pad (2 ) << "Bool required but found " << e1->get_type () << endl; this ->set_type (Err_type); ++errs; } return errs; } int neg_class::check (ClassTable* clstab, Type clsenv) { int errs = this ->e1->check (clstab, clsenv); if (e1->get_type () == Int) { this ->set_type (Int); } else { semant_error (clsenv->get_node ()->get_class (), this ) << "Type mismatch in negative operation" << endl << pad (2 ) << "Int required but found " << e1->get_type () << endl; this ->set_type (Err_type); ++errs; } return errs; } int plus_class::check (ClassTable* clstab, Type clsenv) { int errs = this ->e1->check (clstab, clsenv) + this ->e2->check (clstab, clsenv); if (e1->get_type () == Int && e2->get_type () == Int) { this ->set_type (Int); } else { semant_error (clsenv->get_node ()->get_class (), this ) << "Type mismatch in plus operation" << endl << pad (2 ) << "Two ints required but found " << e1->get_type () << " on left and " << e2->get_type () << " on right" << endl; this ->set_type (Err_type); ++errs; } return errs; } int sub_class::check (ClassTable* clstab, Type clsenv) { int errs = this ->e1->check (clstab, clsenv) + this ->e2->check (clstab, clsenv); if (e1->get_type () == Int && e2->get_type () == Int) { this ->set_type (Int); } else { semant_error (clsenv->get_node ()->get_class (), this ) << "Type mismatch in substract operation" << endl << pad (2 ) << "Two ints required but found " << e1->get_type () << " on left and " << e2->get_type () << " on right" << endl; this ->set_type (Err_type); ++errs; } return errs; } int mul_class::check (ClassTable* clstab, Type clsenv) { int errs = this ->e1->check (clstab, clsenv) + this ->e2->check (clstab, clsenv); if (e1->get_type () == Int && e2->get_type () == Int) { this ->set_type (Int); } else { semant_error (clsenv->get_node ()->get_class (), this ) << "Type mismatch in multiply operation" << endl << pad (2 ) << "Two ints required but found " << e1->get_type () << " on left and " << e2->get_type () << " on right" << endl; this ->set_type (Err_type); ++errs; } return errs; } int divide_class::check (ClassTable* clstab, Type clsenv) { int errs = this ->e1->check (clstab, clsenv) + this ->e2->check (clstab, clsenv); if (e1->get_type () == Int && e2->get_type () == Int) { this ->set_type (Int); } else { semant_error (clsenv->get_node ()->get_class (), this ) << "Type mismatch in divide operation" << endl << pad (2 ) << "Two ints required but found " << e1->get_type () << " on left and " << e2->get_type () << " on right" << endl; this ->set_type (Err_type); ++errs; } return errs; } int lt_class::check (ClassTable* clstab, Type clsenv) { int errs = this ->e1->check (clstab, clsenv) + this ->e2->check (clstab, clsenv); if (e1->get_type () == Int && e2->get_type () == Int) { this ->set_type (Bool); } else { semant_error (clsenv->get_node ()->get_class (), this ) << "Type mismatch in lt operation" << endl << pad (2 ) << "Two ints required but found " << e1->get_type () << " on left and " << e2->get_type () << " on right" << endl; this ->set_type (Err_type); ++errs; } return errs; } int leq_class::check (ClassTable* clstab, Type clsenv) { int errs = this ->e1->check (clstab, clsenv) + this ->e2->check (clstab, clsenv); if (e1->get_type () == Int && e2->get_type () == Int) { this ->set_type (Bool); } else { semant_error (clsenv->get_node ()->get_class (), this ) << "Type mismatch in leq operation" << endl << pad (2 ) << "Two ints required but found " << e1->get_type () << " on left and " << e2->get_type () << " on right" << endl; this ->set_type (Err_type); ++errs; } return errs; } int eq_class::check (ClassTable* clstab, Type clsenv) { int errs = this ->e1->check (clstab, clsenv) + this ->e2->check (clstab, clsenv); Symbol e1_t_sym = e1->get_type (), e2_t_sym = e2->get_type (); auto is_prim = [](Symbol s) -> bool { return s == Int || s == Bool || s == Str; }; if (((is_prim (e1_t_sym) || is_prim (e2_t_sym)) && e1_t_sym == e2_t_sym) || !(is_prim (e1_t_sym) || is_prim (e2_t_sym))) { this ->set_type (Bool); } else { semant_error (clsenv->get_node ()->get_class (), this ) << "Type mismatch in eq operation" << endl << pad (2 ) << "Found " << e1->get_type () << " on left and " << e2->get_type () << " on right" << endl; this ->set_type (Err_type); ++errs; } return errs; } int cond_class::check (ClassTable* clstab, Type clsenv) { int errs = this ->pred->check (clstab, clsenv) + this ->then_exp->check (clstab, clsenv) + this ->else_exp->check (clstab, clsenv); if (pred->get_type () == Bool) { Type then_t = clstab->get_sym_type (then_exp->get_type (), clsenv), else_t = clstab->get_sym_type (else_exp->get_type (), clsenv); this ->set_type (lca (*then_t , *else_t ).get_name ()); } else { semant_error (clsenv->get_node ()->get_class (), this ) << "Type mismatch in condition clause" << endl << pad (2 ) << "Bool required in prelude but found " << pred->get_type () << endl; this ->set_type (Err_type); ++errs; } return errs; } int block_class::check (ClassTable* clstab, Type clsenv) { int errs = 0 ; for (auto i = this ->body->first (); this ->body->more (i); i = this ->body->next (i)) { Expression expr = this ->body->nth (i); errs += expr->check (clstab, clsenv); this ->set_type (expr->get_type ()); } return errs; } int object_class::check (ClassTable* clstab, Type clsenv) { Type t = clstab->get_obj_type (clsenv, name); if (t) { this ->set_type (t->get_name ()); return 0 ; } else { semant_error (clsenv->get_node ()->get_class (), this ) << "Identifier " << this ->name << " used without declaration" << endl; this ->set_type (Err_type); return 1 ; } } int assign_class::check (ClassTable* clstab, Type clsenv) { int errs = 0 ; Type obj_t = clstab->get_obj_type (clsenv, name); if (!obj_t ) { semant_error (clsenv->get_node ()->get_class (), this ) << "Identifier " << this ->name << " used without declaration" << endl; obj_t = new Type_class (clstab->get_node_obj ()); ++errs; } expr->check (clstab, clsenv); Type expr_t = clstab->get_sym_type (expr->get_type (), clsenv); if (!(*expr_t <= *obj_t )) { semant_error (clsenv->get_node ()->get_class (), this ) << "Type mismatch in assignment" << endl << pad (2 ) << "Variety type is " << obj_t ->get_name () << " but type of expression is " << expr_t ->get_name () << endl; ++errs; } this ->set_type (errs ? Err_type : expr_t ->get_name ()); return errs; } int dispatch_class::check (ClassTable* clstab, Type clsenv) { int errs = this ->expr->check (clstab, clsenv); Type expr_t = clstab->get_sym_type (this ->expr->get_type (), clsenv); Types meth_ts = clstab->get_meth_type (expr_t , this ->name); if (!meth_ts) { semant_error (clsenv->get_node ()->get_class (), this ) << "Method " << this ->expr->get_type () << "." << this ->name << " not found " << endl; this ->set_type (Err_type); return errs + 1 ; } if (meth_ts->len () - 1 != actual->len ()) { semant_error (clsenv->get_node ()->get_class (), this ) << "Parameters length mismatch for dispatch " << this ->expr->get_type () << "." << this ->name << endl; this ->set_type (Err_type); return errs + 1 ; } int i; for (i = this ->actual->first (); this ->actual->more (i); i = this ->actual->next (i)) { Type form_t = meth_ts->nth (i); Expression act_e = actual->nth (i); errs += act_e->check (clstab, clsenv); Type act_t = clstab->get_sym_type (act_e->get_type (), clsenv); if (!(*act_t <= *form_t )) { semant_error (clsenv->get_node ()->get_class (), this ) << "Parameter mismatch for " << i << "th parameter in dispatch " << this ->expr->get_type () << "." << this ->name << endl << pad (2 ) << form_t ->get_name () << " required but found " << act_e->get_type () << " in actual parameter" << endl; ++errs; } } if (errs) { this ->set_type (Err_type); } else { Type ret_t = meth_ts->nth (i); this ->set_type (ret_t ->is_self () ? expr->get_type () : ret_t ->get_name ()); } return errs; } int static_dispatch_class::check (ClassTable* clstab, Type clsenv) { int errs = this ->expr->check (clstab, clsenv); Type expr_t = clstab->get_sym_type (this ->expr->get_type (), clsenv); if (this ->type_name == SELF_TYPE) { semant_error (clsenv->get_node ()->get_class (), this ) << "Type declaration in static dispatch cannot be SELF_TYPE" << endl; this ->set_type (Err_type); return errs + 1 ; } Type decl_t = clstab->get_sym_type (this ->type_name, clsenv); if (!decl_t ) { semant_error (clsenv->get_node ()->get_class (), this ) << "Type " << this ->type_name << " not found " << endl; this ->set_type (Err_type); return errs + 1 ; } if (!(*expr_t <= *decl_t )) { semant_error (clsenv->get_node ()->get_class (), this ) << "Type for static dispatch " << this ->expr->get_type () << "@" << this ->type_name << " mismatch " << endl; this ->set_type (Err_type); return errs + 1 ; } Types formal_ts = clstab->get_meth_type (decl_t , this ->name); if (!formal_ts) { semant_error (clsenv->get_node ()->get_class (), this ) << "Method " << this ->type_name << "." << this ->name << " not found " << endl; this ->set_type (Err_type); return errs + 1 ; } if (formal_ts->len () - 1 != actual->len ()) { semant_error (clsenv->get_node ()->get_class (), this ) << "Parameters length mismatch for method " << this ->expr->get_type () << "." << this ->name << endl; this ->set_type (Err_type); return errs + 1 ; } int i; for (i = formal_ts->first (); formal_ts->more (i); i = formal_ts->next (i)) { Type form_t = formal_ts->nth (i); Expression act_e = actual->nth (i); errs += act_e->check (clstab, clsenv); Type act_t = clstab->get_sym_type (act_e->get_type (), clsenv); if (!(*act_t <= *form_t )) { semant_error (clsenv->get_node ()->get_class (), this ) << "Parameter mismatch for " << i << "th parameter in method " << this ->expr->get_type () << "." << this ->name << endl << pad (2 ) << form_t ->get_name () << " required but found " << act_e->get_type () << " in actual parameter" << endl; ++errs; } } if (errs) { this ->set_type (Err_type); } else { Type ret_t = formal_ts->nth (i); this ->set_type (ret_t ->is_self () ? expr->get_type () : ret_t ->get_name ()); } return errs; } int let_class::check (ClassTable* clstab, Type clsenv) { InheritsNode node = clsenv->get_node (); node->obj_symtab.enterscope (); int errs = 0 ; if (identifier == self) { semant_error (clsenv->get_node ()->get_class (), this ) << "Identifier cannot be self" << endl; ++errs; } else { Type decl_t = clstab->get_sym_type (this ->type_decl, clsenv); if (!decl_t ) { semant_error (clsenv->get_node ()->get_class (), this ) << "Type declaration not found for " << this ->type_decl << endl; ++errs; node->obj_symtab.addid (this ->identifier, new Type_class (clstab->get_node_err ())); } else { node->obj_symtab.addid (this ->identifier, decl_t ); errs += this ->init->check (clstab, clsenv); if (this ->init->get_type () != No_type) { Type init_t = clstab->get_sym_type (this ->init->get_type (), clsenv); if (!(*init_t <= *decl_t )) { semant_error (clsenv->get_node ()->get_class (), this ) << "Type for let clause mismatch" << endl << pad (2 ) << this ->type_decl << " required but found " << init_t ->get_name () << endl; ++errs; } } else { node->obj_symtab.addid (this ->identifier, decl_t ); } } } errs += this ->body->check (clstab, clsenv); node->obj_symtab.exitscope (); this ->set_type (body->get_type ()); return errs; } int typcase_class::check (ClassTable* clstab, Type clsenv) { int errs = this ->expr->check (clstab, clsenv); Type ret_t = new Type_class (clstab->get_node_err (), true ); for (auto i = this ->cases->first (); this ->cases->more (i); i = this ->cases->next (i)) { Case c = this ->cases->nth (i); errs += c->check (clstab, clsenv); Type c_t = clstab->get_sym_type (c->get_type (), clsenv); ret_t = new Type_class (lca (*ret_t , *c_t )); } this ->set_type (ret_t ->get_name ()); return errs; } int branch_class::check (ClassTable* clstab, Type clsenv) { int errs = 0 ; InheritsNode node = clsenv->get_node (); node->obj_symtab.enterscope (); if (this ->name == self) { semant_error (clsenv->get_node ()->get_class (), this ) << "Identifier cannot be self" << endl; ++errs; } else { Type decl_t = clstab->get_sym_type (this ->type_decl, clsenv); if (!decl_t ) { Type decl_t = clstab->get_sym_type (this ->type_decl, clsenv); node->obj_symtab.addid (this ->name, new Type_class (clstab->get_node_err ())); semant_error (clsenv->get_node ()->get_class (), this ) << "Type declaration not found for " << this ->type_decl << endl; ++errs; } else { node->obj_symtab.addid (this ->name, decl_t ); } } errs += this ->expr->check (clstab, clsenv); node->obj_symtab.exitscope (); return errs; }