Assignment 3 (Due: Thursday, December 26, 2013)

Table of Contents

1 Overview

This assignment builds on top of the previous one. This means that if your solutions to the first two questions in assignment #2 do not work properly, you will have to fix them by the time you submit assignment #3, lest your code fail to run as expected.

This assignment is concerned with the semantic analysis of the Scheme code. Both problem #1 & #2 require that you modify scheme-compiler/tag_parser.py.

You will need to extend the abstract syntax tree for Scheme expressions. If you have made use of the visitor pattern to traverse your AST, then you will need to make some minor adjustments to your visitors as well.

The substance of assignment #3 is to compute the lexical address for each variable occurrence, as well as annotate all applications in tail position. If you have used visitors for assignment #2, each of these problems can be handled by adding a new visitor; Otherwise, you need to add two methods to each of the classes that inherit from AbstractSchemeExpr.

2 Computing the lexical address

For this problem, you need to implement code for computing the lexical address of each variable occurrence in your abstract syntax trees:

  • Define three new classes: VarFree(Variable), VarParam(Variable), VarBound(Variable).
    • VarFree(Variable) needs no new state.
    • VarParam(Variable) should have the field: minor.
    • VarBound(Variable) should have the two fields: major, minor.
  • The constructors for these classes should call the superclass.
  • If you used visitors, make sure to add the above classes to your abstract visitor class.
  • Make sure the str() procedure works properly with these new classes. The string representation should be the same as for Variable.
  • Define the instance method debruijn(self) in AbstractSchemeExpr, so that when invoked from a Scheme expression e, returns an expression similar to e, in which each Variable object has been replaced with the corresponding VarFree, VarParam, or VarBound, as appropriate, and all the lexical addresses computed accordingly.

3 Annotating tail calls

For this problem, you need to implement code for annotating applications in tail position:

  • Define a new class: ApplicTP(Applic).
  • The constructor for this class should class the superclass.
  • If you used visitors, make sure to add ApplicTP to your abstract visitor class.
  • Make sure the str() procedure works properly with instances of ApplicTP. The string representation should be the same as for Applic.
  • Define the instance method annotateTC(self) in AbstractSchemeExpr, so that when invoked from a Scheme expression e, returns an expression similar to e, in which applic Applic records that are in tail position have been replaced by corresponding instances of ApplicTP.

You should define an instance method semantic_analysis(self) in AbstractSchemeExpr as follows:

def semantic_analysis(self):
    return self.debruijn().annotateTC()

4 Generating a Parser for a Set-based Data Structure Language

The goal of this exercise is to implement a parser for a language called ADTL (for Abstract Data Type Language), which we describe next, using the CUP LALR parser generator.

Definition of ADTL

An ADTL program defines a data type in a similar way to a class in most object-oriented languages, in addition to features such as: sets are first-class types, locking statements, and path expressions. A variation of ADTL was used in a paper to guide an advanced static analysis in finding optimization opportunities for graph algorithms with optimistic parallelization.

Here is a link to an example ADTL program, which defines a graph data type.

Lexical Considerations.

  • Identifiers start with an underscore or a letter and may continue with an underscore, letter, or a digit.
  • Integer literals do not have extra leading 0's. Examples of legal literals are \0" and \-5". An example of an illegal literal is \007".
  • String literals may span at most one line and may contain escape sequences for whitespace, new line, tab, backslash, and quotation marks.
  • ADTL supports single-line comments and multi-line comments as in other languages such as C++ and Java.

The following table lists all token types in ADTL, associating a (Java constant name) with a description of each one:

Description Constant (in sym.java)
/ DIVIDE
{ LCBR
<= LTE
rev REV
; SEMI
integer constants INT
- MINUS
assert ASSERT
| BAR
! NOT
< LT
choose CHOOSE
in IN
( LP
, COMMA
lock LOCK
) RP
+ PLUS
quoted strings QUOTE
= ASSIGN
if IF
identifiers ID
. DOT
} RCBR
&& LAND
return RETURN
new NEW
+= ASSIGN_PLUS
|| LOR
!= NEQ
-= ASSIGN_MINUS
== EQ
>= GTE
* TIMES
: COLON
else ELSE
@ AT
> GT
set SET

Grammatical Considerations.

The full grammar of ADTL is shown below.

We use the following notational conventions:

  • Non-terminals start with a capital letter and appear in boldface. For example: Stmt.
  • Terminals appear between single quotation marks. For example '||' is the terminal used for logical or.
  • The terminals for identifiers, integer constants and quoted strings appear as id, int, and quote.
  • We use the Kleene star to denote possibly empty sequences. For example Stmt* is a sequence of 0 or more statements. Similarly, the + version is used to denote non-empty sequences. So, Stmt+ is a sequence of 1 or more statements.
  • We denote a possibly-empty sequence of elements separated by a token by the notation Element⊗token and non-empty sequences by Element⊕token. Examples are Arg⊗',', which denotes a sequence of 0 or more arguments separated by commas, and PathElement⊕'.', which denotes a sequence of 1 or more path elements separated by dots.
  • We write [X] to denote that X is optional (used for PathElement).

 id             Identifier token
 int            Integer token
 quote          Quoted string token
 
 ADT → id '{' Element* '}'
 ElementFieldDef | MethodDef
 FieldDef → id ':' Type';'
 Type → id | 'set'
 MethodDef → id '(' Arg⊗',' ')' '{' Stmt* '}'
 Arg → id ':' Type

Stmt → 'return' Expr';'
        | Assign';'
        | 'if' '('Expr')' SingleOrBlockStmt
        | 'if' '('Expr')' SingleOrBlockStmt 'else' SingleOrBlockStmt
        | 'assert' Expr';'
        | 'assert' Expr ':' quote';'
        | 'lock' PathExpr';'
        
AssignPathExpr '=' Expr
AssignPathExpr '+=' Expr
AssignPathExpr '-=' Expr
SingleOrBlockStmtStmt | '{' Stmt+ '}'

Expr → 'choose' Expr
        | int
        | '|' Expr '|'
        | 'new' Type '(' AssignedArg⊗',' ')'
        | '-' Expr
        | Expr '+' Expr
        | Expr '*' Expr
        | Expr '-' Expr
        | Expr '/' Expr                        
        | Expr '<' Expr
        | Expr '<=' Expr
        | Expr '>' Expr
        | Expr '>=' Expr
        | Expr '==' Expr
        | Expr '!=' Expr
        | '!' Expr
        | Expr '&&' Expr
        | Expr '||' Expr
        | Expr 'in' Expr
        | PathExpr
        | '(' Expr ')'

AssignedArg → id '=' Expr
        
PathExprPathElement⊕'.'
PathElement → id ['@'] | 'rev' '(' id ')' ['@']
 

What to Implement

You will implement the parser using the CUP LALR parser generation tool. You will add semantic actions to the grammar productions in order to construct the abstract syntax tree (AST). The rest of the job - traversing the generated AST and pretty-printing - is done by code given to you. This code includes:

  • AST The adtl.ast package contains a class hierarchy for the abstract syntax tree node types;
  • Printing Visitor A visitor (ASTToString) that traverses a given tree and handles the printing for each node type;
  • String Templates The file ASTToString.stg contains templates that handle the formatting using the string template library.
  • Driver Program The Main class handles parsing the command-line arguments, reading the input files, running the parser, and finally printing to the output file (or console, if no file is specified).

Please read the code and make sure you understand it.

Running java adtl.Main -format tests/Graph.adt (assuming the CLASSPATH contains the build sub-directory and references to antlr-4.1-complete.jar, java-cup-11a.jar, and java-cup-11a-runtime.jar) results in printing the text shown below.

Graph {
  ns : set;

  es : set;

  getNeighbors(n : Node) {
    lock n.rev(src).dst;
    lock n.rev(dst).src;
    assert (n in ns) : "Attempt to get neighbors of a node that is not contained in the graph!";
    nghbors = (n.rev(src).dst + n.rev(dst).src);
    return new Set(content=nghbrs);
  }

  addEdge(f : Node, t : Node, d : ED) {
    lock f;
    lock t;
    lock f.rev(src).dst.t;
    lock t.rev(dst).src.f;
    eft = f.rev(src)@.dst.t;
    etf = t.rev(dst)@.src.f;
    assert (eft == etf) : "Graph is not undirected: detected a directed edge!";
    assert (|eft| == 1) : "Detected parallel edges!";
    if ((eft in es)) {
      ne = (new Edge(src=f, dst=t, ed=d) + new Edge(src=t, dst=f, ed=d));
      es = (es + ne);
    }

  }

  addNode(n : Node) {
    lock n;
    assert !(n in ns) : "Attempt to add a node already contained in the graph!";
    ns = (ns + n);
  }

  removeNode(n : Node) {
    lock n.rev(src).dst;
    lock n.rev(dst).src;
    if ((n in ns)) {
      es = (es - (n.rev(src) + n.rev(dst)));
      ns = (ns - n);
    }

  }

  removeEdge(f : Node, t : Node) {
    lock f.rev(src).dst.t;
    lock t.rev(dst).src.f;
    eft = f.rev(src)@.dst.t;
    etf = t.rev(dst)@.src.f;
    assert (eft == etf) : "Graph is not undirected: detected a directed edge!";
    if ((eft in es)) {
      es = (es - (eft + etf));
    }

  }

  getNodeData(n : Node) {
    lock n;
    return n.nd;
  }

  getEdgeData(e : Edge) {
    lock e;
    lock e.src;
    lock e.dst;
    return e.ed;
  }

  setEdgeData(e : Edge, d : ED) {
    lock e;
    lock e.src;
    lock e.dst;
    e.ed = d;
  }
}

You are also given an Ant build.xml file to help you automate tasks such as generating the parser, compiling the sources, cleaning, etc. Using the build file and Ant is completely optional and up to you (we will not consider it part of your submission).

The lib sub-directory already contains the java-cup-11a.jar, which you should use to generate the parser and java-cup-11a-runtime.jar, which you should use for compiling the sources and running the program (should be available on the CLASSPATH). It also contains the antlr-4.1-complete.jar, which is used to format the output (should also be available on the CLASSPATH).

You should complete the ADTL.cup file with all the terminal and non terminal definitions, production definitions, any precedence directives that may be needed, semantic actions for building the abstract syntax tree, and any other content necessary to achieve the assignment. Running CUP on this file should generate the files Parser.java and sym.java, which should be placed under the adtl package and compiled with the rest of the sources (the ANT file does this automatically).

The program should be invoked using the following format:
java adtl.Main -format <file.adt> <file.out.adt>

where <file.adt> is the name of the input file and <file.out.adt> is the name of the output file. Use the -help option to view all command-line options.

Testing the Parser

We expect you to perform your own testing of the parser. You should develop a thorough test suite that exercises all productions and as many syntax errors as you can think of. We will test your parser against our own test cases, including programs that are syntactically correct, and also programs that contain syntax errors.

Pitfalls and Suggestions

We suggest that you start by first building your parser without adding the semantic actions to build the AST. This would let you first handle issues related to LR parsing such as shift/reduce and reduce/reduce conflicts. You can use the -parse command-line option to invoke the parser on inputs and check whether it accepts/rejects them as you expect. If the parser does not behave as you expect it, you can use the -debug parse option to have the parser print extra information during its operations. Once you gain confidence in your parser you can start adding the semantic actions. You will sometimes meet compiler errors in Parser.java. Recall that any changes you make to that code will be over-written next time you generate the parser code.

Make sure you understand how the AST classes correspond to the grammar non-terminals. Notice that there are no corresponding AST node for accumulating assignments (e.g., a += b). You should figure out how to use the existing classes to represent such assignments.

What to Turn In

You should submit your ADTL.cup file. You should also submit the test suite programs you used to test your parser on (input text files).

Please use the following naming convention for your test suite programs: file-name.adt and file-name.formatted.adt for syntactically-correct inputs and result of formatting, and error_description.adt for inputs containing syntax error where error_description is a short description of the type of syntactic error contained in the file (for example, error_empty_file.adt for a program with an empty list of elements).

5 How to submit

  • You should submit a zip file hw3.zip that creates a directory hw3 containing:
  • scheme-compiler
    • pc.py
    • sexprs.py
    • reader.py
    • tag_parser.py
  • problem-3
    • See list of files & specific instructions in the statement of the problem.
  • A file readme.txt that contains the names, IDs, and email addresses of all group members (you may either work alone, or with a single partner), as well as the following statement:

I(We), <fill in your name(s)>, declare the following to be true:

  • We worked on this assignment alone. We did not consult with others about this assignment, other than the teaching staff for this course.
  • We did not copy code either from other students or from the internet.
  • We did not make available this code to other students taking this course.
  • We are aware of the University, faculty (as specified in the ‏שנתון‎), departmental, and course policies (as specified in the syllabus) on academic dishonesty: We realize that if we are found to have committed academic dishonesty that we shall be sent before the disciplinary committee (‏ועדת משמעת‎), fail the course, and possibly incur other penalties as decided by the disciplinary committee.

We will not grade any work submitted that does not include the above statement. Please do not forget to include it, with your names spelled out. If you do not do this, for any reason, then at the very least, you will incur the 5% per diem late penalty (as specified in the course syllabus).

5.1 Last minute instructions

  • Unlike the first assignment, we are now asking you to include the parsing combinators package pc.py with your work.
  • PLEASE do make sure that your work runs on the departmental Linux machines. It will save you much grief later on!
  • PLEASE do go over the submission instructions carefully, and make sure you did not leave out anything.

Author: Mayer Goldberg

Created: 2013-12-17 Tue 21:53

Validate XHTML 1.0