This page serves as the catalog for my lab materials for CS320 Concepts of Programming Languages, Fall 2016, by Prof. Assaf Kfoury. The course website is available here.

This page will constantly be updated to include new material.

Logistics

Lectures

Tue, Thu 12:30 pm - 2:00 pm, in Room SCI 113 (Metcalf Science Building).

Labs

  • Wed 10:00 am - 11:00 am, in Room SOC B57 (Sociology Building).
  • Wed 11:00 am - 12:00 noon, in Room SOC B63 (Sociology Building).
  • Wed 12:00 noon - 1:00 pm, in Room MCS B23 (Math and Computer Science).
  • Wed 3:00 pm - 4:00 pm, in Room CAS 315 (College of Arts and Sciences).

TA Info

If you could not make any office hours (and only if so), make a reservation here.

Submission

We use gsubmit. Please read the manuel very carefully. Note that we are submitting to cs320, case sensitive. You will need a CS account, follow instructions here.

Links

Some Basic Concepts

The following sections give extremely simplified (read "inaccurate") explanations to some basic concepts.

Syntax and Semantics

Syntax defines strings/programs that are considered valid in the programming language. Semantics defines meanings of programs.

The core elements of any syntax are variables, function definitions, and function applications.

Expressions, Statements and Side Effects

In a sense, programs are expressions formed again by (sub-)expressions. We "run a program" by evaluating it to a value. Values are expressions that can't be further simplified/evaluated. e.g. Expression 3 + 4 evaluates to value 7.

In math, functions are stateless/pure, meaning, they always result in the same output given the same input. In programming languages, pure expressions are also stateless in the same sense, and they don't mutate any states. e.g. add(3, 4) always evaluates to 7 assuming add is a function for summing up two numbers in the usual sense.

On the contrary, a function has side-effects if it mutates some states. e.g. print("Hello World") will change the state of system output buffer, var x = 10 will update the memory represented by x to 10. Side effect makes our programs useful since it changes states thus interacting with the world.

When we say expression, we may refer to the building block of programs, or to an expression that is pure. When an expression is not pure, we call it statements. We evaluate an expression primarily for its value/meaning. We evaluate a statement primarily for its side effects.

Evaluation Order

Evaluation is to compute the meaning/value/result of a program. Note that the meaning of a program could be, but not limited to, a value of some computation.

There are different evaluation strategies.

  • Eager/strict evaluation: perform computation as early as possible. add(minus(4, 3), 6) => ... => add(1, 6) => 1 + 6 => 7
  • Lazy evaluation: delay computation until it is absolutely needed. add(minus(4, 3), 6) => minus(4, 3) + 6 => ... => 1 + 6 => 7

and others. Lazy evaluation makes it possible to represent infinite data since computation is delayed and performed on demand.

Python/Haskell Quick Start

Please read and try Appendix B and Appendix C from Lapets' notes. You can try them on http://glot.io.

Exercise

Try to implement find_min that finds the minimum number in an integer list.

Types

When analysing programs, there are many aspects. We previously mentioned that the value of a program could be the meaning. And we evaluate the program to get its meaning by examining its value. But from another point of view, the type of a program could be its meaning, too. We usually say that types are the specification of programs. It specifies, for instance, what type of value it will be evaluating to, given inputs of specified types. We do type inference to infer its meaning at type level, and we do type checking to make sure it is meaningful w.r.t. types.

Types have structures, and they are usually recursively defined. For example

T = int | bool | unit | T -> T 

which says, int, bool and unit are types (called base types, and if T is a type, then T -> T is a type (called function type), too. -> is called a type constructor that takes in two types as arguments, and return a function type.

Types, from a naive point of view, represents a set of expressions. Say, int represnets all integers, and expressions that evaluate to integers. int -> int represents all functions whose only argument is an integer and whose return type is integer.

Algebric Data Types

It is a kind of composite type, which means it is formed by combining other types. Two main classes of algebraic data type are product types and sum types.

Product Types

The values of this type typically contain several values/fields. e.g. tuple type, record type. It is called "product" because the set of values of a product type, is the set-theory product of all values of its fields. e.g. the set of values of this tuple (int, bool) is formed by computing Cartesian product of {all integers} * {all boolean values}.

Sum Types

The values of this type are typically grouped into different variants. Those possible values of a variants is defined by its corresponding constructor. It is called sum type because the values of a sum type is a union/sum of values of those variants.

Pattenr Matching

It is used to identify a value by its constructor or field name. It try to match a value of some data type with one of the varients of that data type.

Examples

data Tree = Empty
          | Leaf Int
          | Node Tree Tree

Here, we are defining a data type called Tree, and its possible values are splited into three different variants, which are defined by three constructors Empty, Leaf and Node. Empty takes no parameter and generates a value of type Tree. It carries no data. Leaf takes an integer, and returns a value of type Tree. It carries one piece of data of type integer. Node takes two values of type Tree itself, and return another value of type Tree. It takes two pieces of data of type Tree, and it is recursive.

When building a value of Tree, we are carring data by using constructors. We can later retrieve those data by destructing them, which is exactly the pattern matching mentioned in the lecture. By matching, we know which variant the value belongs to, therefore we know which constructor is used to build that value, and further more we know what data is carried by this constructor.

depth :: Tree -> Int
depth Empty = 0
depth (Leaf n) = 1
depth (Node l r) = 1 + max (depth l) (depth r)

And you can think of that as

    int depth (Tree data) {
    switch (data.constructor) {
        case Empty:
            return 0;
        case Leaf:
            int n = data.field1
            return 1;
        case Node:
            int l = data.field1
            int r = data.field2
            return 1 + max (depth (l), depth (r));
    }
}

Excerise

  1. Define a single linked list.
  2. Define a function that get the n-th element of that list.
  3. Define a function that reverse the list.
data IList = INil
           | ICons Int IList
           deriving Show

list_len :: IList -> Int
list_len INil        = 0
list_len (ICons _ l) = 1 + list_len l

list_get :: IList -> Int -> Int
list_get (ICons d _) 0 = d
list_get (ICons _ l) n = list_get l (n - 1)

list_reverse :: IList -> IList 
list_reverse l = list_reverse_do l INil
    where list_reverse_do INil l = l
          list_reverse_do (ICons a l) b = list_reverse_do l (ICons a b)
          
main = print (list_reverse (ICons 10 (ICons 11 (ICons 12 INil))))

Curry and Uncurry

Please read https://wiki.haskell.org/Currying

Formal Language

We use programming languages to write programs. We use grammars to describe programming languages. We use notations to describe grammars. We implement grammars as automata. We use automata to recognize programming languages.

  1. It has an alphabet, which is a finite set of symbols, and is usually denoted as \(\Sigma\).
  2. String is a finite sequence of symbols from the alphabet, including empty string \(\epsilon\).
  3. A formal language is a set of strings defined over an alphabet, including the empty set \(\emptyset\).

Formal Grammar

Formal grammar is a set of production rules which generate all the strings of a corresponding formal language.

Types of Grammars

Different grammars have different abilities of describing languages. According to Chomsky wiki, there are four types of grammars in descending order w.r.t. their abilities.

Type 0
Unrestricted grammars. This type of grammars generate recursively enumerable languages.
Type 1
Context-sensitive grammars. These grammars generate context-sensitive languages.
Type 2
Context-free grammars. These grammars generate context-free languages.
Type 3
Regular grammars. These grammars generate regular languages.

Note

Note that actually, people can add restrictions onto these four types of grammars, and use those subset grammars to generate subset languages. For example, there are some important subsets of context-free grammars, like LL(k) and LR(k) grammars. You don't need to learn it for now. Just get some sense of those terminologies and their relationship.

BNF: Backus Naur Form

BNF stands for Backus Naur Form, which is a notation technique to describe context-free grammars wiki.

As mentioned, those grammars correspond to different type of languages, and they use different notations to describe themselves. BNF is one of the notations that can describe context-free grammars.

<number> ::= <digit> | <digit> <number>
<digit>  ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' 

This can be explained line by line in English as follows:

  • A number consists of a digit, or alternatively, a digit followed by a number recursively.
  • And a digit consists of any single digit from 0 to 9.

This may not be a perfect definition for numbers, but you can get a sense.

BNF Syntax

  • Each group containing a ::= is a rule, where the LHS will be further expanded into RHS.
  • Those names on the LHS of ::= are rule names.
  • The vertical bar | can be read as "or alternatively" as used in the above explanation. It seperates different expansion alternatives of a single rule.
  • Those names that only appear in the RHS are terminals. And those names appear on LHS, or on both sides, are non-terminals

Extensions

BNF has some extentions, and they are generally for the sake of simplicity and succinctness. Please google EBNF and ABNF for ideas.

Here I want to present some commonly used notions.

  • + means repeating one or more times. e.g. <number> ::= <digit>+
  • * means repeating zero or more times. e.g. <number> ::= <digit> <digit>*
  • ? means repeating zero or one time, namely an optional part. e.g. <function_call> ::= <function_name> '(' <params>?')'
  • [] means the same as ?. e.g. <function_call> ::= <function_name> '(' [<params>] ')'
  • {} means repeating zero or more times, just as *. e.g. <id> ::= <letter> {<letter> | <digit>}
  • () means a group. e.g. <id> ::= <letter> (<letter> | <digit>)*

Regular Language and Regular Expression

Regular language is a formal language, regular expression (in formal language theory) is a way to describe regular grammar.

Regular Language

Recall that a language is essentially a set of strings.

  • The empty set is a regular language.
  • Every symbol of \(\Sigma \cup \{\epsilon\}\) is a regular language.
  • If \(L_1, L_2\) are regular languages, then
    • \(L_1 \cdot L_2 = \{xy \mid x \in L_1, y \in L_2\}\) is a regular language. It is formed by concatenate strings in both languages. Sometimes it is written as \(L_1L_2\).
    • \(L_1 \cup L_2\) is a regular language. It is simply the union of both languages.
    • \(L^*\) is a regular language. This is called the Kleene-Star, or Kleene closure. It is formed by concatenating any strings in \(L\) any times (including zero times). e.g. \(\{a,b\}^* = \{\epsilon, a, b, ab, aa, bb, abb, aab, aaa, baa, bba, bbb, ...\}\).
  • And there is no other regular languages.

Examples

Assume \(\Sigma=\{a, b\}\). \(\{\epsilon\},\emptyset, \{a\}, \{a, a\}, \{abaab, babba\}\) are regular languages. \(\{a^nb^n\mid n \in \mathbb{N}\}\) is not.

Regular Expression

  • \(\epsilon\) and \(\emptyset\) are regular expressions denoting \(\{\epsilon\}\) and \(\emptyset\) regular languages respectively.
  • Every symbol in \(\Sigma\) is a regular expression denoting the regular language containing only that symbol.
  • If \(r,s\) are regular expressions, then \((r),\quad rs,\quad r \mid s,\quad r^*\) are regular expressions. Sometimes, people write \(r\cdot s\) for \(rs\), and \(r+s\) for \(r\mid s\).
  • No other expressions are regular expressions.

Examples

\(ba^*, a(a|b)^*, (a|b)^*(aa|bb)(a|b)^*\)

There is another symbol, the Kleene plus as appeared in \((ab)^+\), which means repeating one or more times. In this case, it is the set \(\{ab, abab, ababab, \cdots\}\).

As mentioned, regular expression is only a way of describing regular grammar. And grammar is actually a set of production rules. So we can actually rewrite regular expressions into production rules. And we can borrow BNF notation for these production rules.

Say we have a regular expression 00[0-9]* (this is a coder's way of regexp, a math people would write \(00(0|1|2|3|4|5|6|7|8|9)^*\) instead), it can be written as

<term>  ::= '0' '0' <digit>*
<digit> ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' 

Continuation Passing Style

In short, to write programs in CPS, quoting Matt Might,

No procedure is allowed to return to its caller -- ever.

And this is made possible by

Procedures can take a callback to invoke upon their return value.

CPS makes several aspects explicit, as compared to direct style,

  1. control flow
  2. intermediate values
  3. order of evaluation
  4. and more

Please read

And example here,

Direct Style

def id(x):
    return x

def fact(x):
    if x == 0:
        return 1 
    else: 
        return x * fact(x-1)

CPS

def id(x, k):
    k(x)

def fact(x, k):
    if x == 0:
        k(x)
    else: 
        fact(x-1, lambda r: k(x * r))

fact(10, lambda x: print(x))

Exercise

Fold Fantasy

Let's get straight to the exercises.

  1. Implement myfoldl, myfoldr, that has type [a] -> b -> (a -> b -> b) -> b
  2. Implement mymap and myfilter based on myfoldr/myfoldl
  3. Implement mysplit of type [a] -> Int -> a -> [a] from the last homework, using myfoldr/myfoldl
  4. Implement myperm of type [a] -> [[a]] that generates the permutation of a list. You can assume that the elements of the input list are unique, no duplicates.